Error bad escape end of pattern at position 0

First, the code:

import json
import os.path
import re

env_vars = dict(os.environ)

vars_return = json.dumps(env_vars)
#print(vars_return)

f_path = "C:/Users/Meep/PycharmProjects/deployment.template.yml"
f = open(f_path, 'r')
new_lines = []
for line in f:
    for json_var in vars_return:
        json_with_extra_stuff = "-=" + json_var + "=-"
        line = re.sub(json_var, json_with_extra_stuff, vars_return)
    new_lines.append(line)
new_lines = "".join(new_lines)
f_path = "C:/Users/Meep/PycharmProjects/deployment.yml"
test = open(f_path, "w")
test.write(new_lines)
test.close()    

Alright, so I’m trying to replace instances in the first yaml, that match environment variables in my environment, with the same instances, but with some special characters added. I have tried this a few different ways, with no success, and now with re.sub() I’m getting:

Traceback (most recent call last):
  File "C:/Users/Meep/PycharmProjects/variable_replacement/variable_replacement.py", line 19, in <module>
  File "C:\Users\Erran\anaconda3\lib\sre_parse.py", line 245, in __next
    raise error("bad escape (end of pattern)",
re.error: bad escape (end of pattern) at position 0

I am not sure what "bad escape" is referring to. In most examples on Google, it involves including a backslash in the replacement string and needing to add more, but that’s not the case here. Anyways, I’m dumb, please help me understand what this error is saying.
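The traceback is consistent with what the inner loop iterates over: vars_return is a JSON string, so "for json_var in vars_return" yields one character at a time, and on Windows that string contains backslashes (escaped path separators). A lone backslash used as a pattern is exactly "bad escape (end of pattern) at position 0". Below is a minimal sketch of one way around it, assuming the goal is to wrap each environment variable name found in the template in -= and =-. The function name mark_env_names and the sample data are hypothetical, not part of the original script.

```python
import re


def mark_env_names(text, names):
    """Wrap every literal occurrence of each name in -= ... =-."""
    # Drop empty names; sort longest first so that PATHEXT wins over PATH.
    ordered = sorted((n for n in names if n), key=len, reverse=True)
    if not ordered:
        return text
    # re.escape makes every name match literally, whatever characters it has.
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    # A function replacement is not parsed as a template, so backslashes
    # in the matched text cannot trigger "bad escape".
    return pattern.sub(lambda m: "-=" + m.group(0) + "=-", text)


# Hypothetical sample data standing in for os.environ and the YAML template.
sample_env = {"HOME": r"C:\Users\Meep", "PATH": r"C:\Windows", "PATHEXT": ".EXE"}
template = (
    "home: HOME\n"
    "path: PATH\n"
    "exts: PATHEXT\n"
)
result = mark_env_names(template, sample_env)
print(result)
```

Iterating over the dictionary (its keys) rather than over the JSON string is the key change; the escaping and the function replacement are extra safety for names or values that contain regex metacharacters.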

@DawNIng-github

Dear authors,
Thanks for your work. When I ran the file "MoLFI_demo.py" with Python 3.7, I got the following error.

Traceback (most recent call last):
  File "~/anaconda3/envs/tf/lib/python3.7/sre_parse.py", line 1021, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\\s'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "MoLFI_demo.py", line 14, in <module>
    parser.parse(log_file)
  File "../logparser/MoLFI/MoLFI.py", line 41, in parse
    loader = logloader.LogLoader(self.log_format, self.n_workers)
  File "../logparser/utils/logloader.py", line 38, in __init__
    self.headers, self.regex = self._generate_logformat_regex(self.logformat)
  File "../logparser/utils/logloader.py", line 79, in _generate_logformat_regex
    splitter = re.sub(' +', '\s+', splitters[k])
  File "~/anaconda3/envs/tf/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "~/anaconda3/envs/tf/lib/python3.7/re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "~/anaconda3/envs/tf/lib/python3.7/re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "~/anaconda3/envs/tf/lib/python3.7/sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \s at position 0

I’d be grateful if you could help me.

@ankit-nassa

You can also try replacing import re with import regex as re.

@fuyunliu

template_regex = re.sub(r'\\ +', r'\s+', template_regex)

What do you mean by substituting \\ + with \s+ ?

@shoaib-intro

template_regex = re.sub(r'\\ +', r'\s+', template_regex)

what do you mean sub \\ + with \s+ ?

Hi @fuyunliu,
I hit the same bad escape error at this line. Have you solved it?

@devharsh

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.12_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/sre_parse.py", line 1041, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\\s'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)
  File "/Users/devharsh/Downloads/Drain_demo.py", line 28, in <module>
    parser.parse(log_file)
  File "/Users/devharsh/Downloads/Drain.py", line 293, in parse
    self.outputResult(logCluL)
  File "/Users/devharsh/Downloads/Drain.py", line 224, in outputResult
    self.df_log["ParameterList"] = self.df_log.apply(self.get_parameter_list, axis=1)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 8839, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/usr/local/lib/python3.9/site-packages/pandas/core/apply.py", line 727, in apply
    return self.apply_standard()
  File "/usr/local/lib/python3.9/site-packages/pandas/core/apply.py", line 851, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/usr/local/lib/python3.9/site-packages/pandas/core/apply.py", line 867, in apply_series_generator
    results[i] = self.f(v)
  File "/Users/devharsh/Downloads/Drain.py", line 347, in get_parameter_list
    template_regex = re.sub(r'\\ +', r'\s+', template_regex)
  File "/usr/local/Cellar/python@3.9/3.9.12_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/Cellar/python@3.9/3.9.12_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/re.py", line 327, in _subx
    template = _compile_repl(template, pattern)
  File "/usr/local/Cellar/python@3.9/3.9.12_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/re.py", line 318, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "/usr/local/Cellar/python@3.9/3.9.12_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/sre_parse.py", line 1044, in parse_template
    raise s.error('bad escape %s' % this, len(this))
error: bad escape \s

@devharsh

I still get an error after changing import re to import regex as re:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)
  File "/Users/devharsh/Downloads/Drain_demo.py", line 28, in <module>
    parser.parse(log_file)
  File "/Users/devharsh/Downloads/Drain.py", line 293, in parse
    self.outputResult(logCluL)
  File "/Users/devharsh/Downloads/Drain.py", line 224, in outputResult
    self.df_log["ParameterList"] = self.df_log.apply(self.get_parameter_list, axis=1)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 8839, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/usr/local/lib/python3.9/site-packages/pandas/core/apply.py", line 727, in apply
    return self.apply_standard()
  File "/usr/local/lib/python3.9/site-packages/pandas/core/apply.py", line 851, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/usr/local/lib/python3.9/site-packages/pandas/core/apply.py", line 867, in apply_series_generator
    results[i] = self.f(v)
  File "/Users/devharsh/Downloads/Drain.py", line 347, in get_parameter_list
    template_regex = re.sub(r'\\ +', r'\s+', template_regex)
  File "/usr/local/lib/python3.9/site-packages/regex/regex.py", line 278, in sub
    return pat.sub(repl, string, count, pos, endpos, concurrent, timeout)
  File "/usr/local/lib/python3.9/site-packages/regex/regex.py", line 700, in _compile_replacement_helper
    is_group, items = _compile_replacement(source, pattern, is_unicode)
  File "/usr/local/lib/python3.9/site-packages/regex/_regex_core.py", line 1737, in _compile_replacement
    raise error("bad escape \\%s" % ch, source.string, source.pos)
error: bad escape \s

@estebanpw

Just adding this in case someone else has this problem in 2022 when using Python 3.8.10.
I changed the following line:

splitter = re.sub(' +', '\s+', splitters[k])

to

splitter = re.sub(' +', '\\\s+', splitters[k])

and that fixed it. Note that I changed this only in the function def _generate_logformat_regex(self, logformat): of the file Spell/Spell.py (because I was using spell) and not in utils/logloader.py.
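Why the change works: since Python 3.7, an unknown escape made of a backslash and an ASCII letter in the replacement argument of re.sub() is an error, and \s is pattern syntax, not a replacement escape. To emit a literal \s+ the replacement template has to contain an escaped backslash. A minimal sketch, assuming a made-up splitter string in place of splitters[k]; a raw string avoids relying on '\s' being left alone inside a normal string literal.

```python
import re

splitter_text = "  -  "  # hypothetical stand-in for splitters[k]

# The replacement template must contain two backslashes to emit one.
as_template = re.sub(" +", r"\\s+", splitter_text)

# A function replacement is used verbatim, so one backslash is enough.
as_function = re.sub(" +", lambda m: r"\s+", splitter_text)

print(as_template)
```

Both forms produce the same text, which can then be used as a pattern that matches any run of whitespace around the dash.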

Understanding Python re(gex)?

This chapter will show how to match metacharacters literally. Examples will be discussed for both manually as well as programmatically constructed patterns. You’ll also learn about escape sequences supported by the re module.

Escaping with backslash

You have seen a few metacharacters and escape sequences that help to compose a RE. To match the metacharacters literally, i.e. to remove their special meaning, prefix those characters with a \ (backslash) character. To indicate a literal \ character, use \\. This assumes you are using raw strings and not normal strings.

# even though ^ is not being used as anchor, it won't be matched literally
>>> bool(re.search(r'b^2', 'a^2 + b^2 - C*3'))
False
# escaping will work
>>> bool(re.search(r'b\^2', 'a^2 + b^2 - C*3'))
True

# match ( or ) literally
>>> re.sub(r'\(|\)', '', '(a*b) + c')
'a*b + c'

# note that the input string is also a raw string here
>>> re.sub(r'\\', '/', r'\learn\by\example')
'/learn/by/example'

As emphasized earlier, regular expressions are just another tool to process text. Some examples and exercises presented in this book can be solved using normal string methods as well. It is good practice to reason out whether regular expressions are needed for a given problem.

>>> eqn = 'f*(a^b) - 3*(a^b)'

# straightforward search and replace, no need for RE shenanigans
>>> eqn.replace('(a^b)', 'c')
'f*c - 3*c'

re.escape()

Okay, what if you have a string variable that must be used to construct a RE — how to escape all the metacharacters? Relax, the re.escape() function has got you covered. No need to manually take care of all the metacharacters or worry about changes in future versions.

>>> expr = '(a^b)'
# print used here to show results similar to raw string
>>> print(re.escape(expr))
\(a\^b\)

# replace only at the end of string
>>> eqn = 'f*(a^b) - 3*(a^b)'
>>> re.sub(re.escape(expr) + r'\Z', 'c', eqn)
'f*(a^b) - 3*c'

Recall that in the Alternation section, join was used to dynamically construct RE pattern from an iterable of strings. However, that didn’t handle metacharacters. Here are some examples on how you can use re.escape() so that the resulting pattern will match the strings from the input iterable literally.

# iterable of strings, assume alternation precedence sorting isn't needed
>>> terms = ['a_42', '(a^b)', '2|3']
# using 're.escape' and 'join' to construct the pattern
>>> pat1 = re.compile('|'.join(re.escape(s) for s in terms))
# using only 'join' to construct the pattern
>>> pat2 = re.compile('|'.join(terms))

>>> print(pat1.pattern)
a_42|\(a\^b\)|2\|3
>>> print(pat2.pattern)
a_42|(a^b)|2|3

>>> s = 'ba_423 (a^b)c 2|3 a^b'
>>> pat1.sub('X', s)
'bX3 Xc X a^b'
>>> pat2.sub('X', s)
'bXX (a^b)c X|X a^b'

Escape sequences

Certain characters like tab and newline can be expressed using escape sequences as \t and \n respectively. These are similar to how they are treated in normal string literals. However, \b is for word boundaries as seen earlier, whereas it stands for the backspace character in normal string literals.

The full list is mentioned at the end of docs.python: Regular Expression Syntax section as \a \b \f \n \N \r \t \u \U \v \x \\. Do read the documentation for details as well as how it differs for byte data.

>>> re.sub(r'\t', ':', 'a\tb\tc')
'a:b:c'

>>> re.sub(r'\n', ' ', '1\n2\n3')
'1 2 3'

Warning: If an escape sequence is not defined, you’ll get an error.

>>> re.search(r'\e', 'hello')
re.error: bad escape \e at position 0
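When patterns come from user input or configuration, it can help to catch this error and report where it occurred. The sketch below is one possible approach; the helper name check_pattern is hypothetical. It relies on the msg and pos attributes that re.error exposes.

```python
import re


def check_pattern(pattern):
    """Return None if the pattern compiles, else a short error description."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # msg is the bare message; pos is the index into the pattern.
        return "{} (position {})".format(exc.msg, exc.pos)
    return None


print(check_pattern(r"\e"))   # undefined escape
print(check_pattern(r"\t"))   # valid escape, prints None
print(check_pattern("\\"))    # a lone backslash: nothing follows the escape
```

The last call reproduces the "end of pattern" variant of the message: a pattern that ends in a single backslash has no character left to complete the escape.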

You can also represent a character using hexadecimal escape of the format \xNN where NN are exactly two hexadecimal characters. If you represent a metacharacter using escapes, it will be treated literally instead of its metacharacter feature.

# \x20 is space character
>>> re.sub(r'\x20', '', 'h e l l o')
'hello'

# \x7c is '|' character
>>> re.sub(r'2\x7c3', '5', '12|30')
'150'
>>> re.sub(r'2|3', '5', '12|30')
'15|50'

Info: See ASCII code table for a handy cheatsheet with all the ASCII characters and their hexadecimal representations.

Octal escapes will be discussed in the Backreference section. The Codepoints and Unicode escapes section will discuss escapes for unicode characters using \u and \U.

Cheatsheet and Summary

Note           Description
\              prefix metacharacters with \ to match them literally
\\             to match \ literally
re.escape()    automatically escape all metacharacters
               ex: '|'.join(re.escape(s) for s in iterable)
\t             escape sequences like those supported in string literals
\b             word boundary in RE but backspace in string literals
\e             undefined escapes will result in an error
\xNN           represent a character using hexadecimal value
\x7c           will match | literally

This short chapter discussed how to match metacharacters literally. re.escape() helps if you are using input strings sourced from elsewhere to build the final RE. You also saw how to use escape sequences to represent characters and how they differ from normal string literals.

Exercises

a) Transform the given input strings to the expected output using the same logic on both strings.

>>> str1 = '(9-2)*5+qty/3-(9-2)*7'
>>> str2 = '(qty+4)/2-(9-2)*5+pq/4'

##### add your solution here for str1
'35+qty/3-(9-2)*7'
##### add your solution here for str2
'(qty+4)/2-35+pq/4'

b) Replace (4)\| with 2 only at the start or end of the given input strings.

>>> s1 = r'2.3/(4)\|6 foo 5.3-(4)\|'
>>> s2 = r'(4)\|42 - (4)\|3'
>>> s3 = 'two - (4)\\|\n'

>>> pat = re.compile()        ##### add your solution here

>>> pat.sub('2', s1)
'2.3/(4)\\|6 foo 5.3-2'
>>> pat.sub('2', s2)
'242 - (4)\\|3'
>>> pat.sub('2', s3)
'two - (4)\\|\n'

c) Replace any matching element from the list items with X for the given input strings. Match the elements from items literally. Assume no two elements of items will result in any matching conflict.

>>> items = ['a.b', '3+n', r'x\y\z', 'qty||price', '{n}']
>>> pat = re.compile()      ##### add your solution here

>>> pat.sub('X', '0a.bcd')
'0Xcd'
>>> pat.sub('X', 'E{n}AMPLE')
'EXAMPLE'
>>> pat.sub('X', r'43+n2 ax\y\ze')
'4X2 aXe'

d) Replace the backspace character \b with a single space character for the given input string.

>>> ip = '123\b456'
>>> ip
'123\x08456'
>>> print(ip)
12456

>>> re.sub()        ##### add your solution here
'123 456'

e) Replace all occurrences of \e with e.

>>> ip = r'th\er\e ar\e common asp\ects among th\e alt\ernations'

>>> re.sub()     ##### add your solution here
'there are common aspects among the alternations'

f) Replace any matching item from the list eqns with X for the given string ip. Match the items from eqns literally.

>>> ip = '3-(a^b)+2*(a^b)-(a/b)+3'
>>> eqns = ['(a^b)', '(a/b)', '(a^b)+2']

##### add your solution here

>>> pat.sub('X', ip)
'3-X*X-X+3'

>>> path
'd:/\\temp\\\\'
>>> pat = '[{}]+'.format(re.escape('\\/'))
>>> re.sub(pat, '\\', path)
Traceback (most recent call last):
  File "<pyshell#78>", line 1, in <module>
    re.sub(pat, '\\', path)
  File "C:\Users\Сергей\AppData\Local\Programs\Python\Python35\lib\re.py", line 182, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Users\Сергей\AppData\Local\Programs\Python\Python35\lib\re.py", line 325, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Users\Сергей\AppData\Local\Programs\Python\Python35\lib\re.py", line 312, in _compile_repl
    p = sre_parse.parse_template(repl, pattern)
  File "C:\Users\Сергей\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 849, in parse_template
    s = Tokenizer(source)
  File "C:\Users\Сергей\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 225, in __init__
    self.__next()
  File "C:\Users\Сергей\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 239, in __next
    self.string, len(self.string) - 1) from None
sre_constants.error: bad escape (end of pattern) at position 0
>>> pat
'[\\\\\\/]+'
>>> 

In JS it works:

> 'd:/\\temp\\\\'.replace(new RegExp('[\\\\\\/]+', 'g'), '\\')
"d:\temp\"

There is a problem with your replacement template. The Python string literal '\\' is a Python string containing a single backslash character. But a backslash has special meaning in a replacement template: it starts escapes and backreferences. To use a literal backslash, it should be escaped: r'\\' or '\\\\'.

>>> re.sub(pat, r'\\', path)
'd:\\temp\\'
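The two ways of getting a literal backslash into the result can be compared side by side. This is a small sketch using the path and pattern from the report above; the variable names via_template and via_function are illustrative only. A callable replacement is not parsed as a template, so its return value needs no extra escaping.

```python
import re

path = "d:/\\temp\\\\"          # the characters d:/\temp\\
pat = "[{}]+".format(re.escape("\\/"))

# Escaped template: two backslashes in the template emit one.
via_template = re.sub(pat, r"\\", path)

# Function replacement: the return value is inserted as-is.
via_function = re.sub(pat, lambda m: "\\", path)

print(via_template)
```

The function form is handy when the replacement text comes from a variable that may contain backslashes, such as a Windows path.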

If you need the file name, regular expressions are not your answer.

Python has the pathlib module, designed for handling file paths. Besides methods for getting the bare file name that handle all the possible corner cases, its objects also have methods for opening files, listing files, and doing everything else you would normally do with a file.

To get the base file name from a path, just use its automatic properties:

In [1]: import pathlib

In [2]: name = pathlib.Path("/home/user/JHN097567898_01102019_050514_svc_dc.tar")

In [3]: name.name
Out[3]: 'JHN097567898_01102019_050514_svc_dc.tar'

In [4]: name.parent
Out[4]: PosixPath('/home/user')

Otherwise, even if you do not use pathlib, there is no advantage in using re.split, since os.path.sep is a single character; a plain str.split will do. In fact, there is also os.path.split, which predates pathlib and has always done the same thing:

In [6]: name = "/home/user/JHN097567898_01102019_050514_svc_dc.tar"

In [7]: import os

In [8]: os.path.split(name)[-1]
Out[8]: 'JHN097567898_01102019_050514_svc_dc.tar'

Last (and, in this case at least, least): the reason for the error is that you are on Windows, where the os.path.sep character is "\". This character alone is not a complete regular expression, because the regex engine expects a character indicating a special sequence to follow the "\". To use it without the error, you need to do:

 re.split(re.escape(os.path.sep), "myfilepath")
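To see the difference without needing a Windows machine, the separator can be hard-coded. The sketch below assumes a made-up path and uses PureWindowsPath from pathlib, which parses Windows-style paths on any platform; it shows that the escaped regex split, a plain string split, and pathlib all agree on the file name.

```python
import re
from pathlib import PureWindowsPath

sep = "\\"  # what os.path.sep is on Windows, hard-coded so this runs anywhere
file_path = r"C:\home\user\archive.tar"  # hypothetical path

# Escaping the separator turns it into a valid pattern.
parts = re.split(re.escape(sep), file_path)

# No regex needed: str.split and pathlib give the same file name.
plain_parts = file_path.split(sep)
name = PureWindowsPath(file_path).name

print(parts[-1], plain_parts[-1], name)
```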
