By Lenin Mishra
in
python
—
Jan 22, 2022
Handle Lookup error exceptions in Python using try-except block.
The LookupError exception in Python is the base class for all exceptions that are raised when an index or a key is not found in a sequence or dictionary, respectively.
You can use the LookupError exception class to handle both the IndexError and KeyError exception classes.
- LookupError
--> IndexError
--> KeyError
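The hierarchy above can be checked directly with the built-in issubclass() function. This is a minimal sketch; the variable names are illustrative only.

```python
# Both concrete lookup exceptions derive from LookupError.
index_is_lookup = issubclass(IndexError, LookupError)
key_is_lookup = issubclass(KeyError, LookupError)
print(index_is_lookup, key_is_lookup)  # True True
```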
Example 1 — Handling IndexError exception
Code/Output
# lists
x = [1, 2, 3, 4]
try:
    print(x[10])
except LookupError as e:
    print(f"{e}, {e.__class__}")
>>> list index out of range, <class 'IndexError'>
# strings
x = "Pylenin"
try:
    print(x[10])
except LookupError as e:
    print(f"{e}, {e.__class__}")
>>> string index out of range, <class 'IndexError'>
# tuples
x = (1, 2, 3, 4)
try:
    print(x[10])
except LookupError as e:
    print(f"{e}, {e.__class__}")
>>> tuple index out of range, <class 'IndexError'>
As you can see, it is possible to catch IndexError exceptions using the LookupError exception class. The e.__class__ attribute also helps you identify the specific type of LookupError. In the above example, it is an IndexError.
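Because LookupError covers both sequences and mappings, one handler can serve both. The sketch below assumes a hypothetical helper named safe_lookup, which is not part of the original article.

```python
def safe_lookup(container, key, default=None):
    """Return container[key], or default if the index or key is missing.

    safe_lookup is a hypothetical helper used only for illustration.
    """
    try:
        return container[key]
    except LookupError:
        # Covers IndexError (sequences) and KeyError (mappings).
        return default


print(safe_lookup([1, 2, 3, 4], 10, default="missing"))
print(safe_lookup({"name": "Lenin Mishra"}, "wife", default="missing"))
print(safe_lookup("Pylenin", 0))
```

Catching the base class is convenient here because the helper does not care which kind of container it received.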
Example 2 — Handling KeyError exception
Code
pylenin_info = {'name': 'Lenin Mishra',
                'age': 28,
                'language': 'Python'}
user_input = input('What do you want to learn about Pylenin==> ')
try:
    print(f'{user_input} is {pylenin_info[user_input]}')
except LookupError as e:
    print(f'{e}, {e.__class__}')
Output
What do you want to learn about Pylenin==> wife
'wife', <class 'KeyError'>
Summary: in this tutorial, you'll learn how to handle exceptions in Python the right way by using the try statement.
Introduction to exception handling in Python
To handle exceptions, you use the try statement. The try statement has the following clauses:
try:
    # code that you want to protect from exceptions
except <ExceptionType> as ex:
    # code that handles the exception
else:
    # code that executes if the try clause finishes without an exception
    # (an except clause must be present)
finally:
    # code that always executes, whether an exception occurred or not
Let's examine the try statement in greater detail.
try
In the try clause, you place the code that may raise one or more exceptions. It's a good practice to keep this code as short as possible. Often, you'll have a single statement in the try clause.
The try clause appears exactly once in the try statement.
except
In the except clause, you place the code that handles a specific exception type. A try statement can have zero or more except clauses (with zero except clauses, a finally clause is required). Typically, each except clause handles a different exception type in a specific way.
In an except clause, the as ex part is optional, and so is the <ExceptionType>. However, if you omit <ExceptionType> as ex entirely, you'll have a bare exception handler.
When specifying exception types in the except clauses, you place them from the most specific to the least specific, top to bottom.
If the same logic handles different exception types, you can group them in a single except clause. For example:
try:
    ...
except <ExceptionType1> as ex:
    log(ex)
except <ExceptionType2> as ex:
    log(ex)
becomes:
try:
    ...
except (<ExceptionType1>, <ExceptionType2>) as ex:
    log(ex)
It's important to note that the order of the except clauses matters because Python runs the first except clause whose exception type matches the exception that occurred.
finally
The finally clause may appear zero or one time in a try statement. The finally clause always executes, whether an exception occurred or not.
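To illustrate that the finally clause runs even when the try clause returns early, here is a small sketch. The function load_setting and the list cleanup_log are hypothetical names chosen for this example.

```python
cleanup_log = []


def load_setting():
    """Hypothetical function: finally runs even though try returns."""
    try:
        return "value from try"
    finally:
        # Executed after the return value is computed, before the caller resumes.
        cleanup_log.append("cleanup ran")


result = load_setting()
print(result, cleanup_log)
```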
else
The else clause also appears zero or one time. The else clause is only valid if the try statement has at least one except clause.
Typically, you place in the else clause the code that should execute only if the try clause terminates normally.
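The interplay of the four clauses can be traced with a short sketch. The helper parse_int is hypothetical; it records which clauses run for a given input.

```python
def parse_int(text):
    """Hypothetical helper: record which clauses run for a given input."""
    steps = []
    try:
        value = int(text)
        steps.append("try")
    except ValueError:
        value = None
        steps.append("except")
    else:
        # Runs only when the try clause raised nothing.
        steps.append("else")
    finally:
        # Runs in every case.
        steps.append("finally")
    return value, steps


print(parse_int("42"))   # (42, ['try', 'else', 'finally'])
print(parse_int("abc"))  # (None, ['except', 'finally'])
```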
The following defines a function that returns the result of dividing one number by another:
def divide(a, b):
    return a / b
If you pass 0 as the second argument, you'll get a ZeroDivisionError exception:
divide(10, 0)
Error:
ZeroDivisionError: division by zero
To fix it, you can handle the ZeroDivisionError exception in the divide() function as follows:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as ex:
        return None
In this example, the divide() function returns None if a ZeroDivisionError occurs.
When using the divide() function, you need to check whether the result is None:
result = divide(10, 0)
if result is not None:
    print('result:', result)
else:
    print('Invalid inputs')
But returning None may not be the best choice because callers may accidentally test the truthiness of the result in the if statement like this:
result = divide(10, 0)
if result:
    print('result:', result)
else:
    print('Invalid inputs')
In this case, it works. However, it won't work if the first argument is zero, because the valid result 0.0 is also falsy and the code prints 'Invalid inputs'. For example:
result = divide(0, 10)
if result:
    print('result:', result)
else:
    print('Invalid inputs')
A better approach is to raise an exception to the caller if the ZeroDivisionError exception occurs. For example:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as ex:
        raise ValueError('The second argument (b) must not be zero')
In this example, the divide() function raises a ValueError if b is zero. To use the divide() function, you need to catch the ValueError exception:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as ex:
        raise ValueError('The second argument (b) must not be zero')

try:
    result = divide(10, 0)
except ValueError as e:
    print(e)
else:
    print('result:', result)
Output:
The second argument (b) must not be zero
It's a good practice to raise an exception instead of returning None in special cases.
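One optional refinement, which is not part of the original example, is to chain the exceptions with raise ... from so that the original ZeroDivisionError stays attached to the new ValueError as its __cause__. A minimal sketch:

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as ex:
        # "from ex" records the original exception as __cause__.
        raise ValueError('The second argument (b) must not be zero') from ex


try:
    divide(10, 0)
except ValueError as e:
    cause = e.__cause__
    print(e, '| caused by:', type(cause).__name__)
```

With chaining, a traceback shows both exceptions, which can make debugging easier for the caller.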
Except order example
When you catch exceptions in except clauses, you need to place them from the most specific to the least specific in terms of the exception hierarchy.
Consider three exception classes: Exception, LookupError, and IndexError. IndexError is a subclass of LookupError, which in turn is a subclass of Exception.
If you catch these exceptions, you need to place them in the following order: IndexError, LookupError, and Exception.
For example, the following defines a list of three strings and attempts to access the 4th element:
colors = ['red', 'green', 'blue']
try:
    print(colors[3])
except IndexError as e:
    print(type(e), 'Index error')
except LookupError as e:
    print(type(e), 'Lookup error')
Output:
<class 'IndexError'> Index error
The colors[3] access causes an IndexError exception. However, suppose you swap the except clauses and catch the LookupError first and the IndexError second like this:
colors = ['red', 'green', 'blue']
try:
    print(colors[3])
except LookupError as e:
    print(type(e), 'Lookup error')
except IndexError as e:
    print(type(e), 'Index error')
Output:
<class 'IndexError'> Lookup error
The exception is still an IndexError, but the printed message is misleading because the more general LookupError clause matched first.
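If you are unsure where an exception sits in the hierarchy, its method resolution order lists the class followed by its ancestors, which matches the order in which except clauses should appear. A quick sketch using only built-ins:

```python
# __mro__ lists IndexError first, then each ancestor up to object.
hierarchy = [cls.__name__ for cls in IndexError.__mro__]
print(hierarchy)
# ['IndexError', 'LookupError', 'Exception', 'BaseException', 'object']
```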
Bare exception handlers
When you want to catch any exception, you can use a bare exception handler. A bare exception handler does not specify an exception type:
try:
    ...
except:
    ...
It's equivalent to the following:
try:
    ...
except BaseException:
    ...
A bare exception handler will catch any exception, including the SystemExit and KeyboardInterrupt exceptions.
A bare exception handler makes it harder to interrupt a program with Control-C and can disguise other problems.
If you want to catch all exceptions that signal program errors, you can use except Exception instead:
try:
    ...
except Exception:
    ...
In practice, you should avoid using bare exception handlers. If you don't know which exceptions to catch, just let the exception occur and then modify the code to handle it.
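The difference between except Exception and a handler for BaseException can be seen in a small sketch. The helper run is hypothetical and simply reports which handler caught the exception.

```python
import sys


def run(action):
    """Hypothetical helper: report which handler caught the exception."""
    try:
        action()
    except Exception:
        return "caught by except Exception"
    except BaseException as ex:
        # SystemExit and KeyboardInterrupt do not derive from Exception.
        return f"only BaseException catches {type(ex).__name__}"
    return "no exception"


print(run(lambda: 1 / 0))
print(run(lambda: sys.exit(1)))
```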
To get exception information from a bare exception handler, you use the exc_info() function from the sys module.
The sys.exc_info() function returns a tuple that consists of three values:
- type is the type of the exception that occurred. It's a subclass of BaseException.
- value is the instance of the exception type.
- traceback is an object that encapsulates the call stack at the point where the exception originally occurred.
The following example uses the sys.exc_info() function to examine the exception when a string is divided by a number:
import sys
try:
    '20' / 2
except:
    exc_info = sys.exc_info()
    print(exc_info)
Output:
(<class 'TypeError'>, TypeError("unsupported operand type(s) for /: 'str' and 'int'"), <traceback object at 0x000001F19F42E700>)
The output shows that the code in the try clause causes a TypeError exception. Therefore, you can modify the code to handle it specifically as follows:
try:
    '20' / 2
except TypeError as e:
    print(e)
Output:
unsupported operand type(s) for /: 'str' and 'int'
Summary
- Use the try statement to handle exceptions.
- Place only the minimal code that you want to protect from potential exceptions in the try clause.
- Handle exceptions from most specific to least specific in terms of exception types. The order of except clauses is important.
- The finally clause always executes, whether an exception occurred or not.
- The else clause only executes when the try clause terminates normally. The else clause is valid only if the try statement has at least one except clause.
- Avoid using bare exception handlers.
The following exceptions are the ones usually raised during program execution.
Contents:
- StopIteration exception
- StopAsyncIteration exception
- ArithmeticError exception
- AssertionError exception
- AttributeError exception
- BufferError exception
- EOFError exception
- ImportError exception
- ModuleNotFoundError exception
- LookupError exception
- IndexError exception
- KeyError exception
- MemoryError exception
- NameError exception
- UnboundLocalError exception
- OSError exception
- ReferenceError exception
- RuntimeError exception
- NotImplementedError exception
- RecursionError exception
- SyntaxError exception
- IndentationError exception
- TabError exception
- SystemError exception
- TypeError exception
- ValueError exception
- UnicodeError exception
- EnvironmentError exception
- IOError exception
- WindowsError exception
StopIteration:
The StopIteration exception is raised by the built-in function next() and by an iterator's __next__() method to signal that the iterator produces no further items.
The exception object has a single attribute, value, which is given as an argument when constructing the exception and defaults to None.
When a generator or coroutine function returns, a new StopIteration instance is raised, and the value returned by the function is used as the value parameter of the exception's constructor.
If generator code directly or indirectly raises StopIteration, it is converted into a RuntimeError, retaining the StopIteration as the new exception's cause.
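Both StopIteration behaviors described above (the return value carried in the value attribute, and the conversion to RuntimeError inside a generator body) can be observed with a short sketch. The generator names countdown and leaky are made up for this example.

```python
def countdown():
    yield 2
    yield 1
    return "done"  # becomes StopIteration.value


gen = countdown()
items = [next(gen), next(gen)]
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value


def leaky():
    # Raising StopIteration inside a generator body is converted
    # to RuntimeError (PEP 479, the default since Python 3.7).
    raise StopIteration
    yield  # unreachable, but makes this function a generator


try:
    next(leaky())
except RuntimeError as err:
    cause = err.__cause__

print(items, returned, type(cause).__name__)
```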
StopAsyncIteration:
The StopAsyncIteration exception is raised by the __anext__() method of an asynchronous iterator object to stop the iteration.
ArithmeticError:
AssertionError:
The AssertionError exception is raised when an assert statement fails.
AttributeError:
The AttributeError exception is raised when an attribute reference or assignment fails. If an object does not support attribute references or attribute assignments at all, TypeError is raised.
BufferError:
The BufferError exception is raised when a buffer-related operation cannot be performed.
EOFError:
The EOFError exception is raised when the input() function hits an end-of-file condition (EOF) without reading any data. Note that the io.IOBase.read() and io.IOBase.readline() methods return an empty string when they hit EOF instead of raising this exception.
ImportError:
The ImportError exception is raised when the import statement has trouble trying to load a module. ImportError is also raised when the "from list" in a from ... import construct contains a name that cannot be found.
The name and path attributes can be set using keyword-only arguments to the constructor. When set, they represent the name of the module that was attempted to be imported and the path to any file that triggered the exception, respectively.
- ModuleNotFoundError: The ModuleNotFoundError exception is a subclass of ImportError that is raised by the import statement when a module cannot be located. It is also raised when None is found in sys.modules.
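A minimal sketch of the relationship between the two import exceptions follows. The module name is deliberately made up and assumed not to exist on the system.

```python
try:
    # Hypothetical module name, assumed to be absent.
    import module_that_does_not_exist_xyz
except ImportError as ex:
    # The handler names the base class, yet the concrete type is the subclass.
    caught = type(ex).__name__
    missing_name = ex.name

print(caught, missing_name)
```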
LookupError:
The LookupError exception is the base class for the exceptions raised when a key or index used on a mapping or sequence is invalid: IndexError, KeyError. LookupError can also be raised directly by codecs.lookup().
- IndexError: The IndexError exception is raised when a sequence index is out of range. Slice indices are silently truncated to fall within the allowed range. If an index is not an integer, TypeError is raised.
- KeyError: The KeyError exception is raised when a mapping (dictionary) key is not found in the set of existing keys.
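The three ways of triggering a LookupError mentioned above, plus the silent clamping of slice bounds, can be compared side by side. This is only a sketch; the codec name "no-such-codec" is made up and assumed not to be registered.

```python
import codecs

letters = ["a", "b", "c"]
sliced = letters[1:100]  # slice bounds are clamped silently, no exception

caught = []
for action in (
    lambda: letters[100],                    # out-of-range index
    lambda: {"a": 1}["missing"],             # missing dictionary key
    lambda: codecs.lookup("no-such-codec"),  # unknown codec name
):
    try:
        action()
    except LookupError as ex:
        caught.append(type(ex).__name__)

print(sliced, caught)
```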
MemoryError:
The MemoryError exception is raised when an operation runs out of memory but the situation may still be rescued by deleting some objects. The associated value is a string indicating which internal operation ran out of memory. Note that because of the underlying memory management architecture, the interpreter may not always be able to recover completely from this situation. It nevertheless raises an exception so that a stack traceback can be printed.
NameError:
The NameError exception is raised when a local or global name is not found. The associated value is an error message that includes the name that could not be found.
- UnboundLocalError: The UnboundLocalError exception is raised when a reference is made to a local variable in a function or method, but no value has been bound to that variable. This is a subclass of NameError.
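A common way to run into UnboundLocalError is to assign to a name inside a function after reading it. The sketch below uses a hypothetical counter and function to show this.

```python
counter = 0


def increment():
    # The assignment makes "counter" local to the function, so reading
    # it on the right-hand side fails before any value is bound.
    counter = counter + 1
    return counter


try:
    increment()
except NameError as ex:  # UnboundLocalError is a subclass of NameError
    caught = type(ex).__name__

print(caught)
```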
OSError:
ReferenceError:
The ReferenceError exception is raised when a weak reference proxy, created by the weakref.proxy() function, is used to access an attribute of the referent after it has been garbage collected.
RuntimeError:
The RuntimeError exception is raised when an error is detected that doesn't fall into any of the other categories. The associated value is a string indicating what precisely went wrong.
- NotImplementedError: The NotImplementedError exception is derived from RuntimeError. In user-defined base classes, abstract methods should raise this exception when they require derived classes to override the method, or while the class is being developed to indicate that the real implementation still needs to be added. Notes:
  - It should not be used to indicate that an operator or method is not meant to be supported at all. In that case, either leave the operator/method undefined or set it to None.
  - NotImplementedError and NotImplemented are not interchangeable, even though they have similar names and purposes. See NotImplemented for details on when to use it.
- RecursionError: The RecursionError exception is derived from RuntimeError. It is raised when the interpreter detects that the maximum recursion depth (see sys.getrecursionlimit()) is exceeded.
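The intended use of NotImplementedError in a base class might look like the following sketch. The Shape and Square classes are hypothetical and exist only for illustration.

```python
class Shape:
    def area(self):
        # Subclasses are expected to override this method.
        raise NotImplementedError("subclasses must implement area()")


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


print(Square(3).area())
try:
    Shape().area()
except NotImplementedError as ex:
    message = str(ex)
    print(message)
```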
SyntaxError:
The SyntaxError exception is raised when the parser encounters a syntax error. This may occur in an import statement, in a call to the built-in functions exec() or eval(), or when reading the initial script or standard input (also in interactive mode).
Instances of this class have the attributes filename, lineno, offset and text for easier access to the details. Calling str() on the exception instance returns only the message.
- IndentationError: The IndentationError exception is the base class for syntax errors related to incorrect indentation. This is a subclass of SyntaxError.
- TabError: The TabError exception is raised when indentation contains an inconsistent use of tabs and spaces. This is a subclass of IndentationError.
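The SyntaxError attributes described above can be inspected by compiling a deliberately broken snippet. In this sketch the file name "<example>" is an arbitrary label passed to compile(), not a real file.

```python
try:
    compile("x = = 1", "<example>", "exec")
except SyntaxError as ex:
    # filename and lineno locate the error; text holds the offending line.
    details = (ex.filename, ex.lineno, ex.text)

print(details)
```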
SystemError:
The SystemError exception is raised when the interpreter finds an internal error, but the situation does not look so serious as to cause it to abandon all hope. The associated value is a string indicating what went wrong (in low-level terms).
TypeError:
The TypeError exception is raised when an operation or function is applied to an object of inappropriate type. The associated value is a string giving details about the type mismatch.
TypeError may be raised by user code to indicate that an attempted operation on an object is not supported and is not meant to be. If an object is meant to support a given operation but has not yet provided an implementation, raise NotImplementedError instead.
Passing arguments of the wrong type, for example passing a list when an integer is expected, should result in a TypeError, but passing arguments with the wrong value, for example a number outside the expected bounds, should result in a ValueError.
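The split between TypeError and ValueError can be sketched with a small validator. The function set_percentage and its 0 to 100 range are assumptions made for this example.

```python
def set_percentage(value):
    """Hypothetical validator illustrating the TypeError/ValueError split."""
    if isinstance(value, bool) or not isinstance(value, int):
        # Wrong kind of object altogether.
        raise TypeError(f"expected int, got {type(value).__name__}")
    if not 0 <= value <= 100:
        # Right type, but outside the accepted range.
        raise ValueError(f"expected a value in 0..100, got {value}")
    return value


for candidate in (50, [1], 150):
    try:
        print(set_percentage(candidate))
    except (TypeError, ValueError) as ex:
        print(type(ex).__name__, ex)
```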
ValueError:
The ValueError exception is raised when an operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.
UnicodeError:
EnvironmentError:
Alias of OSError, kept for compatibility with earlier versions.
IOError:
Alias of OSError, kept for compatibility with earlier versions.
WindowsError:
Available only on Windows.
- Exceptions are error scenarios that alter the normal execution flow of the program.
- The process of taking care of the possible exceptions is called exception handling.
- If exceptions are not handled properly, the program may terminate prematurely. It can cause data corruption or unwanted results.
- Python exception handling is achieved by three keyword blocks – try, except, and finally.
- The try block contains the code that may raise exceptions or errors.
- The except block is used to catch the exceptions and handle them.
- The except block code is executed only when the corresponding exception is raised.
- There can be multiple except blocks. We can also catch multiple exceptions in a single except block.
- The finally block code is always executed, whether the program executed properly or it raised an exception.
- We can also create an “else” block with try-except block. The code inside the else block is executed if there are no exceptions raised.
How to Handle Exceptions in Python?
Let’s look at an example where we need exception handling.
def divide(x, y):
    print(f'{x}/{y} is {x / y}')


divide(10, 2)
divide(10, 0)
divide(10, 4)
If we run the above program, we get the following output.
10/2 is 5.0
Traceback (most recent call last):
  File "/Users/pankaj/Documents/PycharmProjects/PythonTutorialPro/hello-world/exception_handling.py", line 6, in <module>
    divide(10, 0)
  File "/Users/pankaj/Documents/PycharmProjects/PythonTutorialPro/hello-world/exception_handling.py", line 2, in divide
    print(f'{x}/{y} is {x / y}')
ZeroDivisionError: division by zero
The second call to the divide() function raised ZeroDivisionError exception and the program terminated.
We never got the output of the third call to divide() method because we didn’t do exception handling in our code.
Let’s rewrite the divide() method with proper exception handling. If someone tries to divide by 0, we will catch the exception and print an error message. This way the program will not terminate prematurely and the output will make more sense.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)

divide(10, 2)
divide(10, 0)
divide(10, 4)
Output:
10/2 is 5.0
division by zero
10/4 is 2.5
What is BaseException Class?
The BaseException class is the base class of all the exceptions. It has four sub-classes.
- Exception – this is the base class for all non-exit exceptions.
- GeneratorExit – Request that a generator exit.
- KeyboardInterrupt – Program interrupted by the user.
- SystemExit – Request to exit from the interpreter.
Some Built-In Exception Classes
Some of the built-in exception classes in Python are:
- ArithmeticError – this is the base class for arithmetic errors.
- AssertionError – raised when an assertion fails.
- AttributeError – when the attribute is not found.
- BufferError
- EOFError – reading after end of file
- ImportError – when the imported module is not found.
- LookupError – base exception for lookup errors.
- MemoryError – when out of memory occurs
- NameError – when a name is not found locally or globally.
- OSError – base class for I/O errors
- ReferenceError
- RuntimeError
- StopIteration, StopAsyncIteration
- SyntaxError – invalid syntax
- SystemError – internal error in the Python Interpreter.
- TypeError – invalid argument type
- ValueError – invalid argument value
Some Built-In Warning Classes
The Warning class is the base class for all the warnings. It has the following sub-classes.
- BytesWarning – bytes, and buffer related warnings, mostly related to string conversion and comparison.
- DeprecationWarning – warning about deprecated features
- FutureWarning – base class for warning about constructs that will change semantically in the future.
- ImportWarning – warning about mistakes in module imports
- PendingDeprecationWarning – warning about features that will be deprecated in future.
- ResourceWarning – resource usage warnings
- RuntimeWarning – warnings about dubious runtime behavior.
- SyntaxWarning – warning about dubious syntax
- UnicodeWarning – Unicode conversion-related warnings
- UserWarning – warnings generated by the user code
Handling Multiple Exceptions in a Single Except Block
A try block can have multiple except blocks. We can catch specific exceptions in each of the except blocks.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)
    except TypeError as e:
        print(e)
    except ValueError as e:
        print(e)
The code in every except block is the same. In this scenario, we can handle multiple exceptions in a single except block by passing a tuple of exception classes to it.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except (ZeroDivisionError, TypeError, ValueError) as e:
        print(e)
Catch-All Exceptions in a Single Except Block
If we don’t specify any exception class in the except block, it will catch all the exceptions raised by the try block. It’s beneficial to have this when we don’t know about the exceptions that the try block can raise.
The empty except clause must be the last one in the exception handling chain.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)
    except:
        print("unknown error occurred")
Using else Block with try-except
The else block code is optional. It’s executed when there are no exceptions raised by the try block.
def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)
    else:
        print("divide() function worked fine.")

divide(10, 2)
divide(10, 0)
divide(10, 4)
Output:
10/2 is 5.0
divide() function worked fine.
division by zero
10/4 is 2.5
divide() function worked fine.
The else block code executed twice, once for each divide() call whose try block completed without an exception.
Using finally Block with try-except
The finally block code is executed in all the cases, whether there is an exception or not. The finally block is used to close resources and perform clean-up activities.
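def divide(x, y):
    try:
        print(f'{x}/{y} is {x / y}')
    except ZeroDivisionError as e:
        print(e)
    else:
        print("divide() function worked fine.")
    finally:
        print("close all the resources here")

divide(10, 2)
divide(10, 0)
divide(10, 4)
Output:
10/2 is 5.0
divide() function worked fine.
close all the resources here
division by zero
close all the resources here
10/4 is 2.5
divide() function worked fine.
close all the resources here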
Now that we have seen everything related to exception handling in Python, the final syntax is:
try -> except 1...n -> else -> finally
We can have many except blocks for a try block. But, we can have only one else and finally block.
Creating Custom Exception Class
We can create a custom exception class by extending Exception class. The best practice is to create a base exception and then derive other exception classes. Here are some examples of creating user-defined exception classes.
class EmployeeModuleError(Exception):
    """Base Exception Class for our Employee module"""
    pass


class EmployeeNotFoundError(EmployeeModuleError):
    """Error raised when employee is not found in the database"""

    def __init__(self, emp_id, msg):
        self.employee_id = emp_id
        self.error_message = msg


class EmployeeUpdateError(EmployeeModuleError):
    """Error raised when employee update fails"""

    def __init__(self, emp_id, sql_error_code, sql_error_msg):
        self.employee_id = emp_id
        self.error_message = sql_error_msg
        self.error_code = sql_error_code
The naming convention is to suffix the name of exception class with “Error”.
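A module-level base exception lets callers catch every error from the module with one except clause. The sketch below reuses the class names from above, adds a super().__init__() call so that str() returns the message (an addition not present in the original classes), and assumes a hypothetical find_employee function over an in-memory dict.

```python
class EmployeeModuleError(Exception):
    """Base exception class for the employee module."""


class EmployeeNotFoundError(EmployeeModuleError):
    """Raised when an employee is not found."""

    def __init__(self, emp_id, msg):
        super().__init__(msg)  # lets str(exc) return the message
        self.employee_id = emp_id
        self.error_message = msg


def find_employee(emp_id, employees):
    """Hypothetical lookup function over an in-memory dict."""
    try:
        return employees[emp_id]
    except KeyError:
        raise EmployeeNotFoundError(emp_id, f"no employee with id {emp_id}") from None


try:
    find_employee(7, {1: "Ada"})
except EmployeeModuleError as ex:  # the base class catches every module error
    print(type(ex).__name__, ex.employee_id, ex)
```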
Raising Exceptions
We can use the raise keyword to raise an exception from our code. Some of the possible scenarios are:
- Function input parameters validation fails
- Catching an exception and then throwing a custom exception
class ValidationError(Exception):
    pass


def divide(x, y):
    try:
        if type(x) is not int:
            raise TypeError("Unsupported type")
        if type(y) is not int:
            raise TypeError("Unsupported type")
    except TypeError as e:
        print(e)
        raise ValidationError("Invalid type of arguments")
    # Compare by value; "is" checks identity and is unreliable for integers.
    if y == 0:
        raise ValidationError("We can't divide by 0.")


try:
    divide(10, 0)
except ValidationError as ve:
    print(ve)
try:
    divide(10, "5")
except ValidationError as ve:
    print(ve)
Output:
We can't divide by 0.
Unsupported type
Invalid type of arguments
Nested try-except Blocks Example
We can have nested try-except blocks in Python. In this case, if an exception is raised in the nested try block, the nested except block is used to handle it. In case the nested except is not able to handle it, the outer except blocks are used to handle the exception.
x = 10
y = 0
try:
    print("outer try block")
    try:
        print("nested try block")
        print(x / y)
    except TypeError as te:
        print("nested except block")
        print(te)
except ZeroDivisionError as ze:
    print("outer except block")
    print(ze)
Output:
outer try block
nested try block
outer except block
division by zero
Python Exception Handling Best Practices
- Always try to handle the exception in the code to avoid abnormal termination of the program.
- When creating a custom exception class, suffix its name with “Error”.
- If the except clauses have the same code, try to catch multiple exceptions in a single except block.
- Use finally block to close heavy resources and remove heavy objects.
- Use else block to log successful execution of the code, send notifications, etc.
- Avoid bare except clauses as much as possible. Use one only if you don't know which exceptions may be raised.
- Create module-specific exception classes for specific scenarios.
- You can catch exceptions in an except block and then raise another exception that is more meaningful.
- Always raise exceptions with meaningful messages.
- Avoid nested try-except blocks because it reduces the readability of the code.
def __init__(self, raw_hdfs_file, fs, mode, encoding=None, errors=None):
    self.mode = mode
    self.base_mode, is_text = common.parse_mode(self.mode)
    self.buff_size = raw_hdfs_file.buff_size
    if self.buff_size <= 0:
        self.buff_size = common.BUFSIZE
    if is_text:
        self.__encoding = encoding or self.__class__.ENCODING
        self.__errors = errors or self.__class__.ERRORS
        try:
            codecs.lookup(self.__encoding)
            codecs.lookup_error(self.__errors)
        except LookupError as e:
            raise ValueError(e)
    else:
        if encoding:
            raise ValueError(
                "binary mode doesn't take an encoding argument")
        if errors:
            raise ValueError("binary mode doesn't take an errors argument")
        self.__encoding = self.__errors = None
    cls = io.BufferedReader if self.base_mode == "r" else io.BufferedWriter
    self.f = cls(raw_hdfs_file, buffer_size=self.buff_size)
    self.__fs = fs
    info = fs.get_path_info(self.f.raw.name)
    self.__name = info["name"]
    self.__size = info["size"]
    self.closed = False
def validate_encoding_error_handler(setting, value, option_parser,
                                    config_parser=None, config_section=None):
    try:
        codecs.lookup_error(value)
    except AttributeError:  # prior to Python 2.3
        if value not in ("strict", "ignore", "replace", "xmlcharrefreplace"):
            raise (
                LookupError(
                    'unknown encoding error handler: "%s" (choices: '
                    '"strict", "ignore", "replace", or "xmlcharrefreplace")' % value
                ),
                None,
                sys.exc_info()[2],
            )
    except LookupError:
        raise (
            LookupError(
                'unknown encoding error handler: "%s" (choices: '
                '"strict", "ignore", "replace", "backslashreplace", '
                '"xmlcharrefreplace", and possibly others; see documentation for '
                "the Python ``codecs`` module)" % value
            ),
            None,
            sys.exc_info()[2],
        )
    return value
def test_lookup_error(self):
    #sanity
    self.assertRaises(LookupError, codecs.lookup_error, "blah garbage xyz")

    def garbage_error1(someError): pass
    codecs.register_error("blah garbage xyz", garbage_error1)
    self.assertEqual(codecs.lookup_error("blah garbage xyz"), garbage_error1)

    def garbage_error2(someError): pass
    codecs.register_error("some other", garbage_error2)
    self.assertEqual(codecs.lookup_error("some other"), garbage_error2)
def change_encoding(file, encoding=None, errors=ERRORS):
    encoding = encoding or file.encoding
    errors = errors or file.errors
    codecs.lookup_error(errors)
    newfile = io.TextIOWrapper(file.buffer, encoding, errors,
                               line_buffering=file.line_buffering)
    newfile.mode = file.mode
    newfile._changed_encoding = True
    return newfile
def register_surrogateescape():
    """
    Registers the surrogateescape error handler on Python 2 (only)
    """
    if utils.PY3:
        return
    try:
        codecs.lookup_error(FS_ERRORS)
    except LookupError:
        codecs.register_error(FS_ERRORS, surrogateescape_handler)
def validate_encoding_error_handler(setting, value, option_parser,
                                    config_parser=None, config_section=None):
    try:
        codecs.lookup_error(value)
    except LookupError:
        raise LookupError(
            'unknown encoding error handler: "%s" (choices: '
            '"strict", "ignore", "replace", "backslashreplace", '
            '"xmlcharrefreplace", and possibly others; see documentation for '
            'the Python ``codecs`` module)' % value)
    return value
def test_lookup(self):
    self.assertEquals(codecs.strict_errors, codecs.lookup_error("strict"))
    self.assertEquals(codecs.ignore_errors, codecs.lookup_error("ignore"))
    self.assertEquals(codecs.strict_errors, codecs.lookup_error("strict"))
    self.assertEquals(
        codecs.xmlcharrefreplace_errors, codecs.lookup_error("xmlcharrefreplace")
    )
    self.assertEquals(
        codecs.backslashreplace_errors, codecs.lookup_error("backslashreplace")
    )
def test_lookup(self):
    if test_support.due_to_ironpython_bug("http://tkbgitvstfat01:8080/WorkItemTracking/WorkItem.aspx?artifactMoniker=148421"):
        return
    self.assertEquals(codecs.strict_errors, codecs.lookup_error("strict"))
    self.assertEquals(codecs.ignore_errors, codecs.lookup_error("ignore"))
    self.assertEquals(codecs.strict_errors, codecs.lookup_error("strict"))
    self.assertEquals(
        codecs.xmlcharrefreplace_errors, codecs.lookup_error("xmlcharrefreplace")
    )
    self.assertEquals(
        codecs.backslashreplace_errors, codecs.lookup_error("backslashreplace")
    )
def latscii_error( uerr ):
    key = ord(uerr.object[uerr.start:uerr.end])
    try:
        return unichr(decoding_map[key]), uerr.end
    except KeyError:
        handler = codecs.lookup_error('replace')
        return handler(uerr)
def open_file_read_unicode(fname, which_error_handler="replace-if-possible"):
    """Open and read the file named 'fname', returning a Unicode string.

    It will also try to gloss over any Unicode-decoding errors that may occur,
    such as:
      UnicodeDecodeError: 'utf8' codec can't decode byte 0x97 in position 867373: invalid start byte

    It will return the string read (as a Unicode string object), plus a boolean
    value of whether the string contains non-ASCII Unicode. It will also return
    a list of objects describing any Unicode-decoding errors that occurred.

    (So IN SUMMARY, it returns a tuple of THREE ITEMS. I HOPE THIS IS CLEAR.)
    """
    error_handler = codecs.lookup_error(which_error_handler)
    error_handler.reset()
    # Note that we open the file with the encoding "utf-8-sig", since this
    # encoding will remove the BOM (byte-order mark) if present.
    # See http://docs.python.org/library/codecs.html ; search for "-sig".
    f = codecs.open(fname, encoding="utf-8-sig", errors=which_error_handler)
    # 's' will be a Unicode string, which may or may not contain non-ASCII.
    s = f.read()
    return (s, contains_non_ascii_unicode(s), error_handler.errors)
def validate_encoding_error_handler(name, value):
    try:
        codecs.lookup_error(value)
    except AttributeError:  # prior to Python 2.3
        if value not in ('strict', 'ignore', 'replace'):
            raise LookupError(
                'unknown encoding error handler: "%s" (choices: '
                '"strict", "ignore", or "replace")' % value)
    except LookupError:
        raise LookupError(
            'unknown encoding error handler: "%s" (choices: '
            '"strict", "ignore", "replace", "backslashreplace", '
            '"xmlcharrefreplace", and possibly others; see documentation for '
            'the Python ``codecs`` module)' % value)
    return value
def quote_ident(self, str):
    encodable = str.encode("utf-8", "strict").decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        # Same handler name as the encode() call above
        error_handler = codecs.lookup_error("strict")
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return '"' + encodable.replace('"', '""') + '"'
def encode(self, input, errors='strict'):
    error = codecs.lookup_error(errors)

    def repl(match):
        start, end = match.span()
        return encoding_map.get(match.group()) or error(
            UnicodeEncodeError(encoding, input, start, end,
                               "undefined conversion emoji"))[0]

    output = google_emoji_re.sub(repl, input)
    return (base_codec.encode(output, errors)[0], len(input))
def quote_identifier(s, errors="strict"):
    encodable = s.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return "\"" + encodable.replace("\"", "\"\"") + "\""
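The quote_identifier variants on this page build a UnicodeEncodeError by hand and pass it to whatever handler codecs.lookup_error returns. As a small sketch of what that call yields for three built-in handlers (Python 3 assumed; the variable names are illustrative only):

```python
import codecs

# A hand-built error marking the NUL at index 1 of "a\x00b".
err = UnicodeEncodeError("utf-8", "a\x00b", 1, 2, "NUL not allowed")

# Each handler returns (replacement, position_to_resume_at).
replace_result = codecs.lookup_error("replace")(err)
ignore_result = codecs.lookup_error("ignore")(err)

# The 'strict' handler does not return; it raises the error it was given.
strict_raised = False
try:
    codecs.lookup_error("strict")(err)
except UnicodeEncodeError:
    strict_raised = True
```

So with errors="replace" the NUL is swapped for a question mark, with errors="ignore" it is dropped, and with the default errors="strict" the function raises.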
def convert(conv, data, final, errors='strict'):
    try:
        res = conv.convert(data, finished=final,
                           options=Option.DontUseReplacementChar)
        return (res, len(res))
    except UnicodeEncodeError as uerr:
        rep, rp = codecs.lookup_error(errors)(uerr)
        try:
            prefix = conv.convert(uerr.object[:uerr.start] + rep,
                                  finished=final,
                                  options=Option.DontUseReplacementChar)
        except UnicodeEncodeError:
            raise UnicodeEncodeError(
                *(uerr.args[:4] +
                  ('cannot convert replacement %r to target encoding' % rep,)))
        suffix = Codec.convert(conv, data[rp:], final, errors)
        return (prefix + suffix[0], rp + suffix[1])
    except UnicodeDecodeError as uerr:
        rep, rp = codecs.lookup_error(errors)(uerr)
        prefix = conv.convert(uerr.object[:uerr.start], finished=final,
                              options=Option.DontUseReplacementChar)
        suffix = Codec.convert(conv, data[rp:], final, errors)
        return (prefix + rep + suffix[0], rp + suffix[1])
def imap_utf7_decode(input, errors='strict'):
    error = codecs.lookup_error(errors)
    output = []
    shifted = 0
    b64 = False
    i = 0
    while i < len(input):
        b = input[i]
        if b64:
            if b == 0x2d:  # '-'
                if shifted == i:
                    output.append('&')
                else:
                    dec = bytes(input[shifted:i]) + b'=' * ((4 - (i - shifted)) % 4)
                    try:
                        utf16 = base64.b64decode(dec, altchars=b'+,', validate=True)
                        output.append(utf16.decode('utf-16-be'))
                    except (binascii.Error, UnicodeDecodeError) as e:
                        if isinstance(e, binascii.Error):
                            reason = 'invalid Base64'
                        else:
                            reason = 'invalid UTF-16BE'
                        exc = UnicodeDecodeError('imap-utf-7', input,
                                                 shifted - 1, i + 1, reason)
                        replace, i = error(exc)
                        shifted = i
                        output.append(replace)
                        b64 = False
                        continue
                shifted = i + 1
                b64 = False
        else:
            if b == 0x26:  # '&'
                output.append(codecs.decode(input[shifted:i], 'ascii'))
                shifted = i + 1
                b64 = True
            if b < 0x20 or b > 0x7e:
                output.append(codecs.decode(input[shifted:i], 'ascii'))
                exc = UnicodeDecodeError('imap-utf-7', input, i, i + 1,
                                         'character must be Base64 encoded')
                replace, i = error(exc)
                shifted = i
                output.append(replace)
                continue
        i += 1
    if b64:
        exc = UnicodeDecodeError('imap-utf-7', input, len(input), len(input),
                                 'input does not end in US-ASCII')
        replace, cont = error(exc)
        output.append(replace)
    else:
        output.append(codecs.decode(input[shifted:], 'ascii'))
    return ''.join(output), len(input)
def _quote_identifier(self, s, errors="ignore"):
    encodable = s.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return u'"' + encodable.replace('"', '""') + u'"'
def _fscodec():
    encoding = sys.getfilesystemencoding()
    if encoding == 'mbcs':
        errors = 'strict'
    else:
        try:
            from codecs import lookup_error
            lookup_error('surrogateescape')
        except LookupError:
            errors = 'strict'
        else:
            errors = 'surrogateescape'

    def fsencode(filename):
        """
        Encode filename to the filesystem encoding with 'surrogateescape' error
        handler, return bytes unchanged. On Windows, use 'strict' error handler if
        the file system encoding is 'mbcs' (which is the default encoding).
        """
        if isinstance(filename, six.binary_type):
            return filename
        elif isinstance(filename, six.text_type):
            return filename.encode(encoding, errors)
        else:
            raise TypeError("expect bytes or str, not %s" % type(filename).__name__)

    def fsdecode(filename):
        """
        Decode filename from the filesystem encoding with 'surrogateescape' error
        handler, return str unchanged. On Windows, use 'strict' error handler if
        the file system encoding is 'mbcs' (which is the default encoding).
        """
        if isinstance(filename, six.text_type):
            return filename
        elif isinstance(filename, six.binary_type):
            return filename.decode(encoding, errors)
        else:
            raise TypeError("expect bytes or str, not %s" % type(filename).__name__)

    return fsencode, fsdecode
def test_fake_error_class(self):
    handlers = [
        codecs.strict_errors,
        codecs.ignore_errors,
        codecs.replace_errors,
        codecs.backslashreplace_errors,
        codecs.xmlcharrefreplace_errors,
        codecs.lookup_error('surrogateescape'),
        codecs.lookup_error('surrogatepass'),
    ]
    for cls in UnicodeEncodeError, UnicodeDecodeError, UnicodeTranslateError:
        class FakeUnicodeError(str):
            __class__ = cls
        for handler in handlers:
            with self.subTest(handler=handler, error_class=cls):
                self.assertRaises(TypeError, handler, FakeUnicodeError())
        class FakeUnicodeError(Exception):
            __class__ = cls
        for handler in handlers:
            with self.subTest(handler=handler, error_class=cls):
                with self.assertRaises((TypeError, FakeUnicodeError)):
                    handler(FakeUnicodeError())
from contextlib import contextmanager

@contextmanager
def ignore_unicode_errors(errors='ignore'):
    """Overwrite the ``strict`` codecs error handler temporarily.

    This is useful e.g. if the engine truncates a string, which results in a
    string that contains a split multi-byte character at the end of the
    string.

    :param str errors:
        Error handler that will be looked up via :func:`codecs.lookup_error`.
    :raise LookupError:
        Raised if the error handler was not found.

    Example:

    .. code:: python

        import memory

        # Allocate four bytes to create an erroneous string
        ptr = memory.alloc(4)

        # Write data to the memory that will usually result in a
        # UnicodeDecodeError
        ptr.set_uchar(ord('a'), 0)
        ptr.set_uchar(ord('b'), 1)
        ptr.set_uchar(226, 2) # Add the invalid byte
        ptr.set_uchar(0, 3) # Indicate the end of the string

        with ignore_unicode_errors():
            # Read the data as a string. Now, it will only print 'ab', because
            # the invalid byte has been removed/ignored.
            print(ptr.get_string_array())
    """
    old_handler = codecs.lookup_error('strict')
    codecs.register_error('strict', codecs.lookup_error(errors))
    try:
        yield
    finally:
        codecs.register_error('strict', old_handler)
def latscii_error(uerr):
    text = uerr.object[uerr.start:uerr.end]
    ret = ''
    for c in text:
        key = ord(c)
        try:
            ret += unichr(decoding_map[key])
        except KeyError:
            handler = codecs.lookup_error('replace')
            return handler(uerr)
    return ret, uerr.end
def _quote(s, errors='strict'):
    encodable = s.encode('utf-8', errors).decode('utf-8')
    nul_index = encodable.find('\x00')
    if nul_index >= 0:
        error = UnicodeEncodeError('NUL-terminated utf-8', encodable,
                                   nul_index, nul_index + 1, 'NUL not allowed')
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace('\x00', replacement)
    return '"' + encodable.replace('"', '""') + '"'
def _quote_id(self, s, errors=u"strict"):
    encodable = s.encode("utf-8", errors).decode(u"utf-8")
    nul_index = encodable.find(u"\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError(u"NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, u"NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace(u"\x00", replacement)
    return u"\"" + encodable.replace(u"\"", u"\"\"") + u"\""
def __init__(self, transmogrifier, name, options, previous):
    self.previous = previous

    if options.get('from'):
        from_ = options['from'].strip().lower()
        if from_ != 'unicode':
            if from_ == 'default':
                from_ = _get_default_encoding(transmogrifier.context)

            # Test if the decoder is available
            codecs.getdecoder(from_)

            self.from_ = from_

    self.from_error_handler = options.get(
        'from-error-handler', self.from_error_handler).strip().lower()
    # Test if the error handler is available
    codecs.lookup_error(self.from_error_handler)

    if options.get('to'):
        to = options['to'].strip().lower()
        if to != 'unicode':
            if to == 'default':
                to = _get_default_encoding(transmogrifier.context)

            # Test if the encoder is available
            codecs.getencoder(to)

            self.to = to

    self.to_error_handler = options.get(
        'to-error-handler', self.to_error_handler).strip().lower()
    # Test if the error handler is available
    codecs.lookup_error(self.to_error_handler)

    self.matcher = Matcher(*options['keys'].splitlines())
    self.condition = Condition(options.get('condition', 'python:True'),
                               transmogrifier, name, options)
def quote_identifier(s, errors="strict"):
    # Quotes a SQLite identifier. Source: http://stackoverflow.com/a/6701665
    encodable = s.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return "\"" + encodable.replace("\"", "\"\"") + "\""
def _fscodec():
    encoding = sys.getfilesystemencoding()
    errors = "strict"
    if encoding != "mbcs":
        try:
            codecs.lookup_error("surrogateescape")
        except LookupError:
            pass
        else:
            errors = "surrogateescape"

    def fsencode(filename):
        """
        Encode filename to the filesystem encoding with 'surrogateescape' error
        handler, return bytes unchanged. On Windows, use 'strict' error handler if
        the file system encoding is 'mbcs' (which is the default encoding).
        """
        if isinstance(filename, bytes):
            return filename
        else:
            return filename.encode(encoding, errors)

    return fsencode
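The _fscodec helpers prefer 'surrogateescape' because it lets arbitrary bytes survive a decode/encode round trip. A short sketch of that property, assuming Python 3 and using a made-up filename:

```python
# Bytes that are not valid UTF-8, as a filename on POSIX might contain.
raw_name = b"caf\xe9.txt"  # hypothetical example value

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text_name = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", "surrogateescape")
```

With 'strict' the decode step would raise UnicodeDecodeError instead, which is why the snippets only fall back to it when 'surrogateescape' cannot be looked up.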
def test_longstrings(self):
    # test long strings to check for memory overflow problems
    errors = ["strict", "ignore", "replace", "xmlcharrefreplace",
              "backslashreplace"]
    # register the handlers under different names,
    # to prevent the codec from recognizing the name
    for err in errors:
        codecs.register_error("test." + err, codecs.lookup_error(err))
    l = 1000
    errors += ["test." + err for err in errors]
    for uni in [s*l for s in (u"x", u"\u3042", u"a\xe4")]:
        for enc in ("ascii", "latin-1", "iso-8859-1", "iso-8859-15",
                    "utf-8", "utf-7", "utf-16"):
            for err in errors:
                try:
                    uni.encode(enc, err)
                except UnicodeError:
                    pass
def quote_identifier(s, errors="replace"):
    '''
    SqLite does not provide an identifier sanitizer so we use this method
    '''
    encodable = s.encode("utf-8", errors).decode("utf-8")
    nul_index = encodable.find("\x00")
    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)
    return "\"" + encodable.replace("\"", "\"\"") + "\""
def decode(self, input, errors='strict', final=True):
    error_function = codecs.lookup_error(errors)
    input_buffer = ByteBuffer.wrap(array('b', input))
    decoder = Charset.forName(self.encoding).newDecoder()
    output_buffer = CharBuffer.allocate(min(max(int(len(input) / 2), 256), 1024))
    builder = StringBuilder(int(decoder.averageCharsPerByte() * len(input)))

    while True:
        result = decoder.decode(input_buffer, output_buffer, False)
        pos = output_buffer.position()
        output_buffer.rewind()
        builder.append(output_buffer.subSequence(0, pos))
        if result.isUnderflow():
            if final:
                _process_incomplete_decode(self.encoding, input, error_function,
                                           input_buffer, builder)
            break
        _process_decode_errors(self.encoding, input, result, error_function,
                               input_buffer, builder)

    return builder.toString(), input_buffer.position()
def test_badandgoodsurrogateescapeexceptions(self):
    surrogateescape_errors = codecs.lookup_error('surrogateescape')
    # "surrogateescape" complains about a non-exception passed in
    self.assertRaises(
        TypeError,
        surrogateescape_errors,
        42
    )
    # "surrogateescape" complains about the wrong exception types
    self.assertRaises(
        TypeError,
        surrogateescape_errors,
        UnicodeError("ouch")
    )
    # "surrogateescape" can not be used for translating
    self.assertRaises(
        TypeError,
        surrogateescape_errors,
        UnicodeTranslateError("\udc80", 0, 1, "ouch")
    )
    # Use the correct exception
    for s in ("a", "\udc7f", "\udd00"):
        with self.subTest(str=s):
            self.assertRaises(
                UnicodeEncodeError,
                surrogateescape_errors,
                UnicodeEncodeError("ascii", s, 0, 1, "ouch")
            )
    self.assertEqual(
        surrogateescape_errors(
            UnicodeEncodeError("ascii", "a\udc80b", 1, 2, "ouch")),
        (b"\x80", 2)
    )
    self.assertRaises(
        UnicodeDecodeError,
        surrogateescape_errors,
        UnicodeDecodeError("ascii", bytearray(b"a"), 0, 1, "ouch")
    )
    self.assertEqual(
        surrogateescape_errors(
            UnicodeDecodeError("ascii", bytearray(b"a\x80b"), 1, 2, "ouch")),
        ("\udc80", 2)
    )