Python base64 binascii error incorrect padding

OIDC + Python3 + Incorrect padding during base64decode #525

When attempting to decode the token, I run into this issue in _load_oid_token

The non-py3 block appends «==» to the token part before decoding it. If I use that, it works fine.
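The workaround described here can be sketched as follows. This is a hypothetical helper, not the client's actual code: `decode_token_part` and the sample payload are made-up names, and the sketch assumes the token part is URL-safe base64 JSON whose trailing = padding may have been stripped.

```python
import base64
import json


def decode_token_part(part: str) -> dict:
    """Decode one dot-separated token part (hypothetical helper).

    Assumes the part is URL-safe base64-encoded JSON whose trailing
    '=' padding may have been stripped.
    """
    # Surplus '=' is ignored by the default (non-strict) decoder,
    # so appending two covers both the one-pad and two-pad cases.
    raw = base64.urlsafe_b64decode(part + "==")
    return json.loads(raw.decode("utf-8"))


# Build an unpadded sample part to exercise the helper.
payload = {"sub": "user", "iss": "https://example.invalid"}
encoded = base64.urlsafe_b64encode(json.dumps(payload).encode("utf-8"))
part = encoded.decode("ascii").rstrip("=")
print(decode_token_part(part))
```

Appending a fixed «==» does not help when the part's length is 1 modulo 4; that input is not valid base64 at all.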

Ran into this issue today as well on 3.6.4

Same on 3.5.5 and 3.6.6.

This is the reference to the PR to fix this: kubernetes-client/python-base#70

Same on Python 3.4

@bpicolo Could you please help to fix the issue? We need to run python 3.6.5 with this client.

Ran into the same issue with Python 3.7 with the latest version of this client (8.0.0 at this time of writing)

Same on 3.7.2 and 3.6.0

Same on python 3.7.2 + kubernetes 9.0.0

Anyone know why no fix for this has been merged?

@yliaog — would you (or any owner) be able to provide a comment on this? Just wondering what your thoughts are on this issue as it’s been around for a while and probably affecting quite a large user base.

It looks like @micw523 is reviewing the fix PR kubernetes-client/python-base#70. @micw523, do you have any further comments on the PR? We’d like to get it merged soon. Thanks.

Hi @yliaog, kubernetes-client/python-base#70 seems to have been abandoned and the fork does not exist any more. kubernetes-client/python-base#79 is active and seems like the author just pushed some new commits.

However, due to the new version of minikube, we need to have #797 merged first to unblock the PRs. We can’t have the CI to pass unless we get that one through.

Thanks @micw523 i’ve lgtm’ed #797

and minor correction, the PR actively worked on is in python-base repo:
kubernetes-client/python-base#79

#797 has merged and it looks like kubernetes-client/python-base#79 passed CI.
Just waiting on approval there?

Can I bump this? My existing code is Python 3.6 and I can’t port it back to 2.x due to other libraries. The PR works for me, can we get it pushed into PyPi plz? Thank you.

Incorrect padding error #2513

Hi,
This is in regards to the pubsub module. When I send a test message with data «testing 1 2 3» from the web console to a simple python application it fails to decode the message and throws a

I get the same error if I send the same message using the Go libraries.
I do not get the error if I send the message using python however.
All I’m doing is receiving and printing out the messages.

When I send a test message with data «testing 1 2 3»

Can you give some code snippets indicating how you do this? Thanks.

I was using the Google Cloud console to do so by clicking on the publish button. I also did it using the Go library I’ll get you the code for that later today.

Here is what the Go code looks like; we sent the data «testing 1 2 3».

I would also add that «testing 1 2 3 4» works fine, so I’m guessing it’s something to do with how other languages pad the base64-encoded bytestring. Maybe you could check for incorrect padding and add it back in as necessary?
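The difference between the two test messages is consistent with stripped padding: «testing 1 2 3» is 13 bytes, which encodes with two trailing = characters, while «testing 1 2 3 4» is 15 bytes, a multiple of three, so it needs none. Below is a minimal sketch of such a padding check, assuming the payload arrives as ASCII text with its padding removed. `restore_padding` is a hypothetical name, not part of the library.

```python
import base64


def restore_padding(b64_text: str) -> str:
    """Append the '=' characters needed to reach a multiple of four."""
    return b64_text + "=" * (-len(b64_text) % 4)


for message in (b"testing 1 2 3", b"testing 1 2 3 4"):
    encoded = base64.b64encode(message).decode("ascii")
    stripped = encoded.rstrip("=")  # what a padding-stripping sender would emit
    decoded = base64.b64decode(restore_padding(stripped))
    print(len(message), encoded, decoded)
```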

@Teddy-Schmitz, this is actually the github repository for the Google Cloud Python library.

In your Go example (I’m not super familiar with Go, just FYI), it appears that you may need to cast data to bytes.

@omaray, do you know who to contact about this being an issue in the console?

@daspecster I’d guess this is our problem too. It seems like we ASSUME that base64 values are properly padded but other libraries may strip padding to reduce payload size?

@dhermes, oh I see. For some reason I dove head first into the Go code. (Go is intriguing).

@Teddy-Schmitz sorry about that, I think I understand the problem better now.

It appears that the console is sending message data that is not base64 encoded at all.

The docs for GRPC and REST say that.

The message payload. For JSON requests, the value of this field must be base64-encoded.

So I guess it’s ‘assumed’ that you should base64 encode your message before typing/pasting it in the console?

I have a branch with a fix(if you can call it that) but I think the right solution is to talk to the console.cloud.google.com/cloudpubsub people.

@daspecster sorry for the confusion, I don’t think I explained the problem properly.

I have done some further testing, if I send the same message using the Google Console the Go Library will correctly return me the message. So I’m guessing it is being base64 encoded at the server receiving the POST request. As for the Go library I took a quick peek at their source code and it is base64 encoding the payload before sending it off to pubsub. I believe @dhermes is right and other languages strip off the padding to save space. If the console wasn’t base64 encoding the message at all Go would also fail with an error.

At this point it’s pretty dangerous to pass messages from non-Python code to Python.

👍 this seems to be an issue related to the Cloud Console — but the Python Library should handle this more gracefully. Looks like the proposed fix by @daspecster would handle it.

OK great, I’ll get that rolling then!

FWIW, stripping off the padding to «save bandwidth» is utterly bogus for a JSON-based protocol.

binascii.Error: Incorrect padding #2176

Try to login with Yahoo

test env: python3.6
django==2.1.5
python3-openid version_info = (3, 1, 0)

binascii.Error: Incorrect padding

Internal Server Error: /accounts/openid/login/
Traceback (most recent call last):
  File "/home//project//src/mypy/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/home//project//src/mypy/lib/python3.6/site-packages/django/core/handlers/base.py", line 126, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home//project//src/mypy/lib/python3.6/site-packages/django/core/handlers/base.py", line 124, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home//project//src/mypy/lib/python3.6/site-packages/allauth/socialaccount/providers/openid/views.py", line 43, in login
    auth_request = client.begin(form.cleaned_data['openid'])
  File "/home//project//src/mypy/lib/python3.6/site-packages/openid/consumer/consumer.py", line 359, in begin
    return self.beginWithoutDiscovery(service, anonymous)
  File "/home//project//src/mypy/lib/python3.6/site-packages/openid/consumer/consumer.py", line 382, in beginWithoutDiscovery
    auth_req = self.consumer.begin(service)
  File "/home//project//src/mypy/lib/python3.6/site-packages/openid/consumer/consumer.py", line 610, in begin
    assoc = self._getAssociation(service_endpoint)
  File "/home//project//src/mypy/lib/python3.6/site-packages/openid/consumer/consumer.py", line 1178, in _getAssociation
    assoc = self.store.getAssociation(endpoint.server_url)
  File "/home//project//src/mypy/lib/python3.6/site-packages/allauth/socialaccount/providers/openid/utils.py", line 104, in getAssociation
    base64.decodestring(stored_assoc.secret.encode('utf-8')),
  File "/home//project//src/mypy/lib/python3.6/base64.py", line 554, in decodestring
    return decodebytes(s)
  File "/home//project//src/mypy/lib/python3.6/base64.py", line 546, in decodebytes
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
[2019-01-15 00:37:41,282 log.py:228 - log_response()] Internal Server Error: /accounts/openid/login/

I have some data that is base64 encoded that I want to convert back to binary even if there is a padding error in it. If I use

base64.decodestring(b64_string)

it raises an ‘Incorrect padding’ error. Is there another way?

UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit
and miss so I decided to try openssl. The following command worked a treat:

openssl enc -d -base64 -in b64string -out binary_data

1) Solution

As said in other responses, there are various ways in which base64 data could be corrupted.

However, as Wikipedia says, removing the padding (the ‘=’ characters at the end of base64 encoded data) is «lossless»:

From a theoretical point of view, the padding character is not needed,
since the number of missing bytes can be calculated from the number
of Base64 digits.

So if this is really the only thing «wrong» with your base64 data, the padding can just be added back. I came up with this to be able to parse «data» URLs in WeasyPrint, some of which were base64 without padding:

import base64
import re

def decode_base64(data, altchars=b'+/'):
    """Decode base64, padding being optional.

    :param data: Base64 data as an ASCII byte string
    :returns: The decoded byte string.

    """
    data = re.sub(rb'[^a-zA-Z0-9%s]+' % altchars, b'', data)  # normalize
    missing_padding = len(data) % 4
    if missing_padding:
        data += b'='* (4 - missing_padding)
    return base64.b64decode(data, altchars)

Tests for this function: weasyprint/tests/test_css.py#L68

2) Solution

It seems you just need to add padding to your bytes before decoding. There are many other answers on this question, but I want to point out that (at least in Python 3.x) base64.b64decode will truncate any extra padding, provided there is enough in the first place.

So, something like: b'abc=' works just as well as b'abc==' (as does b'abc=====').

What this means is that you can just add the maximum number of padding characters that you would ever need—which is two (b'==')—and base64 will truncate any unnecessary ones.

This lets you write:

base64.b64decode(s + b'==')

which is simpler than:

base64.b64decode(s + b'=' * (-len(s) % 4))
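A quick check of this claim, assuming CPython's default (non-strict) decoder; the sample inputs are illustrative. Note that it only covers lengths that are 0, 2 or 3 modulo 4. A length of 1 modulo 4 is not valid base64 and still raises an error.

```python
import base64
import binascii

# Unpadded encodings of b"a", b"ab" and b"abc".
for s in (b"YQ", b"YWI", b"YWJj"):
    fixed = base64.b64decode(s + b"==")
    exact = base64.b64decode(s + b"=" * (-len(s) % 4))
    print(s, fixed, fixed == exact)

# One dangling character cannot be repaired by padding alone.
try:
    base64.b64decode(b"YWJjZ" + b"==")
except binascii.Error as exc:
    print("still fails:", exc)
```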
3) Solution

Just add padding as required. Heed Michael’s warning, however.

b64_string += "=" * ((4 - len(b64_string) % 4) % 4) #ugh

4) Solution

Use

string += '=' * (-len(string) % 4)  # restore stripped '='s

Credit goes to a comment somewhere here.

>>> # Python 3 session; Python 2 reports TypeError: Incorrect padding instead
>>> import base64
>>> enc = base64.b64encode(b'1')
>>> enc
b'MQ=='
>>> base64.b64decode(enc)
b'1'
>>> enc = enc.rstrip(b'=')
>>> enc
b'MQ'
>>> base64.b64decode(enc)
...
binascii.Error: Incorrect padding
>>> base64.b64decode(enc + b'=' * (-len(enc) % 4))
b'1'

5) Solution

«Incorrect padding» can mean not only «missing padding» but also (believe it or not) «incorrect padding».

If suggested «adding padding» methods don’t work, try removing some trailing bytes:

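import base64
import binascii

lens = len(strg)
lenx = lens - (lens % 4 if lens % 4 else 4)  # drop the trailing 1 to 4 characters
try:
    result = base64.b64decode(strg[:lenx])  # decodestring was removed in Python 3.9
except binascii.Error as exc:
    print("still not decodable:", exc)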

Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace, otherwise length calculations will be upset.

It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print(repr(sample)).

Update 2: It is possible that the encoding has been done in a URL-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_')

If you can’t see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.

If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they’ll be the ones that aren’t in [A-Za-z0-9]. Then you’ll need to experiment to see which order they need to be used in the 2nd arg of base64.b64decode()

Update 3: If your data is «company confidential»:
(a) you should say so up front
(b) we can explore other avenues in understanding the problem, which is highly likely to be related to what characters are used instead of + and / in the encoding alphabet, or by other formatting or extraneous characters.

One such avenue would be to examine what non-«standard» characters are in your data, e.g.

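import string
from collections import defaultdict

# Count every character that is not an ASCII letter or digit.
# Assumes your_data is a str (iterating bytes yields ints in Python 3).
d = defaultdict(int)
s = set(string.ascii_letters + string.digits)
for c in your_data:
    if c not in s:
        d[c] += 1
print(dict(d))

6) Solution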

If there’s a padding error it probably means your string is corrupted; base64-encoded strings should have a multiple of four length. You can try adding the padding character (=) yourself to make the string a multiple of four, but it should already have that unless something is wrong

7) Solution

The «Incorrect padding» error is sometimes caused by metadata that is also present in the encoded string.
If your string looks something like ‘data:image/png;base64,…base 64 stuff….’
then you need to remove the first part before decoding it.

For example, if you have a base64-encoded image string, try the snippet below:

from PIL import Image
from io import BytesIO
from base64 import b64decode
imagestr = 'data:image/png;base64,...base 64 stuff....'
im = Image.open(BytesIO(b64decode(imagestr.split(',')[1])))
im.save("image.png")
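The same idea without Pillow, as a minimal sketch using only the standard library. `strip_data_uri_header` is a hypothetical helper and the sample payload is made up for illustration; the sketch assumes that the metadata prefix, when present, ends at the first comma.

```python
import base64


def strip_data_uri_header(text: str) -> str:
    """Return the base64 payload of a 'data:...;base64,' string."""
    if text.startswith("data:"):
        # Everything up to the first comma is metadata, not base64.
        _, _, text = text.partition(",")
    return text


sample = "data:text/plain;base64," + base64.b64encode(b"hello").decode("ascii")
print(base64.b64decode(strip_data_uri_header(sample)))
```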
8) Solution

You can use base64.urlsafe_b64decode(data) if you are trying to decode a web image that was encoded with the URL-safe alphabet. Note that it does not restore missing padding by itself, so append any stripped = characters first.

9) Solution

Check the documentation of the data source you’re trying to decode. Is it possible that you meant to use base64.urlsafe_b64decode(s) instead of base64.b64decode(s)? That’s one reason you might have seen this error message.

Decode string s using a URL-safe alphabet, which substitutes - instead
of + and _ instead of / in the standard Base64 alphabet.

This is for example the case for various Google APIs, like Google’s Identity Toolkit and Gmail payloads.
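To illustrate the difference, here is a small sketch with made-up bytes chosen so that the encoding consists only of the two special characters. With default settings, base64.b64decode discards characters outside its alphabet, so feeding it URL-safe data can fail with a padding error or quietly return different bytes, depending on the input.

```python
import base64

raw = bytes([0xFB, 0xEF, 0xFF])  # encodes to alphabet positions 62 and 63 only
std = base64.b64encode(raw)
url = base64.urlsafe_b64encode(raw)
print(std, url)

print(base64.urlsafe_b64decode(url) == raw)
# Equivalent call with explicit alternative characters:
print(base64.b64decode(url, altchars=b"-_") == raw)
```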

10) Solution

There are two ways to correct the input data described here, or, more specifically and in line with the OP, to make Python module base64’s b64decode method able to process the input data to something without raising an un-caught exception:

  1. Append == to the end of the input data and call base64.b64decode(…)
  2. If that raises an exception, then

    i. Catch it via try/except,

    ii. (R?)Strip any = characters from the input data (N.B. this may not be necessary),

    iii. Append A== to the input data (A== through P== will work),

    iv. Call base64.b64decode(…) with those A==-appended input data

The result from Item 1. or Item 2. above will yield the desired result.
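The two steps above can be sketched as follows. This is an illustration only, not the code of any published wrapper; `lenient_b64decode` is a hypothetical name, the optional stripping of = characters (step ii) is omitted, and the sketch assumes CPython's default non-strict decoding.

```python
import base64
import binascii


def lenient_b64decode(data: bytes) -> bytes:
    """Decode base64 that may be missing padding or trailing characters."""
    try:
        # Step 1: surplus '=' is ignored, so '==' covers missing padding.
        return base64.b64decode(data + b"==")
    except binascii.Error:
        # Step 2: a single dangling character is left over; 'A' supplies
        # zero bits to complete the byte (the last byte is a guess).
        return base64.b64decode(data + b"A==")


print(lenient_b64decode(b"YWJj"))    # intact input
print(lenient_b64decode(b"YWI"))     # padding stripped
print(lenient_b64decode(b"YWJjZ"))   # one dangling character
```

The last byte returned in the third case is fabricated from the dangling character plus zero bits, so it should not be trusted.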

Caveats

This does not guarantee the decoded result will be what was originally encoded, but it will (sometimes?) give the OP enough to work with. As the OP put it:

«Even with corruption I want to get back to the binary because I can still get some useful info from the ASN.1 stream».

See What we know and Assumptions below.

TL;DR

From some quick tests of base64.b64decode(…)

  1. It appears that it ignores non-[A-Za-z0-9+/] characters; that includes ignoring =s unless they are the last character(s) in a parsed group of four, in which case the =s terminate the decoding (a=b=c=d= gives the same result as abc=, and a==b==c== gives the same result as ab==).

  2. It also appears that all characters appended are ignored after the point where base64.b64decode(…) terminates decoding e.g. from an = as the fourth in a group.
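These observations can be reproduced with a short script. It is a sketch that assumes CPython's default, non-strict decoding; passing validate=True to base64.b64decode changes the behaviour, and other versions or implementations may differ.

```python
import base64

# '=' in the middle of a group is skipped; '=' completing a group ends decoding.
print(base64.b64decode(b"a=b=c=d=") == base64.b64decode(b"abc="))
print(base64.b64decode(b"a==b==c==") == base64.b64decode(b"ab=="))

# Characters outside the alphabet (here a newline and a space) are discarded.
print(base64.b64decode(b"YW\nJj ") == base64.b64decode(b"YWJj"))

# Anything after the terminating padding is ignored.
print(base64.b64decode(b"YQ==YWJj") == base64.b64decode(b"YQ=="))
```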

As noted in several comments above, there are either zero, or one, or two, =s of padding required at the end of input data for when the [number of parsed characters to that point modulo 4] value is 0, or 3, or 2, respectively. So, from items 1. and 2. above, appending two or more =s to the input data will correct any [Incorrect padding] problems in those cases.

HOWEVER, decoding cannot handle the case where the [total number of parsed characters modulo 4] is 1, because it takes at least two encoded characters to represent the first decoded byte in a group of three decoded bytes. In uncorrupted encoded input data, this [N modulo 4]=1 case never happens, but as the OP stated that characters may be missing, it could happen here. That is why simply appending =s will not always work, and why appending A== will work when appending == does not. N.B. Using [A] is all but arbitrary: it adds only cleared (zero) bits to the decoded output, which may or may not be correct, but then the object here is not correctness but completion by base64.b64decode(…) sans exceptions.
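The arithmetic behind this can be made explicit: each encoded character carries 6 bits, so a trailing partial group of k characters carries 6k bits, and only whole bytes (8 bits) can be recovered. A small sketch with illustrative variable names:

```python
# Bits carried by a trailing partial group of k characters, and the
# number of '=' needed to complete the group of four.
for k in range(4):
    bits = 6 * k
    whole_bytes = bits // 8
    pad = -k % 4
    if k == 1:
        note = "not decodable: 6 bits is less than one byte"
    else:
        note = f"pad with {pad} '='"
    print(f"k={k}: {bits:2d} bits -> {whole_bytes} byte(s); {note}")
```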

What we know from the OP and especially subsequent comments is

  • It is suspected that there are missing data (characters) in the
    Base64-encoded input data
  • The Base64 encoding uses the standard 64 place-values plus padding:
    A-Z; a-z; 0-9; +; /; = is padding. This is confirmed, or at least
    suggested, by the fact that openssl enc ... works.

Assumptions

  • The input data contain only 7-bit ASCII data
  • The only kind of corruption is missing encoded input data
  • The OP does not care about decoded output data at any point after that corresponding to any missing encoded input data

Github

Here is a wrapper to implement this solution:

https://github.com/drbitboy/missing_b64

11) Solution

Adding the padding is rather… fiddly. Here’s the function I wrote with the help of the comments in this thread as well as the wiki page for base64 (it’s surprisingly helpful) https://en.wikipedia.org/wiki/Base64#Padding.

import base64
import binascii
import logging

def base64_decode(s):
    """Add missing padding to string and return the decoded base64 string."""
    log = logging.getLogger()
    if isinstance(s, bytes):
        s = s.decode('ascii')  # str(bytes) would yield "b'...'" in Python 3
    s = str(s).strip()
    try:
        return base64.b64decode(s)
    except binascii.Error:  # Python 2 raised TypeError here
        padding = len(s) % 4
        if padding == 1:
            log.error("Invalid base64 string: {}".format(s))
            return b''
        elif padding == 2:
            s += '=='
        elif padding == 3:
            s += '='
        return base64.b64decode(s)

12) Solution

I got this error without using base64 at all. In my case the error occurred on localhost, while everything worked fine on 127.0.0.1.

13) Solution

In my case Gmail Web API was returning the email content as a base64 encoded string, but instead of encoded with the standard base64 characters/alphabet, it was encoded with the «web-safe» characters/alphabet variant of base64. The + and / characters are replaced with - and _. For python 3 use base64.urlsafe_b64decode().

14) Solution

import base64

def base64_decode(data: str) -> str:
    """Decode URL-safe base64 text, restoring any stripped padding."""
    data = data.encode("ascii")
    rem = len(data) % 4
    if rem > 0:
        data += b"=" * (4 - rem)
    return base64.urlsafe_b64decode(data).decode('utf-8')

15) Solution

I ran into this problem as well and nothing worked.
I finally managed to find the solution which works for me. I had zipped content in base64 and this happened to 1 out of a million records…

This is a version of the solution suggested by Simon Sapin.

In case the padding is missing 3 then I remove the last 3 characters.

Instead of «0gA1RD5L/9AUGtH9MzAwAAA==»

We get «0gA1RD5L/9AUGtH9MzAwAA»

import base64

missing_padding = len(data) % 4
if missing_padding == 3:
    data = data[0:-3]
elif missing_padding != 0:
    print("Missing padding : " + str(missing_padding))
    data += '=' * (4 - missing_padding)
data_decoded = base64.b64decode(data)

According to the answer to «Trailing As in base64», the reason is nulls. But I still have no idea why the encoder messes this up…

16) Solution

You should use

base64.b64decode(b64_string, ' /')

By default, the altchars are '+/'.

17) Solution

In case this error came from a web server: try URL-encoding your POST value. I was POSTing via «curl» and discovered I wasn’t URL-encoding my base64 value, so characters like «+» were not escaped and the web server’s URL-decode logic converted each + to a space.

«+» is a valid base64 character and perhaps the only character which gets mangled by an unexpected url-decode.
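The mangling and its fix can be reproduced with the standard library's urllib.parse. This is a sketch of the mechanism only, assuming form-style decoding on the server side; the sample bytes are chosen so that the encoding consists entirely of + characters.

```python
import base64
from urllib.parse import quote, unquote_plus

raw = bytes([0xFB, 0xEF, 0xBE])  # encodes to "++++"
value = base64.b64encode(raw).decode("ascii")

# Sent unescaped: form decoding turns every '+' into a space.
mangled = unquote_plus(value)
print(repr(value), "->", repr(mangled))

# Percent-encoded first: the value survives the round trip.
escaped = quote(value, safe="")
print(escaped, "->", unquote_plus(escaped))
```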

18) Solution

Simply add padding characters («=») to make the length a multiple of 4 before you try decoding the target string value. Something like:

import base64

# Pad with '=' until the length is a multiple of 4, then decode.
while len(value) % 4 != 0:
    value = value + "="
req_str = base64.b64decode(value)

19) Solution

In my case I hit this error while parsing an email. I got the attachment as a base64 string and extracted it via re.search. It turned out there was a strange additional substring at the end.

dHJhaWxlcgo8PCAvU2l6ZSAxNSAvUm9vdCAxIDAgUiAvSW5mbyAyIDAgUgovSUQgWyhcMDAyXDMz
MHtPcFwyNTZbezU/VzheXDM0MXFcMzExKShcMDAyXDMzMHtPcFwyNTZbezU/VzheXDM0MXFcMzEx
KV0KPj4Kc3RhcnR4cmVmCjY3MDEKJSVFT0YK

--_=ic0008m4wtZ4TqBFd+sXC8--

When I deleted --_=ic0008m4wtZ4TqBFd+sXC8-- and stripped the string, parsing worked.

So my advice is to make sure that you are decoding a correct base64 string.
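
One way to guard against such trailing junk is to keep only the lines that consist entirely of base64 characters before decoding. This is a sketch: the sample text and the boundary string in it are invented, and a real MIME parser (for example the standard email package) is usually the better tool.

```python
import base64
import re

# Invented sample: two base64 lines followed by a MIME-style boundary line.
raw = (
    "aGVsbG8g\n"
    "d29ybGQ=\n"
    "\n"
    "--_=boundary+example--\n"
)

# A line counts as base64 only if every character is from the standard
# alphabet, optionally followed by up to two '=' padding characters.
b64_line = re.compile(r"[A-Za-z0-9+/]+={0,2}")
body = "".join(line for line in raw.splitlines() if b64_line.fullmatch(line))

# validate=True makes the decoder reject any stray character left in body.
decoded = base64.b64decode(body, validate=True)
print(decoded)  # b'hello world'
```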

20) Solution

Clear your browser cookie and recheck again, it should work.

21) Solution

In my case I hit this error after deleting the venv for the particular project; it showed the error for every field. I tried changing the browser (Chrome to Edge), and it actually worked.

Comments Section

There’s surely something simpler that maps 0 to 0, 2 to 1 and 1 to 2.

Why are you expanding to a multiple of 3 instead of 4?

The underlying binary data is ASN.1. Even with corruption I want to get back to the binary because I can still get some useful info from the ASN.1 stream.

That’s what the wikipedia article on base64 seems to imply.

@bp: In base64 encoding each 24 bits (3 bytes) binary input is encoded as 4 bytes output. output_len % 3 makes no sense.

Did you actually TRY using base64.b64decode(strg, '-_')? That is a priori, without you bothering to supply any sample data, the most likely Python solution to your problem. The «methods» proposed were DEBUG suggestions, NECESSARILY «hit and miss» given the paucity of the information supplied.

@John Machin: Yes, I did TRY your method but it didn’t work. The data is company confidential.

The data is comprised from the standard base64 character set. I’m pretty sure the problem is because 1 or more characters are missing — hence the padding error. Unless, there is a robust solution in Python, I’ll go with my solution of calling openssl.

A «solution» that silently ignores errors is scarcely deserving of the term «robust». As I mentioned earlier, the various Python suggestions were methods of DEBUGGING to find out what the problem is, preparatory to a PRINCIPLED solution … aren’t you interested in such a thing?

My requirement is NOT to solve the problem of why the base64 is corrupt — it comes from a source I have no control over. My requirement is to provide information about the data received even if it is corrupt. One way to do this is to get the binary data out of the corrupt base64 so I can glean information from the underlying ASN.1. stream. I asked the original question because I wanted an answer to that question not the answer to another question — such as how to debug corrupt base64.

Note: ASCII not Unicode, so to be safe, you might want to str(data)

Try base64.urlsafe_b64decode(s)

This is good with one caveat: base64.decodestring is deprecated, use base64.b64decode.

He means this comment: stackoverflow.com/questions/2941995/… (http://stackoverflow.com/questions/2941995/python-ignore-incorrect-padding-error-when-base64-decoding#comment12174484_2942039)

This does not answer the question at all. Plus, urlsafe_b64decode also requires padding.

Well, there was an issue I had before answering this question, which was related to Google’s Identity Toolkit. I was getting the incorrect padding error (I believe it was on the server) even though the padding appeared to be correct. It turned out that I had to use base64.urlsafe_b64decode.

To clarify on @ariddell comment base64.decodestring has been deprecated for base64.decodebytes in Py3 but for version compatibility better to use base64.b64decode.

I agree that it doesn’t answer the question, rdb, yet it was exactly what I needed to hear as well. I rephrased the answer to a bit nicer tone, I hope this works for you, Daniel.

Perfectly fine. I didn’t notice that it sounded somewhat unkind, I only thought that it would be the quickest fix if it would fix the issue, and, for that reason, should be the first thing to be tried. Thanks for your change, it is welcome.

That does not work in python 3.7. assert len(altchars) == 2, repr(altchars)

Okay that’s not too «ugly» thanks :) By the way I think you never need more than 2 padding chars. Base64 algorithm works on groups of 3 chars at a time and only needs padding when your last group of chars is only 1 or 2 chars in length.

Because the base64 module does ignore invalid non-base64 characters in the input, you first have to normalise the data. Remove anything that’s not a letter, digit / or +, and then add the padding.

Just normalize the string, remove anything that is not a Base64 character. Anywhere, not just start or end.
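
The normalisation suggested in the two comments above can be sketched as follows. The helper name lenient_b64decode is made up; the sketch assumes the standard alphabet, and it still fails when the cleaned text has a length of 1 modulo 4, since no amount of padding makes that valid.

```python
import base64
import re


def lenient_b64decode(text: str) -> bytes:
    """Hypothetical helper: drop non-base64 characters, re-pad, decode."""
    # Remove everything outside the standard alphabet, including any
    # existing '=' so the padding can be rebuilt from scratch.
    cleaned = re.sub(r"[^A-Za-z0-9+/]", "", text)
    # Add the number of '=' needed to reach a multiple of 4.
    cleaned += "=" * (-len(cleaned) % 4)
    return base64.b64decode(cleaned)


# Whitespace inside the data and the missing '=' are both tolerated.
print(lenient_b64decode("aGVs bG8g\nd29y bGQ"))  # b'hello world'
```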

@Otto the padding here is for decoding, which works on groups of 4 chars. Base64 encoding works on groups of 3 chars :)

but if you know that during encoding maximally 2 will ever be added, which may become «lost» later, forcing you to re-add them before decoding, then you know you will only need to add maximally 2 during decoding too. #ChristmasTimeArgumentForTheFunOfIt

@Otto I believe you are right. While a base64 encoded string with length, for example, 5 would require 3 padding characters, a string of length 5 is not even a valid length for a base64 encoded string. You’d get the error: binascii.Error: Invalid base64-encoded string: number of data characters (5) cannot be 1 more than a multiple of 4. Thanks for pointing this out!

Could you provide the output of this: sorted(list(set(b64_string))) please? Without revealing anything company-confidential, that should reveal which characters were used to encode the original data, which in turn may supply enough information to provide a non-hit-or-miss solution.

Yes, I know it’s already solved, but, to be honest, the openssl solution also sounds hit-or-miss to me.

Just appending === always works. Any extra = chars are seemingly safely discarded by Python.

not true, if you want to decode a jwt for security checks, you will need it

This answer does not seem related to the question. Could you please explain more in where the issue was located and how it is related?

I got this issue on Django while running the application in my Chrome browser. Normally the Django application runs on localhost, but today it didn’t work on localhost, so I had to change localhost to 127.0.0.1. Now it works. It also works in other browsers like Firefox without changing localhost.

super odd, but this also worked for me — not sure why, but thanks!

cannot believe that worked and adding additional ‘=’s didn’t. Mine ended with «T4NCg==» and no amount of adding or subtracting ‘=’s made any difference until I removed the ‘g’ on the end. I notice ‘g’ != ‘A’

If you want to explain, please do so in your answer rather than in a comment.

This is the only answer that worked for me out of all the answers on this page.

Related Topics
python
base64

Mentions
Lyubomir
Martijn Pieters
John Machin
Pang
Badp
Sam
Colidyre
Warvariuc
Daniil Mashkin
Michael Mrozek
Fun Lovin Coder
Simon Sapin
Ekevoo
Henry Woody
Vinee
Henrik Heimbuerger
Daniel F
Bryan Lott
Brian Carcich
Nooras Fatima Ansari
Curtis Yallop
Quoc
Mitzi
Chenws
Matteo Italia
Syed Mauze Rehan
Pilathraj
Benjamin Atkin

References
2941995/python-ignore-incorrect-padding-error-when-base64-decoding
