Key error not in index pandas - Fixing errors and finding optimal solutions to problems

I have a dataframe called delivery and when I print(delivery.columns) I get the following:

Index(['Complemento_endereço', 'cnpj', 'Data_fundação', 'Número',
   'Razão_social', 'CEP', 'situacao_cadastral', 'situacao_especial', 'Rua',
   'Nome_Fantasia', 'last_revenue_normalized', 'last_revenue_year',
   'Telefone', 'email', 'Capital_Social', 'Cidade', 'Estado',
   'Razão_social', 'name_bairro', 'Natureza_Jurídica', 'CNAE', '#CNAE',
   'CNAEs_secundários', 'Pessoas', 'percent'],
  dtype='object')

Well, we can clearly see that there is a column ‘Rua’.

Also, if I print(delivery.Rua) I get a proper result:

82671                         R JUDITE MELO DOS SANTOS
817797                                R DOS GUAJAJARAS
180081           AV MARCOS PENTEADO DE ULHOA RODRIGUES
149373                                 AL MARIA TEREZA
455511                               AV RANGEL PESTANA
...

Even if I write «if ‘Rua’ in delivery.columns: print(‘here I am’)» it does print the ‘here I am’. So ‘Rua’ is in fact there.

Well, in the immediate line after I have this code:

delivery=delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Rua','Número','Complemento_endereço','Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]

And voilà, I get this weird error:

Traceback (most recent call last):
  File "/file.py", line 45, in <module>
    'Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 1991, in __getitem__
    return self._getitem_array(key)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 2035, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/indexing.py", line 1214, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['Rua'] not in index"

Can someone help? I tried stackoverflow but no one could help. I’m starting to think I’m crazy and ‘Rua’ is an illusion of my troubled mind.

ADDITIONAL INFO

I’m using this code right before the error line:

delivery=pd.DataFrame()

for i in selection.index:
    sample=groups.get_group(selection['#CNAE'].loc[i]).sample(selection['samples'].loc[i])
    delivery=pd.concat((delivery,sample)).sort_values('Capital_Social',ascending=False)


print(delivery.columns)
print(delivery.Rua)
print(delivery.set_index('cnpj').columns)

delivery=delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Rua','Número','Complemento_endereço',
                                 'Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]

EDIT

New weird stuff:
I gave up and deleted ‘Rua’ from that last piece of code, hoping that it would work. To my surprise, I had the same problem, but now with the column ‘Número’.

delivery=delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Número','Complemento_endereço',
                                                 'Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica' ]]

KeyError: "['Número'] not in index"

EDIT 2

And then I gave up on ‘Número’ and took it out. Then the same problem happened with ‘Complemento_endereço’. Then I deleted ‘Complemento_endereço’, and it happened to ‘Telefone’, and so on.

EDIT 3

If I run pd.show_versions(), this is the output:

INSTALLED VERSIONS

commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 18.2
Cython: None
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.3
pymysql: 0.7.11.None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
None
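
A note on the listings above: in the printed column list, 'Bairro' does not appear (only 'name_bairro' does), while 'Razão_social' appears twice. Whether that combination explains why the KeyError names 'Rua' on pandas 0.18.1 is an assumption here, not something the post confirms. A minimal sketch of how both conditions could be checked, using a hypothetical miniature of `delivery` rather than the real data:

```python
import pandas as pd

# Hypothetical miniature of `delivery`: one duplicated name, no 'Bairro' column.
delivery = pd.DataFrame(
    [[1, "A", "R X", "A", "Centro"]],
    columns=["cnpj", "Razão_social", "Rua", "Razão_social", "name_bairro"],
)
wanted = ["Razão_social", "Bairro", "Rua"]

# Requested names that are absent from the frame.
missing = [c for c in wanted if c not in delivery.columns]

# Names that occur more than once among the columns.
duplicated = delivery.columns[delivery.columns.duplicated()].tolist()

print(missing)     # ['Bairro']
print(duplicated)  # ['Razão_social']
```

If either list is non-empty, renaming or dropping the affected columns before the selection is probably worth trying first.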


August 17, 2022


One error you may encounter when using pandas is:

KeyError: 'column_name'

This error occurs when you attempt to access some column in a pandas DataFrame that does not exist.

Typically this error occurs when you misspell a column name or accidentally include a space before or after it.

The following example shows how to fix this error in practice.

How to Reproduce the Error

Suppose we create the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
df

   points  assists  rebounds
0      25        5        11
1      12        7         8
2      15        7        10
3      14        9         6
4      19       12         6
5      23        9         5
6      25        9         9
7      29        4        12

Then suppose we attempt to print the values in a column called ‘point’:

#attempt to print values in 'point' column
print(df['point'])

KeyError                                  Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

/srv/conda/envs/notebook/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/srv/conda/envs/notebook/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'point'

Since there is no ‘point’ column in our DataFrame, we receive a KeyError.
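
When the misspelling is small, as with ‘point’ versus ‘points’, the standard library can propose the name that was probably intended. The following is only a sketch: `suggest_column` is a hypothetical helper, not part of pandas, and it relies on the default similarity cutoff of `difflib.get_close_matches`.

```python
import difflib

import pandas as pd

df = pd.DataFrame({"points": [25, 12], "assists": [5, 7], "rebounds": [11, 8]})


def suggest_column(frame, key):
    """Return existing column names that closely resemble `key` (hypothetical helper)."""
    candidates = [str(c) for c in frame.columns]
    return difflib.get_close_matches(key, candidates, n=3)


suggestions = suggest_column(df, "point")
print(suggestions)  # ['points']
```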

How to Fix the Error

The way to fix this error is to simply make sure we spell the column name correctly.

If we’re unsure of all of the column names in the DataFrame, we can use the following syntax to print each column name:

#display all column names of DataFrame
print(df.columns.tolist())

['points', 'assists', 'rebounds']

We can see that there is a column called ‘points’, so we can fix our error by spelling the column name correctly:

#print values in 'points' column
print(df['points'])

0    25
1    12
2    15
3    14
4    19
5    23
6    25
7    29
Name: points, dtype: int64

We avoid an error because we spelled the column name correctly.
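
The other cause mentioned earlier, a stray space before or after a column name, is harder to spot because the printed name looks correct. A minimal sketch, assuming a hypothetical frame `df_spaces` whose headers carry such spaces:

```python
import pandas as pd

# Hypothetical frame whose headers carry stray spaces.
df_spaces = pd.DataFrame({" points": [25, 12], "assists ": [5, 7]})

# repr() makes the otherwise invisible whitespace visible.
print([repr(c) for c in df_spaces.columns])

# Strip leading and trailing whitespace from every column name.
df_spaces.columns = df_spaces.columns.str.strip()

print(df_spaces["points"].tolist())  # [25, 12]
```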

Additional Resources

The following tutorials explain how to fix other common errors in Python:

How to Fix: columns overlap but no suffix specified
How to Fix: ‘numpy.ndarray’ object has no attribute ‘append’
How to Fix: if using all scalar values, you must pass an index


The KeyError in Pandas occurs when you try to access a column that does not exist in a pandas DataFrame, most often because the column name is misspelled.

Typically, we import data from an Excel file, which brings in its column names, and it is easy to misspell a column name or to include an unwanted space before or after it.

Column names are case-sensitive, and if you make a mistake, Python raises the exception KeyError: ‘column_name’.

Let us take a simple example to demonstrate KeyError in Pandas. In this example, we create a pandas DataFrame of employee’s data, and let’s say we need to print all the employee names.

# import pandas library
import pandas
import numpy as np

# create pandas DataFrame
df =  pandas.DataFrame(np.array([["Jack", 22, "US"], ["Chandler", 55, "Canada"], ["Ross", 48, "India"]]),
                   columns=['name', 'age', 'country'])

# print names of employee
print(df["Name"])

Output

    raise KeyError(key) from err
KeyError: 'Name'

When we run the program, Python raises KeyError, since we have misspelled the “name” column as “Name”.

Solution KeyError in Pandas

We can fix the issue by correcting the spelling of the key. If we are not sure what the column names are, we can print them first, for example with print(df.columns.tolist()). Here the column is called “name”, so the corrected code is:

# import pandas library
import pandas
import numpy as np

# create pandas DataFrame
df =  pandas.DataFrame(np.array([["Jack", 22, "US"], ["Chandler", 55, "Canada"], ["Ross", 48, "India"]]),
                   columns=['name', 'age', 'country'])

# print names of employee
print(df["name"])

Output

0        Jack
1    Chandler
2        Ross
Name: name, dtype: object

The output confirms that the column is called “name”, so passing the correctly spelled key to the pandas DataFrame fixes the error.
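
Because column names are case-sensitive, a lookup that ignores case can be convenient when the exact capitalization is uncertain. This is only a sketch: `get_column_ci` is a hypothetical helper, not a pandas API, and it assumes no two columns differ only by case.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jack", "Chandler", "Ross"], "age": [22, 55, 48]})


def get_column_ci(frame, key):
    """Look up a column while ignoring case (hypothetical helper)."""
    # Map each lower-cased name back to the real column label.
    lookup = {str(c).lower(): c for c in frame.columns}
    try:
        return frame[lookup[key.lower()]]
    except KeyError:
        raise KeyError(
            f"{key!r} not found; available columns: {list(frame.columns)}"
        ) from None


names = get_column_ci(df, "Name").tolist()
print(names)  # ['Jack', 'Chandler', 'Ross']
```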

We can also avoid the KeyError that Python raises when an invalid key is passed. The DataFrame has a get method that takes a column name and returns that column's values.

Syntax : DataFrame.get( 'column_name' , default = default_value_if_column_is_not_present)

If the column name is misspelled or invalid, get returns the default value instead of raising a KeyError. Let’s look at an example to demonstrate how this works.

# import pandas library
import pandas
import numpy as np

# create pandas DataFrame
df = pandas.DataFrame(np.array([["Jack", 22, "US"], ["Chandler", 55, "Canada"], ["Ross", 48, "India"]]),
                      columns=['name', 'age', 'country'])

# print names of employee
print(df.get("Name", default="Name is not present"))

Output

Name is not present

And if we provide the correct column name to the DataFrame.get() method, it returns the values of that column.

# import pandas library
import pandas
import numpy as np

# create pandas DataFrame
df = pandas.DataFrame(np.array([["Jack", 22, "US"], ["Chandler", 55, "Canada"], ["Ross", 48, "India"]]),
                      columns=['name', 'age', 'country'])

# print names of employee
print(df.get("name", default="Name is not present"))

Output

0        Jack
1    Chandler
2        Ross
Name: name, dtype: object
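
One detail worth keeping in mind is that get returns None when no default is given, so a typo can pass silently. A short sketch of that behavior, together with an explicit membership test as an alternative; the empty Series used as a fallback is just a choice made for this illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jack", "Chandler", "Ross"]})

# Without an explicit default, get() returns None for a missing column.
result = df.get("Name")
print(result is None)  # True

# An explicit membership test keeps the two cases apart.
if "Name" in df.columns:
    values = df["Name"]
else:
    values = pd.Series(dtype="object")  # empty placeholder chosen for this sketch

print(len(values))  # 0
```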


Srinivas Ramakrishna is a Solution Architect and has 14+ Years of Experience in the Software Industry. He has published many articles on Medium, Hackernoon, dev.to and solved many problems in StackOverflow. He has core expertise in various technologies such as Microsoft .NET Core, Python, Node.JS, JavaScript, Cloud (Azure), RDBMS (MSSQL), React, Powershell, etc.
