Input contains NaN, infinity or a value too large for dtype float64: how to fix - Fixing errors and finding optimal solutions to problems

When using a dataset for analysis, you must check your data to ensure it only contains finite numbers and no NaN values (Not a Number). If you try to pass a dataset that contains NaN or infinity values to a function for analysis, you will raise the error: ValueError: input contains nan, infinity or a value too large for dtype(‘float64’).

To solve this error, you can check your data set for NaN values using numpy.isnan() and infinite values using numpy.isfinite(). You can replace NaN values using nan_to_num() if your data is in a numpy array or SciKit-Learn’s SimpleImputer.

This tutorial will go through the error in detail and how to solve it with the help of code examples.

Python ValueError: input contains nan, infinity or a value too large for dtype(‘float64’)
- What is a ValueError?
- What is a NaN in Python?
- What is inf in Python?
Example #1: Dataset with NaN Values
- Solution #1: using nan_to_num()
- Solution #2: using SimpleImputer
Example #2: Dataset with NaN and inf Values
- Solution #1: Using nan_to_num
- Solution #2: Using fillna()
- Solution #3: using SimpleImputer
Summary

Python ValueError: input contains nan, infinity or a value too large for dtype(‘float64’)

What is a ValueError?

In Python, a value is the information stored within a particular object. You will encounter a ValueError in Python when you use a built-in operation or function that receives an argument with the right type but an inappropriate value.
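To make the distinction between the right type and an inappropriate value concrete, here is a minimal sketch. The helper name describe_failure is hypothetical and exists only for this illustration; it reports which exception a call raises.

```python
import math


def describe_failure(func, arg):
    """Call func(arg) and return the name of the exception raised, if any."""
    try:
        func(arg)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return "no error"


# Right type (str) but an inappropriate value: ValueError
value_case = describe_failure(int, "abc")

# Wrong type altogether (list): TypeError
type_case = describe_failure(int, [1, 2])

# Right type (float) but outside the function's domain: ValueError
domain_case = describe_failure(math.sqrt, -1.0)

print(value_case, type_case, domain_case)
```

The scikit-learn error in this tutorial follows the same pattern: the input is an array of floats, which is the right type, but some of its values are not usable.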

What is a NaN in Python?

In Python, a NaN stands for Not a Number and represents undefined entries and missing values in a dataset.
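The short sketch below shows two properties of NaN that matter when cleaning data: it is not equal to itself, and it propagates through ordinary arithmetic. The array arr is made up for illustration.

```python
import math

import numpy as np

nan = float("nan")

# NaN is the only float value that is not equal to itself
not_equal_to_itself = nan != nan

# Use math.isnan() for scalars and np.isnan() for arrays
scalar_is_nan = math.isnan(nan)
arr = np.array([1.0, np.nan, 3.0])
mask = np.isnan(arr)

# A single NaN turns an ordinary sum into NaN; np.nansum() skips it
total = arr.sum()
safe_total = np.nansum(arr)

print(not_equal_to_itself, scalar_is_nan, mask, total, safe_total)
```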

What is inf in Python?

Infinity in Python is a number that is greater than every other numeric value and can either be positive or negative. Most arithmetic operations performed on an infinite value produce an infinite number, but indeterminate forms such as inf - inf or inf * 0 produce NaN. Infinity is a float value; there is no way to represent infinity as an integer. We can use float() to represent infinity as follows:

pos_inf=float('inf')

neg_inf=-float('inf')

print('Positive infinity: ', pos_inf)

print('Negative infinity: ', neg_inf)

Positive infinity:  inf
Negative infinity:  -inf

We can also use the math, decimal, sympy, and numpy modules to represent infinity in Python.
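As a rough illustration of the alternatives mentioned above, the sketch below builds infinity with math, numpy, and decimal (sympy is left out to keep the example dependency-light) and checks that they agree with float('inf'). The variable names are illustrative only.

```python
import math
from decimal import Decimal

import numpy as np

pos_inf = float("inf")

# math.inf and np.inf are the same float value as float('inf')
same_value = (math.inf == pos_inf) and (np.inf == pos_inf)

# decimal has its own infinity, which converts to the float infinity
dec_inf = Decimal("Infinity")
dec_matches = dec_inf.is_infinite() and float(dec_inf) == pos_inf

# Adding a finite number leaves infinity unchanged...
still_inf = pos_inf + 1e308

# ...but inf - inf is undefined and evaluates to NaN
undefined = pos_inf - pos_inf

print(same_value, dec_matches, still_inf, undefined)
```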

Let’s look at some examples where we want to clean our data of NaN and infinity values.

Example #1: Dataset with NaN Values

In this example, we will generate a dataset consisting of random numbers and then randomly populate the dataset with NaN values. We will try to cluster the values in the dataset using the AffinityPropagation in the Scikit-Learn library.

Note: The use of the AffinityPropagation to cluster on random data is just an example to demonstrate the source of the error. The function you are trying to use may be completely different to AffinityPropagation, but the data preprocessing described in this tutorial will still apply.

The data generation looks as follows:

# Import numpy and AffinityPropagation

import numpy as np

from sklearn.cluster import AffinityPropagation

# Number of NaN values to put into data

n = 4

data = np.random.randn(20)

# Get random indices in the data

index_nan = np.random.choice(data.size, n, replace=False)

# Replace data with NaN

data.ravel()[index_nan]=np.nan

print(data)

Let’s look at the data:

[-0.0063374  -0.974195    0.94467842  0.38736788  0.84908087         nan
  1.00582645         nan  1.87585201 -0.98264992 -1.64822932  1.24843544
  0.88220504 -1.4204208   0.53238027         nan  0.83446561         nan
 -0.04655628 -1.09054183]

The data consists of twenty random values, four of which are NaN, and the rest are numerical values. Let’s try to fit the data using the AffinityPropagation() class.

af= AffinityPropagation(random_state=5).fit([data])

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

We raise the error because the AffinityPropagation.fit() cannot handle NaN, infinity or extremely large values. Our data contains NaN values, and we need to preprocess the data to replace them with suitable values.

Solution #1: using nan_to_num()

To check if a dataset contains NaN values, we can use the isnan() function from NumPy. If we pair this function with any(), we will check if there are any instances of NaN. We can replace the NaN values using the nan_to_num() method. Let’s look at the code and the clean data:

print(np.any(np.isnan(data)))

data = np.nan_to_num(data)

print(data)

True
[-0.0063374  -0.974195    0.94467842  0.38736788  0.84908087  0.
  1.00582645  0.          1.87585201 -0.98264992 -1.64822932  1.24843544
  0.88220504 -1.4204208   0.53238027  0.          0.83446561  0.
 -0.04655628 -1.09054183]

The np.any() part of the code returns True because our dataset contains at least one NaN value. The clean data has zeros in place of the NaN values. Let’s fit on the clean data:

af= AffinityPropagation(random_state=5).fit([data])

This code will execute without any errors.

Solution #2: using SimpleImputer

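Scikit-Learn provides a class for imputation called SimpleImputer. We can use the SimpleImputer to replace NaN values. Because we pass the one-dimensional dataset as a single row ([data]), each value sits in its own column, and a column-wise statistic such as the mean cannot be computed for a column that holds only NaN. We therefore need to set the strategy parameter in the SimpleImputer to constant. First, we will generate the data: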

import numpy as np

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

print(data)

The data looks like this:

[ 1.4325319   0.61439789  0.3614522   1.38531346         nan  0.6900916
  0.50743745  0.48544145         nan         nan  0.17253557         nan
 -1.05027802  0.09648188  1.15971533  0.29005307  2.35040023  0.44103513
 -0.03235852 -0.78142219]

We can use the SimpleImputer class to fit and transform the data as follows:

from sklearn.impute import SimpleImputer

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

The clean data looks like this:

[[ 1.4325319   0.61439789  0.3614522   1.38531346  0.          0.6900916
   0.50743745  0.48544145  0.          0.          0.17253557  0.
  -1.05027802  0.09648188  1.15971533  0.29005307  2.35040023  0.44103513
  -0.03235852 -0.78142219]]

And we can pass the clean data to the AffinityPropagation clustering method as follows:

af= AffinityPropagation(random_state=5).fit(data)

We can also use the SimpleImputer class on multi-dimensional data to replace NaN values using the mean along each column. We have to set the imputation strategy to “mean”, and using the mean is only valid for numeric data. Let’s look at an example of a 3×3 nested list that contains NaN values:

from sklearn.impute import SimpleImputer

data = [[7, 2, np.nan], 
        [4, np.nan, 6], 
        [10, 5, 9]]

We can replace the NaN values as follows:

imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')

imp_mean.fit(data)

data = imp_mean.transform(data)

print(data)

[[ 7.   2.   7.5]
 [ 4.   3.5  6. ]
 [10.   5.   9. ]]

We replaced the np.nan values with the mean of the real numbers along the columns of the nested list. For example, in the third column, the real numbers are 6 and 9, so the mean is 7.5, which replaces the np.nan value in the third column.

We can also use the other imputation strategies, median and most_frequent.

Example #2: Dataset with NaN and inf Values

This example will generate a dataset consisting of random numbers and then randomly populate the dataset with NaN and infinity values. We will try to cluster the values in the dataset using the AffinityPropagation in the Scikit-Learn library. The data generation looks as follows:

import numpy as np

from sklearn.cluster import AffinityPropagation

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)

[-0.76148741         inf  0.10339756         nan         inf -0.75013509
  1.2740893          nan -1.68682986         nan  0.57540185 -2.0435754
  0.99287213         inf  0.5838198          inf -0.62896815 -0.45368201
  0.49864775 -1.08881703]

The data consists of twenty random values, four of which are NaN, four are infinity, and the rest are numerical values. Let’s try to fit the data using the AffinityPropagation() class.

af= AffinityPropagation(random_state=5).fit([data])

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

We raise the error because the dataset contains NaN values and infinity values.

Solution #1: Using nan_to_num

To check if a dataset contains NaN values, we can use the isnan() function from NumPy. If we pair this function with any(), we will check if there are any instances of NaN.

To check if a dataset contains infinite values, we can use the isfinite() function from NumPy. If we pair this function with all(), we will check whether every value is finite; a result of False means the data contains at least one NaN or infinity value.

We can replace the NaN and infinity values using the nan_to_num() method. The method will set NaN values to zero and infinity values to a very large number. Let’s look at the code and the clean data:

print(np.any(np.isnan(data)))

print(np.all(np.isfinite(data)))

data = np.nan_to_num(data)

print(data)

True

False

[-7.61487414e-001  1.79769313e+308  1.03397556e-001  0.00000000e+000
  1.79769313e+308 -7.50135085e-001  1.27408930e+000  0.00000000e+000
 -1.68682986e+000  0.00000000e+000  5.75401847e-001 -2.04357540e+000
  9.92872128e-001  1.79769313e+308  5.83819800e-001  1.79769313e+308
 -6.28968155e-001 -4.53682014e-001  4.98647752e-001 -1.08881703e+000]

We replaced the NaN values with zeroes and the infinity values with 1.79769313e+308. We can fit on the clean data as follows:

af= AffinityPropagation(random_state=5).fit([data])

This code will execute without any errors. If we do not want to replace infinity with a very large number but with zero, we can convert the positive and negative infinity values to NaN using:

data[np.isinf(data)] = np.nan

And then pass the data to the nan_to_num() function, converting all the NaN values to zeroes.
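As an alternative sketch, assuming NumPy 1.17 or later, nan_to_num() accepts nan, posinf and neginf keyword arguments, so the replacement values can be chosen in a single call. The small array below is made up for illustration.

```python
import numpy as np

data = np.array([1.5, np.nan, np.inf, -np.inf, -2.0])

# Default behaviour: NaN -> 0.0, +inf -> largest float64, -inf -> smallest float64
default_clean = np.nan_to_num(data)

# Explicit replacement values: send NaN and both infinities to zero
zero_clean = np.nan_to_num(data, nan=0.0, posinf=0.0, neginf=0.0)

print(default_clean)
print(zero_clean)
```

Choosing zero for infinity is a modelling decision rather than a rule; a large value clipped to the observed range may suit some datasets better.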

Solution #2: Using fillna()

We can use Pandas to convert our dataset to a DataFrame and replace the NaN and infinity values using the Pandas fillna() method. First, let’s look at the data generation:

import numpy as np

import pandas as pd

from sklearn.cluster import AffinityPropagation

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)

[ 0.41339801         inf         nan  0.7854321   0.23319745         nan
  0.50342482         inf -0.82102161 -0.81934623  0.23176869 -0.61882322
  0.12434801 -0.21218049         inf -1.54067848         nan  1.78086445
         inf  0.4881174 ]

The data consists of twenty random values, four of which are NaN, four are infinity, and the rest are numerical values. We can convert the numpy array to a DataFrame as follows:

df = pd.DataFrame(data)

Once we have the DataFrame, we can use the replace method to replace the infinity values with NaN values. Then, we will call the fillna() method to replace all NaN values in the DataFrame.

df.replace([np.inf, -np.inf], np.nan, inplace=True)

df = df.fillna(0)

We can use the to_numpy() method to convert the DataFrame back to a numpy array as follows:

data = df.to_numpy()

print(data)

[[ 0.41339801]
 [ 0.        ]
 [ 0.        ]
 [ 0.7854321 ]
 [ 0.23319745]
 [ 0.        ]
 [ 0.50342482]
 [ 0.        ]
 [-0.82102161]
 [-0.81934623]
 [ 0.23176869]
 [-0.61882322]
 [ 0.12434801]
 [-0.21218049]
 [ 0.        ]
 [-1.54067848]
 [ 0.        ]
 [ 1.78086445]
 [ 0.        ]
 [ 0.4881174 ]]

We can now fit on the clean data using the AffinityPropagation class as follows:

af= AffinityPropagation(random_state=5).fit(data)

print(af.cluster_centers_)

The clustering algorithm gives us the following cluster centres:

[[ 0.        ]
 [ 0.50342482]
 [-0.81934623]
 [-1.54067848]
 [ 1.78086445]]

We can also use Pandas to drop rows (or, with axis=1, columns) that contain NaN values using the dropna() method. For further reading on using Pandas for data preprocessing, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.
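A minimal sketch of dropna() on a small, made-up DataFrame follows. Infinity values are first converted to NaN, because dropna() only removes missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.inf, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, 8.0, 9.0],
})

# dropna() ignores infinity, so mark infinite values as missing first
finite_or_nan = df.replace([np.inf, -np.inf], np.nan)

# Default (axis=0): drop every row that contains a NaN
rows_kept = finite_or_nan.dropna()

# axis=1: drop every column that contains a NaN
cols_kept = finite_or_nan.dropna(axis=1)

print(rows_kept)
print(cols_kept)
```

Dropping rows discards observations, while dropping columns discards features, so the right choice depends on how much data each option removes.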

Solution #3: using SimpleImputer

Let’s look at an example of using the SimpleImputer to replace NaN and infinity values. First, we will look at the data generation:

import numpy as np

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)

[-0.5318616          nan  0.12842066         inf         inf         nan
  1.24679674  0.09636847  0.67969774  1.2029146          nan  0.60090616
 -0.46642723         nan  1.58596659  0.47893738  1.52861316         inf
 -1.36273437         inf]

The data consists of twenty random values, four of which are NaN, four are infinity, and the rest are numerical values. Let’s try to use the SimpleImputer to clean our data:

from sklearn.impute import SimpleImputer

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

ValueError: Input contains infinity or a value too large for dtype('float64').

We raise the error because the SimpleImputer class does not support infinite values. To solve this error, you can replace the positive and negative infinity values with np.nan as follows:

data[np.isinf(data)] = np.nan

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

With all infinity values replaced with NaN values, we can use the SimpleImputer to transform the data. Let’s look at the clean dataset:

[[-0.5318616   0.          0.12842066  0.          0.          0.
   1.24679674  0.09636847  0.67969774  1.2029146   0.          0.60090616
  -0.46642723  0.          1.58596659  0.47893738  1.52861316  0.
  -1.36273437  0.        ]]

Consider the case where we have multi-dimensional data with NaN and infinity values, and we want to use the SimpleImputer class. In that case, we can replace the infinite values by using the Pandas replace() method as follows:

import numpy as np

import pandas as pd

from sklearn.impute import SimpleImputer

data = [[7, 2, np.nan], 
        [4, np.nan, 6], 
        [10, 5, np.inf]]

df = pd.DataFrame(data)

df.replace([np.inf, -np.inf], np.nan, inplace=True)

data = df.to_numpy()

Then we can use the SimpleImputer to fit and transform the data. In this case, we will replace the missing values with the mean along the column where each NaN value occurs.

imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')

imp_mean.fit(data)

data = imp_mean.transform(data)

print(data)

The clean data looks like this:

[[ 7.   2.   6. ]
 [ 4.   3.5  6. ]
 [10.   5.   6. ]]

Summary

Congratulations on reading to the end of this tutorial! If you pass a NaN or an infinite value to a function, you may raise the error: ValueError: input contains nan, infinity or a value too large for dtype(‘float64’). This commonly occurs as a result of not preprocessing data before analysis. To solve this error, check your data for NaN and inf values and either remove them or replace them with real numbers.

You can only replace NaN values with the SimpleImputer method. If you try to replace infinity values with the SimpleImputer, you will raise the ValueError. Ensure that you convert all positive and negative infinity values to NaN before using the SimpleImputer.
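The sketch below puts these two steps together for a single feature. It assumes the values are observations of one variable and therefore reshapes them into one column with reshape(-1, 1), which differs from the [data] single-row form used earlier; the array values is made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical one-feature dataset with a NaN and both infinities
values = np.array([1.0, np.nan, 3.0, np.inf, -np.inf, 5.0])

# One column, one row per observation: shape (6, 1)
column = values.reshape(-1, 1).copy()

# SimpleImputer rejects infinities, so mark them as missing first
column[np.isinf(column)] = np.nan

# Replace every missing entry with the mean of the remaining values
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
clean = imputer.fit_transform(column)

print(clean.ravel())
```

With the data laid out as a column, the mean strategy has finite values to average, so a constant fill value is no longer the only option.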

For further reading on ValueErrors, go to the article: How to Solve Python ValueError: I/O operation on closed file.

For further reading on Scikit-learn, go to the article: How to Solve Sklearn ValueError: Unknown label type: ‘continuous’.

Go to the online courses page on Python to learn more about coding in Python for data science and machine learning.

Have fun and happy researching!

Source

One common error you may encounter when using Python is:

ValueError: Input contains infinity or a value too large for dtype('float64').

This error usually occurs when you attempt to use some function from the scikit-learn module, but the DataFrame or matrix you’re using as input has NaN values or infinite values.

The following example shows how to resolve this error in practice.

How to Reproduce the Error

Suppose we have the following pandas DataFrame:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, np.inf, 0, 3, 4],
                   'y': [np.nan, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

#view DataFrame
print(df)

    x1   x2     y
0    1  1.0   NaN
1    2  3.0  78.0
2    2  3.0  85.0
3    4  5.0  88.0
4    2  2.0  72.0
5    1  2.0  69.0
6    5  1.0  94.0
7    4  inf  94.0
8    2  0.0  88.0
9    4  3.0  92.0
10   4  4.0  90.0

Now suppose we attempt to fit a multiple linear regression model using functions from scikit-learn:

from sklearn.linear_model import LinearRegression

#initiate linear regression model
model = LinearRegression()

#define predictor and response variables
X, y = df[['x1', 'x2']], df.y

#fit regression model
model.fit(X, y)

#print model intercept and coefficients
print(model.intercept_, model.coef_)

ValueError: Input contains infinity or a value too large for dtype('float64').

We receive an error since the DataFrame we’re using has both infinite and NaN values.
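Before removing anything, it can help to see where the problem values are. The following sketch rebuilds the same DataFrame and counts NaN and infinite values per column; the names nan_counts, inf_counts and bad_rows are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, np.inf, 0, 3, 4],
                   'y': [np.nan, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

# Work on a plain float array so the NumPy checks apply to every column
values = df.to_numpy(dtype=float)

# Count NaN and infinite values in each column
nan_counts = pd.Series(np.isnan(values).sum(axis=0), index=df.columns)
inf_counts = pd.Series(np.isinf(values).sum(axis=0), index=df.columns)

# Row labels that contain at least one non-finite value
bad_rows = df.index[~np.isfinite(values).all(axis=1)].tolist()

print(nan_counts)
print(inf_counts)
print(bad_rows)
```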

How to Fix the Error

The way to resolve this error is to first remove any rows from the DataFrame that contain infinite or NaN values:

#remove rows with any values that are not finite
df_new = df[np.isfinite(df).all(axis=1)]

#view updated DataFrame
print(df_new)

    x1   x2     y
1    2  3.0  78.0
2    2  3.0  85.0
3    4  5.0  88.0
4    2  2.0  72.0
5    1  2.0  69.0
6    5  1.0  94.0
8    2  0.0  88.0
9    4  3.0  92.0
10   4  4.0  90.0

The two rows that had infinite or NaN values have been removed.

We can now proceed to fit our linear regression model:

from sklearn.linear_model import LinearRegression

#initiate linear regression model
model = LinearRegression()

#define predictor and response variables
X, y = df_new[['x1', 'x2']], df_new.y

#fit regression model
model.fit(X, y)

#print model intercept and coefficients
print(model.intercept_, model.coef_)

69.85144124168515 [ 5.72727273 -0.93791574]

Notice that we don’t receive any error this time because we first removed the rows with infinite or NaN values from the DataFrame.


Source

In this article I will provide code examples in Python to resolve ValueError: input contains nan, infinity or a value too large for dtype(‘float64’). As the message indicates, the error occurs when the data contains NaN or infinity. Such values can’t be processed because they are not finite numbers that an algorithm can compute with.

Error Code – Let’s first replicate the error –

import numpy as np
matrix = np.random.rand(5,5)
matrix[0,:] = np.inf
matrix[2,:] = -np.inf

print(matrix)

# Output:
array([[       inf,        inf,        inf,        inf,        inf],
       [0.87362809, 0.28321499, 0.7427659 , 0.37570528, 0.35783064],
       [      -inf,       -inf,       -inf,       -inf,       -inf],
       [0.72877665, 0.06580068, 0.95222639, 0.00833664, 0.68779902],
       [0.90272002, 0.37357483, 0.92952479, 0.072105  , 0.20837798]])

This matrix contains infinite values. If you pass it to an operation that validates its input, such as fitting a sklearn estimator, you will get this error –

valueerror: input contains nan, infinity or a value too large for dtype('float64')

Solutions

The obvious solution is to check for NaN and infinity in your matrix and replace those values with something meaningful and workable.

Method 1 – Check NaN & infinity using np.any() & np.all()

np.any(np.isnan(matrix))
np.all(np.isfinite(matrix))

Method 2 – For dataframes, use this function for cleaning –

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

Method 3 – Reset index of dataframe –

df = df.reset_index()

But this method will add the old index to the dataframe as a new column; pass drop=True to reset_index() to discard it instead.

Method 4 – Replace NaN & infinite with some value –

df.replace([np.inf, -np.inf], np.nan, inplace=True)

The above code will replace all infinite values with NaN. Next, we will replace NaN with some number –

df.fillna(999, inplace=True)

Method 5 – Using numpy nan_to_num() function –

df = np.nan_to_num(df)

Method 6 – For X_train –

X_train = X_train.replace((np.inf, -np.inf, np.nan), 0).reset_index(drop=True)

Method 7 – Detect all NaN and infinite in your data –

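# p is assumed to be a 2-D NumPy array; only its first column is scanned
for index, value in enumerate(p[:, 0]):
    if not np.isfinite(value):
        print(index, value)

This will print the position and value of every entry in the first column that is not finite, including NaN and infinite values.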
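To scan the whole array rather than one column, a vectorized sketch with np.argwhere() is shown below. The array p here is a made-up stand-in for your own data.

```python
import numpy as np

# Hypothetical 2-D array containing NaN and both infinities
p = np.array([[1.0, np.nan],
              [np.inf, 4.0],
              [5.0, -np.inf]])

# One (row, column) pair for every entry that is not finite
bad_positions = np.argwhere(~np.isfinite(p))

for row, col in bad_positions:
    print(row, col, p[row, col])
```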

Method 8 – Dropping all NaN & infinite –

df = df.replace([np.inf, -np.inf], np.nan)
df = df.dropna()
df = df.reset_index()

Method 9 – Replace NaN & infinite with max float64 –

inputArray[inputArray == np.inf] = np.finfo(np.float64).max


Source

I am trying to fit my data into my model which takes numpy as input, so I feed the model with the dataframe values

stacked_averaged_models.fit(train.values, y_train1)

I am getting the following error

ValueError                                Traceback (most recent call last)
<ipython-input-145-9ba69af8df05> in <module>()
      1 X_traintrain = train.as_matrix().astype(np.float)
      2 from sklearn.metrics import r2_score
----> 3 stacked_averaged_models.fit(train.values, y_train1)
      4 stacked_train_pred = stacked_averaged_models.predict(train.values)
      5 stacked_pred = np.expm1(stacked_averaged_models.predict(test.values))

<ipython-input-140-dfca4af6e9d1> in fit(self, X, y)
     18                 instance = clone(model)
     19                 self.base_models_[i].append(instance)
---> 20                 instance.fit(X[train_index], y[train_index])
     21                 y_pred = instance.predict(X[holdout_index])
     22                 out_of_fold_predictions[holdout_index, i] = y_pred

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
    248         Xt, fit_params = self._fit(X, y, **fit_params)
    249         if self._final_estimator is not None:
--> 250             self._final_estimator.fit(Xt, y, **fit_params)
    251         return self
    252 

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\linear_model\coordinate_descent.py in fit(self, X, y, check_input)
    705                              order='F', dtype=[np.float64, np.float32],
    706                              copy=self.copy_X and self.fit_intercept,
--> 707                              multi_output=True, y_numeric=True)
    708             y = check_array(y, order='F', copy=False, dtype=X.dtype.type,
    709                             ensure_2d=False)

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    574     if multi_output:
    575         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
--> 576                         dtype=None)
    577     else:
    578         y = column_or_1d(y, warn=True)

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    451                              % (array.ndim, estimator_name))
    452         if force_all_finite:
--> 453             _assert_all_finite(array)
    454 
    455     shape_repr = _shape_repr(array.shape)

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X)
     42             and not np.isfinite(X).all()):
     43         raise ValueError("Input contains NaN, infinity"
---> 44                          " or a value too large for %r." % X.dtype)
     45 
     46 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I did a check on NaN and infinity, it did pass the test

X_traintrain = train.as_matrix().astype(np.float)
print(np.any(np.isnan(X_traintrain)))
print(np.all(np.isfinite(X_traintrain)))

Output:

False
True

How else can I solve, or at least, debug this?

X1      X2       X3     X4       X5    X6   X7     X8   Y1      Y2
0.64    784.00  343.00  220.50  3.50    5   0.00    0   10.56   16.67
0.62    808.50  367.50  220.50  3.50    2   0.00    0   8.60    12.07
0.62    808.50  367.50  220.50  3.50    5   0.00    0   8.50    12.04
0.98    514.50  294.00  110.25  7.00    2   0.10    1   24.58   26.47

These are a few rows of my dataset
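One thing worth checking: the last frames of the traceback above are inside check_X_y at the check_array(y, ...) call, while the test shown only inspects the feature matrix. A minimal debugging sketch that reports non-finite values in both inputs follows. The arrays X and y are hypothetical stand-ins for train.values and y_train1, and report_non_finite is an illustrative helper, not part of scikit-learn.

```python
import numpy as np

# Hypothetical stand-ins for train.values and y_train1 from the question
X = np.array([[0.64, 784.0], [0.62, 808.5], [0.98, 514.5]])
y = np.array([10.56, np.nan, 24.58])


def report_non_finite(name, array):
    """Print and return the number of NaN/inf entries in array."""
    array = np.asarray(array, dtype=np.float64)
    bad = ~np.isfinite(array)
    positions = np.argwhere(bad).tolist()
    print(f"{name}: {int(bad.sum())} non-finite value(s) at {positions}")
    return int(bad.sum())


x_bad = report_non_finite("X", X)
y_bad = report_non_finite("y", y)
```

If the target turns out to contain NaN, for example after a transformation or an index misalignment, cleaning the features alone will not remove the error.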

Source

The “ValueError: input contains nan, infinity, or a value too large for dtype(‘float64’)” error can be resolved by locating the NaN and infinite values with a boolean mask and replacing or removing them, or in some cases by scaling the data before modeling. This article describes the causes of this error and the solutions suggested for each.

Keep reading to learn what the answer is to fix this error.

Why Are You Getting ValueError: Input Contains NaN, Infinity, or a Value Too Large for Dtype(‘float64’)?

You may get this error because of older library versions, large errors, predictions of NaN or Inf, iterated auto_arima, affinity propagation, mistakes in the data, or NaN and infinity in the input data.

– Older Versions of Libraries

If you are working with a statistical library in Python and get this error, there is a good chance that the library is out of date. Older versions of libraries such as Pandas and statsmodels can also cause this error.

– Large Errors

Large errors can also be the cause. If the errors are very large, Auto-ARIMA cannot find a reasonable solution; in that case, you might face the ValueError: input contains nan, infinity, or a value too large for dtype(‘float64’) error.

– Prediction of Nan or Inf

Sometimes the model itself predicts NaN or infinite values, which also causes this error, especially when the prediction is made in R or with auto_arima.

– Iterated Auto-arima

Your auto-arima may be run repeatedly after grouping by different IDs. Suppose each of your IDs has between 25 and 28 dates and the prediction is for a single day; in that case, you can get this type of error. An ID can fail because some of its dates are non-consecutive. This can happen even though auto-arima runs on numerical arrays and has no reference to dates.

– Affinity Propagation

Sometimes this error appears while using the affinity propagation algorithm in Sklearn. The problem usually lies in the input matrix.

– Data Mistake

This error might appear because of the dataset. A mistake in the data can cause the error. The confusing part is that it can look like a Sklearn problem even though the mistake is in the data.

– Having Nan and Infinity in the Input Data

One of the leading causes of this error is having NaN and infinity in the input data. If the data contains NaN and inf instead of zeros and finite numbers, ValueError: input contains nan, infinity, or a value too large for dtype(‘float64’) can appear. To locate them, you need a boolean mask and a tuple with the i, j coordinates of the offending entries.

We covered several causes of this error. Let’s see the solution to each cause.

– Older Versions of Libraries

If you are using the older versions of libraries, the simple solution to this cause would be to update all the libraries you are using.

– Large Errors

If the forecast errors are large, you might face this error. In that case, consider not using Auto-ARIMA if the series is difficult to forecast.

– Prediction of Nan or Inf in R or Auto_arima

If you think the problem is the prediction of a NaN or Inf, you can try scaling the data before modeling. Scaling should not be strictly required, however, so test the model unscaled first and then look at what effect scaling has.

If the problem is still not solved, try wrapping the call inside your function in a try/except block: when models are fitted in sequence, individual fits can fail during cross-validation, and a try/except block lets the rest of the pipeline continue.

– Iterated Auto_arima

In the case of Iterated Auto-arima, if you are facing failure because of the ID dates, make sure if any non-consecutive dates are causing the problem; if you find such a mistake, remove those non-consecutive dates, and you will no longer see that error.

– Affinity Propagation

If you are using the affinity propagation algorithm in Sklearn and have this issue, the first thing to do is remove the infinite values from the matrix. You can use the following code:

mat[~np.isfinite(mat)] = 0

This sets every non-finite entry (NaN or infinity) in the matrix to zero, which gets rid of the problem.

– Data Mistake

If you suspect a mistake in the dataset, check your data carefully. After conversion to float64, every value in both the inputs and the target should be finite and not NaN.

Make sure that you do not have any missing values in your dataset. To avoid this problem, you can use the imputer class as well. You should always keep a check on the dataset. It is extremely important to process the correct data, one mistake in the dataset and you will not get the expected results.

You can also clean the dataset of NaN, Inf, and missing cells by using the following function:

import pandas as pd
import numpy as np
def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

– Another Way To Impute the Missing Values

You can also use the following code to get rid of infinite values. It replaces them with NaN so that they can then be dropped or imputed like any other missing value.

df.replace([np.inf, -np.inf], np.nan, inplace=True)

If the problem is still not solved, it may arise in the computation between input and output rather than in the raw data. This depends on the function you call; often the input matrix does not meet the function’s requirements. Make sure you read the documentation for the function you are using.

– Having Nan and Infinity in the Input Data

To remove the NaN and infinity in the input data, first get a boolean mask that is True at the positions containing NaNs, using np.isnan(X). To get a tuple with the i, j coordinates of the NaNs, use np.where(np.isnan(X)). In the final step, replace NaN with zero and infinity with large finite numbers using np.nan_to_num(X).
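The three steps above can be sketched as follows on a small, made-up matrix X.

```python
import numpy as np

X = np.array([[1.0, np.nan],
              [np.inf, 4.0]])

# Step 1: boolean mask, True where X is NaN
nan_mask = np.isnan(X)

# Step 2: row and column coordinates of the NaN entries
rows, cols = np.where(nan_mask)

# Step 3: NaN -> 0.0 and +/-inf -> largest/smallest finite float64
X_clean = np.nan_to_num(X)

print(nan_mask)
print(rows, cols)
print(X_clean)
```

Note that np.isnan() does not flag infinity; use np.isinf() or ~np.isfinite() if you also need the positions of the infinite values.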

As an alternative, you can also use the sklearn.impute.SimpleImputer, that will mean or median imputations of missing values. You can also use the pandas’ pd.DataFrame(X).fillna() will allow you to fill something other than zeros.

You can also check for NaN with x != x, because NaN is the only floating-point value that is not equal to itself. This is useful when np.isnan(X) fails, for example on object-dtype arrays.

Replacing NaN values with zeros is not always the best choice. The zeros are introduced arbitrarily and can skew your variables, and zero is sometimes not even an acceptable value for a variable that has no true zero.

To replace missing values with a rolling average, use .rolling() to compute the mean over a window. For a more robust approach, you can also use the missingpy module.
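
A rolling-average fill can be sketched like this. The series values and the window size of 3 are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two missing readings.
s = pd.Series([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])

# Mean over a trailing window of 3 values; min_periods=1 lets the window
# produce a result as long as it holds at least one non-NaN value.
rolling_mean = s.rolling(window=3, min_periods=1).mean()

# Only the missing positions are taken from the rolling mean.
filled = s.fillna(rolling_mean)

print(filled)
```

The existing values stay untouched; only the gaps receive the local average, which tends to distort the variable less than a constant zero.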

Another possible cause is blank rows that slip in while you build the X vectors. Removing them at that stage keeps NaNs out of the data frame in the first place.

FAQ Section

– What Are Nan Values?

NaN stands for Not a Number and represents a missing value in the data. NaN is a special floating-point value and cannot be converted to any type other than float. You need to remove or replace NaN values to get the expected results.

– How To Replace the Nan Values With Zeros in Pandas Dataframe?

You can replace NaN values with zeros in several ways. The fillna() function fills NA/NaN values. The replace() function replaces values given as a string, list, regex, dictionary, and so on.

Ways To Replace Nan Values:

For one column with fillna(): df['DataFrame Column'] = df['DataFrame Column'].fillna(0)
For one column with replace(): df['DataFrame Column'] = df['DataFrame Column'].replace(np.nan, 0)
For the whole DataFrame with fillna(): df.fillna(0)
For the whole DataFrame with replace(): df.replace(np.nan, 0)
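
The four variants can be compared side by side. This is a minimal sketch; the frame and its column names price and qty are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; "price" and "qty" are made-up column names.
df = pd.DataFrame({"price": [9.5, np.nan, 7.0],
                   "qty": [np.nan, 2.0, 3.0]})

# One column, two equivalent spellings.
price_a = df["price"].fillna(0)
price_b = df["price"].replace(np.nan, 0)

# Whole DataFrame, two equivalent spellings.
df_a = df.fillna(0)
df_b = df.replace(np.nan, 0)

print(df_a)
```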

– What Is Dtype(‘float64’)?

float64 is NumPy's 64-bit floating-point type, which pandas uses for numeric columns with decimals. When a column contains numbers and missing values, pandas stores it as float64 by default because NaN is itself a floating-point value, which is why the error message reports dtype('float64').
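
A short sketch of how a missing value changes the column type, using made-up series:

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])            # no missing values: integer dtype
with_nan = pd.Series([1, 2, np.nan])   # NaN forces a floating-point dtype

print(ints.dtype)
print(with_nan.dtype)

# Largest finite value a float64 can hold.
print(np.finfo(np.float64).max)
```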

– What if Your Dataset Contains No Nan or Infinite Value, and Still Throws the Dtype(‘float64’) Error?

If your checks report no NaN or infinite values and the error still appears, ask whether you are checking in the right way. A common beginner mistake is to reduce first and test second, for example np.isnan(mat.any()) or np.isfinite(mat.all()), which only tests a single boolean. Test element-wise first and reduce afterwards, with np.any(np.isnan(mat)) and np.all(np.isfinite(mat)); checked this way, the NaN or infinite values usually show up.
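
The difference between the two orders of operations can be seen on a small matrix. The array mat below is a made-up example.

```python
import numpy as np

mat = np.array([[1.0, np.nan],
                [np.inf, 4.0]])

# Misleading: mat.any() and mat.all() collapse the matrix to a single
# boolean first, so isnan()/isfinite() only inspect True or False.
wrong_nan = np.isnan(mat.any())
wrong_finite = np.isfinite(mat.all())

# Element-wise test first, then reduce.
has_nan = np.any(np.isnan(mat))
all_finite = np.all(np.isfinite(mat))

print(wrong_nan, wrong_finite)
print(has_nan, all_finite)
```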

Conclusion

Let's recap what we covered:

There can be multiple causes of this error; identify the cause in your case, then apply the matching solution.
The ValueError: input contains nan, infinity, or a value too large for dtype('float64') can be caused by NaN or infinity values in the dataset, older versions of libraries, large errors, prediction of NaN or Inf, iterated auto_arima, affinity propagation, or a mistake in the data.
Once you have identified the cause in your case, use the corresponding solution described above.

You now know the common causes of this error and a solution for each. It should no longer be a hurdle when you work with your favorite framework. Use this article as a guide to find the right solution.

Author
Position Is Everything: Your Go-To Resource for Learn & Build: CSS,JavaScript,HTML,PHP,C++ and MYSQL.

Source

When applying MinMaxScaler, and also when predicting with a random forest, I get this error:

C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_methods.py:32: RuntimeWarning: invalid value encountered in reduce
  return umr_sum(a, axis, dtype, out, keepdims)
ValueError                                Traceback (most recent call last)
in ()
      2
      3 scaler = MinMaxScaler()
----> 4 x_train = scaler.fit_transform(x_train)
      5 x_validation = scaler.fit_transform(x_validation)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    515 if y is None:
    516 # fit method of arity 1 (unsupervised transformation)
--> 517 return self.fit(X, **fit_params).transform(X)
    518 else:
    519 # fit method of arity 2 (supervised transformation)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in fit(self, X, y)
    306 # Reset internal state before fitting
    307 self._reset()
--> 308 return self.partial_fit(X, y)
    309
    310 def partial_fit(self, X, y=None):

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in partial_fit(self, X, y)
    332
    333 X = check_array(X, copy=self.copy, warn_on_dtype=True,
--> 334 estimator=self, dtype=FLOAT_DTYPES)
    335
    336 data_min = np.min(X, axis=0)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    451 % (array.ndim, estimator_name))
    452 if force_all_finite:
--> 453 _assert_all_finite(array)
    454
    455 shape_repr = _shape_repr(array.shape)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X)
     42 and not np.isfinite(X).all()):
     43 raise ValueError("Input contains NaN, infinity"
---> 44 " or a value too large for %r." % X.dtype)
     45
     46

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I don't know what to do about this. There shouldn't be any NaN, since I used fillna(0). There shouldn't be any numbers that don't fit into float64 either. I realize there isn't much information here, but maybe someone can point me to where I should at least start looking for the problem.
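
One thing worth checking in a case like this: fillna(0) replaces NaN only and leaves inf and -inf in place, which the validator still rejects. A minimal sketch on made-up data (the frame and the feature names f1 and f2 are assumptions), counting non-finite cells per column:

```python
import numpy as np
import pandas as pd

# Hypothetical training frame; "f1" and "f2" are made-up feature names.
x_train = pd.DataFrame({"f1": [1.0, np.nan, 3.0],
                        "f2": [np.inf, 2.0, -np.inf]})

x_train = x_train.fillna(0)   # removes NaN, leaves the infinities

# Count the cells in each column that are not finite.
bad_per_column = (~np.isfinite(x_train)).sum()

print(bad_per_column)
```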

Source

I'm using sklearn and having a problem with affinity propagation. I have built an input matrix and I keep getting the following error.

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I ran

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

I tried using

mat[np.isfinite(mat) == True] = 0

to remove the infinite values, but this did not work either. What can I do to get rid of the infinite values in my matrix so that I can use the affinity propagation algorithm?

I am using Anaconda and Python 2.7.9.

09 July 2015, 18:48

Source

13 Answers

This might happen inside scikit, and it depends on what you're doing. I recommend reading the documentation for the functions you're using. You might be using one that depends, for example, on your matrix being positive definite, and your matrix does not fulfill that criterion.

EDIT: How could I miss that:

np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True

is obviously wrong. The right way is:

np.any(np.isnan(mat))

np.all(np.isfinite(mat))

You want to check whether any of the elements is NaN, not whether the return value of the any function is a number...

Marcus Müller
09 July 2015, 17:13

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

df = df.reset_index()

I ran into this issue many times when I had removed some entries from my df, such as

df = df[df.label=='desired_one']

Jun Wang
24 Dec. 2017, 04:06

My input array's dimensions were skewed, because my input csv had empty spaces.

Ethan Waldie
14 July 2015, 21:34

This is the check on which it fails:

https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51

Which says

def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)

So make sure that there are no NaN values in your input, and that all the values are actually float values. None of the values should be Inf either.

tuxdna
13 Apr. 2016, 15:33

This is my function (based on this) to clean the dataset of nan, Inf, and missing cells (for skewed datasets):

import numpy as np
import pandas as pd

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

Boern
05 Oct. 2017, 09:23

I had the error after trying to select a subset of rows:

df = df.reindex(index=my_index)

It turns out that my_index contained values that were not in df.index, so the reindex function inserted some new rows and filled them with nan.

Elias Strehle
15 Feb. 2018, 16:44
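
The reindex behaviour described in this answer can be reproduced in a few lines. This is a sketch on made-up data; the label 5 is chosen as a value that is missing from df.index, and the intersection call is one possible way to avoid the extra rows, not the answer author's method.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[0, 1, 2])

# Hypothetical index: label 5 does not exist in df.index.
my_index = [0, 2, 5]

# reindex inserts a row for the unknown label and fills it with NaN.
subset = df.reindex(index=my_index)
print(subset)

# Keeping only labels that exist avoids the inserted NaN row.
safe = df.loc[df.index.intersection(my_index)]
print(safe)
```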

With this version of Python 3:

/opt/anaconda3/bin/python --version
Python 3.6.0 :: Anaconda 4.3.0 (64-bit)

Looking at the details of the error, I found the lines of code causing the failure:

/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)
     59 
     60 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

From this, I was able to extract the correct way to test what was going on with my data, using the same test that fails according to the error message: np.isfinite(X)

Then, with a quick and dirty loop, I was able to find that my data does indeed contain nans:

print(p[:,0].shape)
index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1

(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...

Now all I have to do is remove the values at these indexes.

Raphvanns
10 Aug. 2017, 23:09
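
The loop in this answer can also be written without explicit iteration. The sketch below uses a small made-up array in place of the answer's p and shows one way to drop the affected rows; it is an illustration, not the answer author's code.

```python
import numpy as np

# Hypothetical stand-in for the array p from the answer above.
p = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0],
              [np.inf, 40.0]])

# Positions in the first column that are NaN or infinite.
bad_idx = np.flatnonzero(~np.isfinite(p[:, 0]))

# Keep only the rows whose entries are all finite.
p_clean = p[np.isfinite(p).all(axis=1)]

print(bad_idx)
print(p_clean)
```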

I had the same error, and in my case X and y were dataframes, so I had to convert them to matrices first:

X = X.as_matrix().astype(np.float)
y = y.as_matrix().astype(np.float)

tekumara
02 July 2017, 11:18
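
Note for current library versions: DataFrame.as_matrix() was removed in pandas 1.0 and the np.float alias was removed in NumPy 1.24, so the snippet in this answer fails on recent installs. A sketch of an equivalent conversion, on made-up inputs standing in for X and y:

```python
import pandas as pd

# Hypothetical inputs standing in for X and y from the answer above.
X = pd.DataFrame({"a": [1, 2, 3], "b": [4.5, 5.5, 6.5]})
y = pd.Series([0, 1, 0])

# to_numpy(dtype=float) replaces as_matrix().astype(np.float).
X_arr = X.to_numpy(dtype=float)
y_arr = y.to_numpy(dtype=float)

print(X_arr.dtype, X_arr.shape)
```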

I got the same error. It worked with df.fillna(-99999, inplace=True) before doing any replacement, substitution, etc.

Cohen
08 June 2018, 13:15

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Why am I getting this error?

My code:

import pandas as pd
from sklearn.metrics import r2_score
import statsmodels.api as sm

veriler = pd.read_csv('dataset.csv')

x = veriler.iloc[:,1:7]
y = veriler.iloc[:,7:]
X = x.values
Y = y.values

print(veriler.corr())

#Decision Tree Regression
from sklearn.tree import DecisionTreeRegressor
r_dt = DecisionTreeRegressor(random_state=0)
r_dt.fit(X,Y)

print('dt ols')
model4 = sm.OLS(r_dt.predict(X),X)
print(model4.fit().summary())

print("Decision Tree R2 degeri:")
print(r2_score(Y, r_dt.predict(X)) )

atike taştan
16 Apr. 2019, 17:54

Try

mat.sum()

If the sum of your data is infinity (greater than the maximum float value, which is about 3.402823e+38 for float32 and about 1.797693e+308 for float64), you will get this error.

See the function _assert_all_finite in validation.py from the scikit source code:

if is_float and np.isfinite(X.sum()):
    pass
elif is_float:
    msg_err = "Input contains {} or a value too large for {!r}."
    if (allow_nan and np.isinf(X).any() or
            not allow_nan and not np.isfinite(X).all()):
        type_err = 'infinity' if allow_nan else 'NaN, infinity'
        # print(X.sum())
        raise ValueError(msg_err.format(type_err, X.dtype))

Rick Hill
14 March 2019, 09:48
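
As far as the excerpt shows, a non-finite sum does not raise by itself; it only sends the data on to the element-wise check. A small sketch of that distinction with made-up values:

```python
import numpy as np

# Made-up values: each one is finite, but their sum overflows float64.
X = np.array([1e308, 1e308])

with np.errstate(over="ignore"):      # silence the overflow warning
    total = X.sum()

sum_is_finite = np.isfinite(total)           # the fast path fails here
all_elements_finite = np.isfinite(X).all()   # the element-wise check passes

# An actual inf in the data fails the element-wise check as well.
X_bad = np.array([1.0, np.inf])
bad_elements_finite = np.isfinite(X_bad).all()

print(sum_is_finite, all_elements_finite, bad_elements_finite)
```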

If you can't find the problem in X, check y.

kztd
31 Dec. 2018, 21:18

In my case, the problem was that many scikit functions return numpy arrays, which have no pandas index. So there was an index mismatch when I used those arrays to build new DataFrames and then tried to mix them with the original data.

luca
25 June 2018, 11:03
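
The index mismatch described in the last answer, and the reset_index advice given earlier in the thread, can be illustrated together. This is a sketch on made-up data; the array scaled merely stands in for the output of some scikit-learn transform.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"label": ["a", "b", "a", "a"],
                   "x": [1.0, 2.0, 3.0, 4.0]})

# Filtering keeps the original labels 0, 2, 3.
kept = df[df.label == "a"]

# A scikit-learn style result: a bare NumPy array with no pandas index.
scaled = np.array([0.1, 0.3, 0.4])

# A new Series gets a fresh 0..2 index, so assignment aligns on labels
# and leaves a NaN where label 3 has no partner.
misaligned = kept.assign(x_scaled=pd.Series(scaled))

# Reusing the existing index (or calling reset_index first) lines rows up.
aligned = kept.assign(x_scaled=pd.Series(scaled, index=kept.index))

print(misaligned)
print(aligned)
```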

Source
