Aug 17, 2022
One error you may encounter in Python is:
numpy.linalg.LinAlgError: Singular matrix
This error occurs when you attempt to invert a singular matrix, which by definition is a matrix that has a determinant of zero and cannot be inverted.
This tutorial explains how to resolve this error in practice.
How to Reproduce the Error
Suppose we create the following matrix using NumPy:
import numpy as np
#create 2x2 matrix
my_matrix = np.array([[1., 1.], [1., 1.]])
#display matrix
print(my_matrix)
[[1. 1.]
[1. 1.]]
Now suppose we attempt to use the inv() function from NumPy's linalg module to calculate the inverse of the matrix:
from numpy.linalg import inv
#attempt to invert matrix
inv(my_matrix)
numpy.linalg.LinAlgError: Singular matrix
We receive an error because the matrix that we created does not have an inverse matrix.
Note: Check out this page from Wolfram MathWorld that shows 10 different examples of matrices that have no inverse matrix.
By definition, a matrix is singular and cannot be inverted if it has a determinant of zero.
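For a 2 × 2 matrix, the link between the determinant and the inverse can be written out explicitly. The closed-form inverse divides by the determinant, so it is undefined exactly when the determinant is zero. Plugging in the two matrices used in this tutorial:

```latex
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
\det(A) = ad - bc, \qquad
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}

\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}: \quad
\det = 1 \cdot 1 - 1 \cdot 1 = 0 \quad \Rightarrow \quad \text{no inverse exists}

\begin{pmatrix} 1 & 7 \\ 4 & 2 \end{pmatrix}: \quad
\det = 1 \cdot 2 - 7 \cdot 4 = -26, \qquad
A^{-1} = -\frac{1}{26} \begin{pmatrix} 2 & -7 \\ -4 & 1 \end{pmatrix}
       = \begin{pmatrix} -\tfrac{1}{13} & \tfrac{7}{26} \\ \tfrac{2}{13} & -\tfrac{1}{26} \end{pmatrix}
```

These values agree with the det() and inv() output shown later in this tutorial, up to floating-point rounding.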
You can use the det() function from NumPy's linalg module to calculate the determinant of a given matrix before you attempt to invert it:
from numpy.linalg import det
#calculate determinant of matrix
det(my_matrix)
0.0
The determinant of our matrix is zero, which explains why we run into an error.
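The determinant check and the failed inversion can be combined into a single guard. The following is a minimal sketch, not a NumPy API: the name safe_inverse and its tol threshold of 1e-12 are hypothetical choices. It returns None instead of raising when the matrix is singular or numerically close to it.

```python
import numpy as np
from numpy.linalg import LinAlgError, det, inv


def safe_inverse(matrix, tol=1e-12):
    """Return the inverse of matrix, or None if it is singular (hypothetical helper)."""
    # Pre-check: a determinant at or near zero means there is no usable inverse
    if abs(det(matrix)) < tol:
        return None
    try:
        return inv(matrix)
    except LinAlgError:
        # inv() can still raise if LAPACK meets an exactly zero pivot
        return None


singular = np.array([[1., 1.], [1., 1.]])
regular = np.array([[1., 7.], [4., 2.]])

print(safe_inverse(singular))  # None
print(safe_inverse(regular))   # the inverse computed by inv()
```

Because the determinant scales with the size of the matrix entries, a fixed absolute tolerance is only a rough test; np.linalg.matrix_rank() or np.linalg.cond() is usually a more reliable way to detect a (nearly) singular matrix.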
How to Fix the Error
The most direct way to get around this error is to make sure the matrix you are inverting is not singular.
For example, suppose we use the inv() function to invert the following matrix:
import numpy as np
from numpy.linalg import inv, det
#create 2x2 matrix that is not singular
my_matrix = np.array([[1., 7.], [4., 2.]])
#display matrix
print(my_matrix)
[[1. 7.]
[4. 2.]]
#calculate determinant of matrix
print(det(my_matrix))
-25.9999999993
#calculate inverse of matrix
print(inv(my_matrix))
[[-0.07692308 0.26923077]
[ 0.15384615 -0.03846154]]
We don’t receive any error when inverting the matrix because the matrix is not singular.
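One way to double-check the result is to multiply the matrix by its inverse, which should give the 2 × 2 identity matrix up to floating-point rounding. A short sketch using np.allclose():

```python
import numpy as np
from numpy.linalg import inv

my_matrix = np.array([[1., 7.], [4., 2.]])
my_inverse = inv(my_matrix)

# A matrix times its inverse should equal the identity matrix (up to rounding)
product = my_matrix @ my_inverse
print(product)
print(np.allclose(product, np.eye(2)))  # True
```

An exact comparison such as product == np.eye(2) can fail because of rounding, which is why a tolerance-based comparison is used here.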
Hi, this is a (simplified) case I encountered while working with seaborn.pairplot.
data: data.1000.txt
singular.py:
import pandas
import seaborn
data = pandas.read_table("data.1000.txt", index_col="cell")
seaborn.pairplot(data.drop(columns="cluster").iloc[:,0:6], hue="batch")
commands:
~/w/experiments $ python3 --version
Python 3.6.6
~/w/experiments $ pip3 show seaborn
Name: seaborn
Version: 0.9.0
Summary: seaborn: statistical data visualization
Home-page: https://seaborn.pydata.org
Author: Michael Waskom
Author-email: mwaskom@nyu.edu
License: BSD (3-clause)
Location: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
Requires: numpy, scipy, pandas, matplotlib
Required-by:
~/w/experiments $ python3 singular.py
Traceback (most recent call last):
File "singular.py", line 4, in <module>
seaborn.pairplot(data.drop(columns="cluster").iloc[:,0:6], hue="batch")
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/seaborn/axisgrid.py", line 2111, in pairplot
grid.map_diag(kdeplot, **diag_kws)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/seaborn/axisgrid.py", line 1399, in map_diag
func(data_k, label=label_k, color=color, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/seaborn/distributions.py", line 691, in kdeplot
cumulative=cumulative, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/seaborn/distributions.py", line 294, in _univariate_kdeplot
x, y = _scipy_univariate_kde(data, bw, gridsize, cut, clip)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/seaborn/distributions.py", line 366, in _scipy_univariate_kde
kde = stats.gaussian_kde(data, bw_method=bw)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/stats/kde.py", line 172, in __init__
self.set_bandwidth(bw_method=bw_method)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/stats/kde.py", line 499, in set_bandwidth
self._compute_covariance()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/stats/kde.py", line 510, in _compute_covariance
self._data_inv_cov = linalg.inv(self._data_covariance)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scipy/linalg/basic.py", line 975, in inv
raise LinAlgError("singular matrix")
numpy.linalg.linalg.LinAlgError: singular matrix
I’ve never used the Python statsmodels package, and I’m not familiar with it, but based on the error messages, I think I have a pretty good guess as to what is probably going on, and it’s not a bug: the problem is with your input. According to Wikipedia, a key step in kernel density estimation is bandwidth estimation. As described in the link, for Gaussian basis functions (which, based on your error message referencing gaussian_kde(), appears to be what you are in fact using), one common choice for estimating the bandwidth, in the special case of one-dimensional KDE, requires the sample standard deviation $\hat{\sigma}$ as an input.
You mention that the error arises, specifically, in cases where your input is a list of repeated instances of the same number. Imagine calculating the sample standard deviation for a list of identical numbers: what do you get? Essentially, what you are simulating in that case is a Dirac delta function, so effectively your sample standard deviation is $\hat{\sigma} = 0$. Based on the fact that the next frame down in the stack trace, below the call to gaussian_kde(), is inside a method called set_bandwidth(), I would say that you are feeding the code a distribution whose standard deviation is zero, the code is attempting to use this value to calculate an initial guess for the KDE bandwidth parameter, and it’s choking because zero isn’t really a valid value.
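To make the zero-bandwidth point concrete, here is a minimal sketch of Silverman's rule of thumb for a one-dimensional Gaussian KDE, h = (4σ̂^5 / (3n))^(1/5), one of the bandwidth choices described on Wikipedia. Note that SciPy's gaussian_kde defaults to Scott's rule rather than this formula, so this only illustrates the idea, not SciPy's code; silverman_bandwidth is a hypothetical helper name.

```python
import numpy as np


def silverman_bandwidth(sample):
    """Silverman's rule-of-thumb bandwidth for a 1-D Gaussian KDE (hypothetical helper)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    sigma_hat = np.std(sample, ddof=1)  # sample standard deviation
    return (4.0 * sigma_hat**5 / (3.0 * n)) ** (1.0 / 5.0)


print(silverman_bandwidth([5.0, 5.0, 5.0, 5.0]))  # 0.0: identical values give zero bandwidth
print(silverman_bandwidth([1.0, 2.0, 4.0, 8.0]))  # a positive bandwidth
```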
"O.K.," you reply, "but your explanation doesn’t mention anything about linear algebra or singular matrices. Why does the error manifest itself within a linear algebra routine, specifically?" Good question. I’m not absolutely certain, but here’s what I suspect is happening. The concept of standard deviation, or its square, the variance, is an inherently one-dimensional concept. The more general concept, valid for multivariate distributions, is the covariance matrix. The code that you are using is likely designed to be as general as possible, in order to handle a case where the user feeds it a multivariate distribution. In fact, you’ll notice as you work your way further down the stack trace that the next method down, below set_bandwidth(), is one called _compute_covariance(). If you know much about covariance matrices, a popular way of analyzing and thinking about them is to reduce them to what are known as principal components. The effect of principal component analysis is to diagonalize the original covariance matrix, creating an effectively equivalent new matrix that has been transformed so that it consists exclusively of a set of one-dimensional variances lined up along the diagonal. These variances are the eigenvalues of the original, non-diagonal matrix, and in linear algebra the determinant of a matrix equals the product of its eigenvalues, so matrices with identical eigenvalues necessarily have identical determinants.
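The eigenvalue argument can be checked numerically. In this sketch the 2-D sample is made up for illustration: its second column repeats the same value, so the covariance matrix gets a zero eigenvalue, and the product of the eigenvalues matches the determinant of 0. np.linalg.eigvalsh() is used because covariance matrices are symmetric.

```python
import numpy as np

# Made-up 2-D sample: the second variable has the same value in every row
sample = np.array([
    [1.0, 3.0],
    [2.0, 3.0],
    [4.0, 3.0],
    [8.0, 3.0],
])

# Covariance matrix of the two columns (rowvar=False: each column is a variable)
cov = np.cov(sample, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)

print(cov)
print(eigenvalues)               # one eigenvalue is 0 (up to rounding)
print(np.prod(eigenvalues))      # the product of the eigenvalues ...
print(np.linalg.det(cov))        # ... equals the determinant, which is 0
```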
So, what I suspect is happening in your case is that, by giving the code repeated instances of the same values as input, you are creating a covariance matrix that has at least one eigenvalue equal to zero, and this means that the determinant is zero as well, since for a diagonalized matrix the determinant is simply the product of all the values along the diagonal. So, what do we call it when a matrix has a determinant of zero? According to the definition of an invertible matrix, "A square matrix that is not invertible is called singular or degenerate. A square matrix is singular if and only if its determinant is 0." And that’s why you are getting the error at the bottom of the stack trace: at some point, the code needs to invert the covariance matrix for whatever reason; the last SciPy line in the trace is self._data_inv_cov = linalg.inv(self._data_covariance). It can’t do that because the matrix is singular and therefore non-invertible.
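The failing step can be reproduced with NumPy alone. In this minimal sketch, data_covariance is a stand-in for self._data_covariance in the traceback (the exact way SciPy builds that matrix is not shown there and may differ between versions), and np.linalg.inv() stands in for SciPy's linalg.inv(). try_invert is a hypothetical helper.

```python
import numpy as np

# A 1-D sample made of repeated instances of the same number
sample = np.array([5.0, 5.0, 5.0, 5.0])

# Its variance, viewed as a 1 x 1 covariance matrix
data_covariance = np.atleast_2d(np.cov(sample))
print(data_covariance)                  # [[0.]]
print(np.linalg.det(data_covariance))   # 0.0


def try_invert(matrix):
    """Attempt an inversion; report and return None if the matrix is singular."""
    try:
        return np.linalg.inv(matrix)
    except np.linalg.LinAlgError as err:
        print("LinAlgError:", err)
        return None


print(try_invert(data_covariance))      # prints the error message, then None
```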
Bottom line, what’s effectively happening is that, by giving the code repeated instances of the same number to use as input, you are basically generating the linear algebra equivalent of a divide-by-zero error.
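In practice, one way to find the offending input before calling seaborn.pairplot() with hue="batch" is to look for columns that take a single value within some hue group, since map_diag() in the traceback calls kdeplot() once per hue level. The helper below is a hypothetical, NumPy-only sketch; how the numeric columns and the batch labels are pulled out of the DataFrame is an assumption, since the contents of data.1000.txt aren't shown.

```python
import numpy as np


def constant_columns_by_group(values, groups):
    """Return (group, column_index) pairs where a column has a single value within a group.

    Hypothetical helper: not part of NumPy, pandas or seaborn.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    problems = []
    # dict.fromkeys keeps each label once, in first-seen order
    for group in dict.fromkeys(groups.tolist()):
        rows = values[groups == group]
        # max - min is 0 exactly when every value in the column is identical
        # (so a group with a single row is always flagged)
        spans = rows.max(axis=0) - rows.min(axis=0)
        for column in np.flatnonzero(spans == 0):
            problems.append((group, int(column)))
    return problems


# Made-up example: column 1 is constant within batch "a"
values = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 6.0], [4.0, 7.0]])
batches = np.array(["a", "a", "b", "b"])
print(constant_columns_by_group(values, batches))  # [('a', 1)]
```

Any (group, column) pair reported here is a candidate for the zero-variance KDE described above; dropping or fixing those columns, or that group, before plotting would be the corresponding workaround.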