I am trying to install torch with CUDA support.
Here is the result of my collect_env.py
script:
PyTorch version: 1.7.1+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Python version: 3.9 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080
Nvidia driver version: 460.39
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.1+cu101
[pip3] torchaudio==0.7.2
[pip3] torchvision==0.8.2+cu101
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py39he8ac12f_0
[conda] mkl_fft 1.3.0 py39h54f3939_0
[conda] mkl_random 1.0.2 py39h63df603_0
[conda] numpy 1.19.2 py39h89c1606_0
[conda] numpy-base 1.19.2 py39h2ae0177_0
[conda] torch 1.7.1+cu101 pypi_0 pypi
[conda] torchaudio 0.7.2 pypi_0 pypi
[conda] torchvision 0.8.2+cu101 pypi_0 pypi
Process finished with exit code 0
Here is the output of nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Finally, here is the output of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39 Driver Version: 460.39 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
| 0% 52C P0 46W / 180W | 624MiB / 8116MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 873 G /usr/lib/xorg/Xorg 101MiB |
| 0 N/A N/A 1407 G /usr/lib/xorg/Xorg 419MiB |
| 0 N/A N/A 2029 G ...AAAAAAAAA= --shared-files 90MiB |
+-----------------------------------------------------------------------------+
However, when I try to run
print(torch.cuda.is_available())
I get the following error:
UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
I have rebooted, and I have followed the post-installation steps as detailed here.
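One more check worth running (a minimal sketch; it assumes a Linux box where the driver exposes /dev/nvidia* device nodes): verify that the device nodes exist and that the current user can access them, since missing or root-only nodes can produce exactly this "unknown error" even while nvidia-smi works.

```python
import glob
import os

def list_device_nodes(pattern='/dev/nvidia*'):
    """Return (path, permissions) pairs for files matching pattern."""
    nodes = []
    for path in sorted(glob.glob(pattern)):
        mode = os.stat(path).st_mode & 0o777
        nodes.append((path, oct(mode)))
    return nodes

if __name__ == '__main__':
    # On a healthy setup this should list /dev/nvidia0, /dev/nvidiactl,
    # /dev/nvidia-uvm, etc., typically with 0o666 permissions.
    for path, perms in list_device_nodes():
        print(path, perms)
```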
Preface
Today, after an upgrade, a project using PyTorch on the server suddenly started throwing an error. The full error message is cut off in the title, so I paste it below.
builtins.RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
(Screenshot of the error report omitted.)
After consulting some references, I found the following possible solutions.
Solutions:
Method 1: add environment variables
Since I run the project as a Docker container, I installed vim inside the container and appended the following line to ~/.bashrc:
export CUDA_VISIBLE_DEVICES=0
Since GPU 0 was the card selected when building the container, I set the value to 0 here as well.
After restarting the container, echo $CUDA_VISIBLE_DEVICES printed the expected value, but the problem was not solved and the error was still raised.
Method 2: add environment variables to the code
Add the following code before CUDA is initialized:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
This still did not solve the problem.
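A more robust variant of this method (a sketch; it makes no assumption about the GPU itself) is to set the variable in the environment of a child process before its interpreter even starts, since CUDA_VISIBLE_DEVICES is only read when the CUDA context is first created and mutating os.environ too late has no effect:

```python
import os
import subprocess
import sys

# Set CUDA_VISIBLE_DEVICES in the child's environment before the
# interpreter starts, instead of mutating os.environ after CUDA may
# already have been initialized in the current process.
env = dict(os.environ, CUDA_VISIBLE_DEVICES='0')
child = subprocess.run(
    [sys.executable, '-c',
     "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env, capture_output=True, text=True,
)
print(child.stdout.strip())  # the child sees the variable from startup
```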
Method 3: restart the server
Some articles mention that upgrading the graphics driver without rebooting afterwards can trigger this same error.
So I rebooted the server, and that solved the problem.
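If a full reboot is inconvenient, a lighter-weight variant that is often suggested for this error (a sketch; the commands require root and rmmod fails while any process still holds the GPU) is to reload the nvidia_uvm kernel module, which recreates the driver state the upgrade left stale. The snippet only prints the commands so they can be reviewed before running:

```python
import shlex

# Commands to reload the nvidia_uvm kernel module without rebooting.
# They require root, and rmmod fails if any process still uses the GPU.
commands = [['sudo', 'rmmod', 'nvidia_uvm'],
            ['sudo', 'modprobe', 'nvidia_uvm']]
for cmd in commands:
    print(shlex.join(cmd))
```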
I am trying to check the GPU device name, but after executing this code I got this unknown runtime error. Please help me solve this problem and give complete instructions for fixing this error. Thanks.
(base) kumar@kumar:~$ conda activate pytorch
(pytorch) kumar@kumar:~$ python
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.9.0a0+gitb39eeb0
>>> print(torch.version.cuda)
11.2
>>> print(torch.cuda.current_device())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kumar/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py", line 430, in current_device
_lazy_init()
File "/home/kumar/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py", line 170, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
>>> exit()
Here is the output of nvcc -V
(pytorch) kumar@kumar:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
(pytorch) kumar@kumar:~$
Here is the output of nvidia-smi:
Thu Apr 8 15:04:49 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67 Driver Version: 460.39 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr: Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3070 Off | 00000000:01:00.0 On | N/A |
| 0% 38C P8 10W / 220W | 525MiB / 7982MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1015 G /usr/lib/xorg/Xorg 70MiB |
| 0 N/A N/A 1542 G /usr/lib/xorg/Xorg 257MiB |
| 0 N/A N/A 1675 G /usr/bin/gnome-shell 89MiB |
| 0 N/A N/A 3560 G ...AAAAAAAAA= --shared-files 94MiB |
+-----------------------------------------------------------------------------+
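Note the header line above: the nvidia-smi tool reports version 460.67 while the loaded kernel driver reports 460.39. A version mismatch like this typically means the driver was upgraded without a reboot, which matches this error. A small sketch of detecting the mismatch programmatically (the header string below is copied from the output above):

```python
import re

# The nvidia-smi header shows both the userspace tool version and the
# loaded kernel driver version; after a driver upgrade without a reboot
# the two can disagree.
header = ("| NVIDIA-SMI 460.67       Driver Version: 460.39"
          "       CUDA Version: 11.2     |")

match = re.search(r"NVIDIA-SMI\s+([\d.]+)\s+Driver Version:\s+([\d.]+)",
                  header)
tool_version, driver_version = match.groups()
if tool_version != driver_version:
    print(f"mismatch: tool {tool_version} vs kernel driver {driver_version};"
          " a reboot (or driver module reload) is likely needed")
```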
However, when I try to run this code:
print(torch.cuda.current_device())
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kumar/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py", line 430, in current_device
_lazy_init()
File "/home/kumar/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py", line 170, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.