I am trying to install torch with CUDA support.
Here is the result of my collect_env.py
script:
PyTorch version: 1.7.1+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Python version: 3.9 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080
Nvidia driver version: 460.39
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.1+cu101
[pip3] torchaudio==0.7.2
[pip3] torchvision==0.8.2+cu101
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py39he8ac12f_0
[conda] mkl_fft 1.3.0 py39h54f3939_0
[conda] mkl_random 1.0.2 py39h63df603_0
[conda] numpy 1.19.2 py39h89c1606_0
[conda] numpy-base 1.19.2 py39h2ae0177_0
[conda] torch 1.7.1+cu101 pypi_0 pypi
[conda] torchaudio 0.7.2 pypi_0 pypi
[conda] torchvision 0.8.2+cu101 pypi_0 pypi
Process finished with exit code 0
Here is the output of nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Finally, here is the output of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39 Driver Version: 460.39 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
| 0% 52C P0 46W / 180W | 624MiB / 8116MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 873 G /usr/lib/xorg/Xorg 101MiB |
| 0 N/A N/A 1407 G /usr/lib/xorg/Xorg 419MiB |
| 0 N/A N/A 2029 G ...AAAAAAAAA= --shared-files 90MiB |
+-----------------------------------------------------------------------------+
However, when I try to run
print(torch.cuda.is_available())
I get the following error:
UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
I have rebooted, and I have followed the post-installation steps as detailed here.
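One more check worth running (a minimal sketch; it assumes a Linux box where the driver exposes /dev/nvidia* device nodes): verify that the device nodes exist and that the current user can access them, since missing or root-only nodes can produce exactly this "unknown error" even while nvidia-smi works.

```python
import glob
import os

def list_device_nodes(pattern='/dev/nvidia*'):
    """Return (path, permissions) pairs for files matching pattern."""
    nodes = []
    for path in sorted(glob.glob(pattern)):
        mode = os.stat(path).st_mode & 0o777
        nodes.append((path, oct(mode)))
    return nodes

if __name__ == '__main__':
    # On a healthy setup this should list /dev/nvidia0, /dev/nvidiactl,
    # /dev/nvidia-uvm, etc., typically with 0o666 permissions.
    for path, perms in list_device_nodes():
        print(path, perms)
```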
Preface
Today, after an upgrade, a project using PyTorch on the server suddenly started throwing an error. The full error message is cut off in the title, so I paste it below.
builtins.RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
(Screenshot of the error report omitted.)
After consulting some references, I found the following possible solutions.
Solutions:
Method 1: add environment variables
Since I run the project as a Docker container, I installed vim inside the container and appended the following line to ~/.bashrc:
export CUDA_VISIBLE_DEVICES=0
Since GPU 0 was the card selected when building the container, I set the value to 0 here as well.
After restarting the container, echo $CUDA_VISIBLE_DEVICES printed the expected value, but the problem was not solved and the error was still raised.
Method 2: add environment variables to the code
Add the following code before CUDA is initialized:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
This still did not solve the problem.
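A more robust variant of this method (a sketch; it makes no assumption about the GPU itself) is to set the variable in the environment of a child process before its interpreter even starts, since CUDA_VISIBLE_DEVICES is only read when the CUDA context is first created and mutating os.environ too late has no effect:

```python
import os
import subprocess
import sys

# Set CUDA_VISIBLE_DEVICES in the child's environment before the
# interpreter starts, instead of mutating os.environ after CUDA may
# already have been initialized in the current process.
env = dict(os.environ, CUDA_VISIBLE_DEVICES='0')
child = subprocess.run(
    [sys.executable, '-c',
     "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env, capture_output=True, text=True,
)
print(child.stdout.strip())  # the child sees the variable from startup
```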
Method 3: restart the server
Some articles mention that upgrading the graphics driver without rebooting afterwards can trigger this same error.
So I rebooted the server, and that solved the problem.
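If a full reboot is inconvenient, a lighter-weight variant that is often suggested for this error (a sketch; the commands require root and rmmod fails while any process still holds the GPU) is to reload the nvidia_uvm kernel module, which recreates the driver state the upgrade left stale. The snippet only prints the commands so they can be reviewed before running:

```python
import shlex

# Commands to reload the nvidia_uvm kernel module without rebooting.
# They require root, and rmmod fails if any process still uses the GPU.
commands = [['sudo', 'rmmod', 'nvidia_uvm'],
            ['sudo', 'modprobe', 'nvidia_uvm']]
for cmd in commands:
    print(shlex.join(cmd))
```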
I am trying to check the GPU device name, but after executing this code I got this unknown runtime error. Please help me solve this problem and give complete instructions for fixing this error. Thanks.
(base) kumar@kumar:~$ conda activate pytorch
(pytorch) kumar@kumar:~$ python
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.9.0a0+gitb39eeb0
>>> print(torch.version.cuda)
11.2
>>> print(torch.cuda.current_device())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kumar/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py", line 430, in current_device
_lazy_init()
File "/home/kumar/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py", line 170, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
>>> exit()
Here is the output of nvcc -V
(pytorch) kumar@kumar:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
(pytorch) kumar@kumar:~$
Here is the output of nvidia-smi:
Thu Apr 8 15:04:49 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67 Driver Version: 460.39 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr: Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3070 Off | 00000000:01:00.0 On | N/A |
| 0% 38C P8 10W / 220W | 525MiB / 7982MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1015 G /usr/lib/xorg/Xorg 70MiB |
| 0 N/A N/A 1542 G /usr/lib/xorg/Xorg 257MiB |
| 0 N/A N/A 1675 G /usr/bin/gnome-shell 89MiB |
| 0 N/A N/A 3560 G ...AAAAAAAAA= --shared-files 94MiB |
+-----------------------------------------------------------------------------+
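Note the header line above: the nvidia-smi tool reports version 460.67 while the loaded kernel driver reports 460.39. A version mismatch like this typically means the driver was upgraded without a reboot, which matches this error. A small sketch of detecting the mismatch programmatically (the header string below is copied from the output above):

```python
import re

# The nvidia-smi header shows both the userspace tool version and the
# loaded kernel driver version; after a driver upgrade without a reboot
# the two can disagree.
header = ("| NVIDIA-SMI 460.67       Driver Version: 460.39"
          "       CUDA Version: 11.2     |")

match = re.search(r"NVIDIA-SMI\s+([\d.]+)\s+Driver Version:\s+([\d.]+)",
                  header)
tool_version, driver_version = match.groups()
if tool_version != driver_version:
    print(f"mismatch: tool {tool_version} vs kernel driver {driver_version};"
          " a reboot (or driver module reload) is likely needed")
```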
However, when I try to run this code:
print(torch.cuda.current_device())
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kumar/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py", line 430, in current_device
_lazy_init()
File "/home/kumar/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py", line 170, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.