Contents
- RuntimeError: CUDA error: invalid device function #234
- Comments
- CUDA error: invalid device function #346
- Comments
- RuntimeError: CUDA error: invalid device function ROIAlign_forward_cuda #62
- Comments
- To Reproduce
- Environment
- Problem
- Solution
RuntimeError: CUDA error: invalid device function #234
After configuring CUDA (driver 440.82, CUDA 10.2, Ubuntu 18.04) and PyTorch (PyTorch 1.5 + pytorch3d), I tested pytorch3d with the deform_source_mesh_to_target_mesh tutorial.
First, I get a warning when loading dolphin.obj (the provided mesh). The same warning appears for other test meshes. This issue may be related to #165 and JMingKuo's answer:
pytorch3d/io/obj_io.py:70: UserWarning: Faces have invalid indices
warnings.warn("Faces have invalid indices")
The important issue is the CUDA error in:
RuntimeError Traceback (most recent call last)
in
----> 7 plot_pointcloud(src_mesh, "Source mesh")
      8 plot_pointcloud(trg_mesh, "Target mesh")
in plot_pointcloud(mesh, title)
      1 def plot_pointcloud(mesh, title=""):
      2     # Sample points uniformly from the surface of the mesh.
----> 3     points = sample_points_from_meshes(mesh, 5000)
      4     x, y, z = points.clone().detach().cpu().squeeze().unbind(1)
      5     fig = plt.figure(figsize=(5, 5))
/. /pytorch3d/ops/sample_points_from_meshes.py in sample_points_from_meshes(meshes, num_samples, return_normals)
     54     # Only compute samples for non empty meshes
     55     with torch.no_grad():
---> 56         areas, _ = mesh_face_areas_normals(verts, faces)  # Face areas can be zero.
     57     max_faces = meshes.num_faces_per_mesh().max().item()
     58     areas_padded = packed_to_padded(
/. /pytorch3d/ops/mesh_face_areas_normals.py in forward(ctx, verts, faces)
     44     print(torch.isnan(faces).any())
     45     ctx.save_for_backward(verts, faces)
---> 46     areas, normals = _C.face_areas_normals_forward(verts, faces)
     47     return areas, normals
RuntimeError: CUDA error: invalid device function
I have read some related issues; most of those errors are due to a NaN input/mesh. However, in my case, I just use the provided dolphin.obj.
I think this is unrelated to the values in the inputs. This is hardware specific: I think there is a mismatch between the compute capability of your GPU device and the compute capabilities for which pytorch3d has been built.
Did you build pytorch3d yourself or are you using a conda package? What GPU are you using?
@bottler Thanks for your reply. My environment is Ubuntu 18.04, GPU driver 440.82, CUDA 10.2, and PyTorch 1.5 with several Titan RTXs. The installation of pytorch3d follows the official instructions:
conda create -n pytorch3d python=3.8
conda activate pytorch3d
conda install -c pytorch pytorch torchvision cudatoolkit=10.2
conda install -c conda-forge -c fvcore fvcore
.
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d && pip install -e .
I built from source. Is everything right?
I think you have installed correctly, so I don't know exactly what's wrong. Maybe a different set of CUDA tools is being found. I think you have compute capability 7.5, so you could try the build with NVCC_FLAGS="-gencode=arch=compute_75,code=sm_75".
You could also try the prebuilt conda packages; it would be interesting to know if they work.
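To make the suspected mismatch concrete, here is a small check (our sketch, not part of the original thread) that prints the compute capability PyTorch sees and the CUDA version it was built against; a Titan RTX should report (7, 5):

import torch

# Name and compute capability of the first visible GPU.
print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))
# CUDA version this PyTorch build was compiled against; pytorch3d should be built with the same toolkit.
print("torch built with CUDA", torch.version.cuda)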
@bottler Sadly, I have tried hard to fix this error, without success.
rm -rf build/ **/*.so
pip uninstall pytorch3d  # I find the previous line doesn't clean up everything
NVCC_FLAGS="-gencode=arch=compute_75,code=sm_75" pip install -e .
The build completes successfully.
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Obtaining file:///data/pytorch3d
Requirement already satisfied: torchvision>=0.4 in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from pytorch3d==0.2.0) (0.6.0a0+82fd1c8)
Requirement already satisfied: fvcore in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from pytorch3d==0.2.0) (0.1.1.post20200616)
Requirement already satisfied: numpy in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from torchvision>=0.4->pytorch3d==0.2.0) (1.18.1)
Requirement already satisfied: torch in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from torchvision>=0.4->pytorch3d==0.2.0) (1.5.0)
Requirement already satisfied: pillow>=4.1.1 in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from torchvision>=0.4->pytorch3d==0.2.0) (7.1.2)
Requirement already satisfied: tqdm in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from fvcore->pytorch3d==0.2.0) (4.46.1)
Requirement already satisfied: termcolor>=1.1 in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from fvcore->pytorch3d==0.2.0) (1.1.0)
Requirement already satisfied: pyyaml>=5.1 in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from fvcore->pytorch3d==0.2.0) (5.3.1)
Requirement already satisfied: portalocker in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from fvcore->pytorch3d==0.2.0) (1.7.0)
Requirement already satisfied: tabulate in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from fvcore->pytorch3d==0.2.0) (0.8.7)
Requirement already satisfied: yacs>=0.1.6 in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from fvcore->pytorch3d==0.2.0) (0.1.7)
Requirement already satisfied: future in /data/Miniconda3/envs/pytorch3d/lib/python3.8/site-packages (from torch->torchvision>=0.4->pytorch3d==0.2.0) (0.18.2)
Installing collected packages: pytorch3d
  Attempting uninstall: pytorch3d
    Found existing installation: pytorch3d 0.2.0
    Uninstalling pytorch3d-0.2.0:
      Successfully uninstalled pytorch3d-0.2.0
  Running setup.py develop for pytorch3d
Successfully installed pytorch3d
The import part of the tutorial is fine.
However, the error still exists: the same warning and the same CUDA error for the mesh operation. T_T
I also changed the CUDA device from 0 to 1 and still get the same results.
I cannot install with conda either:
conda install pytorch3d -c pytorch3d
Collecting package metadata (current_repodata.json): failed
UnavailableInvalidChannel: The channel is not accessible or is invalid.
channel name: pytorch3d
channel url: https://mirrors.tuna.tsinghua.edu.cn/anaconda/pytorch3d
error code: 404
You will need to adjust your conda configuration to proceed.
Use conda config --show channels to view your configuration's current state,
and use conda config --show-sources to view config file locations.
It seems to be a network connection error. Do you think this conda install method could be a solution?
Source
CUDA error: invalid device function #346
My host machine has driver 375.39 and CUDA 8.0 installed.
My source code is compiled with CUDA 7.5.
Can I run the compiled binary with nvidia-docker?
Currently I am getting this error: CUDA error: invalid device function
From this post, it feels like I need to compile my code again with CUDA 8.0?
But from my understanding, the container just needs to load the correct CUDA .so files (e.g., libcudart.so) and the NVIDIA userspace driver (libcuda.so), and since nvidia-docker does this automatically, I should be fine. Is there a requirement that the program be compiled with the correct CUDA version as well?
Do you have a Pascal GPU?
IIRC in CUDA 7.5, some libraries from the CUDA toolkit didn’t have PTX code, so they can’t run on newer GPUs.
@flx42 Thanks for replying! It's a Titan X; yes, I think it's a Pascal GPU.
Could you please let me know if my following understanding is correct?
nvidia-docker will load the appropriate libcuda.so according to the host driver version.
Since my docker image is packaged with CUDA 7.5, it will have the correct CUDA 7.5 .so files.
But the problem here is that CUDA 7.5 doesn't have PTX code, so it cannot run on new GPUs.
@flx42 Another question; I would greatly appreciate it if you could answer! Does nvidia-docker auto-mount CUDA .so files as well (e.g., libcudart.so)? If so, how does it know which CUDA version to mount?
Titan X could be Maxwell or Pascal; what's the output of nvidia-smi?
If you are indeed trying to use Caffe, then yes, you should switch to CUDA 8.0; it will be faster and simpler.
Does nvidia-docker auto-mount CUDA .so files as well (e.g., libcudart.so)? If so, how does it know which CUDA version to mount?
There is some confusion here, libcuda.so comes from the driver and it is mounted at start time by nvidia-docker. No CUDA file is mounted, you don’t even need the CUDA toolkit to be installed on the host, only the driver.
libcudart.so comes from the toolkit and is included in the CUDA image.
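To illustrate the split described above, the following sketch (ours, not from the thread) queries both libraries directly: the driver version comes from libcuda.so (mounted from the host by nvidia-docker), while the runtime version comes from the libcudart.so baked into the container image. The exact sonames may differ on your system:

import ctypes

drv, rt = ctypes.c_int(), ctypes.c_int()

# User-space driver library, provided by the host driver installation.
ctypes.CDLL("libcuda.so.1").cuDriverGetVersion(ctypes.byref(drv))
# CUDA runtime library, shipped with the toolkit inside the container (e.g., libcudart.so.7.5).
ctypes.CDLL("libcudart.so").cudaRuntimeGetVersion(ctypes.byref(rt))

print("driver supports CUDA:", drv.value)   # e.g., 8000 means CUDA 8.0
print("container runtime CUDA:", rt.value)  # e.g., 7050 means CUDA 7.5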
@flx42 Thank you for the detailed explanation as well as for how to check whether the Titan X is Pascal! I just checked; it's Pascal:
OK, so in some cases an application compiled with CUDA 7.5 might work; it depends on whether you compiled the application itself with PTX, and I think it also depends on which libraries from the CUDA toolkit you are using.
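Whether a given binary embeds PTX, and for which architectures it embeds cubins, can be inspected with cuobjdump from the CUDA toolkit. A hedged sketch, assuming cuobjdump is on the PATH and using a placeholder path for the binary:

import subprocess

binary = "./my_cuda_app"  # placeholder: the application or library you built

# --list-elf shows the embedded cubins (one per SM architecture),
# --list-ptx shows whether PTX was embedded for JIT compilation on newer GPUs.
for flag in ("--list-elf", "--list-ptx"):
    result = subprocess.run(["cuobjdump", flag, binary], capture_output=True, text=True)
    print(flag)
    print(result.stdout or result.stderr)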
I see, thank you! A lot of great information!
I am closing this issue now.
Source
RuntimeError: CUDA error: invalid device function ROIAlign_forward_cuda #62
Attempting to run forward inference with the panoptic FPN model results in a CUDA error.
To Reproduce
Attempting to run a predictor using the model panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml.
The following error is produced:
Environment
It seems like you did not build detectron2 correctly. You may have had wrong values in the TORCH_CUDA_ARCH_LIST environment variable when you built it. Could you check this environment variable at the time you built it?
I deleted the build folder and the detectron2/_C.cpython-36m-x86_64-linux-gnu.so file and rebuilt by running the command in the repo root directory.
I'm running on a 1080 Ti, which should be covered under "6.1". This results in the same errors as above.
Is there a way either of you can let others reproduce this issue in docker or colab?
I'm actually just trying to get object detection on LVIS running, and I'm able to successfully run the model when I switch the maskrcnn backbone out for a retinanet (which doesn't use ROIAlign). I unfortunately don't have time right now to set up docker or colab to replicate.
I was able to reproduce the same error when I used the wrong version of CUDA.
What I did:
I installed pytorch with conda install pytorch torchvision cudatoolkit=10.1 -c pytorch; however, my local CUDA runtime and nvcc are 10.0.
In this case, I can observe the same error.
Please check whether your CUDA version is correct.
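As a hedged illustration of the check suggested above (not code from the thread), one can compare the CUDA version the installed PyTorch was built against with the nvcc that compiles detectron2's CUDA kernels:

import subprocess
import torch

# CUDA version the installed PyTorch package was built with (e.g., "10.1").
print("torch.version.cuda =", torch.version.cuda)
# CUDA version of the local nvcc used to build detectron2's extensions (should match the above).
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)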
The updated collect_env in e85114c can now show the type of error I met.
As a follow-up from #78: I installed a new environment with CUDA 9.2 and this solved my issue. Could the problem be that, as stated at https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md, all models are trained with CUDA 9.2?
No, it's unrelated to the model zoo.
It's likely because CUDA 9.2 is just what your computer is using.
I ran into this error as well. Re-installing PyTorch for a lower CUDA version (one that matches my system CUDA) resolved the issue.
It seems that mismatched NVCC vs CUDA Runtime version is the root cause. Closing but feel free to reopen if this does not solve your issue.
@ppwwyyxx what should TORCH_CUDA_ARCH_LIST ideally be set to if one is using cuda/10.0 or cuda/10.1 with pytorch 1.3? nvcc --version shows me CUDA 10.0 as well; I'm not sure what you mean by the mismatch between nvcc and the CUDA runtime above, since they're always the same for me.
The build happens successfully but I get this error upon running demo.py:
Posting the error here because it seems related, can make a new issue if you recommend.
Thanks!
what should TORCH_CUDA_ARCH_LIST ideally be set
The best option is to unset it (i.e., no such env variable).
If you cannot solve the issue with existing information, please open a new one following the template.
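For context, TORCH_CUDA_ARCH_LIST is read by PyTorch's extension builder: if it is set, only the listed architectures are compiled; if it is unset, the build targets the GPUs actually present. A quick sketch (ours, not from the thread) to compare the variable against the local GPU:

import os
import torch

# Architectures requested via the environment variable (None means it is unset).
print("TORCH_CUDA_ARCH_LIST =", os.environ.get("TORCH_CUDA_ARCH_LIST"))
# Compute capability of the GPU that will actually run the kernels, e.g., (6, 1) for a GTX 1080 Ti.
print("local GPU capability  =", torch.cuda.get_device_capability(0))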
I was able to reproduce the same error when I used the wrong version of CUDA.
What I did:
I installed pytorch with conda install pytorch torchvision cudatoolkit=10.1 -c pytorch; however, my local CUDA runtime and nvcc are 10.0.
In this case, I can observe the same error.
Please check whether your CUDA version is correct.
Hello, I want to use detectron2, but when I prepared the conda environment, something went wrong. First I installed pytorch with conda install pytorch torchvision cudatoolkit=10.1 -c pytorch, but as you mentioned, I got an error and found that my local CUDA runtime and nvcc are 10.0 (I build my conda environment in an LXD container and have no rights to change the local CUDA runtime and nvcc version). So I used conda install -c pytorch pytorch=1.3.0 cudatoolkit=10.0 to install pytorch for CUDA 10.0. However, I then got the error from issue #459. I can only choose CUDA 9.0 or CUDA 10.0, and I see detectron2 can only run with CUDA 9.2 and CUDA 10.1. Could you please tell me how I can solve this?
Detectron2 can run with cuda 10.0.
#459 is caused by incorrect installation of torchvision as explained there.
Detectron2 can run with cuda 10.0.
#459 is caused by incorrect installation of torchvision as explained there.
Thanks for your reply. I deleted the build folder in detectron2 and rebuilt it; it works well for me now.
Hi, I am just trying to run detectron2 for panoptic segmentation with PyTorch 1.4.0 and CUDA 10.2, and I encountered the same CUDA error for ROIAlign_forward_cuda. I tried to install detectron2 using 1) local source code, and 2) pip install. I also double-checked with python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/index.html, and it seems CUDA 10.2 is also compatible with detectron2. What further steps can I take?
Most likely the solution to your problem is already in https://detectron2.readthedocs.io/tutorials/install.html#common-installation-issues.
If you need help to solve an unexpected issue you observed, please include details following the issue template.
Great, thanks! I checked and found that the CUDA versions for detectron2 and torch were mismatched. I re-installed detectron2 with CUDA 10.1 and matched pytorch as well. Now it works! Thanks again.
I was able to reproduce the same error when I used the wrong version of CUDA.
What I did:
I installed pytorch with conda install pytorch torchvision cudatoolkit=10.1 -c pytorch; however, my local CUDA runtime and nvcc are 10.0.
In this case, I can observe the same error.
Please check whether your CUDA version is correct.
So, how did you solve this?
It seems that mismatched NVCC vs CUDA Runtime version is the root cause. Closing but feel free to reopen if this does not solve your issue.
Yes, that's the key to solving my problem.
Problem
First, let me briefly introduce my problem: I'm new to Detectron2 and have only one GPU (GeForce GTX 1080 Ti). I chose to build Detectron2 from source:
Everything is fine and detectron2 is installed successfully,
but when I try to train, I get the CUDA error.
Solution
I checked the CUDA version.
Before this I had installed cudatoolkit=10.2, but now I choose the earlier version.
After rebuilding Detectron2, the problem is solved!
Source
My host machine has driver 375.39 and CUDA 8.0 installed.
My source code is compiled with CUDA 7.5.
Can I run the compiled binary with nvidia-docker?
Currently I am getting this error:
CUDA error: invalid device function
From this post, it feels like I need to compile my code again with CUDA 8.0?
But from my understanding, the container just needs to load the correct CUDA .so files (e.g., libcudart.so) and the NVIDIA userspace driver (libcuda.so), and since nvidia-docker does this automatically, I should be fine. Is there a requirement that the program be compiled with the correct CUDA version as well?
Do you have a Pascal GPU?
IIRC in CUDA 7.5, some libraries from the CUDA toolkit didn’t have PTX code, so they can’t run on newer GPUs.
@flx42 Thanks for replying! It's a Titan X; yes, I think it's a Pascal GPU.
Could you please let me know if my following understanding is correct?
nvidia-docker will load the appropriate libcuda.so according to the host driver version.
Since my docker image is packaged with CUDA 7.5, it will have the correct CUDA 7.5 .so files.
But the problem here is that CUDA 7.5 doesn't have PTX code, so it cannot run on new GPUs.
@flx42 Another question; I would greatly appreciate it if you could answer! Does nvidia-docker auto-mount CUDA .so files as well (e.g., libcudart.so)? If so, how does it know which CUDA version to mount?
Titan X could be Maxwell or Pascal; what's the output of nvidia-smi?
If you are indeed trying to use Caffe, then yes, you should switch to CUDA 8.0; it will be faster and simpler.
Does nvidia-docker auto-mount CUDA .so files as well (e.g., libcudart.so)? If so, how does it know which CUDA version to mount?
There is some confusion here, libcuda.so comes from the driver and it is mounted at start time by nvidia-docker. No CUDA file is mounted, you don’t even need the CUDA toolkit to be installed on the host, only the driver.
libcudart.so comes from the toolkit and is included in the CUDA image.
@flx42 Thank you for the detailed explanation as well as for how to check whether the Titan X is Pascal! I just checked; it's Pascal:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39 Driver Version: 375.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 0000:01:00.0 On | N/A |
| 23% 29C P8 10W / 250W | 103MiB / 12188MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1580 G /usr/lib/xorg/Xorg 101MiB |
+-----------------------------------------------------------------------------+
OK, so in some cases an application compiled with CUDA 7.5 might work; it depends on whether you compiled the application itself with PTX, and I think it also depends on which libraries from the CUDA toolkit you are using.
I see, thank you! A lot of great information!
I am closing this issue now.
@helinwang @flx42
Thanks, I learned a lot from your discussion. Now I have encountered the same problem.
Can I update the CUDA in the Docker container from 7.5 to 8.0 in the nvidia-docker environment, without pulling a new Docker image?
When I install the new CUDA 8.0 in the Docker container, the error message shows:
"sudo: /etc/sudoers is world writable"
I would greatly appreciate it if you could answer!
@xuyifeng-nwpu this doesn't seem to be an error related to CUDA; I can't help you here.
I have a GeForce GTX 295 GPU, Visual Studio 2012, and CUDA version 6.5. I run simple code like:
#include "stdafx.h"
#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main(void)
{
    float *a_h, *a_d;                  // Pointers to host & device arrays
    const int N = 10;                  // Number of elements in arrays
    size_t size = N * sizeof(float);
    a_h = (float *)malloc(size);       // Allocate array on host
    cudaMalloc((void **) &a_d, size);  // Allocate array on device
    // Initialize host array and copy it to CUDA device
    for (int i = 0; i < N; i++) a_h[i] = (float)i;
    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
    // Do calculation on device:
    int block_size = 4;
    int n_blocks = N / block_size + (N % block_size == 0 ? 0 : 1);
    square_array <<< n_blocks, block_size >>> (a_d, N);
    // Retrieve result from device and store it in host array
    cudaMemcpy(a_h, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost);
    // Print results
    for (int i = 0; i < N; i++)
        printf("%d %f\n", i, a_h[i]);
    // Cleanup
    free(a_h);
    cudaFree(a_d);
}
In this code, when I call cudaGetLastError() after launching the kernel, the console window displays the error "Invalid device function". How can I get rid of it?
Sample codes from the CUDA 6.5 toolkit run successfully with Visual Studio 2012.
1. Compilation failure due to incorrect CUDA_HOME
In some cases where your default CUDA directory is linked to an old CUDA version (MinkowskiEngine requires CUDA >= 10.0), you might face some compilation issues that give you segmentation fault errors during compilation.
NVCC ... Segmentation fault
To confirm, you should check your paths.
$ echo $CUDA_HOME
/usr/local/cuda
$ ls -al $CUDA_HOME
..... /usr/local/cuda -> /usr/local/cuda-10.2
$ ls /usr/local/
bin cuda cuda-10.2 cuda-11.0 ...
In this case, make sure you set the environment variable CUDA_HOME to the right path and install MinkowskiEngine.
export CUDA_HOME=/usr/local/cuda-10.2; python setup.py install
2. Compilation failure due to incorrect CUDA_HOME
Some applications modify the environment variable CUDA_HOME in your .bashrc; see #12. This makes the pytorch CppExtension module fail, leading to problems like src/common.hpp:40:10: fatal error: cublas_v2.h: No such file or directory.
If you encounter this issue, try to set your CUDA_HOME explicitly.
export CUDA_HOME=/usr/local/cuda; python setup.py install
Or you can use the path to nvcc to automatically set the CUDA home.
export CUDA_HOME=$(dirname $(dirname $(which nvcc))); python setup.py install
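As a quick sanity check (our sketch, not part of the original instructions), you can print the CUDA_HOME that PyTorch's extension builder will actually use before running setup.py:

import os
from torch.utils.cpp_extension import CUDA_HOME

# The environment variable as currently exported (may be None).
print("CUDA_HOME env:", os.environ.get("CUDA_HOME"))
# The CUDA installation torch.utils.cpp_extension resolved and will compile against.
print("CUDA_HOME used by PyTorch:", CUDA_HOME)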
Compilation failure due to Out Of Memory (OOM)
The setup.py script uses the number of CPUs for multi-threaded parallel compilation. However, when installing MinkowskiEngine on a cluster, the compilation might sometimes fail due to excessive memory usage. Please provide enough memory to the job for fast compilation. Another option when you have limited memory is to compile without parallel compilation.
cd /path/to/MinkowskiEngine
make  # single threaded compilation
python setup.py install
Compilation issues after an upgrade
In rare cases, you might face a compilation issue after you upgrade MinkowskiEngine, pytorch, or CUDA. In general, when you get an undefined symbol error (e.g., _ZNK13CoordsManagerILh5EiE8toStringB5cxx11Ev) or a thrust::system::system_error, try to compile the entire library again using one of the following methods.
Force compiling all object files
cd /path/to/MinkowskiEngine
make clean
python setup.py install --force
From a new conda virtual environment
If the above method doesn't work, try to create a new conda environment. We found that this sometimes solves the compilation issues.
conda create -n py3-mink-2 python=3.7 anaconda
conda activate py3-mink-2
conda install openblas numpy
conda install pytorch torchvision -c pytorch
Then,
cd /path/to/MinkowskiEngine
conda activate py3-mink-2
make clean
python setup.py install --force
CUDA version mismatch: undefined symbol and invalid device function
In some cases when the conda pytorch uses a different CUDA version, you might get an undefined symbol error or CUDA error: invalid device function. Try to reinstall pytorch with the correct CUDA version that you are using to compile MinkowskiEngine. To find out your CUDA version, run nvcc --version.
To install the correct CUDA libraries for anaconda pytorch, install cudatoolkit=x.x along with pytorch. For example,
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
In this example, we assumed that you are using CUDA 10.1, but please make sure that you are installing the correct version. Then, use the following code snippet to create a new conda environment, and install MinkowskiEngine.
conda create -n py3-mink-2 python=3.7 anaconda
conda activate py3-mink-2
conda install openblas numpy
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch  # Make sure to use the correct cudatoolkit version
cd /path/to/MinkowskiEngine
conda activate py3-mink-2
make clean
python setup.py install --force
GPU Out-Of-Memory during training
Unlike neural networks with dense tensors, where the input batches always require the same number of bytes, sparse tensors have a different number of non-zero elements or a different length for each batch, which results in new memory allocation if the current batch is larger than the previously allocated memory. Such repeated memory allocation will result in an Out-Of-Memory error, so one must clear the GPU cache at a regular interval.
def training(...):
    ...
    sinput = ME.SparseTensor(...)
    loss = criterion(...)
    loss.backward()
    optimizer.step()
    ...
    torch.cuda.empty_cache()
It appears that a large number of unit tests are failing on CUDA 10.1.
$ ctest --rerun-failed
Total Test time (real) = 12.26 sec
The following tests FAILED:
66 - UnitTestCudaArrayHandle (Failed)
67 - UnitTestCudaArrayHandleFancy (Failed)
68 - UnitTestCudaArrayHandleVirtualCoordinates (Failed)
69 - UnitTestCudaBitField (Failed)
70 - UnitTestCudaCellLocatorRectilinearGrid (Child aborted)
71 - UnitTestCudaCellLocatorUniformBins (Child aborted)
72 - UnitTestCudaCellLocatorUniformGrid (Failed)
73 - UnitTestCudaComputeRange (Failed)
74 - UnitTestCudaColorTable (Failed)
75 - UnitTestCudaDataSetExplicit (Failed)
76 - UnitTestCudaDataSetSingleType (Child aborted)
77 - UnitTestCudaDeviceAdapter (Failed)
78 - UnitTestCudaGeometry (Failed)
79 - UnitTestCudaImplicitFunction (Failed)
80 - UnitTestCudaMath (Failed)
82 - UnitTestCudaPointLocatorUniformGrid (Child aborted)
83 - UnitTestCudaVirtualObjectHandle (Failed)
109 - UnitTestDataSetBuilderExplicit (Failed)
110 - UnitTestDataSetBuilderRectilinear (Failed)
118 - UnitTestFieldRangeCompute (Failed)
122 - UnitTestMultiBlock (Failed)
131 - UnitTestFieldRangeGlobalCompute (Failed)
133 - UnitTestSerializationDataSet (Failed)
Errors while running CTest
More info on a single failing unit test (it appears that the cause is the same for all of them):
./bin/UnitTests_vtkm_cont_cuda_testing UnitTestCudaArrayHandle
*** vtkm::UInt8 ***************
Try operations on empty arrays.
*** vtkm::Int64 ***************
Try operations on empty arrays.
*** vtkm::Float32 ***************
Try operations on empty arrays.
*** vtkm::Vec< vtkm::Float64, 3 > ***************
Try operations on empty arrays.
*** vtkm::UInt8 ***************
Check array with user provided memory.
Check out execution array behavior.
***** Uncaught VTKm exception thrown.
CUDA Error: invalid device function
Unchecked asynchronous error @ /home/4nt/vtk-m/vtkm/cont/cuda/internal/CudaAllocator.cu:110
And under cuda-memcheck:
/usr/local/cuda-10.1/bin/cuda-memcheck ./bin/UnitTests_vtkm_cont_cuda_testing UnitTestCudaArrayHandle
========= CUDA-MEMCHECK
*** vtkm::UInt8 ***************
Try operations on empty arrays.
*** vtkm::Int64 ***************
Try operations on empty arrays.
*** vtkm::Float32 ***************
Try operations on empty arrays.
*** vtkm::Vec< vtkm::Float64, 3 > ***************
Try operations on empty arrays.
*** vtkm::UInt8 ***************
Check array with user provided memory.
Check out execution array behavior.
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaPointerGetAttributes.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf047f9]
...
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x651bc]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaGetLastError.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf0e3d3]
...
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x651bc]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaLaunchKernel.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf13615]
...
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x651bc]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= Program hit cudaErrorInvalidDeviceFunction (error 98) due to "invalid device function" on CUDA API call to cudaGetLastError.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x38c7d3]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0xf0e3d3]
...
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x71ca0]
***** Uncaught VTKm exception thrown.
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x71515]
CUDA Error: invalid device function
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x710fc]
Unchecked asynchronous error @ /home/4nt/vtk-m/vtkm/cont/cuda/internal/CudaAllocator.cu:110
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x70c19]
...
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
========= Host Frame:./bin/UnitTests_vtkm_cont_cuda_testing [0x64d9a]
=========
========= ERROR SUMMARY: 4 errors
System info:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
$ nvidia-smi
Mon Jul 22 10:23:55 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34 Driver Version: 430.34 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:65:00.0 On | N/A |
| 0% 45C P0 30W / 185W | 1213MiB / 7979MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1285 G /usr/bin/gnome-shell 176MiB |
| 0 1998 G /usr/lib/xorg/Xorg 482MiB |
| 0 2129 G /usr/bin/gnome-shell 386MiB |
| 0 6945 G ...-token=BAB748F2325B6E879753DBB4E9D9726C 134MiB |
+-----------------------------------------------------------------------------+
Commit that I built to reproduce: 468ee61c
A problem of this kind has been observed: most likely, the architecture the CUDA code is compiled for does not match your current graphics card. Although it may compile and link successfully, there will be errors at runtime.
Solution:
'--gpu-architecture=compute_61', # change compute_70 -> compute_61
'--gpu-code=sm_61', #change sm_70 -> sm_61
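To pick the right values for these two flags, a small helper (ours, not part of the original note) can derive the compute_XX / sm_XX strings from the local GPU:

import torch

# Compute capability of the current GPU, e.g., (6, 1) for a GTX 1080 Ti or (7, 5) for a Titan RTX.
major, minor = torch.cuda.get_device_capability(0)
print(f"--gpu-architecture=compute_{major}{minor}")
print(f"--gpu-code=sm_{major}{minor}")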