Error: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel - Bug fixes and finding optimal solutions to problems

Problem

I have hit the same problem twice while installing the NVIDIA driver. The first time I was the only user on the server, so the environment was clean; the second time other people were using the server too, and the environment was much more complicated.

The second time took longer to fix, so I am writing down the solution in case the problem comes up again.

Today, I used the following command to check my GPU state:

nvidia-smi

But the output told me the GPU driver had disappeared (this happens occasionally), so I wanted to install it again.

So I ran the installer:

sudo sh NVIDIA-Linux-x86_64-440.64.run

But I got bad news:

An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel...

It seems I was trying to install the driver while the GPU was still being used to run the X Window System. To solve this, I need to stop whatever is using the GPU.

Solution 1: Remove the NVIDIA driver

We can use the following commands to remove the NVIDIA driver:

sudo apt-get purge 'nvidia*'
sudo apt-get autoremove
sudo reboot

After rebooting, we can install the driver again.
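Before purging, it can help to see which packages are actually installed. The following is a minimal sketch, assuming a Debian or Ubuntu system where `dpkg -l` is available; the function name `installed_nvidia_packages` is made up for this illustration.

```shell
#!/bin/sh
# List installed packages whose names start with "nvidia".
# Reads `dpkg -l` style text on stdin; only rows in the "ii" (installed) state count.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^nvidia/ { print $2 }'
}

# On a live system (assumption: dpkg is the package manager):
#   dpkg -l | installed_nvidia_packages
# Quoting the pattern keeps the shell from expanding it against files in the
# current directory; -s only simulates the purge:
#   sudo apt-get -s purge 'nvidia*'
```

Reading from stdin rather than calling `dpkg` inside the function makes it possible to try the filter on saved output from another machine.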

Solution 2: Stop all processes using the GPU

First, we need superuser privileges.

sudo -i

Then stop all processes that use the GPU and unload the NVIDIA driver module:

systemctl isolate multi-user.target
modprobe -r nvidia-drm

And then we can reinstall the GPU driver.

References

https://askubuntu.com/questions/830916/how-to-install-cuda-8-0-on-ubuntu-16-04-with-nvidia-geforce-gtx-1080
https://unix.stackexchange.com/questions/440840/how-to-unload-kernel-module-nvidia-drm

Source

I’m trying to install the most up-to-date NVIDIA driver in Debian Stretch. I’ve downloaded NVIDIA-Linux-x86_64-390.48.run from here, but when I try to do

sudo sh ./NVIDIA-Linux-x86_64-390.48.run

as suggested, an error message appears.

ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel.  This may be because it is in use (for example, by an X server, a CUDA program, or 
         the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading.  Please be sure to exit any programs    
         that may be using the GPU(s) before attempting to upgrade your driver.  If no GPU-based programs are running, you know that your kernel supports module unloading,   
         and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to     
         reboot your computer.

When I try to find out who is using nvidia-drm (or nvidia_drm), I see nothing.

~$ sudo lsof | grep nvidia-drm
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.
~$ sudo lsof -e /run/user/1000/gvfs | grep nvidia-drm
~$

And when I try to remove it, it says it’s being used.

~$ sudo modprobe -r nvidia-drm
modprobe: FATAL: Module nvidia_drm is in use.
~$

I have rebooted and started in text-only mode (by pressing Ctrl+Alt+F2 before giving username/password), but I got the same error.

Besides that, how do I "know that my kernel supports module unloading"?

I’m getting a few warnings on boot up related to nvidia, no idea if they’re related, though:

Apr 30 00:46:15 debian-9 kernel: nvidia: loading out-of-tree module taints kernel.
Apr 30 00:46:15 debian-9 kernel: nvidia: module license 'NVIDIA' taints kernel.
Apr 30 00:46:15 debian-9 kernel: Disabling lock debugging due to kernel taint
Apr 30 00:46:15 debian-9 kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.82  Wed Jul 19 21:16:49 PDT 2017 (using threaded interrupts)
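On the question of how to know whether the kernel supports module unloading: one way is to look for the `CONFIG_MODULE_UNLOAD` option in the kernel configuration. The sketch below assumes the configuration file is readable as text; the function name `supports_module_unload` is hypothetical.

```shell
#!/bin/sh
# Report whether a kernel configuration enables module unloading.
# Reads kernel config text on stdin and looks for CONFIG_MODULE_UNLOAD=y.
supports_module_unload() {
    if grep -q '^CONFIG_MODULE_UNLOAD=y$'; then
        echo yes
    else
        echo no
    fi
}

# On Debian the running kernel's config is usually at /boot/config-$(uname -r)
# (assumption: your distribution installs it there):
#   supports_module_unload < "/boot/config-$(uname -r)"
```

Distribution kernels normally have this option enabled, so a "no" here would mostly be expected on custom-built kernels.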

filbranden

asked Apr 30, 2018 at 4:07

I imagine you want to stop the display manager which is what I’d suspect would be using the Nvidia drivers.

After changing to a text console (pressing Ctrl+Alt+F2) and logging in as root, use the following command to disable the graphical target, which is what keeps the display manager running:

# systemctl isolate multi-user.target

At this point, I’d expect you’d be able to unload the Nvidia drivers using modprobe -r (or rmmod directly):

# modprobe -r nvidia-drm

Once you’ve managed to replace/upgrade it and you’re ready to start the graphical environment again, you can use this command:

# systemctl start graphical.target

Rui F Ribeiro

answered May 4, 2018 at 16:14

filbranden

CUDA Installation

1) Download the latest CUDA Toolkit

2) Switch to tty3 by pressing Ctrl+Alt+F3

3) Unload nvidia-drm before proceeding.

3a) Isolate multi-user.target

sudo systemctl isolate multi-user.target

3b) Note that nvidia-drm is currently in use.

lsmod | grep nvidia.drm

3c) Unload nvidia-drm

sudo modprobe -r nvidia-drm

3d) Note that nvidia-drm is not in use anymore.

lsmod | grep nvidia.drm

5) Go to your download folder and run the cuda installation.

sudo sh cuda_10.1.168_418.67_linux.run

6) Answer any prompts during installation.

7) When installation has finished, confirm that the CUDA Version has been updated.

nvidia-smi

Start the GUI again.

sudo systemctl start graphical.target
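The steps above can be previewed as a script before anything is changed. This is a sketch under stated assumptions: `run` and `reinstall_sequence` are hypothetical helper names, the script would have to run as root from a text console when `DRY_RUN=0`, and it only chains the commands already listed in the steps.

```shell
#!/bin/sh
# Print (or, with DRY_RUN=0, execute) the sequence described in the steps above.
# DRY_RUN defaults to 1, so nothing is changed unless you opt in.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_sequence() {
    installer="$1"   # path to the .run file; supplied by the caller
    run systemctl isolate multi-user.target &&
    run modprobe -r nvidia-drm &&
    run sh "$installer" &&
    run systemctl start graphical.target
}

# Example: preview the commands for the installer named in the steps above.
#   reinstall_sequence cuda_10.1.168_418.67_linux.run
```

Chaining with `&&` stops the sequence at the first failing step, so the installer is not started if the module could not be unloaded.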

answered Jun 11, 2019 at 8:25

I solved this problem by disabling the GUI, rebooting, logging in, installing the driver, re-enabling the GUI, and rebooting again.

Please make sure you know your username and password!!!

Open a terminal and write

sudo systemctl set-default multi-user.target
sudo reboot 0

Now log in and you'll land directly in a terminal. Install the driver. Note that I am installing 440.44 here, so you need to adjust the command for your driver version.

sudo ./NVIDIA-Linux-x86_64-440.44.run

After installing the driver, enable the GUI and reboot:

sudo systemctl set-default graphical.target
sudo reboot 0

You should be done.

In my case, nvidia-smi reported the new version 440.44, while the Additional Drivers tab in the Ubuntu 18.04 Software & Updates utility showed 435!! Another NVIDIA mystery, but heck, my new docker works!!!

answered Jan 5, 2020 at 19:36

Dave B

lsof lists any files that are in use by userspace processes. But nvidia_drm is a kernel module, so lsof won’t necessarily see whether or not it is actually in use. (The module file won’t be open because the kernel has already completely loaded it into RAM. But the module might be providing services to the userspace or other kernel components, and that is what prevents the unloading of the module.)

Run lsmod | grep nvidia.drm and see the numbers to the right of the nvidia_drm module name. The first number is simply the size of the module; the second is the use count. In order to successfully remove the module, the use count must be 0 first.
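The use count described above can also be read out with a small helper. This is a minimal sketch; `module_use_count` is a hypothetical name, and the function simply prints the third column of the matching `lsmod` row.

```shell
#!/bin/sh
# Print the use count (third lsmod column) of a module.
# Reads lsmod-formatted text on stdin; exits non-zero if the module is not listed.
module_use_count() {
    awk -v m="$1" '$1 == m { print $3; found = 1 } END { exit !found }'
}

# Example on a live system:
#   lsmod | module_use_count nvidia_drm
# A result of 0 means the module can be unloaded; a non-zero exit status
# means the module is not loaded at all.
```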

If the X11 server is running and using the nvidia driver, then the nvidia_drm kernel module will most assuredly be in use. So you'll need, at the very least, to switch to a text console and shut down the X11 server. Usually this can be done by stopping whichever X Display Manager service you're using (it depends on which desktop environment you're using).

As the error message said, if you are running nvidia-persistenced, you’ll need to stop that too before you can unload the nvidia_drm module.

answered Apr 30, 2018 at 6:39

telcoM

I had a similar problem.

Reason: the nvidia-drm module was in use

I fixed it by purging all NVIDIA packages.

Remove all previous NVIDIA installations with these 2 commands:

$ sudo apt-get purge 'nvidia*'

$ sudo apt-get autoremove

Module should be removed.

Reboot and go forth.

answered Sep 28, 2018 at 23:42

Kelly

You report in comments that stopping the systemd-logind service takes you back to the graphic login. If you have a graphical login then X is running, so the video driver is loaded and in use. This very likely explains in part why the nvidia-drm module is in use.

Additionally, you betray an apparent misconception when you say

I have rebooted and started in text-only mode (by pressing Ctrl+Alt+F2
before giving username/password), but I got the same error.

Pressing Ctrl+Alt+F2 switches to virtual terminal #2, which may well be configured for text-mode login, but that's a far cry from "starting in text mode". If you had a graphical login screen on the default virtual terminal then X is running, and switching to a different VT doesn't change that. You're just logging in to a non-X session.

The first and easiest thing to try is to actually shut down the X server. The old-school way to do this would be to log in to your text-mode session and execute the command

telinit 3

to switch to runlevel 3. That should work with systemd, too, but the native systemd way would be to instead run

systemctl isolate multi-user.target

Both of those require privilege, of course, so you’ll need to use sudo or make yourself root.

If that doesn’t remove the module, or at least make it possible for you to do so manually, then your next best bet would be to boot the system directly into runlevel 3 (multi-user target), or maybe even into runlevel 1 (rescue target). I usually do this by adding «3» (or «1») to the end of the kernel argument list at boot time via the bootloader. You can also change the default boot target as described in this article.
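For reference, the runlevels mentioned above map onto systemd targets through the `runlevelN.target` aliases. The helper below is a sketch of that mapping; the name `runlevel_to_target` is made up for this illustration.

```shell
#!/bin/sh
# Map a SysV runlevel number to the systemd target it is aliased to.
runlevel_to_target() {
    case "$1" in
        0) echo poweroff.target ;;
        1) echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5) echo graphical.target ;;
        6) echo reboot.target ;;
        *) echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# Example: show the current default, then make the text-mode target the default.
#   systemctl get-default
#   sudo systemctl set-default "$(runlevel_to_target 3)"
```

Remember to set the default back to `graphical.target` after the driver is installed if the machine should boot into the desktop again.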

Do also note that the nVidia driver is available in pre-built packages for most Linux distros. Few include those packages in their own standard repos because the driver is, after all, proprietary, but you can surely find a reputable 3rd-party repo that has it. I strongly recommend using such packages instead of running the installer directly, but to get there from where you are now, you may need to first manually uninstall the driver.

answered May 7, 2018 at 15:03

Stopping systemd-logind fixed it for me:

sudo systemctl stop systemd-logind

This is suggested as a workaround in this GitHub issue on the nvidia-xrun GitHub page:

Good news guys, systemd-logind is the culprit here. The current
workaround is to run the following command after logging out from the
"nvidia-xrun" session: sudo systemctl stop systemd-logind

Then, you'll have to manually remove the other nvidia modules and switch
off the DGPU manually. Here's the code snippet that runs after you log
out from the "nvidia-xrun" session.
echo 'Unloading nvidia_drm module' 
execute "sudo rmmod nvidia_drm"

echo 'Unloading nvidia_modeset module' 
execute "sudo rmmod nvidia_modeset"

echo 'Unloading nvidia module' 
execute "sudo rmmod nvidia"

echo 'Turning off nvidia GPU' 
execute "sudo tee /proc/acpi/bbswitch <<<OFF"

echo -n 'Current state of nvidia GPU: ' 
execute "cat /proc/acpi/bbswitch"
Systemd issue on Github

Reference link from Nvidia Linux Developers portal
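The quoted snippet relies on an `execute` helper that is not defined in the text above. As a standalone sketch of the same unload order (dependents first), the hypothetical function `plan_unload` below only prints the `rmmod` commands for the NVIDIA modules that are actually loaded; it runs nothing itself. Including `nvidia_uvm` in the list is an assumption based on the usual module set, since the quoted snippet does not mention it.

```shell
#!/bin/sh
# Print the rmmod commands needed to unload the loaded NVIDIA modules,
# dependents first. Reads lsmod-formatted text on stdin; prints only.
plan_unload() {
    loaded="$(awk '$1 != "Module" { print $1 }')"
    for module in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        if printf '%s\n' "$loaded" | grep -qx "$module"; then
            echo "rmmod $module"
        fi
    done
}

# Example on a live system (review the output before running it as root):
#   lsmod | plan_unload
```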

terdon♦

answered Jun 11, 2019 at 13:04

Had the same problem with Debian Stretch when trying to install the Nvidia drivers. When in text mode my only solution was to remove the driver and reinstall gdm and gnome-shell. I know it's a clumsy solution, but I remember I first tried fixing the gnome-shell by only removing the Nvidia driver and reinstalling GDM. It turned out to be much easier to just reinstall the whole shell.

answered May 1, 2018 at 11:55

I also encountered the same problem. The reason for the error was that I accidentally selected "Install nvidia driver" during the installation of cuda.

So, during the installation of CUDA, when you encounter the following options:

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit:

Please select q, and the problem will be solved.

answered May 7, 2019 at 2:05

JNing

What worked for me was to change the system to start in text mode:

systemctl set-default runlevel3.target

Then restart and install the NVIDIA CUDA driver.
Once finished, you may want to change the system to start in graphics mode again:

systemctl set-default runlevel5.target

answered Jun 8, 2019 at 4:09

The accepted answer by filbranden got me in the right direction, but did not quite work for me (it seems that at least the nouveau driver was always loaded and causing problems).

What did work for me, however (as shown here in more detail), was to temporarily boot to console mode (text mode). This seemed to make sure that no nvidia or nouveau driver was loaded.

I then followed @filbranden's answer for stopping/removing nvidia-drm, installed the NVIDIA driver from there, and rebooted the system. The reboot may not have been necessary, but since it worked I list it here.

AdminBee

answered Aug 20, 2020 at 12:57

An additional answer for anyone facing this problem.

For me, I had to switch the driver to something other than NVIDIA in Software & Updates, then reboot and perform the installation again.

answered Aug 21, 2021 at 13:12

I was getting this error during boot:

[54.285826] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000800] Failed to grab modeset ownership

and during the run of the NVIDIA builder/installer/configurator.

It told me to use the Software Updater app (Additional Drivers), but that did not work because there were no other driver options to switch to.

On a fresh install you may also get errors about gcc, make, pkg-config, and libglvnd.

If you are installing a manually downloaded driver, do the following before running it with sh:

$ sudo sh NVIDIA-Linux-x86_64-510.60.02.run

First, as per the other answers:

$ sudo systemctl isolate multi-user.target

Make sure gcc, make, pkg-config, and libglvnd-dev are installed:

$ sudo apt install gcc
$ sudo apt install make
$ sudo apt install pkg-config
$ sudo apt install libglvnd-dev

Also:

$ sudo apt update && sudo apt upgrade

and, if you like:

$ sudo apt autoremove

Now run the builder:

$ sudo sh NVIDIA-Linux-x86_64-510.60.02.run

You should get no errors while the driver builder/installer/configurator runs.

Turn graphics back on:

$ sudo systemctl start graphical.target

Possibly reboot to check that everything is now OK at boot.
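The prerequisite check in this answer can be done up front. The sketch below reports which build tools are not on the PATH; `missing_tools` is a hypothetical name, and because libglvnd-dev is a library package rather than a command, this check does not cover it.

```shell
#!/bin/sh
# Print the name of every given command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: check the build tools named in the answer before starting the installer.
#   missing_tools gcc make pkg-config
# Empty output means all of them were found.
```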

Previous state (image not preserved): NVIDIA 470 needed updating to 510.

Final outcome (image not preserved). Wish it was cleaner.

answered Apr 23, 2022 at 16:44

Source

Arch Linux

#1 2020-11-16 20:15:12

Nvidia driver problems

My machine has an Nvidia GeForce GTX 1060 6GB graphics card and I want a working driver for it.
Unfortunately the «nvidia» package has a bug that makes the system freeze (see https://forums.developer.nvidia.com/t/4 … nts/155250 and https://bbs.archlinux.org/viewtopic.php?id=260131)

aplattne … 5.38.patch 28
2. Apply it to the .run package with
bash NVIDIA-Linux-x86_64-455.38.run --apply-patch reduce-kmalloc-limit-455.38.patch
3. Install the resulting .run package
bash NVIDIA-Linux-x86_64-455.38-custom.run

The problem is that I failed to install this driver.

When I run NVIDIA-Linux-x86_64-455.38-custom.run
I get:

ERROR: An NVIDIA kernel module ‘nvidia-drm’ appears to already be loaded in
your kernel. This may be because it is in use (for example, by an X
server, a CUDA program, or the NVIDIA Persistence Daemon), but this
may also happen if your kernel was configured without support for
module unloading. Please be sure to exit any programs that may be
using the GPU(s) before attempting to upgrade your driver. If no
GPU-based programs are running, you know that your kernel supports
module unloading, and you still receive this message, then an error
may have occured that has corrupted an NVIDIA kernel module’s usage
count, for which the simplest remedy is to reboot your computer.

Source

unixforum.org

Forum for users of UNIX-like systems

Nvidia driver won't install on Debian

Post by rain_99 » 30.07.2016 23:10

So, comrades.
Background:
I have Debian stable.

It has an Nvidia graphics card in it.

It suddenly turns out that the card outputs no signal over HDMI.
I reinstall the same driver, and it tells me:

Re: Nvidia driver won't install on Debian

Post by Bizdelnick » 30.07.2016 23:16

Re: Nvidia driver won't install on Debian

Post by rain_99 » 30.07.2016 23:32

I installed it the usual way.

Re: Nvidia driver won't install on Debian

Post by Bizdelnick » 30.07.2016 23:43

That is not the usual way. The usual way is apt-get install nvidia-driver.

Re: Nvidia driver won't install on Debian

Post by rain_99 » 30.07.2016 23:48

Source

How to solve An NVIDIA kernel module nvidia appears to already be loaded in your kernel

February 10, 2020

Junyong Lee

Ph.D. candidate at POSTECH

When I tried to install a new NVIDIA driver, the error message "An NVIDIA kernel module 'nvidia' appears to already be loaded in your kernel" appeared.

We can resolve it by first finding the processes that are using the nvidia module,

and then killing all the processes using it:
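The commands from the original post did not survive in this copy. One common approach, offered here only as an assumption and not as the author's commands, is to list the processes that hold the `/dev/nvidia*` device files open with `lsof` and then stop them. The helper name `pids_from_lsof` is hypothetical.

```shell
#!/bin/sh
# Extract the unique process IDs (second column) from lsof output on stdin,
# skipping the header row.
pids_from_lsof() {
    awk 'NR > 1 { print $2 }' | sort -un
}

# Example on a live system (assumption: the driver exposes /dev/nvidia* devices):
#   sudo lsof /dev/nvidia* | pids_from_lsof
# Review the list first, then stop each process, for example with: sudo kill PID
```

Stopping the owning service (such as the display manager) is usually gentler than killing individual processes, so it is worth checking what each PID belongs to first.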

Updated: February 10, 2020

Source

An NVIDIA kernel module ‘nvidia-uvm’ appears to already be loaded in your kernel #62

I’m upgrading my driver and hitting this error during install.

I’ve hit it before and found that nvidia-docker was causing the problem. Can you remind me which command I need to run to be able to move ahead with my driver installation?

Solved it with some combination of these commands:

service nvidia-docker stop should do it (if UVM is only used by nvidia-docker)

I also encountered the same problem. The reason for the error was that I accidentally selected "Install nvidia driver" during the installation of cuda.

So, during the installation of CUDA, when you encounter the following options:

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit:

Please select q, and the problem will be solved.

Source

Arch Linux

#1 2017-03-15 00:05:40

nvidia_drm module

I’m trying to find out what the hell the nvidia_drm module does. After my last system upgrade X would no longer display anything on screen. After 4 hours of pretty blind poking about I blacklisted the module nvidia_drm and now everything works again. If I enable it X stops working again. So it’s pretty reproducible. But I’d be very interested to know what is going on.

#2 2017-03-15 08:43:46

Re: nvidia_drm module

#3 2017-03-15 10:03:22

Re: nvidia_drm module

No. I use X with i3. So basically I have startx run from my bash_profile, with i3 as the window manager.

#4 2017-03-15 10:43:45

Re: nvidia_drm module

A bit of digging around in sources and found this in a README:

Chapter 33. Direct Rendering Manager Kernel Modesetting (DRM KMS)

Apparently it is still considered «experimental».

#5 2017-03-15 18:11:02

Re: nvidia_drm module

Yes, but the information that it impacts plain stupid old X11 is new to me.
(gnome might try to run on wayland what presently doesn’t work because nvidia is on eglstreams rather than gbm)

Do you at least use some fancy OpenGL compositor? (eg. compton)? Please? 😉

#6 2017-03-15 19:29:42

Re: nvidia_drm module

I have to admit this is kind of above my pay grade. I know almost nothing about graphics and the only time I ever open a GUI is Firefox. Spend 90% of my time in URXVT, never play games and really hardly need graphics at all.

So the answer is: I don’t even know what I’ve got. But I did a vanilla Arch install, do pacman -Syu most days and have a dell laptop with an Nvidia card in it. FWIW.

#7 2017-03-15 19:47:15

Re: nvidia_drm module

i3 isn’t compositing itself, so unless you deliberately setup some 3rd compositor (as mentioned, compton is popuplar — these things make windows translucent and add shadows etc.) you’re not running one.

Please consider dumping "lspci | grep VGA", "pacman -Qi nvidia" and an xorg log from a failed session (e.g. Xorg.1.log right after successfully logging in after a failed login; if in doubt, watch the timestamps) for the record (in case similar reports come in), if you can spare the time and the effort?
Many thanks in advance 😉
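The three items requested above can be gathered into one file. This is only a sketch: the function name, the default output file gpu-report.txt and the default log path are assumptions for illustration, and each command is allowed to fail so that a missing tool does not abort the report.

```shell
#!/bin/sh
# Sketch: collect the requested diagnostics into a single report file.
# Assumptions: collect_gpu_report, gpu-report.txt and the default log path
# are illustrative; pass the log of the failed session as the second argument.
collect_gpu_report() {
    out="${1:-gpu-report.txt}"
    xorg_log="${2:-/var/log/Xorg.0.log}"
    {
        echo "== lspci | grep VGA =="
        lspci 2>&1 | grep VGA || true
        echo "== pacman -Qi nvidia =="
        pacman -Qi nvidia 2>&1 || true
        echo "== $xorg_log =="
        cat "$xorg_log" 2>&1 || true
    } > "$out"
}

# Usage: collect_gpu_report report.txt /var/log/Xorg.1.log
```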

#8 2017-03-15 20:10:54

Re: nvidia_drm module

@seth thought this might help: tested on a system with a discrete nvidia GPU, no integrated GPU, using the nvidia packages
no blacklisting

xorg.log with blacklisting

#9 2017-03-15 20:49:01

Re: nvidia_drm module

It expectedly fails to load the modesetting driver, but the nvidia driver doesn't seem to be considered at all.

#10 2017-03-15 21:01:12

Re: nvidia_drm module

xorg.log excerpt without blacklisting

#11 2017-03-15 23:27:06

Re: nvidia_drm module

@seth sorry for the delay. Hope I got this right. I logged in successfully. Then I loaded the module nvidia_drm and then I logged back in again.

I should add that the only way to exit the black screen that I know of is Ctrl+Alt+Backspace

So first here is the output of lspci:

Next here is the output from pacman:

Finally the Xorg contents (/var/log/Xorg.0.log):

Last edited by scandox (2017-03-15 23:28:02)

#12 2017-03-16 07:57:11

Re: nvidia_drm module

The bottom line is that w/o nvidia_drm, the nvidia xorg driver isn’t met.
What if you "convince" xorg to load the nvidia driver? (Having nvidia-settings write an xorg.conf should provide a screen section linking a device section that configures the nvidia driver; then block or maybe just unload nvidia_drm and try to start X11.)

@scandox
There's much more going on: lspci only lists an intel IGP, xorg matches intel, nouveau and nvidia, all of them plus the modesetting and vesa drivers are loaded, then ultimately modesetting and nvidia remain.
I can see where intel kms and nvidia_drm might run into conflicts (the modesetting driver tries to cover the nvidia GPU), but I don't understand the lspci output (contrasting with the xorg log).

Is the lspci output complete?
What about "lspci | grep -i nvidia"?
Why is nouveau considered, "lsmod | grep nouveau"? Why is xf86-video-nouveau installed in the first place?
It's also weird that the modesetting driver controls the intel chip even though xf86-video-intel seems to be installed.

[ 271.053] (II) LoadModule: "ramdac"
[ 271.053] (II) Module "ramdac" already built-in
[ 271.303] (II) NVIDIA(0): Validated MetaModes:
[ 271.303] (II) NVIDIA(0): "NULL"
[ 271.303] (II) NVIDIA(0): Virtual screen size determined to be 640 x 480
[ 271.303] (WW) NVIDIA(0): Unable to get display device for DPI computation.
[ 271.422] (II) NVIDIA(0): Built-in logo is bigger than the screen.
[ 271.422] (II) NVIDIA(0): Setting mode "NULL"
[ 271.427] (WW) NVIDIA(0): Option "PrimaryGPU" is not used
[ 271.427] (II) Loading sub module "dri2"
[ 271.427] (II) LoadModule: "dri2"
[ 271.427] (II) Module "dri2" already built-in
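When reading a log like the excerpt above, it can help to look only at the warning (WW) and error (EE) lines. A minimal sketch, assuming the usual log layout of a bracketed timestamp followed by the marker; the function name xorg_warnings and the default path are illustrative:

```shell
#!/bin/sh
# Sketch: print only the (WW) and (EE) lines of an Xorg log.
# Assumption: each line starts with a bracketed timestamp, as in the excerpt.
xorg_warnings() {
    grep -E '^\[[^]]*\] \((WW|EE)\)' "${1:-/var/log/Xorg.0.log}"
}

# Usage: xorg_warnings /var/log/Xorg.0.log
```

Applied to the excerpt, this leaves the DPI computation warning and the unused "PrimaryGPU" option.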

Source

Hello,
After updating the Proxmox kernel, my Nvidia drivers no longer seem to work in my LXC container running Plex.
I uninstalled the driver from both the hypervisor and the LXC.
I installed the latest driver (NVIDIA-Linux-x86_64-510.54.run) on the hypervisor.
However, when I try to install the same driver in the LXC I get this error:
"An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel"

I tried modprobe -r nvidia-drm but it did not help.
Any hints on how to fix this?

PS: Is there any way to avoid this happening on every kernel update?

Thank you!

Last edited: Mar 7, 2022

dcsapak

Proxmox Staff Member

the lxc cannot load kernel modules, so that log seems normal? (since it’s already loaded on the host)
maybe there is a way to install the driver without loading the module?

Neobin

Famous Member

Do you use the --no-kernel-module option for the nvidia-driver installation inside the LXC?

Of course I did .. forgot!

Thank you!
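For the record, the suggestion above amounts to running the installer inside the container with the --no-kernel-module option, so that only the user-space parts are installed and the module already loaded on the host is left alone. A minimal sketch; the function name is made up, and the default file name is the installer mentioned in the question:

```shell
#!/bin/sh
# Sketch: run the NVIDIA .run installer inside the LXC without building or
# loading a kernel module (the host kernel already provides it).
# Assumption: install_userspace_only is an illustrative name.
install_userspace_only() {
    installer="${1:-NVIDIA-Linux-x86_64-510.54.run}"
    sh "$installer" --no-kernel-module
}

# Usage (as root, inside the container): install_userspace_only
```

The driver version inside the container should be the same one installed on the hypervisor, as in the original post.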

Source

Hello,

Nvidia suggested

1. Download http://people.freedesktop.org/~aplattne … 5.38.patch 28
2. Apply it to the .run package with
bash NVIDIA-Linux-x86_64-455.38.run --apply-patch reduce-kmalloc-limit-455.38.patch
3. Install the resulting .run package
bash NVIDIA-Linux-x86_64-455.38-custom.run

The problem is that I failed to install this driver:

When I run NVIDIA-Linux-x86_64-455.38-custom.run
I get:

So I removed the nvidia driver package (pacman -R nvidia) and rebooted. It rebooted into a text console and I ran NVIDIA-Linux-x86_64-455.38-custom.run again, but it tells me to unload Nouveau. I tried modprobe -r nouveau, but this returned "modprobe: FATAL: Module nouveau is in use.". I also tried rmmod -f nouveau, but this made my screen go dark. The same problem arises when I try to execute it while arch-chrooting into my system from the Arch ISO.

Any ideas on how to install this driver?
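Before attempting to unload a module that modprobe reports as "in use", it can be useful to see what lsmod says about it. The sketch below is a diagnostic aid only, not a fix; the function name module_users is made up, and it simply reads lsmod-style text on standard input.

```shell
#!/bin/sh
# Sketch: report the use count and dependent modules for one module,
# reading lsmod-format text ("Module Size Used by") on stdin.
# Assumption: module_users is an illustrative name.
module_users() {
    awk -v m="$1" '$1 == m {
        print "use count: " $3
        if ($4 != "") print "used by: " $4
    }'
}

# Usage: lsmod | module_users nouveau
```

A non-zero use count with no module listed after it typically means something other than a kernel module (for example the console framebuffer or a display server) is holding the driver.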

Source

I'm getting the following error when trying to apply a publicly available AWS DL AMI to an EMR cluster (emr-6.2.0, Spark 3.0.1)

From the puppet.log file from the app-phase:

ERROR: An NVIDIA kernel module 'nvidia' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.

Linux version within the cluster is

Linux ip-10-14-1-68 4.14.26-46.32.amzn1.x86_64 #1 SMP Fri Mar 30 22:29:54 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
NAME="Amazon Linux AMI"
VERSION="2018.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2018.03"
PRETTY_NAME="Amazon Linux AMI 2018.03"
ANSI_COLOR="0;33"
CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
HOME_URL="http://aws.amazon.com/amazon-linux-ami/"

DLAMI = ami-058964fc61ad6c7c8
Tensorflow version = 2.4.1

Is there any fix or workaround to this issue? Some clean way to remove Nvidia perhaps and allow it to be reinstalled or kill some processes holding onto it? or something else.
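One way to see which processes are holding the GPU, as the error message asks, is to check who has the NVIDIA device nodes open. This is a hedged sketch, assuming the fuser tool (from psmisc) is installed and that the driver exposes its devices as /dev/nvidia*; the function name gpu_holders is made up.

```shell
#!/bin/sh
# Sketch: list processes that have NVIDIA device nodes open.
# Assumptions: fuser (psmisc) is available; devices appear as /dev/nvidia*.
gpu_holders() {
    # Default to all NVIDIA device nodes when no paths are given.
    [ "$#" -gt 0 ] || set -- /dev/nvidia*
    for dev in "$@"; do
        [ -e "$dev" ] || continue      # skip an unmatched glob or missing node
        fuser -v "$dev" 2>&1 || true   # fuser exits non-zero when nothing holds it
    done
}

# Usage (as root): gpu_holders
```

Processes listed here (for example the NVIDIA Persistence Daemon named in the error) would need to be stopped before the module can be unloaded.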

Source
