[Impact]
Sometimes, auditd will get stuck when starting up, causing systemd to kill it after a while since it (systemd) never got the start notification.
Upstream troubleshooted this to be caused by calling a syslog() function inside a signal handler.
[Test Case]
There is no reliable test case to reproduce the bug, other than trying the fixed packages on an affected system where the hang occurs more frequently.
Basically:
sudo systemctl stop auditd
sudo systemctl start auditd
should work reliably. Do not run that in a tight loop, however, as that will trigger a it’s-restarting-too-frequently failure.
[Where problems could occur]
— if auditd fails to start, then the first fallback is syslog, and if that is not picking up the audit messages, the last resort is the kernel buffer, which can fill up. In the case it fills up, audit logs will be lost.
— it’s possible to configure the audit system to panic() the machine if audit messages are lost or otherwise not able to be recorded (auditctl -f 2; default is 1 which is printk())
— the update restarts auditd as expected. Misconfiguration on very very busy systems could mean that audit logs would be lost during the brief moment the service is restarted. If that’s the case, this update would just be one more way to trigger it, but not be the root cause of the problem
— similarly, as is usual with updates that restart services, it’s possible than an incorrect configuration for auditd is present, but was never loaded before. The restart will load the config, and will fail in such a case.
— this update removes a logging statement that occurs during startup:
(«dispatcher %d reaped», pid)
It’s unlikely, but possible, that some monitoring software could be looking for that message in the logs. It won’t be there anymore after this update.
[Other Info]
The patch is committed upstream and part of the 2.8.5 release, which is present in Focal and later.
The real fix for this bug is just dropping the audit_msg() call in the signal handler code. But the original reporter of the bug, who is also who came up with the fix (see https://bugzilla.redhat.com/show_bug.cgi?id=1587995#c4) stated that with the 3 changes in the patch the startup hang didn’t happen to him anymore. Since this bug is difficult to reproduce elsewhere (either you have it, or you don’t), I chose to keep the 3 changes instead of just the removal of the audit_msg() call.
[Original Description]
This happens sometimes when installing auditd on Ubuntu 18.04.2, most installations work successfully, though. Re-running the install also fixes the issue, but the failure breaks our automation. The log from the failure looks like this:
# apt install auditd
…
Setting up auditd (1:2.8.2-1ubuntu1) …
Created symlink /etc/systemd/system/multi-user.target.wants/auditd.service → /lib/systemd/system/auditd.service.
Job for auditd.service failed because a timeout was exceeded.
See «systemctl status auditd.service» and «journalctl -xe» for details.
invoke-rc.d: initscript auditd, action «start» failed.
● auditd.service — Security Auditing Service
Loaded: loaded (/lib/systemd/system/auditd.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Tue 2019-09-17 18:43:06 UTC; 11ms ago
Docs: man:auditd(8)
https://github.com/linux-audit/audit-documentation
Process: 9702 ExecStart=/sbin/auditd (code=killed, signal=KILL)
Sep 17 18:40:06 compute-node21 systemd[1]: Starting Security Auditing Service…
Sep 17 18:40:06 compute-node21 auditd[9703]: Started dispatcher: /sbin/audispd pid: 9705
Sep 17 18:40:06 compute-node21 audispd[9705]: No plugins found, exiting
Sep 17 18:41:36 compute-node21 systemd[1]: auditd.service: Start operation timed out. Terminating.
Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: State ‘stop-sigterm’ timed out. Killing.
Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Killing process 9702 (auditd) with signal SIGKILL.
Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Killing process 9703 (auditd) with signal SIGKILL.
Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Control process exited, code=killed status=9
Sep 17 18:43:06 compute-node21 systemd[1]: auditd.service: Failed with result ‘timeout’.
Sep 17 18:43:06 compute-node21 systemd[1]: Failed to start Security Auditing Service.
dpkg: error processing package auditd (—configure):
installed auditd package post-installation script subprocess returned error exit status 1
On a Redhat Linux 8 system, the /var/log/audit is mounted on a dedicated partition. Somehow the size of it was allocated quite small. And we needed to increase it.
After it was remounted to a bigger partition, auditd would not start. Tried to manually start the service and got the exit status code 6
root@joetest01:/var/log# systemctl start auditd
Job for auditd.service failed because the control process exited with error code.
See "systemctl status auditd.service" and "journalctl -xe" for details.
root@joetest01:/var/log# systemctl status auditd
● auditd.service - Security Auditing Service
Loaded: loaded (/usr/lib/systemd/system/auditd.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2021-12-02 20:22:49 EST; 4min 4s ago
Docs: man:auditd(8)
https://github.com/linux-audit/audit-documentation
Process: 1695914 ExecStart=/sbin/auditd (code=exited, status=6)
Dec 02 20:22:49 joetest01 systemd[1]: Starting Security Auditing Service…
Dec 02 20:22:49 joetest01 systemd[1]: auditd.service: Control process exited, code=exited status=6
Dec 02 20:22:49 joetest01 systemd[1]: auditd.service: Failed with result 'exit-code'.
Dec 02 20:22:49 joetest01 systemd[1]: Failed to start Security Auditing Service.
I Then tried to start the audit daemo from the command line to see if it would output something that was more useful.
/sbin/auditd -f
The command above will put it in debug mode where it write more info to stdout.
Also used “journalctl -xe” to see logs and saw “Permission denied” error.
root@joetest01:/var/log# journalctl -xe
Dec 02 20:22:49 joetest01 systemd[1]: Starting Security Auditing Service…
-- Subject: Unit auditd.service has begun start-up
-- Defined-By: systemd-- Unit auditd.service has begun starting up.
Dec 02 20:22:49 joetest01 kernel: kauditd_printk_skb: 1 callbacks suppressed
Dec 02 20:22:49 joetest01 kernel: audit: type=1400 audit(1638494569.921:9146): avc: denied { read } for pid=1695914 comm="auditd" name="/" dev="sda7" ino=128 scontext=system_u:system_r:auditd_t:s0 tconte>
Dec 02 20:22:49 joetest01 auditd[1695914]: Could not open dir /var/log/audit (Permission denied)
Dec 02 20:22:49 joetest01 auditd[1695914]: The audit daemon is exiting.
Dec 02 20:22:49 joetest01 systemd[1]: auditd.service: Control process exited, code=exited status=6
Dec 02 20:22:49 joetest01 systemd[1]: auditd.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- The unit auditd.service has entered the 'failed' state with result 'exit-code'.
Dec 02 20:22:49 joetest01 systemd[1]: Failed to start Security Auditing Service.
-- Subject: Unit auditd.service has failed
-- Defined-By: systemd
-- Unit auditd.service has failed.
-- The result is failed.
So the error gave me a clue that it might be related to SELinux security context. Check with “-Z” option of “ls” and it showed “unlabeled_t” instead of “auditd_log_t“
root@aheicsporaprd01:/var/log# ls -Zd /var/log/audit
system_u:object_r:unlabeled_t:s0 /var/log/audit
Restored the context of SELinux with the command “restorecon” and confirmed again:
root@aheicsporaprd01:/var/log# restorecon /var/log/audit
root@aheicsporaprd01:/var/log# ls -Zd /var/log/audit
system_u:object_r:auditd_log_t:s0 /var/log/audit
The service has started to work again:
root@joetest01:/var/log# systemctl start auditd
root@joetest01:/var/log# systemctl status auditd
● auditd.service - Security Auditing Service
Loaded: loaded (/usr/lib/systemd/system/auditd.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-12-02 20:27:28 EST; 5s ago
Docs: man:auditd(8)
https://github.com/linux-audit/audit-documentation
Process: 1701973 ExecStartPost=/sbin/augenrules --load (code=exited, status=1/FAILURE)
Process: 1701966 ExecStart=/sbin/auditd (code=exited, status=0/SUCCESS)
Main PID: 1701967 (auditd)
Tasks: 5 (limit: 1645960)
Memory: 2.9M
CGroup: /system.slice/auditd.service
├─1701967 /sbin/auditd
├─1701969 /sbin/audisp-syslog LOG_LOCAL6
└─1701970 /sbin/audisp-syslog LOG_LOCAL6
Dec 02 20:27:28 joetest01 audispd[1701970]: node=joetest01 type=SYSCALL msg=audit(1638494848.444:9360): arch=c000003e syscall=44 success=yes exit=1064 a0=3 a1=7ffd86465de0 a2=428 a3=0 items=0 ppid=170>
Dec 02 20:27:28 joetest01 audispd[1701969]: node=joetest01 type=SYSCALL msg=audit(1638494848.444:9360): arch=c000003e syscall=44 success=yes exit=1064 a0=3 a1=7ffd86465de0 a2=428 a3=0 items=0 ppid=170>
Dec 02 20:27:28 joetest01 audispd[1701970]: node=joetest01 type=SOCKADDR msg=audit(1638494848.444:9360): saddr=100000000000000000000000 SADDR={ saddr_fam=netlink nlnk-fam=16 nlnk-pid=0 }
Dec 02 20:27:28 joetest01 audispd[1701969]: node=joetest01 type=SOCKADDR msg=audit(1638494848.444:9360): saddr=100000000000000000000000 SADDR={ saddr_fam=netlink nlnk-fam=16 nlnk-pid=0 }
Dec 02 20:27:28 joetest01 audispd[1701970]: node=joetest01 type=PROCTITLE msg=audit(1638494848.444:9360): proctitle=2F7362696E2F617564697463746C002D52002F6574632F61756469742F61756469742E72756C6573
Dec 02 20:27:28 joetest01 audispd[1701969]: node=joetest01 type=PROCTITLE msg=audit(1638494848.444:9360): proctitle=2F7362696E2F617564697463746C002D52002F6574632F61756469742F61756469742E72756C6573
Dec 02 20:27:28 joetest01 audispd[1701970]: node=joetest01 type=EOE msg=audit(1638494848.444:9360):
Dec 02 20:27:28 joetest01 audispd[1701969]: node=joetest01 type=EOE msg=audit(1638494848.444:9360):
Dec 02 20:27:28 joetest01 audispd[1701970]: node=joetest01 type=SERVICE_START msg=audit(1638494848.446:9361): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=audi>
Dec 02 20:27:28 joetest01 audispd[1701969]: node=joetest01 type=SERVICE_START msg=audit(1638494848.446:9361): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=audi>
lines 1-25/25 (END)
Environment
- Red Hat Enterprise Linux 7
- audit
Issue
- Auditd service can’t be started with this error:
$ systemctl status auditd.service
● auditd.service - Security Auditing Service
Loaded: loaded (/etc/systemd/system/auditd.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-09-11 13:21:51 CEST; 4s ago
Docs: man:auditd(8)
https://github.com/linux-audit/audit-documentation
Process: 16697 ExecStart=/sbin/auditd (code=exited, status=1/FAILURE)
Sep 11 13:21:51 srv systemd[1]: Starting Security Auditing Service...
Sep 11 13:21:51 srv auditd[16698]: Started dispatcher: /sbin/audispd pid: 16700
Sep 11 13:21:51 srv auditd[16698]: Error setting audit daemon pid (File exists)
Sep 11 13:21:51 srv auditd[16698]: Unable to set audit pid, exiting
Sep 11 13:21:51 srv auditd[16698]: The audit daemon is exiting.
Sep 11 13:21:51 srv systemd[1]: auditd.service: control process exited, code=exited status=1
Sep 11 13:21:51 srv systemd[1]: Failed to start Security Auditing Service.
Sep 11 13:21:51 srv systemd[1]: Unit auditd.service entered failed state.
Sep 11 13:21:51 srv systemd[1]: auditd.service failed.
- Although there is no pid file
/var/run/auditd.pid
.
Resolution
-
Another tool than auditd has been registered to the kernel.
-
This can be found with
auditctl -s
:# auditctl -s enabled 1 failure 0 => pid 9307 rate_limit 10000 backlog_limit 4096 lost 12071 backlog 0 loginuid_immutable 0 unlocked
-
=> the pid 9307 is registered on the
NETLINK_AUDIT
and is not auditd, thus/var/run/auditd.pid
is empty:root 9307 2.2 0.4 462256 80052 ? SNl Aug26 874:02 /usr/bin/osqueryd
-
This tools is not provided by Red Hat, but we can found this information: (External Link !): https://osquery.readthedocs.io/en/stable/deployment/process-auditing/
Linux process auditing osquery uses the Linux Audit System to collect and process audit events from the kernel. It accomplishes this by monitoring the execve() syscall. Auditd should not be running when using osquery's process auditing, as it will conflict with osqueryd over access to the audit netlink socket. You should also ensure auditd is not configured to start at boot.
=> Thus, you can’t use auditd with osquery enabled.
-
Please disable osquery to be able to enable auditd.
Root Cause
- Auditd is not able to register its pid to the kernel (
audit_replace(requesting_pid)
) because a pid is already registered by another tool. - The message File exists is about the PID present in the kernel
- The source code corresponding on auditd daemon is:
/*
* This function returns -1 on error and 1 on success.
*/
int audit_set_pid(int fd, uint32_t pid, rep_wait_t wmode)
{
struct audit_status s;
struct audit_reply rep;
struct pollfd pfd[1];
int rc;
memset(&s, 0, sizeof(s));
s.mask = AUDIT_STATUS_PID;
s.pid = pid;
=> rc = audit_send(fd, AUDIT_SET, &s, sizeof(s)); <= it sends the struct s with AUDIT_STATUS_PID and the pid of auditd
if (rc < 0) {
audit_msg(audit_priority(errno),
=> "Error setting audit daemon pid (%s)", <= it receives an error as file exists
strerror(-rc));
return rc;
}
- On the kernel side:
if (s.mask & AUDIT_STATUS_PID) {
/* NOTE: we are using task_tgid_vnr() below because
* the s.pid value is relative to the namespace
* of the caller; at present this doesn't matter
* much since you can really only run auditd
* from the initial pid namespace, but something
* to keep in mind if this changes */
int new_pid = s.pid;
pid_t requesting_pid = task_tgid_vnr(current);
if ((!new_pid) && (requesting_pid != audit_pid))
return -EACCES;
=> if (audit_pid && new_pid &&
audit_replace(requesting_pid) != -ECONNREFUSED)
=> return -EEXIST; <= the return message EEXIST
if (audit_enabled != AUDIT_OFF)
audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
}
Diagnostic Steps
- in audit log (
/var/log/audit/audit.log
), we can see that theset-pid
is failing:
type=DAEMON_START msg=audit(1600773483.093:8108): op=start ver=2.8.5 format=raw kernel=3.10.0-862.6.3.el7.x86_64 auid=4294967295 pid=73492 uid=0 ses=4294967295 res=success
type=DAEMON_ABORT msg=audit(1600773483.093:8109): op=set-pid auid=4294967295 pid=73492 uid=0 ses=4294967295 res=failed
- If we strace the process, we can see that auditd is able to open and write the pid file:
76036 1600774118.371816 socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) = 3<socket:[681464412]> <0.000027>
76036 1600774118.371889 fcntl(3<socket:[681464412]>, F_SETFD, FD_CLOEXEC) = 0 <0.000030>
76036 1600774118.371970 open("/var/run/auditd.pid", O_WRONLY|O_CREAT|O_TRUNC|O_NOFOLLOW, 0644) = 4</run/auditd.pid> <0.000040>
76036 1600774118.372062 write(4</run/auditd.pid>, "76036n", 6) = 6 <0.000032>
76036 1600774118.372140 close(4</run/auditd.pid>) = 0 <0.000024>
- But when it tries to send the info to the kernel through the socket, it gets an error:
76036 1600774118.374116 sendto(3<socket:[681464412]>, "20350351", 16, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 16 <0.000088>
76036 1600774118.374263 poll([{fd=3<socket:[681464412]>, events=POLLIN}], 1, 500) = 1 ([{fd=3, revents=POLLIN}]) <0.000026>
76036 1600774118.374352 recvfrom(3<socket:[681464412]>, "$214)120350351", 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 <0.000027>
76036 1600774118.374432 recvfrom(3<socket:[681464412]>, "$214)120350351", 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 <0.000027>
76036 1600774118.374523 poll([{fd=3<socket:[681464412]>, events=POLLIN}], 1, 100) = 1 ([{fd=3, revents=POLLIN}]) <0.000028>
76036 1600774118.374602 recvfrom(3<socket:[681464412]>, "4350314)11[$20'20'/=", 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 52 <0.000023>
76036 1600774118.374684 sendto(3<socket:[681464412]>, "435135211", 52, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 52 <0.000041>
76036 1600774118.374779 poll([{fd=3<socket:[681464412]>, events=POLLIN}], 1, 500) = 1 ([{fd=3, revents=POLLIN}]) <0.000025>
76036 1600774118.374852 recvfrom(3<socket:[681464412]>, "$224)1004351352", 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 <0.000025>
76036 1600774118.374928 recvfrom(3<socket:[681464412]>, "$224)1004351352", 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 <0.000020>
76036 1600774118.375006 sendto(3<socket:[681464412]>, "435135344)1", 52, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 52 <0.000069>
76036 1600774118.375131 poll([{fd=3<socket:[681464412]>, events=POLLIN}], 1, 500) = 1 ([{fd=3, revents=POLLIN}]) <0.000023>
76036 1600774118.375216 recvfrom(3<socket:[681464412]>, "H234)1357377377377435135344)1", 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 72 <0.000026>
76036 1600774118.375302 recvfrom(3<socket:[681464412]>, "H234)1357377377377435135344)1", 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 72 <0.000028>
76036 1600774118.375424 write(2</dev/pts/0>, "Error setting audit daemon pid (File exists)", 44) = 44 <0.000029>
76036 1600774118.375503 write(2</dev/pts/0>, "n", 1) = 1 <0.000040>
- Even if there is no pid file
/var/run/auditd.pid
.
-
Product(s)
- Red Hat Enterprise Linux
-
Component
- audit
-
Category
- Troubleshoot
-
Tags
- audit
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.