- ZABBIX BUGS AND ISSUES
- ZBX-17826
Details
- Type: Problem report
- Status: Closed
- Priority: Trivial
- Resolution: Fixed
- Affects Version/s: 4.4.8
- Fix Version/s: 5.2 (plan)
- Component/s: Server (S)
- Labels: None
- Environment: Debian 10 Buster amd64, MariaDB 10.4, Zabbix server 4.4.8
- Team: Team D
- Sprint: Sprint 65 (Jun 2020)
Description
Steps to reproduce:
- Temporary spike in value cache usage above 75%
- The zabbix[vcache,cache,mode] item switches to 1 («low memory mode»)
- The spike drops back below 75%
- The zabbix[vcache,cache,mode] item stays at 1 («low memory mode»)
- Only restarting the Zabbix server daemon clears the trigger
Result:
Temporary spike:
zabbix[vcache,cache,mode]
Expected:
Item switches back to 0 after value cache % drops below 75%
Attachments
- image-2020-05-27-14-54-22-713.png (28 kB, 2020 May 27 15:54)
- image-2020-05-27-14-54-44-010.png (28 kB, 2020 May 27 15:54)
- image-2020-05-27-16-01-24-686.png (34 kB, 2020 May 27 17:01)
- image-2020-05-27-16-11-13-417.png (31 kB, 2020 May 27 17:11)
- image-2020-05-28-13-10-20-124.png (38 kB, 2020 May 28 14:10)
- memory statistics for value cache size.txt (22 kB, 2020 May 27 16:19)
People
- Assignee: Marina Generalova
- Reporter: Marco Hofmann
- Votes: 0
- Watchers: 5
Dates
- Created: 2020 May 27 15:55
- Updated: 2020 Jun 30 13:03
- Resolved: 2020 Jun 30 13:03
Links
- Homepage
- zabbix 4 database schema
- Zabbix compatibility matrix
- https://www.digitalocean.com/community/tutorials/introduction-to-queries-mysql
- compilation instructions
- Documentation
- Examples of Common Queries
- Custom scripts
- Various scripts to automate tasks in Zabbix
- Tuning mysql for zabbix
- https://huyabbix.com
- Migrating zabbix database with minimal downtime
- Bug tracker
- Clean up database
- Zabbix and selinux
- Apache/SSL checks
- Zabbix on RHEL/Centos
- Grafana
- Zabbix HA cluster: https://blog.zabbix.com/zabbix-ha-cluster-setups/8264/
- Active vs Passive
- Very old network diagram
- Fighting zabbix alert floods
Documentation
Triggers
- Trigger functions
Function str() searches for substrings
Installation
- Zabbix repositories
- Zabbix on Gentoo
Installing Zabbix from git
git clone https://github.com/zabbix/zabbix.git
cd zabbix
./bootstrap.sh
Zabbix API
- The Zabbix API
- API and python
Zabbix error codes
Z3005
Database issue
Items
Agent item keys
Item dialog
- Item documentation
- Zabbix agent items
Units
- B
- uptime
- unixtime
- s
proc.mem
proc.mem[<name>,<user>,<mode>,<cmdline>,<memtype>]
name
??
cmdline
regex like php-fpm:
memtype
- Notes on proc.mem memtypes
Item preprocessing
Preprocessing regular expressions
Templates
- Community templates
Template App MySQL
https://github.com/tiramiseb/zabbix-templates/blob/master/Template%20App%20MySQL.txt
TODO shouldn’t this be user zabbix?
mysql user account:
create user 'monitor'@'localhost' identified with auth_socket;
grant PROCESS, SHOW DATABASES, SHOW VIEW on *.* to 'monitor'@'localhost';
flush privileges;
(On MariaDB the socket plugin is unix_socket: identified via unix_socket.)
Configuration
Zabbix agent active
On client
Make sure the client can reach port 10051 on the server, and set:
ServerActive=zabbix.ser.ver
On server
Set Agent IP to 0.0.0.0
Zabbix and SQL
Find hosts with hostmacro defined
select h.host, m.macro, m.value from hosts h, hostmacro m where macro like '%FOO%' and h.hostid = m.hostid;
most frequent items in history_uint
select itemid,count(itemid) as freq from history_uint group by itemid order by freq desc limit 5;
and then
select name from items where itemid = whateveryoufind;
HOWTO
LLD with JSON
- LLD with JSON and dependent items
If you want to match on multiple keys, use a JSONPath filter like:
$[?(@.share=='{#FSTYPE}' && @.name=='{#NAME}')].size.first()
testing jsonpath preprocessing
In the test dialog, paste valid JSON into Value, then set the macro {#NAME} to some value.
Test trapper
FAQ
SERVER
Adjust loglevel
zabbix_server --runtime-control log_level_increase=trapper
Reload zabbix server configuration
You can’t, but you might want
zabbix_server -c /etc/zabbix/zabbix_server.conf -R config_cache_reload
No media defined for user
The frontend does not match Zabbix database.
Probably version conflict between frontend and server
value cache working in low memory mode
Increase ValueCacheSize
PROXY
Zabbix Proxy
Front end
Round numbers
Preprocessing javascript
2 decimals:
return Math.round(value* 100) / 100
0 decimals:
return Math.round(value)
Visible name vs hostname
Visible name: {HOST.NAME}
Hostname: {HOST.HOST}
Host IP: (as defined in Interface->IP/DNS) {HOST.CONN}
Acknowledge multiple items
Monitor->Problems apply filters, select all, mass update
No permissions to referred object or it does not exist!
Graph no longer exists. Probably items no longer discovered
Cannot add host
??
SNMP
Cannot find host interface on «xxxhost» for item key foo
Might mean you’re trying to import an SNMP template before configuring SNMP for the host
No SNMP data
snmp_parse_oid(): cannot parse OID «IF-MIB::ifSpeed.3»
Agent side ping check
UserParameter=pingtime[*],fping -e $1|sed 's/^.*(\([0-9].*\) ms).*$/\1/g'
UserParameter=pingalive[*],fping $1|grep -q alive;echo $?
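It is easy to get the sed expression in the pingtime UserParameter wrong, so it is worth testing on the command line before it goes into the agent config. Below is a minimal bash sketch. It assumes that fping -e prints lines of the form `<host> is alive (<n> ms)`, and the function name parse_fping_ms is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read fping -e output on stdin and print only the
# round-trip time in milliseconds; prints nothing for unreachable hosts.
# Assumes output lines look like: "192.0.2.1 is alive (0.52 ms)"
parse_fping_ms() {
  sed -n 's/^.* is alive (\([0-9.]*\) ms).*$/\1/p'
}

# Usage (needs fping installed; the address is an example):
#   fping -e 192.0.2.1 | parse_fping_ms
```

With `sed -n` and the `p` flag, a host that is not alive yields empty output instead of the raw fping line.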
LLD/Discovery
Discover: value must be a JSON object
Could mean you need to escape slashes, check output with zabbix_get
Cannot create item: item with the same key
make sure the key contains «{#MACRONAME}»
Discovery data example
Output of a discovery script should look like:
{"data":[ {"{#VAR1}":"value11","{#VAR2}":"value12"}, {"{#VAR1}":"value21","{#VAR2}":"value22"} ]}
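A discovery script only has to print JSON in the shape above. Below is a minimal bash sketch that assumes one hypothetical {#NAME} macro per command-line argument. It escapes backslashes and double quotes, which relates to the escaping note earlier in this section. It also assumes the names contain no newlines or other control characters. The function name lld_json is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical discovery script body: print Zabbix LLD JSON in the
# {"data":[...]} format, one {#NAME} entry per command-line argument.
# Assumes names contain no newlines or other control characters.
lld_json() {
  local first=1 name
  printf '{"data":['
  for name in "$@"; do
    # Escape backslashes first, then double quotes, so the JSON stays valid
    name=$(printf '%s' "$name" | sed 's/\\/\\\\/g; s/"/\\"/g')
    if [ "$first" -eq 1 ]; then
      first=0
    else
      printf ','
    fi
    printf '{"{#NAME}":"%s"}' "$name"
  done
  printf ']}\n'
}

# Usage: lld_json php-fpm nginx
```

Check the output with zabbix_get against the real UserParameter before blaming the discovery rule.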
IPMI
IPMI Monitoring account for zabbix
https://www.thomas-krenn.com/en/wiki/Configuring_IPMI_under_Linux_using_ipmitool
ipmitool user set name 3 monitor
ipmitool user set password 3
ipmitool channel setaccess 1 3 link=on ipmi=on callin=on privilege=2
ipmitool user enable 3
cannot connect to IPMI host: [22] Operation canceled
Usually temporary because of broken ipmi lib, ignore it
cannot connect to IPMI host: [16777411] Unknown error 16777411
classic, probably authentication problem
cannot connect to IPMI host: [22] Invalid argument
zabbix_sender
processed: 0; failed: 1
Possible causes:
- incorrect hostname
- incorrect item key
- item not in the server configuration cache yet
- Allowed hosts in trapper item
- phase of moon
- aliens
Testing zabbix_sender
zabbix_sender stuff
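When testing by hand, the summary line printed by zabbix_sender tells you whether the server accepted the value. Below is a small bash sketch that pulls out the failed counter so a test script can react to it. It assumes the summary has the form `info from server: "processed: 0; failed: 1; total: 1; seconds spent: 0.000055"` (check your version). The function name, and the server, host and key in the usage line, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read zabbix_sender output on stdin and print the
# "failed" counter from the first summary line that contains one.
# Assumes a summary such as:
#   info from server: "processed: 0; failed: 1; total: 1; seconds spent: 0.000055"
sender_failed_count() {
  sed -n 's/.*failed: \([0-9][0-9]*\);.*/\1/p' | head -n 1
}

# Usage (server, host and key are examples):
#   zabbix_sender -z zabbix.example.com -s myhost -k my.trapper.key -o 42 \
#     | sender_failed_count
```

A result other than 0 points to one of the possible causes listed under «processed: 0; failed: 1».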
Filters
The regular expressions referred to in discovery are found under Administration->General, and then «Regular expressions» in the dropdown at top right of the page
cannot connect to IPMI host: [125] Operation canceled
possibly authentication method issue
Calculated items
See Calculated items explained
Cannot create item: Invalid first parameter
Cannot create item, error in formula
Probably a calculated item, try doublequoting the item key:
last("foo[bar]")
Invalid parameter «/1/params»
Maybe forgot to use last()?
You might need to doublequote your items, or prepend with double slashes
Reset trigger/alert
For example when you changed the settings
Just disable, wait a bit and enable again.
Install recent zabbix on CentOS/RHEL
rpm -ivh https://repo.zabbix.com/zabbix/3.4/rhel/7/x86_64/zabbix-release-3.4-2.el7.noarch.rpm
yum install zabbix-agent
Backing up tables
https://www.zabbix.org/wiki/Docs/howto/mysql_backup_script
cannot send list of active checks
If in agent log: most likely ServerActive is defined in agent config, while not used at all
It is also possible agent is sending some active check to server while host is monitored via proxy.
active check configuration update started to fail
??
Latest 20 issues
DEFAULT_LATEST_ISSUES_CNT in /usr/share/zabbix/include/defines.inc.php
Zabbix unreachable poller processes more than 75% busy
Increase StartPollersUnreachable
Zabbix poller processes more than 75% busy
another mystery
More than 100 items having missing data for more than 10 minutes
Could be high load. Also check Administration->Queue
Zabbix escalator processes more than 75% busy
probably high system load overall
Check agent
zabbix_get -s my.host.com -k agent.version
ZBX_NOTSUPPORTED
Could be anything, enable logging on agent. It could be version mismatch. Check
zabbix_get -s yourhost -k agent.version
If that works, you’re calling for an undefined or unsupported key.
Incorrect trigger expression. Host «xx» does not exist or you have no access to this host.
Means there’s no related item.
zabbix_get returns nothing
best look at log on agent side
run playbook on single host
ansible-playbook -l somehost someplay.yml
Category:Monitoring
Zabbix server is not running: the information displayed may not be current
Might be selinux: http://sysads.co.uk/2013/11/zabbix-server-running-alert/
Monitoring vmware
vmware.hv.cpu.usage[{$URL},{HOST.HOST}]» became not supported: Couldn’t resolve host name
Set macro {$URL} to https://your.ip/sdk/ (shouldn't discovery figure that out from {$HOST}?)
Couldn’t resolve host name
Sometimes it’s a matter of waiting a few hours
vmware events collector returned empty result
???
No «vmware collector» processes started.
Check StartVMwareCollectors on server or proxy
unsupported item key
This might mean it’s expecting a value from the script you’re calling.
echo 1
became not supported: Not supported by Zabbix Agent
probably output by userparameter/script
ansible or API not showing host groups
Permissions!! See Administration->User Groups
failed to update local proxy configuration copy: invalid field name «items.lastlogsize»
check everything
Received value [11] is not suitable for value type [Numeric (unsigned)] and data type [Decimal]
This probably means the agent returned 1\n1 (two lines of output)
database is down: retrying in 10 seconds
try upping max_connections
Incorrect key file for table ‘items’; try to repair it
Could be something /tmp related
another network error, wait for 8 seconds
UnreachableDelay=8
failed: first network error
Setting Timeout in server configuration
also Timeout in agents?
no active checks on server
- Hostname in the agent config must match the host name on the server (check with zabbix_get -k agent.hostname)
- Simply no connection possible? Firewall?
???
show cpu utilization
Monitoring->host->graphs
fuzzytime on command line
TS=lotsofseconds
- output in hours
echo $(( ($(date +%s) - $TS) / 3600 ))
duplicate entry adding user/group
Check table ‘ids’
Link to the original text: http://blog.51cto.com/net881004/2067677
1. Zabbix alarm: Zabbix icmp pinger processes more than 75% busy
[root@localhost zabbix]# vi /etc/zabbix/zabbix_server.conf
Set StartPingers=5, then restart the zabbix-server service.
2. zabbix unreachable poller processes more than 75% busy
The unreachable poller processes were constantly busy. What does this mean? According to the Zabbix documentation on internal processes, the unreachable poller is the poller for unreachable devices.
Possible situations:
1. A device that collects data through the Zabbix agent is being monitored, but the machine crashes or something else kills the Zabbix agent. The server cannot get the data, and the unreachable poller kicks in.
2. A device that collects data through the Zabbix agent is being monitored, but the server takes too long to get data from the agent, often exceeding the server's timeout, and the unreachable poller load rises.
3. The MySQL database behind Zabbix hangs, Zabbix server I/O may hang, or the Zabbix process has too little memory allocated.
The simple fix is to increase the number of processes started with Zabbix server. This directly increases polling capacity and proportionally reduces how often the pollers are busy.
[root@localhost zabbix]# vi /etc/zabbix/zabbix_server.conf
Set StartPollers=500, then restart the zabbix-server service. You can also restart the zabbix service regularly.
3. Zabbix alerter processes more than 75% busy
Received hundreds of Zabbix alerts:
Zabbix alerter processes more than 75% busy
Possible causes:
a problem with the Zabbix database
I/O load on the Zabbix server
the Zabbix process has too little memory allocated
network delay or failure
Fix:
[root@localhost zabbix] vim /etc/zabbix/zabbix_server.conf
Change the default value from 5 to 20:
StartPollers=500
Change:
# StartDiscoverers=1
StartDiscoverers=100
4. The zabbix-server service hangs: after being started it stops automatically, and the log contains many errors
Alerts:
Zabbix value cache working in low memory mode
Less than 25% free in the configuration cache
[root@localhost zabbix] cat /var/log/zabbix/zabbix_server.log
6278:20180320:190117.775 using configuration file: /etc/zabbix/zabbix_server.conf
6278:20180320:190117.807 current database version (mandatory/optional): 03020000/03020001
6278:20180320:190117.807 required mandatory version: 03020000
6278:20180320:190118.378 __mem_malloc: skipped 0 asked 136 skip_min 4294967295 skip_max 0
6278:20180320:190118.378 [file:dbconfig.c,line:653] zbx_mem_malloc(): out of memory (requested 136 bytes)
6278:20180320:190118.378 [file:dbconfig.c,line:653] zbx_mem_malloc(): please increase CacheSize configuration parameter
6354:20180320:190128.632 Starting Zabbix Server. Zabbix 3.2.10 (revision 74337).
[root@localhost zabbix] vi /etc/zabbix/zabbix_server.conf
### Option: CacheSize
# Size of configuration cache, in bytes.
# Shared memory size for storing host, item and trigger data.
#
# Mandatory: no
# Range: 128K-8G
# Default:
# CacheSize=8M
CacheSize=2048M
[root@localhost zabbix]# systemctl restart zabbix-server
Note: 700 hosts were added in bulk today, which caused the memory overflow.
5. The zabbix-server log reports: connection to database 'zabbix' failed: [1040] Too many connections, while MariaDB itself is fine. This should be a problem with the MySQL maximum number of connections.
Reference for changing the MySQL maximum number of connections:
http://blog.51cto.com/net881004/2089198
6. Alerts report more than 100 items having missing data for more than 10 minutes, and Zabbix poller processes more than 75% busy.
Modify the configuration file to increase the number of threads and the cache sizes:
[root@localhost zabbix]# vim /usr/local/zabbix/etc/zabbix_server.conf
StartPollers=500
StartPollersUnreachable=50
StartTrappers=30
StartDiscoverers=6
CacheSize=1G
CacheUpdateFrequency=300
StartDBSyncers=20
HistoryCacheSize=512M
TrendCacheSize=256M
HistoryTextCacheSize=80M
ValueCacheSize=1G
7. Many «first network error, wait for 15 seconds» errors in the server log
I changed Timeout in the server configuration file to a larger value, 30s.
8. Zabbix alarm «Zabbix poller processes more than 75% busy» (contributed by netizens)
Causes of the alert:
1. A process is stuck,
2. Too many zombie processes, causing slowdowns
3. Network delay (can be ignored)
4. Zabbix consumes more memory
Severity:
A general warning; no harm for now (but it is better to deal with it)
Fix:
First: simple and crude (restart zabbix-server; can be combined with a scheduled task)
service zabbix-server restart
Run crontab -e to open the cron editor and add:
@daily service zabbix-server restart > /dev/null 2>&1
Second: edit the Zabbix server configuration file /etc/zabbix/zabbix_server.conf and find the section that configures StartPollers:
### Option: StartPollers
# Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollers=5
Uncomment the StartPollers= line or add it directly:
StartPollers=10
How far to raise StartPollers depends on the server's performance and the number of monitored items. After setting StartPollers to 12, no more alerts appeared. If there is enough memory, it can be set higher.
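A commented-out line such as # StartPollers=5 only documents the default, so after editing it helps to confirm which value is actually set. Below is a minimal bash sketch that prints every uncommented value of a parameter. It assumes the usual Key=Value format of zabbix_server.conf. If it prints more than one line, the parameter is set twice. The function name conf_get is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each uncommented value of parameter $1 from a
# Key=Value config file read on stdin. Lines starting with # are ignored;
# only the first "=" splits key and value, so values may contain "=".
conf_get() {
  awk -v key="$1" '
    /^[ \t]*#/ { next }
    index($0, "=") == 0 { next }
    {
      eq = index($0, "=")
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) print v
    }'
}

# Usage:
#   conf_get StartPollers < /etc/zabbix/zabbix_server.conf
```

Matching is on the exact key, so StartPollersUnreachable is not mistaken for StartPollers.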
9. In the morning I received many alert messages: the official website was unreachable and many server ports were reported down. Yet the official website could be reached from a mobile phone. The emails contained many alerts saying Zabbix alerter processes more than 75% busy, Zabbix http poller processes more than 75% busy, and port failures.
Since the Zabbix configuration had been optimized earlier, I don't think this is a Zabbix configuration problem. It may have been caused by an outage or delay on the network where Zabbix was located at the time. It was later confirmed that the network in the machine room had been down for 2 hours, and these alerts were sent after it recovered. It seems we need to monitor the Zabbix server itself from a remote location, and find time to look at Nagios.
1. System log output when starting the Zabbix agent
PID file /run/zabbix/zabbix_agentd.pid not readable (yet?) after start. zabbix-agent.service never wrote its PID file. Failing.
Restarting the zabbix agent service still does not let it start normally. Checking /log/zabbix/zabbix-agentd.log shows:
zabbix_agentd [5922]: cannot open log: cannot create semaphore set: [28] No space left on device
Make the following change:
# vim /etc/sysctl.conf
kernel.sem = 500 64000 64 256
# sysctl -p /etc/sysctl.conf
After that, the agent starts normally. (Reason: the kernel.sem setting was too small; the system default was 250 32000 32 128.)
Parameter meaning:
The four values correspond, in order, to four kernel parameters: SEMMSL, SEMMNS, SEMOPM and SEMMNI.
- SEMMSL: the maximum number of semaphores per semaphore set.
- SEMMNS: the maximum number of semaphores (not semaphore sets) in the entire Linux system.
- SEMOPM: the maximum number of operations a single semop() system call can perform.
- SEMMNI: the maximum number of semaphore sets in the entire system.
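Below is a small bash sketch for checking the change. It tests whether the current kernel.sem values are below the values used above. The current values can be read from /proc/sys/kernel/sem, which lists them in the order SEMMSL SEMMNS SEMOPM SEMMNI. The function name is hypothetical, and the target is simply this section's example, not a recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (return 0) if any of the four kernel.sem
# fields in $1 is lower than the corresponding field in $2.
# Field order: SEMMSL SEMMNS SEMOPM SEMMNI
sem_below_target() {
  local -a cur want
  local i
  read -r -a cur <<< "$1"
  read -r -a want <<< "$2"
  for i in 0 1 2 3; do
    if [ "${cur[$i]}" -lt "${want[$i]}" ]; then
      return 0
    fi
  done
  return 1
}

# Usage:
#   if sem_below_target "$(cat /proc/sys/kernel/sem)" "500 64000 64 256"; then
#     echo "kernel.sem is below the target; adjust /etc/sysctl.conf"
#   fi
```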
2. zabbix unreachable poller processes more than 75% busy
The unreachable poller processes are always busy. What does this mean? According to the official documentation on Zabbix internal processes, the unreachable poller is the poller for unreachable devices.
Possible situations:
- The device collecting data through the Zabbix agent is being monitored, but the machine crashes or something else kills the Zabbix agent. The server cannot get the data, and the unreachable poller load rises.
- The device collecting data through the Zabbix agent is being monitored, but the server takes too long to get data from the agent, often exceeding the server's timeout, and the unreachable poller load rises.
- MySQL may be stuck, Zabbix server I/O may be stuck, or the Zabbix process may have too little memory allocated.
Solution:
Increase the number of processes started with Zabbix server, which reduces how often they are busy:
[root@localhost zabbix]# vi /etc/zabbix/zabbix_server.conf
Set StartPollers=500, then restart the zabbix-server service. You can also restart the zabbix service regularly.
3. Zabbix alerter processes more than 75% busy
Received hundreds of zabbix alarm messages:
Zabbix alerter processes more than 75% busy
Possible causes:
- Database problems of zabbix
- IO overload of zabbix server
- The zabbix process is out of memory allocated
- Network delay or failure
Solution:
[root@localhost zabbix] vim /etc/zabbix/zabbix_server.conf
Change the default value of 5 to 20:
StartPollers=500
Change
StartDiscoverers=1
to
StartDiscoverers=100
4. zabbix-server hung and stopped automatically after it was started, with many errors in the log
Alerts:
Zabbix value cache working in low memory mode
Less than 25% free in the configuration cache
Log output:
[root@localhost zabbix] cat /var/log/zabbix/zabbix_server.log
6278:20180320:190117.775 using configuration file: /etc/zabbix/zabbix_server.conf
6278:20180320:190117.807 current database version (mandatory/optional): 03020000/03020001
6278:20180320:190117.807 required mandatory version: 03020000
6278:20180320:190118.378 __mem_malloc: skipped 0 asked 136 skip_min 4294967295 skip_max 0
6278:20180320:190118.378 [file:dbconfig.c,line:653] zbx_mem_malloc(): out of memory (requested 136 bytes)
6278:20180320:190118.378 [file:dbconfig.c,line:653] zbx_mem_malloc(): please increase CacheSize configuration parameter
6354:20180320:190128.632 Starting Zabbix Server. Zabbix 3.2.10 (revision 74337).
Solution:
Modify the configuration and restart:
[root@localhost zabbix] vi /etc/zabbix/zabbix_server.conf
### Option: CacheSize
# Size of configuration cache, in bytes.
# Shared memory size for storing host, item and trigger data.
# Mandatory: no
# Range: 128K-8G
# Default:
# CacheSize=8M
CacheSize=2048M
[root@localhost zabbix]# systemctl restart zabbix-server
5. zabbix server error: connection to database ‘zabbix’ failed: [1040] Too many connections, while MariaDB itself is running normally.
The cause should be the MySQL maximum number of connections.
1. Check the default maximum number of connections (the default is 151):
[root@localhost zabbix]# /usr/bin/mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 14872
Server version: 5.5.56-MariaDB MariaDB Server
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> show variables like 'max_connections';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 151 |
+-----------------+-------+
1 row in set (0.00 sec)
2. Modify the /etc/my.cnf configuration file and add a new line in [mysqld]:
[root@localhost zabbix]# vi /etc/my.cnf
max_connections=1000
Restart the mariadb service and verify the maximum number of connections:
[root@localhost zabbix]# systemctl restart mariadb.service
[root@localhost zabbix]# /usr/bin/mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 446
Server version: 5.5.56-MariaDB MariaDB Server
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> show variables like 'max_connections';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 214 |
+-----------------+-------+
1 row in set (0.00 sec)
3. Edit /usr/lib/systemd/system/mariadb.service to increase the number of open files. Add these two lines in [Service]:
[root@localhost zabbix]# vi /usr/lib/systemd/system/mariadb.service
LimitNOFILE=10000
LimitNPROC=10000
4. Reload the systemd configuration and restart the mariadb service:
[root@localhost zabbix]# systemctl --system daemon-reload
[root@localhost zabbix]# systemctl restart mariadb.service
5. Check again whether it is now 1000:
MariaDB [(none)]> show variables like 'max_connections';
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 5
Current database: *** NONE ***
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 1000 |
+-----------------+-------+
1 row in set (0.00 sec)
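Why did the value read 214 instead of 1000 before the LimitNOFILE change? A commonly cited explanation is that mysqld caps max_connections at what its open-files limit allows. The cap is roughly open_files_limit - 10 - 2 * table_open_cache. This sketch assumes the systemd default of 1024 open files and a table_open_cache of 400. Those are typical defaults, not values read from this server. With those inputs the arithmetic lands on 214. The bash function below is hypothetical, and the formula is an approximation of the server's internal logic.

```shell
#!/usr/bin/env bash
# Approximate cap mysqld places on max_connections when the open-files
# limit is too small: open_files_limit - 10 - 2 * table_open_cache.
# A simplification for illustration, not the server's exact logic.
max_conn_cap() {
  local open_files_limit=$1 table_open_cache=$2
  echo $(( open_files_limit - 10 - 2 * table_open_cache ))
}

# Assumed defaults: 1024 open files, table_open_cache 400
max_conn_cap 1024 400     # prints 214
# With LimitNOFILE=10000 the cap is well above the requested 1000
max_conn_cap 10000 400    # prints 9190
```

To check the inputs on a real server, run show variables like 'open_files_limit'; and show variables like 'table_open_cache'; in the MariaDB client.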
6. More than 100 items having missing data for more than 10 minutes, and Zabbix poller processes more than 75% busy.
Solution:
Modify the configuration file to increase the number of threads and the cache sizes:
[root@localhost zabbix]# vim /usr/local/zabbix/etc/zabbix_server.conf
StartPollers=500
StartPollersUnreachable=50
StartTrappers=30
StartDiscoverers=6
CacheSize=1G
CacheUpdateFrequency=300
StartDBSyncers=20
HistoryCacheSize=512M
TrendCacheSize=256M
HistoryTextCacheSize=80M
ValueCacheSize=1G
7. first network error, wait for 15 seconds
I increased Timeout in the server configuration file to 30s.
8. Zabbix poller processes more than 75% busy
Reasons:
1. A process is stuck
2. Too many zombie processes, causing slowdowns
3. Network delay (negligible)
4. Zabbix consumes too much memory
Severity:
A common alarm with no immediate harm (but best dealt with)
Solution:
- Simple and crude: restart the Zabbix server, which can be combined with a cron job:
service zabbix-server restart
Run crontab -e and add this entry in the editor:
@daily service zabbix-server restart > /dev/null 2>&1
- Edit the Zabbix server configuration file /etc/zabbix/zabbix_server.conf and find the section that configures StartPollers:
### Option: StartPollers
# Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollers=5
StartPollers=10
How far to raise StartPollers depends on the server's performance and the number of monitored items. After changing StartPollers to 12, the alarm did not return. If there is enough memory, it can be set higher.
9. Error: No route to host
Today I configured zabbix_agentd, and it registered with Zabbix server through auto-registration. In the host list in the web interface, however, ZBX is shown in red and the host cannot be monitored:
No route to host
From the client, telnet to port 10051 on the server works. But telnet from the server to client port 10050 fails:
telnet 1.1.1.1 10050
Trying 1.1.1.1...
telnet: connect to address 120.27.241.253: No route to host
The port was blocked by the client's firewall. Disable the client's firewall or configure a matching rule.
10. active check configuration update from [1.1.1.1:10051] started to fail (ZBX_TCP_READ() timed out)
Phenomenon:
/var/log/zabbix/zabbix_agentd.log contains many errors like:
1404:20161225:183259.913 active check configuration update from [1.1.1.1:10051] started to fail (ZBX_TCP_READ() timed out)
Reason:
For active checks the agent has to connect to the server on port 10051, but port 10051 was blocked by the firewall. Allowing the port again solved the problem.
11. Error in startup prompt after ZABBIX installation
[root@bogon zabbix-2.2.2]# /usr/local/zabbix-2.2.2/sbin/zabbix_server
/usr/local/zabbix-2.2.2/sbin/zabbix_server: error while loading shared libraries: libmysqlclient.so.16: cannot open shared object file: No such file or directory
The libmysqlclient.so.16 file cannot be found. Find it in the MySQL installation directory and create a symlink to it:
# ln -s /usr/local/mysql/lib/mysql/libmysqlclient.so.16 /usr/lib
Or add the library directory to /etc/ld.so.conf and run ldconfig afterwards:
# vim /etc/ld.so.conf
Add: /usr/local/mysql/lib
# ldconfig
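Before creating symlinks, it helps to list every library the binary cannot resolve, not just the first one the loader complains about. Below is a bash sketch that filters ldd output. It assumes ldd's usual `libname => not found` lines. missing_libs is a hypothetical name, and the binary path is the one from this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ldd output on stdin and print the name of each
# shared library the dynamic loader could not find.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd /usr/local/zabbix-2.2.2/sbin/zabbix_server | missing_libs
```

Once the symlink or the ld.so.conf entry is in place, the same command should print nothing.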
12.Received empty response from Zabbix Agent at [127.0.0.1]. Assuming that agent dropped connection because of access permissions.
It means that you don’t have access to agent port 10050
The solution is as follows:
Change the agent interface IP of the host on the server from 127.0.0.1 to the local IP address
Restart service
systemctl restart zabbix-server
13.Zabbix discoverer processes more than 75% busy
Increase the number of processes at the start of Zabbix Server, which reduces the busy situation
[root@zabbix-server ~]# vim /etc/zabbix/zabbix_server.conf
Change it to:
StartDiscoverers=5
Restart:
[root@zabbix-server ~]# systemctl restart zabbix-server
14. Zabbix agent fails to start
# tail -20 /var/log/zabbix/zabbix_agentd.log
.........................
zabbix_agentd [1232]: cannot create PID file [/var/run/zabbix/zabbix_agentd.pid]: [2] No such file or directory
zabbix_agentd [3847]: cannot create PID file [/var/run/zabbix/zabbix_agentd.pid]: [2] No such file or directory
zabbix_agentd [1724]: cannot create PID file [/var/run/zabbix/zabbix_agentd.pid]: [13] Permission denied
Solution:
[root@elkstack ~]# mkdir -p /var/run/zabbix/
[root@elkstack ~]# chown zabbix.zabbix /var/run/zabbix/
[root@elkstack ~]# systemctl restart zabbix-agent.service
15. Summary of errors reported in the web interface
Question 1: Zabbix alerter processes more than 75% busy
Cause of the problem:
The Zabbix server mail process is busy, usually because the action interval is set too short. In special circumstances a large number of alerts is generated (for example, the server sending thousands of e-mails) and the process hangs.
Solution:
- Delete from the database (high risk, not recommended)
- Temporarily replace the e-mail script with one that only logs the time, wait until the queued e-mails have been drained, and then change it back, as follows:
[root@m01 ~]# cat /usr/lib/zabbix/alertscripts/sms
#!/bin/bash
echo `date` >>/tmp/sms.txt
Question 2: Zabbix discoverer processes more than 75% busy
Cause of the problem:
- 01. Auto-discovery tasks are configured. Each configured discovery task occupies one process for a certain period of time, while the default zabbix_server.conf provides only one discoverer (the line is commented out, so the default applies)
- 02. To verify auto-discovery quickly, the «Delay» of the discovery task was reduced from the default 3600s to 60s
Solution:
- 01. Change the number of StartDiscoverers processes in the configuration file: remove the leading #, set the value to 5, and restart the service
(Note: depending on the hardware it can be set higher, but the range is 0-250)
[root@m01 ~]# grep 'StartDiscoverers' /etc/zabbix/zabbix_server.conf
### Option: StartDiscoverers
StartDiscoverers=5
[root@m01 ~]# systemctl restart zabbix-server.service
- 02. Add a cron job that restarts zabbix_server to reduce the load
[root@m01 ~]# crontab -e
# Restart the Zabbix service every day to end zombie processes and free memory
@daily service zabbix-server restart > /dev/null 2>&1
Question 3: Zabbix poller processes more than 75% busy
Cause of the problem:
- 01. The device that collects data through Zabbix agent crashes or other reasons cause Zabbix agent to die, and the server cannot get the data
- 02. It takes too long for the server to obtain data from the agent, which exceeds the timeout time set by the server
Solution:
- 01. Increase the number of processes initialized when Zabbix Server starts
### Option: StartPollers
StartPollers=10
(How high depends on the server's performance and the number of monitored items; if there is enough memory, it can be set higher.)
- 02. Set «Keep lost resources period» in the template's discovery rules to 0
Question 4: Zabbix housekeeper processes more than 75% busy
Cause of the problem:
To keep the database from growing continuously, Zabbix deletes old history data automatically; this mechanism is the housekeeper. While MySQL deletes the data its performance drops, and the alarm is raised.
Solution:
Adjust the HousekeepingFrequency parameter (Zabbix config files only allow comments on their own lines):
# Interval between housekeeping runs, in hours
HousekeepingFrequency=1
# Maximum number of rows deleted per task in one housekeeping cycle
MaxHousekeeperDelete=1000000
Problem 5: Zabbix server memory overflow, unable to start
Cause of the problem:
After using Zabbix for some time, adding another batch of switches to monitoring left Zabbix server unable to start. The log shows a memory overflow, so CacheSize in zabbix_server.conf needs to be adjusted:
2816:20170725:174352.675 [file:dbconfig.c,line:652] zbx_mem_realloc(): out of memory (requested 162664 bytes)
2816:20170725:174352.675 [file:dbconfig.c,line:652] zbx_mem_realloc(): please increase CacheSize configuration parameter
Solution:
# vim zabbix_server.conf
# The default is 8M
CacheSize=1024M
Question 6: PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 11 bytes)
Cause of the problem:
Some Zabbix pages cannot be opened. The PHP log shows this error whenever those pages are visited: there is not enough memory.
Solution:
It is not clear whether there is a memory leak. The easiest fix is to increase the memory available to the PHP process (the default is 128M):
[root@zabbix-master ~]# grep 'memory_limit' /etc/httpd/conf.d/zabbix.conf
php_value memory_limit 512M
16.cannot connect to [[172.16.2.225]:10050]: [113] No route to host
This is usually a network connectivity problem.
On the server, run telnet 172.16.2.225 10050. If it fails with the same error, check whether iptables and SELinux have been turned off.
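Where telnet is not installed, the same reachability test can be scripted. The following is a minimal Python stand-in for the telnet check above; the function name can_connect is hypothetical, and it only tests whether a TCP handshake to the agent port succeeds, not whether the agent will accept the server.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # "No route to host" (113), "Connection refused" (111) and timeouts
        # are all raised as OSError subclasses.
        print(f"cannot connect to [{host}]:{port}: {exc}")
        return False
```

Run from the Zabbix server, can_connect("172.16.2.225", 10050) should print the same kind of error as telnet when the route or port is blocked.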
17. zabbix server is not running: the information displayed may not be current.
Edit the zabbix.conf.php file and change the value of $ZBX_SERVER from localhost to the server's IP address:
vim /etc/zabbix/web/zabbix.conf.php
$ZBX_SERVER = '172.16.2.116';
18.
- Opening Profile in the Zabbix web interface shows the following error message:
scandir() has been disabled for security reasons [profile.php:198 → CView->
solve:
scandir is listed in disable_functions in php.ini. Remove scandir from disable_functions, then restart php-fpm and nginx.
- Error when adding a Windows host to monitoring:
Get value error: ZBX_TCP_READ() failed: [104] Connection reset by peer
solve:
The IP address configured in the Windows agent's zabbix_agentd.conf file is incorrect.
- Zabbix is open but no data is shown
Try another browser.
- Configuring zabbix-server to monitor IPMI
Compiling with the --with-openipmi parameter fails: configure: error: Invalid OPENIPMI directory - unable to find ipmiif.h
solve:
#yum install net-snmp-devel OpenIPMI OpenIPMI-devel rpm-build
19. zabbix_server dead but subsys locked error
Today we upgraded Zabbix from 3.2 to 3.4, but when zabbix_server was started, it reported "Server dead but subsys locked".
Cause of the problem
Checking the zabbix_server log, we found the following alarms:
zbx_mem_malloc(): out of memory (requested 256 bytes)
zbx_mem_malloc(): please increase CacheSize configuration parameter
The cause is clear: the configuration cache has run out of memory, so CacheSize must be increased.
Problem solving
Edit the zabbix_server.conf configuration file, find the CacheSize parameter, and increase CacheSize to suit your environment.
# Size of configuration cache, in bytes.
# Shared memory size for storing host, item and trigger data.
# Mandatory: no
# Range: 128K-8G
# Default:
CacheSize=32M
Finally, restart zabbix_server service.
Zabbix value cache working in low memory mode
Problem solving:
Edit the zabbix_server.conf configuration file, find the ValueCacheSize parameter, and increase ValueCacheSize to suit your environment.
# Option: ValueCacheSize
# Size of history value cache, in bytes.
# Shared memory size for caching item history data requests.
# Setting to 0 disables value cache.
#
# Mandatory: no
# Range: 0,128K-64G
# Default:
ValueCacheSize=2048M
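Before restarting, it can help to check that the edited values parse and fall inside the documented ranges. A minimal Python sketch, assuming sizes use the K/M/G suffixes shown above (other suffixes are not handled) and that parameters are written as Key=Value without spaces; all function names are hypothetical.

```python
import re

# Multipliers for the size suffixes used above (K, M, G assumed).
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '128K', '32M' or '8G' to bytes."""
    m = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(m.group(1)) * _MULTIPLIERS[m.group(2).upper()]


def configured_size(conf_text: str, key: str):
    """Return the first active (uncommented) value of `key` in bytes, or None if unset."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Commented defaults such as "# CacheSize=8M" do not match this prefix.
        if line.startswith(key + "="):
            return parse_size(line.split("=", 1)[1])
    return None


def in_range(size: int, low: str, high: str, allow_zero: bool = False) -> bool:
    """Check a size against a 'Range: low-high' comment; ValueCacheSize also allows 0."""
    if allow_zero and size == 0:
        return True
    return parse_size(low) <= size <= parse_size(high)
```

For example, in_range(configured_size(text, "CacheSize"), "128K", "8G") flags a CacheSize that the server would reject at startup.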
20. Error resolution:
- Errors when installing zabbix:
① Error: compiling zabbix keeps reporting that gcc cannot be found
Solution: install the development tools with:
yum -y groupinstall "Development Tools"
② Error: compiling zabbix reports that mysqlclient cannot be found
Solution: install mysql-devel with:
yum -y install mysql-devel
③ Error: opening 127.0.0.1/zabbix/setup.php returns 403 Forbidden
Solution: disable SELinux, either temporarily with the setenforce 0 command, or permanently by editing /etc/selinux/config (vim /etc/selinux/config), changing SELINUX=enforcing to SELINUX=disabled, and rebooting Linux.
- Errors during use:
① Error: the Zabbix running status shows No (not running)
Solution: first check whether the Zabbix service has been started; if not, start it with /etc/init.d/zabbix_server start.
If it still does not work, edit /var/www/html/zabbix/conf/zabbix.conf.php (vim /var/www/html/zabbix/conf/zabbix.conf.php) and change $ZBX_SERVER to the server's IP address (the default value is 127.0.0.1), then restart the zabbix_server service.
② Error: zabbix agent unreachable.
Solution: open /usr/local/etc/zabbix_agentd.conf with vim (the path may differ on your system) and check whether Hostname matches the host name configured in the frontend. If it differs, correct the host name, and set Server to the server's IP address.
③ Error: Lack of free swap space
Solutions: 1. Check the Swap space,
Command: free -m
If the Swap line shows no space (or is empty), no swap file exists.
2. Check the file system,
Command: df -hal
Check the returned information, and make sure there is enough hard disk space left.
3. Create the swap file,
Command: dd if=/dev/zero of=/swapfile bs=1024 count=2048000
Parameter interpretation:
if=file_name: input file name; defaults to standard input. Specifies the source file. < if=input file >
of=file_name: output file name; defaults to standard output. Specifies the destination file. < of=output file >
bs=bytes: set both the read (input) and write (output) block size to bytes
count=blocks: copy only this many blocks, each of the size given by bs.
4. Format and activate the Swap file,
Command:
format Swap: mkswap /swapfile
Activate swap: swapon /swapfile
View swap: swapon -s
Modify fstab configuration: vim /etc/fstab, add /swapfile swap swap defaults 0 0 at the end
Set the owner and permissions:
chown root:root /swapfile
chmod 0600 /swapfile
④ Error: a zabbix custom key is shown as not enabled, and the log contains a "bad interpreter" error
Solution: a .sh file created on Windows has Windows line endings, so every line ends with a hidden ^M (carriage return) character. When the script runs on Linux, the interpreter path on the first (#!) line includes that ^M, which cannot be found, and this produces the "bad interpreter" error. Open the file with vi -b to make the ^M characters visible, then delete them.
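If many scripts are affected, the ^M characters can be stripped programmatically instead of by hand in vi (tools such as dos2unix do the same job). A minimal Python sketch; the function name strip_carriage_returns is hypothetical, and it only removes carriage returns that directly precede a newline.

```python
from pathlib import Path


def strip_carriage_returns(path: str) -> int:
    """Turn CRLF line endings into LF in place; return the number of CRs removed."""
    p = Path(path)
    data = p.read_bytes()
    # Each b"\r\n" becomes b"\n"; a lone "\r" elsewhere in the file is left alone.
    fixed = data.replace(b"\r\n", b"\n")
    removed = len(data) - len(fixed)
    if removed:
        p.write_bytes(fixed)
    return removed
```

Running it a second time on the same file returns 0, so it is safe to apply to a whole directory of UserParameter scripts.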
21. Zabbix is an enterprise-level, open source, distributed monitoring solution with a web interface. Many people run into the same problems again and again when deploying and configuring Zabbix, and forget to write down the fixes after solving them temporarily. This is a very bad habit. This article summarizes the solutions to common errors for your reference.
Question 1:
After installing from source, the MySQL database cannot be used on the Zabbix web setup page.
Solution:
Zabbix needs PHP with mysqli support; when building PHP from source, add the --with-mysqli=mysqlnd parameter.
Question 2:
Running ./configure fails with: configure: error: Invalid Net-SNMP directory - unable to find net-snmp-config
Solution:
yum install -y net-snmp-devel libxml2-devel libcurl-devel
Question 3:
After entering the MySQL information on the Zabbix web setup page, the next step reports: The frontend does not match Zabbix database
Solution:
After confirming that the MySQL account information is correct, check whether the Zabbix database was initialized successfully. If the error persists, reinitialize the Zabbix database.
Question 4:
When installing Zabbix through the web page, the error "Unable to create the configuration file" appears
Solution:
Give the web server user write permission on the conf/ directory of the Zabbix web frontend; the configuration file will then be saved automatically.
Question 5:
After installing Zabbix, Chinese cannot be selected on the Admin > Admin profile page
Solution: edit zabbix/include/locales.inc.php in the Zabbix website directory (Chinese is supported, but hidden by default).
Find 'zh_CN' => ['name' => _('Chinese (zh_CN)'), 'display' => false] and change false to true.
Question 6:
After the frontend language is switched to Chinese, Chinese characters in graphs are displayed as squares
Solution:
[root@eazence ~]# cd /etc/nginx/html/zabbix/fonts/   # font path of the zabbix web frontend
[root@eazence fonts]# ls
DejaVuSans.ttf
[root@eazence fonts]# wget -c http://www.138096.com/simkai.ttf
[root@eazence fonts]# cp -p DejaVuSans.ttf DejaVuSans.ttf.bak
[root@eazence fonts]# mv -f simkai.ttf DejaVuSans.ttf   # after this step, refresh the web page
22.
- On the Zabbix Dashboard, under Status of Zabbix, "Zabbix server is running" shows the value "No".
Solution:
The database account in the Zabbix server configuration file may not have sufficient privileges on the Zabbix database; grant that account the required privileges on the database.
- An item receives no data and reports the following error:
Received value [0.05] is not suitable for value type [Numeric (unsigned)]
Solutions:
Check the Type of information configured for the item and change it to a suitable format: 0.05 is a decimal value, so it cannot be stored in an item of type Numeric (unsigned); Numeric (float) is needed.
Alternatively, raise CacheSize in the Zabbix server configuration file as much as possible.
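The type mismatch can be illustrated with a rough Python check. This is an approximation for illustration, not Zabbix's actual validation code: it assumes Numeric (unsigned) accepts only non-negative integers and Numeric (float) accepts anything Python can parse as a float. The function name is hypothetical.

```python
import re


def fits_value_type(value: str, value_type: str) -> bool:
    """Rough check of whether a received value suits an item's Type of information.

    value_type is "unsigned" for Numeric (unsigned) or "float" for Numeric (float).
    """
    text = value.strip()
    if value_type == "unsigned":
        # Non-negative integers only, so a decimal such as "0.05" is rejected.
        return re.fullmatch(r"\d+", text) is not None
    if value_type == "float":
        try:
            float(text)
        except ValueError:
            return False
        return True
    raise ValueError(f"unsupported value type: {value_type!r}")
```

With this sketch, fits_value_type("0.05", "unsigned") is False while fits_value_type("0.05", "float") is True, which mirrors the error message above.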
23. Error in importing percona template
Import failed Invalid XML tag "/zabbix_export/date": "YYYY-MM-DDThh:mm:ssZ" is expected.
Solution:
Import zabbix_agent_template_percona_mysql_server_ht_2.0.9-sver1.1.6.xml into Zabbix 2.4 and export it again, then import the newly exported XML into Zabbix 3.0. This solves the problem.
Percona template exported from zabbix3.0: Percona-MySQL-Server-Template
24. Zabbix server suddenly went down. The log shows the following error:
using configuration file: /etc/zabbix/zabbix_server.conf
...
[file:dbconfig.c,line:545] zbx_mem_malloc(): out of memory (requested 16 bytes)
[file:dbconfig.c,line:545] zbx_mem_malloc(): please increase CacheSize configuration parameter
The error message already states the fix: please increase CacheSize configuration parameter.
So we located the CacheSize field in zabbix_server.conf:
### Option: CacheSize
# Size of configuration cache, in bytes.
# Shared memory size for storing host, item and trigger data.
#
# Mandatory: no
# Range: 128K-8G
# Default:
# CacheSize=8M
Modify CacheSize according to the server's resources:
CacheSize=2048M
Then restart Zabbix server, and that's it.
25. Zabbix log error summary
zabbix_agentd.log
Error one
no active checks on server [*.*.*.*:10051]: host [*] not found
This error usually means that Hostname in zabbix_agentd.conf differs from the host Name configured in the Zabbix web frontend (Configuration -> Hosts).
Solution: on the Configuration -> Hosts page, change the host Name to match Hostname in zabbix_agentd.conf.
Error two
activecheck configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)
solve:
The agent is trying to fetch its active-check configuration from 127.0.0.1:10051, where nothing is listening. Edit /etc/zabbix/zabbix_agentd.conf and comment out the ServerActive line:
#ServerActive=127.0.0.1
And restart zabbix agent.
zabbix_server.log
1. failed to accept an incoming connection: connection from ". . . " rejected, allowed hosts: "127.0.0.1"
This is a configuration error in zabbix_agentd.conf.
vim /usr/local/zabbix/etc/zabbix_agentd.conf
Change:
Server=your server address
ServerActive=your server address
Hostname=your client name
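Both agent errors above come down to three parameters: Hostname must match the frontend host name, and the connecting server must be covered by Server. A minimal Python sketch to check a zabbix_agentd.conf against both, assuming Key=Value lines and that Server entries are IP addresses or CIDR ranges (DNS names are skipped, not resolved). The function names are hypothetical, and if a key appears twice this sketch simply keeps the last occurrence.

```python
import ipaddress


def parse_agent_conf(text: str) -> dict:
    """Map each Key=Value parameter to its value, skipping blank lines and # comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def hostname_matches(params: dict, frontend_host_name: str) -> bool:
    """True if Hostname is exactly the host name configured in the frontend."""
    return params.get("Hostname") == frontend_host_name


def peer_allowed(params: dict, peer_ip: str) -> bool:
    """True if peer_ip is covered by an IP or CIDR entry of the Server= list."""
    peer = ipaddress.ip_address(peer_ip)
    for entry in params.get("Server", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # a DNS name: not resolved in this sketch
        if peer.version == network.version and peer in network:
            return True
    return False
```

For the "rejected, allowed hosts: 127.0.0.1" case, peer_allowed(params, server_ip) returns False until the server's address is added to Server.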
27. zabbix_agentd.log
- Error one
no active checks on server [*.*.*.*:10051]: host [*] not found
This error usually means that Hostname in zabbix_agentd.conf differs from the host Name configured in the Zabbix web frontend (Configuration -> Hosts).
Solution:
On the Configuration -> Hosts page of the Zabbix web frontend, change the host Name to match Hostname in zabbix_agentd.conf.
- Error two
activecheck configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)
Solution:
Edit /etc/zabbix/zabbix_agentd.conf:
#ServerActive=127.0.0.1
and restart zabbix agent
28. failed to accept an incoming connection: connection from ". . . " rejected, allowed hosts: "127.0.0.1"
This is a configuration error in zabbix_agentd.conf.
# vim /usr/local/zabbix/etc/zabbix_agentd.conf
Change:
Server=Your server address
ServerActive=Your server address
Hostname=Your client name
In particular, Hostname in zabbix_agentd.conf must be the same as the host name in the web management interface.
29. Before logging in to Zabbix, make sure that nginx and php-fpm are running and that the Zabbix daemons have been started (service zabbix_server start, service zabbix_agentd start).
After an unexpected power failure, the following error appeared when logging in to Zabbix:
Database error
Error connecting to database: Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
Unable to connect to the database. Please make sure the database is running
When I tried to start the database service, it failed again, and since I had not enabled hot backup, there was nothing to fall back on…
[root@dep5 ~]# service mysqld status
MySQL is not running, but lock file (/var/lock/subsys/mysql) exists [FAILED]
[root@dep5 ~]# service mysqld start
Starting MySQL...The server quit without updating PID file (mysql.pid).
# View the log
[root@dep5 ~]# vim /data/mysqldb/log/mysql-error.log
2016-09-03 16:26:43 10550 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace zabbix/groups uses space ID: 3 at filepath: ./zabbix/groups.ibd. Cannot open tablespace mysql/slave_relay_log_info which uses space ID: 3 at filepath: ./mysql/slave_relay_log_info.ibd
2016-09-03 16:26:43 7f4097e0a720 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: Error: could not open single-table tablespace file ./mysql/slave_relay_log_info.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
InnoDB: To fix the problem and start mysqld:
InnoDB: 1) If there is a permission problem in the file and mysqld cannot
InnoDB: open the file, you should modify the permissions.
InnoDB: 2) If the table is not needed, or you can restore it from a backup,
InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
InnoDB: crash recovery and ignore that table.
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
160903 16:26:43 mysqld_safe mysqld from pid file /tmp/mysql.pid ended
The MySQL log itself suggests the possible fixes:
1) If it is a permission problem, fix the permissions.
2) If the file system or disk is damaged and the .ibd file cannot be removed, set innodb_force_recovery > 0 in my.cnf.
solve:
[root@dep5 ~]# vim /etc/my.cnf
#innodb
innodb_file_per_table = 1
innodb_data_file_path = ibdata1:2048M:autoextend
innodb_log_file_size = 128m
innodb_log_files_in_group = 3
innodb_buffer_pool_size = 60M
innodb_buffer_pool_instances = -1
innodb_max_dirty_pages_pct = 70
#innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 16m
innodb_flush_log_at_trx_commit = 2
innodb_force_recovery = 1   # add this line
#[root@dep5 ~]# vim /etc/my.cnf
#[root@dep5 ~]# service mysqld start
#Starting MySQL…
After the successful startup, I looked at the database log and found the following entries. I suspected that Zabbix still could not open normally:
2016-09-03 16:41:33 18646 [Warning] Info table is not ready to be used. Table 'mysql.slave_master_info' cannot be opened.
2016-09-03 16:41:33 18646 [Warning] InnoDB: Cannot open table mysql/slave_worker_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
2016-09-03 16:41:33 18646 [Warning] InnoDB: Cannot open table mysql/slave_relay_log_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
2016-09-03 16:41:33 18646 [Warning] Info table is not ready to be used. Table 'mysql.slave_relay_log_info' cannot be opened.
2016-09-03 16:41:34 18646 [Note] Event Scheduler: Loaded 0 events
2016-09-03 16:41:34 18646 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.6.31-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution
2016-09-03 16:41:34 18646 [Note] Event Scheduler: scheduler thread started with id 1
2016-09-03 16:41:39 7feb5261e700 InnoDB: Error: Table "mysql"."innodb_table_stats" not found.
2016-09-03 16:41:39 7feb5261e700 InnoDB: Error: Fetch of persistent statistics requested for table "zabbix"."users" but the required system tables mysql.innodb_table_stats and mysql.innodb_index_stats are not present or have unexpected structure. Using transient stats instead.
2016-09-03 16:41:39 7feb5261e700 InnoDB: Error: Table "mysql"."innodb_table_stats" not found.
This is the interface that Zabbix showed.
I then considered which line of my.cnf to comment out.
After that, MySQL restarted fine, but the MySQL log again showed:
2016-09-03 16:48:11 7f37cdfb7700 InnoDB: Error: Table "mysql"."innodb_table_stats" not found.
2016-09-03 16:48:11 7f37cdfb7700 InnoDB: Error: Fetch of persistent statistics requested for table "zabbix"."media_type" but the required system tables mysql.innodb_table_stats and mysql.innodb_index_stats are not present or have unexpected structure. Using transient stats instead.
I just wanted to repair the tables…
[root@dep5 ~]# mysqlcheck -r zabbix
zabbix.acknowledges
note : The storage engine for the table doesn't support repair
zabbix.actions
note : The storage engine for the table doesn't support repair
zabbix.alerts
Tragedy: this kind of repair only works for storage engines such as MyISAM, and the Zabbix tables use InnoDB.
Use MySQL 5.6 or later; since Oracle acquired MySQL, its performance has improved a lot. Choose InnoDB rather than MyISAM: Zabbix runs about 1.5 times faster on InnoDB than on MyISAM, and MyISAM is not safe. Zabbix stores a large amount of monitoring data, so once a table is corrupted, it is a disaster.
Tragedy!
Be careful:
In the end, the only (and clumsiest) way out I could think of was to redo everything from scratch: there was no configuration backup and the engine had never been changed. So embarrassing.
31.
- System log output when starting ZABBIX agent
PID file /run/zabbix/zabbix_agentd.pid not readable (yet?) after start.
zabbix-agent.service never wrote its PID file. Failing.
At the same time, running systemctl status zabbix-agent.service showed SELinux-related messages. Running getenforce showed that SELinux was enabled, so we disabled SELinux.
After restarting the zabbix-agent service, it still could not start normally. /var/log/zabbix/zabbix-agentd.log showed a Zabbix shared memory error:
zabbix_agentd [5922]: cannot open log: cannot create semaphore set: [28] No space left on device
As shown in the figure:
Then edit /etc/sysctl.conf (vim /etc/sysctl.conf) and set:
kernel.sem = 500 64000 64 256
After running sysctl -p /etc/sysctl.conf, the agent starts normally. (Cause of the error: the kernel.sem setting was too small; the system default was 250 32000 32 128.)
Parameter meaning
The above four data correspond to four core parameters: SEMMSL, SEMMNS, SEMOPM and SEMMNI. The specific meaning and configuration are as follows.
1. SEMMSL: the maximum number of semaphores in each semaphore set.
2. SEMMNS: the maximum number of semaphores (not semaphore sets) in the whole Linux system.
3. SEMOPM: the maximum number of semaphore operations that a single semop system call can perform.
4. SEMMNI: the maximum number of semaphore sets in the whole Linux system.
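After sysctl -p, the four values can be read back to confirm that the change took effect. A small Python sketch, assuming a Linux host where the live setting is exposed in /proc/sys/kernel/sem as four whitespace-separated numbers in the order listed above; the function names are hypothetical.

```python
from pathlib import Path

SEM_FIELDS = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")


def parse_kernel_sem(text: str) -> dict:
    """Map the four kernel.sem numbers to their names, in kernel order."""
    values = [int(v) for v in text.split()]
    if len(values) != len(SEM_FIELDS):
        raise ValueError(f"expected 4 values, got {len(values)}: {text!r}")
    return dict(zip(SEM_FIELDS, values))


def read_kernel_sem(path: str = "/proc/sys/kernel/sem") -> dict:
    """Read the live semaphore limits (Linux only)."""
    return parse_kernel_sem(Path(path).read_text())
```

On a host configured as above, read_kernel_sem() should report SEMMSL 500, SEMMNS 64000, SEMOPM 64 and SEMMNI 256.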
32.
1. zabbix dashboard error
Question:
zabbix server is not running: the information displayed may not be current
Solution:
This error can have several causes: 1) zabbix-agent is not installed on the zabbix-server host; 2) zabbix-agent is installed, but its port is not detected.
2. Error in log
Question:
172730.555 [Z3001] connection to database ‘zabbix’ failed: [1045] Access denied for
Solution:
Set the correct database password in the configuration file:
shell-> vim /etc/zabbix/zabbix-server.conf
DBPassword=zabbix
Restart service
shell->/etc/init.d/zabbix-server restart
view log
shell->tail -f /var/log/zabbix/zabbix-server.log
- No Chinese locale installed
Question:
You are not able to choose some of the languages, because locales for them are not installed on the
Solution:
- Enable Chinese
vi /usr/share/zabbix/include/locales.inc.php, change the display value of zh_CN to true, and then select the language in the web interface. If you then see "You are not able to choose some of the languages, because locales for them are not installed on the web server.", the system has no Chinese locale. So the first step is to set up the Chinese environment by installing the Chinese language packs:
apt-get install language-pack-zh-hant language-pack-zh-hans
The second step is to configure the environment variables: vi /etc/environment and add the language and encoding settings: LANG="zh_CN.UTF-8" LANGUAGE="zh_CN:zh:en_US:en". The third step is to reconfigure the locales: dpkg-reconfigure locales. Now restart the Apache and zabbix_server services, and Chinese should be selectable.
4. The MIB library must be updated; otherwise SNMP monitoring of switches reports MIB errors. (untested)
apt-get install snmp-mibs-downloader
- apt-get update error
Question:
Failed to fetch http://ubuntu.kurento.org/dists/trusty/kms6/binary-i386/Packages 403 Forbidden [IP: 112.124.140.210 80]
Solution:
apt-get update fails with a permission error (403). 112.124.140.210 is the apt proxy address. It is best to edit (or delete) apt.conf to remove this proxy; the Ubuntu host must then be able to reach the Internet directly.
32. zabbix3.2 upgrade to 3.4: Database error
After upgrading from zabbix3.2 to zabbix3.4, opening the web page reports the following error:
Database error The frontend does not match Zabbix database. Current database version (mandatory/optional): 3020000/3020000. Required mandatory version: 3040000. Contact your system administrator.
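Solution:
Log in to the database:
mysql> show databases;
mysql> use zabbix;
mysql> update dbversion set mandatory=3040000;
mysql> flush privileges;
Reopen the web page and the error is gone. Note that this UPDATE only changes the recorded version number; it does not apply any schema changes itself.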
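The version numbers in the error message appear to encode major.minor as major × 1,000,000 + minor × 10,000 (3.2 gives 3020000, 3.4 gives 3040000). The hypothetical helpers below reproduce only that pattern, inferred from these two values; check the dbversion expected by your target release before relying on it.

```python
def dbversion_for(release: str) -> int:
    """Encode 'major.minor' in the dbversion pattern seen in the error (inferred, not official)."""
    major, minor = (int(part) for part in release.split(".")[:2])
    return major * 1_000_000 + minor * 10_000


def release_for(dbversion: int) -> str:
    """Inverse mapping, e.g. 3040000 -> '3.4' (any lower digits are ignored)."""
    return f"{dbversion // 1_000_000}.{dbversion % 1_000_000 // 10_000}"
```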
34. zabbix error: cannot connect to [[192.168.119.110]:10050]: [111] Connection refused
Error analysis: Connection refused!
(1) The network between client and server is not connected;
(2) The client service is blocked by firewall;
(3) The network segment is blocked by physical firewall.
resolvent:
(1) View log: view and analyze error causes
root@a-desktop:~# tail /var/log/zabbix-agent/zabbix_agentd.log
5927:20160913:101039.428 agent #2 started [listener #2]
5923:20160913:102113.808 Got signal [signal:15(SIGTERM),sender_pid:5999,sender_uid:0,reason:0]. Exiting ...
5923:20160913:102113.810 Zabbix Agent stopped. Zabbix 2.2.2 (revision 42525).
6004:20160913:102113.824 Starting Zabbix Agent [Cloud_platform002]. Zabbix 2.2.2 (revision 42525).
6004:20160913:102113.824 using configuration file: /etc/zabbix/zabbix_agentd.conf
6005:20160913:102113.824 agent #0 started [collector]
6006:20160913:102113.825 agent #1 started [listener #1]
6007:20160913:102113.825 agent #2 started [listener #2]
6008:20160913:102113.825 agent #3 started [listener #3]
6009:20160913:102113.825 agent #4 started [active checks #1]
(2) If the network is not connected, use domain name resolution, or collect the data through zabbix-agent.
(3) firewall
Add a rule: iptables -I INPUT -p tcp -m multiport --destination-ports 80,10050:10051 -j ACCEPT
(4) Physical firewall
Similarly, open TCP port 10050 on the physical firewall.
35. Gaps in zabbix graphs caused by a sudo bug
I had long noticed that on one machine, fewer and fewer values were updated after it had been running for a while. I never found the root cause and simply restarted the agent to fix it. Recently, my colleagues investigated carefully and found that it may be related to a sudo bug.
Let’s go back and verify the whole troubleshooting process.
- Query the Zabbix database for the items with missing data, listing those not updated in the last 20 minutes:
select b.key_,b.lastvalue,from_unixtime(b.lastclock) from hosts a, items b where a.hostid=b.hostid and a.host='xxxxxx' and b.lastclock < (unix_timestamp() - 1200) limit 10;
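The query's logic can be tried out without a production database. This Python sketch runs an SQLite equivalent: datetime(x, 'unixepoch') stands in for MySQL's from_unixtime(x), and the current time is passed in instead of calling unix_timestamp(). The trimmed two-table schema used in the test is hypothetical and contains only the columns the query above touches.

```python
import sqlite3

# SQLite stand-in for the MySQL query above.
STALE_ITEMS_SQL = """
SELECT b.key_, b.lastvalue, datetime(b.lastclock, 'unixepoch')
FROM hosts a
JOIN items b ON a.hostid = b.hostid
WHERE a.host = ? AND b.lastclock < ? - 1200
LIMIT 10
"""


def stale_items(conn: sqlite3.Connection, host: str, now: int):
    """Return up to 10 items of `host` not updated in the last 20 minutes (1200 s)."""
    return conn.execute(STALE_ITEMS_SQL, (host, now)).fetchall()
```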
For example, agent.ping:
Observe the monitoring chart and find that the data is lost after 18:20
- Analyzing the log of zabbix agent
Around 18:24 the following log appears. After that point there are almost only update_cpustats entries, plus one line recording a failed kill command:
27589:20141021:182442.143 In zbx_popen() command:'sudo hadoop_stats.sh nodemanager StopContainerAvgTime'
27589:20141021:182442.143 End of zbx_popen():5
48430:20141021:182442.143 zbx_popen(): executing script
27585:20141021:182442.284 In update_cpustats()
27585:20141021:182442.285 End of update_cpustats()
27585:20141021:182443.285 In update_cpustats()
27585:20141021:182443.286 End of update_cpustats()
27585:20141021:182444.286 In update_cpustats()
27585:20141021:182444.287 End of update_cpustats()
27585:20141021:182445.287 In update_cpustats()
27585:20141021:182445.287 End of update_cpustats()
27585:20141021:182446.288 In update_cpustats()
27585:20141021:182446.288 End of update_cpustats()
..........
27585:20141021:182508.305 In update_cpustats()
27585:20141021:182508.305 End of update_cpustats()
27585:20141021:182509.306 In update_cpustats()
27585:20141021:182509.306 End of update_cpustats()
27585:20141021:182510.306 In update_cpustats()
27585:20141021:182510.307 End of update_cpustats()
27585:20141021:182511.307 In update_cpustats()
27585:20141021:182511.308 End of update_cpustats()
27589:20141021:182512.154 failed to kill [sudo hadoop_stats.sh nodemanager StopContainerAvgTime]: [1] Operation not permitted
27589:20141021:182512.155 In zbx_waitpid()
27585:20141021:182512.308 In update_cpustats()
27585:20141021:182512.309 End of update_cpustats()
27585:20141021:182513.309 In update_cpustats()
27585:20141021:182513.309 End of update_cpustats()
Compare this with a normal log:
27589:20141021:180054.376 In zbx_popen() command:'sudo hadoop_stats.sh nodemanager StopContainerAvgTime'
27589:20141021:180054.377 End of zbx_popen():5
18798:20141021:180054.377 zbx_popen(): executing script
27589:20141021:180054.384 In zbx_waitpid()
27589:20141021:180054.384 zbx_waitpid() exited, status:1
27589:20141021:180054.384 End of zbx_waitpid():18798
27589:20141021:180054.384 Run remote command [sudo hadoop_stats.sh nodemanager StopContainerAvgTime] Result [2] [-1]...
27589:20141021:180054.384 For key [hadoop_stats[nodemanager,StopContainerAvgTime]] received value [-1]
27589:20141021:180054.384 In process_value() key:'gd6g203s80-hadoop-datanode.idc.vipshop.com:hadoop_stats[nodemanager,StopContainerAvgTime]' value:'-1'
27589:20141021:180054.384 In send_buffer() host:'10.200.100.28' port:10051 values:37/50
27589:20141021:180054.384 Will not send now. Now 1413885654 lastsent 1413885654 < 1
27589:20141021:180054.385 End of send_buffer():SUCCEED
27589:20141021:180054.385 buffer: new element 37
27589:20141021:180054.385 End of process_value():SUCCEED
You can see that normally the script returns a value, but when the problem occurs it returns nothing. Moreover, because the script is run through sudo, the zabbix agent, which was started by an ordinary user, cannot kill the command when it times out (the Operation not permitted error).
- Here the ordinary user running zabbix agent is apps. Let's look at the current state of the script:
# ps -ef|grep hadoop_stats.sh
root 34494 31429 0 12:54 pts/0 00:00:00 grep 48430
root 48430 27589 0 Oct21 ? 00:00:00 sudo hadoop_stats.sh nodemanager StopContainerAvgTime
root 48431 48430 0 Oct21 ? 00:00:00 [hadoop_stats.sh] <defunct>
As you can see, a zombie process has been created here (http://en.wikipedia.org/wiki/Zombie_process).
A zombie process arises when a child process finishes and SIGCHLD is sent to its parent, but the parent does not handle it properly, that is, it never collects (reaps) the child's exit status.
You have killed the process, but a dead process doesn't disappear from the process table until its parent process performs a task called "reaping" (essentially calling wait(3) for that process to read its exit status). Dead processes that haven't been reaped are called "zombie processes." The parent process id you see for 31756 is process id 1, which always belongs to init. That process should reap its zombie processes periodically, but if it can't, they will remain zombies in the process table until you reboot.
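The zombie state is easy to reproduce. This Linux-only Python sketch forks a child that exits immediately, reads the child's state from /proc/<pid>/stat while the parent has not yet called waitpid(), and then reaps it. It assumes SIGCHLD is left at its default disposition (if SIGCHLD is ignored, the kernel reaps the child automatically and no zombie appears). The function names are hypothetical.

```python
import os
import time


def proc_state(pid: int) -> str:
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def demonstrate_zombie(timeout: float = 5.0):
    """Return (state_before_reaping, reaped_ok, proc_entry_still_exists)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits immediately
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    # The child shows up as 'Z' once it has exited but has not been reaped.
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    reaped_pid, _status = os.waitpid(pid, 0)  # "reaping": collect the exit status
    return state, reaped_pid == pid, os.path.exists(f"/proc/{pid}")
```

demonstrate_zombie() should return ("Z", True, False): the child lingers as a zombie until waitpid() collects it, after which its /proc entry disappears.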
Under normal conditions, if we attach strace to the parent process and then kill the child process, we see:
Process 3036 attached - interrupt to quit
select(6, [5], [], NULL, NULL ) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0x11) = -1 EINTR (Interrupted system call)
wait4(3037, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG|WSTOPPED, NULL) = 3037
exit_group(143) = ?
Process 3036 detached
Once a zombie process exists, it can be cleared by killing its parent: the zombie is then orphaned and adopted by init, which reaps it.
However, because the script is started with sudo, the sudo process runs as root, and the apps user has no permission to kill it. As a result, the child process remains a zombie indefinitely.
- Take a look at the processes started by zabbix agent
# ps -ef|grep zabbix
apps 27583 1 0 Sep09 ? 00:00:00 /apps/svr/zabbix/sbin/zabbix_agentd -c /apps/conf/zabbix_agentd.conf
apps 27585 27583 0 Sep09 ? 00:33:25 /apps/svr/zabbix/sbin/zabbix_agentd -c /apps/conf/zabbix_agentd.conf
apps 27586 27583 0 Sep09 ? 00:00:14 /apps/svr/zabbix/sbin/zabbix_agentd -c /apps/conf/zabbix_agentd.conf
apps 27587 27583 0 Sep09 ? 00:00:14 /apps/svr/zabbix/sbin/zabbix_agentd -c /apps/conf/zabbix_agentd.conf
apps 27588 27583 0 Sep09 ? 00:00:14 /apps/svr/zabbix/sbin/zabbix_agentd -c /apps/conf/zabbix_agentd.conf
apps 27589 27583 0 Sep09 ? 02:28:12 /apps/svr/zabbix/sbin/zabbix_agentd -c /apps/conf/zabbix_agentd.conf
root 34207 31429 0 12:54 pts/0 00:00:00 grep zabbix
root 48430 27589 0 Oct21 ? 00:00:00 sudo /apps/sh/zabbix_scripts/hadoop/hadoop_stats.sh nodemanager StopContainerAvgTime
strace shows that process 27589 has been waiting for process 48430:
# strace -p 27589
Process 27589 attached - interrupt to quit
wait4(48430, ^C <unfinished ...>
Process 27589 detached
Process 48430 is the parent of the zombie process. Attaching strace to it shows that it is waiting on fd 5:
# strace -p 48430
Process 48430 attached - interrupt to quit
select(6, [5], [], NULL, NULL^C <unfinished ...>
Process 48430 detached
lsof shows that fd 5 is actually a socket:
# lsof -p 48430
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sudo 48430 root cwd DIR 8,2 4096 2 /
sudo 48430 root rtd DIR 8,2 4096 2 /
sudo 48430 root txt REG 8,2 212904 1578739 /usr/bin/sudo
sudo 48430 root mem REG 8,2 65928 1441822 /lib64/libnss_files-2.12.so
sudo 48430 root mem REG 8,2 99158704 1573509 /usr/lib/locale/locale-archive
sudo 48430 root mem REG 8,2 91096 1441832 /lib64/libz.so.1.2.3
sudo 48430 root mem REG 8,2 141576 1442145 /lib64/libpthread-2.12.so
sudo 48430 root mem REG 8,2 386040 1442172 /lib64/libfreebl3.so
sudo 48430 root mem REG 8,2 108728 1575924 /usr/lib64/libsasl2.so.2.0.23
sudo 48430 root mem REG 8,2 243064 1441896 /lib64/libnspr4.so
sudo 48430 root mem REG 8,2 21256 1442186 /lib64/libplc4.so
sudo 48430 root mem REG 8,2 17096 1442187 /lib64/libplds4.so
sudo 48430 root mem REG 8,2 128368 1577789 /usr/lib64/libnssutil3.so
sudo 48430 root mem REG 8,2 1290648 1582418 /usr/lib64/libnss3.so
sudo 48430 root mem REG 8,2 188072 1575925 /usr/lib64/libsmime3.so
sudo 48430 root mem REG 8,2 220200 1587191 /usr/lib64/libssl3.so
sudo 48430 root mem REG 8,2 113952 1442182 /lib64/libresolv-2.12.so
sudo 48430 root mem REG 8,2 43392 1442173 /lib64/libcrypt-2.12.so
sudo 48430 root mem REG 8,2 63304 1442180 /lib64/liblber-2.4.so.2.5.6
sudo 48430 root mem REG 8,2 1979000 1442169 /lib64/libc-2.12.so
sudo 48430 root mem REG 8,2 308912 1442181 /lib64/libldap-2.4.so.2.5.6
sudo 48430 root mem REG 8,2 22536 1442171 /lib64/libdl-2.12.so
sudo 48430 root mem REG 8,2 58480 1442174 /lib64/libpam.so.0.82.2
sudo 48430 root mem REG 8,2 17520 1441884 /lib64/libutil-2.12.so
sudo 48430 root mem REG 8,2 124624 1441798 /lib64/libselinux.so.1
sudo 48430 root mem REG 8,2 99112 1442170 /lib64/libaudit.so.1.0.0
sudo 48430 root mem REG 8,2 156872 1442168 /lib64/ld-2.12.so
sudo 48430 root 0r CHR 1,3 0t0 3916 /dev/null
sudo 48430 root 1w FIFO 0,8 0t0 1429910151 pipe
sudo 48430 root 2w REG 8,3 376639626 524292 /apps/logs/zabbix/zabbix_agentd.log
sudo 48430 root 3u sock 0,6 0t0 1429910161 can't identify protocol
sudo 48430 root 4r REG 8,2 764 2240617 /etc/group
sudo 48430 root 5u unix 0xffff880179ee4680 0t0 1429910162 socket
Checking the state of this descriptor under /proc/<pid>/fd, however, shows that it has in fact already been closed.
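The same /proc check can be scripted. The following is a minimal Python sketch, assuming Linux and read access to /proc; the helper name list_open_fds is hypothetical. It reads the symlinks under /proc/<pid>/fd, where sockets appear as socket:[inode] and pipes as pipe:[inode], the inode being the number lsof prints in the NODE column.

```python
import os


def list_open_fds(pid):
    """Return {fd_number: link_target} for a process by reading /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    fds = {}
    for name in os.listdir(fd_dir):
        try:
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            # The descriptor was closed between listdir() and readlink(),
            # e.g. the directory handle that listdir() itself used.
            continue
    return fds


if __name__ == "__main__":
    # Inspects the current process; pass the PID of the hung sudo instead.
    for fd, target in sorted(list_open_fds(os.getpid()).items()):
        print(fd, target)
```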
In this situation the child process may already have finished, but the parent did not correctly handle the child's exit status, so it still believes the child is running. The child is never reaped and eventually shows up as a zombie process.
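Zombies can be located from /proc as well. A minimal Python sketch, assuming Linux; find_zombies is a hypothetical helper name. It lists every process whose state field in /proc/<pid>/stat is Z, together with its parent PID, which identifies the process (here, sudo) that failed to reap it.

```python
import os


def find_zombies():
    """Return [(pid, ppid, comm), ...] for every process in state 'Z' (Linux /proc)."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process vanished while /proc was being scanned
        # Format: "pid (comm) state ppid ...". comm may contain spaces or ')',
        # so locate it via the first '(' and the last ')'.
        open_paren, close_paren = stat.index("("), stat.rindex(")")
        comm = stat[open_paren + 1 : close_paren]
        state, ppid = stat[close_paren + 2 :].split()[:2]
        if state == "Z":
            zombies.append((int(entry), int(ppid), comm))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm in find_zombies():
        print(f"zombie pid={pid} ({comm}), parent pid={ppid}")
```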
This is in fact a bug in sudo. Related bug report:
http://www.gratisoft.us/bugzilla/show_bug.cgi?id=447
Bug description:
If the parent process is rescheduled right after the "if" has executed, and at that very moment the child process finishes and SIGCHLD is sent to the parent, sudo gets into trouble. The SIGCHLD handler records in the variable "recvsig[]" that the signal was received, and the parent then calls select(). That select() is never interrupted, which is not what the author had in mind. In 99% of cases the parent enters the blocking select() before the child has ended; the child then sends SIGCHLD, which is recorded by the handler and also interrupts select(), so select() returns -1 in "nready" and "errno" is set to EINTR.
The problem lies in sudo's source file sudo/file/tip/src/exec.c; all versions before 1.7.5 and 1.8.0 are affected. If the child process exits just before the select() system call, the SIGCHLD handler has already run, so nothing interrupts select() and sudo hangs there.
patch:
http://www.sudo.ws/repos/sudo/rev/99adc5ea7f0a
Avoid a potential race condition if SIGCHLD is received immediately before we call select().
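The commit message only names the goal. One well-known way to close this kind of window is the self-pipe trick: a signal handler writes a byte into a pipe whose read end is part of the select() set, so a signal that arrives before select() is entered is still noticed. The following is a Python sketch of that technique, not sudo's actual C patch; run_and_wait is a hypothetical name, and signal.set_wakeup_fd() must be called from the main thread.

```python
import os
import select
import signal
import subprocess
import sys


def run_and_wait(argv, timeout=10.0):
    """Start argv and block in select() until SIGCHLD arrives, without the race.

    signal.set_wakeup_fd() makes the interpreter's C-level signal handler write a
    byte into the pipe as soon as SIGCHLD is delivered. If the child exits before
    select() is reached, that byte is already in the pipe, so select() returns
    immediately instead of sleeping forever.
    """
    rfd, wfd = os.pipe()
    os.set_blocking(rfd, False)
    os.set_blocking(wfd, False)  # set_wakeup_fd() requires a non-blocking fd
    # The wakeup fd is only written for signals that have a Python-level handler.
    old_handler = signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    old_wakeup = signal.set_wakeup_fd(wfd)
    try:
        proc = subprocess.Popen(argv)
        # The race window of the unpatched code: the child may exit right here.
        ready, _, _ = select.select([rfd], [], [], timeout)
        if not ready:
            proc.kill()
            proc.wait()
            raise TimeoutError("child did not exit within the timeout")
        os.read(rfd, 512)  # drain the wake-up bytes
        return proc.wait()
    finally:
        signal.set_wakeup_fd(old_wakeup)
        signal.signal(signal.SIGCHLD, old_handler)
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    status = run_and_wait([sys.executable, "-c", "raise SystemExit(3)"])
    print("child exit status:", status)
```

With this approach a child that exits very quickly, such as `raise SystemExit(3)`, can no longer leave the parent stuck in select(); run_and_wait() returns the child's exit status, 3.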
Someone else has reported the same problem:
sudo hangs and leaves the executed program as “zombie”
36. Symptom:
The Zabbix configuration has not been changed, yet the Zabbix server suddenly reports an error at startup:
Can't open PID file /run/zabbix/zabbix_server.pid (yet?) after start: No such file or directory
Solution:
Restart the virtual machine.
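To see what state the PID file named in the error is actually in, a small Python sketch like the following can help. pid_file_status is a hypothetical helper; os.kill(pid, 0) only checks whether the process exists and does not send a signal.

```python
import os


def pid_file_status(path):
    """Classify a daemon PID file as 'missing', 'invalid', 'stale' or 'running'."""
    try:
        with open(path) as f:
            content = f.read().strip()
    except FileNotFoundError:
        return "missing"
    if not content.isdigit() or int(content) <= 0:
        return "invalid"
    try:
        os.kill(int(content), 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        return "running"  # the process exists but belongs to another user
    return "running"


if __name__ == "__main__":
    print(pid_file_status("/run/zabbix/zabbix_server.pid"))
```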
The default settings in Zabbix do not always work well and need to be tuned to your needs. In this article I have collected the main parameters for optimization. This may not be everything, and you may know more, but I will try to cover as much as possible.
Cache settings
To optimize the Zabbix server, it is worth increasing the cache size. To do this, open:
# vim /etc/zabbix/zabbix_server.conf
Find the "CacheSize" line and increase its value.
### Option: CacheSize
#	Size of configuration cache, in bytes.
#	Shared memory size for storing host, item and trigger data.
#
# Mandatory: no
# Range: 128K-8G
# Default:
CacheSize=256M
I increased it to 256M. You can go higher if needed.
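The Range line in the option comment (128K-8G here) can be checked before restarting the server. A minimal Python sketch, assuming the K/M/G suffixes are binary multiples (1K = 1024 bytes); parse_size and check_cache_size are hypothetical helper names.

```python
import re

# Assumption: Zabbix size suffixes are powers of 1024.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value):
    """Convert a size such as '256M' or '128K' to bytes."""
    m = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value, re.IGNORECASE)
    if not m:
        raise ValueError(f"not a size: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2).upper()]


def check_cache_size(value, low="128K", high="8G"):
    """Return the size in bytes, or raise ValueError if it lies outside low-high."""
    size = parse_size(value)
    if not parse_size(low) <= size <= parse_size(high):
        raise ValueError(f"CacheSize={value} is outside {low}-{high}")
    return size
```

For example, check_cache_size("256M") returns 268435456, while check_cache_size("16G") raises ValueError.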
Zabbix discoverer processes more than 75% busy
I recently received an alert in Zabbix:
Zabbix discoverer processes more than 75% busy
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
Find the line with the "StartDiscoverers" option and increase its value:
### Option: StartDiscoverers
#	Number of pre-forked instances of discoverers.
#
# Mandatory: no
# Range: 0-250
# Default:
StartDiscoverers=5
I increased StartDiscoverers to 5. That completes the configuration; save the config and restart the Zabbix server:
# service zabbix-server restart
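The same edit can also be scripted. A minimal Python sketch; set_option is a hypothetical helper. It replaces the first active Key=value line, or appends one if none exists, and leaves commented-out defaults such as "# StartDiscoverers=..." untouched.

```python
import re


def set_option(conf_text, key, value):
    """Return conf_text with an active 'key=value' line.

    The first uncommented line for key is replaced in place; if there is none,
    a new line is appended at the end of the text.
    """
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(conf_text):
        # A function replacement avoids backslash-escape processing in value.
        return pattern.sub(lambda _m: line, conf_text, count=1)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"


if __name__ == "__main__":
    sample = "### Option: StartDiscoverers\n# Range: 0-250\n"
    print(set_option(sample, "StartDiscoverers", 5), end="")
```

To apply it to the real file, read /etc/zabbix/zabbix_server.conf, pass its contents through set_option(), write the result back, and restart the server.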
Here is a visual example from my setup:
If, after adding hosts (in different subnets), you see this trigger fire again, increase StartDiscoverers further.
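The "more than 75% busy" alert is a trigger on the server's internal busy statistics. A Python sketch of that logic, assuming the stock template's form, in which the average of the zabbix[process,discoverer,avg,busy] item over the last 10 minutes is compared with 75. The window, threshold and function name are assumptions, not taken from this article.

```python
from statistics import mean


def busy_trigger_fires(samples, now, window=600, threshold=75.0):
    """Return True if the mean busy-% over the last `window` seconds exceeds threshold.

    samples is a list of (unix_timestamp, busy_percent) pairs, e.g. collected
    values of zabbix[process,discoverer,avg,busy].
    """
    recent = [v for ts, v in samples if now - window < ts <= now]
    return bool(recent) and mean(recent) > threshold
```

A short spike therefore does not fire the trigger by itself; the average over the whole window has to stay above the threshold.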
Zabbix icmp pinger processes more than 75% busy
I recently received an alert in Zabbix:
Zabbix icmp pinger processes more than 75% busy
This message means that the process(es) pinging the hosts are overloaded.
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
Find the line with the "StartPingers" option and increase its value:
### Option: StartPingers
#	Number of pre-forked instances of ICMP pingers.
#
# Mandatory: no
# Range: 0-1000
# Default:
StartPingers=5
I increased StartPingers to 5, which raises the number of processes performing ICMP ping.
That completes the configuration; save the config and restart the Zabbix server:
# service zabbix-server restart
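Each option comment carries a Range line (0-1000 for StartPingers, 0-250 for StartDiscoverers). A minimal Python sketch that collects these ranges from the "### Option:" comment blocks and rejects out-of-range values; parse_ranges and validate are hypothetical helpers, and only plain integer ranges are recognised (size ranges such as 128K-8G are skipped).

```python
import re


def parse_ranges(conf_text):
    """Map option name -> (min, max) from '### Option:' / '# Range: a-b' comment blocks."""
    ranges = {}
    current = None
    for line in conf_text.splitlines():
        m = re.match(r"###\s*Option:\s*(\w+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"#\s*Range:\s*(\d+)-(\d+)\s*$", line)
        if m and current:
            ranges[current] = (int(m.group(1)), int(m.group(2)))
    return ranges


def validate(name, value, ranges):
    """Return value as int, or raise ValueError if it is outside the documented range."""
    low, high = ranges[name]
    if not low <= int(value) <= high:
        raise ValueError(f"{name}={value} is outside {low}-{high}")
    return int(value)
```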
Zabbix poller processes more than 75% busy
A poller is the process that polls the agents.
This parameter is worth increasing in two cases:
- A large network
- Many unreachable resources that are still being monitored.
How to fix it?
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
Find the line with the "StartPollers" option and increase its value:
### Option: StartPollers
#	Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
StartPollers=5
I set this parameter to 5. If things get really bad, increase it to 20. Nothing comes for free: more processes mean higher resource consumption.
After that, you may get:
Zabbix unreachable poller processes more than 75% busy
If you see this message (an alert or a fired trigger), open the config:
# vim /etc/zabbix/zabbix_server.conf
Find the line with the "StartPollersUnreachable" option and increase its value:
### Option: StartPollersUnreachable
#	Number of pre-forked instances of pollers for unreachable hosts (including IPMI and Java).
#	At least one poller for unreachable hosts must be running if regular, IPMI or Java pollers
#	are started.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollersUnreachable=1
PS: I use the default value for this parameter and have not changed it (there were no errors).
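The rule quoted in the StartPollersUnreachable comment can be checked mechanically. A Python sketch; the helper name and the default values used for options missing from the file are assumptions (StartPollersUnreachable=1 is taken from the comment above; compare the others with the comments in your own zabbix_server.conf).

```python
# Assumed defaults for options absent from the file (see lead-in).
ASSUMED_DEFAULTS = {
    "StartPollers": 5,
    "StartIPMIPollers": 0,
    "StartJavaPollers": 0,
    "StartPollersUnreachable": 1,
}


def check_unreachable_pollers(opts):
    """Return True if opts satisfies the rule: when regular, IPMI or Java pollers
    are started, at least one poller for unreachable hosts must be running."""

    def get(key):
        return int(opts.get(key, ASSUMED_DEFAULTS[key]))

    pollers_started = any(
        get(k) > 0 for k in ("StartPollers", "StartIPMIPollers", "StartJavaPollers")
    )
    if pollers_started and get("StartPollersUnreachable") < 1:
        raise ValueError("StartPollersUnreachable must be at least 1 when pollers are started")
    return True
```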
The database may then run out of connections; if that happens, raise the database connection limit.
Zabbix housekeeper processes more than 75% busy
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
And set the following:
HousekeepingFrequency=1
MaxHousekeeperDelete=100
Save the file and restart Zabbix:
# service zabbix-server restart
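A worked calculation of what these two values allow. Assuming HousekeepingFrequency is given in hours and MaxHousekeeperDelete caps the rows deleted per housekeeper task in one cycle (as the stock zabbix_server.conf comments describe), HousekeepingFrequency=1 and MaxHousekeeperDelete=100 give at most 24 × 100 = 2400 rows per task per day. The Python sketch below does the same arithmetic; the function name is hypothetical, and the handling of 0 follows the stock comments (0 frequency disables automatic housekeeping, 0 delete limit means no limit).

```python
import math


def max_rows_per_task_per_day(frequency_hours, max_delete):
    """Upper bound on rows removed for one housekeeper task per day."""
    if frequency_hours == 0:
        return 0  # automatic housekeeping disabled
    if max_delete == 0:
        return math.inf  # 0 means "no limit"
    cycles_per_day = 24 / frequency_hours
    return cycles_per_day * max_delete
```

With the values above, max_rows_per_task_per_day(1, 100) returns 2400.0, so a small MaxHousekeeperDelete keeps each cycle short at the cost of slower cleanup.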
Zabbix busy timer processes, in %
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
I will specify the parameter later (I don't know which one yet)!
Save the file and restart Zabbix:
# service zabbix-server restart
Zabbix busy escalator processes, in %
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
I will specify the parameter later (I don't know which one yet)!
Save the file and restart Zabbix:
# service zabbix-server restart
Zabbix busy alerter processes, in %
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
I will specify the parameter later (I don't know which one yet)!
Save the file and restart Zabbix:
# service zabbix-server restart
Zabbix busy configuration syncer processes, in %
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
Find and change:
HistoryCacheSize=
Save the file and restart Zabbix:
# service zabbix-server restart
Zabbix busy db watchdog processes, in %
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
I will specify the parameter later (I don't know which one yet)!
Save the file and restart Zabbix:
# service zabbix-server restart
Zabbix busy history syncer processes, in %
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
Find and change:
HistoryCacheSize=
CacheSize=
Save the file and restart Zabbix:
# service zabbix-server restart
Zabbix busy self-monitoring processes, in %
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
I will specify the parameter later (I don't know which one yet)!
Save the file and restart Zabbix:
# service zabbix-server restart
Zabbix busy http poller processes, in %
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
Find and change the parameter:
StartHTTPPollers=
Save the file and restart Zabbix:
# service zabbix-server restart
Zabbix busy java poller processes, in %
To fix this, open the zabbix_server.conf configuration file:
# vim /etc/zabbix/zabbix_server.conf
Find and change the parameter:
StartJavaPollers=
Save the file and restart Zabbix:
# service zabbix-server restart
That's all from me, and this concludes the article "Optimizing Zabbix Settings".