Operating system call recv failed error no 104 - Fixing errors and finding optimal solutions to problems

How to analyze network disconnections shown in the system log (BC transaction SM21)?

System log (transaction SM21) shows network disconnections, e.g.

Q04 Connection to user 2642 (EXTRACAO), terminal 38 (iguacucp125) lost
Delete session 001 after error 061
Operating system call recv failed (error no. 10054)

First of all, keep in mind that a "network disconnection" in the system log (transaction SM21) or in a developer trace is not always significant. A typical case is operating system error 10061 when trying to connect to, for example, the gateway of an SAP system that has crashed: the connection to the sapgwXX service on the remote host obviously cannot be established, because the gateway is not running there. If it still makes sense to analyze the disconnection, there are several areas to check:

SAP software: SAPgui and kernel

Make sure that you are using the latest SAPGUI available.
Make sure that the SAP kernel on your application servers is up to date (no more than half a year old).
This is the starting point for eliminating the error. If the issue persists after updating to the latest kernel and GUI patches, check the other possible causes:
A user with authorization for transaction SM04 can delete any user's session; this generates the message in the syslog and trace files.
If a user who is already logged on to the system logs on again with the same user ID, a pop-up window offers three options:
Continue with this logon and end any other logon (the previous session is then ended, and the information message "Delete session XXX after error 061" is issued)
Continue with this logon without ending any other logon
Terminate this logon
Another possibility is a problem with the SAPGUI itself. In that case you should see error messages after activating the frontend trace.

Operating System support level: workstations and server(s)

Ensure that your systems are patched to the highest support pack, as well as the network card drivers, etc.
Check your hostname configuration (‘hosts’ files in the workstations, etc.).

Parametrization of SAP system

Sometimes disconnections are not a failure, but a feature offered by the SAP software to avoid the waste of resources due to disconnections caused by users closing the SAPgui without the proper log off, etc. The lines below explain how this mechanism works.

The kernel regularly checks whether a session is still in use and removes any session that is not. The check is simple: if the frontend has not sent any data to the application server for "rdisp/keepalive" seconds, the application server sends a short "ping" message to the frontend. The frontend should answer with "pong" within the next 40 seconds; otherwise the application server assumes that the link is dead and releases all resources held for that user. An error line "DP_CONN_DEAD" then appears in the trace file dev_disp. This usually happens when users switch off their PC without shutting down properly. A value of "rdisp/keepalive = 0" means that no check takes place.
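The ping/pong rule described above can be sketched as a small state check. This is only an illustration of the logic, not SAP kernel code: the `Session` class, the `check_session` function and the returned labels are hypothetical names, and only the 40-second reply window and the meaning of `rdisp/keepalive = 0` come from the text.

```python
from dataclasses import dataclass
from typing import Optional

PONG_TIMEOUT_S = 40  # reply window quoted in the text above


@dataclass
class Session:
    last_data_at: float                   # when the frontend last sent any data
    ping_sent_at: Optional[float] = None  # set while a ping is unanswered


def check_session(session: Session, now: float, keepalive_s: int) -> str:
    """Return 'ok', 'ping', 'waiting' or 'dead' for one periodic check."""
    if keepalive_s == 0:
        return "ok"  # rdisp/keepalive = 0 disables the check
    if session.ping_sent_at is not None:
        # A ping is outstanding: any data received after it counts as the pong.
        if session.last_data_at >= session.ping_sent_at:
            session.ping_sent_at = None
            return "ok"
        if now - session.ping_sent_at > PONG_TIMEOUT_S:
            return "dead"  # the point where DP_CONN_DEAD would be traced
        return "waiting"
    if now - session.last_data_at >= keepalive_s:
        session.ping_sent_at = now
        return "ping"
    return "ok"
```

With a keepalive of 1200 seconds, a silent frontend is pinged at second 1200 and declared dead once more than 40 further seconds pass without any data.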

If the parameter «rdisp/gui_auto_logout» is set, the timeout also applies to HTTP sessions as well as GUI sessions.

Networking tests

Several situations can cause a partner not to respond; if none of the paragraphs above explains your issue, one of the following may apply:

Workstation issue: a "hardware" issue (e.g. a broken network card, but also an old NI driver, an outdated operating system, etc.), a local firewall or antivirus blocking the traffic, an OS restriction that prevents the program (the SAPgui in our case) from using the network (e.g. User Account Control in Windows Vista or Server 2008), the program not running, etc.
Networking issue: a firewall between the two parties blocking the communication, or a hardware issue (e.g. a damaged cable or node, EM interference, etc.)
Server issue (similar to the workstation issue)

The key, then, is to determine the root cause. Of course, we will support you closely if a bug in the SAP software turns out to be the cause; but please understand that we need to work very closely with you, as we do not know your network configuration. It is advisable to involve your local networking team.

To further analyze why the frontend does not respond, schedule a detailed network analysis between your application server and the failing workstation until the issue occurs again (if ever) or, at least, for some days (even weeks, depending on how often it occurs). This will show whether networking issues can be ruled out as the root cause.

The NIPING tool is located in the executables directory on any SAP server. You can fetch the latest version of NIPING from the Service Marketplace or, if that is not possible, copy the binary from your server's binaries directory.

Operating System settings

The following are some typical errors for Microsoft Windows platforms:

10048 (WSAEADDRINUSE, SI_EPORT_INUSE) => Only one usage of each socket address (protocol/network address/port) is normally permitted.
10054 (WSAECONNRESET, SI_ECONN_BROKEN) => An existing connection was forcibly closed by the remote host.
10055 (WSAENOBUFS) => An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
10061 (WSAECONNREFUSED) => No connection could be made because the destination computer actively refused it, e.g. in the remote TCP port there is no server program running.
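For reading mixed logs, a small lookup can translate the numeric codes. The sketch below covers only the four Winsock codes listed above and falls back to the local platform's errno table for anything else; on Linux, errno 104 is ECONNRESET, the counterpart of Winsock 10054. The function name `describe_socket_error` is a hypothetical helper, not part of any SAP tool.

```python
import errno
import os

# Winsock codes quoted in the list above.
WINSOCK_ERRORS = {
    10048: "WSAEADDRINUSE",
    10054: "WSAECONNRESET",
    10055: "WSAENOBUFS",
    10061: "WSAECONNREFUSED",
}


def describe_socket_error(code: int) -> str:
    """Map a numeric socket error to a symbolic name."""
    if code in WINSOCK_ERRORS:
        return f"{code}: {WINSOCK_ERRORS[code]} (Winsock)"
    # Fall back to the errno table of the platform running this script.
    name = errno.errorcode.get(code)
    if name is None:
        return f"{code}: unknown"
    return f"{code}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    for code in (10054, 10061, errno.ECONNRESET):
        print(describe_socket_error(code))
```

Note that the fallback depends on the operating system the script runs on, so a code copied from a log of another platform may be reported as unknown or under a different name.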

Sometimes these errors occur because the operating system defaults are insufficient for your particular requirements. This would be the case, for example, if a Java application needs to create a large number of threads in a very short time, each with one or more TCP/IP connections; you should then adjust the registry keys MaxUserPort and TCPTimedWaitDelay (typically raising the former and lowering the latter), otherwise you will get the aforementioned error 10055.

We have also seen many issues with some newer features such as the Scalable Networking Pack, a.k.a. SNP (TCP Chimney Offload, RSS, and NetDMA). In particular, we always recommend disabling the "TCP Chimney Offload" option on your NIC. To do so, run "netsh int ip set chimney DISABLED" from a command prompt; run "netsh int ip show chimney" to see its current status. Then reboot the system (this is mandatory!).

Even the "Media Sensing" feature can cause trouble. Note that this feature is disabled by default in a Windows Server 2003-based server cluster, and so the DisableDHCPMediaSense registry entry has no effect.

Source

Hello. I have a server from fastvps.ru/dedicated on the EX-4 plan, running Debian-70-wheezy-64. It hosts one main site on nginx plus php-fpm. If I load the site with a Siege test using as few as 10 threads, the log starts filling with errors:

readv() failed (104: Connection reset by peer) while reading upstream
recv() failed (104: Connection reset by peer) while reading response

OPcache is enabled. Please help me figure out what is wrong with the configs. It looks like php-fpm is the culprit.

php-fpm config:
cosmopolite.ru/php_fpm.txt

nginx config:

user www-data;
worker_processes 4;
pid /var/run/nginx.pid;

events {
	worker_connections 768;
}

http {

	sendfile on;
	tcp_nopush on;
	tcp_nodelay on;
	keepalive_timeout 65;
	types_hash_max_size 2048;

	proxy_read_timeout 500;
	proxy_connect_timeout 500;
	client_max_body_size 100M;
	server_names_hash_bucket_size 64;

	include /etc/nginx/mime.types;
	default_type application/octet-stream;

	access_log /var/log/nginx/access.log;
	error_log /var/log/nginx/error.log;

	gzip on;
	gzip_disable "msie6";

	include /etc/nginx/conf.d/*.conf;
	include /etc/nginx/sites-enabled/*;
	include /usr/local/ispmgr/etc/nginx.domain;

}


Siege dropped the connection while nginx was receiving data from php-fpm, or from whatever you have in /etc/nginx/conf.d/*.conf.

Start with tests that do not involve PHP: create a directory with a plain index.html and point Siege at it. If the errors persist, the number of handlers is the issue: on modern hardware, 4 processes with 768 connections each are slower than 1 process with 3072 sockets. Raise worker_connections to at least 2048.

Next, the backlog.
Check the backlog value with sysctl net.core.somaxconn; if it is lower than worker_connections, raise it in sysctl.conf to match worker_connections.
Set the same value in php-fpm's listen.backlog.
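The backlog advice above can be checked with a few lines of Python. This is a minimal sketch assuming a Linux host, where the kernel caps any listen() backlog at net.core.somaxconn; the helper names and the value 2048 (taken from the suggested worker_connections) are illustrative only.

```python
from pathlib import Path

SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")  # Linux only


def read_somaxconn(path: Path = SOMAXCONN_PATH) -> int:
    """Read the current kernel limit for listen() backlogs."""
    return int(path.read_text().strip())


def effective_backlog(requested: int, somaxconn: int) -> int:
    """The kernel silently caps a requested backlog at net.core.somaxconn."""
    return min(requested, somaxconn)


def backlog_advice(worker_connections: int, somaxconn: int) -> str:
    """Apply the rule from the answer: somaxconn should not be below worker_connections."""
    if somaxconn < worker_connections:
        return f"raise net.core.somaxconn from {somaxconn} to {worker_connections}"
    return "net.core.somaxconn is already large enough"


if __name__ == "__main__":
    if SOMAXCONN_PATH.exists():
        print(backlog_advice(2048, read_somaxconn()))
```

The cap is the reason a large listen.backlog in the php-fpm pool has no effect on its own: without raising the sysctl, the effective queue stays at the kernel limit.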


Try setting the following in the fpm pool config:

pm = static
pm.max_children = 60

and see whether that improves your test results.
If it does, you need to tune the values of
pm.max_children and pm.max_spare_servers in dynamic mode.
Their values will depend on your load profile: how even or, conversely, how "peaky" it is.

And what is wrong with static? As far as I can see, your server has 16 GB of RAM; idle workers will not take up that much.

I had this problem. It was solved by setting the correct owner on the files:

sudo chown -R www-data:www-data /var/www/mysite.com

The problem is broader than it seems.
You have to understand what this means: PHP dropped the connection.
It can drop for several reasons:
1) limits in php.ini itself, first of all the PHP script execution time
2) socket (connection) hold limits in the php-fpm settings
3) limits on how long nginx waits for a response from php-fpm
4) all processes are busy (or dead)

Once all the timeouts are configured correctly and you have no extremely long-running scripts, everything works fine.

All four of these need to be tuned together, with an understanding of how they interact.
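One way to look at the first three limits together is a small consistency check. The sketch below assumes the limits correspond to max_execution_time (php.ini), request_terminate_timeout (php-fpm pool) and fastcgi_read_timeout (nginx); that mapping and the ordering rule it applies are a common rule of thumb, not something the answer spells out. The function name `check_timeouts` is hypothetical.

```python
from typing import List


def check_timeouts(max_execution_time: int,
                   request_terminate_timeout: int,
                   fastcgi_read_timeout: int) -> List[str]:
    """Flag timeout combinations where layers cut each other off.

    All values are in seconds; 0 means "no limit" for the two PHP-side settings.
    """
    warnings = []
    if request_terminate_timeout and max_execution_time > request_terminate_timeout:
        # The worker is killed mid-request, which nginx may report as a reset.
        warnings.append("php-fpm kills the worker before max_execution_time is reached")
    if request_terminate_timeout == 0 and max_execution_time == 0:
        warnings.append("no PHP-side limit: a stuck script holds its worker forever")
    inner = request_terminate_timeout or max_execution_time
    if inner and fastcgi_read_timeout < inner:
        warnings.append("nginx stops waiting before PHP-side limits expire")
    return warnings


if __name__ == "__main__":
    for message in check_timeouts(120, 60, 65):
        print(message)
```

The fourth item in the list, exhausted or dead worker processes, is a capacity question rather than a timeout and is not covered by this check.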


Source

Has anyone run into complaints like this in the logs? Everything keeps working all the same. The errors appear 100% of the time when I stop the instance, and occasionally while people are working in the system. Here is a sample log:

Code:

Wed Nov 26 11:01:06 2008
***LOG Q0I=> NiIRead: recv (10054: WSAECONNRESET: Connection reset by peer) [nixxi.cpp 4424]
*** ERROR => NiIRead: SiRecv failed for hdl 5 / sock 332
(SI_ECONN_BROKEN/10054; I4; ST; 172.16.3.26:2788) [nixxi.cpp 4424]

Wed Nov 26 11:01:21 2008
***LOG Q0I=> NiIRead: recv (10054: WSAECONNRESET: Connection reset by peer) [nixxi.cpp 4424]
*** ERROR => NiIRead: SiRecv failed for hdl 3 / sock 356
(SI_ECONN_BROKEN/10054; I4; ST; 127.0.0.1:2368) [nixxi.cpp 4424]
***LOG Q0I=> NiIRead: recv (10054: WSAECONNRESET: Connection reset by peer) [nixxi.cpp 4424]
*** ERROR => NiIRead: SiRecv failed for hdl 8 / sock 280
(SI_ECONN_BROKEN/10054; I4; ST; 127.0.0.1:2404) [nixxi.cpp 4424]
***LOG Q0I=> NiIRead: recv (10054: WSAECONNRESET: Connection reset by peer) [nixxi.cpp 4424]
*** ERROR => NiIRead: SiRecv failed for hdl 2 / sock 368
(SI_ECONN_BROKEN/10054; I4; ST; 127.0.0.1:2367) [nixxi.cpp 4424]

Wed Nov 26 11:01:22 2008
***LOG S30=> GwStopGateway, gateway stopped () [gwxxrd.c 14655]

And a sample log from another server (identical hardware) during normal operation:

Code:

Tue Nov 25 14:45:43 2008
***LOG Q0I=> NiIRead: recv (10054: WSAECONNRESET: Connection reset by peer) [nixxi.cpp 4424]
*** ERROR => NiIRead: SiRecv failed for hdl 13 / sock 1212
(SI_ECONN_BROKEN/10054; I4; ST; 172.16.0.103:3364) [nixxi.cpp 4424]
Network error of client T23, NiBufReceive (-6: NIECONN_BROKEN), dp_tm_status=3
Client address of T23 is 172.16.0.103(hostname)
***LOG Q04=> DpRTmPrep, NiBufReceive (2028 USERNAME 23 hostname ) [dpxxdisp.c 11532]
RM-T23, U2028, 200 USERNAME, hostname, 11:22:16, M0, W1, , 2/0

I checked the network; not a single packet is lost.

Notes I have already gone through:

Note 155147 — WinNT: Connection reset by peer

Note 545177 — FAQ: Preliminary steps in analyzing RFC connections

Note 500235 — Network Diagnosis with NIPING

Source

Background

The error "recv() failed (104: Connection reset by peer) while reading response header from upstream" obviously points to a communication problem between nginx and php-fpm, usually because php-fpm terminated due to a timeout or for other reasons, so nginx did not receive a valid response.
The issue I am writing about this time is narrower: it is an intermittent fault that appeared under the phalcon framework and was found in the online environment, which is why I investigated it. The eventual cause turned out to be memory-related. I suspected many causes that were not the root cause, and I found many similar problems on the Internet whose solutions did not address mine, so this article is mainly a record and may have little reference value for most people.

Discovering the problem

An older project that was handed over to me uses the phalcon framework. phalcon is a framework similar to yaf, delivered as a PHP extension for performance (official website). After adding some small features and finishing development, I tested in the provided test environment and found intermittent 502 errors with no pattern. I checked the nginx error_log, the PHP error_log and the php-fpm error_log and found no useful error message.

nginx log:
2019/03/22 20:17:17 [error] 16436#0: *573 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: xxxxx, server: xxxxx, request: "GET /article HTTP/1.1", upstream: "fastcgi://unix:/dev/shm/php-fpm.sock:", host: "xxxxx", referrer: "xxxxx"
php-fpm log:
[23-Mar-2019 17:36:25] WARNING: [pool www] child 1843 exited on signal 9 (SIGKILL) after 248.595491 seconds from start

Only standard feedback can be obtained from the logs. nginx tells us that it got no response, and php-fpm tells us that the process was killed, but neither tells us why.

At first I suspected that the php-fpm process had timed out because of large data or logic errors, but nothing was recorded in the slow log, and I found that the request returned 502 after a little over 1 s, so I was sure that it was not caused by an execution timeout.

I also noticed that my local environment did not show the problem, so I suspected that the hardware configuration of the test environment or the nginx/PHP settings made the phalcon framework misbehave. So I checked the online environment and found that the problem exists there too, only rarely, not as often as in the test environment. I wonder whether nobody had ever reported it. Out of professional ethics I had to clean up someone else's mess.
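When hunting for such exits across a long php-fpm log, a small parser helps. The sketch below matches the warning format shown in the log line above; the regular expression, the function name `parse_child_exit` and the `likely_external_kill` flag are illustrative assumptions. SIGKILL cannot be caught by the process and typically comes from outside it, for example from the kernel's OOM killer, which is why it is flagged.

```python
import re

# Matches lines such as:
# [pool www] child 1843 exited on signal 9 (SIGKILL) after 248.595491 seconds from start
CHILD_EXIT_RE = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] child (?P<pid>\d+) exited on signal "
    r"(?P<signo>\d+) \((?P<signame>\w+)[^)]*\) after (?P<uptime>[\d.]+) seconds"
)


def parse_child_exit(line: str):
    """Extract pool, pid, signal and uptime from a php-fpm child-exit warning."""
    m = CHILD_EXIT_RE.search(line)
    if m is None:
        return None
    return {
        "pool": m["pool"],
        "pid": int(m["pid"]),
        "signal": m["signame"],
        "uptime_s": float(m["uptime"]),
        # Hypothetical flag: a hint to check dmesg or the system log next.
        "likely_external_kill": m["signame"] == "SIGKILL",
    }


if __name__ == "__main__":
    sample = ("[23-Mar-2019 17:36:25] WARNING: [pool www] child 1843 exited "
              "on signal 9 (SIGKILL) after 248.595491 seconds from start")
    print(parse_child_exit(sample))
```

Counting such exits per hour and comparing them with the timestamps of the nginx 502 entries is a quick way to confirm that the two logs describe the same events.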

Troubleshooting

So I searched for people who used the phalcon framework, had hit the same problem and had already found a solution. I found several seemingly plausible causes:

1. Incompatibility between the phalcon extension and the opcache extension

This makes sense: after all, both are PHP extensions, and I am not sure that the phalcon extension has been widely used and tested by the public. To be honest, I had only heard of this framework once the project was handed over; I then read the documentation on the official site and found that the framework is decent and usable for development, though it may have some problems.
So I temporarily disabled the opcache extension and tried again. The problem was still there, so this cause was ruled out.

2. The OS bitness used when compiling the phalcon extension does not match the actual one


This person's problem is exactly the same as mine, and both the online and the test environment were handed over directly from operations. Who compiled and installed the phalcon extension? Nobody can guarantee or verify that it was installed correctly. Just when I thought I had found the cause and was about to recompile and retry, I noticed that I had not chosen any OS bitness when compiling and installing in my local environment, and that the answer dates from 2013, which is a fairly long time ago. So I went to the official site and confirmed that the installation instructions for Linux do not mention OS bitness. Moreover, a wrong build bitness should not fail intermittently; it should fail every time.
Clearly this is not what I was looking for.

3. A cause within the phalcon framework itself

The documentation contains the statement "phalcon may cause some of your web server processes to crash". In that case there is no solution.

So I decided to step away from the phalcon framework and look at the problem from the PHP side. Perhaps the problem is not as complicated as I thought and is just a small issue.

Indeed, all my initial assumptions were built around the phalcon extension, and I had been focusing too narrowly on phalcon.

Problem solved

So I looked up why php-fpm produces "recv() failed (104: Connection reset by peer) while reading response header from upstream". The two main causes are the process being terminated on an execution timeout and the process being terminated when available memory is insufficient.
I had already checked the execution timeout earlier. The timeouts for php-fpm and the CLI in php.ini and php-fpm.conf far exceed the actual execution time of my code, so timeouts are not the problem, and the failing requests return within just one or two seconds.
So memory became the main suspect. Admittedly I had never checked the configuration of this test server, because a configuration of about 2c2g is usually provided and I am the only one using it, so there should be no performance problems. But when I checked, it turned out to be a 1c1g machine, and available memory and swap were very low every time the problem occurred. So this is where the problem lies.
So I checked the PHP configuration again and found this in the php-fpm configuration:

pm = static
pm.max_children = 64

Running 64 php-fpm processes in static mode does not suit the actual configuration of this machine, but that alone should not make it fail. Watching the php-fpm processes, I saw that they do execute multiple queries and request large amounts of data, and many php-fpm processes use more than 15% of memory. As I recall, when phalcon handles pagination it seems to use the entire ORM result set to build the paginated data objects (I have not studied the details carefully yet), and the data volume is indeed large, which can consume memory. If there is a memory leak, the situation becomes even more pronounced.

This analysis fits the intermittent fault better. Besides, for the configuration of this machine dynamic mode is more suitable and the number of processes should not be large; the current settings clearly do not match reality, so even if they are not the cause, they need to be optimized. A memory leak, if there is one, can be mitigated directly with pm.max_requests.

So I changed the php-fpm configuration:

pm = dynamic
pm.max_children = 15
pm.start_servers = 8
pm.min_spare_servers = 6
pm.max_spare_servers = 15
pm.max_requests = 500

It looks close to the default configuration, but I think it fits better than before.

Then I restarted php-fpm to check again whether the fault was gone.

After hundreds of requests the problem did not recur, and the free memory of the machine no longer dropped below 1000k as before.

So the problem lies here. Next I will review the relevant php-fpm configuration and study it in more depth to confirm the cause.
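A rough way to size pm.max_children is to divide the memory left for PHP by the peak memory of one worker. The sketch below is a back-of-the-envelope estimate, not a php-fpm feature: the function name and the 256 MB reserved for the OS and other services are assumptions, while the 1 GB machine and the 15% per-process figure come from the article.

```python
def max_children_for(total_mb: int, reserved_mb: int, per_child_mb: int) -> int:
    """Upper bound on pm.max_children that fits in RAM without swapping."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable_mb = max(total_mb - reserved_mb, 0)
    return usable_mb // per_child_mb


if __name__ == "__main__":
    total_mb = 1024                       # the 1c1g machine from the article
    per_child_mb = int(total_mb * 0.15)   # a worker using about 15% of memory
    reserved_mb = 256                     # assumed headroom for the OS, nginx, etc.
    print(max_children_for(total_mb, reserved_mb, per_child_mb))
```

Under these assumptions only a handful of workers fit if every one of them reaches its peak at the same time, so a larger pm.max_children relies on most workers staying well below that peak, and pm.max_requests helps by recycling workers before they grow.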

Source
