SQLSTATE[HY000]: General error: 7 SSL SYSCALL error: EOF detected


Contents

  1. Solve Postgres error “Error: SSL SYSCALL error: EOF detected”
  2. Make the queries run faster
  3. Adjust the connection timeout
  4. Can `hot_standby_feedback=on` cause the `SSL SYSCALL error: EOF detected` error?
  5. psycopg2.OperationalError: SSL SYSCALL error: EOF detected on a Flask/SQLAlchemy/Celery + PostgreSQL app
  6. Postgres SSL SYSCALL error: EOF detected with python and psycopg
  7. 8 Answers
  8. PostgreSQL: SSL SYSCALL error: EOF detected

Solve Postgres error “Error: SSL SYSCALL error: EOF detected”

I was running a scraping job from a Docker container using Python and the psycopg2 adapter that wrote the results to a Postgres database managed on DigitalOcean (DO). On random occasions, the container died, for some reason. I started debugging and these are my findings.

The error I ran into was the following:

Error: SSL SYSCALL error: EOF detected

I literally had no idea what to expect when Googling this error. Here’s what I found.

Make the queries run faster

It tends to happen with Postgres deployments that have very little RAM allocated to them. For example, I’m using the cheapest Postgres hosting on DO; it only has 512 MB of memory attached to it.

In other words: the lower the memory, the longer it takes to run complex queries, with a higher probability that the connection times out.

If you can spare the bucks: add more memory.

Adjust the connection timeout

If adding more memory is not an option, try changing the keepalives parameters of your Postgres connection. For example, in psycopg2 you can do that as follows:

keepalive_kwargs = {
  "keepalives": 1,
  "keepalives_idle": 60,
  "keepalives_interval": 10,
  "keepalives_count": 5
}

conn = psycopg2.connect(
  host = '<YOUR_HOST>',
  database = '<DB_NAME>',
  user = '<USER>',
  password = '<PASSWORD>',
  port = 25060,
  **keepalive_kwargs
)

Let’s go over these four parameters quickly:

  • keepalives (boolean): setting this to 1 indicates that you want to use your own client-side keepalives.
  • keepalives_idle (seconds): the number of seconds of inactivity after which a keepalive message should be sent.
  • keepalives_interval (seconds): the number of seconds to wait before resending a keepalive that has not been acknowledged by the server.
  • keepalives_count (count): the number of unacknowledged keepalives after which the connection is considered dead.

So, solve the problem by making your queries run faster, or by making sure your connection doesn’t time out.

Great success!

Source

Can `hot_standby_feedback=on` cause the `SSL SYSCALL error: EOF detected` error?

We run a master-slave hot-standby replication setup on Amazon RDS with a single master and a single replica. Both are hosted on Amazon RDS in eu-central-1, and the master is used for reading as well. For that we use a rotating DNS record that is aliased to both the replica and the master database.

On my replica database I also have hot_standby_feedback = on set.

On the replica I receive the SSL SYSCALL error: EOF detected error mentioned in the title.

After further investigation I found out that the error happens on both INSERT and SELECT queries.

I tried to debug this error. The one cause I suspected was the connection capacity, which could lead to memory starvation, but I was proven wrong: the connection limit has the value 1722, while the total number of active connections on my replica is only about 17 (excluding the one that does the replication).

Also, memory on the replica shows no drop; it stays at a level position, as the monitoring graph showed.

Both the master and the replica instances are db.t3.xlarge (4 vCPUs, 16 GB RAM, 100 GB storage), so a resource problem seems out of the question.

Therefore, I wonder whether and how hot_standby_feedback=on may trigger this error. Also, can the replication itself cause unexpected connection terminations, given that I use a single replica instead of two?

If the latter is possible, how can I list past server-terminated connections within a given time range?

The database connections are made by PHP Laravel application workers: there are 2 servers running 4 processes each, and Amazon SQS is used as the queuing system.
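One way to approach the last question (a sketch, not from the original post): RDS does not keep past sessions in any catalog view, so the only record of server-terminated connections is the PostgreSQL log, which can be downloaded with boto3 and filtered for termination messages. The instance identifier, region, and time window below are placeholders:

import boto3

rds = boto3.client("rds", region_name="eu-central-1")

# RDS PostgreSQL log files are typically named error/postgresql.log.YYYY-MM-DD-HH
files = rds.describe_db_log_files(DBInstanceIdentifier="my-replica")["DescribeDBLogFiles"]
for f in files:
    # crude hour-range filter on the timestamp suffix of the log file name
    if not ("2021-03-01-10" <= f["LogFileName"][-13:] <= "2021-03-01-14"):
        continue
    marker, pending = "0", True
    while pending:
        chunk = rds.download_db_log_file_portion(
            DBInstanceIdentifier="my-replica",
            LogFileName=f["LogFileName"],
            Marker=marker,
        )
        for line in (chunk.get("LogFileData") or "").splitlines():
            if "terminating connection" in line or "could not receive data" in line:
                print(line)
        marker, pending = chunk["Marker"], chunk["AdditionalDataPending"]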

Source


psycopg2.OperationalError: SSL SYSCALL error: EOF detected on a Flask/SQLAlchemy/Celery + PostgreSQL app

I have an app that was written with Flask + SQLAlchemy + Celery, with RabbitMQ as the broker. The database is PostgreSQL (PostgreSQL 10.11 (Ubuntu 10.11-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit), hosted on DigitalOcean (1 CPU, 2 GB RAM). All app workers (Flask or Celery) are started under Supervisor.

In my project I’m using the flask_sqlalchemy package to connect to the DB, like this:
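A minimal flask_sqlalchemy setup of this kind would look roughly as follows (the URI, port, and names are placeholders, not the author’s actual values):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# DigitalOcean managed Postgres typically listens on port 25060 and requires SSL.
app.config["SQLALCHEMY_DATABASE_URI"] = (
    "postgresql+psycopg2://user:password@host:25060/dbname?sslmode=require"
)
db = SQLAlchemy(app)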

I wrote some logic in the Flask app, tested it, and then copied it to the Celery project (where the DB connection is configured the same way). So now my example Celery task looked like this:
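The task body itself is not shown; a hypothetical reconstruction of the kind of task described (celery, db and the User model come from the application modules above, and the field names are made up):

from datetime import datetime

@celery.task()
def example_task(user_id):
    # a very simple SELECT -- exactly the kind of query that started failing
    user = User.query.get(user_id)
    user.last_seen = datetime.utcnow()
    db.session.commit()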

And the problem is this: when I run the app on my laptop, everything works fine, no errors. When I first uploaded the app to the VPS and there were not many users, everything was OK too. But some time later, when there are 30+ users at the same time calling this example_task, errors began to appear periodically on very simple queries that just select some data from the database (the psycopg2.OperationalError: SSL SYSCALL error: EOF detected from the title).

Sometimes, but very rarely, I saw another error in the logs as well.

I wrote an example exception decorator that handles errors (any errors, not only SQLAlchemy errors); after it catches an error it does a db.session.rollback().
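The decorator itself is not shown; a sketch of the pattern being described (catch any exception, roll back the session, then re-raise) could look like this:

import functools
import logging

def rollback_on_error(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logging.exception("task failed, rolling back the session")
            db.session.rollback()  # db is the flask_sqlalchemy instance from above
            raise
    return wrapper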

But it didn’t help, because yes, it reloads the DB connection and the function then works fine, but the app starts to work slower and slower, and at some point I have to reload the workers in Supervisor. I saw a lot of idle connections in PostgreSQL; I set the idle transaction timeout to 5 minutes, but it didn’t help.

I don’t know what to do next, because the only solution that helps now is to reload the app workers in Supervisor every time I see that the app is getting slower and slower.

Source

Postgres SSL SYSCALL error: EOF detected with python and psycopg

Using the psycopg2 package with Python 2.7, I keep getting the error in the title: psycopg2.DatabaseError: SSL SYSCALL error: EOF detected

It only occurs when I add a WHERE column LIKE ''%X%'' clause to my pgrouting query. An example:

SELECT id1 as node, cost
FROM PGR_Driving_Distance(
  'SELECT id, source, target, cost FROM edge_table
   WHERE cost IS NOT NULL and column LIKE ''%x%'' ',
  1, 10, false, false
)

Threads on the internet intuitively suggest it is an SSL issue, but whenever I comment out the pattern-matching part, the query and the connection to the database work fine.

This is on a local database running Xubuntu 13.10.

After further investigation: it looks like this may be caused by the pgrouting extension crashing the database, because it is a bad query and there are no links which match this pattern.

Will post an answer soon.

8 Answers

The error: psycopg2.OperationalError: SSL SYSCALL error: EOF detected

The setup: Airflow + Redshift + psycopg2

When: Queries take a long time to execute (more than 300 seconds).

A socket timeout occurs in this instance. What solves this specific variant of the error is adding keepalive arguments to the connection string:

keepalive_kwargs = {
    "keepalives": 1,
    "keepalives_idle": 30,
    "keepalives_interval": 5,
    "keepalives_count": 5,
}

connection = psycopg2.connect(connection_string, **keepalive_kwargs)

Redshift requires a keepalives_idle of less than 300. A value of 30 worked for me, your mileage may vary. It is also possible that the keepalives_idle argument is the only one you need to set — but ensure keepalives is set to 1.

I ran into this problem when running a slow query in a Droplet on a DigitalOcean instance. All other SQL would run fine, and it worked on my laptop. After scaling up to a 1 GB RAM instance instead of 512 MB it works fine, so it seems this error can occur when the process runs out of memory. (Commenters noted this is not a universal fix: the error can still show up on machines with plenty of free RAM, while others fixed it on a 512 MB instance simply by adding swap space.)

Very similar answer to what @FoxMulder900 did, except I could not get his first select to work. This works, though:

WITH long_running AS (
    SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
    FROM pg_stat_activity
    WHERE (now() - pg_stat_activity.query_start) > interval '1 minutes'
      AND state = 'active'
)
SELECT * FROM long_running;

If you want to kill the processes from long_running, just comment out the last line and insert SELECT pg_cancel_backend(long_running.pid) FROM long_running;

This issue occurred for me when I had some rogue queries running, causing tables to be locked indefinitely. I was able to see the queries by running:

SELECT * from STV_RECENTS where status='Running' order by starttime desc;

then kill them with:

SELECT pg_terminate_backend(<pid>);

I encountered the same error. CPU and RAM usage looked fine, and the solution by @antonagestam didn’t work for me.

Basically, the issue was at the step of engine creation. pool_pre_ping=True solved the problem:
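The snippet is not included in the answer as reproduced here; a minimal sketch of that engine setup (the connection URL is a placeholder; create_engine and pool_pre_ping are standard SQLAlchemy API):

from sqlalchemy import create_engine

# pool_pre_ping tests each pooled connection with a lightweight ping (SELECT 1)
# before handing it out, recycling it transparently if the ping fails.
engine = create_engine(
    "postgresql+psycopg2://user:password@host:5432/dbname",  # placeholder URL
    pool_pre_ping=True,
)

With Flask-SQLAlchemy, recent versions accept the same option via SQLALCHEMY_ENGINE_OPTIONS = {"pool_pre_ping": True}.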

What it does is: each time the connection is about to be used, it sends a SELECT 1 query to check the connection. If that fails, the connection is recycled and checked again; upon success, the actual query is executed.

In my case, I had the same error in the Python logs. I checked the log file in /var/log/postgresql/, and there were a lot of error messages like “could not receive data from client: Connection reset by peer” and “unexpected EOF on client connection with an open transaction”. This can happen due to network issues.

In my case it was the OOM killer (the query was too heavy). Check dmesg:

dmesg | grep -A2 Kill

In my case the output was:

Out of memory: Kill process 28715 (postgres) score 150 or sacrifice child

dmesg is where a lot of Linux kernel messages end up; when the OS runs out of memory, the kernel picks one of the running processes and kills it to free memory, and here it killed the Postgres backend running the query.

You may need to express % as %%, since % is the placeholder marker: http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries

I got this error while running a large UPDATE statement on a table with 3 million rows. In my case it turned out the disk was full. Once I added more space, the UPDATE worked fine.

Source


PostgreSQL: SSL SYSCALL error: EOF detected

First, I’ve searched for and found several posts relating to this error, and most of them point either to a RAM issue or an SSL issue. I tried to rule out the SSL possibility by adding sslmode=disable on the command line:

 psql -U waypoint -d waypoint -W -c "alter table telemetria_data.historico alter clase type smallint, alter valor type real[], alter power type smallint, alter voltaje type real;" -h localhost -v sslmode=disable

But the same message appeared:

SSL SYSCALL error: EOF detected
connection to server was lost

Regarding the possible memory issue, I don’t know how to troubleshoot it.

The data structure is the one described in this question and, as you may see, this is a very long-running query, since it has to apply the ALTER TABLE to all the inherited tables.

Operating system:

Linux ip-10-1-0-9 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux

PostgreSQL:

PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

Update 2017-06-01 13:50 GMT

Changed the command to the following (due to @Daniel Vérité’s recommendations):

time PGSSLMODE=disable psql -U waypoint -d waypoint -W -c "alter table telemetria_data.historico alter clase type smallint, alter valor type real[], alter power type smallint, alter voltaje type real;" -h localhost

The problem actually changed to the following:

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
connection to server was lost

Update 2017-06-01 15:34 GMT

Found several log entries (in /var/log/postgresql/postgresql-9.4-main.log) like these:

2017-06-01 13:48:49 UTC [22899-357] LOG:  checkpoints are occurring too frequently (19 seconds apart)
2017-06-01 13:48:49 UTC [22899-358] HINT:  Consider increasing the configuration parameter "checkpoint_segments".

So I’ll proceed with the suggested hint.

Also found this group of entries, which actually refer to the crash and the later recovery:

2017-06-01 13:49:04 UTC [4982-17] LOG:  server process (PID 6569) was terminated by signal 9: Killed
2017-06-01 13:49:04 UTC [4982-18] DETAIL:  Failed process was running: alter table telemetria_data.historico alter clase type smallint, alter valor type real[], alter power type smallint, alter voltaje type real;
2017-06-01 13:49:04 UTC [4982-19] LOG:  terminating any other active server processes
2017-06-01 13:49:04 UTC [22902-2] WARNING:  terminating connection because of crash of another server process
2017-06-01 13:49:04 UTC [22902-3] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-06-01 13:49:04 UTC [22902-4] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-06-01 13:49:04 UTC [16383-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:04 UTC [16384-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:04 UTC [16386-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:04 UTC [16385-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:04 UTC [4982-20] LOG:  all server processes terminated; reinitializing
2017-06-01 13:49:05 UTC [16402-1] LOG:  database system was interrupted; last known up at 2017-06-01 13:48:45 UTC
2017-06-01 13:49:05 UTC [16403-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:05 UTC [16404-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:05 UTC [16414-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:05 UTC [16415-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16452-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16453-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16462-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16463-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16472-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16473-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16482-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16483-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:09 UTC [16402-2] LOG:  database system was not properly shut down; automatic recovery in progress
2017-06-01 13:49:09 UTC [16402-3] LOG:  redo starts at 11EC/9960F440
2017-06-01 13:49:21 UTC [16402-4] LOG:  unexpected pageaddr 11E6/52726000 in log segment 00000001000011EC000000C9, offset 7495680
2017-06-01 13:49:21 UTC [16402-5] LOG:  redo done at 11EC/C9723D60
2017-06-01 13:49:32 UTC [16402-6] LOG:  MultiXact member wraparound protections are now enabled
2017-06-01 13:49:32 UTC [4982-21] LOG:  database system is ready to accept connections

Any suggestions on this last part of the log?

The OOM killer is enabled, and the following is the output in /var/log/messages:

Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.672817] psql invoked oom-killer: gfp_mask=0x2000d0, order=2, oom_score_adj=0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.675652] psql cpuset=/ mems_allowed=0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.677274] CPU: 1 PID: 16367 Comm: psql Not tainted 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680406] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  0000000000000000 ffffffff815123b5 ffff88003dcda1d0 0000000000000000
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  ffffffff8150ff8d 0000000000000000 ffffffff810d6e3f 0000000000000000
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  ffffffff81516d2e 0000000000000200 ffffffff810689d3 ffffffff810c43e4
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] Call Trace:
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff815123b5>] ? dump_stack+0x5d/0x78
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff8150ff8d>] ? dump_header+0x76/0x1e8
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff810d6e3f>] ? smp_call_function_single+0x5f/0xa0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81516d2e>] ? mutex_lock+0xe/0x2a
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff810689d3>] ? put_online_cpus+0x23/0x80
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff810c43e4>] ? rcu_oom_notify+0xc4/0xe0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81153d1c>] ? do_try_to_free_pages+0x4ac/0x520
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff811427dd>] ? oom_kill_process+0x21d/0x370
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff8114239d>] ? find_lock_task_mm+0x3d/0x90
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81142f43>] ? out_of_memory+0x473/0x4b0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81148e0f>] ? __alloc_pages_nodemask+0x9ef/0xb50
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81065c86>] ? copy_process.part.25+0x116/0x1c50
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffffa00f6bba>] ? call_filldir+0x9a/0x160 [ext4]
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81058301>] ? __do_page_fault+0x1d1/0x4f0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff811ac3f9>] ? get_empty_filp+0xc9/0x1c0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81067990>] ? do_fork+0xe0/0x3d0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff811c6a1c>] ? __alloc_fd+0x7c/0x120
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff815188f9>] ? stub_clone+0x69/0x90
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff8151858d>] ? system_call_fast_compare_end+0x10/0x15
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.734210] Mem-Info:
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.735151] Node 0 DMA per-cpu:
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.736512] CPU    0: hi:    0, btch:   1 usd:   0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.738300] CPU    1: hi:    0, btch:   1 usd:   0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.740072] Node 0 DMA32 per-cpu:
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.741335] CPU    0: hi:  186, btch:  31 usd:   0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.743092] CPU    1: hi:  186, btch:  31 usd:   0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210] active_anon:370484 inactive_anon:549110 isolated_anon:24
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  active_file:240 inactive_file:1425 isolated_file:0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  unevictable:0 dirty:173 writeback:0 unstable:0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  free:20427 slab_reclaimable:9729 slab_unreclaimable:3425
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  mapped:567547 shmem:587500 pagetables:4209 bounce:0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  free_cma:0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.760872] Node 0 DMA free:15224kB min:184kB low:228kB high:276kB active_anon:228kB inactive_anon:188kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:144kB shmem:204kB slab_reclaimable:4kB slab_unreclaimable:80kB kernel_stack:80kB pagetables:4kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.777646] lowmem_reserve[]: 0 3757 3757 3757
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.779553] Node 0 DMA32 free:69956kB min:44868kB low:56084kB high:67300kB active_anon:1481708kB inactive_anon:2196252kB active_file:1772kB inactive_file:1748kB unevictable:0kB isolated(anon):96kB isolated(file):0kB present:3915776kB managed:3849676kB mlocked:0kB dirty:0kB writeback:0kB mapped:2267676kB shmem:2349796kB slab_reclaimable:38712kB slab_unreclaimable:13620kB kernel_stack:2032kB pagetables:16832kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:14 all_unreclaimable? no
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.795757] lowmem_reserve[]: 0 0 0 0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.797650] Node 0 DMA: 11*4kB (EM) 8*8kB (EM) 1*16kB (E) 2*32kB (UE) 1*64kB (E) 1*128kB (E) 2*256kB (UE) 2*512kB (EM) 3*1024kB (UEM) 3*2048kB (EMR) 1*4096kB (M) = 15228kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.804011] Node 0 DMA32: 13752*4kB (UEM) 85*8kB (EM) 54*16kB (M) 43*32kB (M) 17*64kB (M) 15*128kB (M) 10*256kB (M) 3*512kB (M) 2*1024kB (M) 0*2048kB 1*4096kB (R) = 71176kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.811528] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.815461] 588017 total pagecache pages
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.816967] 0 pages in swap cache
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.818205] Swap cache stats: add 0, delete 0, find 0/0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.820215] Free swap  = 0kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.821475] Total swap = 0kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.822543] 982941 pages RAM
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.823640] 0 pages HighMem/MovableOnly
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.825110] 16525 pages reserved
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.826310] 0 pages hwpoisoned
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.827473] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.830523] [  159]     0   159     8242      800      21        0             0 systemd-journal
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.833717] [  162]     0   162    10200      135      22        0         -1000 systemd-udevd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.836679] [  316]     0   316     6351     1726      14        0             0 dhclient
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.839533] [  351]     0   351     7181       72      18        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.842112] [  353]     0   353     4964       68      14        0             0 systemd-logind
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.844894] [  362]   107   362    10531       96      26        0          -900 dbus-daemon
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.847504] [  376]   106   376     8345      154      21        0             0 ntpd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.850085] [  377]     0   377    65721      457      30        0             0 rsyslogd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.852755] [  388]     0   388     3909       39      12        0             0 agetty
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.855305] [  389]     0   389     3864       40      13        0             0 agetty
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.857914] [  451]     0   451    13796      168      29        0         -1000 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.860562] [  481]  1002   481    26362     5081      54        0             0 perfmon_loop.rb
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.863918] [  486]  1002   486    15211     3146      31        0             0 cht_perfmon
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.867273] [  625]     0   625     9560      144      22        0             0 master
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.869878] [  630]   108   630    10164      234      24        0             0 qmgr
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.872356] [ 3443]     0  3443    20130      213      41        0             0 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.874903] [ 3445]  1000  3445    20164      222      39        0             0 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.877451] [ 3446]  1000  3446     3176       43       9        0             0 sftp-server
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.880261] [ 4982]   105  4982   614831    42946     132        0          -900 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.882988] [ 6634]     0  6634     1570       23       9        0             0 collectdmon
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.885784] [ 6635]     0  6635   174485      156      36        0             0 collectd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.888347] [22899]   105 22899   615399   541666    1105        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.890970] [22900]   105 22900   615395    14251      88        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.893836] [22901]   105 22901   615088     4252      53        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.896365] [22902]   105 22902   615305     1316      60        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.899083] [22903]   105 22903    21336      378      40        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.901954] [ 2946]   108  2946    10076      137      22        0             0 pickup
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.904462] [ 6376]     0  6376    20130      213      42        0             0 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.907440] [ 6378]  1000  6378    20130      209      40        0             0 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.910534] [ 6379]  1000  6379     5795      151      16        0             0 bash
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.913144] [ 6382]     0  6382    11515      107      28        0             0 sudo
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.915635] [ 6383]     0  6383    11895       96      27        0             0 su
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.918149] [ 6384]   105  6384     5796      139      16        0             0 bash
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.920900] [ 6561]   105  6561    18289      236      40        0             0 psql
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.923394] [ 6569]   105  6569   925161   853454    1718        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.926097] [16319]     0 16319    10865       95      25        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.928611] [16320]     0 16320    10865       95      25        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.931063] [16321]     0 16321    10865       95      25        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.933758] [16322]     0 16322    10865       95      25        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.936197] [16323]  1000 16323     1084       20       7        0             0 sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.938762] [16324]  1000 16324     1084       20       7        0             0 sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.941450] [16325]  1000 16325     1084       21       7        0             0 sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.943849] [16326]  1000 16326     1084       21       6        0             0 sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.946409] [16327]  1000 16327     3612       54      12        0             0 telemetria.sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.949258] [16328]  1000 16328     3613       57      12        0             0 instantaneo.sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.952633] [16329]  1000 16329    21335     4808      48        0             0 mon-put-instanc
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.955744] [16330]  1000 16330     3612       54      12        0             0 conexiones.sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.958555] [16366]  1000 16366    10744     1513      26        0             0 psql
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.961216] [16367]  1000 16367    10735     1512      26        0             0 psql
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.963722] [16368]  1000 16368     7725     1044      19        0             0 aws

Update 2017-06-01 16:19 GMT

Changed the settings to:

checkpoint_segments = 100       # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30s        # range 30s-1h

And I filled the hard drive :( I generously increased checkpoint_segments but didn’t first check the available space. Luckily I’m testing this procedure in a non-production environment, so I may have to clone the production server once again. Or is there any way to free up the temp space that is now being wasted?

ERROR:  could not extend file "base/16384/3940428": No space left on device
HINT:  Check free disk space.

As per @deszo’s question, the memory overcommit values are the following:

vm.nr_overcommit_hugepages = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50

Update 2017-06-01 18:07 GMT

The server instance is an AWS c4.large (2 vCPU, 3.75 GB RAM).

A few more parameters from postgresql.conf:

shared_buffers = 2GB            # min 128kB
work_mem = 32MB             # min 64kB
max_connections =800            # (change requires restart)
