We have a 2-node cluster; current state:
patronictl -c /etc/patroni/postgres.yml list
+---------+-------------+---------------+--------+---------+-----------+
| Cluster | Member      | Host          | Role   | State   | Lag in MB |
+---------+-------------+---------------+--------+---------+-----------+
| dev1    | postgresql1 | 10.144.250.41 |        | running |     371.0 |
| dev1    | postgresql2 | 10.144.250.42 | Leader | running |       0.0 |
+---------+-------------+---------------+--------+---------+-----------+
The postgresql1 node was shut down a few hours ago, and I have now started it again.
It cannot sync with the master and fails with this error:
cp: cannot stat '/data/wal_archive/000000520000012B00000081': No such file or directory
< 2019-12-05 10:17:05.994 MSK > LOG: started streaming WAL from primary at 12B/81000000 on timeline 82
< 2019-12-05 10:17:05.994 MSK > FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000520000012B00000081 has already been removed
But that is not true: WAL segment 000000520000012B00000081 exists on the master:
[root@pgdb2 wal_archive]# ll /data/wal_archive/000000520000012B00000081
-rw------- 1 postgres postgres 16777216 Dec 4 23:38 /data/wal_archive/000000520000012B00000081
What’s wrong?
Postgres requested WAL segment has already been removed (however it is actually in the slave’s directory)
I am using repmgr as my replication tool. On the slave I keep getting an error:
requested WAL segment has already been removed
When I check the master, the segment indeed is not there; however, it is in the slave’s directories, both in pg_xlog and pg_xlog/archive_status. I can’t understand why the slave would be asking for this file if it already has it.
In fact it has WAL files going past the requested one. The solutions in What to do with WAL files for Postgres Slave reset are for a slightly different problem: they cover a scenario where the master deletes a log file before the slave receives it. In my case the file is very much present on the slave, along with several later files in the sequence after the one being requested.
This also suggests I do not need to increase the wal_keep_segments option, since the slave does not seem to have fallen behind?
From Streaming Replication in the PostgreSQL documentation:
If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that’s accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments.
To fix the issue, you have to reinitialize the data from the primary server. Remove the data directory on the slave:
Copy all data from the primary server:
If the version is 12, create the standby.signal file; otherwise configure recovery.conf (a sketch of these steps is shown below):
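A minimal sketch of those three steps, assuming a systemd service and a default data directory (the paths, host placeholder and replication user are illustrative, not taken from the answer):

# on the standby, with PostgreSQL stopped
systemctl stop postgresql
rm -rf /var/lib/postgresql/12/main       # assumed data directory
# re-seed from the primary; -X stream also copies the WAL produced during the backup
pg_basebackup -h PRIMARY_HOST -U replicator -D /var/lib/postgresql/12/main \
    -X stream --checkpoint=fast --write-recovery-conf
# on version 12+, --write-recovery-conf creates standby.signal and writes
# primary_conninfo into postgresql.auto.conf; on 11 and earlier it writes
# recovery.conf with standby_mode = 'on'
systemctl start postgresql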
How long is the pg_basebackup taking? Remember that segments are generated roughly every 5 minutes, so if the backup takes an hour you need at least 12 segments stored; at 2 hours you need 24, and so on. I’d budget about 12 segments per hour of backup.
Postgresql Streaming Replication Error: WAL segment removed
I want to set up PostgreSQL streaming replication, but get the following error:
Master IP : 192.168.0.30
Slave IP : 192.168.0.36
On Master:
I have created a user rep which is used solely for replication.
The relevant files inside the Postgres config directory (/opt/Postgres/9.3/data):
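The post does not include the snippets themselves; on a 9.3 primary the relevant settings would typically look like this (the rep user and the standby IP are from the question, the remaining values are assumptions):

postgresql.conf :
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 32        # assumed; keeps spare WAL around for the standby

pg_hba.conf :
host    replication    rep    192.168.0.36/32    md5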
I’ve restarted the postgres service.
On Slave:
I’ve stopped the postgres service, then applied the changes to the two files:
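Those snippets are not shown either; on the 9.3 standby the two files are presumably postgresql.conf and pg_hba.conf, roughly (assumed values):

postgresql.conf :
hot_standby = on

pg_hba.conf :
host    replication    rep    192.168.0.30/32    md5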
For replicating the initial database I have done:
On Master:
The internal Postgres backup-start command to create a backup label, then the transfer of the database data to the slave, and finally the internal backup-stop command to clean up (a sketch of the three commands follows):
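The post does not show the exact commands; for 9.3 they would typically be something like the following (the use of rsync and its options are assumptions):

psql -U postgres -c "SELECT pg_start_backup('initial_sync', true);"
rsync -av --exclude postmaster.pid --exclude pg_xlog /opt/Postgres/9.3/data/ 192.168.0.36:/opt/Postgres/9.3/data/
psql -U postgres -c "SELECT pg_stop_backup();"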
On Slave:
I’ve created the following recovery.conf :
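The file contents are not shown in the post; a typical 9.3 recovery.conf for this setup would be (the host and user come from the question, the port and trigger file are assumptions):

standby_mode = 'on'
primary_conninfo = 'host=192.168.0.30 port=5432 user=rep password=********'
trigger_file = '/tmp/postgresql.trigger'    # assumed promotion trigger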
Starting the postgres service on the slave produces no errors, but the service is still waiting:
Meanwhile, on master:
psql command on the slave gives:
cd pg_log on the slave gives the reason for the waiting:
How can I solve this error?
From Streaming Replication in the PostgreSQL documentation:
If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that’s accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments.
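The two remedies the documentation mentions look like this in practice (illustrative values; note that physical replication slots only exist from PostgreSQL 9.4 onward, so on the 9.3 setup above only wal_keep_segments applies):

-- on the primary (9.4+): reserve WAL with a physical replication slot
SELECT pg_create_physical_replication_slot('standby1');

# postgresql.conf on the primary: or simply keep more segments around
wal_keep_segments = 64

With a slot, the standby references it from recovery.conf with primary_slot_name = 'standby1'.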
Could not receive data from WAL stream: ERROR: requested WAL segment has already been removed
Configuration: Postgres 9.6 with a 3-node cluster. db1 is the master, db2 and db3 are replicas. WAL files are archived in AWS S3 using a custom pgrsync tool. The cluster is managed by Patroni. The archive_command and restore_command are properly configured on all the nodes.
To simulate: on db1, do heavy writes (like vacuuming a large table) and then stop db1 with sudo systemctl stop patroni. db3 becomes the new leader. db2 requests more WAL files, which it gets via the restore command from AWS S3, and becomes a replica of db3.
Now start db1 again (sudo systemctl start patroni). But db1 (the old leader and the new to-be-replica) never comes up as a replica and gives the error message:
could not receive data from WAL stream: ERROR: requested WAL segment 0000002400053C55000000AE has already been removed.
This error message is reported by db3 (the leader); db1 just logs it.
So, let’s see the timeline. Initially db1 was on timeline 35 (0x23) and wrote the following files to the archive:
db1 is stopped at this point. db3’s logs show this:
and db3 copied the following files to the archive
As db3 became leader, db2 started the process of becoming a replica of db3 (which it successfully did); here is a summary of the logs:
db1 is started now and here are the logs:
- 0000002400053C55000000AE was never written to archives by any Postgres node. The old leader (db1) copied the archive 0000002300053C55000000AE (note: 0023, not 0024) before it was stopped.
- The new leader (db3) copied 0000002200053C55000000AE (note: 0022, not 0024)
- max_wal_size is set to 1024 on all nodes.
- After db3 became the new leader, there was hardly any activity on the nodes. db3 only writes WAL files every 10 minutes (archive_timeout = 600s).
- Is there anything wrong in the configuration that makes the old leader ask for a WAL segment that the new leader does not have?
- How can the old leader (db1) be restored in this state, without having to erase it and start over?
Lots of disk space is available, and the problem can be reproduced at will. I had tried pg_rewind on the old primary, pointing at the new primary; it just said it was already on the same timeline (not the exact words). Note: it was not an error message. But even after that, the same error appeared when starting Postgres.
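For reference, the pg_rewind attempt would have looked roughly like this (the data directory and connection string are assumptions, not from the post):

# on db1, with Postgres stopped and db3 the current leader
pg_rewind --target-pgdata=/var/lib/postgresql/9.6/data --source-server="host=db3 port=5432 user=postgres dbname=postgres"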
We are on 9.6.19. Just a few days back, 9.6.21 (and the last 9.5.x release) was announced, which points out a fix for exactly this problem ("Fix WAL-reading logic so that standbys can handle timeline switches correctly. This issue could have shown itself with errors like 'requested WAL segment has already been removed'."). However, even after upgrading to 9.6.21, the same problem exists for us.
PostgreSQL replication is broken; I know the cause but not how to fix the standby?
Good day.
I have cascading replication set up: one master feeds a standby, which in turn, once synchronized, replicates to a final standby.
OS: Ubuntu 18
DBMS: PostgreSQL 10
https://postgrespro.ru/docs/postgrespro/10/warm-standby
The problem is that on the standbys I get the error
error requested wal segment has already been removed.
Am I right in understanding that this error means the following: if the server from which WAL is being fetched has already removed the segment, replication stops with an error saying that segment no longer exists, and if there is nowhere to restore it from, the replica has to be copied over again, the simplest way being pg_basebackup?
https://ru.stackoverflow.com/questions/972585/post.
This raises a couple of questions:
1. Can this be done some other way?
For example:
Stop the standby, copy the pg_wal directory with its files from the master, and start it again? Won’t the standby become a master after that?
I simply don’t understand what needs to be done on the standby so that it catches up with the master. In other words, how do I recover step by step?
2. Could this error have occurred because I run pg_basebackup -D /my_dir on the master, which in turn locks the database so that replication cannot go through? More precisely, it goes through, but with an error.
If so, what is the correct way to run pg_basebackup so that the replica does not break?
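For completeness, re-seeding a broken standby with pg_basebackup on PostgreSQL 10 typically looks like this (the host name, user and paths are assumptions; -X stream also copies the WAL produced during the backup):

# on the broken standby, with PostgreSQL stopped
rm -rf /var/lib/postgresql/10/main
pg_basebackup -h master_host -U replicator -D /var/lib/postgresql/10/main -X stream -R --checkpoint=fast
# -R writes recovery.conf with standby_mode = 'on', so the node
# starts as a standby again, not as a new master
pg_ctl start -D /var/lib/postgresql/10/main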
WAL segment has already been removed error when running database backup on CloudForms
Environment
Issue
- Getting "requested WAL segment 00000001000000140000004F has already been removed" when running pg_basebackup on CloudForms.
Resolution
- Adjust the wal_keep_segments setting in the postgresql.conf file.
Instructions:
- SSH into your CloudForms database appliance
- Set wal_keep_segments to ( /4)/16MB
Please note that when calculating the value, you want to keep it as low as possible, as the retained segments take up space on the database.
Root Cause
- The server recycles old WAL segments before the backup can finish.
- WAL segments are generated roughly every 5 minutes, so the number to keep depends on how long pg_basebackup takes: if the backup takes an hour, you need at least 12 segments stored.
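A minimal worked example of that sizing (the two-hour backup duration is illustrative, not from the article):

# backup takes ~2 hours  ->  2 * 12 = 24 segments minimum
# postgresql.conf
wal_keep_segments = 32     # 24 plus a small safety margin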
Introduction
Various replication modes are available with PostgreSQL.
In this article, PostgreSQL 9.6 streaming replication is implemented on Linux Ubuntu 18.04 servers; it’s very easy.
In the streaming replication mode,
the standby connects to the primary, which streams WAL records (Write Ahead Log) to the standby as they’re generated,
without waiting for the WAL file to be filled. Streaming replication allows a standby server to stay more up-to-date than
is possible with file-based log shipping.
- The standby server can be in read-only mode, for reporting purposes for example.
- A replication slot ensures that the needed WAL files for the standby are not removed in the primary server
before the standby server processes them. Multiple replication slots can be defined depending on the number of the standby servers.
(LSN : Log Sequence Number)
The procedure below is valid for PostgreSQL 9.6, 10 and 11. For PostgreSQL version 12 and above, the setup is slightly different
and not discussed here. It does not matter much when configuring PostgreSQL streaming replication, but the default WAL files location differs between
PostgreSQL 9.6 and PostgreSQL 10/11.
Version | Default WAL files location
---|---
PostgreSQL 9.6 | $PGDATA/pg_xlog
PostgreSQL 10/11 | $PGDATA/pg_wal
The context is the following : 1 primary server, 1 standby server.
Primary server : VPSFRSQLPAC1
Standby server : VPSFRSQLPAC2

Binaries (PostgreSQL 9.6.15) : /opt/postgres/pgsql-9.6/bin
$PATH : /opt/postgres/pgsql-9.6/bin:$PATH
$PGLIB : /opt/postgres/pgsql-9.6/lib
Port : 30001
$PGDATA : /sqlpac/postgres/srvpg1
$CFG : /opt/postgres/dba/srvpg1/cfg

Configuration files :
$CFG/postgresql.conf
$CFG/pg_hba.conf
$CFG/pg_ident.conf

Controlling the PostgreSQL Server :
pg_ctl start|stop|restart… -D $CFG
Preparing the primary server
System parameters
The primary server must be restarted, especially with PostgreSQL 9.6, to apply at least the following static parameters :
listen_addresses : *
wal_level : the wal level is replica for streaming replication.
max_replication_slots : at least 1 replication slot (1 standby). Higher values if more standby or logical servers will be configured.
max_wal_senders : at least 3 wal senders (1 standby + 2 for pg_basebackup). Higher values if more standby servers will be configured.
$CFG/postgresql.conf
listen_addresses = '*'
wal_level=replica
max_replication_slots=3
max_wal_senders=3
With PostgreSQL 10 and 11, the default values are already adjusted for replication. However, check the settings.
PostgreSQL 10 / 11 | Default values
---|---
wal_level | replica
max_wal_senders | 10
max_replication_slots | 10
postgres@vpsfrsqlpac1$ pg_ctl restart -D $CFG
Replication role
Create a role with the replication privilege; this role will be used by the standby server to connect to the primary server :
create role repmgr with replication login encrypted password '***********';
Add the role to the primary server’s pg_hba.conf file with the standby server’s IP address; this will allow connections from the standby server.
Don’t forget to manage existing firewall rules.
$CFG/pg_hba.conf
host replication repmgr 51.xxx.xxx.xxx/32 md5
Here, SSL connections are not implemented.
Reload the configuration :
postgres@vpsfrsqlpac1$ pg_ctl reload -D $CFG
Replication slot
Create a replication slot in the primary server.
select * from pg_create_physical_replication_slot('standby1');
 slot_name | xlog_position
-----------+---------------
 standby1  |
(1 row)
select slot_name, restart_lsn from pg_replication_slots;
 slot_name | restart_lsn
-----------+-------------
 standby1  |
(1 row)
The replication slot (restart_lsn) will be initialized during the primary server backup with pg_basebackup.
Starting with PostgreSQL 11, it is not mandatory to create the replication slot manually; it can be created and initialized by pg_basebackup.
Primary server backup (pg_basebackup)
The primary server backup is performed with pg_basebackup.
postgres@vpsfrsqlpac1$ pg_basebackup -D /sqlpac/postgres/backup/srvpg1 \
        -X stream \
        --write-recovery-conf \
        --slot=standby1 \
        --dbname="host=localhost user=postgres port=30001"
Starting with PostgreSQL 11, add the argument --create-slot if the replication slot has not been previously created.
The option --slot gives the replication slot name: this guarantees that the primary server does not remove any necessary WAL data in the time between the end of the base backup and the start of streaming replication.
When the backup is completed, the replication slot standby1
is then defined :
select slot_name, restart_lsn from pg_replication_slots;
 slot_name | restart_lsn
-----------+-------------
 standby1  | 0/33000000
The option --write-recovery-conf (or -R) writes a recovery.conf file in the root backup directory.
This file will prevent any user error when starting the standby server: it marks the server as a standby, and the slot name is given :
/sqlpac/postgres/backup/srvpg1/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=postgres host=localhost port=30001 sslmode=prefer sslcompression=1 krbsrvname=postgres'
primary_slot_name = 'standby1'
Standby server activation
Install the primary server backup performed previously into the standby data directory ($PGDATA).
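One possible way to transfer and install that backup (the rsync invocation is an assumption; the article only names the directories):

postgres@vpsfrsqlpac1$ rsync -av /sqlpac/postgres/backup/srvpg1/ vpsfrsqlpac2:/sqlpac/postgres/srvpg1/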
recovery.conf
Be sure the file recovery.conf
is installed in the standby server root data directory
with the option standby_mode = 'on'
and the replication slot name.
Update the connection info parameters to the primary server in this file.
$PGDATA : /sqlpac/postgres/srvpg1/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=repmgr host=vpsfrsqlpac1 port=30001 password=************'
primary_slot_name = 'standby1'
postgresql.conf
If read-only connections are allowed, check the parameter hot_standby is set to on on the standby server (on by default starting with PostgreSQL 10):
$CFG/postgresql.conf
hot_standby = on
Starting the standby server
Now the standby server can be started.
postgres@vpsfrsqlpac2$ pg_ctl start -D $CFG
When there is no error, in the standby server log file :
LOG: entering standby mode
LOG: redo starts at 0/33000028
LOG: consistent recovery state reached at 0/34000000
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/34000000 on timeline 1
The standby server is in recovery mode :
postgres@vpsfrsqlpac2$ psql -p30001
select pg_is_in_recovery();
pg_is_in_recovery ------------------- t
If the replication slot has not been defined and the needed WAL files have been removed on the primary server, an error occurs :
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000010 has already been removed
Testing replication
Create a table heartbeat on the primary server; this table will be updated every minute :
postgres@vpsfrsqlpac1$ psql -p30001
create table heartbeat ( reptimestamp timestamp );
insert into heartbeat values (now());
select * from heartbeat;
        reptimestamp
----------------------------
 2019-11-22 09:04:36.399274
Check the replication to the standby server :
postgres@vpsfrsqlpac2$ psql -p30001
select * from heartbeat;
        reptimestamp
----------------------------
 2019-11-22 09:04:36.399274
Pause / Resume replication
To pause/resume replication, on the standby server :
postgres@vpsfrsqlpac2$ psql -p30001
 | PostgreSQL 9.6 | PostgreSQL 10 / 11
---|---|---
Pause replication | select pg_xlog_replay_pause(); | select pg_wal_replay_pause();
Resume replication | select pg_xlog_replay_resume(); | select pg_wal_replay_resume();
Replication paused ? | select pg_is_xlog_replay_paused(); | select pg_is_wal_replay_paused();
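For example, on the 9.6 standby used here, a quick pause/resume cycle with the standard replay-control functions:

select pg_xlog_replay_pause();
select pg_is_xlog_replay_paused();   -- returns t while replay is paused
select pg_xlog_replay_resume();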
Essential replication informations
This article does not deal with replication monitoring; however, the essential information about the replication state is given below.
Standby server : pg_stat_wal_receiver
In the standby, use the view pg_stat_wal_receiver :
postgres@vpsfrsqlpac2$ psql -p30001
\x on
select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-----------------------------------------------------
pid                   | 2262
status                | streaming
receive_start_lsn     | 0/97000000
receive_start_tli     | 1
received_lsn          | 0/99000920
received_tli          | 1
last_msg_send_time    | 2019-11-22 18:17:46.355579+01
last_msg_receipt_time | 2019-11-22 18:17:46.355835+01
latest_end_lsn        | 0/99000760
latest_end_time       | 2019-11-22 18:15:46.232277+01
slot_name             | standby1
conninfo              | user=repmgr password=******** dbname=replication host=vpsfrsqlpac1 port=30001 …
The WAL receiver process id is 2262 :
postgres@vpsfrsqlpac2$ ps -ef | grep 'postgres' | grep 2262
postgres 2262 32104 0 18:35 ? 00:00:04 postgres: wal receiver process streaming 0/99000920
Primary server : pg_stat_replication and pg_replication_slots
In the primary server, use pg_stat_replication and pg_replication_slots :
postgres@vpsfrsqlpac1$ psql -p30001
\x on
select * from pg_stat_replication;
-[ RECORD 1 ]----+-----------------------------
pid              | 6247
usesysid         | 16384
usename          | repmgr
application_name | walreceiver
client_addr      | 51.xxx.xxx.xxx
client_hostname  |
client_port      | 41354
backend_start    | 2019-11-22 09:35:42.41099+01
backend_xmin     |
state            | streaming
sent_location    | 0/99000920
write_location   | 0/99000920
flush_location   | 0/99000920
replay_location  | 0/99000840
sync_priority    | 0
sync_state       | async
\x on
select * from pg_replication_slots;
-[ RECORD 1 ]-------+-----------
slot_name           | standby1
plugin              |
slot_type           | physical
datoid              |
database            |
active              | t
active_pid          | 6247
xmin                |
catalog_xmin        |
restart_lsn         | 0/99000920
confirmed_flush_lsn |
The WAL sender process id is 6247 :
postgres@vpsfrsqlpac1$ ps -ef | grep 'postgres' | grep 6247
postgres 6247 5576 0 18:35 ? 00:00:00 postgres: wal sender process repmgr 51.xxx.xxx.xxx(41354) streaming 0/99000920
Conclusion
Setting up streaming replication with PostgreSQL 9.6 is very easy, perhaps one of the easiest replication architectures.
Do not forget replication slots! Only replication slots guarantee that standby servers won’t fall out of sync with the primary server.