Pooler Error: pgbouncer cannot connect to server #239
Hi,
We are using pgbouncer on a Windows machine and a PostgreSQL database on CentOS.
The problem is that pgbouncer fails intermittently; the log says 'Pooler Error: pgbouncer cannot connect to server'.
pgbouncer version: 1.7.1
Any help on this?
Logs:
2017-09-29 05:04:30.159 2756 WARNING lookup failed: myhostname: result=11003
2017-09-29 05:04:30.159 2756 NOISE dns: deliver_info(myhostname) addr=NULL
2017-09-29 05:04:30.159 2756 LOG S-01022ce0: test/test@(bad-af):0 closing because: server dns lookup failed (age=3)
2017-09-29 05:04:30.159 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:30.486 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:30.820 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:31.155 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:31.489 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:31.823 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:32.157 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:32.491 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:32.825 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:33.159 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:33.493 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:33.827 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:34.161 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:34.495 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:34.829 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:35.163 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:35.497 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:35.831 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:36.165 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:36.351 2756 NOISE new fd from accept=612
2017-09-29 05:04:36.352 2756 DEBUG C-00fec670: (nodb)/(nouser)@127.0.0.1:19939 P: got connection: 127.0.0.1:19939 -> 127.0.0.1:5432
2017-09-29 05:04:36.352 2756 NOISE safe_accept(524) = A non-blocking socket operation could not be completed immediately.
2017-09-29 05:04:36.352 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:36.353 2756 NOISE resync: done=0, parse=0, recv=0
2017-09-29 05:04:36.353 2756 NOISE C-00fec670: (nodb)/(nouser)@127.0.0.1:19939 pkt='!' len=93
2017-09-29 05:04:36.353 2756 DEBUG C-00fec670: (nodb)/(nouser)@127.0.0.1:19939 got var: user=testuser
2017-09-29 05:04:36.353 2756 DEBUG C-00fec670: (nodb)/(nouser)@127.0.0.1:19939 got var: client_encoding=UTF8
2017-09-29 05:04:36.354 2756 DEBUG C-00fec670: (nodb)/(nouser)@127.0.0.1:19939 got var: database=testdb
2017-09-29 05:04:36.354 2756 DEBUG C-00fec670: (nodb)/(nouser)@127.0.0.1:19939 using application_name: testApp
2017-09-29 05:04:36.354 2756 LOG C-00fec670: test/test@127.0.0.1:19939 login attempt: db=testdb user=testuser tls=no
2017-09-29 05:04:36.355 2756 LOG C-00fec670: test/test@@127.0.0.1:19939 closing because: pgbouncer cannot connect to server (age=0)
2017-09-29 05:04:36.355 2756 WARNING C-00fec670: test/test@@127.0.0.1:19939 Pooler Error: pgbouncer cannot connect to server
2017-09-29 05:04:36.355 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:36.356 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:36.499 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:36.833 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:37.167 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:37.501 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:37.835 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:38.169 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:38.503 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:38.837 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:39.171 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:39.505 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:39.839 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:40.173 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:40.507 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:40.841 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:41.175 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:41.509 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:41.843 2756 DEBUG launch_new_connection: last failed, wait
2017-09-29 05:04:42.177 2756 NOISE S-01022ce0: test/test@(bad-af):0 inet socket: myhostname
2017-09-29 05:04:42.177 2756 NOISE S-01022ce0: test/test@(bad-af):0 dns socket: myhostname
2017-09-29 05:04:42.177 2756 NOISE dns: deliver_info(myhostname) addr=NULL
2017-09-29 05:04:42.177 2756 LOG S-01022ce0: test/test@(bad-af):0 closing because: server dns lookup failed (age=0)
Thank you!
Common Error Messages
Relation foo is not distributed
This is caused by attempting to join local and distributed tables in the same query.
Resolution
For an example, with workarounds, see JOIN a local and a distributed table.
Could not receive query results
This occurs when the coordinator node is unable to connect to a worker.
Resolution
To fix, check that the worker is accepting connections and that DNS is resolving correctly.
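A quick diagnostic sketch from the coordinator, assuming a hypothetical worker hostname and the default Postgres port (substitute your own values):

```shell
WORKER_HOST=worker-101      # hypothetical worker hostname
WORKER_PORT=5432

# 1. Does the worker's hostname resolve?
nslookup "$WORKER_HOST"

# 2. Is the worker accepting connections on the Postgres port?
psql -h "$WORKER_HOST" -p "$WORKER_PORT" -U postgres -c 'SELECT 1;'
```

If step 1 fails, the problem is DNS; if step 2 fails with a resolvable name, the worker is down or a firewall is blocking the port.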
Canceling the transaction since it was involved in a distributed deadlock
Deadlocks can happen not only in a single-node database but also in a distributed database, caused by queries executing across multiple nodes. Citus can recognize distributed deadlocks and defuse them by aborting one of the queries involved.
We can see this in action by distributing rows across worker nodes and then running two concurrent transactions with conflicting updates:
Resolution
Detecting deadlocks and stopping them is part of normal distributed transaction handling. It allows an application to retry queries or take another course of action.
Could not connect to server: Cannot assign requested address
This occurs when there are no more sockets available for the coordinator to respond to worker requests.
Resolution
Configure the operating system to reuse TCP sockets by executing this in a shell on the coordinator node:
This allows reusing sockets in the TIME_WAIT state for new connections when it is safe from a protocol viewpoint. The default value is 0 (disabled).
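A sketch of enabling TIME_WAIT socket reuse via the kernel's net.ipv4.tcp_tw_reuse setting, which is assumed here to be the setting the text describes (run as root on the coordinator):

```shell
# Allow reuse of sockets in TIME_WAIT state (default is 0, disabled).
sysctl -w net.ipv4.tcp_tw_reuse=1

# Persist the setting across reboots.
echo 'net.ipv4.tcp_tw_reuse = 1' >> /etc/sysctl.conf
```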
SSL error: certificate verify failed
As of Citus 8.1, nodes are required to talk to one another using SSL by default. If SSL is not enabled on a Postgres server when Citus is first installed, the install process will enable it, which includes creating and self-signing an SSL certificate.
However, if a root certificate authority file exists (typically in ~/.postgresql/root.crt), then the self-signed certificate will fail verification against that CA at connection time, as the Postgres documentation about SSL support warns.
Resolution
Possible solutions are to sign the certificate, turn off SSL, or remove the root certificate.
Could not connect to any active placements
When all available worker connection slots are in use, further connections will fail.
Resolution
This error happens most often when copying data into Citus in parallel. The COPY command opens one connection per shard. If you run M concurrent copies into a destination with N shards, that results in M*N connections. To resolve the error, reduce the shard count of the target distributed tables, or run fewer COPY commands in parallel.
Remaining connection slots are reserved for non-replication superuser connections
This occurs when PostgreSQL runs out of available connections to serve concurrent client requests.
Resolution
The max_connections GUC adjusts the limit, with a typical default of 100 connections. Note that each connection consumes resources, so adjust sensibly. When increasing max_connections, it's usually a good idea to increase memory limits too.
Using PgBouncer can also help by queueing connection requests that exceed the connection limit. Citus Cloud has a built-in PgBouncer instance; see Scaling Connections (pgBouncer) to learn how to connect through it.
PgBouncer cannot connect to server
In a self-hosted Citus cluster, this error indicates that the coordinator node is not responding to PgBouncer.
Resolution
Try connecting directly to the server with psql to ensure it is running and accepting connections.
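For example, from the machine running PgBouncer (hypothetical coordinator address shown):

```shell
# Bypass PgBouncer and connect straight to the coordinator.
psql -h coordinator.example.com -p 5432 -U postgres -d postgres -c 'SELECT 1;'
```

If this direct connection also fails, the problem lies with the server or the network rather than with PgBouncer itself.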
Unsupported clause type
This error no longer occurs in current versions of Citus. It used to happen when executing a join with an inequality condition.
Resolution
Upgrade to Citus 7.2 or higher.
Cannot open new connections after the first modification command within a transaction
This error no longer occurs in current versions of Citus, except in certain unusual shard repair scenarios. It used to happen when updating rows in a transaction and then running another command which would open new coordinator-to-worker connections.
Resolution
Upgrade to Citus 7.2 or higher.
Cannot create uniqueness constraint
As a distributed system, Citus can guarantee uniqueness only if a unique index or primary key constraint includes a table's distribution column. That is because the shards are split so that each shard contains non-overlapping partition column values, which lets the index on each worker node locally enforce its part of the constraint.
Trying to create a unique index on a non-distribution column will generate an error.
Enforcing uniqueness on a non-distribution column would require Citus to check every shard on every INSERT, which defeats the goal of scalability.
Resolution
There are two ways to enforce uniqueness on a non-distribution column:
- Create a composite unique index or primary key that includes the desired column (C), but also includes the distribution column (D). This is not quite as strong a condition as uniqueness on C alone, but will ensure that the values of C are unique for each value of D. For instance if distributing by company_id in a multi-tenant system, this approach would make C unique within each company.
- Use a reference table rather than a hash distributed table. This is only suitable for small tables, since the contents of the reference table will be duplicated on all nodes.
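A sketch of the first option, using a hypothetical events table distributed by company_id (table and database names are illustrative):

```shell
# Composite primary key includes the distribution column (company_id),
# so each shard can enforce its part of the constraint locally.
psql -d mydb -c '
  CREATE TABLE events (
    company_id bigint NOT NULL,
    event_id   bigint NOT NULL,
    payload    jsonb,
    PRIMARY KEY (company_id, event_id)
  );
'
psql -d mydb -c "SELECT create_distributed_table('events', 'company_id');"
```

This guarantees event_id is unique within each company, though not globally across companies.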
Function create_distributed_table does not exist
Resolution
When basic Citus utility functions are not available, check whether the Citus extension is properly installed. Running \dx in psql lists installed extensions.
One way to end up without extensions is by creating a new database in a Postgres server, which requires extensions to be re-installed. See Creating a New Database to learn how to do it right.
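A quick check-and-install sketch (hypothetical database name; Citus must already be loaded via shared_preload_libraries on the server):

```shell
# List installed extensions; 'citus' should appear in the output.
psql -d mydb -c '\dx'

# If it is missing, create the extension in this database.
psql -d mydb -c 'CREATE EXTENSION citus;'
```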
STABLE functions used in UPDATE queries cannot be called with column references
Each PostgreSQL function is marked with a volatility, which indicates whether the function can update the database and whether the function's return value can vary over time given the same inputs. A STABLE function is guaranteed to return the same results given the same arguments for all rows within a single statement, while an IMMUTABLE function is guaranteed to return the same results given the same arguments forever.
Non-immutable functions can be inconvenient in distributed systems because they can introduce subtle changes when run at slightly different times across shard replicas. Differences in database configuration across nodes can also interact harmfully with non-immutable functions.
One of the most common ways this can happen is with the timestamp type in Postgres, which unlike timestamptz does not keep a record of time zone. Interpreting a timestamp column makes reference to the database timezone, which can change between queries; hence functions operating on timestamps are not immutable.
Citus forbids running distributed queries that filter results using stable functions on columns. For instance:
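A hypothetical illustration of the shape of such a query (not the original example): comparing a timestamp column against a timestamptz value invokes a non-immutable comparison operator.

```shell
# created_at is assumed to be of type timestamp (without time zone);
# comparing it to a timestamptz literal uses a STABLE operator,
# which Citus rejects in distributed UPDATE statements.
psql -d mydb -c "
  UPDATE events
  SET payload = '{}'
  WHERE created_at = '2022-01-01 00:00:00+00'::timestamptz;
"
```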
In this case the comparison operator between timestamp and timestamptz is not immutable.
Resolution
Avoid stable functions on columns in a distributed UPDATE statement. In particular, when working with times, use timestamptz rather than timestamp. Having an explicit time zone in timestamptz makes calculations immutable.
© Copyright 2022, Citus Data, a Microsoft Company. Revision 7038ea07 .
Unable to connect to EDB-PostGreSQL #161
Please could you help? I'm sure there's a problem with PGBouncer when using the following configuration.
I've been testing this for the past few days and cannot get PGBouncer to connect with PostgreSQL.
I had a working system using PGBouncer 1.5, but since upgrading to 1.7.2 I have not been able to get it to connect.
PGBouncer was installed using EnterpriseDB Application Stack Builder
PG version 9.5 (also tested 9.6)
PGBouncer version 1.7.2
Tested on Windows 10, 7 and 2008 server.
For testing I'm using a very simple PHP script just to connect, e.g.:
host=127.0.0.1 dbname=postgres port=6432 user=postgres password=blabla
When running the above I get:
Unable to connect to PostgreSQL server: ERROR: client_login_timeout (server down)
Unable to connect to PostgreSQL server: ERROR: pgbouncer cannot connect to server
But if I swap port 6432 for 5432 and try again, the connection is successful.
Here’s an extract from the PGBouncer log:
2016-11-07 22:12:52.913 12612 LOG File descriptor limit: -1 (H:-1), max_client_conn: 100, max fds possible: 230
2016-11-07 22:12:52.924 12612 LOG listening on ::/6432
2016-11-07 22:12:52.924 12612 LOG listening on 0.0.0.0:6432
2016-11-07 22:12:52.927 12612 LOG process up: pgbouncer 1.7.2, libevent 2.0.21-stable (win32), adns: evdns2, tls: OpenSSL 1.0.2g 1 Mar 2016
2016-11-07 22:13:09.223 12612 LOG C-01558FD8: postgres/postgres@127.0.0.1:60223 login attempt: db=postgres user=postgres tls=no
2016-11-07 22:13:09.225 12612 WARNING
2016-11-07 22:13:09.225 12612 LOG S-0158F240: postgres/postgres@127.0.0.1:5432 closing because: connect failed (age=0)
2016-11-07 22:13:24.299 12612 WARNING
2016-11-07 22:13:24.301 12612 LOG S-0158F240: postgres/postgres@127.0.0.1:5432 closing because: connect failed (age=0)
2016-11-07 22:13:39.317 12612 WARNING
2016-11-07 22:13:39.318 12612 LOG S-0158F240: postgres/postgres@127.0.0.1:5432 closing because: connect failed (age=0)
2016-11-07 22:13:52.924 12612 LOG Stats: 0 req/s, in 0 b/s, out 0 b/s,query 0 us
2016-11-07 22:13:54.338 12612 WARNING
2016-11-07 22:13:54.338 12612 LOG S-0158F240: postgres/postgres@127.0.0.1:5432 closing because: connect failed (age=0)
2016-11-07 22:14:09.353 12612 LOG C-01558FD8: postgres/postgres@127.0.0.1:60223 closing because: client_login_timeout (server down) (age=60)
2016-11-07 22:14:09.355 12612 WARNING C-01558FD8: postgres/postgres@127.0.0.1:60223 Pooler Error: client_login_timeout (server down)
Connecting to and validating with PGBouncer appears to be working fine, however the subsequent connection from PGBouncer to PostgreSQL fails.
I’ve checked the PostgreSQL logs but all that is listed is:
PG Log
2016-11-06 22:17:33 GMT LOG: incomplete startup packet
2016-11-06 22:17:52 GMT LOG: incomplete startup packet
I’m using the following PGBouncer config:
pgbouncer.ini
[databases]
postgres = host=127.0.0.1 port=5432
testdb = host=127.0.0.1 port=5432
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = C:\Program Files\PgBouncer\etc\userlist.txt
the rest is standard and was not altered.
PostgreSQL hba file
host all all 127.0.0.1/32 md5
host all all 0.0.0.0 0.0.0.0 md5
PG is set to listen on all addresses.
I’ve exhausted any more configurations i can think of and have struggled to find anything meaningful via Google.
Thank you, regards
Jamie
Hi,
I understand everyone’s extremely busy, but please is there any quick suggestions i could investigate or perhaps more information that I can provide that helps?
Are you using stock PostgreSQL or the EnterpriseDB variant? pgbouncer does not use libpq, so if EDB has changed the protocol then pgbouncer indeed cannot connect to it.
If it is the stock one, then "incomplete startup packet" is weird. Could you turn on pgbouncer verbose logging with "pgbouncer -v -v" and see if it logs anything interesting?
Thanks very much for looking at this.
The installation we’re using is the EnterpriseDB variant.
As suggested, I started pgbouncer with verbose logging ("pgbouncer -v -v") and have pasted an extract below.
However these two lines looked interesting, but i’m not 100% sure:
2016-12-06 19:46:33.700 6524 DEBUG loading auth_file: "C:\Program Files (x86)\PgBouncer\etc\userlist.txt"
2016-12-06 19:46:33.716 6524 NOISE event: 72, SBuf: 108, PgSocket: 264, IOBuf: 4108
Does this sound like it’s had a problem loading the userlist.txt file?
pgbouncer log
c:\Program Files (x86)\PgBouncer\bin>pgbouncer ../share/pgbouncer.ini -v -v
2016-12-06 19:46:33.599 6524 DEBUG parse_ini_file: [databases]
2016-12-06 19:46:33.599 6524 DEBUG parse_ini_file: 'postgres' = 'host=127.0.0.1 port=5444'
2016-12-06 19:46:33.616 6524 NOISE cstr_get_pair: "host"="127.0.0.1"
2016-12-06 19:46:33.617 6524 NOISE cstr_get_pair: "port"="5444"
2016-12-06 19:46:33.618 6524 DEBUG pktbuf_dynamic(128): 013183B8
2016-12-06 19:46:33.618 6524 DEBUG parse_ini_file: 'postgres' = 'host=127.0.0.1 port=5444' ok:1
2016-12-06 19:46:33.618 6524 DEBUG parse_ini_file: 'promsol' = 'host=127.0.0.1 port=5444'
2016-12-06 19:46:33.618 6524 NOISE cstr_get_pair: "host"="127.0.0.1"
2016-12-06 19:46:33.628 6524 NOISE cstr_get_pair: "port"="5444"
2016-12-06 19:46:33.630 6524 DEBUG pktbuf_dynamic(128): 013184D0
2016-12-06 19:46:33.632 6524 DEBUG parse_ini_file: 'promsol' = 'host=127.0.0.1 port=5444' ok:1
2016-12-06 19:46:33.634 6524 DEBUG parse_ini_file: [pgbouncer]
2016-12-06 19:46:33.636 6524 DEBUG parse_ini_file: 'logfile' = 'C:\Program Files (x86)\PgBouncer\log\pgbouncer.log'
2016-12-06 19:46:33.640 6524 DEBUG parse_ini_file: 'logfile' = 'C:\Program Files (x86)\PgBouncer\log\pgbouncer.log' ok:1
2016-12-06 19:46:33.644 6524 DEBUG parse_ini_file: 'pidfile' = 'C:\Program Files (x86)\PgBouncer\log\pgbouncer.pid'
2016-12-06 19:46:33.648 6524 DEBUG parse_ini_file: 'pidfile' = 'C:\Program Files (x86)\PgBouncer\log\pgbouncer.pid' ok:1
2016-12-06 19:46:33.652 6524 DEBUG parse_ini_file: 'listen_addr' = '*'
2016-12-06 19:46:33.654 6524 DEBUG parse_ini_file: 'listen_addr' = '*' ok:1
2016-12-06 19:46:33.656 6524 DEBUG parse_ini_file: 'listen_port' = '6432'
2016-12-06 19:46:33.659 6524 DEBUG parse_ini_file: 'listen_port' = '6432' ok:1
2016-12-06 19:46:33.662 6524 DEBUG parse_ini_file: 'auth_type' = 'md5'
2016-12-06 19:46:33.664 6524 DEBUG parse_ini_file: 'auth_type' = 'md5' ok:1
2016-12-06 19:46:33.666 6524 DEBUG parse_ini_file: 'auth_file' = 'C:\Program Files (x86)\PgBouncer\etc\userlist.txt'
2016-12-06 19:46:33.670 6524 DEBUG parse_ini_file: 'auth_file' = 'C:\Program Files (x86)\PgBouncer\etc\userlist.txt' ok:1
2016-12-06 19:46:33.670 6524 DEBUG parse_ini_file: 'admin_users' = 'postgres'
2016-12-06 19:46:33.670 6524 DEBUG parse_ini_file: 'admin_users' = 'postgres' ok:1
2016-12-06 19:46:33.670 6524 DEBUG parse_ini_file: 'stats_users' = 'postgres'
2016-12-06 19:46:33.682 6524 DEBUG parse_ini_file: 'stats_users' = 'postgres' ok:1
2016-12-06 19:46:33.684 6524 DEBUG parse_ini_file: 'pool_mode' = 'session'
2016-12-06 19:46:33.686 6524 DEBUG parse_ini_file: 'pool_mode' = 'session' ok:1
2016-12-06 19:46:33.689 6524 DEBUG parse_ini_file: 'server_reset_query' = 'DISCARD ALL'
2016-12-06 19:46:33.689 6524 DEBUG parse_ini_file: 'server_reset_query' = 'DISCARD ALL' ok:1
2016-12-06 19:46:33.689 6524 DEBUG parse_ini_file: 'ignore_startup_parameters' = 'application_name'
2016-12-06 19:46:33.698 6524 DEBUG parse_ini_file: 'ignore_startup_parameters' = 'application_name' ok:1
2016-12-06 19:46:33.700 6524 DEBUG parse_ini_file: 'max_client_conn' = '100'
2016-12-06 19:46:33.700 6524 DEBUG parse_ini_file: 'max_client_conn' = '100' ok:1
2016-12-06 19:46:33.700 6524 DEBUG parse_ini_file: 'default_pool_size' = '20'
2016-12-06 19:46:33.700 6524 DEBUG parse_ini_file: 'default_pool_size' = '20' ok:1
2016-12-06 19:46:33.700 6524 DEBUG loading auth_file: "C:\Program Files (x86)\PgBouncer\etc\userlist.txt"
2016-12-06 19:46:33.716 6524 NOISE event: 72, SBuf: 108, PgSocket: 264, IOBuf: 4108
2016-12-06 19:46:33.718 6524 LOG File descriptor limit: -1 (H:-1), max_client_conn: 100, max fds possible: 150
2016-12-06 19:46:33.720 6524 DEBUG pktbuf_dynamic(128): 007F81D0
2016-12-06 19:46:33.720 6524 DEBUG make_room(007F81D0, 9): realloc newlen=256
2016-12-06 19:46:33.720 6524 DEBUG pktbuf_dynamic(128): 007F8350
2016-12-06 19:46:33.720 6524 DEBUG adns_create_context: evdns2
2016-12-06 19:46:33.738 6524 DEBUG add_listen: ::/6432
2016-12-06 19:46:33.740 6524 LOG listening on ::/6432
2016-12-06 19:46:33.741 6524 DEBUG add_listen: 0.0.0.0:6432
2016-12-06 19:46:33.741 6524 LOG listening on 0.0.0.0:6432
2016-12-06 19:46:33.749 6524 LOG process up: pgbouncer 1.7.2, libevent 2.0.21-stable (win32), adns: evdns2, tls: OpenSSL 1.0.2g 1 Mar 2016
2016-12-06 19:46:36.919 6524 NOISE new fd from accept=612
2016-12-06 19:46:36.920 6524 DEBUG C-007F8FD8: (nodb)/(nouser)@127.0.0.1:60875 P: got connection: 127.0.0.1:60875 -> 127.0.0.1:6432
2016-12-06 19:46:36.920 6524 NOISE
2016-12-06 19:46:36.920 6524 NOISE resync: done=0, parse=0, recv=0
2016-12-06 19:46:36.920 6524 NOISE C-007F8FD8: (nodb)/(nouser)@127.0.0.1:60875 pkt='!' len=8
2016-12-06 19:46:36.920 6524 NOISE C-007F8FD8: (nodb)/(nouser)@127.0.0.1:60875 C: req SSL
2016-12-06 19:46:36.937 6524 NOISE C-007F8FD8: (nodb)/(nouser)@127.0.0.1:60875 P: nak
2016-12-06 19:46:36.940 6524 NOISE resync: done=8, parse=8, recv=8
2016-12-06 19:46:36.940 6524 NOISE resync: done=0, parse=0, recv=0
2016-12-06 19:46:36.940 6524 NOISE C-007F8FD8: (nodb)/(nouser)@127.0.0.1:60875 pkt='!' len=41
2016-12-06 19:46:36.940 6524 DEBUG C-007F8FD8: (nodb)/(nouser)@127.0.0.1:60875 got var: user=postgres
2016-12-06 19:46:36.940 6524 DEBUG C-007F8FD8: (nodb)/(nouser)@127.0.0.1:60875 got var: database=postgres
2016-12-06 19:46:36.940 6524 LOG C-007F8FD8: postgres/postgres@127.0.0.1:60875 login attempt: db=postgres user=postgres tls=no
2016-12-06 19:46:36.960 6524 NOISE resync: done=41, parse=41, recv=41
2016-12-06 19:46:36.963 6524 NOISE resync: done=0, parse=0, recv=0
2016-12-06 19:46:36.965 6524 NOISE C-007F8FD8: postgres/postgres@127.0.0.1:60875 pkt='p' len=41
2016-12-06 19:46:36.968 6524 DEBUG finish_client_login: no welcome message, pause
2016-12-06 19:46:36.971 6524 DEBUG C-007F8FD8: postgres/postgres@127.0.0.1:60875 pause_client
2016-12-06 19:46:36.974 6524 NOISE S-0082F240: postgres/postgres@(bad-af):0 inet socket: 127.0.0.1
2016-12-06 19:46:36.977 6524 DEBUG S-0082F240: postgres/postgres@127.0.0.1:5444 launching new connection to server
2016-12-06 19:46:36.981 6524 NOISE
2016-12-06 19:46:36.983 6524 WARNING
2016-12-06 19:46:36.984 6524 LOG S-0082F240: postgres/postgres@127.0.0.1:5444 closing because: connect failed (age=0)
2016-12-06 19:46:36.988 6524 NOISE failed to launch new connection
2016-12-06 19:46:36.990 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:37.109 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:37.443 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:37.776 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:38.125 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:38.461 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:38.794 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:51.321 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:51.655 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:52.002 6524 NOISE S-0082F240: postgres/postgres@(bad-af):0 inet socket: 127.0.0.1
2016-12-06 19:46:52.006 6524 DEBUG S-0082F240: postgres/postgres@127.0.0.1:5444 launching new connection to server
2016-12-06 19:46:52.029 6524 NOISE
2016-12-06 19:46:52.040 6524 WARNING
2016-12-06 19:46:52.046 6524 LOG S-0082F240: postgres/postgres@127.0.0.1:5444 closing because: connect failed (age=0)
2016-12-06 19:46:52.070 6524 NOISE failed to launch new connection
2016-12-06 19:46:52.335 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:52.669 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:53.007 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:53.341 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:53.678 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:54.024 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:54.357 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:54.694 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:55.039 6524 DEBUG launch_new_connection: last failed, wait
2016-12-06 19:46:55.115 6524 LOG Got SIGINT, shutting down
2016-12-06 19:46:55.373 6524 LOG server connections dropped, exiting
c:\Program Files (x86)\PgBouncer\bin>
postgres log:
2016-12-06 19:35:18 GMT LOG: incomplete startup packet
2016-12-06 19:46:24 GMT LOG: incomplete startup packet
2016-12-06 19:46:36 GMT LOG: incomplete startup packet
2016-12-06 19:46:52 GMT LOG: incomplete startup packet
Troubleshooting a reference architecture setup
This page serves as the troubleshooting documentation if you followed one of the reference architectures.
Troubleshooting object storage
S3 API compatibility issues
Not all S3 providers are fully compatible with the Fog library that GitLab uses. Symptoms include:
411 Length Required
GitLab Pages requires NFS
If you intend to use GitLab Pages, this currently requires
NFS. There is work in progress
to remove this dependency. In the future, GitLab Pages will use
object storage.
The dependency on disk storage also prevents Pages being deployed using the
GitLab Helm chart.
Incremental logging is required for CI to use object storage
If you configure GitLab to use object storage for CI logs and artifacts,
you must also enable incremental logging.
Proxy Download
A number of the use cases for object storage allow client traffic to be redirected to the
object storage back end, like when Git clients request large files via LFS or when
downloading CI artifacts and logs.
When the files are stored on local block storage or NFS, GitLab has to act as a proxy.
With object storage, the default behavior is for GitLab to redirect to the object
storage device rather than proxy the request.
The proxy_download setting controls this behavior: the default is generally false. Verify this in the documentation for each use case. Set it to true to make GitLab proxy the files rather than redirect.
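In an Omnibus install, the consolidated object storage configuration exposes this as a gitlab.rb setting. A minimal sketch (verify the exact key names against the object storage documentation for your GitLab version):

```ruby
# /etc/gitlab/gitlab.rb -- consolidated object storage form (illustrative)
gitlab_rails['object_store']['proxy_download'] = true
```

Run gitlab-ctl reconfigure afterwards for the change to take effect.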
When not proxying files, GitLab returns an
HTTP 302 redirect with a pre-signed, time-limited object storage URL.
This can result in some of the following problems:
- If GitLab is using non-secure HTTP to access the object storage, clients may generate https->http downgrade errors and refuse to process the redirect. The solution is for GitLab to use HTTPS. LFS, for example, generates this error: LFS: lfsapi/client: refusing insecure redirect, https->http
- Clients need to trust the certificate authority that issued the object storage certificate, or they may return common TLS errors such as: x509: certificate signed by unknown authority
- Clients need network access to the object storage. Errors that might result if this access is not in place include: Received status code 403 from server: Forbidden
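The first problem above, the https->http downgrade, is essentially a scheme comparison on the redirect target. An illustrative Python sketch (not GitLab code) of the check a client such as git-lfs effectively performs:

```python
from urllib.parse import urlparse

def classify_redirect(request_url, status, location):
    # Illustrative helper: mimics how a client reacts to the HTTP 302
    # redirect GitLab returns when proxy_download is false.
    if status != 302 or not location:
        return "no-redirect"
    if urlparse(request_url).scheme == "https" and urlparse(location).scheme == "http":
        return "insecure-downgrade"  # clients such as git-lfs refuse this
    return "redirect-to-object-storage"

print(classify_redirect("https://gitlab.example.com/artifact",
                        302, "http://minio.internal/bucket/obj"))
# insecure-downgrade
```

The hostnames here are placeholders; the point is that serving object storage over plain HTTP while GitLab itself is HTTPS triggers the refusal.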
ETag mismatch
Using the default GitLab settings, some object storage back-ends such as MinIO and Alibaba might generate ETag mismatch errors. When using GitLab direct upload, the workaround for MinIO is to use the --compat parameter on the server. We are working on a fix to the GitLab component Workhorse, and also, in the meantime, a workaround to allow ETag verification to be disabled.
There are a lot of moving parts that need to be handled carefully for the HA setup to work as expected.
Before proceeding with the troubleshooting below, check your firewall rules:
- Redis machines
  - Accept TCP connections on 6379
  - Connect to the other Redis machines via TCP on 6379
- Sentinel machines
  - Accept TCP connections on 26379
  - Connect to other Sentinel machines via TCP on 26379
  - Connect to the Redis machines via TCP on 6379
Troubleshooting Redis replication
You can check if everything is correct by connecting to each server using the redis-cli application and sending the info replication command as below.
/opt/gitlab/embedded/bin/redis-cli -h <redis-host-or-ip> -a '<redis-password>' info replication
When connected to a primary Redis, you will see the number of connected replicas, and a list of each with connection details:
# Replication
role:master
connected_replicas:1
replica0:ip=10.133.5.21,port=6379,state=online,offset=208037514,lag=1
master_repl_offset:208037658
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:206989083
repl_backlog_histlen:1048576
When it's a replica, you will see details of the primary connection and whether it's up or down:
# Replication
role:replica
master_host:10.133.1.58
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
replica_repl_offset:208096498
replica_priority:100
replica_read_only:1
connected_replicas:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
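Rather than eyeballing this output, a health check can parse it. A minimal illustrative Python sketch (not part of GitLab or Redis tooling) that turns the key:value lines above into a dict and asserts the replica is healthy:

```python
def parse_info_replication(text):
    """Parse the key:value lines of `redis-cli ... info replication` output."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and section-header lines
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info

# Sample taken from the replica output above.
sample = """# Replication
role:replica
master_host:10.133.1.58
master_link_status:up
master_last_io_seconds_ago:1
"""
info = parse_info_replication(sample)
assert info["role"] == "replica" and info["master_link_status"] == "up"
```

The same parser works on the primary's output, where you would instead check role:master and connected_replicas.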
Troubleshooting Sentinel
If you get an error like Redis::CannotConnectError: No sentinels available., there may be something wrong with your configuration files, or it may be related to this issue. Make sure you define the same values in redis['master_name'] and redis['master_password'] as you defined for your Sentinel node.
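In /etc/gitlab/gitlab.rb those two settings look like the following; the values here are placeholders and must match what you configured on the Sentinel nodes:

```ruby
# /etc/gitlab/gitlab.rb -- must agree with the Sentinel nodes' configuration
redis['master_name'] = 'gitlab-redis'
redis['master_password'] = 'your-redis-password'
```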
The way the Redis connector redis-rb works with Sentinel is a bit non-intuitive. We try to hide the complexity in Omnibus, but it still requires a few extra configurations.
To make sure your configuration is correct:
- SSH into your GitLab application server.
- Enter the Rails console:
  # For Omnibus installations
  sudo gitlab-rails console
  # For source installations
  sudo -u git rails console -e production
- Run in the console:
  redis = Redis.new(Gitlab::Redis::SharedState.params)
  redis.info
  Keep this screen open and try to simulate a failover below.
- To simulate a failover on the primary Redis, SSH into the Redis server and run:
  # The port must match your primary Redis port, and the sleep time
  # must be a few seconds longer than the configured timeout.
  redis-cli -h localhost -p 6379 DEBUG sleep 20
- Then back in the Rails console from the first step, run:
  redis.info
  You should see a different port after a few seconds' delay (the failover/reconnect time).
Troubleshooting Gitaly
For troubleshooting information, see the Gitaly and Gitaly Cluster troubleshooting documentation.
Troubleshooting the GitLab Rails application
mount: wrong fs type, bad option, bad superblock on
You have not installed the necessary NFS client utilities. See step 1 above.
mount: mount point /var/opt/gitlab/... does not exist
This particular directory does not exist on the NFS server. Ensure
the share is exported and exists on the NFS server and try to remount.
Troubleshooting Monitoring
If the monitoring node is not receiving any data, check that the exporters are
capturing data.
curl "http[s]://localhost:<EXPORTER LISTENING PORT>/metrics"
or
curl "http[s]://localhost:<EXPORTER LISTENING PORT>/-/metrics"
Troubleshooting PgBouncer
In case you are experiencing any issues connecting through PgBouncer, the first place to check is always the logs:
sudo gitlab-ctl tail pgbouncer
Additionally, you can check the output from show databases in the administrative console. In the output, you would expect to see values in the host field for the gitlabhq_production database, and current_connections should be greater than 1.
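These checks can be scripted. An illustrative Python sketch (not GitLab tooling) that parses a psql-style show databases row, using the column layout from the console output in this section:

```python
def parse_show_databases_row(header, row):
    """Split a psql-style table row into a dict keyed by column name."""
    cols = [c.strip() for c in header.split("|")]
    vals = [v.strip() for v in row.split("|")]
    return dict(zip(cols, vals))

header = "name | host | port | database | force_user | pool_size | reserve_pool | pool_mode | max_connections | current_connections"
row = "gitlabhq_production | 127.0.0.1 | 5432 | gitlabhq_production | | 100 | 5 | | 0 | 1"

db = parse_show_databases_row(header, row)
# The sanity checks described above: host is populated and there is at
# least one server connection for gitlabhq_production.
assert db["host"], "gitlabhq_production has no backend host configured"
assert int(db["current_connections"]) >= 1
```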
PgBouncer administrative console
As part of Omnibus GitLab, the gitlab-ctl pgb-console command is provided to automatically connect to the PgBouncer administrative console. See the PgBouncer documentation for detailed instructions on how to interact with the console.
To start a session:
sudo gitlab-ctl pgb-console
The password you will be prompted for is the pgbouncer_user_password.
To get some basic information about the instance, run:
pgbouncer=# show databases; show clients; show servers;
name | host | port | database | force_user | pool_size | reserve_pool | pool_mode | max_connections | current_connections
---------------------+-----------+------+---------------------+------------+-----------+--------------+-----------+-----------------+---------------------
gitlabhq_production | 127.0.0.1 | 5432 | gitlabhq_production | | 100 | 5 | | 0 | 1
pgbouncer | | 6432 | pgbouncer | pgbouncer | 2 | 0 | statement | 0 | 0
(2 rows)
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | ptr | link
| remote_pid | tls
------+-----------+---------------------+--------+-----------+-------+------------+------------+---------------------+---------------------+-----------+------
+------------+-----
C | gitlab | gitlabhq_production | active | 127.0.0.1 | 44590 | 127.0.0.1 | 6432 | 2018-04-24 22:13:10 | 2018-04-24 22:17:10 | 0x12444c0 |
| 0 |
C | gitlab | gitlabhq_production | active | 127.0.0.1 | 44592 | 127.0.0.1 | 6432 | 2018-04-24 22:13:10 | 2018-04-24 22:17:10 | 0x12447c0 |
| 0 |
C | gitlab | gitlabhq_production | active | 127.0.0.1 | 44594 | 127.0.0.1 | 6432 | 2018-04-24 22:13:10 | 2018-04-24 22:17:10 | 0x1244940 |
| 0 |
C | gitlab | gitlabhq_production | active | 127.0.0.1 | 44706 | 127.0.0.1 | 6432 | 2018-04-24 22:14:22 | 2018-04-24 22:16:31 | 0x1244ac0 |
| 0 |
C | gitlab | gitlabhq_production | active | 127.0.0.1 | 44708 | 127.0.0.1 | 6432 | 2018-04-24 22:14:22 | 2018-04-24 22:15:15 | 0x1244c40 |
| 0 |
C | gitlab | gitlabhq_production | active | 127.0.0.1 | 44794 | 127.0.0.1 | 6432 | 2018-04-24 22:15:15 | 2018-04-24 22:15:15 | 0x1244dc0 |
| 0 |
C | gitlab | gitlabhq_production | active | 127.0.0.1 | 44798 | 127.0.0.1 | 6432 | 2018-04-24 22:15:15 | 2018-04-24 22:16:31 | 0x1244f40 |
| 0 |
C | pgbouncer | pgbouncer | active | 127.0.0.1 | 44660 | 127.0.0.1 | 6432 | 2018-04-24 22:13:51 | 2018-04-24 22:17:12 | 0x1244640 |
| 0 |
(8 rows)
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | ptr | link | rem
ote_pid | tls
------+--------+---------------------+-------+-----------+------+------------+------------+---------------------+---------------------+-----------+------+----
--------+-----
S | gitlab | gitlabhq_production | idle | 127.0.0.1 | 5432 | 127.0.0.1 | 35646 | 2018-04-24 22:15:15 | 2018-04-24 22:17:10 | 0x124dca0 | |
19980 |
(1 row)
Message: LOG: invalid CIDR mask in address
See the suggested fix in Geo documentation.
Message: LOG: invalid IP mask "md5": Name or service not known
See the suggested fix in Geo documentation.
Troubleshooting PostgreSQL with Patroni
In case you are experiencing any issues connecting through PgBouncer, the first place to check is always the logs for PostgreSQL (which is run through Patroni):
sudo gitlab-ctl tail patroni
Consul and PostgreSQL with Patroni changes not taking effect
Due to the potential impacts, gitlab-ctl reconfigure only reloads Consul and PostgreSQL; it does not restart the services. However, not all changes can be activated by reloading.
To restart either service, run gitlab-ctl restart consul or gitlab-ctl restart patroni respectively.
For PostgreSQL with Patroni, to prevent the primary node from being failed over automatically, it’s safest to stop all secondaries first, then restart the primary and finally restart the secondaries again.
On the Consul server nodes, it is important to restart the Consul service in a controlled fashion. Read our Consul documentation for instructions on how to restart the service.
PgBouncer error ERROR: pgbouncer cannot connect to server
You may get this error when running gitlab-rake gitlab:db:configure, or you may see the error in the PgBouncer log file.
PG::ConnectionBad: ERROR: pgbouncer cannot connect to server
The problem may be that your PgBouncer node's IP address is not included in the trust_auth_cidr_addresses setting in /etc/gitlab/gitlab.rb on the database nodes.
You can confirm that this is the issue by checking the PostgreSQL log on the master database node. If you see the following error, then trust_auth_cidr_addresses is the problem.
2018-03-29_13:59:12.11776 FATAL: no pg_hba.conf entry for host "123.123.123.123", user "pgbouncer", database "gitlabhq_production", SSL off
To fix the problem, add the IP address to /etc/gitlab/gitlab.rb.
postgresql['trust_auth_cidr_addresses'] = %w(123.123.123.123/32 <other_cidrs>)
Reconfigure GitLab for the changes to take effect.
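As an illustrative sketch (not GitLab code), the membership check that pg_hba.conf effectively performs against trust_auth_cidr_addresses can be reproduced with Python's ipaddress module, which is handy for verifying a candidate CIDR list before reconfiguring:

```python
import ipaddress

def allowed_by_trust_cidrs(client_ip, trust_auth_cidr_addresses):
    """True if client_ip falls inside any configured trusted CIDR."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in trust_auth_cidr_addresses)

# The PgBouncer node's address from the example error message above:
cidrs = ["123.123.123.123/32", "10.0.0.0/8"]
assert allowed_by_trust_cidrs("123.123.123.123", cidrs)
assert not allowed_by_trust_cidrs("192.168.1.5", cidrs)
```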
Question:
I am trying to run python manage.py syncdb in a Django installation, but I keep getting OperationalError: ERROR: pgbouncer cannot connect to server. pgbouncer.log contains lines such as:
2017-09-19 19:44:15.107 1128 LOG C-0x8a9930: mydb/myuser@unix:6432 closing because: pgbouncer cannot connect to server (age=0)
2017-09-19 19:44:15.107 1128 WARNING C-0x8a9930: mydb/myuser@unix:6432 Pooler Error: pgbouncer cannot connect to server
2017-09-19 19:44:15.107 1128 LOG S-0x8c72e0: mydb/myuser@35.154.149.188:5432 new connection to server
2017-09-19 19:44:15.107 1128 LOG C-0x8a9930: mydb/myuser@unix:6432 login failed: db=mydb user=myuser
2017-09-19 19:44:30.108 1128 LOG S-0x8c72e0: mydb/myuser@35.154.149.188:5432 closing because: connect failed (age=15)
In case it helps, ps -aef | grep pgbouncer gives:
postgres 1128 1 0 18:38 ? 00:00:00 /usr/sbin/pgbouncer -d /etc/pgbouncer/pgbouncer.ini
myuser 1919 1533 0 19:45 pts/0 00:00:00 grep --color=auto pgbouncer
Also, grep port /etc/pgbouncer/pgbouncer.ini results in:
;; dbname= host= port= user= password=
mydb = host=xx.xxx.xxx.xxx port=5432 dbname=mydb
;forcedb = host=127.0.0.1 port=300 user=baz password=foo client_encoding=UNICODE datestyle=ISO connect_query='SELECT 1'
listen_port = 6432
Finally, the relevant section of settings.py contains:
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': 'mydb',
'USER': 'myuser',
'PASSWORD': 'mypassword',
'HOST': '/var/run/postgresql',
'PORT': '6432',
    }
}
I set log_connections to on in postgresql.conf, restarted PG, and tried again. Here are the relevant lines:
2017-09-20 07:50:59 UTC LOG: database system is ready to accept connections
2017-09-20 07:50:59 UTC LOG: autovacuum launcher started
2017-09-20 07:51:00 UTC LOG: connection received: host=[local]
2017-09-20 07:51:00 UTC LOG: incomplete startup packet
2017-09-20 07:51:00 UTC LOG: connection received: host=[local]
2017-09-20 07:51:00 UTC LOG: connection authorized: user=postgres database=postgres
2017-09-20 07:51:01 UTC LOG: connection received: host=[local]
2017-09-20 07:51:01 UTC LOG: connection authorized: user=postgres database=postgres
2017-09-20 07:51:01 UTC LOG: connection received: host=[local]
2017-09-20 07:51:01 UTC LOG: connection authorized: user=postgres database=postgres
It seems the connection goes through, but the user and database are postgres. Those credentials are not what I put in pgbouncer.ini.
However, explicitly adding myuser to the connection string defined in pgbouncer.ini results in:
2017-09-20 09:37:37 UTC FATAL: Peer authentication failed for user "myuser"
2017-09-20 09:37:37 UTC DETAIL: Connection matched pg_hba.conf line 90: "local all all peer"
I'm completely stuck.
Best answer:
It seems the misconfiguration comes from this line in settings.py:
'PORT': '6432',
I commented it out, and pgbouncer started working, though I'm not sure why. Maybe there is a collision on that port; pgbouncer and PG coexist on the same server in my case. I have installed them on separate virtual machines in the past without a hitch (and without commenting out 'PORT': '6432',).
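A likely explanation, offered here as an assumption rather than a confirmed diagnosis: when HOST is a directory such as /var/run/postgresql, libpq connects over a Unix socket whose filename embeds the port (.s.PGSQL.<port>), so PORT selects which socket file is opened. With PORT removed, the client falls back to 5432 and talks to PostgreSQL's own socket directly, bypassing pgbouncer. A sketch of the path libpq derives:

```python
import os

def libpq_socket_path(host_dir, port):
    """Unix socket path libpq opens when `host` is a directory.

    Follows libpq's documented naming convention: .s.PGSQL.<port>.
    """
    return os.path.join(host_dir, ".s.PGSQL.{}".format(port))

# With PORT set to 6432, Django/psycopg2 looks for pgbouncer's socket:
assert libpq_socket_path("/var/run/postgresql", 6432) == "/var/run/postgresql/.s.PGSQL.6432"
# With PORT removed it falls back to the default 5432 socket, i.e. PostgreSQL directly:
assert libpq_socket_path("/var/run/postgresql", 5432) == "/var/run/postgresql/.s.PGSQL.5432"
```

So "commenting out PORT" silently changed which daemon the app connects to, which is consistent with the login failures disappearing.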
Given CockroachDB scales with vCPU, there’s a hard limit to how many active connections we can support per vCPU before a serious problem arises. PGBouncer stretches the limits a bit making it a cost-effective option. In serverless architectures, there is no client-side connection pooling, and using middleware like PGBouncer can alleviate the problem of connection storms. Please see my previous articles on the topic for more details.
Previous Articles
- Using PGBouncer with CockroachDB
- Using PGBouncer with Cockroach Cloud Free Tier
- Exploring PGBouncer auth_type(s) with CockroachDB
Motivation
We’ve covered how to deploy PGBouncer with a self-hosted CockroachDB cluster. Today, I’m going to demonstrate how to run PGBouncer along with the Cockroach Cloud free-forever tier database. The overall concepts are identical, but we will highlight some of the major differences in deploying PGBouncer with a cloud product.
High-Level Steps
- Create a free-forever Cockroach Cloud cluster
- Install and configure PGBouncer
- Verify
Step-By-Step Instructions
Start a Cluster
You can create a cluster using our latest directions. As soon as your cluster is created, you will be prompted with steps to connect. You have to download and install the cockroach binary and download a certificate to authenticate with the cluster. Also, make note of the password: we don't display it again, and you will have to change it from the SQL users screen if you lose it.
Before you proceed with the steps described, take a look at the third tab in the connection modal called Connection parameters. Make note of all of the properties, as we will need to use them in the same form they’re displayed; specifically, we need the database name as it is displayed. At the time of writing, the connection string shows options=--cluster%3Dartem-freetier-2924
, which is the unique identifier for the multi-tenant cluster we’re using. In the display modal for connection parameters, the database name is artem-freetier-2924.defaultdb
, which means the same thing, but it’s a form PGBouncer can understand. There is no meaningful way to pass the options=--cluster
argument to the pgbouncer.ini
that I know of. If you do, please respond in the comments, as I’d love to learn.
Install and Configure PGBouncer
I’m using a Mac, and PGBouncer is conveniently available via brew
.
brew install pgbouncer
Prepare a userlist.txt
File for Authentication Through PGBouncer
It takes the form of "username" and "password" separated by a space, one pair per line. Take the CockroachDB username provided earlier along with the password and paste them into the file:
"yourfreetierusername" "yourfreetierpassword"
Prepare a pgbouncer.ini
File Using a Template
Fill out the template based on the information you were provided in the connection parameters.
[databases]
* = host=free-tier.gcp-us-central1.cockroachlabs.cloud dbname=artem-freetier-2924.defaultdb user=artem port=26257

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = trust
auth_file = userlist.txt
logfile = pgbouncer.log
pidfile = pgbouncer.pid
admin_users = artem
ignore_startup_parameters = extra_float_digits,options
Start PGBouncer
-d runs PGBouncer in the background. When we make changes to the pgbouncer.ini file, we can reload the configuration using the -R flag.
Initial command:
pgbouncer -d pgbouncer.ini
Reloading PGBouncer:
pgbouncer -d -R pgbouncer.ini
Check the Logs
tail -f pgbouncer.log
Connect to the Database via PGBouncer
cockroach sql --host=localhost --port=6432
ERROR: cannot load certificates.
Check your certificate settings, set --certs-dir, or use --insecure for insecure clusters.
problem using security settings: no certificates found; does certs dir exist?
Failed running "sql"
Let’s try the suggestion passing --insecure
, as we don’t have any certificate except for the certificate authority provided to us.
cockroach sql --host=localhost --port=6432 --insecure
# To exit, type: \q.
#
ERROR: password authentication not enabled in insecure mode
Failed running "sql"
Ok, we're getting somewhere. In case it is not obvious, let me explain. We are trying to authenticate using a username and password, but the cluster expects a certificate. We pass --insecure to drop the TLS requirement, but in insecure mode the cockroach binary does not rely on password authentication at all, so the connection breaks.
What if we use the psql client instead?
psql -h localhost -p 6432 -U artem
psql: error: ERROR: pgbouncer cannot connect to server
This is still ambiguous. What do the PGBouncer logs say?
2021-08-11 10:21:14.408 EDT [96549] WARNING server login failed: FATAL server requires encryption
The keyword here is server requires encryption. That's the key part: we are talking about the server side, not the client. I'm highlighting this point because it matters when you look at the TLS properties of the pgbouncer.ini file, specifically the difference between client_tls_sslmode and server_tls_sslmode.
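To keep the two directions straight, here is how the two knobs line up in pgbouncer.ini (values illustrative, not yet the configuration we end up with):

```ini
; client_tls_* governs connections INTO PGBouncer (psql, cockroach sql)
client_tls_sslmode = verify-full
; server_tls_* governs connections FROM PGBouncer out to the database cluster
server_tls_sslmode = verify-full
```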
To prove the point, let's make another connection attempt with client-side TLS disabled (sslmode=disable in the URL).
cockroach sql --url "postgresql://localhost:6432/yourfreetierclustername.defaultdb?sslmode=disable&user=yourfreetierusername&password=yourfreetierpassword"
ERROR: SSL authentication error while connecting. pq: pgbouncer cannot connect to server Failed running "sql"
The logs say:
2021-08-11 10:27:59.045 EDT [96549] WARNING server login failed: FATAL server requires encryption 2021-08-11 10:27:59.045 EDT [96549] LOG S-0x7ff811808210: artem-freetier-2924.defaultdb/artem@35.184.49.18:26257 closing because: login failed (age=0s)
I did try sslmode=disable, but that's not what the cluster expects. Also, recall that we were provided a root.crt, and up to this point we have not used it. We have to make some additional changes to the pgbouncer.ini file.
[databases]
* = host=free-tier.gcp-us-central1.cockroachlabs.cloud dbname=artem-freetier-2924.defaultdb user=artem port=26257

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = trust
auth_file = userlist.txt
logfile = pgbouncer.log
pidfile = pgbouncer.pid
admin_users = artem
ignore_startup_parameters = extra_float_digits,options
# free tier provided cert
server_tls_sslmode = verify-full
server_tls_ca_file = /Users/artem/.postgresql/root.crt
Again, we’re focusing on the server part here, so let’s add server_tls_sslmode
and server_tls_ca_file
properties which we were provided by Cockroach Cloud.
server_tls_sslmode = verify-full
server_tls_ca_file = /Users/artem/.postgresql/root.crt
The sslmode is straight out of the connection string, and root.crt is the cert you downloaded originally.
Let's restart PGBouncer and see what changes. By the way, up to version 1.15.0 of PGBouncer we have to restart it whenever we change TLS settings. PGBouncer 1.16.0 was just released, and its key feature is hot reloading of TLS settings without a restart. I'm patiently waiting for the new version to be available in brew!
Let’s try to connect again:
cockroach sql --url "postgresql://localhost:6432/yourfreetierclustername.defaultdb?sslmode=disable&user=yourfreetierusername&password=yourfreetierpassword"
#
# Enter \? for a brief introduction.
#
artem@localhost:6432/defaultdb>
Success! Let’s try the other methods to connect:
psql -h localhost -p 6432 -U artem -d artem-freetier-2924.defaultdb
psql (13.3, server 13.0.0)
Type "help" for help.
artem-freetier-2924.defaultdb=>
Again, success!
cockroach sql --host=localhost --port=6432 --insecure
#
# Welcome to the CockroachDB SQL shell.
# All statements must be terminated by a semicolon.
# To exit, type: \q.
#
ERROR: password authentication not enabled in insecure mode
Failed running "sql"
Bummer!
The good news is we can connect to the Cockroach cluster through PGBouncer using the cockroach binary with --url, and via psql. We could stop here if we wanted: this state will do the trick, and you can test scaling active connections with PGBouncer, but it is not representative of real-world use cases. There's also the issue of authenticating to PGBouncer using the trust method, which is weak.
Considering PGBouncer is running on my laptop where my client resides, this is a risk worth taking for the benefit of convenience. However, say PGBouncer was not on the same machine as the client, and you had multiple clusters and users to manage. It is highly advisable to set up TLS between the client and PGBouncer. Reasons include:
- It prevents man-in-the-middle attacks.
- It catches configuration errors where clients are directed to the wrong PGBouncer instance.
- Users with varying privileges are not intermixed.
- Specifically in our case, since the free tier today comes with password authentication, we no longer need to pass passwords in clear text.
Obviously, if you have a mature infrastructure where the lift of additional TLS complexity is solved by service mesh architecture or something else entirely, you can ignore the next steps.
Setup TLS Between Client and PGBouncer
Let's make this a little more interesting by enabling TLS between the client and PGBouncer, as the connection between PGBouncer and the cluster is already encrypted after setting server_tls_sslmode and server_tls_ca_file.
We're moving away from trust in favor of cert authentication. We have to create the certs to make it work, and you can follow these directions to do so.
Step 1: Create the CA key and certificate pair.
The ca.cnf
file is used as-is from the docs, but you can change it to fit your environment if necessary.
mkdir certs my-safe-directory
openssl genrsa -out my-safe-directory/ca.key 2048
chmod 400 my-safe-directory/ca.key
openssl req -new -x509 -config ca.cnf -key my-safe-directory/ca.key -out certs/ca.crt -days 365 -batch
rm -f index.txt serial.txt
touch index.txt
echo '01' > serial.txt
Step 2: Create the certificate and key pairs for nodes.
My node.cnf
file looks like so:
# OpenSSL node configuration file
[ req ]
prompt=no
distinguished_name = distinguished_name
req_extensions = extensions
[ distinguished_name ]
organizationName = Example Inc
[ extensions ]
subjectAltName = critical,DNS:localhost,DNS:node,IP:0.0.0.0
We are going to connect to PGBouncer using localhost
, but we also require node
, so I’m including both in the SAN field.
openssl genrsa -out certs/node.key 2048
chmod 400 certs/node.key
openssl req -new -config node.cnf -key certs/node.key -out node.csr -batch
openssl ca -config ca.cnf -keyfile my-safe-directory/ca.key -cert certs/ca.crt -policy signing_policy -extensions signing_node_req -out certs/node.crt -outdir certs/ -in node.csr -batch
openssl x509 -in certs/node.crt -text | grep "X509v3 Subject Alternative Name" -A 1
X509v3 Subject Alternative Name: critical
    DNS:localhost, DNS:node, IP Address:0.0.0.0
Step 3: Create the certificate and key pair for the first user.
My client.cnf
file:
[ req ]
prompt=no
distinguished_name = distinguished_name
req_extensions = extensions
[ distinguished_name ]
organizationName = Cockroach
commonName = artem
[ extensions ]
subjectAltName = DNS:root
openssl genrsa -out certs/client.artem.key 2048
chmod 400 certs/client.artem.key
openssl req -new -config client.cnf -key certs/client.artem.key -out client.artem.csr -batch
openssl ca -config ca.cnf -keyfile my-safe-directory/ca.key -cert certs/ca.crt -policy signing_policy -extensions signing_client_req -out certs/client.artem.crt -outdir certs/ -in client.artem.csr -batch
openssl x509 -in certs/client.artem.crt -text | grep "CN ="
Issuer: O = Cockroach, CN = Cockroach CA
Subject: O = Cockroach, CN = artem
Now, one quirk to note before we attempt to connect is the placement of the node.crt and client.username.crt certificates in the same directory. I recommend creating a separate node-certs directory and moving the node cert as well as the associated key there. Also, copy ca.crt to the same directory for good measure.
mkdir node-certs
mv certs/node.* node-certs
cp certs/ca.crt node-certs
We are going to add the following additional properties to the pgbouncer.ini
file.
auth_type = cert
client_tls_sslmode = verify-full
# Path to file that contains trusted CA certs
client_tls_key_file = ./certs/node.key
client_tls_cert_file = ./certs/node.crt
client_tls_ca_file = ./certs/ca.crt
My final pgbouncer.ini
file looks like so:
[databases]
* = host=free-tier.gcp-us-central1.cockroachlabs.cloud dbname=artem-freetier-2924.defaultdb user=artem port=26257

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = cert
auth_file = userlist.txt
logfile = pgbouncer.log
pidfile = pgbouncer.pid
admin_users = artem
ignore_startup_parameters = extra_float_digits,options
client_tls_sslmode = verify-full
# Path to file that contains trusted CA certs
client_tls_key_file = ./node-certs/node.key
client_tls_cert_file = ./node-certs/node.crt
client_tls_ca_file = ./node-certs/ca.crt
server_tls_sslmode = verify-full
server_tls_ca_file = /Users/artem/.postgresql/root.crt
Restart PGBouncer.
Verify
cockroach sql --url "postgresql://localhost:6432/artem-freetier-2924.defaultdb?sslcert=./certs%2F/client.artem.crt&sslkey=./certs%2F/client.artem.key&sslmode=verify-full&sslrootcert=./certs%2Fca.crt"
#
# Enter \? for a brief introduction.
#
artem@localhost:6432/defaultdb>
From the logs:
2021-08-11 13:08:00.794 EDT [99846] LOG C-0x7f9527008610: artem-freetier-2924.defaultdb/artem@[::1]:51138 login attempt: db=artem-freetier-2924.defaultdb user=artem tls=TLSv1.3/TLS_AES_256_GCM_SHA384
Notice we're using localhost, port 6432, sslmode=verify-full, and the CA and client certs we just created. This means PGBouncer and the client will check the authenticity of the hostname as well as the authenticity of the client accessing PGBouncer.
For completeness, here's the psql version:
psql "host=localhost dbname=artem-freetier-2924.defaultdb port=6432 user=artem sslmode=verify-full sslcert=./certs/client.artem.crt sslkey=./certs/client.artem.key sslrootcert=./certs/ca.crt"
2021-08-11 13:15:01.161 EDT [99846] LOG C-0x7f9527008610: artem-freetier-2924.defaultdb/artem@[::1]:51253 login attempt: db=artem-freetier-2924.defaultdb user=artem tls=TLSv1.3/TLS_AES_256_GCM_SHA384
Wrap-Up
At this point, we have a working CockroachDB free tier cluster connected to PGBouncer and secured with TLS end to end. Happy pooling!