ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use

I installed 2 MariaDB Galera nodes (mariadb-galera-10.0.27-linux-x86_64.tar.gz) on 2 CentOS 6.6 servers. After installing, I started node1 with the parameter --wsrep-new-cluster, then started node2 withou...

TWO-NODE CLUSTERS

In a two-node cluster, a single-node failure causes the other to stop working.

Situation

You have a cluster composed of only two nodes. One of the nodes leaves the cluster ungracefully. That is, instead of being shut down through init or systemd, it crashes or suffers a loss of network connectivity. The node that remains becomes nonoperational. It remains so until some additional information is provided by a third party, such as a human operator or another node.

If the node remained operational after the other left the cluster ungracefully, there would be a risk that each of the two nodes considers itself the Primary Component. To prevent this, the node becomes nonoperational.

Solutions

There are two solutions available to you:

  • You can bootstrap the surviving node to form a new Primary Component, using the pc.bootstrap wsrep Provider option. To do so, log into the database client and run the following command:

SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';

This bootstraps the surviving node as a new Primary Component. When the other node comes back online or regains network connectivity with this node, it will initiate a state transfer and catch up with this node.

  • In the event that you want the node to continue to operate, you can use the pc.ignore_sb wsrep Provider option. To do so, log into the database client and run the following command:

SET GLOBAL wsrep_provider_options='pc.ignore_sb=TRUE';

The node resumes processing updates and will continue to do so even if it suspects a split-brain situation.

Warning: Enabling pc.ignore_sb is dangerous in a multi-master setup, due to the risk of split-brain situations described above. However, it does simplify things in master-slave clusters (especially in cases where you only use two nodes).

In addition to the solutions provided above, you can avoid the situation entirely using Galera Arbitrator. Galera Arbitrator functions as an odd node in quorum calculations, meaning that if you enable Galera Arbitrator on one node in a two-node cluster, that node remains the Primary Component even if the other node fails or loses network connectivity.
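The condition under which the pc.bootstrap solution above applies can be sketched in a few lines. This is an illustrative Python sketch, not a Galera API; the status dict simply mimics the output of SHOW STATUS LIKE 'wsrep_%':

```python
# Illustrative sketch: decide whether a surviving node of a two-node cluster
# needs pc.bootstrap. The dict mimics SHOW STATUS LIKE 'wsrep_%' output;
# this is NOT a real Galera interface.

def needs_bootstrap(status: dict) -> bool:
    """A node that is non-Primary and not ready is in the nonoperational
    'surviving node' state described above."""
    return (
        status.get("wsrep_cluster_status") != "Primary"
        and status.get("wsrep_ready") == "OFF"
    )

surviving = {"wsrep_cluster_status": "non-Primary", "wsrep_ready": "OFF"}
healthy = {"wsrep_cluster_status": "Primary", "wsrep_ready": "ON"}

print(needs_bootstrap(surviving))  # True  -> SET GLOBAL wsrep_provider_options='pc.bootstrap=YES'
print(needs_bootstrap(healthy))    # False -> nothing to do
```

A real operator script would of course query these values over a MySQL connection rather than hard-coding them.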

http://galeracluster.com/documentation-webpages/twonode.html

I have a 3-node MySQL Galera cluster in production. Today a large DDoS hit our network and caused some issues; I assume some kind of network separation caused the following error on node-1:

ERROR 1047 (08S01): WSREP has not yet prepared node for application use

node-2 and node-3 are working fine. In this situation, what options do I have to fix node-1 only, without creating any chaos?

I searched around, but people only give examples for 2-node clusters. In my case it's a 3-node cluster that is functional at present, so I don't want to bootstrap again. What is the safe way to bring this node back into the cluster?

This is the status of the above node:

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_%';
+------------------------------+-------------------------------------------------------+
| Variable_name                | Value                                                 |
+------------------------------+-------------------------------------------------------+
| wsrep_apply_oooe             | 0.130673                                              |
| wsrep_apply_oool             | 0.000010                                              |
| wsrep_apply_window           | 1.413593                                              |
| wsrep_causal_reads           | 0                                                     |
| wsrep_cert_deps_distance     | 95.591121                                             |
| wsrep_cert_index_size        | 454                                                   |
| wsrep_cert_interval          | 6.275534                                              |
| wsrep_cluster_conf_id        | 146                                                   |
| wsrep_cluster_size           | 3                                                     |
| wsrep_cluster_state_uuid     | f0d74456-9803-11e8-be49-6a9820bd29f8                  |
| wsrep_cluster_status         | Primary                                               |
| wsrep_commit_oooe            | 0.000000                                              |
| wsrep_commit_oool            | 0.000002                                              |
| wsrep_commit_window          | 1.267027                                              |
| wsrep_connected              | ON                                                    |
| wsrep_desync_count           | 0                                                     |
| wsrep_evs_delayed            |                                                       |
| wsrep_evs_evict_list         |                                                       |
| wsrep_evs_repl_latency       | 0/0/0/0/0                                             |
| wsrep_evs_state              | OPERATIONAL                                           |
| wsrep_flow_control_paused    | 0.059237                                              |
| wsrep_flow_control_paused_ns | 322079243575550                                       |
| wsrep_flow_control_recv      | 3556809                                               |
| wsrep_flow_control_sent      | 372                                                   |
| wsrep_gcomm_uuid             | cca74c82-8346-11eb-931e-b7df128f4e5d                  |
| wsrep_incoming_addresses     | 172.28.1.216:3306,172.28.1.236:3306,172.28.1.183:3306 |
| wsrep_last_committed         | 3947799887                                            |
| wsrep_local_bf_aborts        | 0                                                     |
| wsrep_local_cached_downto    | 18446744073709551615                                  |
| wsrep_local_cert_failures    | 0                                                     |
| wsrep_local_commits          | 0                                                     |
| wsrep_local_index            | 2                                                     |
| wsrep_local_recv_queue       | 2148340                                               |
| wsrep_local_recv_queue_avg   | 7779.915248                                           |
| wsrep_local_recv_queue_max   | 2148340                                               |
| wsrep_local_recv_queue_min   | 0                                                     |
| wsrep_local_replays          | 0                                                     |
| wsrep_local_send_queue       | 0                                                     |
| wsrep_local_send_queue_avg   | 0.000000                                              |
| wsrep_local_send_queue_max   | 1                                                     |
| wsrep_local_send_queue_min   | 0                                                     |
| wsrep_local_state            | 1                                                     |
| wsrep_local_state_comment    | Joining: receiving State Transfer                     |
| wsrep_local_state_uuid       | f0d74456-9803-11e8-be49-6a9820bd29f8                  |
| wsrep_protocol_version       | 7                                                     |
| wsrep_provider_name          | Galera                                                |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>                     |
| wsrep_provider_version       | 25.3.22(r3764)                                        |
| wsrep_ready                  | OFF                                                   |
| wsrep_received               | 294474146                                             |
| wsrep_received_bytes         | 631142924318                                          |
| wsrep_repl_data_bytes        | 0                                                     |
| wsrep_repl_keys              | 0                                                     |
| wsrep_repl_keys_bytes        | 0                                                     |
| wsrep_repl_other_bytes       | 0                                                     |
| wsrep_replicated             | 0                                                     |
| wsrep_replicated_bytes       | 0                                                     |
| wsrep_thread_count           | 17                                                    |
+------------------------------+-------------------------------------------------------+
58 rows in set (0.00 sec)

Bootstrapping a Galera Cluster

Bootstrapping is the process of (re)starting a Galera cluster.

When to Bootstrap

Bootstrapping is only required when the cluster has lost quorum.

Quorum is lost when less than half of the nodes can communicate with each other (for longer than the configured grace period). In Galera terminology, if a node can communicate with the rest of the cluster, its DB is in a good state, and it reports itself as synced.
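The "less than half" rule above can be stated precisely: a partition keeps quorum only when it contains a strict majority of the nodes. A minimal sketch (illustrative only; real Galera also supports weighted nodes and a grace period, which this ignores):

```python
# Quorum sketch: a component is Primary only if it holds a strict majority
# of the cluster membership. Illustrative; ignores node weights (pc.weight)
# and the suspect-timeout grace period.

def has_quorum(reachable: int, total: int) -> bool:
    # Strictly more than half; an exact half (e.g. 1 of 2) is NOT quorum.
    return reachable * 2 > total

# A 3-node cluster survives one failure but not two:
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
# A 2-node cluster cannot survive any ungraceful failure:
print(has_quorum(1, 2))  # False
```

This is why two-node clusters (previous section) and even-sized clusters in general are fragile: a 50/50 split leaves neither side with quorum.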

If quorum has not been lost, individual unhealthy nodes should automatically rejoin the cluster once repaired (error resolved, node restarted, or connectivity restored).

Symptoms of Lost Quorum

  • All nodes appear "Unhealthy" on the proxy dashboard.

  • All responsive nodes report the value of wsrep_cluster_status as non-Primary.

    mysql> SHOW STATUS LIKE 'wsrep_cluster_status';
    +----------------------+-------------+
    | Variable_name        | Value       |
    +----------------------+-------------+
    | wsrep_cluster_status | non-Primary |
    +----------------------+-------------+
  • All responsive nodes respond with ERROR 1047 when queried with most statement types.

    mysql> select * from mysql.user;
    ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use

See Cluster Behavior for more details about determining cluster state.
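The symptom check above ("all responsive nodes report non-Primary") can be sketched as a small helper. Illustrative Python only; unresponsive nodes are modeled as None, and the strings mirror wsrep_cluster_status values:

```python
# Illustrative check for the lost-quorum symptom: quorum is lost when every
# *responsive* node reports a non-Primary wsrep_cluster_status.
# Unresponsive nodes are represented as None.

def quorum_lost(cluster_statuses: list) -> bool:
    responsive = [s for s in cluster_statuses if s is not None]
    # No responsive nodes at all is a different failure mode (all VMs down),
    # so we only flag lost quorum when someone answered and nobody is Primary.
    return bool(responsive) and all(s != "Primary" for s in responsive)

print(quorum_lost(["non-Primary", "non-Primary", None]))  # True  -> bootstrap required
print(quorum_lost(["Primary", "non-Primary", None]))      # False -> unhealthy node should rejoin on its own
```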

Auto-bootstrap errand

As part of cf-mysql-release v25, we provide an auto-bootstrap feature which runs as a BOSH errand. The errand evaluates whether quorum has been lost on the cluster and, if so, bootstraps it. Before running the errand, ensure that any network partitions have been resolved; once they are, the cluster is in a state where the errand can be run.

How to run

Run bosh run errand bootstrap from the terminal. When done, this should successfully bootstrap the cluster, and all jobs should report as running. Note that:

If the cluster was already healthy to begin with (i.e. quorum was never lost), the errand will error out saying bootstrap is not required.

If one or more nodes are not reachable (i.e. the VM exists but is in an unknown state), it will error out saying Error: could not reach node. In this situation, follow the steps below:

  1. bosh -n stop mysql_z1 && bosh -n stop mysql_z2 && bosh -n stop <arbitrator|mysql>_z3
  2. bosh edit deployment
  3. Set update.canaries to 0, update.max_in_flight to 3, and update.serial to false.
  4. bosh deploy
  • Note: if you get a 503 error (like Sending stop request to monit: Request failed, response: Response{ StatusCode: 503, Status: '503 Service Unavailable' }), it means that monit is still trying to stop the VMs. Wait a few minutes and try this step again.
  5. bosh -n start mysql_z1 ; bosh -n start mysql_z2 ; bosh -n start <arbitrator|mysql>_z3
  • This will throw several errors, but it ensures that all the jobs are present on the VMs.
  6. Run bosh instances to verify that all jobs report as failing.
  7. Run the errand again: bosh -n run errand bootstrap.
  • Once the errand succeeds, the cluster is synced, although some jobs might still report as failing.
  8. bosh edit deployment
  9. Set update.canaries to 1, update.max_in_flight to 1, and update.serial to true.
  10. Verify that the deployment succeeds and all jobs are healthy. A healthy deployment should look like this:
$ bosh vms cf-mysql
Acting as user 'admin' on deployment 'cf-mysql' on 'Bosh Lite Director'
| mysql_z1/0           | running | mysql_z1           | 10.244.7.2   |
| mysql_z2/0           | running | mysql_z2           | 10.244.8.2   |
| arbitrator_z3/0      | running | arbitrator_z3      | 10.244.9.6   |
...

If these steps did not work for you, please refer to the Manual Bootstrap Process below.

How it works

The bootstrap errand simply automates the steps in the manual bootstrapping process documented below. It finds the node with the highest transaction sequence number, and asks it to start up by itself (i.e. in bootstrap mode), then asks the remaining nodes to join the cluster.
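The node-selection step can be sketched as follows. This is an illustrative Python sketch of the logic, not the errand's actual implementation; the node names and data shape are hypothetical:

```python
# Sketch of the bootstrap-node selection described above: pick the node with
# the highest committed transaction sequence number (seqno). A seqno of -1
# means the value is unknown (the node crashed) and must first be recovered
# with mysqld --wsrep-recover. Node names here are hypothetical examples.

def pick_bootstrap_node(seqnos: dict) -> str:
    known = {node: s for node, s in seqnos.items() if s >= 0}
    if not known:
        raise RuntimeError(
            "no node has a recoverable seqno; run --wsrep-recover first"
        )
    # The most up-to-date node becomes the new Primary Component;
    # the remaining nodes then rejoin via state transfer.
    return max(known, key=known.get)

print(pick_bootstrap_node({"mysql_z1": 15, "mysql_z2": 17, "mysql_z3": -1}))  # mysql_z2
```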

The sequence number of a stopped node can be retrieved either by reading the node's state file at /var/vcap/store/mysql/grastate.dat, or by running mysqld with a WSREP flag, e.g. mysqld --wsrep-recover.
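Both seqno sources are plain text and easy to parse. A sketch of extracting the seqno from each (the file contents and log line below are examples modeled on the output shown later in this document, not real cluster data):

```python
# Sketch: extract the seqno from the two sources mentioned above, the
# grastate.dat state file and the "Recovered position" line logged by
# mysqld --wsrep-recover. Sample data only.
import re

GRASTATE = """\
# GALERA saved state
uuid:    e93955c7-b797-11e4-9faa-9a6f0b73eb46
seqno:   -1
"""

LOG_LINE = ("150225 18:09:42 mysqld_safe WSREP: Recovered position "
            "e93955c7-b797-11e4-9faa-9a6f0b73eb46:15")

def seqno_from_grastate(text: str) -> int:
    m = re.search(r"^seqno:\s*(-?\d+)", text, re.MULTILINE)
    return int(m.group(1))

def seqno_from_recover_log(line: str) -> int:
    # The seqno is the number after the group id (uuid), separated by ':'.
    return int(line.rsplit(":", 1)[1])

print(seqno_from_grastate(GRASTATE))     # -1 (node was not shut down gracefully)
print(seqno_from_recover_log(LOG_LINE))  # 15
```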

Manual Bootstrap Process

The following steps are prone to user-error and can result in lost data if followed incorrectly.
Please follow the Auto-bootstrap instructions above first, and only resort to the manual process if the errand fails to repair the cluster.

  1. SSH to each node in the cluster and, as root, shut down the mariadb process.
$ monit stop mariadb_ctrl

Re-bootstrapping the cluster will not be successful unless all other nodes have been shut down.

  2. Choose a node to bootstrap.

    Find the node with the highest transaction sequence number (seqno):

    • If a node shut down gracefully, the seqno should be in the Galera state file.

      $ cat /var/vcap/store/mysql/grastate.dat | grep 'seqno:'
    • If the node crashed or was killed, the seqno in the galera state file should be -1. In this case, the seqno may be recoverable from the database. The following command will cause the database to start up, log the recovered sequence number, and then exit.

      $ /var/vcap/packages/mariadb/bin/mysqld --wsrep-recover

      Scan the error log for the recovered sequence number (the last number after the group id (uuid) is the recovered seqno):

      $ grep "Recovered position" /var/vcap/sys/log/mysql/mysql.err.log | tail -1
      150225 18:09:42 mysqld_safe WSREP: Recovered position e93955c7-b797-11e4-9faa-9a6f0b73eb46:15

      Note: The galera state file will still say seqno: -1 afterward.

    • If the node never connected to the cluster before crashing, it may not even have a group id (uuid in grastate.dat). In this case there’s nothing to recover. Unless all nodes crashed this way, don’t choose this node for bootstrapping.

    Use the node with the highest seqno value as the new bootstrap node. If all nodes have the same seqno, you can choose any node as the new bootstrap node.

Important: Only perform these bootstrap commands on the node with the highest seqno. Otherwise the node with the highest seqno will be unable to join the new cluster (unless its data is abandoned). Its mariadb process will exit with an error. See cluster behavior for more details on intentionally abandoning data.

  3. On the new bootstrap node, update the state file and restart the mariadb process:
$ echo -n "NEEDS_BOOTSTRAP" > /var/vcap/store/mysql/state.txt
$ monit start mariadb_ctrl

You can check that the mariadb process has started successfully with monit. Note that it can take up to 10 minutes for monit to start the mariadb process.

  4. Once the bootstrapped node is running, start the mariadb process on the remaining nodes via monit.
$ monit start mariadb_ctrl
  5. Verify that the new nodes have successfully joined the cluster. The following command should output the total number of nodes in the cluster:
mysql> SHOW STATUS LIKE 'wsrep_cluster_size';

While restoring a dump from a CloudSQL instance into a node of a Percona XtraDB cluster, I am getting this error. Line 6955 of the .sql dump file is shown in bold below:

DROP TABLE IF EXISTS `next_balls`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `next_balls` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `match_id` int(11) NOT NULL DEFAULT '0',
  `inning_id` int(11) NOT NULL DEFAULT '0',
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=176335 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `next_balls`
--

**/*!40000 ALTER TABLE `next_balls` DISABLE KEYS */;**

The error came while restoring the table 'next_balls'. The table itself was created on the node, but no data was inserted, so I think the error comes from the "ALTER TABLE next_balls..." statement. Any idea why this error pops up? The restoration stops when the error occurs. Any help would be highly appreciated.

Edit 1: I had a 2-node Percona cluster. It seems the error comes when one of the nodes is down, or thinks the other is down, which results in a split-brain situation and the cluster halts. Is this the case? Should I add another node to the cluster and then do the restore? Or should I first do the restore on one node and then set up the cluster?

I am running MariaDB Galera 10.0 on Ubuntu 14.04 Server with 3 nodes. While data is being inserted, if I restart the mysql server on any of the 3 nodes, I am unable to bring that server back into the cluster.

Error: ERROR 1047 (08S01): WSREP has not yet prepared node for application use

I can start the mysql server, but WSREP is not showing READY:

root@ip-172-31-3-97:~# service mysql start
 * Starting MariaDB database server mysqld                                                                                              [ OK ] 
 * Checking for corrupt, not cleanly closed and upgrade needing tables.
root@ip-172-31-3-97:~# ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
MariaDB [(none)]> show status like "wsrep%";
+------------------------------+------------------------------------------------------+
| Variable_name                | Value                                                |
+------------------------------+------------------------------------------------------+
| wsrep_local_state_uuid       | 8b2b70a9-d646-11e5-a275-022c471b9ae2                 |
| wsrep_protocol_version       | 7                                                    |
| wsrep_last_committed         | 1499108                                              |
| wsrep_replicated             | 0                                                    |
| wsrep_replicated_bytes       | 0                                                    |
| wsrep_repl_keys              | 0                                                    |
| wsrep_repl_keys_bytes        | 0                                                    |
| wsrep_repl_data_bytes        | 0                                                    |
| wsrep_repl_other_bytes       | 0                                                    |
| wsrep_received               | 1                                                    |
| wsrep_received_bytes         | 276                                                  |
| wsrep_local_commits          | 0                                                    |
| wsrep_local_cert_failures    | 0                                                    |
| wsrep_local_replays          | 0                                                    |
| wsrep_local_send_queue       | 0                                                    |
| wsrep_local_send_queue_max   | 1                                                    |
| wsrep_local_send_queue_min   | 0                                                    |
| wsrep_local_send_queue_avg   | 0.000000                                             |
| wsrep_local_recv_queue       | 0                                                    |
| wsrep_local_recv_queue_max   | 1                                                    |
| wsrep_local_recv_queue_min   | 0                                                    |
| wsrep_local_recv_queue_avg   | 0.000000                                             |
| wsrep_local_cached_downto    | 18446744073709551615                                 |
| wsrep_flow_control_paused_ns | 0                                                    |
| wsrep_flow_control_paused    | 0.000000                                             |
| wsrep_flow_control_sent      | 0                                                    |
| wsrep_flow_control_recv      | 0                                                    |
| wsrep_cert_deps_distance     | 0.000000                                             |
| wsrep_apply_oooe             | 0.000000                                             |
| wsrep_apply_oool             | 0.000000                                             |
| wsrep_apply_window           | 0.000000                                             |
| wsrep_commit_oooe            | 0.000000                                             |
| wsrep_commit_oool            | 0.000000                                             |
| wsrep_commit_window          | 0.000000                                             |
| wsrep_local_state            | 1                                                    |
| wsrep_local_state_comment    | Joining: receiving State Transfer                    |
| wsrep_cert_index_size        | 0                                                    |
| wsrep_causal_reads           | 0                                                    |
| wsrep_cert_interval          | 0.000000                                             |
| wsrep_incoming_addresses     | 172.31.3.97:3306,172.31.28.172:3306,172.31.3.98:3306 |
| wsrep_evs_delayed            |                                                      |
| wsrep_evs_evict_list         |                                                      |
| wsrep_evs_repl_latency       | 0/0/0/0/0                                            |
| wsrep_evs_state              | OPERATIONAL                                          |
| wsrep_gcomm_uuid             | aa655de3-d6fc-11e5-8326-02f5d924b426                 |
| wsrep_cluster_conf_id        | 5                                                    |
| wsrep_cluster_size           | 3                                                    |
| wsrep_cluster_state_uuid     | 8b2b70a9-d646-11e5-a275-022c471b9ae2                 |
| wsrep_cluster_status         | Primary                                              |
| wsrep_connected              | ON                                                   |
| wsrep_local_bf_aborts        | 0                                                    |
| wsrep_local_index            | 0                                                    |
| wsrep_provider_name          | Galera                                               |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>                    |
| wsrep_provider_version       | 3.9(rXXXX)                                           |
| wsrep_ready                  | OFF                                                  |
| wsrep_thread_count           | 2                                                    |
+------------------------------+------------------------------------------------------+
57 rows in set (0.00 sec)

MariaDB [(none)]> 

Following is my cluster.cnf:

root@ip-172-31-3-97:~# cat /etc/mysql/conf.d/cluster.cnf 
[mysqld]
query_cache_size=0
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_type=0
#bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=2G"

# Galera Cluster Configuration
wsrep_cluster_name="Auth_User Cluster"
wsrep_cluster_address="gcomm://172.31.28.172"

# Galera Synchronization Configuration
wsrep_sst_method=rsync
#wsrep_sst_auth=user:pass

# Galera Node Configuration
wsrep_node_address="172.31.3.97"
wsrep_node_name=Maria2"

Log from the 172.31.28.172 server:

Feb 19 11:39:42 ip-172-31-28-172 mysqld: 160219 11:39:42 [Note] WSREP: (bac5684c, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.31.3.97:4567 
Feb 19 11:39:44 ip-172-31-28-172 mysqld: 160219 11:39:44 [Note] WSREP: (bac5684c, 'tcp://0.0.0.0:4567') reconnecting to aa655de3 (tcp://172.31.3.97:4567), attempt 0
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: evs::proto(bac5684c, GATHER, view_id(REG,aa655de3,5)) suspecting node: aa655de3
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: evs::proto(bac5684c, GATHER, view_id(REG,aa655de3,5)) suspected node without join message, declaring inactive
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: declaring cc5d820a at tcp://172.31.3.98:4567 stable
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: Node bac5684c state prim
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: view(view_id(PRIM,bac5684c,6) memb {
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011bac5684c,0
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011cc5d820a,0
Feb 19 11:39:47 ip-172-31-28-172 mysqld: } joined {
Feb 19 11:39:47 ip-172-31-28-172 mysqld: } left {
Feb 19 11:39:47 ip-172-31-28-172 mysqld: } partitioned {
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011aa655de3,0
Feb 19 11:39:47 ip-172-31-28-172 mysqld: })
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: save pc into disk
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: forgetting aa655de3 (tcp://172.31.3.97:4567)
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: (bac5684c, 'tcp://0.0.0.0:4567') turning message relay requesting off
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 7ac8b609-d6fd-11e5-9a9c-96322f9cb198
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: STATE EXCHANGE: sent state msg: 7ac8b609-d6fd-11e5-9a9c-96322f9cb198
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: STATE EXCHANGE: got state msg: 7ac8b609-d6fd-11e5-9a9c-96322f9cb198 from 0 (Maria1")
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: STATE EXCHANGE: got state msg: 7ac8b609-d6fd-11e5-9a9c-96322f9cb198 from 1 (Maria3")
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: Quorum results:
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011version    = 3,
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011component  = PRIMARY,
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011conf_id    = 5,
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011members    = 2/2 (joined/total),
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011act_id     = 1499548,
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011last_appl. = 1499416,
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),
Feb 19 11:39:47 ip-172-31-28-172 mysqld: #011group UUID = 8b2b70a9-d646-11e5-a275-022c471b9ae2
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: Flow-control interval: [23, 23]
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: New cluster view: global state: 8b2b70a9-d646-11e5-a275-022c471b9ae2:1499548, view# 6: Primary, number of nodes: 2, my index: 0, protocol version 3
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: REPL Protocols: 7 (3, 2)
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: Service thread queue flushed.
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: Assign initial position for certification: 1499548, protocol version: 3
Feb 19 11:39:47 ip-172-31-28-172 mysqld: 160219 11:39:47 [Note] WSREP: Service thread queue flushed.
Feb 19 11:39:50 ip-172-31-28-172 mysqld: 160219 11:39:50 [Note] WSREP:  cleaning up aa655de3 (tcp://172.31.3.97:4567)
Feb 19 11:39:52 ip-172-31-28-172 mysqld: 160219 11:39:52 [Note] WSREP: (bac5684c, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 

Feb 19 11:39:52 ip-172-31-28-172 mysqld: 160219 11:39:52 [Note] WSREP: declaring aa655de3 at tcp://172.31.3.97:4567 stable
Feb 19 11:39:52 ip-172-31-28-172 mysqld: 160219 11:39:52 [Note] WSREP: declaring cc5d820a at tcp://172.31.3.98:4567 stable
Feb 19 11:39:52 ip-172-31-28-172 mysqld: 160219 11:39:52 [Note] WSREP: Node bac5684c state prim
Feb 19 11:39:52 ip-172-31-28-172 mysqld: 160219 11:39:52 [Note] WSREP: view(view_id(PRIM,aa655de3,7) memb {
Feb 19 11:39:52 ip-172-31-28-172 mysqld: #011aa655de3,0
Feb 19 11:39:52 ip-172-31-28-172 mysqld: #011bac5684c,0
Feb 19 11:39:52 ip-172-31-28-172 mysqld: #011cc5d820a,0
Feb 19 11:39:52 ip-172-31-28-172 mysqld: } joined {
Feb 19 11:39:52 ip-172-31-28-172 mysqld: } left {
Feb 19 11:39:52 ip-172-31-28-172 mysqld: } partitioned {
Feb 19 11:39:52 ip-172-31-28-172 mysqld: })
Feb 19 11:39:52 ip-172-31-28-172 mysqld: 160219 11:39:52 [Note] WSREP: save pc into disk
Feb 19 11:39:52 ip-172-31-28-172 mysqld: 160219 11:39:52 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
Feb 19 11:39:52 ip-172-31-28-172 mysqld: 160219 11:39:52 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: STATE EXCHANGE: sent state msg: 7dd70ac6-d6fd-11e5-b460-6e9ef4e0c8c7
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: STATE EXCHANGE: got state msg: 7dd70ac6-d6fd-11e5-b460-6e9ef4e0c8c7 from 0 (Maria2")
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: STATE EXCHANGE: got state msg: 7dd70ac6-d6fd-11e5-b460-6e9ef4e0c8c7 from 1 (Maria1")
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: STATE EXCHANGE: got state msg: 7dd70ac6-d6fd-11e5-b460-6e9ef4e0c8c7 from 2 (Maria3")
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: Quorum results:
Feb 19 11:39:53 ip-172-31-28-172 mysqld: #011version    = 3,
Feb 19 11:39:53 ip-172-31-28-172 mysqld: #011component  = PRIMARY,
Feb 19 11:39:53 ip-172-31-28-172 mysqld: #011conf_id    = 6,
Feb 19 11:39:53 ip-172-31-28-172 mysqld: #011members    = 2/3 (joined/total),
Feb 19 11:39:53 ip-172-31-28-172 mysqld: #011act_id     = 1499548,
Feb 19 11:39:53 ip-172-31-28-172 mysqld: #011last_appl. = 1499416,
Feb 19 11:39:53 ip-172-31-28-172 mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),
Feb 19 11:39:53 ip-172-31-28-172 mysqld: #011group UUID = 8b2b70a9-d646-11e5-a275-022c471b9ae2
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: Flow-control interval: [28, 28]
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: New cluster view: global state: 8b2b70a9-d646-11e5-a275-022c471b9ae2:1499548, view# 7: Primary, number of nodes: 3, my index: 1, protocol version 3
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: REPL Protocols: 7 (3, 2)
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: Service thread queue flushed.
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: Assign initial position for certification: 1499548, protocol version: 3
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: Service thread queue flushed.

Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: Member 0.0 (Maria2") requested state transfer from '*any*'. Selected 2.0 (Maria3")(SYNCED) as donor.

Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: 2.0 (Maria3"): State transfer to 0.0 (Maria2") complete.
Feb 19 11:39:53 ip-172-31-28-172 mysqld: 160219 11:39:53 [Note] WSREP: Member 2.0 (Maria3") synced with group.

Feb 19 11:39:56 ip-172-31-28-172 mysqld: 160219 11:39:56 [Note] WSREP: (bac5684c, 'tcp://0.0.0.0:4567') turning message relay requesting off

Please provide a solution for this.
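A first diagnostic step for the stuck node (a sketch only; run the bootstrap command on just one node, the one you believe holds the most recent data) is to check the Galera status variables and, if the node is stuck in a non-Primary state, bootstrap it with the `pc.bootstrap` provider option:

```sql
-- On a non-operational survivor, wsrep_ready is OFF and
-- wsrep_cluster_status is 'non-Primary'.
SHOW STATUS LIKE 'wsrep_ready';
SHOW STATUS LIKE 'wsrep_cluster_status';

-- If this node has the most advanced state, promote it to a new
-- Primary Component; the other node will state-transfer from it
-- when it rejoins.
SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
```

With only two nodes this situation recurs on every ungraceful exit; a third node or a Galera arbitrator (garbd) avoids it by preserving quorum.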


Description

Omri Hochman, 2017-08-21 14:56:41 UTC

[OSP12] overcloud deployment fails on Bare-metal due to "ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use" 


Environment (puddle is 2017-08-18.3, images TAG=2017-08-18.2)
--------------------------------------------------------------
openstack-tripleo-heat-templates-7.0.0-0.20170805163048.el7ost.noarch


Steps:
------
Attempt deployment on bare metal: 3 controllers, 1 compute.
 

Results:
--------
- Overcloud deployment fails:

| AllNodesDeploySteps                          | e79423e8-dba6-4c0e-b350-0ed7869fd225                                                                                                                                                 | OS::TripleO::PostDeploySteps                                                                                                   | CREATE_FAILED   | 2017-06-13T16:10:47Z | overcloud                                                                                                                                                |
| ControllerDeployment_Step1                   | aa764d47-2a8b-4377-ad9f-4bfe7428527a                                                                                                                                                 | OS::Heat::StructuredDeploymentGroup                                                                                            | CREATE_FAILED   | 2017-06-13T16:38:43Z | overcloud-AllNodesDeploySteps-3rqma3wcyrum                                                                                                               |
| 2                                            | 7c4c33eb-b4db-466f-b2b8-ac222f0e861a                                                                                                                                                 | OS::Heat::StructuredDeployment                                                                                                 | CREATE_FAILED   | 2017-06-13T16:41:48Z | overcloud-AllNodesDeploySteps-3rqma3wcyrum-ControllerDeployment_Step1-r3rbji33anuj                                                                       |

--------------
failure view : 
--------------
2:36:00 2017-08-09 21:23:44Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1]: CREATE_FAILED  Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
02:36:00 2017-08-09 21:23:45Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1]: CREATE_FAILED  Error: resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
02:36:00 2017-08-09 21:23:45Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: Error: resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
02:36:00 2017-08-09 21:23:46Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
02:36:00 2017-08-09 21:23:46Z [overcloud]: CREATE_FAILED  Resource CREATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step1.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
02:36:00 
02:36:00  Stack overcloud CREATE_FAILED 
02:36:00 
02:36:00 overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.1:
02:36:00   resource_type: OS::Heat::StructuredDeployment
02:36:00   physical_resource_id: d8769363-2a8f-4cab-a78a-77ea3e92dd85
02:36:00   status: CREATE_FAILED
02:36:00   status_reason: |
02:36:00     Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
02:36:00   deploy_stdout: |
02:36:00     ...
02:36:00         ], 
02:36:00         "changed": false, 
02:36:00         "failed": true, 
02:36:00         "failed_when_result": true
02:36:00     }
02:36:00     	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/1f2a23d0-5e97-4e7a-9a3a-4fdaf0ba918d_playbook.retry
02:36:00     
02:36:00     PLAY RECAP *********************************************************************
02:36:00     localhost                  : ok=8    changed=1    unreachable=0    failed=1   
02:36:00     
02:36:00     (truncated, view all with --long)
02:36:00   deploy_stderr: |
02:36:00 
02:36:00 overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0:
02:36:00   resource_type: OS::Heat::StructuredDeployment
02:36:00   physical_resource_id: de764d1f-1256-4446-ad8c-f5e0b2a62a7a
02:36:00   status: CREATE_FAILED
02:36:00   status_reason: |
02:36:00     Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
02:36:00   deploy_stdout: |
02:36:00     ...
02:36:00         ], 
02:36:00         "changed": false, 
02:36:00         "failed": true, 
02:36:00         "failed_when_result": true
02:36:00     }
02:36:00     	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/34fd2f6f-de26-4db7-993b-f1b9765ac2fa_playbook.retry
02:36:00 Heat Stack create failed.
02:36:00 Heat Stack create failed.
02:36:00     
02:36:00     PLAY RECAP *********************************************************************
02:36:00     localhost                  : ok=8    changed=1    unreachable=0    failed=1   
02:36:00     
02:36:00     (truncated, view all with --long)
02:36:00   deploy_stderr: |
02:36:00 
02:36:00 Build step 'Virtualenv Builder' marked build as failure
02:36:01 Build step 'Groovy Postbuild' marked build as failure
02:36:01 Build step 'Groovy Postbuild' marked build as failure
02:36:01 [BFA] Scanning build for known causes...
02:36:02 .[BFA] Found failure cause(s):
02:36:02 [BFA] no-such-file-or-directory
02:36:02 [BFA] command not found from category shell
02:36:02 [BFA] Done. 1s
02:36:02 Started calculate disk usage of build
02:36:02 Finished Calculation of disk usage of build in 0 seconds
02:36:02 Started calculate disk usage of workspace
02:36:02 Finished Calculation of disk usage of workspace in 0 seconds
02:36:03 Finished: FAILURE




(undercloud) [stack@undercloud ~]$ heat deployment-show 7c4c33eb-b4db-466f-b2b8-ac222f0e861a | less
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED",
  "server_id": "10d956cc-9468-426c-b309-d078e081be90",
  "config_id": "8a882786-c197-4b73-b981-ea6339ddcdc3",
  "output_values": {
    "deploy_stdout": "nPLAY [localhost] ***************************************************************nnTASK [Gathering Facts] *********************************************************nok: [localhost]nnTASK [Write the config_step hieradata] *****************************************nchanged: [localhost]nnTASK [Run puppet host configuration for step 1] ********************************nok: [localhost]nnTASK [debug] *******************************************************************nok: [localhost] => {n    "(outputs.stderr|default('')).split('\n')|union(outputs.stdout_lines|default([]))": [n        "exception: connect failed", n        "Warning: Undefined variable 'deploy_config_name'; ", n        "   (file & line not available)", n        "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 54]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", n        "   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')", n        "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Absolute_Path. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 55]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", n        "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 56]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", n        "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Array. 
There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 66]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", n        "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Pattern[]. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 68]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", n        "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Numeric. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 89]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", n        "Warning: ModuleLoader: module 'ssh' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules", n        "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ssh/manifests/server.pp\", 12]:[\"/var/lib/tripleo-config/puppet_step_config.pp\", 31]", n        "Warning: ModuleLoader: module 'timezone' has unresolved dependencies - it will only see those that are resolved. 
Use 'puppet module list --tree' to see information about modules", n        "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend", n        "Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.", n        "Notice: Compiled catalog for overcloud-controller-2.localdomain in environment production in 4.78 seconds", n        "Notice: /Stage[main]/Tripleo::Profile::Base::Docker/Augeas[docker-sysconfig-options]/returns: executed successfully", n        "Notice: /Stage[main]/Tripleo::Profile::Base::Docker/Augeas[docker-sysconfig-registry]/returns: executed successfully", n



--------------------------------------------------------------------

connecting to controller-2 :
---------------------------


[heat-admin@overcloud-controller-2 ~]$ sudo journalctl -u os-collect-config | grep ERROR
Aug 21 01:56:47 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:56:47,258] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 01:57:11 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:57:11,117] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 01:57:32 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:57:32,312] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 01:58:08 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:58:08,319] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 01:58:51 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 21:58:51,792] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:01:12 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:01:12,922] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:02:30 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:02:30,453] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:03:18 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:03:18,549] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:04:32 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:04:32,321] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:06:21 overcloud-controller-2 os-collect-config[3368]: [2017-08-20 22:06:21,841] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 21 02:16:11 overcloud-controller-2 os-collect-config[3368]: , n        "OK, successfully used password, moving on...\r", n        "Setting the root password ensures that nobody can log into the MariaDB\r", n        "root user without the proper authorisation.\r", n        "Set root password? [Y/n] y\r", n        "New password: \r", n        "Re-enter new password: \r", n        "ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use\r", n        "Password update failed!\r", n        "Cleaning up...\r", n        "Failed to get 'Remove anonymous users?' prompt", n        "stderr: INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json", n        "INFO:__main__:Validating config file", n        "INFO:__main__:Kolla config strategy set to: COPY_ALWAYS", n        "INFO:__main__:Copying service configuration files", n        "INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets", n        "INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets", n        "INFO:__main__:Deleting /etc/my.cnf.d/galera.cnf", n        "INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/galera.cnf to /etc/my.cnf.d/galera.cnf", n        "INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/sysconfig/clustercheck to /etc/sysconfig/clustercheck", n        "INFO:__main__:Copying /var/lib/kolla/config_files/src/root/.my.cnf to /root/.my.cnf", n        "INFO:__main__:Writing out command to execute", n        "170821  2:15:59 [Warning] option 'open_files_limit': unsigned value 18446744073709551615 adjusted to 4294967295", n        "170821  2:16:01 [Warning] option 'open_files_limit': unsigned value 18446744073709551615 adjusted to 4294967295", n        "stdout: 1aadbb2545d1701665c7a1516d5dd60469779a4b67e5b4473038ffd3c3a7de48"n    ], n    "changed": false, n    "failed": true, n    "failed_when_result": truen}ntto retry, use: --limit 
@/var/lib/heat-config/heat-config-ansible/8a882786-c197-4b73-b981-ea6339ddcdc3_p
Aug 21 02:16:12 overcloud-controller-2 os-collect-config[3368]: "ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use\r",
Aug 21 02:16:12 overcloud-controller-2 os-collect-config[3368]: [2017-08-21 02:16:11,806] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-ansible/8a882786-c197-4b73-b981-ea6339ddcdc3_playbook.yaml. [2]
Aug 21 02:16:13 overcloud-controller-2 os-collect-config[3368]: [2017-08-21 02:16:13,816] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None


Comment 4

Jon Schlueter, 2017-09-13 22:25:18 UTC

The above-mentioned patch 489277 is in downstream build openstack-tripleo-heat-templates-7.0.0-0.20170901051303.0rc1.el7ost


Comment 5

Omri Hochman, 2017-09-13 22:28:42 UTC

Looks like the issue is reproduced with:
 openstack-tripleo-heat-templates-7.0.0-0.20170901051303.0rc1.el7ost.noarch


  "In order to log into MariaDB to secure it, we'll need the current\r", n        "password for the root user.  If you've just installed MariaDB, and\r", n        "you haven't set the root password yet, the password will be blank,\r", n        "so you should just press enter here.\r", n        "Enter current password for root (enter for none): \r", n        "OK, successfully used password, moving on...\r", n        "Setting the root password ensures that nobody can log into the MariaDB\r", n        "root user without the proper authorisation.\r", n        "Set root password? [Y/n] y\r", n        "New password: \r", n        "Re-enter new password: \r", n        "ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use\r", n        "Password update failed!\r", n        "Cleaning up...\r", n        "Failed to get 'Remove anonymous users?' prompt", n        "stderr: + '[' -e /var/lib/mysql/mysql ']'", n        "+ kolla_start", n        "INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json", n        "INFO:__main__:Validating config file", n        "INFO:__main__:Kolla config strategy set to: COPY_ALWAYS", n        "INFO:__main__:Copying service configuration files", n        "INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets", n        "INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets", n        "INFO:__main__:Deleting /etc/my.cnf.d/galera.cnf", n        "INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/galera.cnf to /etc/my.cnf.d/galera.cnf", n        "INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/sysconfig/clustercheck to /etc/sysconfig/clustercheck", n        "INFO:__main__:Copying /var/lib/kolla/config_files/src/root/.my.cnf to /root/.my.cnf", n        "INFO:__main__:Writing out command to execute", n        "170913 17:51:25 [Warning] option 'open_files_limit': unsigned value 18446744073709551615 adjusted to 4294967295", n        "170913 
17:51:27 [Warning] option 'open_files_limit': unsigned value 18446744073709551615 adjusted to 4294967295", n        "stdout: 386224818d266b6f82532fad1ac3fe0ea45ecd04c1db754cb5f80dd8a7474a19"n    ], n    "changed": false, n    "failed": true, n    "failed_when_result": truen}ntto retry, use: --limit @/var/lib/heat-config/heat-config-ansible/2de16e6c-e601-4c18-bd1b-ce5a267accf8_playbook.retrynnPLAY RECAP *********************************************************************nlocalhost                  : ok=8    changed=1    unreachable=0    failed=1   nn",


Comment 7

Michele Baldessari, 2017-09-14 05:55:16 UTC

controller-2 failed in mysql_bootstrap:
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Error running ['docker', 'run', '--name', 'mysql_bootstrap', '--label', 'config_id=tripleo_step1', '--label', 'container_name=mysql_bootstrap', '--label', 'managed_by=paunch', '--label', 'config_data={"environment": ["KOLLA_CONFIG_STRATEGY=COPY_ALWAYS", "KOLLA_BOOTSTRAP=True", "KOLLA_KUBERNETES=True", "DB_MAX_TIMEOUT=60", "DB_CLUSTERCHECK_PASSWORD=wNZ6dj6DHpQg38Mn3wH4Mv9sj", "DB_ROOT_PASSWORD=kZlnL8BpMl"], "start_order": 1, "command": ["bash", "-ecx", "if [ -e /var/lib/mysql/mysql ]; then exit 0; fi\\nkolla_start\\nmysqld_safe --skip-networking --wsrep-on=OFF --wsrep-provider=none &\\ntimeout ${DB_MAX_TIMEOUT} /bin/bash -c \'until mysqladmin -uroot -p\\"${DB_ROOT_PASSWORD}\\" ping 2>/dev/null; do sleep 1; done\'\\nmysql -uroot -p\\"${DB_ROOT_PASSWORD}\\" -e \\"CREATE USER \'clustercheck\'@\'localhost\' IDENTIFIED BY \'${DB_CLUSTERCHECK_PASSWORD}\';\\"\\nmysql -uroot -p\\"${DB_ROOT_PASSWORD}\\" -e \\"GRANT PROCESS ON *.* TO \'clustercheck\'@\'localhost\' WITH GRANT OPTION;\\"\\ntimeout ${DB_MAX_TIMEOUT} mysqladmin -uroot -p\\"${DB_ROOT_PASSWORD}\\" shutdown"], "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/puppet:/etc/puppet:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json", "/var/lib/config-data/puppet-generated/mysql/:/var/lib/kolla/config_files/src:ro", "/var/lib/mysql:/var/lib/mysql"], "image": "192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3", "detach": false, "net": "host"}', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=KOLLA_BOOTSTRAP=True', '--env=KOL
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: LA_KUBERNETES=True', '--env=DB_MAX_TIMEOUT=60', '--env=DB_CLUSTERCHECK_PASSWORD=wNZ6dj6DHpQg38Mn3wH4Mv9sj', '--env=DB_ROOT_PASSWORD=kZlnL8BpMl', '--net=host', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json', '--volume=/var/lib/config-data/puppet-generated/mysql/:/var/lib/kolla/config_files/src:ro', '--volume=/var/lib/mysql:/var/lib/mysql', '192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3', 'bash', '-ecx', 'if [ -e /var/lib/mysql/mysql ]; then exit 0; fi\nkolla_start\nmysqld_safe --skip-networking --wsrep-on=OFF --wsrep-provider=none &\ntimeout ${DB_MAX_TIMEOUT} /bin/bash -c \'until mysqladmin -uroot -p"${DB_ROOT_PASSWORD}" ping 2>/dev/null; do sleep 1; done\'\nmysql -uroot -p"${DB_ROOT_PASSWORD}" -e "CREATE USER \'clustercheck\'@\'localhost\' IDENTIFIED BY \'${DB_CLUSTERCHECK_PASSWORD}\';"\nmysql -uroot -p"${DB_ROOT_PASSWORD}" -e "GRANT PROCESS ON *.* TO \'clustercheck\'@\'localhost\' WITH GRANT OPTION;"\ntimeout ${DB_MAX_TIMEOUT} mysqladmin -uroot -p"${DB_ROOT_PASSWORD}" shutdown']. [1]",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "stdout: Installing MariaDB/MySQL system tables in '/var/lib/mysql' ...",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "OK",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Filling help tables...",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "To start mysqld at boot time you have to copy",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "support-files/mysql.server to the right place for your system",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "To do so, start the server, then issue the following commands:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "'/usr/bin/mysqladmin' -u root password 'new-password'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "'/usr/bin/mysqladmin' -u root -h overcloud-controller-2 password 'new-password'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Alternatively you can run:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "'/usr/bin/mysql_secure_installation'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "which will also give you the option of removing the test",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "databases and anonymous user created by default.  This is",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "strongly recommended for production servers.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "See the MariaDB Knowledgebase at http://mariadb.com/kb or the",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "MySQL manual for more instructions.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "You can start the MariaDB daemon with:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "cd '/usr' ; /usr/bin/mysqld_safe --datadir='/var/lib/mysql'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "You can test the MariaDB daemon with mysql-test-run.pl",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "cd '/usr/mysql-test' ; perl mysql-test-run.pl",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Please report any problems at http://mariadb.org/jira",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "The latest information about MariaDB is available at http://mariadb.org/.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "You can find additional information about the MySQL part at:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "http://dev.mysql.com",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Support MariaDB development by buying support/new features from MariaDB",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Corporation Ab. You can contact us about this at sales.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Alternatively consider joining our community based development effort:",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "http://mariadb.com/kb/en/contributing-to-the-mariadb-project/",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "170913 17:51:29 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "170913 17:51:29 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "170913 17:51:29 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.xYeqx3' --pid-file='/var/lib/mysql/overcloud-controller-2-recover.pid'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "170913 17:51:32 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "spawn mysql_secure_installation\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "/usr/bin/mysql_secure_installation: line 379: find_mysql_client: command not found\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "      SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "In order to log into MariaDB to secure it, we'll need the current\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "password for the root user.  If you've just installed MariaDB, and\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "you haven't set the root password yet, the password will be blank,\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "so you should just press enter here.\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Enter current password for root (enter for none): \r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "OK, successfully used password, moving on...\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Setting the root password ensures that nobody can log into the MariaDB\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "root user without the proper authorisation.\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Set root password? [Y/n] y\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "New password: \r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Re-enter new password: \r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Password update failed!\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Cleaning up...\r",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "Failed to get 'Remove anonymous users?' prompt",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "stderr: + '[' -e /var/lib/mysql/mysql ']'",
Sep 13 17:51:40 overcloud-controller-2 os-collect-config[3191]: "+ kolla_start",

Interestingly enough it did work on controller-0:
11c5320a7b9c        192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3     "bash -ecx 'if [ -e /"   11 hours ago        Exited (0) 11 hours ago                       mysql_bootstrap
 
Same on controller-1:
4c8be359a5db        192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3     "bash -ecx 'if [ -e /"   11 hours ago        Exited (0) 11 hours ago                       mysql_bootstrap
 
Only controller-2 failed. If I start the container by hand I get the following:
[root@overcloud-controller-2 ~]# docker run --rm -it  -e "KOLLA_CONFIG_STRATEGY=COPY_ALWAYS"  -e "KOLLA_BOOTSTRAP=True"  -e "KOLLA_KUBERNETES=True"  -e "DB_MAX_TIMEOUT=60"  -e "DB_CLUSTERCHECK_PASSWORD=wNZ6dj6DHpQg38Mn3wH4Mv9sj"  -e "DB_ROOT_PASSWORD=kZlnL8BpMl"  -e "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"  -e "container=oci"  -e "KOLLA_BASE_DISTRO=rhel"  -e "KOLLA_INSTALL_TYPE=binary"  -e "KOLLA_INSTALL_METATYPE=rhos"  -e "PS1=$(tput bold)($(printenv KOLLA_SERVICE_NAME))$(tput sgr0)[$(id -un)@$(hostname -s) $(pwd)]$ "   -v /dev/log:/dev/log -v /etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro -v /var/lib/mysql:/var/lib/mysql -v /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro -v /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro -v /var/lib/config-data/puppet-generated/mysql:/var/lib/kolla/config_files/src:ro -v /etc/hosts:/etc/hosts:ro -v /etc/puppet:/etc/puppet:ro -v /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro -v /var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json -v /etc/localtime:/etc/localtime:ro -v /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro -ti 192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-09-08.3 /bin/bash
()[root@overcloud-controller-2 /root]$ id
uid=42434(mysql) gid=42434(mysql) groups=42434(mysql),42400(kolla)
()[root@overcloud-controller-2 /root]$ export TERM=xterm
 
Now if I run the same command that THT runs:
()[root@overcloud-controller-2 /root]$ mysqld_safe --skip-networking --wsrep-on=OFF --wsrep-provider=none &
[1] 10
()[root@overcloud-controller-2 /root]$ 170914 05:50:58 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
170914 05:50:58 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170914 05:50:58 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.G83dzc' --pid-file='/var/lib/mysql/003dbc6362fb-recover.pid'
170914 05:51:00 mysqld_safe WSREP: Recovered position 2db1852d-98ac-11e7-93b0-824a02ed3674:0
 
()[root@overcloud-controller-2 /root]$ more /var/log/mariadb/mariadb.log
170914 05:50:58 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170914 05:50:58 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.G83dzc' --pid-file='/var/lib/mysql/003dbc6362fb-recover.pid'
nohup: ignoring input
170914 05:51:00 mysqld_safe WSREP: Recovered position 2db1852d-98ac-11e7-93b0-824a02ed3674:0
170914  5:51:00 [Note] WSREP: wsrep_start_position var submitted: '2db1852d-98ac-11e7-93b0-824a02ed3674:0'
170914  5:51:00 [Warning] You need to use --log-bin to make --binlog-format work.
170914  5:51:00 InnoDB: The InnoDB memory heap is disabled
170914  5:51:00 InnoDB: Mutexes and rw_locks use GCC atomic builtins
170914  5:51:00 InnoDB: Compressed tables use zlib 1.2.7
170914  5:51:00 InnoDB: Using Linux native AIO
170914  5:51:00 InnoDB: Initializing buffer pool, size = 128.0M
170914  5:51:00 InnoDB: Completed initialization of buffer pool
170914  5:51:00 InnoDB: highest supported file format is Barracuda.
170914  5:51:00  InnoDB: Waiting for the background threads to start
170914  5:51:01 Percona XtraDB (http://www.percona.com) 5.5.41-MariaDB-37.0 started; log sequence number 1598283
170914  5:51:01 [Note] Plugin 'FEEDBACK' is disabled.
170914  5:51:01 [Note] Event Scheduler: Loaded 0 events
170914  5:51:01 [Note] WSREP: Initial position: 2db1852d-98ac-11e7-93b0-824a02ed3674:0
170914  5:51:01 [Note] WSREP: wsrep_load(): loading provider library 'none'
170914  5:51:01 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.42-MariaDB-wsrep'  socket: '/var/lib/mysql/mysql.sock'  port: 0  MariaDB Server, wsrep_25.11.r4026
 
What I don't get is whether this WSREP position recovery is normal. Will chat with Damien once he is up. (leaving needinfo on me)
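For reference, the "Recovered position" line that mysqld_safe logs above is the wsrep GTID (UUID:seqno) that gets fed back to mysqld as its start position. A minimal sketch of extracting it from a recovery log (the log line is copied verbatim from the output above; the temp-file setup is just for illustration):

```shell
#!/bin/sh
# Sketch: extract the wsrep "Recovered position" (UUID:seqno) that
# mysqld_safe prints after the --wsrep-recover pass. The log line
# below is copied from the transcript above; in a real run it would
# come from the --log_error file of the recovery pass.
recovery_log=$(mktemp)
cat > "$recovery_log" <<'EOF'
170914 05:51:00 mysqld_safe WSREP: Recovered position 2db1852d-98ac-11e7-93b0-824a02ed3674:0
EOF

# Grab the UUID:seqno pair; this is the value passed to mysqld as
# --wsrep_start_position on the subsequent normal start.
position=$(sed -n 's/.*Recovered position \([0-9a-f-]*:[0-9-]*\).*/\1/p' "$recovery_log")
echo "wsrep_start_position=$position"
rm -f "$recovery_log"
```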


Comment 8 (Damien Ciabrini, 2017-09-14 08:48:30 UTC)

From the logs, this seems to happen during the regular kolla bootstrap shell script, so _before_ we add the clustercheck user.

I'm currently looking at the kolla steps in detail to find out why it would randomly fail.


Comment 9 (Damien Ciabrini, 2017-09-14 13:06:53 UTC)

Omri,

So it looks to me that some of the calls to mysqld_safe were still happening with WSREP enabled, while it should be disabled during the bootstrap of the DB.

Could you maybe try to apply this patch from https://review.openstack.org/#/c/504049/ and see if it fixes your issues?
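The intent of the fix under discussion can be sketched roughly as follows (the flag names are taken from the command shown earlier in the transcript; the wrapper function itself is hypothetical, not kolla's actual code):

```shell
#!/bin/sh
# Hypothetical helper illustrating the fix's intent: every
# mysqld_safe invocation made during DB bootstrap must explicitly
# disable WSREP so the node does not try to form or join a cluster.
bootstrap_mysqld_cmd() {
    # --skip-networking: local-only bootstrap, no client connections
    # --wsrep-on=OFF --wsrep-provider=none: Galera replication off
    echo "mysqld_safe --skip-networking --wsrep-on=OFF --wsrep-provider=none"
}

cmd=$(bootstrap_mysqld_cmd)
case "$cmd" in
    *--wsrep-on=OFF*) echo "OK: bootstrap runs with WSREP disabled" ;;
    *)                echo "BUG: WSREP still enabled during bootstrap" ;;
esac
```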


Comment 10 (Omri Hochman, 2017-09-14 21:24:22 UTC)

Hi Damien, thanks for your help.
I saw the deployment fail again on the BM env after applying the patch.

I assume you're still reviewing the issue; I just wanted to show the output I'm getting on the controller:


[root@overcloud-controller-0 ~]# journalctl -u os-collect-config | grep ERROR
Sep 14 19:34:12 overcloud-controller-0 os-collect-config[3395]: O:__main__:Copying service configuration files", \n        "INFO:__main__:Copying /dev/null to /etc/libqb/force-filesystem-sockets", \n        "INFO:__main__:Setting permission for /etc/libqb/force-filesystem-sockets", \n        "ERROR:__main__:MissingRequiredSource: /var/lib/kolla/config_files/src/* file is not found", \n        "stdout: 4b35cbd91e0a094b5c07be5a1d156914d89ba2cab1f13b737b2e6f4f567bc936"\n    ], \n    "changed": false, \n    "failed": true, \n    "failed_when_result": true\n}\n\tto retry, use: --limit @/var/lib/heat-config/heat-config-ansible/b6072a53-edef-4116-933f-43dc524b6aeb_playbook.retry\n\nPLAY RECAP *********************************************************************\nlocalhost                  : ok=8    changed=1    unreachable=0    failed=1   \n\n", "deploy_stderr": "", "deploy_status_code": 2}
Sep 14 19:34:12 overcloud-controller-0 os-collect-config[3395]: "ERROR:__main__:MissingRequiredSource: /var/lib/kolla/config_files/src/* file is not found",
Sep 14 19:34:12 overcloud-controller-0 os-collect-config[3395]: [2017-09-14 19:34:12,671] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-ansible/b6072a53-edef-4116-933f-43dc524b6aeb_playbook.yaml. [2]
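The MissingRequiredSource error above comes from kolla's config-copy step: the container's config.json maps a source glob (here the bind-mounted /var/lib/kolla/config_files/src from the docker run command earlier) to a destination, and the copy fails hard when the glob matches nothing. A minimal illustrative re-creation of that check (not kolla's actual code; the empty temp directory stands in for the missing source mount):

```shell
#!/bin/sh
# Illustrative re-creation of kolla's MissingRequiredSource failure:
# the mysql container mounts its puppet-generated config at
# /var/lib/kolla/config_files/src, and the copy step errors out if
# the source glob matches no files.
src_dir=$(mktemp -d)   # stand-in for /var/lib/kolla/config_files/src
# (intentionally left empty to reproduce the failure)

if ls "$src_dir"/* >/dev/null 2>&1; then
    result="copying config files from $src_dir"
else
    result="ERROR: MissingRequiredSource: $src_dir/* file is not found"
fi
echo "$result"
rmdir "$src_dir"
```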
 

Not sure why this playbook fails:

[root@overcloud-controller-0 ~]# less /var/lib/heat-config/heat-config-ansible/b6072a53-edef-4116-933f-43dc524b6aeb_playbook.yaml

http://pastebin.test.redhat.com/516337


Comment 11 (Damien Ciabrini, 2017-09-19 06:18:53 UTC)

Yes, the version you tested was still under review. I've fixed that error and did a deploy on a downstream env to validate it.

The review upstream has all the +2 needed, I'm waiting for the jobs to pass green.


Comment 12 (Martin André, 2017-09-20 13:32:11 UTC)

*** Bug 1492314 has been marked as a duplicate of this bug. ***


Comment 13 (Damien Ciabrini, 2017-09-21 07:05:42 UTC)

Fix merged in master and Pike upstream.


Comment 15 (Omri Hochman, 2017-11-03 16:09:25 UTC)

Unable to reproduce with: OpenStack-12.0-RHEL-7 Puddle: 2017-10-31.1


Comment 19 (errata-xmlrpc, 2017-12-13 21:53:35 UTC)

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
