Error: cluster is not currently running on this node - Fixing errors and finding optimal solutions to problems

Issue

Running the command ‘pcs cluster status’ returns the message ‘Error: cluster is not currently running on this node’ even though the cluster was started with the command ‘pcs cluster start’.
The logs contain messages like the following:
- warning: Verify pacemaker and pacemaker_remote are not both enabled.
- lrmd[XXXXX]: error: Could not bind AF_UNIX (): Address already in use (98)
- attrd[XXXXX]: error: Could not bind AF_UNIX (): Address already in use (98)
- cib[XXXXX]: error: Could not bind AF_UNIX (): Address already in use (98)
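
The first warning above asks to verify that pacemaker and pacemaker_remote are not both enabled. A minimal sketch of that check follows, assuming both services are managed by systemd under the unit names `pacemaker` and `pacemaker_remote`; the helper names `both_enabled` and `check_conflict` are hypothetical and not part of any tool.

```shell
# Hypothetical helper: succeeds (exit 0) only when both units report "enabled".
# Assumes the systemd unit names "pacemaker" and "pacemaker_remote".
both_enabled() {
    a=$(systemctl is-enabled pacemaker 2>/dev/null)
    b=$(systemctl is-enabled pacemaker_remote 2>/dev/null)
    [ "$a" = "enabled" ] && [ "$b" = "enabled" ]
}

# Hypothetical helper: print a one-line verdict for the operator.
check_conflict() {
    if both_enabled; then
        echo "conflict: pacemaker and pacemaker_remote are both enabled"
    else
        echo "ok: at most one of the two services is enabled"
    fi
}
```

Calling `check_conflict` on the affected node prints one of the two messages; it only reports the state and changes nothing.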

Environment

Red Hat Enterprise Linux 7 with High-Availability Pacemaker Add-on

Source

Pacemaker uses Corosync or Heartbeat as its messaging layer; Corosync seems to be the one to go for.

Notes

By specifying a score of -INFINITY, the constraint becomes binding.

Quickstart

Keep in mind you might want to use dedicated IPs for sync, so define those in /etc/hosts.

On both nodes

set password

passwd hacluster

systemctl start pcsd.service
systemctl enable pcsd.service

Commands/tools

crm
crmadmin
cibadmin
pcs
corosync

Useful commands

save entire config

pcs config backup configfile

Dump entire crm

cibadmin -Q

HOWTO

Groups

Add existing resource to group

pcs resource group add GROUPID RESOURCEID

Stop resource group

pcs resource disable MYGROUP

See if entire group is disabled

pcs resource show MYGROUP

Meta Attrs: target-role=Stopped

FAQ

Update resource

pcs resource update resourcename variablename=newvalue

Current DC

In the output of

pcs status

"Current DC" names the Designated Controller.

Remove resource group + members

pcs resource delete whateverresource

Move resource to node

pcs resource move RES NODE

Show default resource stickiness

pcs resource defaults

Set resource stickiness

pcs resource meta <resource_id> resource-stickiness=100

and to check:

pcs resource show <resource_id>

Or better yet:

crm_simulate -Ls

Undo resource move

pcs constraint --full

Location Constraints:
  Resource: FOO
    Enabled on: santest-a (score:INFINITY) (role: Started) (id:cli-prefer-FOO)

pcs constraint remove cli-prefer-FOO
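
The lookup step above can be sketched as a small filter. This is an illustration only, assuming the constraint id appears as `(id:...)` at the end of the line, as in the output shown; the function name `cli_constraint_ids` is hypothetical.

```shell
# Hypothetical helper: read "pcs constraint --full" output on stdin and
# print the ids of constraints whose id starts with "cli-".
cli_constraint_ids() {
    sed -n 's/.*(id:\(cli-[^)]*\)).*/\1/p'
}

# Example using the output shown above; on a live cluster one would pipe
# "pcs constraint --full" into the function instead.
sample='Location Constraints:
  Resource: FOO
    Enabled on: santest-a (score:INFINITY) (role: Started) (id:cli-prefer-FOO)'
printf '%s\n' "$sample" | cli_constraint_ids   # prints cli-prefer-FOO
```

Each printed id can then be passed to `pcs constraint remove` after reviewing it.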

pcs status: Error: cluster is not currently running on this node

Don’t panic until after

sudo pcs status

show detailed resources

pcs resource --full

stop node (standby)

The following command puts the specified node into standby mode. The specified node is no longer able to host resources. Any resources currently active on the node will be moved to another node. If you specify --all, this command puts all nodes into standby mode.

pcs cluster standby node-1

pcs node standby

on the node itself

and undo this with

pcs cluster unstandby node-1

pcs node unstandby

set maintenance mode

This sets the cluster in maintenance mode, so it stops managing the resources

pcs property set maintenance-mode=true

Error: cluster is not currently running on this node

pcs cluster start [<node name>]

Remove a constraint

pcs constraint list --full

to identify the constraints and then

pcs constraint remove <whatever-constraint-id>

Clear error messages

 pcs resource cleanup

Call cib_replace failed (-205): Update was older than existing configuration

can be run only once

[Error signing on to the CIB service: Transport endpoint is not connected ]

Probably SELinux.

Show allocation scores

crm_simulate -sL

Show resource failcount

pcs resource failcount show <resource>

export current configuration as commands

pcs config export pcs-commands

debug resource

pcs resource debug-start resource

* Resource management is DISABLED * The cluster will not attempt to start, stop or recover services

Cluster is in maintenance mode

Found meta data is "unclean", please apply-al first

Troubleshooting

Debugging the policy engine

pcs status all resources stopped

probably a bad ordering constraint

Fencing and resource management disabled due to lack of quorum

Probably means you forgot to run pcs cluster start on the other node.

Resource cannot run anywhere

Check if some stickiness was set

pcs resource update unable to find resource

Trying to unset stickiness:

pcs resource update ISCSIgroupTEST1 meta resource-stickiness=

caused: Error: Unable to find resource: ISCSIgroupTEST1

What this means is: try it on the host where the stickiness was set.

Difference between maintenance-mode and standby

Still not clear

drbdadm create-md test3 ‘test3’ not defined in your config (for this host).

You’re supposed to use `hostname` in the ‘on …’ bit

corosync: active/disabled

This means the corosync daemon is currently running (active) but is not enabled to start at boot (disabled). It says nothing about resources being disabled.

ocf-exit-reason:Undefined iSCSI target implementation

Install scsi-target-utils

moving RES away after 1000000 failures

If failcount is 0, try pcs resource cleanup

Source

7.9.2.1. Backing up cluster settings

Check the state of the Pacemaker cluster by running the following command (on a controller node):

If the controller node is not included in the cluster, the following message is displayed:

Error: cluster is not currently running on this node

7.9.2.1.1. OpenStack controllers (controller nodes)

On the controllers, back up the following directories and files:

/etc/keystone/
/etc/glance/
/etc/nova/
/etc/neutron/
/etc/cinder/
/etc/httpd/
/etc/openstack-dashboard/
/etc/tionix/
/etc/my.cnf
/etc/my.cnf.d/
/etc/haproxy/
/etc/tionix
/etc/rabbitmq
/var/lib/rabbitmq/

Below are typical operations and commands used for creating backups.

Backing up the database:

mysqldump --all-databases | gzip > openstack.sql.gz

Regular scheduled backup (cron), without service interruption:

0 2 * * * mysqldump -u root --all-databases --single-transaction --quick --lock-tables=false > /var/lib/mysql/backup/all_db_backup.sql

Restoring the database from a backup file:

gzip -cd openstack.sql.gz | mysql

Backing up the Pacemaker configuration:

pcs config backup openstack
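
Archiving the configuration directories listed above could look roughly like the following sketch. It assumes GNU tar and an archive name (`controller-config.tar.gz`) chosen here for illustration; the helper `existing_paths` is hypothetical and only filters out paths that are absent on the node.

```shell
# Hypothetical helper: print only those of the given paths that exist,
# so that tar is not asked to archive a path missing from this node.
existing_paths() {
    for p in "$@"; do
        [ -e "$p" ] && printf '%s\n' "$p"
    done
    return 0
}

# Sketch of the archive step (the archive name is an assumption):
# existing_paths /etc/keystone /etc/glance /etc/nova /etc/neutron \
#     /etc/cinder /etc/httpd /etc/haproxy /etc/rabbitmq |
#     xargs tar czf controller-config.tar.gz
```

The archive step is left commented out because it needs root privileges and the real directory list for the node type being backed up.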

7.9.2.1.2. Compute nodes

On compute nodes, back up the following directories:

/etc/nova/
/etc/neutron/
/etc/tionix

7.9.2.2. Restoring a node (after a failure)

If one of the nodes fails, do the following:

Reinstall the CentOS 7 operating system.
Configure the network interfaces.
Enable the required repositories:

yum install epel-release centos-release-openstack-queens

Add the repository definition file for the Tionix repository:

Further steps depend on the type of node being restored (controller node or compute node).

7.9.2.2.1. OpenStack controller

Install the required packages (from the repository):

yum install \
haproxy memcached corosync pacemaker pcs \
mariadb-galera-server mariadb-galera-common mariadb-server-galera \
mariaDB-server rabbitmq-server \
openstack-keystone httpd mod_wsgi \
openstack-glance openstack-nova-api openstack-nova-conductor \
openstack-nova-scheduler openstack-nova-novncproxy \
openstack-nova-placement-api openstack-neutron openstack-neutron-ml2 \
openstack-neutron-openvswitch openstack-dashboard openstack-cinder \
python-tionix_client python-tionix_dashboard python-tionix_licensing \
python-tionix_monitor python-tionix_node_control \
python-tionix_scheduler python-tionix_vdi_server

Unpack the previously backed-up directories with configuration files to their original paths.

Enable the system services:

systemctl enable \
haproxy httpd memcached mariadb \
openstack-glance-api openstack-glance-registry openstack-nova-api \
openstack-nova-scheduler openstack-nova-conductor \
openstack-nova-novncproxy neutron-server neutron-dhcp-agent \
neutron-metadata-agent neutron-l3-agent neutron-openvswitch-agent \
openstack-cinder-api openstack-cinder-scheduler \
tionix-*

Restore the Pacemaker cluster (Section 7.9.2.4) and
the Galera cluster (Section 7.9.2.3).

Restore the RabbitMQ cluster (Section 7.9.2.5).

7.9.2.2.2. Compute node

Install the required packages on the compute node:

yum install openstack-nova-compute neutron-l3-agent \
neutron-metadata-agent neutron-openvswitch-agent

Unpack the previously backed-up directories with configuration
files to their original paths.

Enable the required services:

systemctl enable openstack-nova-compute neutron-l3-agent \
neutron-metadata-agent neutron-openvswitch-agent

Reboot the compute node (the restored node):

7.9.2.3. Restoring the Galera cluster

If all controllers fail at the same time (a reboot or another
emergency), the Galera nodes may refuse to form a cluster
without additional steps.

To recover, run the following command on all controllers:

The node's state log will be printed. The output will include a line in the
following format:

WSREP: Recovered position <UUID>:1234

where

the UUID (in the example, b95919ad-3b5e-48de-a102-516345783738) will be the same
on all nodes, and the number after the colon (1234 in the example) is the
last sequence number that the node has seen.

From all the nodes, choose the one with the highest sequence number
(or any of the nodes if the numbers are equal) and run the following
command on it:

Running this command performs a correct bootstrap (start-up configuration)
of a new cluster.
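
The selection of the most advanced node described above can be sketched as follows. This is an illustration under stated assumptions: the recovery log lines have already been collected from each controller, and the node names `controller2` and `controller3` as well as the helper names are hypothetical.

```shell
# Hypothetical helper: extract the sequence number from a log line of the
# form "WSREP: Recovered position <UUID>:<seqno>" read on stdin.
recovered_seqno() {
    sed -n 's/.*Recovered position [^:]*:\(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: read "<node> <seqno>" pairs on stdin and print the
# node with the largest seqno (on a tie, one of the tied nodes is printed).
pick_bootstrap_node() {
    sort -k2,2n | tail -n 1 | cut -d' ' -f1
}

# Example: parse one log line, then choose among three nodes.
echo 'WSREP: Recovered position b95919ad-3b5e-48de-a102-516345783738:1234' |
    recovered_seqno                                   # prints 1234
printf '%s\n' 'controller1 1234' 'controller2 1300' 'controller3 1290' |
    pick_bootstrap_node                               # prints controller2
```

The sketch only identifies the candidate node; the bootstrap command itself still has to be run on that node by the operator.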

Start the remaining nodes in the standard way for Linux:

If Galera is managed by Pacemaker, restart the Galera cluster resource
to restore the cluster. Run the command:

pcs resource restart galera_resource

Attention.

After the steps above, the nodes should join the new cluster.

You can check the cluster state by running an SQL query after connecting
to any of the nodes with the DBMS client (mysql):

mysql -p -u root -h controller1

> SHOW STATUS LIKE 'wsrep%';

7.9.2.4. Restoring the PCS cluster (after losing one node)

Stop the cluster:

Remove the decommissioned node from the cluster:

pcs cluster node remove old_node

Note.

The following steps are performed on the new node (new_node).

Unpack to the appropriate paths the configuration files needed to start
the services managed through PCS.

Set a password for the hacluster user (on the new node):

Start the pcsd service and enable it at boot:

systemctl start pcsd

systemctl enable pcsd

Authenticate the new node in the cluster:

pcs cluster auth new_node

From any working cluster node, run the command:

pcs cluster node add new_node

On the new node (new_node), run the command:

Restore the Pacemaker service configuration (on the new node):

pcs config restore openstack.tar

On each of the controller nodes included in the cluster, run the command:

7.9.2.5. Restoring the RabbitMQ cluster

Ways to resolve problems with the RabbitMQ message queue can be found in the
official OpenStack documentation.

Unpack the directories with configuration files to the appropriate paths.

Make sure that the /var/lib/rabbitmq/.erlang.cookie files
stored on all cluster nodes are identical.
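
The cookie comparison can be sketched as below. The sketch assumes the cookie from each node has first been copied to a local file (for example with scp); the file names in the usage comment and the helper name `cookies_match` are hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) when all given files have
# identical content, fail (exit 1) at the first file that differs.
cookies_match() {
    first=$1
    shift
    for f in "$@"; do
        cmp -s "$first" "$f" || return 1
    done
    return 0
}

# Usage sketch (file names are assumptions):
# cookies_match cookie.node1 cookie.node2 cookie.node3 &&
#     echo "cookies are identical"
```

Comparing with `cmp -s` avoids printing the cookie contents, which are a shared secret.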

Rename the cluster node if the node name differs:

rabbitmqctl rename_cluster_node <oldnode> <newnode>

Attention.

The node name should also be replaced in all configuration files.

Start the rabbitmq service:

systemctl start rabbitmq-server

Add the node to the cluster. To do so, run the command:

rabbitmqctl stop_app && 
rabbitmqctl join_cluster rabbit@<name_of_any_cluster_node> && 
rabbitmqctl start_app

Make sure that the new node has appeared in the cluster:

rabbitmqctl cluster_status

Additional information about clustered queue handling, the effect of
network failures (loss of connectivity between nodes), and recovery options
can be found online. Follow the link:

Source

In this article, we will see how to configure a two-node Red Hat cluster using Pacemaker and Corosync on RHEL 7.2. Once you have installed the necessary packages, you need to enable the cluster services at system start-up. You must start the necessary cluster services before starting the cluster configuration. The “hacluster” user is created automatically during package installation, with its password disabled. pcs uses this user to sync the cluster configuration and to start and stop the cluster on the cluster nodes.

Environment:

Operating System: Redhat Enterprise Linux 7.2
Type of Cluster : Two Node cluster – Failover
Nodes: UA-HA & UA-HA2 (Assuming that packages have been installed on both the nodes)
Cluster Resource : KVM guest (VirtualDomain) – See in Next Article.

Hardware configuration:

CPU – 2
Memory – 4GB
NFS – For shared storage

Redhat Cluster 7 – RHEL 7 – PCS

Enable & Start the Services on both the Nodes:

1. Log in to both cluster nodes as the root user.

2. Start the pcsd daemon on both nodes and enable it to start automatically after a reboot. pcsd is the Pacemaker configuration daemon (not a cluster service).

[root@UA-HA ~]# systemctl start pcsd.service
[root@UA-HA ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@UA-HA ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:22:08 EST; 14s ago
 Main PID: 18411 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─18411 /bin/sh /usr/lib/pcsd/pcsd start
           ├─18415 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           └─18416 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb

Dec 27 23:22:07 UA-HA systemd[1]: Starting PCS GUI and remote configuration interface...
Dec 27 23:22:08 UA-HA systemd[1]: Started PCS GUI and remote configuration interface.
[root@UA-HA ~]#

3. Set a new password for the cluster user “hacluster” on both nodes.

[root@UA-HA ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@UA-HA ~]#
[root@UA-HA2 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@UA-HA2 ~]#

Configure corosync & Create new cluster:

1. Log in to any of the cluster nodes and authenticate the “hacluster” user.

[root@UA-HA ~]# pcs cluster auth UA-HA UA-HA2
Username: hacluster
Password:
UA-HA: Authorized
UA-HA2: Authorized
[root@UA-HA ~]#

2. Create a new cluster using the pcs command.

[root@UA-HA ~]# pcs cluster setup --name UABLR UA-HA UA-HA2
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
UA-HA: Succeeded
UA-HA2: Succeeded
Synchronizing pcsd certificates on nodes UA-HA, UA-HA2...
UA-HA: Success
UA-HA2: Success

Restaring pcsd on the nodes in order to reload the certificates...
UA-HA: Success
UA-HA2: Success
[root@UA-HA ~]#

3. Check the cluster status.

[root@UA-HA ~]# pcs status
Error: cluster is not currently running on this node
[root@UA-HA ~]#

You see this error because the cluster services have not been started yet.

4. Start the cluster using the pcs command. The “--all” option starts the cluster on all configured nodes.

[root@UA-HA ~]# pcs cluster start --all
UA-HA2: Starting Cluster...
UA-HA: Starting Cluster...
[root@UA-HA ~]#

Behind the scenes, the “pcs cluster start” command triggers the following commands on each cluster node.

# systemctl start corosync.service
# systemctl start pacemaker.service
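
Since the error message disappears only once both services are running, a quick check of the two units can help. The following is a minimal sketch, assuming the unit names `corosync` and `pacemaker` shown above; the function name `cluster_units_state` is hypothetical.

```shell
# Hypothetical helper: report whether each of the two cluster units is
# currently active, one line per unit.
cluster_units_state() {
    for unit in corosync pacemaker; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: not active"
        fi
    done
}
```

Running `cluster_units_state` on a node where `pcs status` reports that the cluster is not running shows which of the two services failed to start.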

5. Check the cluster services status.

[root@UA-HA ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:34:31 EST; 11s ago
  Process: 18994 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
 Main PID: 19001 (corosync)
   CGroup: /system.slice/corosync.service
           └─19001 corosync

Dec 27 23:34:31 UA-HA corosync[19001]:  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] Members[1]: 1
Dec 27 23:34:31 UA-HA corosync[19001]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 27 23:34:31 UA-HA corosync[19001]:  [TOTEM ] A new membership (192.168.203.131:1464) was formed. Members joined: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] This node is within the primary component and will provide service.
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] Members[2]: 2 1
Dec 27 23:34:31 UA-HA corosync[19001]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 27 23:34:31 UA-HA systemd[1]: Started Corosync Cluster Engine.
Dec 27 23:34:31 UA-HA corosync[18994]: Starting Corosync Cluster Engine (corosync): [  OK  ]
[root@UA-HA ~]# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:34:32 EST; 15s ago
 Main PID: 19016 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─19016 /usr/sbin/pacemakerd -f
           ├─19017 /usr/libexec/pacemaker/cib
           ├─19018 /usr/libexec/pacemaker/stonithd
           ├─19019 /usr/libexec/pacemaker/lrmd
           ├─19020 /usr/libexec/pacemaker/attrd
           ├─19021 /usr/libexec/pacemaker/pengine
           └─19022 /usr/libexec/pacemaker/crmd

Dec 27 23:34:33 UA-HA crmd[19022]:   notice: pcmk_quorum_notification: Node UA-HA2[2] - state is now member (was (null))
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: pcmk_quorum_notification: Node UA-HA[1] - state is now member (was (null))
Dec 27 23:34:33 UA-HA stonith-ng[19018]:   notice: Watching for stonith topology changes
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: Notifications disabled
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: The local CRM is operational
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Dec 27 23:34:33 UA-HA attrd[19020]:  warning: Node names with capitals are discouraged, consider changing 'UA-HA2' to something else
Dec 27 23:34:33 UA-HA attrd[19020]:   notice: crm_update_peer_proc: Node UA-HA2[2] - state is now member (was (null))
Dec 27 23:34:33 UA-HA stonith-ng[19018]:  warning: Node names with capitals are discouraged, consider changing 'UA-HA2' to something else
Dec 27 23:34:34 UA-HA stonith-ng[19018]:   notice: crm_update_peer_proc: Node UA-HA2[2] - state is now member (was (null))
[root@UA-HA ~]#

Verify Corosync configuration:

1. Check the corosync communication status.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
[root@UA-HA ~]#
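
The ring check can be automated with a small filter over the output above. This sketch assumes the output format shown here, where a healthy ring is reported as "active with no faults"; the function name `rings_healthy` is hypothetical.

```shell
# Hypothetical helper: read "corosync-cfgtool -s" output on stdin and
# succeed only if at least one ring is listed and every "status" line
# reports "active with no faults".
rings_healthy() {
    awk '/status[ \t]*=/ { seen = 1; if ($0 !~ /active with no faults/) bad = 1 }
         END { if (seen && !bad) exit 0; exit 1 }'
}

# Usage sketch on a cluster node:
# corosync-cfgtool -s | rings_healthy && echo "all rings healthy"
```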

In my setup, first RING is using interface “br0”.

[root@UA-HA ~]# ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.203.134  netmask 255.255.255.0  broadcast 192.168.203.255
        inet6 fe80::84ef:2eff:fee9:260a  prefixlen 64  scopeid 0x20
        ether 00:0c:29:2d:3f:ce  txqueuelen 0  (Ethernet)
        RX packets 15797  bytes 1877460 (1.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7018  bytes 847881 (828.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@UA-HA ~]#

We can have multiple rings to provide redundancy for cluster communication. (In VCS, these are called LLT links.)

2. Check the membership and quorum APIs.

[root@UA-HA ~]# corosync-cmapctl  | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.203.134)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.203.131)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@UA-HA ~]#
[root@UA-HA ~]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         2          1 UA-HA2
         1          1 UA-HA (local)
[root@UA-HA ~]#
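
To compare the membership against the expected node count, the joined members can be counted from the `corosync-cmapctl` output. A minimal sketch, assuming the key layout shown above; the function name `joined_members` is hypothetical.

```shell
# Hypothetical helper: read "corosync-cmapctl" output on stdin and print
# how many members have the status "joined".
joined_members() {
    grep -c 'members\.[0-9]*\.status (str) = joined'
}

# Usage sketch on a cluster node:
# corosync-cmapctl | joined_members
```

For the two-node cluster in this article, the expected result is 2.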

Verify Pacemaker Configuration:

1. Check the running pacemaker processes.

[root@UA-HA ~]# ps axf |grep pacemaker
19324 pts/0    S+     0:00  |       _ grep --color=auto pacemaker
19016 ?        Ss     0:00 /usr/sbin/pacemakerd -f
19017 ?        Ss     0:00  _ /usr/libexec/pacemaker/cib
19018 ?        Ss     0:00  _ /usr/libexec/pacemaker/stonithd
19019 ?        Ss     0:00  _ /usr/libexec/pacemaker/lrmd
19020 ?        Ss     0:00  _ /usr/libexec/pacemaker/attrd
19021 ?        Ss     0:00  _ /usr/libexec/pacemaker/pengine
19022 ?        Ss     0:00  _ /usr/libexec/pacemaker/crmd

2. Check the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sun Dec 27 23:44:44 2015          Last change: Sun Dec 27 23:34:55 2015 by hacluster via crmd on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@UA-HA ~]#

3. You can see that corosync and pacemaker are active now but disabled at system boot. If you would like the cluster to start automatically after a reboot, you can enable both services using the systemctl command.

[root@UA-HA2 ~]# systemctl enable corosync
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.
[root@UA-HA2 ~]# systemctl enable pacemaker
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
[root@UA-HA2 ~]# pcs status
Cluster name: UABLR
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sun Dec 27 23:51:30 2015          Last change: Sun Dec 27 23:34:55 2015 by hacluster via crmd on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA2 ~]#
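
The "Daemon Status" lines pair the runtime state with the boot-time state (for example active/disabled). The following sketch lists the daemons that would not start after a reboot; it assumes the output format shown above, and the function name `not_enabled_at_boot` is hypothetical.

```shell
# Hypothetical helper: read "pcs status" output on stdin and print the
# names of daemons whose boot-time state is "disabled".
# Lines look like "  corosync: active/disabled" (runtime state/boot state).
not_enabled_at_boot() {
    sed -n 's/^[[:space:]]*\([a-z_]*\): [a-z]*\/disabled$/\1/p'
}

# Usage sketch on a cluster node:
# pcs status | not_enabled_at_boot
```

An empty result means every listed daemon is enabled at boot.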

4. When the cluster starts, it automatically records the number and details of the nodes in the cluster, as well as which stack is being used and the version of Pacemaker being used. To view the cluster configuration (Cluster Information Base – CIB) in XML format, use the following command.

[root@UA-HA2 ~]# pcs cluster cib

5. Verify the cluster information base using the following command.

[root@UA-HA ~]# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@UA-HA ~]#

By default, Pacemaker enables STONITH (Shoot The Other Node In The Head), also known as fencing, in order to protect the data. Fencing is mandatory when you use shared storage, to avoid data corruption.

For the time being, we will disable STONITH and configure it later.

6. Disable the STONITH (Fencing)

[root@UA-HA ~]# pcs property set stonith-enabled=false
[root@UA-HA ~]# 
[root@UA-HA ~]#  pcs property show stonith-enabled
Cluster Properties:
 stonith-enabled: false
[root@UA-HA ~]#

7. Verify the cluster configuration again. The errors should now be gone.

[root@UA-HA ~]# crm_verify -L -V
[root@UA-HA ~]#

We have successfully configured a two-node Red Hat cluster on RHEL 7.2 with the new components Pacemaker and Corosync.


Source

We will see how to manage a redundant HAProxy cluster with a virtual IP using Pacemaker and Corosync.

In this example we have only two nodes, HOST-1 and HOST-2, so we will have to disable quorum and STONITH.

Installing the necessary Packages on both nodes:

[root@HOST-1 ~]# sudo yum install corosync pcs pacemaker haproxy

[root@HOST-2 ~]# sudo yum install corosync pcs pacemaker haproxy

[root@HOST-1 ~]# pcs status
Error: cluster is not currently running on this node

Stop the firewall service on both hosts so that it does not block cluster traffic.

service firewalld stop

To manage multiple hosts through a single interface, we need to create a cluster of all the nodes, managed by PCS. While installing PCS and the other packages, yum also created a user “hacluster”, which PCS uses to configure the cluster nodes. Before we can use PCS, we need to set the password for the “hacluster” user on both nodes:

[root@HOST-1 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

[root@HOST-2 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Now start the pcsd service on both nodes:

[root@HOST-1 ~]#systemctl start pcsd
[root@HOST-2 ~]#systemctl start pcsd

Now, using the user “hacluster” and its password, we need to authenticate the nodes for the PCS cluster.

[root@HOST-1 ~]# sudo pcs cluster auth HOST-1 HOST-2
Username: hacluster
Password:
node01: Authorized
node02: Authorized

Once both nodes are authenticated with the PCS cluster, we can manage the cluster configuration from either node; there is no need to repeat the commands on every node. In this example we will use HOST-1.

Creating the cluster and adding nodes into it:

We will add both nodes to a cluster named web_cluster:

[root@HOST-1 ~]# pcs cluster setup --name web_cluster HOST-1 HOST-2
Destroying cluster on nodes: HOST-1, HOST-2...
HOST-1: Stopping Cluster (pacemaker)...
HOST-2: Stopping Cluster (pacemaker)...
HOST-1: Successfully destroyed cluster
HOST-2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'HOST-1', 'HOST-2'
HOST-2: successful distribution of the file 'pacemaker_remote authkey'
HOST-1: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
HOST-1: Succeeded
HOST-2: Succeeded

Synchronizing pcsd certificates on nodes HOST-1, HOST-2...
HOST-2: Success
HOST-1: Success
Restarting pcsd on the nodes in order to reload the certificates...
HOST-2: Success
HOST-1: Success

The above command writes the cluster configuration to the /etc/corosync/corosync.conf file on both nodes.

[root@HOST-2 ~]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: web_cluster
    secauth: off
    transport: udpu
}

nodelist {
    node {
        ring0_addr: HOST-1
        nodeid: 1
    }

    node {
        ring0_addr: HOST-2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
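
To confirm that both nodes ended up with the same node list, the configured node names can be pulled out of the file. A minimal sketch, assuming each node is declared with a `ring0_addr:` line as in the file above; the function name `corosync_nodes` is hypothetical.

```shell
# Hypothetical helper: print the value of every ring0_addr line in the
# corosync configuration file given as the first argument.
corosync_nodes() {
    sed -n 's/^[[:space:]]*ring0_addr:[[:space:]]*//p' "$1"
}

# Usage sketch on a cluster node:
# corosync_nodes /etc/corosync/corosync.conf
```

For the configuration above this would list HOST-1 and HOST-2, one per line.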

Now we can start the cluster

[root@HOST-1 ~]# pcs cluster start --all
HOST-1: Starting Cluster (corosync)...
HOST-2: Starting Cluster (corosync)...
HOST-2: Starting Cluster (pacemaker)...
HOST-1: Starting Cluster (pacemaker)...

[root@HOST-1 ~]# pcs status cluster
Cluster Status:
Stack: corosync
Current DC: HOST-2 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Mon Jan 6 03:26:01 2020
Last change: Mon Jan 6 02:03:28 2020 by root via cibadmin on HOST-1
2 nodes configured
2 resources configured

PCSD Status:
HOST-1: Online
HOST-2: Online

[root@HOST-1 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 HOST-1 (local)
2 1 HOST-2


[root@HOST-1 ~]# pcs status nodes
Pacemaker Nodes:
Online: HOST-1 HOST-2

Checking any configuration errors:

[root@HOST-1 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

The above error appears because STONITH, which is required to avoid a split-brain situation, is not configured in our setup.

Since we have only two nodes, we will disable STONITH and tell the cluster to ignore loss of quorum; a meaningful quorum needs at least three nodes (an odd number).

[root@HOST-1 ~]# pcs property set stonith-enabled=false
[root@HOST-1 ~]# pcs property set no-quorum-policy=ignore

[root@HOST-1 ~]# pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: web_cluster
dc-version: 1.1.20-5.el7_7.2-3c4c782f70
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false

Next we will add a virtual IP to our cluster, which will serve as the single entry point to the nodes in the cluster. The virtual IP is assigned to one of the nodes in the cluster at a time. Later we will configure HAProxy with this VIP.

[root@HOST-1 ~]# sudo pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=10.10.10.8 cidr_netmask=24 op monitor interval=30s

[root@HOST-1 ~]# pcs status resources
virtual_ip (ocf::heartbeat:IPaddr2): Started HOST-1

[root@HOST-1 ~]# ip a
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:86:18:7f brd ff:ff:ff:ff:ff:ff
inet 10.10.10.6/24 brd 10.10.10.255 scope global noprefixroute enp0s8
valid_lft forever preferred_lft forever
inet 10.10.10.8/24 brd 10.10.10.255 scope global secondary enp0s8
valid_lft forever preferred_lft forever

The virtual IP was assigned to HOST-1, which we can also confirm with the “ip a” command.
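
The same check can be scripted so that it works on either node. This is a sketch, assuming the `ip a` output format shown above; the function name `has_address` is hypothetical.

```shell
# Hypothetical helper: $1 is the IPv4 address to look for; "ip a" output
# is read on stdin. The trailing "/" keeps 10.10.10.8 from also matching
# 10.10.10.80.
has_address() {
    grep -q -F "inet $1/"
}

# Usage sketch on a cluster node:
# ip a | has_address 10.10.10.8 && echo "VIP is on this node"
```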

The virtual IP configuration is now complete, so we can move on to HAProxy. The HAProxy package is already installed, so we will create a cluster resource for it:

[root@HOST-1 heartbeat]# pcs resource create haproxy ocf:heartbeat:haproxy binpath=/usr/sbin/haproxy conffile=/etc/haproxy/haproxy.cfg op monitor interval=10s

We also need to ensure that HAProxy and the virtual IP always run together on the same node. To do this, we group the two resources.

[root@HOST-1 heartbeat]# pcs resource group add HAproxyGroup virtual_ip haproxy
[root@HOST-1 heartbeat]# pcs constraint order virtual_ip then haproxy
Adding virtual_ip haproxy (kind: Mandatory) (Options: first-action=start then-action=start)

That is all. We have now successfully configured the HAProxy service and the virtual IP together, and both resources run on the same node.

[root@HOST-1 heartbeat]# pcs status
Cluster name: web_cluster
Stack: corosync
Current DC: HOST-2 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Mon Jan 6 03:55:50 2020
Last change: Mon Jan 6 02:03:28 2020 by root via cibadmin on HOST-1

2 nodes configured
2 resources configured

Online: [ HOST-1 HOST-2 ]

Full list of resources:

Resource Group: HAproxyGroup
virtual_ip (ocf::heartbeat:IPaddr2): Started HOST-2
haproxy (ocf::heartbeat:haproxy): Started HOST-2

Testing with Httpd:

Now we will test the redundancy of HAProxy using httpd. For that, we have created an individual index file under /var/www/html on each node that contains its host name.

[root@HOST-1 heartbeat]# cat /var/www/html/index.html
HOST-1

Next, we have to define the backend nodes in the HAProxy configuration file so that they are accessed in a load-balanced fashion. We will use the roundrobin method in this example.

Edit the /etc/haproxy/haproxy.cfg file on both nodes and add the lines below at the end of the file.

listen web_server 10.10.10.8:81
mode http
balance roundrobin
server host1 10.10.10.6:80 check
server host2 10.10.10.7:80 check

Note: you cannot bind port 80 on the virtual IP because the httpd service already uses that port, so choose any other port.

That's it; the configuration is finished. When you access the HTTP service through the virtual IP (port 81 in this example), requests are delivered to both nodes in round-robin fashion.
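
To see the round-robin behaviour from the command line, one option is to send several requests to the virtual IP and count which node answered. The following is a minimal sketch, assuming each node's index.html contains only its host name as shown earlier; the helper name tally_backends is made up for this example.

```shell
# Count how many responses came from each backend.
# Reads one response body per line on stdin and prints "<body> <count>".
tally_backends() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage against the example VIP and port from this article
# (requires the cluster and both httpd services to be running):
#   for i in 1 2 3 4; do curl -s http://10.10.10.8:81/; done | tally_backends
```

With both backends healthy, the counts per host name should be roughly equal.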

Check PCS Resource:

[root@HOST-1 ~]# pcs resource standards
lsb
ocf
service
systemd

Resources

Clustered services consist of one or more resources. A resource can be an IP address, a file system, or a service like httpd, among others. Every component required to provide a service to consumers is a resource.

Standard	Purpose
LSB	Linux Standard Base-compatible init scripts residing in /etc/init.d/.
OCF	Open Cluster Framework-compatible scripts that are extended LSB init scripts that can process additional input parameters to provide additional control over the cluster resource.
Systemd	Systemd unit files, which are the standard for defining and managing services on Red Hat Enterprise Linux 7.

Check Resource Providers:

[root@HOST-1 ~]# pcs resource providers
heartbeat
openstack
pacemaker
rabbitmq

Check Resource Agents:

[root@HOST-1 ~]# pcs resource agents ocf:heartbeat
aliyun-vpc-move-ip
apache
aws-vpc-move-ip
awseip
awsvip
azure-lb
clvm
conntrackd
CTDB
db2
Delay
dhcpd
docker
Dummy
ethmonitor
exportfs
Filesystem
galera
garbd
haproxy
iface-vlan
IPaddr
IPaddr2
IPsrcaddr
iSCSILogicalUnit
iSCSITarget
LVM
LVM-activate
lvmlockd
MailTo
mysql
nagios
named
nfsnotify
nfsserver
nginx
NodeUtilization
oraasm
oracle
oralsnr
pgsql
portblock
postfix
rabbitmq-cluster
redis
Route
rsyncd
SendArp
slapd
Squid
sybaseASE
symlink
tomcat
vdo-vol
VirtualDomain
Xinetd
[root@HOST-1 ~]#

Commonly used resources

The following table has a list of some commonly used resources:

Agent	Purpose
Filesystem	Used for mounting a file system. This can be a local file system, a file system on an iSCSI or Fibre Channel device, or a remote file system like an NFS export or an SMB share.
IPaddr2	This resource is used to assign a floating IP address to a resource group. There is also a separate IPaddr resource that uses an older method of assigning an IP address. On Red Hat Enterprise Linux 7, IPaddr is a symbolic link to IPaddr2.
apache	This resource starts an apache httpd service. Unless otherwise configured this will use the configuration from /etc/httpd.
mysql	This resource controls a mysql database. Databases can be configured for standalone operation, a clone set with external replication, or as a full master/slave setup.

To list the parameters that a resource agent accepts, use pcs resource describe:

# pcs resource describe apache

Source

Since our ultimate goal is high availability, we should test failover of our new resource before moving on.

First, find the node on which the IP address is running.

[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 16:55:26 2018
Last change: Mon Sep 10 16:53:42 2018 by root via cibadmin on pcmk-1

2 nodes configured
1 resource configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started pcmk-1

You can see that the status of the ClusterIP resource is Started on a particular node (in this example, pcmk-1). Shut down Pacemaker and Corosync on that machine to trigger a failover.

[root@pcmk-1 ~]# pcs cluster stop pcmk-1
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...

A cluster command such as pcs cluster stop nodename can be run from any node in the cluster, not just the affected node.

Verify that pacemaker and corosync are no longer running:

[root@pcmk-1 ~]# pcs status
Error: cluster is not currently running on this node

Go to the other node, and check the cluster status.

[root@pcmk-2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 16:57:22 2018
Last change: Mon Sep 10 16:53:42 2018 by root via cibadmin on pcmk-1

2 nodes configured
1 resource configured

Online: [ pcmk-2 ]
OFFLINE: [ pcmk-1 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started pcmk-2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Notice that pcmk-1 is OFFLINE for cluster purposes (its pcsd is still active, allowing it to receive pcs commands, but it is not participating in the cluster).

Also notice that ClusterIP is now running on pcmk-2 — failover happened automatically, and no errors are reported.
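
This check can also be scripted. The sketch below assumes the resource line format shown in the pcs status output above (Pacemaker 1.1 on RHEL/CentOS 7); resource_node is a hypothetical helper name, and other pcs versions may print resources differently.

```shell
# Print the node on which a resource is reported as Started.
# Reads `pcs status` output on stdin; $1 is the resource name.
resource_node() {
    awk -v res="$1" '$1 == res && $(NF-1) == "Started" { print $NF }'
}

# Usage on a cluster node:
#   pcs status | resource_node ClusterIP
```

Running it before and after stopping a node gives a quick way to confirm that the resource moved.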

If a cluster splits into two (or more) groups of nodes that can no longer communicate with each other (aka. partitions), quorum is used to prevent resources from starting on more nodes than desired, which would risk data corruption.

A cluster has quorum when more than half of all known nodes are online in the same partition, or for the mathematically inclined, whenever the following equation is true:

total_nodes < 2 * active_nodes

For example, if a 5-node cluster split into 3- and 2-node partitions, the 3-node partition would have quorum and could continue serving resources. If a 6-node cluster split into two 3-node partitions, neither partition would have quorum; pacemaker’s default behavior in such cases is to stop all resources, in order to prevent data corruption.
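
As a rough illustration of the inequality above, the following shell sketch evaluates it for a given partition. The function name has_quorum is hypothetical and not part of pcs or corosync; it only restates the arithmetic and ignores special cases such as the two-node option.

```shell
# Report whether a partition has quorum under the rule
#   total_nodes < 2 * active_nodes
# $1 = total known nodes, $2 = nodes online in this partition.
has_quorum() {
    total=$1
    active=$2
    if [ "$total" -lt $((2 * active)) ]; then
        echo yes
    else
        echo no
    fi
}

has_quorum 5 3   # 3-node side of a 5-node split: yes
has_quorum 5 2   # 2-node side of a 5-node split: no
has_quorum 6 3   # either half of a 6-node split: no
```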

Two-node clusters are a special case. By the above definition, a two-node cluster would only have quorum when both nodes are running. This would make the creation of a two-node cluster pointless, but corosync has the ability to treat two-node clusters as if only one node is required for quorum.

The pcs cluster setup command will automatically configure two_node: 1 in corosync.conf, so a two-node cluster will "just work".

If you are using a different cluster shell, you will have to configure corosync.conf appropriately yourself.
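
One way to confirm the setting is to look for a two_node: 1 line in corosync.conf. The following is a minimal sketch, assuming the default path /etc/corosync/corosync.conf; the function name two_node_enabled is hypothetical, and the check only looks for the line without validating the rest of the file.

```shell
# Succeed (exit status 0) if the given corosync.conf contains "two_node: 1".
# $1 = path to the file (defaults to the usual location, an assumption).
two_node_enabled() {
    conf=${1:-/etc/corosync/corosync.conf}
    grep -Eq '^[[:space:]]*two_node:[[:space:]]*1[[:space:]]*$' "$conf"
}

# Usage:
#   if two_node_enabled; then echo "two-node mode is on"; fi
```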

Now, simulate node recovery by restarting the cluster stack on pcmk-1, and check the cluster’s status. (It may take a little while before the cluster gets going on the node, but it eventually will look like the below.)

[root@pcmk-1 ~]# pcs cluster start pcmk-1
pcmk-1: Starting Cluster...
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 17:00:04 2018
Last change: Mon Sep 10 16:53:42 2018 by root via cibadmin on pcmk-1

2 nodes configured
1 resource configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started pcmk-2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Source

This is the second part of my previous article, in which I shared the steps to configure an OpenStack HA cluster using Pacemaker and Corosync. In this article I will share the steps to configure HAProxy in OpenStack and move our Keystone endpoints to the load balancer using a virtual IP.

Configure HAProxy in Openstack

In this lab deployment, we will use HAProxy to load-balance our control plane services. Some deployments may also implement Keepalived and run HAProxy in an Active/Active configuration. For this deployment, we will run HAProxy Active/Passive and manage it as a resource along with our VIP in Pacemaker.

To start, install HAProxy on both nodes using the following command:

[root@controller1 ~]# yum install -y haproxy
[root@controller2 ~]# yum install -y haproxy

Verify installation with the following command:

[root@controller1 ~]# rpm -q haproxy
haproxy-1.5.18-7.el7.x86_64

[root@controller2 ~]# rpm -q haproxy
haproxy-1.5.18-7.el7.x86_64

Next, we will create a configuration file for HAProxy which load-balances the API services installed on the two controllers. Use the following example as a template, replacing the IP addresses in the example with the IP addresses of the two controllers and the IP address of the VIP that you’ll be using to load-balance the API services.

NOTE:

The IP Address which you plan to use for VIP must be free.

Take a backup of the existing config file on both the controller nodes

[root@controller1 ~]# mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bkp
[root@controller2 ~]# mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bkp

The following example /etc/haproxy/haproxy.cfg will load-balance Horizon in our environment:

[root@controller1 haproxy]# cat haproxy.cfg
global
  daemon
  group  haproxy
  maxconn  40000
  pidfile  /var/run/haproxy.pid
  user  haproxy

defaults
  log  127.0.0.1 local2 warning
  mode  tcp
  option  tcplog
  option  redispatch
  retries  3
  timeout  connect 10s
  timeout  client 60s
  timeout  server 60s
  timeout  check 10s

listen horizon
  bind 192.168.122.30:80
  mode http
  cookie SERVERID insert indirect nocache
  option tcplog
  timeout client 180s
  server controller1 192.168.122.20:80 cookie controller1 check inter 1s
  server controller2 192.168.122.22:80 cookie controller2 check inter 1s

In this example, controller1 has an IP address of 192.168.122.20 and controller2 has an IP address of 192.168.122.22. The VIP that we’ve chosen to use is 192.168.122.30. Copy this file, replacing the IP addresses with the addresses in your lab, to /etc/haproxy/haproxy.cfg on each of the controllers.

Copy this haproxy.cfg file to the second controller:

[root@controller1 ~]# scp /etc/haproxy/haproxy.cfg controller2:/etc/haproxy/haproxy.cfg

In order for Horizon to respond to requests on the VIP, we’ll need to add the VIP as a ServerAlias in the Apache virtual host configuration. This is found at /etc/httpd/conf.d/15-horizon_vhost.conf in our lab installation. Look for the following line on controller1:

ServerAlias 192.168.122.20

and the following line on controller2:

ServerAlias 192.168.122.22

Add an additional ServerAlias line with the VIP on both controllers:

ServerAlias 192.168.122.30

You’ll also need to tell Apache not to listen on the VIP so that HAProxy can bind to the address. To do this, modify /etc/httpd/conf/ports.conf and specify the IP address of the controller in addition to the port numbers. The following is an example:

[root@controller1 ~]# cat /etc/httpd/conf/ports.conf
# ************************************
# Listen & NameVirtualHost resources in module puppetlabs-apache
# Managed by Puppet
# ************************************

Listen 0.0.0.0:8778
#Listen 35357
#Listen 5000
#Listen 80
Listen 8041
Listen 8042
Listen 8777
Listen 192.168.122.20:35357
Listen 192.168.122.20:5000
Listen 192.168.122.20:80

Here 192.168.122.20 is the IP address of controller1.

On controller2 repeat the same with the IP of the respective controller node

[root@controller2 ~(keystone_admin)]# cat /etc/httpd/conf/ports.conf
# ************************************
# Listen & NameVirtualHost resources in module puppetlabs-apache
# Managed by Puppet
# ************************************

Listen 0.0.0.0:8778
#Listen 35357
#Listen 5000
#Listen 80
Listen 8041
Listen 8042
Listen 8777
Listen 192.168.122.22:35357
Listen 192.168.122.22:5000
Listen 192.168.122.22:80
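
The edits in both ports.conf listings follow one pattern: each bare Listen <port> for a port that HAProxy will bind on the VIP is replaced by Listen <controller IP>:<port>. The sketch below expresses that as a filter. bind_listen_to_ip is a hypothetical helper; unlike the listings above it rewrites the matching lines instead of commenting them out, and it prints to standard output rather than editing the file. Since the file header says it is managed by Puppet, manual changes may be overwritten on a later Puppet run.

```shell
# Rewrite "Listen <port>" to "Listen <ip>:<port>" for the given ports only.
# $1 = controller IP, remaining arguments = ports to rewrite.
# Reads the configuration on stdin and writes the result to stdout.
bind_listen_to_ip() {
    ip=$1
    shift
    ports=$(printf '%s|' "$@")   # e.g. "80|5000|35357|"
    ports=${ports%|}             # drop the trailing "|"
    sed -E "s/^Listen[[:space:]]+($ports)[[:space:]]*\$/Listen $ip:\1/"
}

# Usage (review the output before replacing the real file):
#   bind_listen_to_ip 192.168.122.20 80 5000 35357 < /etc/httpd/conf/ports.conf
```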

Restart Apache to pick up the new alias:

[root@controller1 ~]# systemctl restart httpd
[root@controller2 ~]# systemctl restart httpd

Next, add the VIP and the HAProxy service to the Pacemaker cluster as resources. These commands should only be run on the first controller node. The agent name used for the VIP, ocf:heartbeat:IPaddr2, tells Pacemaker three things about the resource you want to add:

The first field (ocf in this case) is the standard to which the resource script conforms and where to find it.
The second field (heartbeat in this case) is standard-specific; for OCF resources, it tells the cluster which OCF namespace the resource script is in.
The third field (IPaddr2 in this case) is the name of the resource script.
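
The three fields can be pulled apart mechanically. The following sketch is only an illustration of the naming scheme; split_agent is a hypothetical helper, not a pcs command. It also handles names such as systemd:haproxy, which have no provider field because only OCF agents are grouped by provider.

```shell
# Split an agent name of the form standard:provider:type
# (or standard:type for standards without providers).
split_agent() {
    IFS=: read -r standard provider agent <<EOF
$1
EOF
    if [ -z "$agent" ]; then
        # Two-field form: the second field is the agent itself.
        agent=$provider
        provider=
    fi
    echo "standard=$standard provider=${provider:-none} agent=$agent"
}

split_agent ocf:heartbeat:IPaddr2   # standard=ocf provider=heartbeat agent=IPaddr2
split_agent systemd:haproxy         # standard=systemd provider=none agent=haproxy
```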

[root@controller1 ~]# pcs resource create VirtualIP IPaddr2 ip=192.168.122.30 cidr_netmask=24
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')

[root@controller1 ~]# pcs resource create HAProxy systemd:haproxy

Co-locate the HAProxy service with the VirtualIP to ensure that the two run together:

[root@controller1 ~]# pcs constraint colocation add VirtualIP with HAProxy score=INFINITY

Verify that both resources have been started on the same controller:

[root@controller1 ~]# pcs status
Cluster name: openstack
Stack: corosync
Current DC: controller2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Tue Oct 16 12:44:27 2018
Last change: Tue Oct 16 12:44:23 2018 by root via cibadmin on controller1

2 nodes configured
2 resources configured

Online: [ controller1 controller2 ]

Full list of resources:

 VirtualIP      (ocf::heartbeat:IPaddr2):       Started controller1
 HAProxy        (systemd:haproxy):      Started controller1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

At this point, you should be able to access Horizon using the VIP you specified. Traffic will flow from your client to HAProxy on the VIP to Apache on one of the two nodes.

Additional API service configuration

The HAProxy configuration itself is now complete. The final configuration step is to move each of the OpenStack API endpoints behind the load balancer. There are three steps in this process, which are as follows:

Update the HAProxy configuration to include the service.
Move the endpoint in the Keystone service catalog to the VIP.
Reconfigure services to point to the VIP instead of the IP of the first controller.

In the following example, we will move the Keystone service behind the load balancer. This process can be followed for each of the API services.

First, add a section to the HAProxy configuration file for the authorization and admin endpoints of Keystone. Append the template below to the existing haproxy.cfg file on both controllers:

[root@controller1 ~]# vim /etc/haproxy/haproxy.cfg
listen keystone-admin
  bind 192.168.122.30:35357
  mode tcp
  option tcplog
  server controller1 192.168.122.20:35357 check inter 1s
  server controller2 192.168.122.22:35357 check inter 1s

listen keystone-public
  bind 192.168.122.30:5000
  mode tcp
  option tcplog
  server controller1 192.168.122.20:5000 check inter 1s
  server controller2 192.168.122.22:5000 check inter 1s

Restart the haproxy service on the active node:

[root@controller1 ~]# systemctl restart haproxy.service

You can determine the active node with the output from pcs status. Check to make sure that HAProxy is now listening on ports 5000 and 35357 using the following commands on both the controllers:

[root@controller1 ~]# curl http://192.168.122.30:5000
{"versions": {"values": [{"status": "stable", "updated": "2018-02-28T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.10", "links": [{"href": "http://192.168.122.30:5000/v3/", "rel": "self"}]}, {"status": "deprecated", "updated": "2016-08-04T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://192.168.122.30:5000/v2.0/", "rel": "self"}, {"href": "htt

[root@controller1 ~]# curl http://192.168.122.30:5000/v3
{"version": {"status": "stable", "updated": "2018-02-28T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.10", "links": [{"href": "http://192.168.122.30:5000/v3/", "rel": "self"}]}}

[root@controller1 ~]# curl http://192.168.122.30:35357/v3
{"version": {"status": "stable", "updated": "2018-02-28T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.10", "links": [{"href": "http://192.168.122.30:35357/v3/", "rel": "self"}]}}

[root@controller1 ~]# curl http://192.168.122.30:35357
{"versions": {"values": [{"status": "stable", "updated": "2018-02-28T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.10", "links": [{"href": "http://192.168.122.30:35357/v3/", "rel": "self"}]}, {"status": "deprecated", "updated": "2016-08-04T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://192.168.122.30:35357/v2.0/", "rel": "self"}, {"href": "https://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}]}}

All of the above commands should output JSON describing the Keystone API versions, which confirms that the respective ports are in the listening state.

Next, update the endpoint for the identity service in the Keystone service catalogue by creating a new endpoint and deleting the old one. First, source your existing keystonerc_admin file:

[root@controller1 ~(keystone_admin)]# source keystonerc_admin

Below is the content from my keystonerc_admin

[root@controller1 ~(keystone_admin)]# cat keystonerc_admin
unset OS_SERVICE_TOKEN
    export OS_USERNAME=admin
    export OS_PASSWORD='redhat'
    export OS_AUTH_URL=http://192.168.122.20:5000/v3
    export PS1='[\u@\h \W(keystone_admin)]\$ '

export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_IDENTITY_API_VERSION=3

As you can see, OS_AUTH_URL currently points to the existing endpoint on the controller. We will update this shortly.

Get the list of current Keystone endpoints on your active controller:

[root@controller1 ~(keystone_admin)]# openstack endpoint list | grep keystone
| 3ded2a2faffe4fd485f6c3c58b1990d6 | RegionOne | keystone     | identity     | True    | internal  | http://192.168.122.20:5000/v3                 |
| b0f5b7887cd346b3aec747e5b9fafcd3 | RegionOne | keystone     | identity     | True    | admin     | http://192.168.122.20:35357/v3                |
| c1380d643f734cc1b585048b2e7a7d47 | RegionOne | keystone     | identity     | True    | public    | http://192.168.122.20:5000/v3                 |

Since we want to move the Keystone endpoints to the VIP, we create new endpoints with the VIP URL for the public, admin and internal interfaces:

[root@controller1 ~(keystone_admin)]# openstack endpoint create --region RegionOne identity public http://192.168.122.30:5000/v3
+--------------+----------------------------------+
| Field        | Value                            |
+--------------+----------------------------------+
| enabled      | True                             |
| id           | 08a26ace08884b85a0ff869ddb20bea3 |
| interface    | public                           |
| region       | RegionOne                        |
| region_id    | RegionOne                        |
| service_id   | 555154c5facf4e96a8677362c62b2ac9 |
| service_name | keystone                         |
| service_type | identity                         |
| url          | http://192.168.122.30:5000/v3    |
+--------------+----------------------------------+

[root@controller1 ~(keystone_admin)]# openstack endpoint create --region RegionOne identity admin http://192.168.122.30:35357/v3
+--------------+----------------------------------+
| Field        | Value                            |
+--------------+----------------------------------+
| enabled      | True                             |
| id           | ef210afef1da4558abdc00cc13b75185 |
| interface    | admin                            |
| region       | RegionOne                        |
| region_id    | RegionOne                        |
| service_id   | 555154c5facf4e96a8677362c62b2ac9 |
| service_name | keystone                         |
| service_type | identity                         |
| url          | http://192.168.122.30:35357/v3   |
+--------------+----------------------------------+

[root@controller1 ~(keystone_admin)]# openstack endpoint create --region RegionOne identity internal http://192.168.122.30:5000/v3
+--------------+----------------------------------+
| Field        | Value                            |
+--------------+----------------------------------+
| enabled      | True                             |
| id           | 5205be865e2a4cb9b4ab2119b93c7461 |
| interface    | internal                         |
| region       | RegionOne                        |
| region_id    | RegionOne                        |
| service_id   | 555154c5facf4e96a8677362c62b2ac9 |
| service_name | keystone                         |
| service_type | identity                         |
| url          | http://192.168.122.30:5000/v3    |
+--------------+----------------------------------+

Last, update the auth_uri, auth_url and identity_uri parameters in each of the OpenStack services to point to the new IP address. The following configuration files will need to be edited:

/etc/ceilometer/ceilometer.conf
/etc/cinder/api-paste.ini
/etc/glance/glance-api.conf
/etc/glance/glance-registry.conf
/etc/neutron/neutron.conf
/etc/neutron/api-paste.ini
/etc/nova/nova.conf
/etc/swift/proxy-server.conf
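
Editing these files by hand works, but the change is the same in each one: the controller address in the auth settings becomes the VIP. The sketch below shows one possible way to script it, assuming the settings appear as key = value lines and that GNU sed is available. point_auth_to_vip is a hypothetical helper; it keeps a .bak copy of each file, and the result should be reviewed before restarting any service.

```shell
# Replace the old controller IP with the VIP, but only on lines that set
# auth_uri, auth_url or identity_uri. A backup is kept as <file>.bak.
# $1 = config file, $2 = old controller IP, $3 = VIP.
point_auth_to_vip() {
    file=$1
    old_ip=$2
    vip=$3
    # Escape the dots so the IP is matched literally.
    old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    sed -i.bak -E \
        "/^[[:space:]]*(auth_uri|auth_url|identity_uri)[[:space:]]*=/ s/$old_re/$vip/" \
        "$file"
}

# Usage with the addresses from this lab:
#   point_auth_to_vip /etc/nova/nova.conf 192.168.122.20 192.168.122.30
```

Restricting the substitution to the three auth settings leaves other occurrences of the controller address, such as bind addresses, untouched.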

Next, install openstack-utils, which provides the openstack-service tool for restarting all OpenStack services at once instead of restarting each one manually:

[root@controller1 ~(keystone_admin)]# yum -y install openstack-utils

After editing each of the files, restart the OpenStack services on all of the nodes in the lab deployment using the following command:

[root@controller1 ~(keystone_admin)]# openstack-service restart

Next, update your keystonerc_admin file so that OS_AUTH_URL points to the VIP, i.e. http://192.168.122.30:5000/v3, as shown below:

[root@controller1 ~(keystone_admin)]# cat keystonerc_admin
unset OS_SERVICE_TOKEN
    export OS_USERNAME=admin
    export OS_PASSWORD='redhat'
    export OS_AUTH_URL=http://192.168.122.30:5000/v3
    export PS1='[\u@\h \W(keystone_admin)]\$ '

export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_IDENTITY_API_VERSION=3

Now re-source the updated keystonerc_admin file

[root@controller1 ~(keystone_admin)]# source keystonerc_admin

Verify that OS_AUTH_URL now points to the VIP:

[root@controller1 ~(keystone_admin)]# echo $OS_AUTH_URL
http://192.168.122.30:5000/v3

Once the OpenStack services are restarted, delete the old endpoints for the Keystone service:

[root@controller1 ~(keystone_admin)]# openstack endpoint delete b0f5b7887cd346b3aec747e5b9fafcd3
[root@controller1 ~(keystone_admin)]# openstack endpoint delete c1380d643f734cc1b585048b2e7a7d47
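
The IDs passed to the delete command come from the earlier endpoint listing. As a convenience, the sketch below picks out the IDs of endpoints for one service that still point at the old controller address. It assumes the table layout printed by openstack endpoint list in this article; old_endpoint_ids is a hypothetical helper that only prints IDs and deletes nothing.

```shell
# Print the IDs of endpoints for a service whose URL still uses the old IP.
# Reads `openstack endpoint list` table output on stdin.
# $1 = service name (e.g. keystone), $2 = old controller IP.
old_endpoint_ids() {
    awk -F'|' -v svc="$1" -v ip="$2" '
        { gsub(/ /, "", $2); gsub(/ /, "", $4) }       # trim ID and service name
        $4 == svc && index($8, "//" ip ":") { print $2 }
    '
}

# Usage:
#   openstack endpoint list | old_endpoint_ids keystone 192.168.122.20
```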

NOTE:

You may get the error below while attempting to delete the old endpoints. This is most likely because the Keystone database has not been properly refreshed yet, so perform another round of "openstack-service restart" and then re-attempt to delete the endpoint.

[root@controller1 ~(keystone_admin)]# openstack endpoint delete 3ded2a2faffe4fd485f6c3c58b1990d6
Failed to delete endpoint with ID '3ded2a2faffe4fd485f6c3c58b1990d6': More than one endpoint exists with the name '3ded2a2faffe4fd485f6c3c58b1990d6'.
1 of 1 endpoints failed to delete.

[root@controller1 ~(keystone_admin)]# openstack endpoint list | grep 3ded2a2faffe4fd485f6c3c58b1990d6
| 3ded2a2faffe4fd485f6c3c58b1990d6 | RegionOne | keystone     | identity     | True    | internal  | http://192.168.122.20:5000/v3                 |

[root@controller1 ~(keystone_admin)]# openstack-service restart

[root@controller1 ~(keystone_admin)]# openstack endpoint delete 3ded2a2faffe4fd485f6c3c58b1990d6

Repeat the same set of steps on controller2.

After deleting the old endpoints and creating the new ones, below is the updated list of keystone endpoints on controller2

[root@controller2 ~(keystone_admin)]# openstack endpoint list | grep keystone
| 07fca3f48dba47cdbf6528909bd2a8e3 | RegionOne | keystone     | identity     | True    | public    | http://192.168.122.30:5000/v3                 |
| 37db43efa2934ce3ab93ea19df8adcc7 | RegionOne | keystone     | identity     | True    | internal  | http://192.168.122.30:5000/v3                 |
| e9da6923b7ff418ab7e30ef65af5c152 | RegionOne | keystone     | identity     | True    | admin     | http://192.168.122.30:35357/v3                |

The OpenStack services will now be using the Keystone API endpoint provided by the VIP and the service will be highly available.

Perform a Cluster Failover

Since our ultimate goal is high availability, we should test failover of our new resource.

Before performing a failover let us make sure our cluster is UP and running properly

[root@controller2 ~(keystone_admin)]# pcs status
Cluster name: openstack
Stack: corosync
Current DC: controller1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Tue Oct 16 14:54:45 2018
Last change: Tue Oct 16 12:44:23 2018 by root via cibadmin on controller1

2 nodes configured
2 resources configured

Online: [ controller1 controller2 ]

Full list of resources:

 VirtualIP      (ocf::heartbeat:IPaddr2):       Started controller1
 HAProxy        (systemd:haproxy):      Started controller1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

As we can see, both controllers are online, so let us stop the second controller:

[root@controller2 ~(keystone_admin)]# pcs cluster stop controller2
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...

Now let us try to check the pacemaker status from controller2

[root@controller2 ~(keystone_admin)]# pcs status
Error: cluster is not currently running on this node

Since cluster service is not running on controller2 we cannot check the status. So let us get the status from controller1

[root@controller1 ~(keystone_admin)]# pcs status
Cluster name: openstack
Stack: corosync
Current DC: controller1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Tue Oct 16 13:21:32 2018
Last change: Tue Oct 16 12:44:23 2018 by root via cibadmin on controller1

2 nodes configured
2 resources configured

Online: [ controller1 ]
OFFLINE: [ controller2 ]

Full list of resources:

 VirtualIP      (ocf::heartbeat:IPaddr2):       Started controller1
 HAProxy        (systemd:haproxy):      Started controller1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

As expected, it shows controller2 as offline. Now let us check whether the Keystone endpoints are still readable:

[root@controller2 ~(keystone_admin)]# openstack endpoint list
+----------------------------------+-----------+--------------+--------------+---------+-----------+-----------------------------------------------+
| ID                               | Region    | Service Name | Service Type | Enabled | Interface | URL                                           |
+----------------------------------+-----------+--------------+--------------+---------+-----------+-----------------------------------------------+
| 06473a06f4a04edc94314a97b29d5395 | RegionOne | cinderv3     | volumev3     | True    | internal  | http://192.168.122.20:8776/v3/%(tenant_id)s   |
| 07ad2939b59b4f4892d6a470a25daaf9 | RegionOne | aodh         | alarming     | True    | public    | http://192.168.122.20:8042                    |
| 07fca3f48dba47cdbf6528909bd2a8e3 | RegionOne | keystone     | identity     | True    | public    | http://192.168.122.30:5000/v3                 |
| 0856cd4b276f490ca48c772af2be49a3 | RegionOne | gnocchi      | metric       | True    | internal  | http://192.168.122.20:8041                    |
| 08ff114d526e4917b5849c0080cfa8f2 | RegionOne | aodh         | alarming     | True    | admin     | http://192.168.122.20:8042                    |
| 1e6cf514c885436fb14ffec0d55286c6 | RegionOne | aodh         | alarming     | True    | internal  | http://192.168.122.20:8042                    |
| 20178fdd0a064b5fa91b869ab492d2d1 | RegionOne | cinderv2     | volumev2     | True    | internal  | http://192.168.122.20:8776/v2/%(tenant_id)s   |
| 3524908122a44d7f855fd09dd2859d4e | RegionOne | nova         | compute      | True    | public    | http://192.168.122.20:8774/v2.1/%(tenant_id)s |
| 37db43efa2934ce3ab93ea19df8adcc7 | RegionOne | keystone     | identity     | True    | internal  | http://192.168.122.30:5000/v3                 |
| 3a896bde051f4ae4bfa3694a1eb05321 | RegionOne | cinderv2     | volumev2     | True    | admin     | http://192.168.122.20:8776/v2/%(tenant_id)s   |
| 3ef1f30aab8646bc96c274a116120e66 | RegionOne | nova         | compute      | True    | admin     | http://192.168.122.20:8774/v2.1/%(tenant_id)s |
| 42a690ef05aa42adbf9ac21056a9d4f3 | RegionOne | nova         | compute      | True    | internal  | http://192.168.122.20:8774/v2.1/%(tenant_id)s |
| 45fea850b0b34f7ca2443da17e82ca13 | RegionOne | glance       | image        | True    | admin     | http://192.168.122.20:9292                    |
| 46cbd1e0a79545dfac83eeb429e24a6c | RegionOne | cinderv2     | volumev2     | True    | public    | http://192.168.122.20:8776/v2/%(tenant_id)s   |
| 49f82b77105e4614b7cf57fe1785bdc3 | RegionOne | cinder       | volume       | True    | internal  | http://192.168.122.20:8776/v1/%(tenant_id)s   |
| 4aced9a3c17741608b2491a8a8fb7503 | RegionOne | cinder       | volume       | True    | public    | http://192.168.122.20:8776/v1/%(tenant_id)s   |
| 63eeaa5246f54c289881ade0686dc9bb | RegionOne | ceilometer   | metering     | True    | admin     | http://192.168.122.20:8777                    |
| 6e2fd583487846e6aab7cac4c001064c | RegionOne | gnocchi      | metric       | True    | public    | http://192.168.122.20:8041                    |
| 79f2fcdff7d740549846a9328f8aa993 | RegionOne | cinderv3     | volumev3     | True    | public    | http://192.168.122.20:8776/v3/%(tenant_id)s   |
| 9730a44676b042e1a9f087137ea52d04 | RegionOne | glance       | image        | True    | public    | http://192.168.122.20:9292                    |
| a028329f053841dfb115e93c7740d65c | RegionOne | neutron      | network      | True    | internal  | http://192.168.122.20:9696                    |
| acc7ff6d8f1941318ab4f456cac5e316 | RegionOne | placement    | placement    | True    | public    | http://192.168.122.20:8778/placement          |
| afecd931e6dc42e8aa1abdba44fec622 | RegionOne | glance       | image        | True    | internal  | http://192.168.122.20:9292                    |
| c08c1cfb0f524944abba81c42e606678 | RegionOne | placement    | placement    | True    | admin     | http://192.168.122.20:8778/placement          |
| c0c0c4e8265e4592942bcfa409068721 | RegionOne | placement    | placement    | True    | internal  | http://192.168.122.20:8778/placement          |
| d9f34d36bd2541b98caa0d6ab74ba336 | RegionOne | cinder       | volume       | True    | admin     | http://192.168.122.20:8776/v1/%(tenant_id)s   |
| e051cee0d06e45d48498b0af24eb08b5 | RegionOne | ceilometer   | metering     | True    | public    | http://192.168.122.20:8777                    |
| e9da6923b7ff418ab7e30ef65af5c152 | RegionOne | keystone     | identity     | True    | admin     | http://192.168.122.30:35357/v3                |
| ea6f1493aa134b6f9822eca447dfd1df | RegionOne | neutron      | network      | True    | admin     | http://192.168.122.20:9696                    |
| ed97856952bb4a3f953ff467d61e9c6a | RegionOne | gnocchi      | metric       | True    | admin     | http://192.168.122.20:8041                    |
| f989d76263364f07becb638fdb5fea6c | RegionOne | neutron      | network      | True    | public    | http://192.168.122.20:9696                    |
| fe32d323287c4a0cb221faafb35141f8 | RegionOne | ceilometer   | metering     | True    | internal  | http://192.168.122.20:8777                    |
| fef852af4f0d4f0cacd4620e5d5245c2 | RegionOne | cinderv3     | volumev3     | True    | admin     | http://192.168.122.20:8776/v3/%(tenant_id)s   |
+----------------------------------+-----------+--------------+--------------+---------+-----------+-----------------------------------------------+

Yes, we can still read the endpoint list from keystone, so everything looks fine.
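Instead of scanning the table by eye, the same check can be narrowed to one service. The sketch below is only an illustration: `endpoints_for` is a hypothetical helper name, and it assumes the column order seen in the rows above (ID, region, service name, service type, enabled, interface, URL).

```shell
#!/bin/sh
# Hypothetical helper: read the table printed by `openstack endpoint list`
# on stdin and print "interface url" for each row whose service name is $1.
# Assumes the column order shown in the table above.
endpoints_for() {
    awk -F'|' -v svc="$1" '
        NF >= 8 {
            name = $4; iface = $7; url = $8
            # Trim the padding spaces around each cell.
            gsub(/^ +| +$/, "", name)
            gsub(/^ +| +$/, "", iface)
            gsub(/^ +| +$/, "", url)
            if (name == svc) print iface, url
        }'
}

# Usage on a controller with admin credentials loaded:
#   openstack endpoint list | endpoints_for keystone
```

Border lines such as `+-----+` contain no `|` separators, so the `NF >= 8` guard skips them.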


Now start the cluster services on controller2 again:

[root@controller2 ~(keystone_admin)]# pcs cluster start
Starting Cluster...

Then check the status:

[root@controller2 ~(keystone_admin)]# pcs status
Cluster name: openstack
Stack: corosync
Current DC: controller1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Tue Oct 16 13:23:17 2018
Last change: Tue Oct 16 12:44:23 2018 by root via cibadmin on controller1

2 nodes configured
2 resources configured

Online: [ controller1 controller2 ]

Full list of resources:

 VirtualIP      (ocf::heartbeat:IPaddr2):       Started controller1
 HAProxy        (systemd:haproxy):      Started controller1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
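The "Online" line in the output above is the part that confirms controller2 has rejoined. A minimal sketch for checking it from a script follows. The function names `online_nodes` and `all_online` are hypothetical, and the parsing assumes the `Online: [ ... ]` line format shown in this output.

```shell
#!/bin/sh
# Hypothetical helper: print the node names from the "Online: [ ... ]" line
# of `pcs status` output read on stdin, one name per line.
online_nodes() {
    sed -n 's/^ *Online: *\[ *\(.*[^ ]\) *\].*/\1/p' | tr -s ' ' '\n'
}

# Hypothetical helper: succeed only if every node named as an argument
# appears in the Online list of the `pcs status` output read on stdin.
all_online() {
    status_text=$(cat)
    for node in "$@"; do
        printf '%s\n' "$status_text" | online_nodes | grep -qxF "$node" || return 1
    done
}

# Usage on a cluster node (assumes pcs is installed):
#   pcs status | all_online controller1 controller2 && echo "both controllers online"
```

Matching whole lines with `grep -xF` avoids treating one node name as a prefix of another.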

Everything is back to green, and HAProxy is now successfully configured in OpenStack.

I hope the steps in this article for configuring HAProxy in OpenStack (high availability between controllers) were helpful. Let me know your suggestions and feedback in the comment section.


