TASK ERROR: Need MDS to add storage, but none got active

The MDS is stuck in up:creating after creating a CephFS: mds.cl-01 (host cl-01, up:creating, address 172.16.3.1:6803/blah). Full journal log: https://pastebin.com/raw/4b9qnNGv
-- Logs begin at Sat 2019-08-31 17:13:20 PDT, end at Sat 2019-08-31 18:13:38 PDT. --
Aug 31 17:13:24 cl-01 systemd[1]: Started Ceph metadata server daemon.
Aug 31 17:13:24 cl-01 ceph-mds[1221]: starting mds.cl-01 at
Aug 31 17:14:59 cl-01 systemd[1]: Stopping Ceph metadata server daemon...
Aug 31 17:14:59 cl-01 ceph-mds[1221]: 2019-08-31 17:14:59.308 7feaabff0700 -1 received  signal: Terminated from /sbin/init nofb  (PID: 1) UID: 0
Aug 31 17:14:59 cl-01 ceph-mds[1221]: 2019-08-31 17:14:59.308 7feaabff0700 -1 mds.cl-01 *** got signal Terminated ***
Aug 31 17:15:00 cl-01 ceph-mds[1221]: /root/sources/pve/ceph/ceph-14.2.2/src/include/elist.h: In function 'elist<T>::~elist() [with T = MDSIOContextBase*]' thread 7feaaf241340 time 2019-08-31 17:15:00.966856
Aug 31 17:15:00 cl-01 ceph-mds[1221]: /root/sources/pve/ceph/ceph-14.2.2/src/include/elist.h: 91: FAILED ceph_assert(_head.empty())
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable)
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7feab0ae65c6]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  2: (()+0x28079e) [0x7feab0ae679e]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  3: (()+0x3dcfb3) [0x55d7907dffb3]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  4: (()+0x39d8c) [0x7feaaf776d8c]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  5: (()+0x39eba) [0x7feaaf776eba]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  6: (__libc_start_main()+0xf2) [0x7feaaf7610a2]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  7: (_start()+0x2a) [0x55d79053f8ca]
Aug 31 17:15:00 cl-01 ceph-mds[1221]: *** Caught signal (Segmentation fault) **
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  in thread 7feaaf241340 thread_name:ceph-mds
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable)
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  1: (()+0x12730) [0x7feaafeaf730]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  2: (__pthread_mutex_lock()+0) [0x7feaafea76c0]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  3: (ceph::logging::Log::submit_entry(ceph::logging::Entry&&)+0x41) [0x7feab0e590b1]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x24c) [0x7feab0ae66c0]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  5: (()+0x28079e) [0x7feab0ae679e]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  6: (()+0x3dcfb3) [0x55d7907dffb3]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  7: (()+0x39d8c) [0x7feaaf776d8c]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  8: (()+0x39eba) [0x7feaaf776eba]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  9: (__libc_start_main()+0xf2) [0x7feaaf7610a2]
Aug 31 17:15:00 cl-01 ceph-mds[1221]:  10: (_start()+0x2a) [0x55d79053f8ca]
Aug 31 17:15:00 cl-01 systemd[1]: ceph-mds@cl-01.service: Main process exited, code=killed, status=11/SEGV
Aug 31 17:15:00 cl-01 systemd[1]: ceph-mds@cl-01.service: Failed with result 'signal'.
Aug 31 17:15:00 cl-01 systemd[1]: Stopped Ceph metadata server daemon.
Aug 31 17:16:18 cl-01 systemd[1]: Started Ceph metadata server daemon.
Aug 31 17:16:18 cl-01 ceph-mds[3972]: starting mds.cl-01 at
Aug 31 18:01:23 cl-01 systemd[1]: Stopping Ceph metadata server daemon...
Aug 31 18:01:23 cl-01 ceph-mds[3972]: 2019-08-31 18:01:23.847 7fdb81da6700 -1 received  signal: Terminated from /sbin/init nofb  (PID: 1) UID: 0
Aug 31 18:01:23 cl-01 ceph-mds[3972]: 2019-08-31 18:01:23.847 7fdb81da6700 -1 mds.cl-01 *** got signal Terminated ***
Aug 31 18:01:26 cl-01 ceph-mds[3972]: /root/sources/pve/ceph/ceph-14.2.2/src/include/elist.h: In function 'elist<T>::~elist() [with T = MDSIOContextBase*]' thread 7fdb84ff7340 time 2019-08-31 18:01:26.522080
Aug 31 18:01:26 cl-01 ceph-mds[3972]: /root/sources/pve/ceph/ceph-14.2.2/src/include/elist.h: 91: FAILED ceph_assert(_head.empty())
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable)
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7fdb8689c5c6]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  2: (()+0x28079e) [0x7fdb8689c79e]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  3: (()+0x3dcfb3) [0x557cfc66bfb3]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  4: (()+0x39d8c) [0x7fdb8552cd8c]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  5: (()+0x39eba) [0x7fdb8552ceba]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  6: (__libc_start_main()+0xf2) [0x7fdb855170a2]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  7: (_start()+0x2a) [0x557cfc3cb8ca]
Aug 31 18:01:26 cl-01 ceph-mds[3972]: *** Caught signal (Segmentation fault) **
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  in thread 7fdb84ff7340 thread_name:ceph-mds
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable)
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  1: (()+0x12730) [0x7fdb85c65730]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  2: (__pthread_mutex_lock()+0) [0x7fdb85c5d6c0]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  3: (ceph::logging::Log::submit_entry(ceph::logging::Entry&&)+0x41) [0x7fdb86c0f0b1]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x24c) [0x7fdb8689c6c0]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  5: (()+0x28079e) [0x7fdb8689c79e]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  6: (()+0x3dcfb3) [0x557cfc66bfb3]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  7: (()+0x39d8c) [0x7fdb8552cd8c]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  8: (()+0x39eba) [0x7fdb8552ceba]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  9: (__libc_start_main()+0xf2) [0x7fdb855170a2]
Aug 31 18:01:26 cl-01 ceph-mds[3972]:  10: (_start()+0x2a) [0x557cfc3cb8ca]
Aug 31 18:01:26 cl-01 systemd[1]: ceph-mds@cl-01.service: Main process exited, code=killed, status=11/SEGV
Aug 31 18:01:26 cl-01 systemd[1]: ceph-mds@cl-01.service: Failed with result 'signal'.
Aug 31 18:01:26 cl-01 systemd[1]: Stopped Ceph metadata server daemon.
Aug 31 18:10:15 cl-01 systemd[1]: Started Ceph metadata server daemon.
Aug 31 18:10:15 cl-01 ceph-mds[17838]: starting mds.cl-01 at
Aug 31 18:10:38 cl-01 systemd[1]: Stopping Ceph metadata server daemon...
Aug 31 18:10:38 cl-01 ceph-mds[17838]: 2019-08-31 18:10:38.908 7f2ec5ded700 -1 received  signal: Terminated from /sbin/init nofb  (PID: 1) UID: 0
Aug 31 18:10:38 cl-01 ceph-mds[17838]: 2019-08-31 18:10:38.908 7f2ec5ded700 -1 mds.cl-01 *** got signal Terminated ***
Aug 31 18:10:39 cl-01 ceph-mds[17838]: /root/sources/pve/ceph/ceph-14.2.2/src/include/elist.h: In function 'elist<T>::~elist() [with T = MDSIOContextBase*]' thread 7f2ec903e340 time 2019-08-31 18:10:39.383807
Aug 31 18:10:39 cl-01 ceph-mds[17838]: /root/sources/pve/ceph/ceph-14.2.2/src/include/elist.h: 91: FAILED ceph_assert(_head.empty())
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable)
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f2eca8e35c6]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  2: (()+0x28079e) [0x7f2eca8e379e]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  3: (()+0x3dcfb3) [0x557766e45fb3]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  4: (()+0x39d8c) [0x7f2ec9573d8c]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  5: (()+0x39eba) [0x7f2ec9573eba]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  6: (__libc_start_main()+0xf2) [0x7f2ec955e0a2]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  7: (_start()+0x2a) [0x557766ba58ca]
Aug 31 18:10:39 cl-01 ceph-mds[17838]: *** Caught signal (Segmentation fault) **
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  in thread 7f2ec903e340 thread_name:ceph-mds
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable)
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  1: (()+0x12730) [0x7f2ec9cac730]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  2: (__pthread_mutex_lock()+0) [0x7f2ec9ca46c0]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  3: (ceph::logging::Log::submit_entry(ceph::logging::Entry&&)+0x41) [0x7f2ecac560b1]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x24c) [0x7f2eca8e36c0]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  5: (()+0x28079e) [0x7f2eca8e379e]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  6: (()+0x3dcfb3) [0x557766e45fb3]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  7: (()+0x39d8c) [0x7f2ec9573d8c]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  8: (()+0x39eba) [0x7f2ec9573eba]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  9: (__libc_start_main()+0xf2) [0x7f2ec955e0a2]
Aug 31 18:10:39 cl-01 ceph-mds[17838]:  10: (_start()+0x2a) [0x557766ba58ca]
Aug 31 18:10:39 cl-01 systemd[1]: ceph-mds@cl-01.service: Main process exited, code=killed, status=11/SEGV
Aug 31 18:10:39 cl-01 systemd[1]: ceph-mds@cl-01.service: Failed with result 'signal'.
Aug 31 18:10:39 cl-01 systemd[1]: Stopped Ceph metadata server daemon.
Aug 31 18:11:39 cl-01 systemd[1]: Started Ceph metadata server daemon.
Aug 31 18:11:39 cl-01 ceph-mds[20064]: starting mds.cl-01 at
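As a side note (not part of the original post), a few standard commands that can help narrow down why an MDS never leaves up:creating; the daemon name mds.cl-01 is taken from the log above:

    ceph -s                        # overall cluster health; look for inactive/undersized PGs
    ceph fs status                 # per-rank MDS state for the new CephFS
    ceph health detail             # lists the exact PGs/pools that are blocked
    journalctl -u ceph-mds@cl-01   # full MDS daemon log on the node running mds.cl-01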

  • #1

One minor but important observation: please fix the documentation
on https://pve.proxmox.com/wiki/Manage_Ceph_Services_on_Proxmox_VE_Nodes
under
Destroy CephFS

The command
ceph rm fs NAME --yes-i-really-mean-it
should be
ceph fs rm NAME --yes-i-really-mean-it

However,
I have an up and running ceph but creating a cephfs doesn’t work.

pveversion
pve-manager/5.3-5/97ae681d (running kernel: 4.15.18-9-pve)

MDS is created
root@proxmox:~# ceph mds stat
, 1 up:standby

creating a cephfs

root@proxmox:~# pveceph fs create --pg_num 128 --add-storage
creating data pool 'cephfs_data'...
creating metadata pool 'cephfs_metadata'...
configuring new CephFS 'cephfs'
Successfully create CephFS 'cephfs'
Adding 'cephfs' to storage configuration...
Waiting for an MDS to become active
Waiting for an MDS to become active
Waiting for an MDS to become active
Waiting for an MDS to become active
Waiting for an MDS to become active
Waiting for an MDS to become active
Waiting for an MDS to become active
Waiting for an MDS to become active
Waiting for an MDS to become active
Waiting for an MDS to become active
Need MDS to add storage, but none got active!

root@proxmox:~# ceph mds stat
cephfs-1/1/1 up {0=proxmox=up:creating}

Does anyone have any idea, please?

Alwin

Proxmox Retired Staff


  • #2

ceph fs rm NAME --yes-i-really-mean-it

Correction is on the way, thanks for spotting.

root@proxmox:~# ceph mds stat
cephfs-1/1/1 up {0=proxmox=up:creating}

Is the cephfs now up? What does a ‘ceph fs status’ say?

  • #3

Unchanged since yesterday
root@proxmox:~# ceph mds stat
cephfs-1/1/1 up {0=proxmox=up:creating}

root@proxmox:~# ceph fs status
cephfs - 0 clients
======
+------+----------+---------+----------+-------+-------+
| Rank |  State   |   MDS   | Activity |  dns  |  inos |
+------+----------+---------+----------+-------+-------+
|  0   | creating | proxmox |          |   0   |   0   |
+------+----------+---------+----------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |    0  |  589G |
|   cephfs_data   |   data   |    0  |  589G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)

I'm trying to get my head around the correct PG counts. Can this be an issue?
cluster [WRN] Health check update: Degraded data redundancy: 108 pgs undersized (PG_DEGRADED)

Alwin

Proxmox Retired Staff


  • #4

I'm trying to get my head around the correct PG counts. Can this be an issue?
cluster [WRN] Health check update: Degraded data redundancy: 108 pgs undersized (PG_DEGRADED)

That may be why the MDS isn't starting up; you will find more information in the Ceph logs or in the journal. In any case, the pool replication issue needs to be solved. My guess is that there are not enough failure domains (host level by default) available for the replication.
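For example (a quick sketch assuming the default replicated pools and the standard host-level failure domain), you can compare the pools' replica count with the number of hosts that actually carry OSDs:

    ceph osd pool get cephfs_data size        # required number of replicas (3 by default)
    ceph osd pool get cephfs_metadata size
    ceph osd tree                             # how many hosts provide OSDs
    ceph osd crush rule dump                  # failure domain used by the CRUSH rule

If the pool size is larger than the number of available failure domains, the PGs stay undersized and the MDS cannot finish creating the file system.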

  • #5

I'm using 5.3-9 and I'm also trying to create a CephFS. I have one Ceph MDS and Ceph's health is OK; three monitors, as instructed, across eight nodes with two OSDs each.
root@blah:~# ceph mds stat
, 1 up:standby
root@blah:~# pveceph fs create --pg_num 128 --add-storage
creating data pool 'cephfs_data'...
mon_command failed - error parsing integer value '': Expected option value to be integer, got ''in''}

Any suggestions on how to win this battle?

Thanks

Alwin

Proxmox Retired Staff


  • #6

Which package versions are you on (pveversion -v)? Which pools exist on your cluster?
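For reference (not part of the original reply), that information can be gathered with:

    pveversion -v             # full package version list
    ceph osd pool ls detail   # existing pools with pg_num, size and application tags
    ceph df                   # per-pool usage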

  • #7

I can confirm the error wylde is getting on a new cluster I'm building; a previous installation did not give this error. We update the machines every week, and today many Ceph packages are about to be updated.

The cephfs_data pool is created; however, the metadata pool is not.

pveversion -v output is:

Code:

proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
pve-kernel-4.15: 5.3-2
pve-kernel-4.15.18-11-pve: 4.15.18-33
ceph: 12.2.11-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-46
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-38
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-2
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-46
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

Alwin

Proxmox Retired Staff


t.lamprecht


  • #10

FYI: this was fixed in pve-manager version 5.3-11, which at the time of writing is already publicly available through our pvetest repo.
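A minimal sketch of how the fixed package could be pulled in on a PVE 5.x (Debian stretch) node, assuming that using the test repository is acceptable in your environment:

    # /etc/apt/sources.list.d/pvetest.list
    deb http://download.proxmox.com/debian/pve stretch pvetest

    apt update
    apt full-upgrade    # pulls in pve-manager >= 5.3-11

Once the fix lands in the regular repositories, the pvetest entry can be removed again.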

As a storage administrator, you can troubleshoot the most common issues that can occur when using the Ceph Metadata Server (MDS). Some of the common errors that you might encounter:

  • An MDS node failure requiring a new MDS deployment.
  • An MDS node issue requiring redeployment of an MDS node.

6.1. Redeploying a Ceph MDS

Ceph Metadata Server (MDS) daemons are necessary for deploying a Ceph File System. If an MDS node in your cluster fails, you can redeploy a Ceph Metadata Server by removing an MDS server and adding a new or existing server. You can use the command-line interface or Ansible playbook to add or remove an MDS server.

6.1.1. Prerequisites

  • A running Red Hat Ceph Storage cluster.

6.1.2. Removing a Ceph MDS using Ansible

To remove a Ceph Metadata Server (MDS) using Ansible, use the shrink-mds playbook.

If there is no replacement MDS to take over once the MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an additional MDS before removing the MDS you would like to take offline.
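A quick way to confirm that a standby exists before shrinking (a small check, not part of the playbook itself):

    ceph mds stat      # for example: "cephfs:1 {0=node01=up:active} 1 up:standby"
    ceph fs status     # shows the active ranks and the standby list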

Prerequisites

  • At least one MDS node.
  • A running Red Hat Ceph Storage cluster deployed by Ansible.
  • Root or sudo access to an Ansible administration node.

Procedure

  1. Log in to the Ansible administration node.
  2. Change to the /usr/share/ceph-ansible directory:

    Example

    [ansible@admin ~]$ cd /usr/share/ceph-ansible

  3. Run the Ansible shrink-mds.yml playbook, and when prompted, type yes to confirm shrinking the cluster:

    Syntax

    ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=ID -i hosts

    Replace ID with the ID of the MDS node you want to remove. You can remove only one Ceph MDS each time the playbook runs.

    Example

    [ansible@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=node02 -i hosts

  4. As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and remove the MDS node under the [mdss] section:

    Syntax

    [mdss]
    MDS_NODE_NAME
    MDS_NODE_NAME

    Example

    [mdss]
    node01
    node03

    In this example, node02 was removed from the [mdss] list.

Verification

  • Check the status of the MDS daemons:

    Example

    [ansible@admin ceph-ansible]$ ceph fs dump
    
    [mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]
    
    Standby daemons:
    [mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

Additional Resources

  • For more information on installing Red Hat Ceph Storage, see the Red Hat Ceph Storage Installation Guide.
  • See the Adding a Ceph MDS using Ansible section in the Red Hat Ceph Storage Troubleshooting Guide for more details on adding an MDS using Ansible.

6.1.3. Removing a Ceph MDS using the command-line interface

You can manually remove a Ceph Metadata Server (MDS) using the command-line interface.

If there is no replacement MDS to take over once the current MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an MDS before removing the existing MDS.

Prerequisites

  • The ceph-common package is installed.
  • A running Red Hat Ceph Storage cluster.
  • Root or sudo access to the MDS nodes.

Procedure

  1. Log into the Ceph MDS node that you want to remove the MDS daemon from.
  2. Stop the Ceph MDS service:

    Syntax

    sudo systemctl stop ceph-mds@HOST_NAME

    Replace HOST_NAME with the short name of the host where the daemon is running.

    Example

    [admin@node02 ~]$ sudo systemctl stop ceph-mds@node02

  3. Disable the MDS service if you are not redeploying MDS to this node:

    Syntax

    sudo systemctl disable ceph-mds@HOST_NAME

    Replace HOST_NAME with the short name of the host to disable the daemon.

    Example

    [admin@node02 ~]$ sudo systemctl disable ceph-mds@node02

  4. Remove the /var/lib/ceph/mds/ceph-MDS_ID directory on the MDS node:

    Syntax

    sudo rm -fr /var/lib/ceph/mds/ceph-MDS_ID

    Replace MDS_ID with the ID of the MDS node that you want to remove the MDS daemon from.

    Example

    [admin@node02 ~]$ sudo rm -fr /var/lib/ceph/mds/ceph-node02
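    If the node is being decommissioned permanently, you may also want to remove the daemon's CephX key so the old daemon can no longer authenticate (an additional step not shown above; adjust the ID to your MDS and run it from a node that has the client.admin keyring):

    sudo ceph auth del mds.node02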

Verification

  • Check the status of the MDS daemons:

    Example

    [ansible@admin ceph-ansible]$ ceph fs dump
    
    [mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]
    
    Standby daemons:
    [mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

Additional Resources

  • For more information on installing Red Hat Ceph Storage, see the Red Hat Ceph Storage Installation Guide.
  • See the Adding a Ceph MDS using the command line interface section in the Red Hat Ceph Storage Troubleshooting Guide for more details on adding an MDS using the command line interface.

6.1.4. Adding a Ceph MDS using Ansible

Use the Ansible playbook to add a Ceph Metadata Server (MDS).

Prerequisites

  • A running Red Hat Ceph Storage cluster deployed by Ansible.
  • Root or sudo access to an Ansible administration node.
  • New or existing servers that can be provisioned as MDS nodes.

Procedure

  1. Log in to the Ansible administration node
  2. Change to the /usr/share/ceph-ansible directory:

    Example

    [ansible@admin ~]$ cd /usr/share/ceph-ansible

  3. As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and add the MDS node under the [mdss] section:

    Syntax

    [mdss]
    MDS_NODE_NAME
    NEW_MDS_NODE_NAME

    Replace NEW_MDS_NODE_NAME with the host name of the node where you want to install the MDS server.

    Alternatively, you can colocate the MDS daemon with the OSD daemon on one node by adding the same node under the [osds] and [mdss] sections.

    Example

    [mdss]
    node01
    node03

  4. As the ansible user, run the Ansible playbook to provision the MDS node:

    • Bare-metal deployments:

      [ansible@admin ceph-ansible]$ ansible-playbook site.yml --limit mdss -i hosts
    • Container deployments:

      [ansible@admin ceph-ansible]$ ansible-playbook site-container.yml --limit mdss -i hosts

      After the Ansible playbook has finished running, the new Ceph MDS node appears in the storage cluster.

Verification

  • Check the status of the MDS daemons:

    Example

    [ansible@admin ceph-ansible]$ ceph fs dump
    
    [mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]
    
    Standby daemons:
    [mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

  • Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:

    Example

    [ansible@admin ceph-ansible]$ ceph mds stat
    cephfs:1 {0=node01=up:active} 1 up:standby

Additional Resources

  • For more information on installing Red Hat Ceph Storage, see the Red Hat Ceph Storage Installation Guide.
  • See the Removing a Ceph MDS using Ansible section in the Red Hat Ceph Storage Troubleshooting Guide for more details on removing an MDS using Ansible.

6.1.5. Adding a Ceph MDS using the command-line interface

You can manually add a Ceph Metadata Server (MDS) using the command-line interface.

Prerequisites

  • The ceph-common package is installed.
  • A running Red Hat Ceph Storage cluster.
  • Root or sudo access to the MDS nodes.
  • New or existing servers that can be provisioned as MDS nodes.

Procedure

  1. Add a new MDS node by logging in to the node and creating the MDS data directory:

    Syntax

    sudo mkdir /var/lib/ceph/mds/ceph-MDS_ID

    Replace MDS_ID with the ID of the MDS node that you want to add the MDS daemon to.

    Example

    [admin@node03 ~]$ sudo mkdir /var/lib/ceph/mds/ceph-node03

  2. If this is a new MDS node, create the authentication key if you are using Cephx authentication:

    Syntax

    sudo ceph auth get-or-create mds.MDS_ID mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-MDS_ID/keyring

    Replace MDS_ID with the ID of the MDS node to deploy the MDS daemon on.

    Example

    [admin@node03 ~]$ sudo ceph auth get-or-create mds.node03 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-node03/keyring

    Cephx authentication is enabled by default. See the Cephx authentication link in the Additional Resources section for more information about Cephx authentication.

  3. Start the MDS daemon:

    Syntax

    sudo systemctl start ceph-mds@HOST_NAME

    Replace HOST_NAME with the short name of the host to start the daemon.

    Example

    [admin@node03 ~]$ sudo systemctl start ceph-mds@node03

  4. Enable the MDS service:

    Syntax

    sudo systemctl enable ceph-mds@HOST_NAME

    Replace HOST_NAME with the short name of the host to enable the service.

    Example

    [admin@node03 ~]$ sudo systemctl enable ceph-mds@node03
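    One detail worth checking on package-based installs (an assumption about this environment, not a step from the original procedure): the data directory and keyring created above as root must be readable by the user the MDS daemon runs as, usually ceph:

    [admin@node03 ~]$ sudo chown -R ceph:ceph /var/lib/ceph/mds/ceph-node03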

Verification

  • Check the status of the MDS daemons:

    Example

    [admin@mon]$ ceph fs dump
    
    [mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]
    
    Standby daemons:
    [mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

  • Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:

    Example

    [ansible@admin ceph-ansible]$ ceph mds stat
    cephfs:1 {0=node01=up:active} 1 up:standby

Additional Resources

  • For more information on installing Red Hat Ceph Storage, see the Red Hat Ceph Storage Installation Guide.
  • For more information on Cephx authentication, see the Red Hat Ceph Storage Configuration Guide.
  • See the Removing a Ceph MDS using the command line interface section in the Red Hat Ceph Storage Troubleshooting Guide for more details on removing an MDS using the command line interface.

I am a graduate student at the University of Texas at Dallas.

Currently I am working on a cloud project which requires installation of a Ceph cluster.

Our group is in a bit of a bind and needs your expertise on this project. We already followed the official documentation for installing the Ceph cluster with monitor nodes and OSDs, but we are really struggling while adding the metadata server.

It would be great if you could help us with this task. We tried adding an MDS using ceph-deploy mds create, but we are getting the following error:

> [root@ceph-mon1 ceph]# ceph-deploy --overwrite-conf mds create ceph-mds2
> [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.28): /usr/bin/ceph-deploy --overwrite-conf mds create ceph-mds2
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username                      : None
> [ceph_deploy.cli][INFO  ]  verbose                       : False
> [ceph_deploy.cli][INFO  ]  overwrite_conf                : True
> [ceph_deploy.cli][INFO  ]  subcommand                    : create
> [ceph_deploy.cli][INFO  ]  quiet                         : False
> [ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f226df6a170>
> [ceph_deploy.cli][INFO  ]  cluster                       : ceph
> [ceph_deploy.cli][INFO  ]  func                          : <function mds at 0x7f226e7f88c0>
> [ceph_deploy.cli][INFO  ]  ceph_conf                     : None
> [ceph_deploy.cli][INFO  ]  mds                           : [('ceph-mds2', 'ceph-mds2')]
> [ceph_deploy.cli][INFO  ]  default_release               : False
> [ceph_deploy.mds][DEBUG ] Deploying mds, cluster ceph hosts ceph-mds2:ceph-mds2
> [ceph-mds2][DEBUG ] connected to host: ceph-mds2 
> [ceph-mds2][DEBUG ] detect platform information from remote host
> [ceph-mds2][DEBUG ] detect machine type
> [ceph_deploy.mds][INFO  ] Distro info: CentOS Linux 7.1.1503 Core
> [ceph_deploy.mds][DEBUG ] remote host will use sysvinit
> [ceph_deploy.mds][DEBUG ] deploying mds bootstrap to ceph-mds2
> [ceph-mds2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> [ceph-mds2][DEBUG ] create path if it doesn't exist
> [ceph-mds2][INFO  ] Running command: ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.ceph-mds2 osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-ceph-mds2/keyring
> [ceph-mds2][ERROR ] 2015-11-14 16:35:13.599807 7fc8dfea5700  0 librados: client.bootstrap-mds authentication error (1) Operation not permitted
> [ceph-mds2][ERROR ] Error connecting to cluster: PermissionError
> [ceph-mds2][ERROR ] exit code from command was: 1
> [ceph_deploy.mds][ERROR ] could not create mds
> [ceph_deploy][ERROR ] GenericError: Failed to create 1 MDSs

I'm also attaching our ceph.conf for reference.

ceph.conf:

> [global]
> fsid = f1fccdcf-f53f-4a6d-93f2-002f392e98da
> mon_initial_members = ceph-mon1
> mon_host = 192.168.122.213
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> public network = 192.168.122.0/256
> 
> #[global]
> #auth_service_required = cephx
> #auth_client_required = cephx
> #auth_cluster_required = cephx
> 
> [mon.ceph-mon1]
> mon_addr = 192.168.122.213:6789
> host = ceph-mon1
> 
> [osd]
> filestore_xattr_use_omap = true
> osd_data = /var/lib/ceph/osd/$cluster-$id
> osd_journal_size = 1024
> 
> [mds]
> #mds data = /var/lib/ceph/mds/mds.$id
> #keyring = /etc/ceph/keyring.$id
> debug mon = 20
> debug paxos = 20
> debug auth = 20
> 
> [mds.ceph-mds2]
> host = ceph-mds2                 

I will be really grateful if you can help us resolve this issue.

Best Regards

UTDCloudgroupB
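The "client.bootstrap-mds authentication error (1) Operation not permitted" above usually means that the bootstrap-mds keyring on the MDS host does not match the key the monitors expect. A hedged sketch of how this is commonly checked and refreshed with ceph-deploy (hostnames taken from the post; exact steps may differ for your deployment):

    # on the monitor/admin node: the key the cluster expects
    ceph auth get client.bootstrap-mds

    # on the MDS host: the key the bootstrap client actually presents
    cat /var/lib/ceph/bootstrap-mds/ceph.keyring

    # if they differ, re-fetch the bootstrap keys and retry the deployment
    ceph-deploy gatherkeys ceph-mon1
    ceph-deploy --overwrite-conf mds create ceph-mds2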

Bug Report

What happened:
After deploying, I tried to mount CephFS using ceph-fuse, but it complained about not having an MDS. The MDS appears to be stuck in the 'creating' state. After wiping everything and deploying again, it appears that the MDS can't complete its creating task.

The status reports that there is reduced availability, but I suspect that this is due to the issue with the MDS:

cluster:
id: 43847e46-e422-478f-8c92-ba68e9b30b8b
health: HEALTH_WARN
mon ubuntu is low on available space
Reduced data availability: 65 pgs inactive
Degraded data redundancy: 65 pgs undersized

services:
mon: 1 daemons, quorum ubuntu (age 40m)
mgr: ubuntu(active, since 40m)
mds: cephfs:1 {0=ubuntu=up:creating}
osd: 9 osds: 9 up (since 39m), 9 in (since 39m)

data:
pools: 3 pools, 65 pgs
objects: 0 objects, 0 B
usage: 9.5 GiB used, 98 GiB / 108 GiB avail
pgs: 100.000% pgs not active
65 undersized+peered

The health detail reports similarly:
HEALTH_WARN mon ubuntu is low on available space; Reduced data availability: 65 pgs inactive; Degraded data redundancy: 65 pgs undersized
[WRN] MON_DISK_LOW: mon ubuntu is low on available space
mon.ubuntu has 23% avail
[WRN] PG_AVAILABILITY: Reduced data availability: 65 pgs inactive
pg 1.0 is stuck inactive for 42m, current state undersized+peered, last acting [8]
pg 2.0 is stuck inactive for 41m, current state undersized+peered, last acting [3]
pg 2.1 is stuck inactive for 41m, current state undersized+peered, last acting [7]
pg 2.2 is stuck inactive for 41m, current state undersized+peered, last acting [5]
pg 2.3 is stuck inactive for 41m, current state undersized+peered, last acting [5]
pg 2.4 is stuck inactive for 41m, current state undersized+peered, last acting [1]
pg 2.5 is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 2.6 is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 2.7 is stuck inactive for 41m, current state undersized+peered, last acting [6]
pg 2.8 is stuck inactive for 41m, current state undersized+peered, last acting [0]
pg 2.9 is stuck inactive for 41m, current state undersized+peered, last acting [4]
pg 2.a is stuck inactive for 41m, current state undersized+peered, last acting [1]
pg 2.b is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 2.c is stuck inactive for 41m, current state undersized+peered, last acting [2]
pg 2.d is stuck inactive for 41m, current state undersized+peered, last acting [5]
pg 2.e is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 2.f is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 2.10 is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 2.11 is stuck inactive for 41m, current state undersized+peered, last acting [0]
pg 2.12 is stuck inactive for 41m, current state undersized+peered, last acting [3]
pg 2.13 is stuck inactive for 41m, current state undersized+peered, last acting [0]
pg 2.14 is stuck inactive for 41m, current state undersized+peered, last acting [2]
pg 2.15 is stuck inactive for 41m, current state undersized+peered, last acting [1]
pg 2.16 is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 2.17 is stuck inactive for 41m, current state undersized+peered, last acting [6]
pg 2.18 is stuck inactive for 41m, current state undersized+peered, last acting [5]
pg 3.0 is stuck inactive for 41m, current state undersized+peered, last acting [1]
pg 3.1 is stuck inactive for 41m, current state undersized+peered, last acting [6]
pg 3.2 is stuck inactive for 41m, current state undersized+peered, last acting [3]
pg 3.3 is stuck inactive for 41m, current state undersized+peered, last acting [4]
pg 3.4 is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 3.5 is stuck inactive for 41m, current state undersized+peered, last acting [5]
pg 3.6 is stuck inactive for 41m, current state undersized+peered, last acting [0]
pg 3.7 is stuck inactive for 41m, current state undersized+peered, last acting [4]
pg 3.8 is stuck inactive for 41m, current state undersized+peered, last acting [4]
pg 3.9 is stuck inactive for 41m, current state undersized+peered, last acting [4]
pg 3.a is stuck inactive for 41m, current state undersized+peered, last acting [6]
pg 3.b is stuck inactive for 41m, current state undersized+peered, last acting [7]
pg 3.c is stuck inactive for 41m, current state undersized+peered, last acting [5]
pg 3.d is stuck inactive for 41m, current state undersized+peered, last acting [7]
pg 3.e is stuck inactive for 41m, current state undersized+peered, last acting [7]
pg 3.f is stuck inactive for 41m, current state undersized+peered, last acting [7]
pg 3.10 is stuck inactive for 41m, current state undersized+peered, last acting [6]
pg 3.11 is stuck inactive for 41m, current state undersized+peered, last acting [3]
pg 3.12 is stuck inactive for 41m, current state undersized+peered, last acting [6]
pg 3.13 is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 3.14 is stuck inactive for 41m, current state undersized+peered, last acting [3]
pg 3.15 is stuck inactive for 41m, current state undersized+peered, last acting [8]
pg 3.16 is stuck inactive for 41m, current state undersized+peered, last acting [2]
pg 3.17 is stuck inactive for 41m, current state undersized+peered, last acting [0]
pg 3.19 is stuck inactive for 41m, current state undersized+peered, last acting [8]
[WRN] PG_DEGRADED: Degraded data redundancy: 65 pgs undersized
pg 1.0 is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 2.0 is stuck undersized for 41m, current state undersized+peered, last acting [3]
pg 2.1 is stuck undersized for 41m, current state undersized+peered, last acting [7]
pg 2.2 is stuck undersized for 41m, current state undersized+peered, last acting [5]
pg 2.3 is stuck undersized for 41m, current state undersized+peered, last acting [5]
pg 2.4 is stuck undersized for 41m, current state undersized+peered, last acting [1]
pg 2.5 is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 2.6 is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 2.7 is stuck undersized for 41m, current state undersized+peered, last acting [6]
pg 2.8 is stuck undersized for 41m, current state undersized+peered, last acting [0]
pg 2.9 is stuck undersized for 41m, current state undersized+peered, last acting [4]
pg 2.a is stuck undersized for 41m, current state undersized+peered, last acting [1]
pg 2.b is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 2.c is stuck undersized for 41m, current state undersized+peered, last acting [2]
pg 2.d is stuck undersized for 41m, current state undersized+peered, last acting [5]
pg 2.e is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 2.f is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 2.10 is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 2.11 is stuck undersized for 41m, current state undersized+peered, last acting [0]
pg 2.12 is stuck undersized for 41m, current state undersized+peered, last acting [3]
pg 2.13 is stuck undersized for 41m, current state undersized+peered, last acting [0]
pg 2.14 is stuck undersized for 41m, current state undersized+peered, last acting [2]
pg 2.15 is stuck undersized for 41m, current state undersized+peered, last acting [1]
pg 2.16 is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 2.17 is stuck undersized for 41m, current state undersized+peered, last acting [6]
pg 2.18 is stuck undersized for 41m, current state undersized+peered, last acting [5]
pg 3.0 is stuck undersized for 41m, current state undersized+peered, last acting [1]
pg 3.1 is stuck undersized for 41m, current state undersized+peered, last acting [6]
pg 3.2 is stuck undersized for 41m, current state undersized+peered, last acting [3]
pg 3.3 is stuck undersized for 41m, current state undersized+peered, last acting [4]
pg 3.4 is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 3.5 is stuck undersized for 41m, current state undersized+peered, last acting [5]
pg 3.6 is stuck undersized for 41m, current state undersized+peered, last acting [0]
pg 3.7 is stuck undersized for 41m, current state undersized+peered, last acting [4]
pg 3.8 is stuck undersized for 41m, current state undersized+peered, last acting [4]
pg 3.9 is stuck undersized for 41m, current state undersized+peered, last acting [4]
pg 3.a is stuck undersized for 41m, current state undersized+peered, last acting [6]
pg 3.b is stuck undersized for 41m, current state undersized+peered, last acting [7]
pg 3.c is stuck undersized for 41m, current state undersized+peered, last acting [5]
pg 3.d is stuck undersized for 41m, current state undersized+peered, last acting [7]
pg 3.e is stuck undersized for 41m, current state undersized+peered, last acting [7]
pg 3.f is stuck undersized for 41m, current state undersized+peered, last acting [7]
pg 3.10 is stuck undersized for 41m, current state undersized+peered, last acting [6]
pg 3.11 is stuck undersized for 41m, current state undersized+peered, last acting [3]
pg 3.12 is stuck undersized for 41m, current state undersized+peered, last acting [6]
pg 3.13 is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 3.14 is stuck undersized for 41m, current state undersized+peered, last acting [3]
pg 3.15 is stuck undersized for 41m, current state undersized+peered, last acting [8]
pg 3.16 is stuck undersized for 41m, current state undersized+peered, last acting [2]
pg 3.17 is stuck undersized for 41m, current state undersized+peered, last acting [0]
pg 3.19 is stuck undersized for 41m, current state undersized+peered, last acting [8]

What you expected to happen:
I expect the MDSs to complete, or issue some warning as to why they cannot make forward progress

How to reproduce it (minimal and precise):

all.yml:
mon_group_name: mons
osd_group_name: osds
mds_group_name: mdss
mgr_group_name: mgrs
monitoring_group_name: monitoring
ceph_origin: repository
ceph_stable_release: octopus
generate_fsid: true
ceph_conf_key_directory: /etc/ceph
ceph_keyring_permissions: '0600'
cephx: true
monitor_interface: ens192
journal_size: 512
public_network: 192.168.0.0/16
dashboard_enabled: False

osds.yml:
copy_admin_key: true
devices:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd

mdss.yml:
copy_admin_key: true

mons.yml:
mon_group_name: mons

clients.yml:
copy_admin_key: true

inventory:
[clients]
Client_0 ansible_host=192.168.19.15 ansible_python_interpreter=/usr/bin/python3

[mdss]
MDS_0 ansible_host=192.168.13.15 ansible_python_interpreter=/usr/bin/python3
MDS_1 ansible_host=192.168.14.15 ansible_python_interpreter=/usr/bin/python3
MDS_2 ansible_host=192.168.15.15 ansible_python_interpreter=/usr/bin/python3

[monitoring]
Mon_0 ansible_host=192.168.16.15 ansible_python_interpreter=/usr/bin/python3
Mon_1 ansible_host=192.168.17.15 ansible_python_interpreter=/usr/bin/python3
Mon_2 ansible_host=192.168.18.15 ansible_python_interpreter=/usr/bin/python3

[mons]
Mon_0 ansible_host=192.168.16.15 ansible_python_interpreter=/usr/bin/python3
Mon_1 ansible_host=192.168.17.15 ansible_python_interpreter=/usr/bin/python3
Mon_2 ansible_host=192.168.18.15 ansible_python_interpreter=/usr/bin/python3

[osds]
OSD_0 ansible_host=192.168.10.15 ansible_python_interpreter=/usr/bin/python3
OSD_1 ansible_host=192.168.11.15 ansible_python_interpreter=/usr/bin/python3
OSD_2 ansible_host=192.168.12.15 ansible_python_interpreter=/usr/bin/python3

ansible.log

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 18.04.5 LTS
  • Kernel (e.g. uname -a): Linux ubuntu 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Ansible version (e.g. ansible-playbook --version): ansible-playbook 2.9.0
    config file = /etc/ansible/ansible.cfg
    configured module search path = [‘/home/blue/.ansible/plugins/modules’, ‘/usr/share/ansible/plugins/modules’]
    ansible python module location = /home/blue/.local/lib/python3.6/site-packages/ansible
    executable location = /home/blue/.local/bin/ansible-playbook
    python version = 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
  • ceph-ansible version (e.g. git head or tag or stable branch): master, 54ba38e
  • Ceph version (e.g. ceph -v): ceph version 15.2.5 octopus stable
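A likely reason for the stuck up:creating state (an interpretation of the PG warnings above, not something confirmed in the report): every PG is undersized, so the default replicated CRUSH rule cannot place the required number of copies across distinct hosts, and the MDS can never write its initial metadata objects. A sketch of how this could be confirmed and, for a throw-away test cluster only, worked around (the rule name rep_osd is arbitrary):

    ceph osd pool ls detail           # per-pool size/min_size and crush_rule
    ceph osd crush rule dump          # failure domain of the rule (usually "host")
    ceph osd tree                     # how the OSDs are spread across hosts

    # test-cluster-only workaround: replicate across OSDs instead of hosts
    ceph osd crush rule create-replicated rep_osd default osd
    ceph osd pool set cephfs_data crush_rule rep_osd
    ceph osd pool set cephfs_metadata crush_rule rep_osd

Once the PGs go active+clean, the MDS should be able to leave up:creating on its own.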

We all love it when our computers are running seamlessly with little or no annoying issue. However, like every other technology product, we sometimes encounter these problems that often get us sad. 

These issues range from our hardware to the software. On rare occasions, we encounter software problems that affect the hardware, such as the mds_stores on Mac and high CPU consumption. 

When this error occurs, that seamless surfing that you enjoyed ceases, and your machine severely slows down. Most times, your system overheats and places more pressure on your cooling fan.

If you are at this juncture, don’t fret, this article will show you how to solve that annoying problem, but first, let’s get to know more about it. 

What is the mds_stores process? 

The mds stands for Metadata Server; it is a background process that is closely associated with the Mac's Spotlight. No doubt, mds_stores was created for good, but sometimes it does more than is expected, resulting in 90%-100% CPU consumption, which makes your Mac slow down unnecessarily.

How this happens: Spotlight, by default, makes searching on your Mac easy; with the help of the mds, mds_stores, and mdworker processes, it indexes all the data on your hard drive. Hence, you easily find whatever you want with a simple Spotlight search.

Why does it take long to index? 


We can agree that Spotlight is a unique and essential feature of Mac, so allowing it to do its job is ideal. However, it’s annoying when a supposed background process stops you from doing anything, so will we let it stop us from using our computer? No. 

By default, the indexing process should take between 15 and 45 minutes; its duration depends on a few factors, which you will see below. You can confirm whether MDS and Spotlight are indexing by checking the Spotlight icon, often located at the upper right corner of the menu bar. A magnifying glass represents Spotlight.

If you find a dot at the center of the Spotlight icon, you know indexing is going on. When you click on it, Mac will show you the index duration. 

What makes the mds_stores on Mac to consume a high CPU? 

You can configure the Spotlight index to start manually or automatically. However, it usually comes on when you restart your computer. The spike is often due to large deletion or addition of data on your hard drive. 

You could be having this considerable spike when Spotlight is indexing locations and folders you likely won’t need, such as your TimeMachine’s backup. It could also increase CPU usage when an app or multiple apps are continually changing some folder contents. 

Other factors include:

  • Hard disk size
  • Size of your directories
  • Last date Spotlight indexed your device
  • Your Mac’s speed

You can confirm the high CPU usage by launching Activity Monitor, where you should see the amount of resources mds_stores uses. Then proceed to the solutions below.
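For instance (not from the original article, just a quick alternative check), the same information is visible from Terminal:

    ps aux | grep -i mds_stores    # shows the process and its current %CPU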

How to fix the mds_stores process error

The earlier you tackle the issue, the better; when your CPU resources are overly consumed, your computer might become unusable: it might freeze, malfunction, hang, and fail to respond to commands. If it persists, your Mac overheats and the fan keeps running, which isn't advisable. You can either:

  1. Reboot your Mac
  2. Disable Spotlight
  3. Temporarily stop and restart Spotlight
  4. Reduce Spotlight indexing data

Most of the solutions here require Administrator permission and will prompt for your admin password. If you are currently signed in with a guest account, you should switch to the Administrator account.

 1. Reboot your Mac


Rebooting has been known to fix many issues, including the mds_stores process issue on Mac. If you notice any of the signs of the mds_stores error and can confirm it in Activity Monitor, your first resolution should be to reboot your computer.

Follow the usual steps you take to restart your machine. When you are back in, check Activity Monitor to see whether the huge resource consumption has dropped; if the error persists, go to solution two.

 2. Disable Spotlight 


Since mds, mds_stores, and mdworker process are all background processes orchestrated by Spotlight, simply disabling Spotlight should solve the issue. 

  1. Locate and open the Finder app from your computer’s Desktop
  2. Select the Applications folder to expand it. 
  3. Go to Utilities to access the apps in the utility category. 
  4. Click on the Terminal app to launch it. 
  5. Input the following command in the Terminal: sudo mdutil -a -i off, then strike the Enter/Return key. 
  6. When prompted, input your Admin Password. 
  7. Hit the Enter/Return key again to proceed. 
  8. Check if the error persists. 

To enable Spotlight, follow steps 1 to 5 above, but this time input sudo mdutil -a -i on, then follow all on-screen instructions. 

 3. Temporarily Stop and Restart Spotlight


You can temporarily stop Spotlight from running; this is different from the method above, but it still involves Spotlight. 

  1. Launch your Finder app and expand the Applications section. 
  2. Access the Utility apps and select Terminal
  3. Input the following command: sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.metadata.mds.plist and hit Enter
  4. Input your Admin password in the field provided and strike Enter
  5. Restart your Mac and confirm if the error persists. 

To restart Spotlight 

  1. Launch Finder app > Applications > Utilities and select Terminal
  2. Type in sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.metadata.mds.plist, then hit Enter
  3. Input your Admin Password in the field provided. 
  4. Go ahead and restart your machine and check the Activity Monitor if there’s an improvement. 

 4. Reduce Spotlight indexing data


If, after trying all the solutions above, the error persists, we should reduce the amount of data Spotlight indexes. What we will do here is remove some folders and locations from Spotlight indexing, but first, we will: 

  1. Delete Spotlight database 
  2. Re-index the Spotlight database 
  3. Prevent Spotlight from indexing some locations and folders

This may take some time, but it will prevent any future issues. 

 I. Delete Spotlight Database

 1. Open your Finder app and expand the Applications section. 

 2. Select Utilities and launch Terminal

 3. Input the command sudo rm -rf /.Spotlight-V100/ and hit Enter

 4. When asked, type in your Admin password, then hit Enter to proceed. That command deletes the Spotlight database. 

II. Re-index the Spotlight Database

After deleting your Spotlight database, you should restart your Mac and allow Spotlight to reindex the data in your hard drive. 
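Alternatively (an assumption, not a step from the original article), mdutil can erase and rebuild the index in one step, which avoids removing the /.Spotlight-V100 directory by hand:

    sudo mdutil -E /    # erase and rebuild the Spotlight index for the root volume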

III. Prevent Spotlight from Indexing Some Locations and Folders


As we noted, the massive consumption of resources could also be due to mds_stores searching locations and folders irrelevant to you.

  1. Go to the Desktop, then open System Preferences.
  2. Search for Spotlight and navigate to the Privacy tab.
  3. In the Privacy tab, drag and drop the folders and locations you would like Spotlight to skip. 

A few folders you should consider are:

  • CrashPlan
  • Dropbox
  • CrashReporter
  • MobileSync
  • Timemachine backups
  • Viber PC

When you do this, Spotlight won't consume unnecessary resources indexing these folders and locations again, and the CPU workload will drop. 

Conclusion 

Though we don't plan for any of these errors, we can often solve them ourselves without taking our Mac to an Apple shop. The mds_stores process on Mac is essential to our devices. 

However, when it starts consuming too many CPU resources, we may need to reboot the Mac, disable Spotlight, temporarily stop and restart Spotlight, or reduce the Spotlight index data. Please follow the instructions carefully and share with us how you solved the error. Thank you for stopping by. 


Contents

  1. TASK ERROR: start failed
  2. Mellgood
  3. How to Unlock a Proxmox VM
  4. Proxmox Locked VM Errors
  5. Error: VM is locked
  6. can’t lock file ‘/var/lock/qemu-server/lock- .conf’ – got timeout
  7. Proxmox Unlock VM Methods
  8. qm unlock
  9. Manually Deleting the Lock
  10. TASK ERROR: VM is locked (backup)
  11. jmjosebest
  12. jmjosebest
  13. PC Rescue
  14. RollMops
  15. fabian
  16. RollMops
  17. fabian
  18. RollMops
  19. TASK ERROR: VM is locked (backup)
  20. Can’t unlock vm
  21. mmohamed2
  22. t.lamprecht
  23. mmohamed2

TASK ERROR: start failed

Mellgood

New Member

Hi, after a failed clone and a Proxmox host restart I got this error:
TASK ERROR: VM is locked (clone)

I tried:
qm unlock 106

but now I got another error:

kvm: -drive file=/var/lib/vz/images/106/vm-106-disk-1.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=native,detect-zeroes=on: qcow2: Image is corrupt; cannot be opened read/write
TASK ERROR: start failed: command '/usr/bin/kvm -id 106 -chardev 'socket,id=qmp,path=/var/run/qemu-server/106.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/106.pid -daemonize -smbios 'type=1,uuid=24d6521e-95c7-463e-97fe-e79e16051387' -name TMP -smp '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/106.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 10000 -k it -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -chardev 'socket,path=/var/run/qemu-server/106.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:3681fcbb6821' -drive 'file=/var/lib/vz/template/iso/debian-9.4.0-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/var/lib/vz/images/106/vm-106-disk-1.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap106i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=86:CA:F0:6E:FB:EB,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1

Is it possible to recover the corrupted image?
Thanks in advance.
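A sketch of how a corrupt qcow2 image is usually inspected and repaired (work on a copy first; the path is taken from the error above, and a repair is not guaranteed to recover all data):

    cp /var/lib/vz/images/106/vm-106-disk-1.qcow2 /root/vm-106-disk-1.qcow2.bak
    qemu-img check /var/lib/vz/images/106/vm-106-disk-1.qcow2           # report errors only
    qemu-img check -r all /var/lib/vz/images/106/vm-106-disk-1.qcow2    # attempt to repair leaks and errors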


How to Unlock a Proxmox VM

From time to time, you’ll come across the need to kill a lock on your Proxmox server. Fear not, in today’s guide we’ll discuss the various lock errors you may face and how to unlock a Proxmox VM.

Proxmox Locked VM Errors

Error: VM is locked

The "VM is locked" error is the most common circumstance in which you may want to kill a VM lock. This error has many variants, including:

  • Error: VM is locked (backup)
  • Error: VM is locked (snapshot)
  • Error: VM is locked (clone)

Error: VM is locked in Proxmox

As you can see, they all share the same "Error: VM is locked" root, but with a suffix that indicates the task that initiated the lock, whether that task is a backup, a snapshot, or a clone. This can be useful for determining whether you should clear the lock (i.e. if the backup job is still running, you probably shouldn't clear the lock; just let the backup process complete).

can't lock file '/var/lock/qemu-server/lock- .conf' - got timeout

This is another common error, often seen when you're trying to shut down or stop a virtual machine, or when qm unlock fails (see below).

Proxmox Unlock VM Methods

There are two main ways of clearing a lock on a Proxmox VM: 1) using the qm unlock command and 2) manually deleting the lock.

qm unlock

qm unlock should be your first choice for unlocking a Proxmox VM.

First, find your VM ID (it's the number next to your VM in the Proxmox GUI). If you're not using the WebGUI, you can obtain a list of your VM IDs with qm list.

Unlock the VM/kill the lock with qm unlock <VMID>, substituting your VM's ID.

Now, this may not always work, and the command may fail with the "can't lock file ... got timeout" error described below.

In that case, let's move on to plan B: manually deleting the lock.

Manually Deleting the Lock

If you received the error message "can't lock file '/var/lock/qemu-server/lock- .conf' - got timeout", you can fix this by manually deleting the lock at that location. Simply put, you can just run rm /var/lock/qemu-server/lock-<VMID>.conf (replace <VMID> with your VM's ID).

Obviously, this should be a last resort. It’s generally not a great practice to go around killing locks, but sometimes you have no choice.

I hope this guide helped you out. Let me know if you have any questions or feel I left something out in the comments/forums below!


TASK ERROR: VM is locked (backup)

jmjosebest

Active Member

Hi, after a full host restart I get this error when starting a KVM virtual machine.

TASK ERROR: VM is locked (backup)

/var/lib/vz/lock is empty.
How can I unlock?
Thanks

jmjosebest

Active Member

PC Rescue

New Member

RollMops

Active Member

fabian

Proxmox Staff Member

Best regards,
Fabian


RollMops

Active Member

fabian

Proxmox Staff Member

Check the system logs around the time of the failed backup, try manually backing up the VM a couple of times, and check for failures.

It might have been a one-time fluke.

Best regards,
Fabian


RollMops

Active Member

I can provoke a system crash by manually starting a backup of the VE that I usually find in a locked state.
After 1 or 2 minutes, the z_wr_iss process takes 97% CPU and subsequently the server fails. It then automatically reboots, but the VE is still locked from the pending/failed backup process and does not automatically recover.
I attach a screenshot of the last state seen in "top".
Could this be prevented by adding some RAM to the box?
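z_wr_iss is a ZFS write worker thread, so this looks like memory pressure during the backup rather than a locking problem as such. Adding RAM would help, but a common first mitigation is to cap the ZFS ARC (a hedged sketch; the value is only an example, size it for your box):

    # /etc/modprobe.d/zfs.conf - limit the ARC to e.g. 4 GiB
    options zfs zfs_arc_max=4294967296

    update-initramfs -u    # then reboot so the limit takes effect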


TASK ERROR: VM is locked (backup)

Active Member

Occasionally, usually every 2-3 days (I have nightly backups set up), my LXC containers freeze; 'pct unlock' won't unlock them and I have to reboot the nodes.

This appears to be a common problem with LXC (I wish I had stayed with OpenVZ). Is there any solution to this? I'm backing up to LVM storage.

root@rovio:/media# pvdisplay ; lvdisplay ; mount
--- Physical volume ---
PV Name /dev/sdb3
VG Name pve
PV Size 447.00 GiB / not usable 3.84 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 114432
Free PE 4095
Allocated PE 110337
PV UUID mK0CnO-c5LV-kPJs-E6aw-Fupo-G0qg-OaPfHE

--- Physical volume ---
PV Name /dev/md0
VG Name store
PV Size 931.39 GiB / not usable 4.69 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 238434
Free PE 0
Allocated PE 238434
PV UUID fU89Zd-gihP-B5ta-Wzk5-7UsT-Uju0-C2AExy

--- Logical volume ---
LV Path /dev/pve/swap
LV Name swap
VG Name pve
LV UUID GGcRZ9-Q12y-FLsv-n0ds-ZEO9-9yaU-60A2i8
LV Write Access read/write
LV Creation host, time proxmox, 2016-01-11 06:53:53 -0500
LV Status available
# open 2
LV Size 31.00 GiB
Current LE 7936
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:1

--- Logical volume ---
LV Path /dev/pve/root
LV Name root
VG Name pve
LV UUID mw530S-A1SO-BCIR-Taad-ORid-1nL7-wJaIEC
LV Write Access read/write
LV Creation host, time proxmox, 2016-01-11 06:53:53 -0500
LV Status available
# open 1
LV Size 96.00 GiB
Current LE 24576
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:0

--- Logical volume ---
LV Path /dev/pve/data
LV Name data
VG Name pve
LV UUID I7NH4F-e3VY-7z0R-RuVK-kOos-1WB3-QOWH7C
LV Write Access read/write
LV Creation host, time proxmox, 2016-01-11 06:53:53 -0500
LV Status available
# open 1
LV Size 304.00 GiB
Current LE 77825
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:2

--- Logical volume ---
LV Path /dev/store/store
LV Name store
VG Name store
LV UUID ixRcPS-6wIf-k58J-ubbx-ksJx-ABdi-giLfKe
LV Write Access read/write
LV Creation host, time rovio, 2016-01-29 19:07:14 -0500
LV Status available
# open 1
LV Size 931.38 GiB
Current LE 238434
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:3


Can’t unlock vm

mmohamed2

New Member

I have errors when I try to back up.

I tried:

  • qm unlock 129

but I got:

  • unable to open file '/etc/pve/nodes/proxmoxtiebosch/qemu-server/129.conf.tmp.22475' - Input/output error

Also, I can't create a folder.

How can I fix it, please?
Thank you!

t.lamprecht

Proxmox Staff Member

First: you're on a pretty outdated version of Proxmox VE; the current version of the old 5.x release is 5.4, and 6.1 is already available.

That said, it seems like the cluster file system hangs, which could be due to two things:

* some bug/situation which made the pmxcfs lock up

* a real hardware error, e.g. a faulty disk holding the pmxcfs database, or something similar.

Please post some additional info:

Best regards,
Thomas


mmohamed2

New Member

Thank you for the reply.

Can you guide me through upgrading Proxmox VE, please?

# ps aux|grep pmxcfs
root 4361 0.1 0.1 1136680 59048 ? Ssl Apr08 4:14 /usr/bin/pmxcfs
root 7465 0.0 0.0 12784 852 pts/1 S+ 09:52 0:00 grep pmxcfs

# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-04-08 12:18:11 CEST; 2 days ago
Main PID: 4361 (pmxcfs)
Tasks: 13 (limit: 4915)
Memory: 59.2M
CPU: 4min 15.268s
CGroup: /system.slice/pve-cluster.service
└─4361 /usr/bin/pmxcfs

Apr 09 01:32:46 proxmoxtiebosch pmxcfs[4361]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmoxtiebosch/local: -1
Apr 09 01:32:46 proxmoxtiebosch pmxcfs[4361]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmoxtiebosch/local-zfs: -1
Apr 09 01:35:48 proxmoxtiebosch pmxcfs[4361]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmoxtiebosch/rpool2: -1
Apr 09 01:35:48 proxmoxtiebosch pmxcfs[4361]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmoxtiebosch/local: -1
Apr 09 01:35:48 proxmoxtiebosch pmxcfs[4361]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmoxtiebosch/local-zfs: -1
Apr 09 01:36:38 proxmoxtiebosch pmxcfs[4361]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmoxtiebosch/local-zfs: -1
Apr 09 01:36:38 proxmoxtiebosch pmxcfs[4361]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmoxtiebosch/local: -1
Apr 09 01:36:38 proxmoxtiebosch pmxcfs[4361]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmoxtiebosch/rpool2: -1
Apr 09 01:46:26 proxmoxtiebosch pmxcfs[4361]: [database] crit: commit transaction failed: database or disk is full#010
Apr 09 01:46:26 proxmoxtiebosch pmxcfs[4361]: [database] crit: rollback transaction failed: cannot rollback — no transaction is active#010

Disk spaces
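The key line in the log above is "commit transaction failed: database or disk is full": pmxcfs keeps its database under /var/lib/pve-cluster, so the next step is to check free space on the filesystem holding it (a small sketch, assuming a default install where that is the root filesystem):

    df -h /                        # free space where /var/lib/pve-cluster lives
    du -sh /var/lib/pve-cluster    # size of the pmxcfs database directory
    journalctl -u pve-cluster -b   # recent pmxcfs errors since boot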

