-
#1
Hello!
I have 4 test nodes in a cluster. I’ve successfully created a Ceph cluster and it is working well for VM storage (ceph-vm).
I’ve created another pool (ceph-storage) that I want to mount from a VM to store files.
Is that possible? If so, how can I mount it?
mount -t ceph 10.10.10.1:/ /mnt/ceph-storage -o name=admin,secret=BQDz/ehZBRmwFhAAhkpfVQ5/mL8NLJLLOScsaw==
reports
mount: /mnt/ceph-storage: cannot mount 10.10.10.1:/ read-only.
Please help, I can’t find a solution online.
Thank you.
-
#2
Check this post for guidance. Note that what you created is block storage.
https://www.virtualtothecore.com/en…unt-ceph-as-a-block-device-on-linux-machines/
For example:
rbd create data --size 10000
rbd feature disable data exclusive-lock object-map fast-diff deep-flatten
rbd map data
mkfs.xfs -L data /dev/rbd0
mount /dev/rbd0 /mnt/
Now all you have to do is this on other nodes:
rbd map data
mount /dev/rbd0 /mnt/
Now go to Datacenter and map the Directory.
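One caveat with the rbd map approach: a mapping done by hand does not survive a reboot. A sketch of how this is commonly made persistent, assuming the image is called data in the default rbd pool and the admin keyring sits in /etc/ceph (adjust both to your cluster):

```
# /etc/ceph/rbdmap -- the rbdmap service maps entries listed here at boot
# (pool "rbd", image "data" and the keyring path are assumptions)
rbd/data    id=admin,keyring=/etc/ceph/ceph.client.admin.keyring

# /etc/fstab -- mount once the device exists; rbdmap creates
# /dev/rbd/<pool>/<image> symlinks
/dev/rbd/rbd/data    /mnt    xfs    noauto,_netdev    0 0
```

With that in place, enabling the rbdmap service (where the ceph packages provide it) maps the image on boot.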
Last edited: Oct 27, 2017
-
#3
Thank you! Working like a charm
-
#4
Now all you have to do is this on other nodes:
rbd map data
mount /dev/rbd0 /mnt/
Make sure to mount this as RO on all but one node or accidents can and do happen.
If you need to write from multiple nodes, you need to set up an nfs share or install cephfs (instructions available elsewhere on the forum.)
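If you do go the CephFS route, the mount the OP tried in post #1 is roughly the right shape once an MDS and a filesystem exist. A hedged fstab sketch, reusing the monitor address and admin user from post #1 (the secretfile path is a placeholder for a file containing only the key):

```
# CephFS kernel mount; 10.10.10.1 is the monitor from post #1,
# /etc/ceph/admin.secret is a placeholder holding just the base64 key
10.10.10.1:6789:/  /mnt/ceph-storage  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0  0
```

Unlike an RBD image with a local filesystem, CephFS is safe to mount read-write from several nodes at once.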
-
#5
Check this post for guidance. Note that what you created is block storage.
https://www.virtualtothecore.com/en…unt-ceph-as-a-block-device-on-linux-machines/
For example:
rbd create data --size 10000
rbd feature disable data exclusive-lock object-map fast-diff deep-flatten
rbd map data
mkfs.xfs -L data /dev/rbd0
mount /dev/rbd0 /mnt/
Now all you have to do is this on other nodes:
rbd map data
mount /dev/rbd0 /mnt/
Now go to Datacenter and map the Directory.
Hi,
I tried this but got the following error:
rbd: error opening default pool 'rbd'
Ensure that the default pool has been created or specify an alternate pool name.
Code:
root@virt1:~# more /etc/ceph/rbdmap
# RbdDevice Parameters
#poolname/imagename id=client,keyring=/etc/ceph/ceph.client.keyring
root@virt1:~# more /etc/pve/storage.cfg
dir: local
        disable
        path /var/lib/vz
        content iso,vztmpl
        maxfiles 1
        shared 0

lvmthin: local-lvm
        disable
        thinpool data
        vgname pve
        content rootdir,images

nfs: ISOS
        export /data/isos
        path /mnt/pve/ISOS
        server 192.168.102.2
        content iso
        maxfiles 1
        options vers=3

rbd: Data_vm
        content images
        krbd 0
        pool Data

rbd: Data2_vm
        content images
        krbd 0
        pool Data2
-
#6
Hi guys,
I want to make small fileserver in my home, with advantages of CEPH + Proxmox.
So, to learn things, I’ve setup it all on VMware Workstation, just for testing and understanding.
What I’ve achieved:
- working Proxmox cluster
- working CEPH cluster
- 6 OSDs up and in - small ones, as these are only tests
- created a pool for all needed things (containers, images)
Now I’m downloading turnkey-fileserver appliance LXC container. It uses Samba and other sharing techniques to share data.
But it needs a locally mounted directory, which will then be shared. So, according to the instructions above in this thread, I would need to mount my storage RBD image locally. (I assume that, because it is impossible to predict the exact space to share, we can create our RBD device much bigger than the physical disks we have, right? So even if we have, for example, 4TB of usable space, we can make this device as if it were 20TB?)
And now to the point: Even if we mount the same RBD device on all nodes, how to deal with High Availability?
I want to create an HA resource and attach this LXC fileserver to it. So it will always survive, and will be moved to another node in case the current one fails. But what about the mountpoint it uses to share the data? It should probably be remounted RW if the machine moves, and I wonder if this wouldn’t disrupt a transfer in progress. Any clues about it?
You could probably say: set up a 4th machine, install just Samba and share from Ceph using iSCSI, but why would I do that - it’s a small home environment, and hey, I want to learn something.
Or maybe there is another way to achieve such an ‘all-in-one’ solution (and don’t tell me: buy a NAS).
Thanks in advance for any advice on this interesting use case.
-
#7
Hi guys,
I want to make small fileserver in my home, with advantages of CEPH + Proxmox.
[snipped]
Thanks in advance for any advice on this interesting use case.
Hello Twinsen, I am in somewhat the same situation as you. One thing that I have found that might interest you is that cephFS can be shared out using SMB/Cifs. If you do this on all your nodes and use DNS load balancing or possibly by using a virtual ip (but I do not yet know how to do that), I think that we could theoretically create the following example:
SMB client connects to smb://smbceph.example.lan, which load balances to 192.168.1.101, 192.168.1.102 and 192.168.1.103. The CephFS contents are the same on each node because of Ceph. Should a node go down, load balancing ensures the connecting client seamlessly moves to the next node.
This example is based on my current understanding, which might well be lacking. It is also entirely possible that I am flat out wrong. Please point it out if that is the case.
You can find more information about using SMB/CIFS with cephFS by searching for "vfs_ceph.8". (I would have linked directly, but as a new user on this forum I am not allowed to post external links.) I am still researching this myself and have no idea (yet) how to make this work. But perhaps we can help each other (and the community) by figuring this out together!
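For what it’s worth, the round-robin part of the example above can be sketched as a plain DNS zone fragment (BIND syntax; the name and addresses are the ones from the example, not a tested setup):

```
; three A records for the same name: resolvers hand them out in
; rotating order, spreading SMB clients across the nodes
smbceph    IN    A    192.168.1.101
smbceph    IN    A    192.168.1.102
smbceph    IN    A    192.168.1.103
```

Keep in mind that round-robin DNS only spreads new connections; a client talking to a node that dies still has to reconnect, so the "seamless" part depends on the SMB client retrying another address.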
Edit:
So I searched around and found that ceph.so, which is part of vfs_ceph, is not included in the samba version that comes with Debian Stretch, but it is part of Buster. As such, we would need to import a .deb file. I found the latest version, which should include ceph.so as well: packages.debian.org/sid/amd64/samba/download
That has to be installed with # dpkg -i samba_4.7.4+dfsg-1_amd64.deb
Possibly some cleanup also has to be done with # apt-get install -f
Next up would be
1. Getting cephFS working, https://forum.proxmox.com/threads/cephfs-installation.36459/
2. Mount cephFS using ceph-fuse
3. Share cephFS mount using SMB vfs_ceph
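Step 3 would end up looking something like the following smb.conf share (a sketch based on the vfs_ceph manpage; the share name, path and ceph user are placeholders, not a tested configuration):

```
[cephshare]
    # vfs_ceph talks to CephFS via libcephfs, so no local mount is needed
    vfs objects = ceph
    # path is interpreted relative to the CephFS root
    path = /
    ceph:config_file = /etc/ceph/ceph.conf
    ceph:user_id = samba
    read only = no
    kernel share modes = no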
Last edited: Feb 5, 2018
Bug 1478798
- [RBD]: "rbd trash list -a" looks for the default rbd pool
Environment
192.168.31.172 ceph-client
Set the hostname to ceph-client.
INSTALL CEPH
On the admin node:
ceph-deploy install ceph-client
ceph-deploy admin ceph-client
On the ceph-client node:
sudo chmod +r /etc/ceph/ceph.client.admin.keyring
CREATE A BLOCK DEVICE POOL
On the admin node:
It turns out the admin node does not have the rbd command; it is only available after sudo apt install ceph-common, so run that first.
cephu@cephadmin:~/my-cluster$ sudo apt install ceph-common -y
cephu@cephadmin:~/my-cluster$ rbd pool init jlch
2017-10-24 15:57:45.951917 7fde4fa6e0c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2017-10-24 15:57:45.951937 7fde4fa6e0c0 -1 monclient: ERROR: missing keyring, cannot use cephx for authentication
2017-10-24 15:57:45.951942 7fde4fa6e0c0  0 librados: client.admin initialization error (2) No such file or directory
rbd: couldn't connect to the cluster!
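As an aside, the error message spells out the default keyring search order. A small sketch (a hypothetical helper, not part of Ceph) that walks the same candidate paths and prints the first readable one; the directory is a parameter so it can be pointed at any path:

```shell
#!/bin/sh
# find_keyring [dir]: check the candidate keyring paths named in the
# error message, in the same order, and print the first readable one.
find_keyring() {
    dir="${1:-/etc/ceph}"
    for f in "$dir/ceph.client.admin.keyring" "$dir/ceph.keyring" \
             "$dir/keyring" "$dir/keyring.bin"; do
        if [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    echo "no readable keyring under $dir" >&2
    return 1
}
```

In this walkthrough the fix turns out to be simply that no keyring had been pushed to the admin node yet; ceph-deploy admin <node> (run above for ceph-client) is what puts it in place.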
On the ceph-client node:
jlch@k-m:/etc/apt/sources.list.d$ sudo ls /etc/ceph/* -l
[sudo] password for jlch:
-rw-r--r-- 1 root root  63 Oct 24 15:32 /etc/ceph/ceph.client.admin.keyring
-rw-r--r-- 1 root root 249 Oct 24 15:32 /etc/ceph/ceph.conf
-rw-r--r-- 1 root root  92 Apr 21  2017 /etc/ceph/rbdmap
-rw------- 1 root root   0 Oct 24 11:20 /etc/ceph/tmp2IJh4C
Wait, the keyring is clearly there - so why does it say "unable to find a keyring on ***"? Where did it go wrong?
Ha, found the cause: the last part of the previous section still needs to be executed.
Also, a pool has to be created with osd create, as in the previous section, before pool init will work here.
For example, the pool mytest was created earlier, so that is the one used here.
CONFIGURE A BLOCK DEVICE
On the ceph-client node:
cephu@ceph-client:~$ rbd create mytest --size 4096 -m mon1 -k /etc/ceph/ceph.client.admin.keyring
rbd: error opening default pool 'rbd'
Ensure that the default pool has been created or specify an alternate pool name.
An error: opening the default pool 'rbd' fails. Why? Because no pool named rbd was ever created - only the pool mytest exists.
What to do? Two options: 1) create a new pool named rbd, or 2) point to an already created pool, such as mytest.
We'll go with option 2.
Option 1
Back on the ceph-admin node:
cephu@cephadmin:~/my-cluster$ ceph osd pool create rbd 8
pool 'rbd' created
cephu@cephadmin:~/my-cluster$ rbd pool init rbd
Back on the ceph-client node:
cephu@ceph-client:~$ rbd create foo --size 4096 -m mon1 -k /etc/ceph/ceph.client.admin.keyring
Option 2
cephu@ceph-client:~$ rbd help create    # this shows that -p is the flag that specifies the pool
cephu@ceph-client:~$ rbd create foo --size 4096 -m mon1 -k /etc/ceph/ceph.client.admin.keyring -p mytest
OK, on to the next step. Let's continue.
On the ceph-client node, map the image to a block device.
cephu@ceph-client:~$ sudo rbd map foo --image client.admin -m mon1 -p mytest
rbd: sysfs write failed
rbd: error opening image client.admin: (2) No such file or directory
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (110) Connection timed out
Another error.
cephu@ceph-client:~$ dmesg | tail -n 100
...
[692522.117250] libceph: mon0 192.168.31.114:6789 missing required protocol features
[692532.096436] libceph: mon0 192.168.31.114:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
[692532.099897] libceph: mon0 192.168.31.114:6789 missing required protocol features
[692542.111938] libceph: mon0 192.168.31.114:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
[692542.115603] libceph: mon0 192.168.31.114:6789 missing required protocol features
http://www.hl10502.com/2017/08/01/ceph-rbdmap-error-1/
The page above explains the underlying cause in some detail.
Time to call in the expert again.
http://blog.csdn.net/lk142500/article/details/78275910
This one gives the fix. Here we go.
On the admin node:
cephu@cephadmin:~/my-cluster$ ceph -v
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
cephu@cephadmin:~/my-cluster$ ceph osd crush tunables optimal
adjusted tunables profile to optimal
cephu@cephadmin:~/my-cluster$ ceph osd crush rule ls
replicated_rule
cephu@cephadmin:~/my-cluster$ ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            { "op": "take", "item": -1, "item_name": "default" },
            { "op": "chooseleaf_firstn", "num": 0, "type": "host" },
            { "op": "emit" }
        ]
    }
]
cephu@cephadmin:~/my-cluster$ ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 1,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "jewel",
    "optimal_tunables": 1,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}
[The full output of "ceph osd crush -h" is omitted here; it lists the osd crush subcommands, including "osd crush tunables legacy|argonaut|bobtail|firefly|hammer|jewel|optimal|default".]
cephu@cephadmin:~/my-cluster$ ceph osd crush tunables hammer
adjusted tunables profile to hammer
cephu@cephadmin:~/my-cluster$ ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "hammer",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "minimum_required_version": "hammer",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 0,
    "has_v5_rules": 0
}
Back on the ceph-client node:
cephu@ceph-client:~$ ceph -v
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
cephu@ceph-client:~$ rbd ls
foo
cephu@ceph-client:~$ sudo rbd map foo --name client.admin
/dev/rbd0
cephu@ceph-client:~$ ls /dev/rbd
rbd/  rbd0
cephu@ceph-client:~$ ls /dev/rbd/rbd/foo
/dev/rbd/rbd/foo
cephu@ceph-client:~$ ls /dev/rbd0
/dev/rbd0
cephu@ceph-client:~$ ls /dev/rbd0 -l
brw-rw---- 1 root disk 251, 0 Oct 25 12:03 /dev/rbd0
Success. Moving on.
Use the block device by creating a file system on the ceph-client node.
cephu@ceph-client:~$ sudo mkfs.ext4 -m0 /dev/rbd/rbd/foo
mke2fs 1.42.13 (17-May-2015)
Discarding device blocks: done
Creating filesystem with 1048576 4k blocks and 262144 inodes
Filesystem UUID: d83ebc8d-1956-4d81-b9db-391f939634ac
Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
Mount the file system on the ceph-client node.
cephu@ceph-client:~$ sudo mkdir /mnt/ceph-block-device
cephu@ceph-client:~$ sudo mount /dev/rbd/rbd/foo /mnt/ceph-block-device
cephu@ceph-client:~$ cd /mnt/ceph-block-device
cephu@ceph-client:/mnt/ceph-block-device$ ls
lost+found
OK, that wraps up this section.
Following the README, there are a couple of additions I needed.
After:
vagrant@ceph-admin:~/test-cluster$ ssh ceph-server-3 sudo chmod +r /etc/ceph/ceph.client.admin.keyring
do: create a manager:
vagrant@ceph-admin:~/test-cluster$ ceph-deploy mgr create ceph-admin:mon_mgr
After:
vagrant@ceph-admin:~/test-cluster$ ceph quorum_status --format json-pretty
do: create a default rbd pool:
vagrant@ceph-admin:~/test-cluster$ ceph osd pool create rbd 150 150
In case this helps someone else.
My cluster health is not OK; is there any way I can troubleshoot it?
I am getting the following:
vagrant@ceph-admin:~/test-cluster$ ceph -w
cluster:
id: de6ff090-3c18-47a8-9651-6bda85920653
health: HEALTH_WARN
no active mgr
services:
mon: 3 daemons, quorum ceph-server-1,ceph-server-2,ceph-server-3
mgr: no daemons active
mds: cephfs-1/1/1 up {0=ceph-server-1=up:active}
osd: 3 osds: 3 up, 3 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0B
usage: 0B used, 0B / 0B avail
pgs:
From Ceph version 12 (Luminous) on, a manager daemon is needed, so you should create a mgr for every mon node, or else the ceph status will be HEALTH_WARN.
After creating a mgr for the mon:
ceph-deploy mgr create ceph-admin:mon_mgr
the ceph status will be HEALTH_OK.
It failed with the following error:
Unmounting Virtualbox Guest Additions ISO from: /mnt
==> ceph-admin: Checking for guest additions in VM...
==> ceph-admin: Setting hostname...
==> ceph-admin: Configuring and enabling network interfaces...
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
/sbin/ifdown 'enp0s8' || true
/sbin/ip addr flush dev 'enp0s8'
# Remove any previous network modifications from the interfaces file
sed -e '/^#VAGRANT-BEGIN/,$ d' /etc/network/interfaces > /tmp/vagrant-network-interfaces.pre
sed -ne '/^#VAGRANT-END/,$ p' /etc/network/interfaces | tac | sed -e '/^#VAGRANT-END/,$ d' | tac > /tmp/vagrant-network-interfaces.post
cat \
  /tmp/vagrant-network-interfaces.pre \
  /tmp/vagrant-network-entry \
  /tmp/vagrant-network-interfaces.post \
  > /etc/network/interfaces
rm -f /tmp/vagrant-network-interfaces.pre
rm -f /tmp/vagrant-network-entry
rm -f /tmp/vagrant-network-interfaces.post
/sbin/ifup 'enp0s8'
Stdout from the command:
Stderr from the command:
bash: line 5: /sbin/ifdown: No such file or directory
bash: line 21: /sbin/ifup: No such file or directory
Following is the output from inside the machine; the machine doesn’t contain the /sbin/ifdown and /sbin/ifup commands.
vagrant@ceph-admin:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 02:79:70:27:56:0f brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
valid_lft 86224sec preferred_lft 86224sec
inet6 fe80::79:70ff:fe27:560f/64 scope link
valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 08:00:27:75:0f:e1 brd ff:ff:ff:ff:ff:ff
Hi,
Great work.
I followed your instructions (only changing the names of the servers to mon-1 to mon-3 and osd-1 to osd-3).
I got the status as HEALTH_OK with three osd in and up.
When I tried to map a rbd I got this.
Do you have the same issue? Do you know how to fix it?
In the general term, how do you manage virtualbox ubuntu VM to manage drives?
vagrant@ceph-client:$ sudo rbd create foo --size 4096 -m mon-1
vagrant@ceph-client:$ sudo rbd map foo --pool rbd --name client.admin -m mon-1
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address
There is a permission denied issue when running the activate command. The issue is that when the /var/local/osd0 and osd1 directories are created, the permissions are root:staff. Changing the line to ssh ceph-server-2 "sudo mkdir /var/local/osd0 && sudo chown ceph:ceph /var/local/osd0" will fix the issue.
After bringing up the vagrant machines and fixing some other problems, I always had the health check fail due to clock skew (about 1s difference).
First tried to install and use ntp(d), but that somehow did not work ok. Then I remembered reading that running ntp is not advised for VMs.
Then found that the clock sync provided by virtualbox guest tools works rather slowly by trying to smoothly adjust the clock and has a high threshold (20mins clock difference) for immediate adjustment.
As ceph requires less than 50ms clock skew, I added this to the Vagrantfile:
config.vm.provider 'virtualbox' do |vb|
vb.customize [ "guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold", 50 ]
vb.customize [ "guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-set-start", 1 ]
vb.customize [ "guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-interval", 1000 ]
end
So, if the clock is off by >= 50ms, it will get adjusted immediately.
If you upgrade to ceph >= jewel, you’ll need this for ext4:
osd max object name len = 256
osd max object namespace len = 64
Without that, the OSDs won’t start.
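For context, the two settings above go into ceph.conf on the OSD hosts; the [osd] section shown here is the usual place ([global] works too):

```
# ceph.conf fragment: ext4 has shorter xattr/object-name limits than
# Ceph's defaults assume from jewel onward
[osd]
osd max object name len = 256
osd max object namespace len = 64
```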
ceph-deploy install currently still uses 10.2.5 (not 11, although it is released).
I guess a lot of people would like to play with bluestore, so adding the --release option might be nice:
-ceph-deploy install ceph-admin ceph-server-1 ceph-server-2 ceph-server-3 ceph-client
+ceph-deploy install --release=luminous ceph-admin ceph-server-1 ceph-server-2 ceph-server-3 ceph-client
Depending on what you target, maybe don’t add luminous, but show the option, so it is obvious where to give that. (just using a different debian-xxxxx repo is not enough)
==> ceph-admin: Running: inline script
==> ceph-admin: stdin: is not a tty
==> ceph-admin: Get:1 http://security.ubuntu.com trusty-security InRelease [65.9 kB]
==> ceph-admin: Ign http://archive.ubuntu.com trusty InRelease
==> ceph-admin: Get:2 http://archive.ubuntu.com trusty-updates InRelease [65.9 kB]
==> ceph-admin: Get:3 http://security.ubuntu.com trusty-security/main Sources [124 kB]
==> ceph-admin: Hit http://archive.ubuntu.com trusty-backports InRelease
==> ceph-admin: Hit http://archive.ubuntu.com trusty Release.gpg
==> ceph-admin: Get:4 http://security.ubuntu.com trusty-security/universe Sources [49.3 kB]
==> ceph-admin: Get:5 http://archive.ubuntu.com trusty-updates/main Sources [391 kB]
==> ceph-admin: Get:6 http://security.ubuntu.com trusty-security/main amd64 Packages [579 kB]
==> ceph-admin: Get:7 http://ceph.com trusty InRelease
==> ceph-admin: Splitting up /var/lib/apt/lists/partial/ceph.com_debian-hammer_dists_trusty_InRelease into data and signature failed
==> ceph-admin: Ign http://ceph.com trusty InRelease
==> ceph-admin: E
==> ceph-admin: :
==> ceph-admin: GPG error: http://ceph.com trusty InRelease: Clearsigned file isn't valid, got 'NODATA' (does the network require authentication?)
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
From the readme:
$ ssh-add -K ~/.vagrant.d/insecure_private_key
My ssh-add does not know -K. Should that be -k?
I followed the instructions in README.md, but I got the text below after running the ‘ceph health’ command.
I’m a Ceph newbie; while searching Google, I learned what a ‘clock skew’ error is. Also, I cannot create a block device on ceph-client (when I run the rbd command, it doesn’t respond).
HEALTH_WARN clock skew detected on mon.ceph-server-1,
mon.ceph-server-2, mon.ceph-server-3; Monitor clock skew detected
Ceph documentation said this error resolved by synchronizing clock.
What should I do if there’s a clock skew?
Synchronize your clocks. Running an NTP client may help.
If you are already using one and you hit this sort of issues,
check if you are using some NTP server remote
to your network and consider hosting your own NTP server on your network.
This last option tends to reduce the amount of issues with monitor clock skews.
So, I installed an ntp client on each ceph server, but I still see the above error. What am I supposed to do?
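A common way to implement the "host your own NTP server" advice with these VMs is to point every ceph server at one local machine, so the monitors agree with each other even when the upstream source is poor. A hedged ntp.conf sketch (using ceph-admin as the local reference is an assumption, not from the README):

```
# /etc/ntp.conf fragment on each ceph-server-* node: sync against the
# local admin VM first; iburst speeds up initial synchronization
server ceph-admin iburst
# fall back to a public pool server if the local one is unreachable
server 0.pool.ntp.org iburst
```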
Hello, I’d like to say, amazing setup with vagrant!
Are you planning on adding whatever DreamHost Dreamobjects uses, to get Ceph to implement S3 compatibility at any point?
ssh-add -K didn’t work for me; what was the -K argument supposed to do?
carmstrong / multinode-ceph-vagrant
A 3-node Ceph cluster running on Vagrant VMs.
License: Apache License 2.0
multinode-ceph-vagrant’s Introduction
This workshop walks users through setting up a 3-node Ceph cluster and mounting a block device, using a CephFS mount, and storing a blob object.
It follows the following Ceph user guides:
- Preflight checklist
- Storage cluster quick start
- Block device quick start
- Ceph FS quick start
- Install Ceph object gateway
- Configuring Ceph object gateway
Note that after many commands, you may see something like:
Unhandled exception in thread started by
sys.excepthook is missing
lost sys.stderr
I’m not sure what this means, but everything seems to have completed successfully, and the cluster will work.
Install prerequisites
Install Vagrant and a provider such as VirtualBox.
We’ll also need the vagrant-cachier and vagrant-hostmanager plugins:
$ vagrant plugin install vagrant-cachier
$ vagrant plugin install vagrant-hostmanager
Add your Vagrant key to the SSH agent
Since the admin machine will need the Vagrant SSH key to log into the server machines, we need to add it to our local SSH agent:
On Mac:
$ ssh-add -K ~/.vagrant.d/insecure_private_key
On *nix:
$ ssh-add -k ~/.vagrant.d/insecure_private_key
Start the VMs
This instructs Vagrant to start the VMs and install ceph-deploy on the admin machine.
Create the cluster
We’ll create a simple cluster and make sure it’s healthy. Then, we’ll expand it.
First, we need to get an interactive shell on the admin machine:
The ceph-deploy tool will write configuration files and logs to the current directory. So, let’s create a directory for the new cluster:
vagrant@ceph-admin:~$ mkdir test-cluster && cd test-cluster
Let’s prepare the machines:
vagrant@ceph-admin:~/test-cluster$ ceph-deploy new ceph-server-1 ceph-server-2 ceph-server-3
Now, we have to change a default setting. For our initial cluster, we are only going to have two object storage daemons. We need to tell Ceph to allow us to achieve an active + clean state with just two Ceph OSDs. Add osd pool default size = 2 to ./ceph.conf.
Because we’re dealing with multiple VMs sharing the same host, we can expect to see more clock skew. We can tell Ceph that we’d like to tolerate slightly more clock skew by adding the following line to ceph.conf:
mon_clock_drift_allowed = 1
After these few changes, the file should look similar to:
[global]
fsid = 7acac25d-2bd8-4911-807e-e35377e741bf
mon_initial_members = ceph-server-1, ceph-server-2, ceph-server-3
mon_host = 172.21.12.12,172.21.12.13,172.21.12.14
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 2
mon_clock_drift_allowed = 1
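If you prefer to script the two edits instead of opening an editor, a sketch like this (run from inside ~/test-cluster, where ceph-deploy wrote the file) appends both settings:

```shell
# Append the pool-size and clock-drift tweaks to the generated ceph.conf.
cat >> ./ceph.conf <<'EOF'
osd pool default size = 2
mon_clock_drift_allowed = 1
EOF

# Sanity-check that the lines landed:
grep 'osd pool default size' ./ceph.conf
```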
Install Ceph
We’re finally ready to install!
Note here that we specify the Ceph release we’d like to install, which is luminous.
vagrant@ceph-admin:~/test-cluster$ ceph-deploy install --release=luminous ceph-admin ceph-server-1 ceph-server-2 ceph-server-3 ceph-client
Configure monitor and OSD services
Next, we add a monitor node:
vagrant@ceph-admin:~/test-cluster$ ceph-deploy mon create-initial
And our two OSDs. For these, we need to log into the server machines directly:
vagrant@ceph-admin:~/test-cluster$ ssh ceph-server-2 "sudo mkdir /var/local/osd0 && sudo chown ceph:ceph /var/local/osd0"
vagrant@ceph-admin:~/test-cluster$ ssh ceph-server-3 "sudo mkdir /var/local/osd1 && sudo chown ceph:ceph /var/local/osd1"
Now we can prepare and activate the OSDs:
vagrant@ceph-admin:~/test-cluster$ ceph-deploy osd prepare ceph-server-2:/var/local/osd0 ceph-server-3:/var/local/osd1
vagrant@ceph-admin:~/test-cluster$ ceph-deploy osd activate ceph-server-2:/var/local/osd0 ceph-server-3:/var/local/osd1
Configuration and status
We can copy our config file and admin key to all the nodes, so each one can use the ceph CLI.
vagrant@ceph-admin:~/test-cluster$ ceph-deploy admin ceph-admin ceph-server-1 ceph-server-2 ceph-server-3 ceph-client
We also should make sure the keyring is readable:
vagrant@ceph-admin:~/test-cluster$ sudo chmod +r /etc/ceph/ceph.client.admin.keyring
vagrant@ceph-admin:~/test-cluster$ ssh ceph-server-1 sudo chmod +r /etc/ceph/ceph.client.admin.keyring
vagrant@ceph-admin:~/test-cluster$ ssh ceph-server-2 sudo chmod +r /etc/ceph/ceph.client.admin.keyring
vagrant@ceph-admin:~/test-cluster$ ssh ceph-server-3 sudo chmod +r /etc/ceph/ceph.client.admin.keyring
We also need to create a manager for the cluster. In this case, we make ceph-admin the manager:
vagrant@ceph-admin:~/test-cluster$ ceph-deploy mgr create ceph-admin:mon_mgr
Finally, check on the health of the cluster:
vagrant@ceph-admin:~/test-cluster$ ceph health
You should see something similar to this once it’s healthy:
vagrant@ceph-admin:~/test-cluster$ ceph health
HEALTH_OK
vagrant@ceph-admin:~/test-cluster$ ceph -s
    cluster 18197927-3d77-4064-b9be-bba972b00750
     health HEALTH_OK
     monmap e2: 3 mons at {ceph-server-1=172.21.12.12:6789/0,ceph-server-2=172.21.12.13:6789/0,ceph-server-3=172.21.12.14:6789/0}, election epoch 6, quorum 0,1,2 ceph-server-1,ceph-server-2,ceph-server-3
     osdmap e9: 2 osds: 2 up, 2 in
      pgmap v13: 192 pgs, 3 pools, 0 bytes data, 0 objects
            12485 MB used, 64692 MB / 80568 MB avail
                 192 active+clean
Notice that we have two OSDs (osdmap e9: 2 osds: 2 up, 2 in) and all of the placement groups (pgs) are reporting as active+clean.
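If you want to script this check instead of eyeballing it, here is a sketch: is_healthy only inspects a status string, so it works without a cluster, while wait_for_health (a hypothetical helper) polls the real `ceph health` output:

```shell
# True if a `ceph health` status string starts with HEALTH_OK.
is_healthy() {
  case "$1" in
    HEALTH_OK*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Poll the cluster until it settles (requires a reachable cluster; not run here).
wait_for_health() {
  until is_healthy "$(ceph health)"; do
    echo "waiting for cluster to settle..."
    sleep 5
  done
}
```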
Congratulations!
Expanding the cluster
To more closely model a production cluster, we’re going to add one more OSD daemon and a Ceph Metadata Server. We’ll also add monitors to all hosts instead of just one.
Add an OSD
vagrant@ceph-admin:~/test-cluster$ ssh ceph-server-1 "sudo mkdir /var/local/osd2 && sudo chown ceph:ceph /var/local/osd2"
Now, from the admin node, we prepare and activate the OSD:
vagrant@ceph-admin:~/test-cluster$ ceph-deploy osd prepare ceph-server-1:/var/local/osd2
vagrant@ceph-admin:~/test-cluster$ ceph-deploy osd activate ceph-server-1:/var/local/osd2
Watch the rebalancing:
vagrant@ceph-admin:~/test-cluster$ ceph -w
You should eventually see it return to an active+clean
state, but this time with 3 OSDs:
vagrant@ceph-admin:~/test-cluster$ ceph -w
    cluster 18197927-3d77-4064-b9be-bba972b00750
     health HEALTH_OK
     monmap e2: 3 mons at {ceph-server-1=172.21.12.12:6789/0,ceph-server-2=172.21.12.13:6789/0,ceph-server-3=172.21.12.14:6789/0}, election epoch 30, quorum 0,1,2 ceph-server-1,ceph-server-2,ceph-server-3
     osdmap e38: 3 osds: 3 up, 3 in
      pgmap v415: 192 pgs, 3 pools, 0 bytes data, 0 objects
            18752 MB used, 97014 MB / 118 GB avail
                 192 active+clean
Add metadata server
Let’s add a metadata server to server1:
vagrant@ceph-admin:~/test-cluster$ ceph-deploy mds create ceph-server-1
Add more monitors
We add monitors to servers 2 and 3.
vagrant@ceph-admin:~/test-cluster$ ceph-deploy mon create ceph-server-2 ceph-server-3
Watch the quorum status, and ensure it’s happy:
vagrant@ceph-admin:~/test-cluster$ ceph quorum_status --format json-pretty
Create a default rbd pool:
vagrant@ceph-admin:~/test-cluster$ ceph osd pool create rbd 150 150
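Why 150 placement groups? A common rule of thumb is roughly (OSDs * 100) / replicas PGs per pool, often rounded up to a power of two. A sketch of that heuristic (pg_count is a hypothetical helper, not a ceph command; this walkthrough simply uses 150 directly):

```shell
# Estimate a PG count: (osds * 100) / replicas, rounded up to a power of two.
pg_count() {
  local osds=$1 replicas=$2
  local target=$(( osds * 100 / replicas ))
  local pg=1
  while [ "$pg" -lt "$target" ]; do pg=$(( pg * 2 )); done
  echo "$pg"
}

pg_count 3 2   # prints 256
```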
Install Ceph Object Gateway
TODO
Play around!
Now that we have everything set up, let’s actually use the cluster. We’ll use the ceph-client machine for this.
Create a block device
$ vagrant ssh ceph-client
vagrant@ceph-client:~$ sudo rbd create foo --size 4096 -m ceph-server-1
vagrant@ceph-client:~$ sudo rbd map foo --pool rbd --name client.admin -m ceph-server-1
vagrant@ceph-client:~$ sudo mkfs.ext4 -m0 /dev/rbd/rbd/foo
vagrant@ceph-client:~$ sudo mkdir /mnt/ceph-block-device
vagrant@ceph-client:~$ sudo mount /dev/rbd/rbd/foo /mnt/ceph-block-device
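Mapped RBD images appear under /dev/rbd/&lt;pool&gt;/&lt;image&gt;, which is where the mkfs and mount paths above come from. A tiny path helper (hypothetical) plus the reverse of the steps above, shown as comments so nothing here touches a real cluster:

```shell
# Build the device path the kernel creates for a mapped image.
rbd_dev_path() { printf '/dev/rbd/%s/%s\n' "$1" "$2"; }

rbd_dev_path rbd foo   # prints /dev/rbd/rbd/foo

# Teardown on ceph-client when you are done:
#   sudo umount /mnt/ceph-block-device
#   sudo rbd unmap /dev/rbd/rbd/foo
#   sudo rbd rm foo -m ceph-server-1
```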
Create a mount with Ceph FS
TODO
Store a blob object
TODO
Cleanup
When you’re all done, tell Vagrant to destroy the VMs:
$ vagrant destroy
multinode-ceph-vagrant’s Issues
Investigate ways to easily use XFS on VirtualBox
RBD image feature set mismatch
Hi,
Great work.
I followed your instructions (only changing the server names to mon-1..mon-3 and osd-1..osd-3).
I got the status as HEALTH_OK with three osd in and up.
When I tried to map a rbd I got this.
Do you have the same issue? Do you know how to fix it?
More generally, how do you manage drives on the VirtualBox Ubuntu VMs?
$ sudo rbd create foo --size 4096 -m mon-1
$ sudo rbd map foo --pool rbd --name client.admin -m mon-1
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address
s3-compatible service
Hello, I’d like to say, amazing setup with vagrant!
Are you planning on adding whatever DreamHost Dreamobjects uses, to get Ceph to implement S3 compatibility at any point?
should create mgr on ceph-admin node
From Ceph version 12 onward, a manager daemon is needed, so you should create a mgr for every mon node, or else the ceph status will be HEALTH_WARN. After creating a mgr for the mon:
ceph-deploy mgr create ceph-admin:mon_mgr
the ceph status will be HEALTH_OK.
rbd: error opening default pool ‘rbd’
Following the README there are a couple of additions I needed,
Following,
vagrant@ceph-admin:~/test-cluster$ ssh ceph-server-3 sudo chmod +r /etc/ceph/ceph.client.admin.keyring
Do,
Create Manager
vagrant@ceph-admin:~/test-cluster$ ceph-deploy mgr create ceph-admin:mon_mgr
Following,
vagrant@ceph-admin:~/test-cluster$ ceph quorum_status --format json-pretty
Do,
Create a default rbd pool
vagrant@ceph-admin:~/test-cluster$ ceph osd pool create rbd 150 150
In case this helps someone else.
show --release option in the readme
ceph-deploy install currently still uses 10.2.5 (not 11, although it is released).
I guess a lot of people would like to play with bluestore, so adding the --release option might be nice:
-ceph-deploy install ceph-admin ceph-server-1 ceph-server-2 ceph-server-3 ceph-client
+ceph-deploy install --release=luminous ceph-admin ceph-server-1 ceph-server-2 ceph-server-3 ceph-client
Depending on what you target, maybe don’t add luminous, but show the option, so it is obvious where to give that. (just using a different debian-xxxxx repo is not enough)
Health is not OK
My cluster health is not OK. Is there any way I can troubleshoot it?
I am getting the following:
vagrant@ceph-admin:~/test-cluster$ ceph -w
cluster:
id: de6ff090-3c18-47a8-9651-6bda85920653
health: HEALTH_WARN
no active mgr
services:
mon: 3 daemons, quorum ceph-server-1,ceph-server-2,ceph-server-3
mgr: no daemons active
mds: cephfs-1/1/1 up {0=ceph-server-1=up:active}
osd: 3 osds: 3 up, 3 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0B
usage: 0B used, 0B / 0B avail
pgs:
unknown ssh-add option
From the readme:
$ ssh-add -K ~/.vagrant.d/insecure_private_key
My ssh-add does not recognize -K. Should that be -k?
Permission Issue on osd activate command
There is a permission denied issue when running the activate command. The issue is that when the /var/local/osd0 and osd1 directories are created, the permissions are for root:staff. Changing the line to ssh ceph-server-2 "sudo mkdir /var/local/osd0 && sudo chown ceph:ceph /var/local/osd0" will fix the issue.
ssh-add -K
ssh-add -K didn’t work for me; what was the -K argument supposed to do?
clock skew issues
After bringing up the vagrant machines and fixing some other problems, I always had the health check fail due to clock skew (about 1s difference).
First tried to install and use ntp(d), but that somehow did not work ok. Then I remembered reading that running ntp is not advised for VMs.
Then found that the clock sync provided by virtualbox guest tools works rather slowly by trying to smoothly adjust the clock and has a high threshold (20mins clock difference) for immediate adjustment.
As ceph requires less than 50ms clock skew, I added this to the Vagrantfile:
config.vm.provider 'virtualbox' do |vb|
vb.customize [ "guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold", 50 ]
vb.customize [ "guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-set-start", 1 ]
vb.customize [ "guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-interval", 1000 ]
end
So, if the clock is off by >= 50ms, it will get adjusted immediately.
Fails with ifdown missing error
It failed with the following error:
Unmounting Virtualbox Guest Additions ISO from: /mnt
==> ceph-admin: Checking for guest additions in VM...
==> ceph-admin: Setting hostname...
==> ceph-admin: Configuring and enabling network interfaces...
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
/sbin/ifdown 'enp0s8' || true
/sbin/ip addr flush dev 'enp0s8'
# Remove any previous network modifications from the interfaces file
sed -e '/^#VAGRANT-BEGIN/,$ d' /etc/network/interfaces > /tmp/vagrant-network-interfaces.pre
sed -ne '/^#VAGRANT-END/,$ p' /etc/network/interfaces | tac | sed -e '/^#VAGRANT-END/,$ d' | tac > /tmp/vagrant-network-interfaces.post
cat
/tmp/vagrant-network-interfaces.pre
/tmp/vagrant-network-entry
/tmp/vagrant-network-interfaces.post
> /etc/network/interfaces
rm -f /tmp/vagrant-network-interfaces.pre
rm -f /tmp/vagrant-network-entry
rm -f /tmp/vagrant-network-interfaces.post
/sbin/ifup 'enp0s8'
Stdout from the command:
Stderr from the command:
bash: line 5: /sbin/ifdown: No such file or directory
bash: line 21: /sbin/ifup: No such file or directory
- Following is the output from inside the machine; the machine doesn’t contain the /sbin/ifdown and /sbin/ifup commands.
vagrant@ceph-admin:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 02:79:70:27:56:0f brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
valid_lft 86224sec preferred_lft 86224sec
inet6 fe80::79:70ff:fe27:560f/64 scope link
valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 08:00:27:75:0f:e1 brd ff:ff:ff:ff:ff:ff
config changes for jewel+
If you upgrade to ceph >= jewel, you’ll need this for ext4:
osd max object name len = 256
osd max object namespace len = 64
Without that, the OSDs won’t start.
Clock skew error.
I followed the instructions in README.md, but I got the text below after running the ‘ceph health’ command.
I’m a Ceph newbie; while googling, I learned what a ‘clock skew’ error is. Also, I cannot create a block device on ceph-client (when I run the rbd command, it doesn’t respond).
HEALTH_WARN clock skew detected on mon.ceph-server-1,
mon.ceph-server-2, mon.ceph-server-3; Monitor clock skew detected
The Ceph documentation says this error is resolved by synchronizing clocks:
What should I do if there’s a clock skew?
Synchronize your clocks. Running an NTP client may help.
If you are already using one and you hit this sort of issues,
check if you are using some NTP server remote
to your network and consider hosting your own NTP server on your network.
This last option tends to reduce the amount of issues with monitor clock skews.
So, I installed an NTP client on each ceph server, but I still get the above error. What am I supposed to do?
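One way to see how far apart two nodes actually are is to compare epoch seconds. The arithmetic helper below is a sketch and testable locally; the ssh line (commented out) shows the intended use against ceph-server-2, which is an assumption about your node names:

```shell
# Absolute difference between two epoch timestamps, in whole seconds.
skew_seconds() { echo $(( $1 > $2 ? $1 - $2 : $2 - $1 )); }

skew_seconds 100 97   # prints 3

# Intended use across nodes (needs SSH access, so not run here):
#   skew_seconds "$(date +%s)" "$(ssh ceph-server-2 date +%s)"
# Ceph's default mon clock drift allowance is small (0.05s), so even
# sub-second skew between VMs can trigger the HEALTH_WARN above.
```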
vagrant up fails, ceph.com trusty gpg issue
==> ceph-admin: Running: inline script
==> ceph-admin: stdin: is not a tty
==> ceph-admin: Get:1 http://security.ubuntu.com trusty-security InRelease [65.9 kB]
==> ceph-admin: Ign http://archive.ubuntu.com trusty InRelease
==> ceph-admin: Get:2 http://archive.ubuntu.com trusty-updates InRelease [65.9 kB]
==> ceph-admin: Get:3 http://security.ubuntu.com trusty-security/main Sources [124 kB]
==> ceph-admin: Hit http://archive.ubuntu.com trusty-backports InRelease
==> ceph-admin: Hit http://archive.ubuntu.com trusty Release.gpg
==> ceph-admin: Get:4 http://security.ubuntu.com trusty-security/universe Sources [49.3 kB]
==> ceph-admin: Get:5 http://archive.ubuntu.com trusty-updates/main Sources [391 kB]
==> ceph-admin: Get:6 http://security.ubuntu.com trusty-security/main amd64 Packages [579 kB]
==> ceph-admin: Get:7 http://ceph.com trusty InRelease
==> ceph-admin: Splitting up /var/lib/apt/lists/partial/ceph.com_debian-hammer_dists_trusty_InRelease into data and signature failed
==> ceph-admin: Ign http://ceph.com trusty InRelease
==> ceph-admin: E: GPG error: http://ceph.com trusty InRelease: Clearsigned file isn't valid, got 'NODATA' (does the network require authentication?)
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
@kokhang Awesome! Just let me know when the fix for the mons update has been added so I then can redeploy my cluster and check the mon IP HA fix then.
will definitely do... Thank you
i edited all the PVs to have the right mon IPs…gitlab-postgresql has successfully mounted but
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2h 2m 79 kubelet, x2 Warning FailedMount Unable to mount volumes for pod "gitlab-gitlab-627609594-gt5m1_default(d2fd45c7-94b3-11e7-8bc5-782bcb1b19dd)": timeout expired waiting for volumes to attach/mount for pod "default"/"gitlab-gitlab-627609594-gt5m1". list of unattached/unmounted volumes=[gitlab-data gitlab-registry]
2h 2m 79 kubelet, x2 Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"gitlab-gitlab-627609594-gt5m1". list of unattached/unmounted volumes=[gitlab-data gitlab-registry]
2h 2m 35 kubelet, x2 Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/rbd/d2fd45c7-94b3-11e7-8bc5-782bcb1b19dd-pvc-7aff16f6-9451-11e7-bc49-782bcb1b19dd" (spec.Name: "pvc-7aff16f6-9451-11e7-bc49-782bcb1b19dd") pod "d2fd45c7-94b3-11e7-8bc5-782bcb1b19dd" (UID: "d2fd45c7-94b3-11e7-8bc5-782bcb1b19dd") with: rbd: image k8s-dynamic-pvc-7aff16f6-9451-11e7-bc49-782bcb1b19dd-7c7e6bdb-9451-11e7-8acc-5a2dff33db5e is locked by other nodes
2h 1m 28 kubelet, x2 Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/rbd/d2fd45c7-94b3-11e7-8bc5-782bcb1b19dd-pvc-7b17e73e-9451-11e7-bc49-782bcb1b19dd" (spec.Name: "pvc-7b17e73e-9451-11e7-bc49-782bcb1b19dd") pod "d2fd45c7-94b3-11e7-8bc5-782bcb1b19dd" (UID: "d2fd45c7-94b3-11e7-8bc5-782bcb1b19dd") with: rbd: image k8s-dynamic-pvc-7b17e73e-9451-11e7-bc49-782bcb1b19dd-7d728efc-9451-11e7-8acc-5a2dff33db5e is locked by other nodes
"rbd: image k8s-dynamic-pvc-7aff16f6-9451-11e7-bc49-782bcb1b19dd-7c7e6bdb-9451-11e7-8acc-5a2dff33db5e is locked by other nodes" for the two remaining volumes that won’t mount
@fury_twitter are you able to try to manually break the locks using the rbd lock remove
command inside the rook toolbox?
if the locks get broken then on the next retry, the attach/mount may start working
@kokhang just to be clear, at attach/mount time, you will be looking up the latest monitor IP addresses and using those for the rbd map
command, right?
rbd: error opening default pool ‘rbd’
Ensure that the default pool has been created or specify an alternate pool name.
you’ll need to use the --pool <your pool>
arg to match the pool your PV is in.
oh, whoops. how would i find out which pool it’s in?
if you followed the normal docs/guide flow, that would be replicapool
. you can do a rookctl block ls
to find out for sure though.
earlier in the discussion, it sounded like we weren’t sure if pods would regenerate their ceph config file when they are restarted; i’m fairly certain that they do. here’s the spot in the OSD code where, when an existing OSD restarts, it writes its config file with the latest config again: https://github.com/rook/rook/blob/master/pkg/ceph/osd/agent.go#L629
unless the OSD pod is still being given an old set of monitor IPs? then it would just write out the same old mon IPs to its config file…
root@rook-tools:/# rbd lock ls replicapool/k8s-dynamic-pvc-7b17e73e-9451-11e7-bc49-782bcb1b19dd-7d728efc-9451-11e7-8acc-5a2dff33db5e
There is 1 exclusive lock on this image.
Locker ID Address
client.64124 kubelet_lock_magic_x3 192.168.1.35:0/3043814279
root@rook-tools:/# rbd lock rm k8s-dynamic-pvc-cc883b47-9438-11e7-bc49-782bcb1b19dd-cc972082-9438-11e7-8acc-5a2dff33db5e kubelet_lock_magic_x3 client.64124 --pool replicapool
rbd: releasing lock failed: (2) No such file or directory
i am having some copy/paste issues in the exec session
oh man yeah those copy/paste issues while exec’d to a pod are so annoying. how did you exec to your toolbox? i normally use:
kubectl -n rook exec -it rook-tools -- bash
but that still does have some copy/paste issues for long strings
that’s what i did, and i just basically made an excessively long window so it won’t wrap
ok. removed the locks. crosses fingers
it may take a bit to retry now too if it’s failed a bunch of times…i think it does an exponential backoff.
it seems to be retrying every minute or two
so it looks like the ODS pods get the mon IPs injected as an env var from a config map. i believe the operator keeps this config map up to date as it adds/removes mons. if the pod restarts, i’m not certain if it gets a fresh look at that config map, but i am hopeful that’s what it does:
- name: ROOK_MON_ENDPOINTS
valueFrom:
configMapKeyRef:
key: data
name: rook-ceph-mon-endpoints
24s 24s 1 kubelet, x2 spec.containers{gitlab-gitlab} Normal Pulled Container image "gitlab/gitlab-ce:9.4.1-ce.0" already present on machine
23s 23s 1 kubelet, x2 spec.containers{gitlab-gitlab} Normal Created Created container with id acebcd66fa69bb5aab91ae103575b36d2873a1be2017d914938d981b9a82e221
22s 22s 1 kubelet, x2 spec.containers{gitlab-gitlab} Normal Started Started container with id acebcd66fa69bb5aab91ae103575b36d2873a1be2017d914938d981b9a82e221
HALLELUJAH
looks like the data’s still there, it’s not generating new keys like a fresh install would
thanks for all of your patience and assistance y’all i had tried storageos and glusterfs for my kubernetes cluster, was hoping the third time’s the charm, looks like i at least have a way to recover (albeit slowly and painfully) if one of the nodes dies or reboots suddenly
i’m very happy @fury_twitter. i joined this conversation a bit late, do you remember the key steps that were needed to get everything back to health? anything that we can automate and add to the Rook operator will be a win going forward.
yeah, will recap… i have a 3 node cluster (one master running debian 8.5.0, kubeadm init’d, i untainted it so the master could run pods…two other nodes on debian 9.1.0, added with kubeadm join). I had almost everything working but I needed a newer version of docker on one of the nodes to try the gitlab CI auto-deploy docker building stuff…i had downgraded to 1.11.2 to try to get around some issue with glusterfs on newer dockers, so i stopped docker…without draining the kubernetes node first >.>. theeeen I upgraded docker and rebooted the machine and it didn’t come up for several minutes. when it did come back up again, the node was essentially there, and kubernetes was able to schedule pods onto it, but the rook stuff still wasn’t working right… two of the mon pods got terminated by kubernetes as "lost" i guess, so the IPs changed, and the rookctl status was stuck at "1 root down/3 osds down". deleted the OSD and mon pods to get the rook cluster healthy again, but there were rbd locks left over from the previous unclean shutdown so the volumes could not be mounted by the pods
i’ve learned several lessons here: 1) never reboot, 2) use a different machine if at all possible for the CI runner
which i did try initially but it wasn’t able to connect to my gitlab instance (connection refused O_o)
my ultimate goal is to have the kubernetes cluster running gitlab and the company website as well as other future apps deployed through gitlab CI
and i’ve been working at this for months, so i have some of the steps down pat and written into .sh scripts :stuck_out_tongue_winking_eye:
@jbw976 yes that is something im working on now… Getting newest mon IP
cool. and i did confirm that the operator keeps that config map up to date as it adds/removes mons.
yes im adding a watcher to that configmap
is a watcher needed? or do you just need to read from it on demand when an Attach is requested?
thank you very much for the detailed recap @fury_twitter
it sounds like (depending on the exact sequence) some of the OSD pods could have come up with stale mon IPs, instead of being started with correct ones and then Ceph updating their current monmaps as the process continues running.
so your restarting of the pods got everything up to date with the latest since the ceph config is regenerated every time a pod starts.
@jbw976 would be good to repro this scenario and ensure we have test coverage on this.
yes @bassam. it also makes rook/rook#667 even more interesting.
so i was thinking of watching the configmap, and when an updates happen, the ceph config gets updated
is this a good way? @travisn do you have any input on this?
just regenerate it every time, which looks up the latest from k8s config maps, secrets, etc.