
Wednesday, December 6, 2017

Get total provisioned size from cinder volumes

A quick way to get the total amount of provisioned space from Cinder volumes:

alvaro@skyline.local: ~
$ cinder list --all-tenants
MySQL-like table output :)

To parse that output and sum all the values in the Size column, use the following piped commands.

alvaro@skyline.local: ~
$ . admin-openrc.sh

alvaro@skyline.local: ~
$ cinder list --all-tenants | awk -F'|' '{print $6}' | sed 's/ //g' | grep -v -e '^$' | awk '{s+=$1} END {printf "%.0f", s}'
13453

The final result is in GB.
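
If the unified OpenStack client is available, the table parsing can be skipped; this is just a sketch, assuming a client version that supports --all-projects on volume list and exposes a Size column:

# Print only the Size column values and sum them (still in GB)
$ openstack volume list --all-projects -f value -c Size | awk '{s+=$1} END {print s " GB"}'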

Wednesday, June 14, 2017

Ceph recovery backfilling affecting production instances

In any kind of distributed system you have to balance consistency, availability, and partition tolerance. The CAP theorem states that in the presence of a network partition, one has to choose between consistency and availability; with its default configuration, Ceph provides consistency and partition tolerance. Just keep in mind that Ceph has a lot of config options: roughly 860 in Hammer and about 1,100 in Jewel (check the config_opts.h file in the Jewel branch on GitHub).
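
If you want to check how many options your own version actually exposes, here is a rough sketch; it assumes the ceph CLI is installed locally and simply counts the dumped defaults:

# Dump every config option known to the local ceph binaries, one per line, and count them
$ ceph --show-config | wc -l
# Or look at the default of a single option
$ ceph --show-config | grep osd_max_backfills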

Getting any specific behavior out of your cluster depends on your ability to configure it up front and/or to change settings on the fly during a contingency. This post is about the default recovery/backfilling options. You may have noticed that a critical failure, such as losing a complete node, causes a lot of data movement and a lot of ops on the drives: by default the cluster tries to recover as fast as possible, while it also has to keep serving normal client operations. As I said at the beginning of the post, Ceph defaults to consistency and partition tolerance, so the usual outcome is that availability suffers: users start to notice high latency and high CPU usage in instances backed by RBD because of the slow responses.
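
You can actually watch recovery traffic competing with client traffic while it happens; for example, from any node with an admin keyring:

# Overall cluster state, including degraded objects and recovery progress
$ sudo ceph -s
# Per-pool view that shows client io next to recovery io
$ sudo ceph osd pool stats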





Let's think about this in a better way and analyze the problem. With a replica-3 cluster and one server down (even if the cluster only has three servers), operation is still possible and the recovery jobs are not that urgent: Ceph keeps working toward consistency and will eventually get back to the correct three replicas, so there is no data loss; the remaining replicas regenerate the missing copies on other nodes. The big problem is that the backfilling compromises normal operation, so the real choice is between a quick recovery and a normal response to the connected clients and watchers. The answer is not hard to figure out: client operation response is priority number 0!
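
As a side note, if the node outage is planned maintenance rather than a real failure, you can tell the cluster not to start moving data at all; a sketch of that approach, to be used only for short, controlled windows:

# Before taking the node down: do not mark OSDs out, do not start backfill/recovery
$ sudo ceph osd set noout
$ sudo ceph osd set nobackfill
$ sudo ceph osd set norecover

# After the node is back and its OSDs have rejoined: clear the flags
$ sudo ceph osd unset norecover
$ sudo ceph osd unset nobackfill
$ sudo ceph osd unset noout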





Lost and recovery action in CRUSH (Image from Samuel Just, Vault 2015)

This is not the non plus ultra solution, it is just my solution to this problem; all of it was tested on a Ceph Hammer cluster:

1.- The best option is to set these values in the ceph.conf file from the beginning, at installation time:

******* SNIP *******
[osd]
....
osd max backfills = 1
osd recovery threads = 1
osd recovery op priority = 1
osd client op priority = 63
osd recovery max active = 1
osd snap trim sleep = 0.1
....
******* SNIP *******

2.- Otherwise, you can inject the options on the fly. You can target osd.x, where x is the number of the OSD daemon, or apply them cluster-wide as in the next example; just remember to also put them in the config file afterwards, because injected options are lost when the daemons restart.

ceph@stor01:~$ sudo ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph@stor01:~$ sudo ceph tell osd.* injectargs '--osd-recovery-threads 1'
ceph@stor01:~$ sudo ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
ceph@stor01:~$ sudo ceph tell osd.* injectargs '--osd-client-op-priority 63'
ceph@stor01:~$ sudo ceph tell osd.* injectargs '--osd-recovery-max-active 1'
ceph@stor01:~$ sudo ceph tell osd.* injectargs '--osd-snap-trim-sleep 0.1'
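
Either way, you can verify through the admin socket that the running daemons picked up the values; a quick sketch, assuming you are on a storage node and osd.0 is local to it:

# Query the live value of a single option on the local OSD
ceph@stor01:~$ sudo ceph daemon osd.0 config get osd_max_backfills
ceph@stor01:~$ sudo ceph daemon osd.0 config get osd_recovery_max_active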

The end result is a really slow recovery of the cluster, but normal operation continues without any kind of problem.

Wednesday, April 12, 2017

Keeping forked git repos up to date

A quick guide to remind myself how to keep forked repos up to date:

First: List the remote repositories that are currently tracked.

alvaro@skyline.local: ~/docker-openstack-cli
$ git remote -v

origin https://github.com/alsotoes/docker-openstack-cli.git (fetch)
origin https://github.com/alsotoes/docker-openstack-cli.git (push)

Second: Add the remote repo you want to work with.

alvaro@skyline.local: ~/docker-openstack-cli
$ git remote add kionetworks https://github.com/kionetworks/docker-openstack-cli.git

Third: Print the local repo configuration to confirm the new remote.

alvaro@skyline.local: ~/docker-openstack-cli
$ git remote -v

kionetworks https://github.com/kionetworks/docker-openstack-cli.git (fetch)
kionetworks https://github.com/kionetworks/docker-openstack-cli.git (push)
origin https://github.com/alsotoes/docker-openstack-cli.git (fetch)
origin https://github.com/alsotoes/docker-openstack-cli.git (push)

Fourth: Push to the new remote repo to complete the update.

alvaro@skyline.local: ~/docker-openstack-cli
$ git push kionetworks

Counting objects: 3, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 311 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/kionetworks/docker-openstack-cli.git
51bb74b..33e5dce master -> master

Pull new changes from origin.

alvaro@skyline.local: ~/docker-openstack-cli
$ git pull

Already up-to-date.

Pull new changes from a remote called kionetworks.

alvaro@skyline.local: ~/docker-openstack-cli
$ git pull kionetworks master

From https://github.com/kionetworks/docker-openstack-cli
* branch master -> FETCH_HEAD
Already up-to-date.
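
Putting it all together, the round trip I usually want looks like this; just a sketch, assuming the branch is master and the second remote is called kionetworks as above:

# Bring in whatever landed on the other remote's master
$ git pull kionetworks master
# Bring in changes from my own fork (origin)
$ git pull
# Push the merged result back to both remotes so they stay in sync
$ git push origin master
$ git push kionetworks master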

Sorry if this post has too little information; it is just a reminder.