Headup: 2016

Wednesday, November 16, 2016

Solve Ceph Clock Skew error

Monitors can be severely affected by significant clock skews across the monitor nodes. This usually translates into weird behavior with no obvious cause. To avoid such issues, you should run a clock synchronization tool on your monitor nodes. By default, the monitors allow clocks to drift up to 0.05 seconds.

You can see this warning with either of these commands:

# ceph -s
# ceph health detail

root@ceph01:~# ceph -s
cluster 9227547b-bb6b-44f7-b877-3f6d25b942a4
health HEALTH_WARN
clock skew detected on mon.ceph01
monmap e3: 3 mons at {ceph01=172.18.3.5:6789/0,ceph02=172.18.5.6:6789/0,ceph03=172.18.5.7:6789/0}
election epoch 24, quorum 0,1,2 ceph01,ceph02,ceph03
mdsmap e17: 1/1/1 up {0=ceph01=up:active}
osdmap e245: 22 osds: 22 up, 22 in
pgmap v14727: 1408 pgs, 5 pools, 11977 MB data, 3183 objects
24729 MB used, 16361 GB / 16385 GB avail
1408 active+clean
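If you want a script to pick the affected monitors out of this output, here is a minimal sketch. It assumes the "clock skew detected on mon.X" wording shown above (other Ceph releases may phrase the warning differently), and `skewed_mons` is a hypothetical helper name, not a Ceph command.

```shell
# Hypothetical helper: print the monitor names listed in a
# "clock skew detected on mon.a, mon.b" line read from `ceph -s` on stdin.
skewed_mons() {
    # keep only the part after "detected on ", split on commas,
    # then strip the "mon." prefix and surrounding blanks
    grep 'clock skew detected on' |
        sed 's/.*clock skew detected on //' |
        tr ',' '\n' |
        sed 's/^[[:space:]]*mon\.//; s/[[:space:]]*$//'
}
```

For the output above, `ceph -s | skewed_mons` should print `ceph01`; it prints nothing when no skew is reported.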

The solution? Just re-sync the clock on the affected mon and restart the mon daemon:

root@ceph01:~# service ntp stop
* Stopping NTP server ntpd
root@ceph01:~# ntpdate ntp.ubuntu.com
16 Nov 01:24:16 ntpdate[4149434]: adjust time server 91.189.91.157 offset -0.002235 sec
root@ceph01:~# ntpd -gq
ntpd: time slew +0.003482s
root@ceph01:~# service ntp start
* Starting NTP server ntpd
root@ceph01:~# restart ceph-mon-all
ceph-mon-all start/running
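The ntpdate line above reports the measured offset. A minimal sketch for comparing such an offset against the 0.05 second default; `offset_within_drift` is a hypothetical helper, and it assumes the "offset <seconds>" wording that ntpdate prints.

```shell
# Hypothetical helper: exit 0 if the offset in an ntpdate line is within the
# allowed drift (default 0.05 s), 1 if it is larger, 2 if no offset is found.
offset_within_drift() {
    printf '%s\n' "$1" | awk -v max="${2:-0.05}" '{
        for (i = 1; i < NF; i++)
            if ($i == "offset") {
                off = $(i + 1) + 0          # numeric prefix, tolerates a trailing comma
                if (off < 0) off = -off     # absolute value
                exit (off <= max + 0 ? 0 : 1)
            }
        exit 2
    }'
}
```

For example, `offset_within_drift "$(ntpdate -q ntp.ubuntu.com | tail -n 1)"` should query the offset without setting the clock (`-q`); pass a second argument if you changed the allowed drift.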

To be safe, it is sometimes better to sync the clock on all the mons.

This default (0.05 seconds) can also be changed in the Ceph config file, but just because you can doesn't mean you should; the default value works well. For example, this raises the allowed drift to 10 seconds:

root@ceph01:~# cat /etc/ceph/ceph.conf
....
[mon]
mon clock drift allowed = 10
...
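To check which value a monitor would pick up from the file, here is a rough sketch that reads a ceph.conf-style file. It assumes that a [mon] setting overrides [global] and that spaces and underscores in option names are interchangeable, and it ignores per-daemon sections such as [mon.ceph01]; `drift_allowed` is a hypothetical helper, not a Ceph tool.

```shell
# Hypothetical helper: print the effective "mon clock drift allowed" from a
# ceph.conf-style file ([mon] wins over [global]), or the 0.05 s default.
drift_allowed() {
    awk '
        /^[ \t]*\[/ {                          # section header, e.g. [mon]
            sect = $0
            sub(/^[ \t]*\[/, "", sect)
            sub(/\].*$/, "", sect)
            next
        }
        {
            line = $0
            sub(/[#;].*$/, "", line)           # strip comments
            eq = index(line, "=")
            if (eq == 0) next                  # not a key = value line
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            gsub(/[ \t_]+/, "_", key)          # "mon clock drift allowed" -> mon_clock_drift_allowed
            if (key != "mon_clock_drift_allowed") next
            if (sect == "mon") mon = val
            else if (sect == "global") glob = val
        }
        END {
            if (mon != "") print mon
            else if (glob != "") print glob
            else print "0.05"
        }
    ' "${1:-/etc/ceph/ceph.conf}"
}
```

On a running cluster, asking the daemon itself is more reliable, e.g. `ceph daemon mon.ceph01 config get mon_clock_drift_allowed` on that monitor's host.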

Check the cluster status again; sometimes the warning takes a few seconds (around 30) to clear.

root@ceph01:~# ceph -s
cluster 9227547b-bb6b-44f7-b877-3f6d25b942a4
health HEALTH_OK
monmap e3: 3 mons at {ceph01=172.18.3.5:6789/0,ceph02=172.18.5.6:6789/0,ceph03=172.18.5.7:6789/0}
election epoch 24, quorum 0,1,2 ceph01,ceph02,ceph03
mdsmap e17: 1/1/1 up {0=ceph01=up:active}
osdmap e245: 22 osds: 22 up, 22 in
pgmap v14727: 1408 pgs, 5 pools, 11977 MB data, 3183 objects
24729 MB used, 16361 GB / 16385 GB avail
1408 active+clean
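Instead of re-running `ceph -s` by hand, a small polling loop can wait for the warning to clear. This is a sketch: `wait_for_health_ok` and the `CEPH_HEALTH_CMD` override are hypothetical names (the override only exists so the loop can be tried without a cluster), and it assumes `ceph health` starts its output with `HEALTH_OK` once the skew is gone.

```shell
# Hypothetical helper: poll the cluster health until it reports HEALTH_OK.
# $1: total seconds to wait (default 60), $2: seconds between checks (default 5)
wait_for_health_ok() {
    timeout=${1:-60}
    interval=${2:-5}
    waited=0
    while [ "$waited" -le "$timeout" ]; do
        # unquoted on purpose so "ceph health" splits into command + argument
        if ${CEPH_HEALTH_CMD:-ceph health} 2>/dev/null | grep -q '^HEALTH_OK'; then
            echo "HEALTH_OK after ${waited}s"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "still not HEALTH_OK after ${timeout}s" >&2
    return 1
}
```

`wait_for_health_ok 60 5` checks every 5 seconds for up to a minute, which covers the roughly 30 seconds mentioned above.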

Cloning a Ceph client auth key

I can't think of any reason to do this other than using the same user and auth key to authenticate against different Ceph clusters, as in a multi-backend setup, or simply because things get messy when you are not using a default configuration.

Sometimes things are easier when services connect to both clusters with the same user and auth key, so let's look at some basic commands for managing users, keys, and permissions:

Create a new user and auth token (cinder client example):

root@ceph-admin:~# ceph auth get-or-create client.jerry
client.jerry
key: AQAZT05WoQuzJxAAX5BKxCbPf93CwihuHo27VQ==

As you can see, the key is not a parameter: the cluster generates it, so running the same command on a different cluster produces a completely different key.
Just to check, print the complete list of keys:

root@ceph-admin:~# ceph auth list
installed auth entries:
osd.0
key: AQCvCbtToC6MDhAATtuT70Sl+DymPCfDSsyV4w==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: AQC4CbtTCFJBChAAVq5spj0ff4eHZICxIOVZeA==
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: AQBHCbtT6APDHhAA5W00cBchwkQjh3dkKsyPjw==
caps: [mds] allow
caps: [mon] allow *
caps: [osd] allow *
client.jerry
key: AQAZT05WoQuzJxAAX5BKxCbPf93CwihuHo27VQ==

Or, to print a single user's authentication key to standard output, run the command in the following format:

ceph auth print-key {TYPE}.{ID}

root@ceph-admin:~# ceph auth print-key client.jerry
AQAZT05WoQuzJxAAX5BKxCbPf93CwihuHo27VQ==

To make the key match the one used on the other cluster, use the import command. It creates users that don't exist yet and updates the keys and capabilities of existing ones. Use the following format:

ceph auth import -i /path/to/keyring

The keyring file needs to be in this format; if it isn't, the command will not work and, worst of all, it may just hang.

root@ceph-admin:~# cat jerry.key
[client.jerry]
key = AQAMP01WS8i8ERAAPspjwMzUm4SL00n+WppM6A==

Now we can update the auth key for the user jerry:

root@ceph-admin:~# ceph auth import -i ./jerry.key
imported keyring

List again.

root@ceph-admin:~# ceph auth list
installed auth entries:
osd.0
key: AQCvCbtToC6MDhAATtuT70Sl+DymPCfDSsyV4w==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: AQC4CbtTCFJBChAAVq5spj0ff4eHZICxIOVZeA==
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: AQBHCbtT6APDHhAA5W00cBchwkQjh3dkKsyPjw==
caps: [mds] allow
caps: [mon] allow *
caps: [osd] allow *
client.jerry
key: AQAMP01WS8i8ERAAPspjwMzUm4SL00n+WppM6A==
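To confirm that the imported key is the one from the file, you can compare `ceph auth print-key` with the key in the keyring. A minimal sketch; `keyring_key` is a hypothetical helper that assumes the `[entity]` / `key = ...` layout shown above.

```shell
# Hypothetical helper: print the key of one entity from a keyring file.
# $1: keyring file, $2: entity (e.g. client.jerry)
keyring_key() {
    awk -v want="[$2]" '
        { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }      # trim blanks
        line ~ /^\[/ { in_sect = (line == want); next }       # entering a section
        in_sect && line ~ /^key[ \t]*=/ {
            sub(/^key[ \t]*=[ \t]*/, "", line)                 # keep only the value
            print line
            exit
        }
    ' "$1"
}
```

For example: `[ "$(ceph auth print-key client.jerry)" = "$(keyring_key ./jerry.key client.jerry)" ] && echo "keys match"`.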

Done. I will keep posting these little tricks until the last post about multi-backend Ceph is out.

Friday, March 18, 2016

Export instance from OpenStack with Ceph/rbd backend

Suppose you want to migrate an instance to a different infrastructure, or hand an instance over to a client. Either way, you need to export the instance's volumes.

Step 1: Get the instance UUID.

root@ceph-admin:~# nova list | grep InstanceToExport
| 2bdda36c-f0dd-4fa5-bb8b-3df346b17002 | InstanceToExport  | SHUTOFF | -          | Shutdown    | vlan8=192.168.255.53; vlan1837=10.20.37.7; vlan1829=10.20.23.53  |

The instance UUID is returned here: 2bdda36c-f0dd-4fa5-bb8b-3df346b17002
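Copying the UUID out of the table by hand is error prone. A small sketch that prints the first column of table rows like the one above; `table_ids` is a hypothetical helper, and it assumes the ID is the first column, as in this `nova list` output.

```shell
# Hypothetical helper: print the first column (the ID) of each data row of an
# OpenStack CLI table read from stdin; border and header rows are skipped.
table_ids() {
    awk -F'|' 'NF > 2 {
        id = $2
        gsub(/[ \t]/, "", id)            # strip the padding around the value
        if (id != "" && id != "ID") print id
    }'
}
```

For example: `INSTANCE_ID=$(nova list | grep InstanceToExport | table_ids)`.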

Step 2: Get the UUID of the instance's volume, using the UUID returned in Step 1:

root@ceph-admin:~# cinder list | grep 2bdda36c-f0dd-4fa5-bb8b-3df346b17002
| fdb279c5-24bb-45d7-a86a-a33f4c285b5a |   in-use  |       None      | 100  |     None    |   true   | 2bdda36c-f0dd-4fa5-bb8b-3df346b17002 |

The volume UUID is returned here: fdb279c5-24bb-45d7-a86a-a33f4c285b5a
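`grep` matches the instance UUID anywhere in the row. A slightly stricter sketch checks the "Attached to" column only; it assumes, as in the output above, that the volume ID is the first column and the attachment is the last one, and `volumes_attached_to` is a hypothetical name.

```shell
# Hypothetical helper: print the IDs of volumes attached to one instance,
# reading `cinder list` output on stdin.
# $1: instance UUID
volumes_attached_to() {
    awk -F'|' -v inst="$1" 'NF > 2 {
        id = $2;              gsub(/[ \t]/, "", id)
        attached = $(NF - 1); gsub(/[ \t]/, "", attached)   # last column; $NF is empty
        if (attached == inst) print id
    }'
}
```

For example, `cinder list | volumes_attached_to 2bdda36c-f0dd-4fa5-bb8b-3df346b17002` should print the volume UUID above. Matching the whole column avoids false hits where the UUID appears in some other field.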

Step 3: Search for the volume in the Ceph pool. In my case, this volume is stored in the cinder-volumes pool:

root@ceph-admin:~# rbd --pool cinder-volumes ls | grep fdb279c5-24bb-45d7-a86a-a33f4c285b5a
volume-fdb279c5-24bb-45d7-a86a-a33f4c285b5a

Now you have the volume name in the pool: volume-fdb279c5-24bb-45d7-a86a-a33f4c285b5a
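The image name is the Cinder volume UUID with a `volume-` prefix (the default `volume_name_template` in cinder.conf is `volume-%s`; check yours if it was changed). A tiny sketch that builds the pool/image spec used in Step 4; `rbd_image_spec` is a hypothetical helper.

```shell
# Hypothetical helper: build "pool/volume-<uuid>" for a Cinder RBD volume,
# assuming the default volume-%s naming template.
# $1: pool (cinder-volumes here), $2: Cinder volume UUID
rbd_image_spec() {
    printf '%s/volume-%s\n' "$1" "$2"
}
```

For example, `rbd info "$(rbd_image_spec cinder-volumes fdb279c5-24bb-45d7-a86a-a33f4c285b5a)"` shows the image before you export it.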

Step 4: Export the volume.

root@ceph-admin:~# rbd export cinder-volumes/volume-fdb279c5-24bb-45d7-a86a-a33f4c285b5a ./InstanceToExport.img
Exporting image: 100% complete...done.

root@ceph-admin:~# ls -ltrh *.img
-rw-r--r-- 1 root root 100G Feb 17 17:09 InstanceToExport.img

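Step 5: Compress the image so you can scp or rsync it faster. This step is optional but highly recommended. The -k flag (gzip 1.6 or later) keeps the original .img, which is still needed for the checksum in Step 6; without it, gzip replaces the file with the .gz.

root@ceph-admin:~# gzip -9 -k InstanceToExport.img
root@ceph-admin:~# ls -lh *.gz
-rw-r--r-- 1 root root 1.2G Feb 17 18:02 InstanceToExport.img.gz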

Step 6: Generate checksums, to be sure nothing gets corrupted while copying:

root@ceph-admin:~# md5sum InstanceToExport.img >InstanceToExport.img.md5
root@ceph-admin:~# md5sum InstanceToExport.img.gz >InstanceToExport.img.gz.md5
root@ceph-admin:~# cat InstanceToExport.img.md5
5504cdf2261556135811fdd5787b33a5  InstanceToExport.img
root@ceph-admin:~# cat InstanceToExport.img.gz.md5
8a76c28d404f44cc43872e69c9965cd2  InstanceToExport.img.gz

Note: running md5sum on InstanceToExport.img takes a long time; on my 100G volume it took about 20 minutes. Omit it if you want.
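On the receiving side, the .md5 files let you verify the copy. A sketch of that check; `verify_export` is a hypothetical helper that assumes the file names used above and md5sum-style checksum files. It checks the compressed file first (transfer errors), then decompresses it and checks the raw image if its .md5 is present.

```shell
# Hypothetical helper: verify a copied export on the destination host.
# $1: directory holding the files, $2: base name (e.g. InstanceToExport)
verify_export() {
    dir=$1
    name=$2
    (
        cd "$dir" || exit 1
        # 1. was the compressed file copied intact?
        md5sum -c "$name.img.gz.md5" || exit 1
        # 2. decompress to NAME.img, keeping the .gz
        gzip -dc "$name.img.gz" > "$name.img" || exit 1
        # 3. does the raw image match the original, if that checksum exists?
        if [ -f "$name.img.md5" ]; then
            md5sum -c "$name.img.md5" || exit 1
        fi
    )
}
```

For example: `verify_export /srv/import InstanceToExport`, where the directory is whatever you copied the files into.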

Saturday, March 12, 2016

Testing juju environment inside LXC container

I'll assume we're past the part about what Juju is and how it works, so I'll post the commands and configuration needed to get the environment working inside an LXC container created just for Juju. This is not the local provider configuration, which creates LXC containers itself; in other words, our host server does not have any Juju package installed.

Some links to read in case you need more info, or you can post a question.

Host environment:

root@spyder:~# cat /etc/issue
Ubuntu 15.10
root@spyder:~# uname -a
Linux spyder 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@spyder:~# dpkg-query -W | grep lxc
liblxc1	1.1.5-0ubuntu5~ubuntu15.10.1~ppa1
lxc	1.1.5-0ubuntu5~ubuntu15.10.1~ppa1
lxc-templates	2.0.0~beta2-0ubuntu2~ubuntu15.10.1~ppa1
lxcfs	2.0.0~rc3-0ubuntu1~ubuntu15.10.1~ppa1
lxctl	0.3.1+debian-3
python3-lxc	1.1.5-0ubuntu5~ubuntu15.10.1~ppa1

My host server has two lxcbr interfaces, but for the Juju container I'm going to use lxcbr0. The container will have full internet access, but to reach the apps running inside it we are going to need DNAT iptables rules (I'll post the iptables configuration at the end).

root@spyder:~# ifconfig lxcbr0 | grep inet
          inet addr:10.0.2.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::58b8:36ff:fe6e:4e57/64 Scope:Link

Original lxc-ls output

root@spyder:~# lxc-ls --fancy
NAME        STATE    IPV4                   IPV6  GROUPS  AUTOSTART
-------------------------------------------------------------------
ceph-admin  RUNNING  10.0.2.11, 10.0.3.84   -     -       YES
ceph01      RUNNING  10.0.2.85, 10.0.3.85   -     -       YES
ceph02      RUNNING  10.0.2.103, 10.0.3.86  -     -       YES
ceph03      RUNNING  10.0.2.156, 10.0.3.87  -     -       YES

Now to the fun part, getting things working :)

First, create the juju container.

root@spyder:~# lxc-create -t download -n juju -- --dist ubuntu --release trusty --arch amd64
Using image from local cache
Unpacking the rootfs

---
You just created an Ubuntu container (release=trusty, arch=amd64, variant=default)

To enable sshd, run: apt-get install openssh-server

For security reason, container images ship without user accounts
and without a root password.

Use lxc-attach or chroot directly into the rootfs to set a root password
or create user accounts.

Start the container

root@spyder:~# lxc-start -n juju -d --logfile juju.log
root@spyder:~# lxc-ls --fancy
NAME        STATE    IPV4                   IPV6  GROUPS  AUTOSTART
-------------------------------------------------------------------
ceph-admin  RUNNING  10.0.2.11, 10.0.3.84   -     -       YES
ceph01      RUNNING  10.0.2.85, 10.0.3.85   -     -       YES
ceph02      RUNNING  10.0.2.103, 10.0.3.86  -     -       YES
ceph03      RUNNING  10.0.2.156, 10.0.3.87  -     -       YES
juju        RUNNING  10.0.2.110             -     -       YES

Now let's attach to the container (at this point you need to install openssh-server, set user passwords, etc.):

root@spyder:~# lxc-attach --name juju
root@juju:~# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
22: eth0@if23:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:16:3e:53:f1:f5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.110/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe53:f1f5/64 scope link
       valid_lft forever preferred_lft forever

To install Juju, you simply need to grab the latest juju-core package from the PPA:

root@juju:~# apt-get install python-software-properties
root@juju:~# apt-get install software-properties-common
root@juju:~# add-apt-repository ppa:juju/stable
 Stable release of Juju for Ubuntu 12.04 and above.
 More info: https://launchpad.net/~juju/+archive/ubuntu/stable
Press [ENTER] to continue or ctrl-c to cancel adding it

gpg: keyring `/tmp/tmpyqs7twek/secring.gpg' created
gpg: keyring `/tmp/tmpyqs7twek/pubring.gpg' created
gpg: requesting key C8068B11 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpyqs7twek/trustdb.gpg: trustdb created
gpg: key C8068B11: public key "Launchpad Ensemble PPA" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
OK
root@juju:~# apt-get update
root@juju:~# apt-get upgrade
root@juju:~# apt-get install juju-quickstart juju-core

Juju needs to be configured to use your cloud provider. This is done via the following file:

$HOME/.juju/environments.yaml

Juju can generate the file automatically:

ubuntu@juju:~$ juju generate-config

There are different types of cloud providers (check environments.yaml for more info). The important one for us is the manual provider, because we are going to deploy manually on this same machine (the LXC container in this case), so I deleted all the other entries:

ubuntu@juju:~$ cat /home/ubuntu/.juju/environments.yaml
default: manual

environments:
    manual:
        type: manual
        # bootstrap-host holds the host name of the machine where the
        # bootstrap machine agent will be started.
        bootstrap-host: juju

        # bootstrap-user specifies the user to authenticate as when
        # connecting to the bootstrap machine. It defaults to
        # the current user.
        bootstrap-user: ubuntu

        # storage-listen-ip specifies the IP address that the
        # bootstrap machine's Juju storage server will listen
        # on. By default, storage will be served on all
        # network interfaces.
        # storage-listen-ip:

        # storage-port specifies the TCP port that the
        # bootstrap machine's Juju storage server will listen
        # on. It defaults to 8040
        # storage-port: 8040
        # Whether or not to refresh the list of available updates for an
        # OS. The default option of true is recommended for use in
        # production systems.
        #
        # enable-os-refresh-update: true

        # Whether or not to perform OS upgrades when machines are
        # provisioned. The default option of false is set so that Juju
        # does not subsume any other way the system might be
        # maintained.
        #
        # enable-os-upgrade: false

The first step is to create a bootstrap environment. This is a cloud instance that Juju will use to deploy and manage services. It will be created according to the configuration you have provided, and your public SSH key will be uploaded automatically so that Juju can communicate securely with the bootstrap instance.

ubuntu@juju:~$ juju switch manual
manual -> manual
ubuntu@juju:~$ juju bootstrap
WARNING ignoring environments.yaml: using bootstrap config in file "/home/ubuntu/.juju/environments/manual.jenv"
Bootstrapping environment "manual"
Starting new instance for initial state server
Installing Juju agent on bootstrap instance
Logging to /var/log/cloud-init-output.log on remote host
Running apt-get update
Installing package: curl
Installing package: cpu-checker
Installing package: bridge-utils
Installing package: rsyslog-gnutls
Installing package: cloud-utils
Installing package: cloud-image-utils
Installing package: tmux
Fetching tools: curl -sSfw 'tools from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' --retry 10 -o $bin/tools.tar.gz
Bootstrapping Juju machine agent
Starting Juju machine agent (jujud-machine-0)
Bootstrap agent installed
manual -> manual
Waiting for API to become available
Waiting for API to become available
Bootstrap complete

You can see that the bridge-utils package was installed, but inside an LXC container you are not going to use it, because the container's network already passes through the host's bridge, so you can purge it:

root@juju:~# apt-get purge bridge-utils

If you have any problems with the bootstrap, delete the configuration files and start over. I mean problems like the nasty “ERROR machine is already provisioned” when the machine is not really provisioned:

root@juju:~# apt-get purge 'lxc*'
root@juju:~# apt-get purge 'juju*'
root@juju:~# rm -rf /etc/init/juju*
root@juju:~# rm -rf /var/lib/juju

If not, just continue. If everything went right, you will see output similar to this one, which means that the Juju service is running on machine 0 (the same LXC container).

ubuntu@juju:~$ juju status
environment: manual
machines:
  "0":
    agent-state: started
    agent-version: 1.25.3
    dns-name: juju
    instance-id: 'manual:'
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=3000M
    state-server-member-status: has-vote
services: {}

Assuming it returns successfully, we can now deploy some services and explore the basic operations of Juju. Next, deploy our first charm (juju-gui) and expose it; this charm makes it easy to deploy a Juju GUI into an existing environment.

ubuntu@juju:~$ juju deploy juju-gui --to 0
ubuntu@juju:~$ juju expose juju-gui
........
........ this takes a couple of minutes, since Juju needs to download several packages and configure everything, so it's better to run "watch juju status" until you see output similar to this:
........

ubuntu@juju:~$ juju status
environment: manual
machines:
  "0":
    agent-state: started
    agent-version: 1.25.3
    dns-name: juju
    instance-id: 'manual:'
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=3000M
    state-server-member-status: has-vote
services:
  juju-gui:
    charm: cs:trusty/juju-gui-51
    exposed: true
    service-status:
      current: unknown
      since: 12 Mar 2016 09:12:45Z
    units:
      juju-gui/0:
        workload-status:
          current: unknown
          since: 12 Mar 2016 09:12:45Z
        agent-status:
          current: idle
          since: 12 Mar 2016 09:17:48Z
          version: 1.25.3
        agent-state: started
        agent-version: 1.25.3
        machine: "0"
        open-ports:
        - 80/tcp
        - 443/tcp
        public-address: juju

Now the juju-gui is installed, configured, and exposed over ports 80 and 443, but remember, this is inside the LXC container, so we can't access the GUI unless we NAT some ports from our host server.

root@spyder:~# iptables -t nat -A PREROUTING -p tcp -d 10.0.1.139 --dport 443 -j DNAT --to-destination 10.0.2.110:443
root@spyder:~# iptables -t nat -A PREROUTING -p tcp -d 10.0.1.139 --dport 80 -j DNAT --to-destination 10.0.2.110:80

And boom! Now we can access juju-gui. The login info is in this file:

ubuntu@juju:~$ cat .juju/environments/manual.jenv 
user: admin
password: 0d4e465c15d5880d0c348a921489a9f1
.......
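To pull just the password out of that file, a one-function sketch; `jenv_password` is a hypothetical helper that assumes the top-level `password:` line shown above (Juju 1.x .jenv layout).

```shell
# Hypothetical helper: print the admin password from a Juju 1.x .jenv file.
# $1: .jenv file (defaults to the manual environment used above)
jenv_password() {
    awk '/^password:/ { print $2; exit }' "${1:-$HOME/.juju/environments/manual.jenv}"
}
```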

Thursday, March 10, 2016

Cinder Volume Transfer

Let's assume you want to change the ownership of a volume from Tenant_A to Tenant_B.

Step 1: Tenant A will initiate an Ownership Transfer which will enable another tenant to take ownership of it.

$ source openrc Tenant_A Tenant_A
$ cinder transfer-create [volume_id]

An Authentication Key and a Transfer ID are returned here.
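To avoid copying the two values by hand, here is a sketch that reads them from the Property/Value table the Cinder client prints. `table_value` is a hypothetical helper, and the property names `id` and `auth_key` are an assumption about that table; check them against your client's output.

```shell
# Hypothetical helper: print the Value of one Property from a
# "| Property | Value |" table read from stdin.
# $1: property name (e.g. auth_key)
table_value() {
    awk -F'|' -v prop="$1" 'NF > 2 {
        p = $2; gsub(/^[ \t]+|[ \t]+$/, "", p)
        v = $3; gsub(/^[ \t]+|[ \t]+$/, "", v)
        if (p == prop) { print v; exit }
    }'
}
```

For example, save the output with `out=$(cinder transfer-create "$VOLUME_ID")`, then `printf '%s\n' "$out" | table_value id` and `printf '%s\n' "$out" | table_value auth_key` give the two values Tenant_B passes to `cinder transfer-accept`.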

Step 2: Tenant B needs to accept the transfer using the Transfer ID and the Authentication Key generated above:

$ source openrc Tenant_B Tenant_B
$ cinder transfer-accept [transfer_id] [auth_key]

You should now see the volume associated with Tenant_B.
