Simple Partitioning with Ansible Storage Role

There are probably not many people who need to do disk partitioning and storage space management on a daily basis. For most of us it is something low-level that needs to be set up just once and then promptly forgotten.

Strangely enough, the Ansible Storage Role can be very useful for both of these groups of people. Here I would like to explain why, and how to use it.

Please remember that Storage Role operations are often destructive – you can easily lose data if you are not careful. Back up if possible; think twice if not.

What is Ansible and why should I care?

Imagine you need to update several dozen systems. This means running the same task on multiple machines over and over: connect to one machine, run the command, wait for it to finish, check the results, connect to another machine…

Very soon you realize you should automate the thing. And make it run asynchronously. And make it more generic. And…

That automated thing is Ansible. Ansible is an automation platform and its goal is to be able to execute the same command or script on multiple machines at once. It has some benefits for single-machine management as well.

Target machines (called remotes) do not need anything special installed apart from Python.

To run Ansible you just need to supply a file with the list of addresses of the remotes (a.k.a. the inventory) and the YAML script you want to run (known as the playbook).
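
For illustration, a minimal inventory can itself be written in YAML; the addresses below are just placeholders:

all:
  hosts:
    192.168.122.10:
    192.168.122.11:

With this saved as, say, inventory.yml, a playbook is executed with ansible-playbook -i inventory.yml playbook.yml.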

Ansible will then connect via SSH to all the remotes and execute the playbook on each of them. All necessary files are copied to the remotes and automatically removed when finished. Ansible prints the results of each step during the process.

The nice thing about playbooks is that they are designed to be simple, transferable and to hide most of the boring logic from the user. Even if you need to manage just one machine, you can keep your playbooks around as already prepared setup scripts.

In case you are wondering, there are reasons why the playbook has to be written in YAML format and not in – for example – bash.

Firstly, Ansible playbooks are declarative – the code in them simply describes the desired end state of the remote.

So instead of commanding ‘Computer, create file /foo/bar’, you say ‘Computer, make sure the file /foo/bar exists’.

The first approach will always recreate the file, possibly destroying its previous contents in the process. The declarative approach will do nothing if the desired state is already there.

This principle is very important to keep in mind when working with Ansible. It allows you to run the same playbook multiple times without fear that something will get overwritten or duplicated. The keyword for it is idempotency.
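
To make the /foo/bar example concrete, here is a minimal sketch of such a declarative task using Ansible's file module (the two preserve options keep repeated runs from touching the timestamps of an already existing file):

- name: Make sure the file /foo/bar exists
  file:
    path: /foo/bar
    # create the file if it is missing, leave an existing one alone
    state: touch
    modification_time: preserve
    access_time: preserve

Run the task twice in a row and the second run should report no change.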

The second reason for using YAML is that Ansible supports multiple platforms (even Windows systems can serve as remotes) and the same actions often require different commands (e.g. apt-get vs. dnf for package management). These functions can therefore be grouped together under a new, more generic name.

(For more information about Ansible and how to properly set it up, visit www.ansible.com.)

This brings us to the Ansible Roles.

Ansible Roles

Ansible roles can be compared to functions included from a library in your favorite imperative programming language. They provide the desired functionality and can accept parameters. In Ansible a role has the added benefits of ensuring idempotency and of encapsulating the logic that takes care of handling different remote platforms.

So, for example, the package role will use apt-get on Debian-based systems and dnf or yum on Red Hat ones, but its purpose stays the same.

A playbook using the package role to install htop can look like this:

---
- hosts: all

  tasks:
  - name: Install htop package
    package:
      name: htop
      state: present

Please note that indentation in YAML is mandatory and defines blocks of statements.

Where:

  • hosts specifies the group of remotes called all, which has to be defined in the inventory file
  • tasks denotes the beginning of the list of role invocations
  • The first name lets the user specify the text that will show up on screen when this task starts. It has no other functionality and can even be omitted
  • package is the actual name of the invoked role. Anything nested under it are role-specific parameters
    • name is the name of the package(s) the role should handle
    • state is the desired presence of the package(s) on the system

Storage Role

Some roles are a default part of Ansible; some (like the Storage Role) have to be explicitly included. The Storage Role can be called similarly to the previous example:

- hosts: all
  become: true
  vars:
    storage_safe_mode: false

  tasks:
    - name: Invoke the storage role
      include_role:
        name: linux-system-roles.storage

This time the example includes some additional setup.

  • become makes the playbook run with superuser privileges
  • storage_safe_mode is a Storage Role variable, here defined at global scope. It defaults to true, which forbids any destructive operations the Storage Role might otherwise perform; setting it to false allows them. Still, be careful
  • include_role – the Storage Role is a part of the linux-system-roles group, so it has to be included like this

This example is a very basic yet perfectly viable Storage Role invocation, even though it will do nothing. No arguments mean that the storage will be kept unchanged (remember: idempotency).

Actually, this example will do quite a lot but it will not change anything.

The first thing you will probably notice when you run the Storage Role is the time it takes to finish. The role always needs to make sure all the dependencies are installed and it also scans the system for the existing storage configuration. That takes time – even when nothing is going to be changed.

Now we are finally getting to examples that do something useful.

---
- hosts: all
  become: true
  vars:
    storage_safe_mode: false

  tasks:
  - name: Create a disk device mounted on "/opt/test1"
    include_role:
      name: linux-system-roles.storage
    vars:
      storage_volumes:
      - name: test1
        type: disk
        disks:
        - "/dev/sda"
        size: "12 GiB"
        fs_type: ext4
        mount_point: "/opt/test1"
        state: present
      - name: test2
        state: absent

In the examples I am using the disk names directly (e.g. /dev/sda). I strongly advise against doing it this way. Under some circumstances the names can change or shift, typically when a piece of hardware is connected or disconnected – and that can happen even unintentionally (power surge, loose connector…). Device names should instead be obtained dynamically each time the Storage Role is run, for example as sketched below.
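
One way to obtain the names dynamically is to look at the facts Ansible gathers about the remote. A rough sketch follows; how you pick the right device from the list is up to you and purely illustrative here:

  - name: Gather hardware facts, including block devices
    setup:
      gather_subset: hardware

  - name: Show the block devices known to the remote
    debug:
      msg: "{{ ansible_facts['devices'].keys() | list }}"

The chosen name can then be passed to the Storage Role through a variable instead of being hard-coded.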

As you can see, the Storage Role takes a specific list of nested parameters. This example will create a 12 GiB ext4 volume named test1 on the /dev/sda device and mount it at /opt/test1. If it already exists, the role will try to modify its parameters to match the input. Additionally, the role will also try to remove any volume called test2.

Some parameters can be omitted, in which case either already existing values will be kept (when the device already exists) or default values will be used (when creating a new device).

I think that most of the parameter names are pretty self-explanatory except maybe storage_volumes. This one states that the user wants to work just with the ungrouped volumes.

The other available option is storage_pools and its purpose is to manage volume groups and generally handle the more advanced stuff. If that sounds to you like something related to LVM, you are correct.

If you are not sure what LVM is, for our purposes it is enough to know that it allows you to squash multiple disks together into one large meta-disk that you can then divide again however you want.

Say you have three physical disks (/dev/sda, /dev/sdb, /dev/sdc), each of which can store just 10 GiB. You squash them all into a storage pool that therefore has 30 GiB. Next, you divide the pool into two volumes:

  • One 5 GiB volume (on which you want to store sensitive data and add an encryption layer to it)
  • And the second 25 GiB volume (there you plan to store big files, databases and stuff).

With just the storage volumes you would not be able to store a 12 GiB movie file, because each of your disks is too small for it. But a 25 GiB volume based on the storage pool can handle it.

To create the described pool using the Storage Role, the task would look like this:

---
- hosts: all
  become: true
  vars:
    storage_safe_mode: false

  tasks:
  - name: Create two LVM logical volumes under volume group 'foo' spread over three disks
    include_role:
      name: linux-system-roles.storage
    vars:
      storage_pools:
      - name: foo
        disks:
        - "/dev/sda"
        - "/dev/sdb"
        - "/dev/sdc"
        volumes:
        - name: sensitive_data
          size: "5 GiB"
          fs_type: ext4
          mount_point: "/opt/sensitive"
        - name: big_stuff
          size: "25 GiB"
          fs_type: ext4
          mount_point: "/opt/big"

Again, the invocation of the role stays the same; only the parameters have changed. You can also see that the parameters under `volumes` are similar to what was used under `storage_volumes` in the previous example.

So what’s next?

As you can see, using the Storage Role is pretty easy. The provided examples cover its general functionality; the rest is mostly about adding the correct parameters in the right places. The best way to figure out what is supported is to look at the source on GitHub.

Anyway, at this moment the Storage Role also supports volume/pool encryption based on LUKS, RAID and LVM RAID, as well as data deduplication and compression using VDO.
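
To give a rough idea, encrypting the sensitive_data volume from the earlier pool scenario could look something like the sketch below. The encryption and encryption_password parameter names are my assumption here – check the role's documentation for the exact, current spelling:

      storage_pools:
      - name: foo
        disks:
        - "/dev/sda"
        - "/dev/sdb"
        - "/dev/sdc"
        volumes:
        - name: sensitive_data
          size: "5 GiB"
          fs_type: ext4
          mount_point: "/opt/sensitive"
          # assumed parameters: put a LUKS layer under the filesystem
          encryption: true
          encryption_password: "use-a-real-passphrase-here"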

The Storage Role is still being developed and new features may (and will) be added to it. In case you are wondering how to use them, the best place to start looking is probably the official documentation.

The road to UDisks 2.9.0

While the world is going crazy these days, we continue to march at full strength towards the next UDisks release. It's still a couple of weeks away and there are some interesting features still pending to be merged. With all the changes we're bound by the promise to keep the public D-Bus and C API stable, and that won't change even though there were major changes under the hood. Overall we've been focusing on general stability and predictability, fixing various race conditions, but we've also added a couple of interesting new features.

Configurable mount options

UDisks carries a set of predefined mount options for well-known filesystems. With the primary focus on desktop use, the defaults were chosen to fit typical use cases. For example, the vfat defaults include the flush mount option: vfat is a typical filesystem for removable flash media and it's important to make write operations more synchronous to prevent users from tearing their drives out while the kernel is still writing back the cached data. Such defaults may not suit everyone and users have been asking for a way to change them. We've even been receiving contradicting merge requests to add or remove certain mount options. Some users on public forums even suggested modifying the udisksd binary, however ridiculous that may sound.

There were a couple of patches and suggestions in the past that served as valuable feedback. In the end we decided to combine the best ideas and allow redefining mount options both via a config file and from udev rules, as each approach has its specific use cases. Each way has a defined priority, so e.g. udev rules override the config file definitions. As redefining and allowing certain mount options may have severe security implications, only a sysadmin-owned global config file in /etc/udisks2 and udev rules are considered. Redefining mount options may serve either lockdown purposes or a more liberal environment.

Comprehensive documentation will be provided along with the API docs as this is a pretty complex topic. Just a glimpse of how the current config file syntax looks:

[defaults]
allow=exec,noexec,nodev,nosuid,atime,noatime,nodiratime,ro,rw,sync,dirsync,noload

vfat_defaults=uid=$UID,gid=$GID,shortname=mixed,utf8=1,showexec,flush
vfat_allow=uid=$UID,gid=$GID,flush,utf8,shortname,umask,dmask,fmask,codepage,iocharset

# common options for both the native kernel driver and exfat-fuse
exfat_defaults=uid=$UID,gid=$GID,iocharset=utf8,errors=remount-ro
exfat_allow=uid=$UID,gid=$GID,dmask,errors,fmask,iocharset,namecase,umask

ntfs_defaults=uid=$UID,gid=$GID,windows_names
ntfs_allow=uid=$UID,gid=$GID,umask,dmask,fmask,locale,norecover,ignore_case,windows_names

...

D-Bus object properties updates

The reach of UDisks has evolved in recent years from being a simple automounter backend to a complete stateful storage management API. UDisks has gained internal modularity for non-standard storage technologies (more on that in upcoming blog posts!) and is more often used in scripts. Historically the design and intended use were mostly asynchronous: clients were supposed to hook up to D-Bus signals and just update the GUI afterwards. Properties on D-Bus objects were usually updated as a result of an incoming uevent, often significantly later than the method call returned. That caused race conditions on the client side for clients that expected all D-Bus properties to be up-to-date once a method call returned.

We embraced the work done by Peter Rajnoha on synthetic uevent tagging that has been available since Linux kernel 4.13.0. When we expect properties to be updated as a result of a native uevent, we queue a synthetic one with a tag and wait until it is received. The uevent queue is more or less serialized, which makes sure the queue is fully processed before the tag arrives. This allows us to hold off returning from a method call and let the objects be updated with fresh data from udev. When running UDisks on older Linux kernels the new measure has no effect and things work just like before.

This however brings a change that clients may observe – the D-Bus property change notifications are fired earlier. I'm not sure there was ever any (API stability) guarantee with regard to D-Bus properties anywhere; this might however need attention in your application.

VDO (Virtual Data Optimizer)

While you might not be aware of what VDO is or does, as the kvdo kernel module still hasn't been upstreamed and adopted by major distributions, UDisks 2.8.0 brought a basic vdo module that runs on top of the standalone vdo/vdo-manager commands. It had a couple of design shortcomings stemming from limitations of VDO itself. Together with the little to no adoption of this module and the recent integration of dm-vdo in LVM, a decision has been made to prefer the new lvmvdo integration coming in UDisks 2.9.0 and to deprecate the standalone vdo module instead.

All this work is based on libblockdev support, in case a stateless API suits you better.


There are even more changes queued for the 2.9.0 release, stay tuned for upcoming articles about UDisks, the API and practical use cases!

bscalc – storage size calculator

Introduction

In the spring of this year, a new major version of the libbytesize library was released. As part of the release this small but very useful library was accompanied by a small but very useful tool. It is called bscalc, where the first two letters BS stand for Byte Size, of course, and calc suggests it does some calculations.

In fact, it is a simple calculator that works with storage sizes. Computers are good at working with big and long numbers representing sizes in bytes; humans are not. The simplest and probably most common problem is a question like “What the hell does 9663676416 actually mean?” Or the opposite case, when some input has to be in bytes (or KiB, MiB,…) and one wants to specify, say, 12 GiB. But there are more complex questions like “How many 512B sectors does this disk have?” or “If I back this data up on Blu-rays, how many will I need and how well will the data fit?”

Continue reading “bscalc – storage size calculator”

Release time again for libblockdev and udisks!

A new month has come and that means new releases of libblockdev and UDisks2 have come too. We are trying to stick to the golden rule of successful open-source projects – “Release early, release often.” – and even if there are no major changes and no major new features, we do regular releases every month. Usually the target date is the end of the month, which in reality means a new release is done at the beginning of the following month. And that is exactly what happened this time too. libblockdev-2.15 was released on December 1st and UDisks-2.7.5 on December 4th.

Continue reading “Release time again for libblockdev and udisks!”

UDisks 2.7.0 released

A new upstream version of UDisks2 was released on Friday (June 2nd) — version 2.7.0. People following the recent development of UDisks2 and our recent blog posts [1] [2] should know that this is a big version bump, which can only mean one thing: the pull request changing UDisks to use libblockdev where possible was merged! That is almost 100 commits' worth of changes.

Continue reading “UDisks 2.7.0 released”

UDisks to build on libblockdev!?

As a recent blog post mentioned, there is a pull request for UDisks proposing that the master-libblockdev branch be merged into master. What would that mean? master-libblockdev is a parallel branch we have been working on over the last few months, in which custom code in UDisks is replaced by calls to the libblockdev library where possible. So for things like creating partitions, setting up MD RAID, LVM, etc., it does not use the CLI tools but instead calls libblockdev functions.

Continue reading “UDisks to build on libblockdev!?”

Reporting and monitoring storage actions

Two recent blog posts focus on reporting and monitoring of storage events related to failures, recoveries and, in general, device state changes. However, there are other things happening to storage. The storage configuration is changed from time to time, either by administrator(s) or automatically as a reaction to some trigger. And there are components of the system, together with its users, that could benefit from getting information about such changes.

Continue reading “Reporting and monitoring storage actions”

Storaged merged with UDisks, new releases 2.6.4 and 2.6.5

Quite a lot has changed since our last blog post about the storaged project. The biggest news is that we are no longer working on Storaged. We are now “again” working on UDisks1.

Storaged originally started as a fork of the udisks project in 2015 and has received a lot of attention and development since then. Storaged even replaced UDisks in Fedora 25, providing a backwards-compatible API and new features.

Continue reading “Storaged merged with UDisks, new releases 2.6.4 and 2.6.5”