Not many people need to do disk partitioning and storage management on a daily basis. For most of us it is something low-level that needs to be set up just once and then promptly forgotten.
Strangely enough, the Ansible Storage Role can be very useful for both of these groups. Here, I would like to explain why, and how to use it.
Please remember that Storage Role operations are often destructive – you can easily lose data if you are not careful. Back up if possible; think twice if not.
What is Ansible and why should I care?
Imagine you need to update several dozen systems. That means running the same task on multiple machines over and over: connect to one machine, run the command, wait for it to finish, check the results, connect to the next machine…
Very soon you realize you should automate the thing. And make it run asynchronously. And make it more generic. And…
That automated thing is Ansible. Ansible is an automation platform whose goal is to execute the same command or script on multiple machines at once. It has some benefits for single-machine management as well.
Target machines (called remotes) do not need anything special installed apart from Python.
To run Ansible you just need to supply a file with the list of IP addresses of the remotes (a.k.a. the inventory) and the YAML script you want to run (known as a playbook).
Ansible will then connect via SSH to all remotes and execute the playbook on each of them. All necessary files are copied to the remotes and automatically removed when finished. Ansible prints the results of each step as it goes.
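For illustration, a minimal inventory might look like this (the file name and host addresses are placeholders for your own setup):

```yaml
# inventory.yml – a plain list of remotes; these addresses are examples
all:
  hosts:
    192.168.1.10:
    192.168.1.11:
```

The playbook is then executed with something like `ansible-playbook -i inventory.yml playbook.yml`.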
A nice thing about playbooks is that they are designed to be simple, transferable and to hide most of the boring logic from the user. Even if you only manage one machine, you can keep your already-prepared setup scripts as playbooks.
In case you are wondering, there are reasons why the playbook has to be written in YAML format and not in – for example – bash.
Firstly, Ansible playbooks are declarative – the code in them simply describes the desired end state of the remote.
So instead of commanding 'Computer, create the file /foo/bar', you say 'Computer, make sure the file /foo/bar exists'.
The first approach will always recreate the file, possibly destroying its previous contents in the process. The declarative approach will do nothing if the desired state is already there.
This principle is very important to realize when working with Ansible. It allows the user to run the same playbook multiple times without fear that something gets overwritten or duplicated. The keyword for it is idempotency.
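To make the difference concrete, here is a minimal sketch of the two approaches as Ansible tasks (the file path is just an example):

```yaml
# Imperative: the command module runs on every play and always
# reports "changed", even if /foo/bar already exists
- name: Create the file (imperative)
  command: touch /foo/bar

# Declarative: the copy module checks the current state first;
# if /foo/bar already exists, force: false leaves it untouched
- name: Make sure the file exists (declarative)
  copy:
    content: ""
    dest: /foo/bar
    force: false
```

Running the second task twice in a row reports "changed" once and "ok" afterwards – that is idempotency in practice.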
The second reason for using YAML is that Ansible supports multiple platforms (even Windows systems can serve as remotes) and the same actions often require different commands (e.g. apt-get or dnf for package management). So these functions can be grouped together under a new, more generic name.
(For more information about Ansible and how to properly set it up visit www.ansible.com)
This brings us to the Ansible Roles.
Ansible roles can be compared to functions included from a library in your favorite imperative programming language. They provide the desired functionality and can accept parameters. In Ansible, roles have the added benefits of ensuring idempotency and of encapsulating the logic that handles the different remote platforms.
So, for example, the `package` role will utilize `apt-get` on Debian-based systems and `yum` on Red Hat, but its purpose stays the same. A playbook using the `package` role to install `htop` can look like this:
```yaml
---
- hosts: all
  tasks:
    - name: Install htop package
      package:
        name: htop
        state: present
```
Please note that indentation in YAML is mandatory and defines blocks of statements.
- `hosts` specifies the subgroup of remotes, here called `all`, which has to be defined in the inventory file
- `tasks` denotes the beginning of the list of role invocations
- The first `name` allows the user to specify text that will show up on screen when this task starts. It has no other function and can even be omitted
- `package` is the actual name of the invoked role. Anything nested under it is a role-specific parameter
- `name` is the name of the package(s) the role should handle
- `state` is the desired presence of the package(s) on the system
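Thanks to the declarative style, removing the package is just a matter of flipping `state` in the same structure:

```yaml
---
- hosts: all
  tasks:
    - name: Make sure htop is not installed
      package:
        name: htop
        state: absent
```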
Some roles are part of Ansible by default; others (like the Storage Role) have to be explicitly included. The Storage Role can be called similarly to the previous example:
```yaml
---
- hosts: all
  become: true
  vars:
    storage_safe_mode: false
  tasks:
    - name: Invoke the storage role
      include_role:
        name: linux-system-roles.storage
```
This time the example includes some additional setup.
- `become` states that the playbook needs superuser privileges to run
- `storage_safe_mode` is a Storage Role variable, here defined at global scope. It defaults to `true`, which prevents the role from performing destructive operations; setting it to `false` (as here) allows them. Still, be careful
- `include_role` – the Storage Role is part of the linux-system-roles group, so it has to be included like this
This is a very basic yet perfectly viable Storage Role invocation, even though it will not change anything. No arguments means the storage is kept unchanged (remember: idempotency). Actually, the example will do quite a lot of work; it just will not change anything.
The first thing you will probably notice when you run the Storage Role is the time it takes to finish. The role always needs to make sure all the dependencies are installed, and it also scans the system for the existing storage configuration. That takes time – even when nothing is going to be changed.
Now we are finally getting to examples that do something useful.
```yaml
---
- hosts: all
  become: true
  vars:
    storage_safe_mode: false
  tasks:
    - name: Create a disk device mounted on "/opt/test1"
      include_role:
        name: linux-system-roles.storage
      vars:
        storage_volumes:
          - name: test1
            type: disk
            disks:
              - /dev/sda
            size: "12 GiB"
            fs_type: ext4
            mount_point: "/opt/test1"
            state: present
          - name: test2
            state: absent
```
In the examples I am using the disk names directly (e.g. `/dev/sda`). I strongly advise against doing it this way. Under some circumstances the names can change or shift, typically when a piece of hardware is connected or disconnected – and that can happen even unintentionally (power surge, loose connector…). Device names should be obtained dynamically each time the Storage Role is run.
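One option is to rely on the stable symlinks under `/dev/disk/by-id/`; another is to pick the disk from the gathered Ansible facts at run time. Here is a hedged sketch of the latter – the model string below is a placeholder for your own hardware:

```yaml
# Select a disk by its model string from the gathered facts instead
# of hard-coding /dev/sdX. "EXAMPLE-DISK-MODEL" is a placeholder.
- name: Find the target disk dynamically
  set_fact:
    target_disk: "/dev/{{ item.key }}"
  loop: "{{ ansible_facts.devices | dict2items }}"
  when: item.value.model == "EXAMPLE-DISK-MODEL"
```

The resulting `target_disk` variable can then be passed to the role's `disks` list.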
As you can see, the Storage Role takes a specific list of nested parameters. This example will create a 12 GiB ext4 partition named `test1` on the `/dev/sda` device and mount it at `/opt/test1`. If it already exists, the role will try to modify its parameters to match the input. Additionally, the role will also try to remove any volume called `test2` (note its `state: absent`).
Some parameters can be omitted, in which case either already existing values will be kept (when the device already exists) or default values will be used (when creating a new device).
I think that most of the parameter names are pretty self-explanatory, except maybe `storage_volumes`. This one states that the user wants to work just with ungrouped volumes. The other available option is `storage_pools`, whose purpose is to manage volume groups and generally handle the more advanced stuff. If that sounds to you like something related to LVM, you are correct.
If you are not sure what LVM is, for our purposes it is enough to know that it allows you to squash multiple disks together into one large meta-disk, which you can then divide again however you want.
Say you have three physical disks (`/dev/sda`, `/dev/sdb` and `/dev/sdc`), each of which can store just 10 GiB. You squash them all into a storage pool that therefore has 30 GiB. Next, you divide the pool into two volumes:
- One 5 GiB volume (on which you want to store sensitive data and add encryption layer to it)
- And the second 25 GiB volume (there you plan to store big files, databases and stuff).
With just the storage volumes you would not be able to store a 12 GiB movie file, because each of your disks is too small for it. But the 25 GiB volume based on the storage pool can handle it.
To create the described pool using the Storage Role, the task would look like this:
```yaml
---
- hosts: all
  become: true
  vars:
    storage_safe_mode: false
  tasks:
    - name: Create two LVM logical volumes under volume group 'foo' spread over three disks
      include_role:
        name: linux-system-roles.storage
      vars:
        storage_pools:
          - name: foo
            disks:
              - /dev/sda
              - /dev/sdb
              - /dev/sdc
            volumes:
              - name: sensitive_data
                size: "5 GiB"
                fs_type: ext4
                mount_point: "/opt/sensitive"
              - name: big_stuff
                size: "25 GiB"
                fs_type: ext4
                mount_point: "/opt/big"
```
Again, the invocation of the role stays the same; only the parameters have changed. You can also see that the parameters under `volumes` are similar to those under `storage_volumes` in the previous example.
So what’s next?
As you can see, using the Storage Role is pretty easy. The provided examples cover the Storage Role's general functionality; the rest is mostly about adding the correct parameters in the right places. The best way to figure out what is supported is to look at the source on GitHub.
Anyway, at the moment the Storage Role also supports volume/pool encryption based on LUKS, RAID and LVM RAID, as well as data deduplication and compression using VDO.
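For instance, encryption is configured per volume. A sketch, assuming the role's `encryption` and `encryption_password` parameters (check the documentation of your version for the exact names):

```yaml
# Hedged sketch: a LUKS-encrypted volume; the disk, mount point
# and password are placeholders
storage_volumes:
  - name: secret
    type: disk
    disks:
      - /dev/sdb
    fs_type: ext4
    mount_point: "/opt/secret"
    encryption: true
    encryption_password: "change-me-please"
```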
The Storage Role is still being developed and new features may (and will) be added to it. If you are wondering how to use them, the best place to start looking is probably the official documentation.