There are probably not many people who need to do disk partitioning and storage management on a daily basis. For most of us it is something low level that needs to be set up just once and then promptly forgotten.
Strangely enough, the Ansible Storage Role can be very useful for both of these groups. Here I would like to explain why, and how to use it.
Please remember that Storage Role operations are often destructive: you can easily lose data if you are not careful. Back up if possible; think twice if not.
What is Ansible and why should I care?
Imagine you need to update several dozen systems. This means running the same task on multiple machines over and over: connect to one machine, run the command, wait for it to finish, check the results, connect to another machine…
Very soon you realize you should automate the thing. And make it run asynchronously. And make it more generic. And…
That automated thing is Ansible. Ansible is an automation platform whose goal is to execute the same command or script on multiple machines at once. It has some benefits for single-machine management as well.
Target machines (called remotes) do not need to have anything special installed apart from Python.
To run Ansible you just need to supply a file with the list of IP addresses of the remotes (a.k.a. the inventory) and the YAML script you want to run (known as a playbook).
Ansible will then connect via SSH to all the remotes and execute the playbook on each of them. All necessary files are copied to the remotes and automatically removed when finished. Ansible prints the results of each step as it goes.
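To make this concrete, here is a minimal sketch of what the inventory could look like. The host names and file names are made up for illustration; an inventory can also be written in the older INI format.

```yaml
# inventory.yml - a hypothetical inventory listing two remotes
# under the built-in group "all" (hosts can also be IP addresses)
all:
  hosts:
    server1.example.com:
    server2.example.com:
```

With an inventory and a playbook in place, the whole thing is started with `ansible-playbook -i inventory.yml playbook.yml`.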
A nice thing about playbooks is that they are designed to be simple and transferable, and to hide most of the boring logic from the user. Even if you need to manage just one machine, you can keep your already prepared setup scripts as playbooks.
In case you are wondering, there are reasons why a playbook has to be written in YAML and not in, for example, bash.
Firstly, Ansible playbooks are declarative: the code in them simply describes the desired end state of the remote.
So instead of commanding 'Computer, create file /foo/bar', you say 'Computer, make sure the file /foo/bar exists'.
The first approach will always recreate the file, possibly destroying its previous contents in the process. The declarative approach will do nothing if the desired state is already in place.
This principle is very important to understand when working with Ansible. It allows the user to run the same playbook multiple times without fear that something gets overwritten or duplicated. The keyword for it is idempotency.
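As an illustration of the 'make sure the file exists' phrasing, an idempotent task might be sketched like this, using the stock copy module (the path is just an example):

```yaml
- hosts: all
  tasks:
    - name: Make sure the file /foo/bar exists
      copy:
        content: ""
        dest: /foo/bar
        force: false   # never overwrite an existing file, so repeated runs change nothing
```

Running this playbook twice reports a change the first time and 'ok' the second, which is the essence of idempotency.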
The second reason for using YAML is that Ansible supports multiple platforms (even Windows systems can serve as remotes) and the same action often requires different commands (e.g. `apt-get` vs. `dnf` for package management). These commands can be grouped together under a new, more generic name.
(For more information about Ansible and how to properly set it up visit www.ansible.com)
This brings us to the Ansible Roles.
Ansible Roles
Ansible roles can be compared to functions included from a library in your favorite imperative programming language. They provide the desired functionality and can accept parameters. In Ansible they have the added benefit of ensuring idempotency, as well as encapsulating the logic that takes care of handling different remote platforms.
Take, for example, the `package` module (strictly speaking a module rather than a role, but it illustrates the same idea): it will use `apt-get` on Debian-based systems and `dnf` or `yum` on Red Hat ones, but its purpose stays the same.
A playbook using the `package` module to install `htop` can look like this:
```yaml
---
- hosts: all
  tasks:
    - name: Install htop package
      package:
        name: htop
        state: present
```
Please note that indentation in YAML is mandatory and defines blocks of statements.
Where:
- `hosts` specifies the subgroup of remotes called `all`, which has to be defined in the inventory file
- `tasks` denotes the beginning of the list of invocations
- The first `name` lets the user specify text that will show up on screen when the task starts. It has no other functionality and can even be omitted
- `package` is the actual name of the invoked module. Anything nested under it is a parameter specific to it
- `name` is the name of the package(s) to handle
- `state` is the desired presence of the package(s) in the system
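The same module covers other desired states as well. A small sketch (the package names are again just examples):

```yaml
- hosts: all
  tasks:
    - name: Make sure htop is up to date
      package:
        name: htop
        state: latest
    - name: Make sure nano is not installed
      package:
        name: nano
        state: absent
```

Whatever the underlying package manager is, the playbook stays the same.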
Storage Role
Some roles are a default part of Ansible; some (like the Storage Role) have to be explicitly included. The Storage Role can be called similarly to the previous example:
```yaml
- hosts: all
  become: true
  vars:
    storage_safe_mode: false
  tasks:
    - name: Invoke the storage role
      include_role:
        name: linux-system-roles.storage
```
This time the example includes some additional setup.
- `become` makes the playbook run with superuser privileges, which the role needs
- `storage_safe_mode` is a Storage Role variable, here defined at global scope. It defaults to `true`, which prevents the role from performing destructive operations; setting it to `false` allows them. Still, be careful
- `include_role` is needed because the Storage Role is a part of the linux-system-roles group, so it has to be included like this
This example is a very basic yet perfectly viable Storage Role invocation, even though it will change nothing. No arguments mean that the storage will be kept as it is (remember: idempotency).
Actually, this example will do quite a lot of work; it just will not change anything.
The first thing you will probably notice when you run the Storage Role is the time it takes to finish. The role always needs to make sure all its dependencies are installed, and it also scans the system for the existing storage configuration. That takes time, even when nothing is going to be changed.
Now we are finally getting to examples that do something useful.
```yaml
---
- hosts: all
  become: true
  vars:
    storage_safe_mode: false
  tasks:
    - name: Create a disk device mounted on "/opt/test1"
      include_role:
        name: linux-system-roles.storage
      vars:
        storage_volumes:
          - name: test1
            type: disk
            disks:
              - "/dev/sda"
            size: "12 GiB"
            fs_type: ext4
            mount_point: "/opt/test1"
            state: present
          - name: test2
            state: absent
```
In the examples I am using disk names directly (e.g. /dev/sda). I strongly advise against doing it this way. Under some circumstances the names can change or shift, typically when a piece of hardware is connected or disconnected, and that can happen even unintentionally (power surge, loose connector…). Device names should be obtained dynamically each time the Storage Role is run.
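One way to look the names up dynamically is to inspect the facts Ansible gathers before each play. The following sketch just prints the discovered block devices; choosing among them would be up to your playbook logic:

```yaml
- hosts: all
  tasks:
    - name: Print the block devices Ansible discovered on the remote
      debug:
        var: ansible_facts['devices']
```

Stable identifiers under /dev/disk/by-id/ are another option that survives device reordering.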
As you can see, the Storage Role takes a specific list of nested parameters. This example will create a 12 GiB ext4 partition named test1 on the /dev/sda device and will mount it to /opt/test1. If it already exists, the role will try to modify its parameters to match the input. Additionally, the role will also try to remove any volume called test2.
Some parameters can be omitted, in which case either already existing values will be kept (when the device already exists) or default values will be used (when creating a new device).
I think that most of the parameter names are pretty self-explanatory, except maybe storage_volumes. This one states that the user wants to work just with ungrouped volumes.
The other available option is storage_pools, and its purpose is to manage volume groups and generally handle the more advanced stuff. If that sounds to you like something related to LVM, you are correct.
If you are not sure what LVM is, for our purposes it is enough to know that it allows you to squash multiple disks together into one large meta-disk that can then be divided again however the user wants.
Say you have three physical disks (/dev/sda, /dev/sdb, /dev/sdc), each of which can store just 10 GiB. You squash them all into a storage pool that therefore has 30 GiB. Next, you divide the pool into two volumes:
- One 5 GiB volume (on which you want to store sensitive data and add an encryption layer to it)
- And a second 25 GiB volume (there you plan to store big files, databases and such).
With just storage volumes you would not be able to store a 12 GiB movie file, because each of your disks is too small for it. But the 25 GiB volume based on the storage pool can handle it.
To create the described pool using the Storage Role, the task would look like this:
```yaml
---
- hosts: all
  become: true
  vars:
    storage_safe_mode: false
  tasks:
    - name: Create two LVM logical volumes under volume group 'foo' spread over three disks
      include_role:
        name: linux-system-roles.storage
      vars:
        storage_pools:
          - name: foo
            disks:
              - "/dev/sda"
              - "/dev/sdb"
              - "/dev/sdc"
            volumes:
              - name: sensitive_data
                size: "5 GiB"
                fs_type: ext4
                mount_point: "/opt/sensitive"
              - name: big_stuff
                size: "25 GiB"
                fs_type: ext4
                mount_point: "/opt/big"
```
Again, the invocation of the role stays the same; only the parameters have changed. You can also see that the parameters under `volumes` are similar to those used under `storage_volumes` in the previous example.
So what’s next?
As you can see, using the Storage Role is pretty easy. The provided examples cover the Storage Role's general functionality; the rest is mostly about adding the correct parameters in the right places. The best way to figure out what is supported is to look at the source on GitHub.
At the moment the Storage Role also supports volume/pool encryption based on LUKS, RAID and LVM RAID, as well as data deduplication and compression using VDO.
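As a taste of those features, here is a sketch of an encrypted pool. The parameter names (`encryption`, `encryption_password`) reflect the role's documentation at the time of writing, so check the current README before relying on them; the password and device are placeholders:

```yaml
    - name: Create a LUKS-encrypted pool (illustrative sketch)
      include_role:
        name: linux-system-roles.storage
      vars:
        storage_pools:
          - name: secure
            disks:
              - "/dev/sdb"
            encryption: true
            encryption_password: "ChangeMe-NotARealSecret"
            volumes:
              - name: secret_data
                size: "5 GiB"
                fs_type: ext4
                mount_point: "/opt/secret"
```

In real playbooks the password should come from Ansible Vault rather than being written in plain text.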
The Storage Role is still being developed and new features may (and will) be added to it. In case you are wondering how to use them, the best place to start looking is probably the official documentation.