We recently had to decide on a configuration format for one of our internal utilities. In this post I’ll talk a little about why we picked YAML as the format and the reasoning behind it.
WireLoad has a couple of servers, each running a number of different services. For a long time we had almost one backup script per service, all hand hacked in bash to fit the requirements of the application. This wasn’t great because it meant we repeated a lot of work. To make the situation a little more manageable we developed BackupWire, a simple backup utility in a single file with a minimal number of dependencies.
The design goals of BackupWire were,
- Minimal footprint: BackupWire shouldn’t be much heavier than the bash scripts we already had. Why? Because if it was huge and difficult to deploy we might end up writing little bash scripts instead!
- Minimal dependencies: BackupWire should not have many dependencies. This is for the same reason as in the previous point. BackupWire needs to be easy to install.
- Readable configuration: One of the problems with bash scripts is that once they’re a little complicated it gets hard to see what’s happening. BackupWire’s real purpose is to alleviate that headache by distilling most backup jobs down to a few lines of configuration.
In order to make everything dead simple the configuration was stored in the BackupWire script itself. This would make it easier to relocate the script, and it would decrease the chance that a config file was not found due to things such as the cron environment being sparse. However, this design decision made it hard to update the script with new versions. In addition, the configuration format became a little cumbersome because it was just a set of Python class instantiations. Hence this feature went against the third design goal of BackupWire. The latest version now uses a configuration file instead.
Thinking that the world really doesn’t need another arbitrary configuration syntax, I wanted to pick a standardized configuration format. So I read up on Wikipedia’s entry on configuration files and found the top three contenders: Lua, XML and YAML.
Lua, being a programming language these days, looked like it would add too many dependencies to BackupWire. BackupWire is written in Python, which we already have on all servers, but Lua we don’t use for anything else so it would be a new requirement which would have to be installed on each server. Also, it just struck me as a little excessive to have a full blown second programming language as a configuration format unless the application was really complex.
The other problem with Lua was that googling Lua config tutorial didn’t really give that many good results, making me think that perhaps the focus of the language has shifted from configuration to something else over time.
XML was immediately off the table, perhaps obviously to some of our readers. Most importantly XML is not a very readable language with it’s abundance of symbols and markup. But also, it’s not very easy to write for the same reason. The people behind Django’s documentation put it best when they said, “Making humans edit XML is sadistic!”
YAML is readable and easy to write both. There is also a light weight Python module called PyYAML to read the format. Using YAML the new BackupWire configuration files are definitely to the point and concise without being complicated to edit. Here is an example of the new configuration format we developed in YAML syntax:
name: "Sample Backup" to: "/backup/" frequency: "daily" tasks: - run: command: 'df -h' log_output: True - archive: name: "etc.tbz" contents: ["/etc/", "/opt/etc/"] - archive: name: "tmp.tbz" contents: [ "/tmp/" ] # Dump a database using a run task with # %(targetFolder)s to locate the destination. - run: command: 'mysqldump --quick --extended-insert --compact --single-transaction -u backup --databases sample | bzip2 >%(targetFolder)s/mysql-sample.sql.bz2' ---
Not too bad as far as readability goes and all standardized YAML to spare the world from yet one more syntax.Author: Alexander Ljungberg Tags: Technology