YAML Tutorial – A Comprehensive Guide To YAML Using Python

This YAML tutorial explains what YAML is and covers its basic concepts, such as data types, validators, parsers, editors, and files, with code examples in Python:

Text processing in computer science helps programmers to create configurable programs and applications. Markup languages play a vital role in storing and exchanging data in a human-readable format.

Furthermore, programmers use these languages as common, standard data interchange formats between different systems. Examples include the markup languages HTML, XML, and XHTML, and the data-interchange format JSON.

This easy-to-follow YAML tutorial covers one more such language.

YAML Tutorial

This tutorial helps the readers in finding answers to the below-mentioned questions. Learners can take the first steps and understand the mystery of markup languages in general and YAML in particular.

The Questions include:

  • Why do we need markup languages?
  • What does YAML stand for?
  • Why was YAML created?
  • Why do we need to learn YAML?
  • Why is it important today to learn YAML?
  • What type of data can I store in a YAML file?

This guide is also useful for experienced readers, as we discuss concepts in the context of programming in general and of software testing in particular. We also cover Serialisation and Deserialisation here.

What Is YAML

The creators of YAML initially named it “Yet Another Markup Language.” Over time, the name changed to “YAML Ain’t Markup Language.” Because the acronym refers to itself, it is called a recursive acronym.

We can use this language to store data and configuration in a human-readable format. YAML is easy to learn, and its constructs are easy to understand.

Clark Evans, Ingy döt Net, and Oren Ben-Kiki created YAML as a simpler alternative to other markup languages, which are harder to read and have a steeper learning curve.

To make learning easier, we use a sample project. We host this project on GitHub under the MIT license, so anyone can modify it and submit a pull request if required.

You can clone the project using the command below.

git clone git@github.com:h3xh4wk/yamlguide.git

Alternatively, you can download the code and the examples as a zip file.

Readers can also clone this project with the help of IntelliJ IDEA. Please complete the section on prerequisites to install Python and configure it with IntelliJ IDEA before cloning the project.

Get Repository From Github

Why Do We Need Markup Languages

It is impractical to hard-code everything in software. Code has to be maintained over time, so we abstract the specifics into external files or databases.

It is a best practice to keep the code as small as possible and to write it so that it does not need modification for the various data inputs that it takes.

For example, we can write a function to take input data from an external file and print its content line by line rather than writing the code and data together in a single file.

It is considered a best practice because it separates the concerns of creating the data and creating the code. The programming approach of abstracting the data from code ensures easy maintenance.
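The idea of keeping data outside the code can be sketched as follows. This is only an illustration: the function name print_lines and the data file planets.txt are hypothetical, and a temporary directory is used so that the example is self-contained.

```python
import tempfile
from pathlib import Path


def print_lines(path):
    """Print each line of a text file and return the lines as a list."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line in lines:
        print(line)
    return lines


# The data lives in its own file; the function never changes when the data does.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "planets.txt"  # hypothetical data file
    data_file.write_text("Mercury\nVenus\nEarth\n", encoding="utf-8")
    result = print_lines(data_file)
```

Changing the contents of the data file changes what the program prints, while print_lines itself stays untouched.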

Markup languages make it easier for us to store hierarchical information in a more accessible and lighter format. These files can be exchanged between programs over the internet without consuming much bandwidth, and they work with the most common protocols.

These languages follow a universal standard and support various encodings, so they can represent characters from almost all written languages in the world.

The best thing about markup languages is that they describe data and are not tied to any system command. This characteristic makes them safer and is one reason for their worldwide adoption. Therefore, you will not find any YAML commands that we can run directly to produce output.

Benefits Of Using A YAML File

YAML has many benefits. The table below compares YAML with JSON. JSON stands for JavaScript Object Notation, and we use it as a data-interchange format.

Attribute | YAML | JSON
Verbosity | Less verbose. | More verbose.
Data types | Supports complex data types, such as timestamps, binary data, and custom tags. | Supports only strings, numbers, booleans, null, arrays, and objects.
Comments | Supports comments using "#". | Doesn't support comments.
Readability | More human-readable. | Less human-readable.
Self-references | Supports referencing elements within the same document using "&" and "*". | Doesn't support self-referencing.
Multiple documents | Supports multiple documents in a single file. | Supports a single document in a single file.

Because of these benefits over other file formats such as JSON, many developers prefer YAML, especially for configuration files, for its versatility and flexibility.
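Two of the rows in the table, comments and self-references, can be seen in a small sketch. It assumes the PyYAML package is installed (imported as yaml); the keys defaults, service, and settings are made up for illustration.

```python
import json

import yaml  # PyYAML

yaml_text = """
# Comments are allowed in YAML
defaults: &defaults
  retries: 3
  timeout: 10
service:
  settings: *defaults   # reuse the anchored block
"""

data = yaml.safe_load(yaml_text)

# JSON has neither comments nor aliases, so the shared block is written out twice.
as_json = json.dumps(data, indent=2)
print(as_json)
```

The JSON output repeats the retries and timeout values under both keys, whereas the YAML source states them once.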

Pre-Requisites

We first install Python and then configure Python and its packages with IntelliJ IDEA. Therefore, please install IntelliJ IDEA if not already installed before proceeding.

Install Python

Follow these steps to install and set up Python on Windows 10.

Step #1

Download Python and install it by selecting the setup as shown in the below image.

Download Python

Step #2

Start the setup and select Customize installation. Select the checkbox for adding Python to PATH.

Customize Installation

Step #3

Customize the location of Python as displayed in the image.

Customize Location

Step #4

Move ahead with the installation. At the end of the installation wizard, disable the path length limit on Windows by clicking the corresponding option.

Disable Path Limit

Now, Python setup is complete.

Configure Python With IntelliJ IDEA

Let’s now configure IntelliJ IDEA with Python. The first step is to install the Plugins to be able to work on Python projects.

Install Python Plugins

Install Python Community Edition

Python Community Edition Plugin

Install Python Security

Python Security Plugin

Follow the below steps to complete the configuration.

Step #1

Use the File Menu and Go to Platform settings. Click on the Add SDK button.

Platform Settings SDK

Step #2

Select the Virtual environment option and select Python’s base interpreter as the one that was installed in the previous step.

Virtual Environment

Step #3

Now select the virtual environment created in the previous step under the Project SDK Settings.

Project SDK

We recommend one virtual environment for one project.

Step #4 [Optional]

Open the config.py file from the project explorer and click on install requirements, as shown in the below image.

Install Requirements

Ignore the ipython requirement if required by unchecking an option in the Choose package dialog.

Ignore Install Requirement

Now, you can head over to the next section to learn the basics of YAML.

Basics Of YAML

In this section, we cover the basics of YAML with the help of two example files, config.yml and config.py. We firmly believe that explaining the concepts of YAML in parallel with its use in a programming language makes learning easier.

Therefore, while explaining the basics of YAML, we also use Python to read and write the data stored in YAML.

Now let’s create or open config.yml in our editor and look at the YAML.

---
quiz: 
  description: >
    This Quiz is to learn YAML.
  questions:
    - ["How many planets are there in the solar system?", "Name the non-planet"]
    - "Who is found more on the web?"
    - "What is the value of pi?"
    - "Is pluto related to platonic relationships?"
    - "How many maximum members can play TT?"
    - "Which value is no value?"
    - "Don't you know that the Universe is ever-expanding?"

  answers:
    - [8, "pluto"]
    - cats
    - 3.141592653589793
    - true
    - 4
    - null
    - no
# explicit data conversion and reusing data blocks
extra:
  refer: &id011 # give a reference to data
    x: !!float 5 # explicit conversion to data type float
    y: 8
  num1: !!int "123" # conversion to integer
  str1: !!str 120 # conversion to string
  again: *id011 # call data by giving the reference

Notice that YAML files use the .yml or .yaml extension. The language is case sensitive. We use spaces, not tabs, for indentation.
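Both rules can be checked with a short sketch, assuming the PyYAML package is installed. The keys Name and name are illustrative only.

```python
import yaml  # PyYAML

# Keys that differ only in case are distinct keys.
case_doc = "Name: upper\nname: lower\n"
keys = yaml.safe_load(case_doc)
print(keys)

# A tab used for indentation is rejected by the parser.
tab_doc = "quiz:\n\tdescription: indented with a tab\n"
try:
    yaml.safe_load(tab_doc)
    tab_rejected = False
except yaml.YAMLError as err:
    tab_rejected = True
    print(f"Tab indentation rejected: {err}")
```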

Along with these basics, let’s understand the data types. In the YAML file above, we have represented information about a quiz. The quiz is a root-level node with attributes such as description, questions, and answers.

YAML Data Types

YAML can store Scalars, Sequences, and Mappings. The file config.yml shows how to write all the necessary data types.

Scalars are strings, integers, floats, booleans, and null. Strings can be enclosed in double quotes ("). However, YAML doesn’t require quotes around strings, and we can use > or | to write long strings across multiple lines.

Look at the various data types and mapped values in the below table.

Data type: String
Strings can be stored with or without quotes.

quiz:
  description: >
    This Quiz is to learn YAML
  questions:
    - "Who is found more on the web?"
  answers:
    - cats

Data type: Integer and float
Integers and floats are written in their original form.

quiz:
  questions:
    - "What is the value of pi?"
    - "How many maximum members can play TT?"
  answers:
    - 3.141592653589793
    - 4

Data type: Boolean
Booleans are written as true/false. In YAML 1.1, which PyYAML implements, yes/no are also read as booleans.

quiz:
  questions:
    - "Is pluto related to platonic relationships?"
    - "Don't you know that the Universe is ever-expanding?"
  answers:
    - true
    - no

Data type: Sequences
Sequences are written inline with square brackets [ ] or as a block with one "- " item per line.

quiz:
  answers:
    - [8, "pluto"]

Data type: References
Self-referencing uses & to define an anchor and * to refer to it.

# explicit data conversion and reusing data blocks
extra:
  refer: &id011 # give a reference to data
    # Other values
  again: *id011 # call data by giving the reference
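To see how these values arrive in Python, the following sketch loads a few scalars and one sequence and records the resulting Python type names. It assumes the PyYAML package is installed; the key names are made up for illustration.

```python
import yaml  # PyYAML

doc = """
text: cats
quoted: "8"
count: 4
pi: 3.141592653589793
flag: true
legacy_flag: no
nothing: null
pair: [8, "pluto"]
"""

values = yaml.safe_load(doc)

# Map each key to the name of the Python type PyYAML chose for its value.
types = {key: type(value).__name__ for key, value in values.items()}
print(types)
```

Note that the quoted "8" stays a string, while the unquoted no becomes a boolean under PyYAML's YAML 1.1 rules.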

Listed below are some additional elements of a YAML file that are worth noting.

Document

Notice the three dashes (---) at the top of the file. They mark the start of a document. We store the first document with quiz as the root element, and description, questions, and answers as child elements with their associated values.

Explicit Data Types

Observe the section key called extra in config.yml. With the help of double exclamation marks (!!), we can explicitly state the data types of the values stored in the file. We convert an integer to a float using !!float. We use !!str to convert an integer to a string, and !!int to convert a string to an integer.

Python’s YAML package helps us read the YAML file and store it internally as a dictionary. The package automatically converts keys and values to matching Python data types unless a type is stated explicitly using “!!”.
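As a quick check, the extra section can be loaded on its own. This sketch assumes the PyYAML package is installed and uses safe_load, which understands the standard !!float, !!int, and !!str tags.

```python
import yaml  # PyYAML

doc = """
extra:
  refer: &id011
    x: !!float 5
    y: 8
  num1: !!int "123"
  str1: !!str 120
  again: *id011
"""

extra = yaml.safe_load(doc)["extra"]

print(type(extra["refer"]["x"]))  # float, because of !!float
print(type(extra["num1"]))        # int, although the source value is quoted
print(type(extra["str1"]))        # str, although the source value looks numeric
print(extra["again"])             # same content as the anchored block "refer"
```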

Read YAML File In Python

In general, we use a YAML Editor and a YAML Validator when writing YAML. The YAML Validator checks the file as we write it.

The Python YAML package has a built-in YAML Parser that parses the file before storing it in memory.
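Because the parser raises an error on malformed input, it can double as a very simple validator. The helper below is a minimal sketch under that assumption; check_yaml is a hypothetical name and not part of the PyYAML package.

```python
import yaml  # PyYAML


def check_yaml(text):
    """Return (True, None) if text parses as YAML, else (False, error message)."""
    try:
        # safe_load_all handles one or many documents; list() forces the parsing.
        list(yaml.safe_load_all(text))
    except yaml.YAMLError as err:
        return False, str(err)
    return True, None


ok, _ = check_yaml("quiz:\n  answers: [8, pluto]\n")
bad, message = check_yaml("quiz:\n  answers: [8, pluto\n")  # missing closing bracket
print(ok, bad)
print(message)
```

This only checks syntax. It does not verify that the file contains the keys or value types an application expects.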

Now let’s create and open config.py in our respective editors with the below content.

import yaml
import pprint

def read_yaml():
    """ A function to read YAML file"""
    with open('config.yml') as f:
        config = yaml.safe_load(f)

    return config

if __name__ == "__main__":

    # read the config yaml
    my_config = read_yaml()

    # pretty print my_config
    pprint.pprint(my_config)

To check that you have completed the steps above, run config.py.

Open the config.py file in IntelliJ IDEA, locate the main block and run the file using the play icon.

Config Run example

Once we run the file, we see the console with the output.

Config Run example output

In the read_yaml function, we open the config.yml file and use the safe_load method of the YAML package to read the stream as a Python dictionary, which the function then returns.

The my_config variable stores the content of the config.yml file as a dictionary. Using Python’s pretty-print module, pprint, we print the dictionary to the console.

Notice the above output. All the YAML values are converted to the corresponding Python data types so that the program can use them further. This process of constructing Python objects from text input is called Deserialisation.

Write YAML File In Python

Open config.py and add the following lines of code just below the read_yaml function and above the main block of the file.

def write_yaml(data):
    """ A function to write YAML file"""
    with open('toyaml.yml', 'w') as f:
        yaml.dump(data, f)

In the write_yaml function, we open a file called toyaml.yml in write mode and use the YAML package’s dump method to write the YAML document to the file.

Now add the lines below at the end of the main block in config.py, indented to the same level as the pprint call.

    # write a Python object to a file
    write_yaml(my_config)

Save the config.py and run the file using the below command or using the play icon in the IDE.

python config.py

The above command prints the contents of config.yml to the console. The program also writes the same data to another file called toyaml.yml. The process of writing a Python object to an external file is called Serialisation.
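Serialisation and Deserialisation are inverse operations, which can be checked with an in-memory round trip. This sketch assumes the PyYAML package is installed; it uses safe_dump without a file argument, which returns the YAML text as a string.

```python
import yaml  # PyYAML

config = {
    "quiz": {
        "description": "This Quiz is to learn YAML",
        "answers": [8, "pluto", None, True],
    }
}

# Serialise: Python object -> YAML text (a string, since no stream is given).
text = yaml.safe_dump(config, default_flow_style=False)
print(text)

# Deserialise: YAML text -> Python object.
restored = yaml.safe_load(text)
```

Comments and anchor names from an original YAML file are not part of the loaded data, so they do not survive such a round trip.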

Multiple Documents In YAML

YAML is quite versatile, and we can store multiple documents in a single YAML file.

Create a copy of the file config.yml as configs.yml and paste the below lines at the end of the file.

---
quiz:
  description: |
    This is another quiz, which
    is the advanced version of the previous one
  questions:
    q1:
      desc: "Which value is no value?"
      ans: Null
    q2:
      desc: "What is the value of Pi?"
      ans: 3.1415

The three dashes (---) in the above snippet mark the beginning of a new document in the same file. Using | after the description key lets us write a multi-line string. In the new document, we have stored each question and its answer as a separate mapping nested under questions.

Now create a new file called configs.py and paste the below-mentioned code into the file.

import yaml
import pprint

def read_yaml():
    """ A function to read YAML file"""
    with open('configs.yml') as f:
        config = list(yaml.safe_load_all(f))

    return config

def write_yaml(data):
    """ A function to write YAML file"""
    with open('toyaml.yml', 'w') as f:
        yaml.dump_all(data, f, default_flow_style=False)

if __name__ == "__main__":

    # read the config yaml
    my_config = read_yaml()

    # pretty print my_config
    pprint.pprint(my_config)

    # write A python object to a file
    write_yaml(my_config)

Notice the changes in read_yaml and write_yaml functions. In read_yaml, we use the safe_load_all method of the YAML package to read all the documents present in configs.yml as a list. Similarly, in write_yaml, we use the dump_all method to write the list of all the previously read documents to a new file called toyaml.yml.

Now run configs.py.

python configs.py

The output of the above command is displayed below.

[{'quiz': {'answers': [[8, 'pluto'],
                       'cats',
                       3.141592653589793,
                       True,
                       4,
                       None,
                       False],
           'description': 'This Quiz is to learn YAML',
           'questions': [['How many planets are there in the solar system?',
                          'Name the non planet'],
                         'Who is found more on the web?',
                         'What is the value of pi?',
                         'Is pluto related to platonic relationships?',
                         'How many maximum members can play TT?',
                         'Which value is no value?',
                         "Don't you know that Universe is ever-expanding?"]}},
 {'quiz': {'description': 'This is another quiz, which\n'
                          'is the advanced version of the previous one\n',
           'questions': {'q1': {'ans': None,
                                'desc': 'Which value is no value?'},
                         'q2': {'ans': 3.1415,
                                'desc': 'What is the value of Pi?'}}}}]

The output is similar to the previously mentioned single document output. Python converts every document in the configs.yml into a Python dictionary. It makes it easier for further processing and use of the values.
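The same multi-document handling can be tried without any files. This sketch assumes the PyYAML package is installed; explicit_start=True is passed so that every written document begins with its own --- marker.

```python
import yaml  # PyYAML

stream = """
---
quiz:
  description: first
---
quiz:
  description: second
"""

# safe_load_all returns a generator; list() collects every document.
documents = list(yaml.safe_load_all(stream))

# Write all documents back into one YAML stream.
text = yaml.safe_dump_all(documents, explicit_start=True, default_flow_style=False)
print(text)

reloaded = list(yaml.safe_load_all(text))
```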

Frequently Asked Questions

You may come across the below questions while working with YAML.

Q #1) Is it possible to preserve the Order of YAML Mappings?

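Answer: Yes. In Python 3.7 and later, dictionaries keep insertion order, so PyYAML’s loaders return mappings in document order. When writing, yaml.dump sorts keys alphabetically by default; passing sort_keys=False (available since PyYAML 5.1) keeps the original order. On older versions, it is possible to customize the default behavior of the loaders by using OrderedDicts and overriding the base resolver with custom methods.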
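On a recent setup, key order can often be kept without a custom loader. The sketch below assumes Python 3.7 or later and PyYAML 5.1 or later, where the sort_keys parameter of the dump functions is available.

```python
import yaml  # PyYAML >= 5.1 assumed for sort_keys

doc = "zebra: 1\napple: 2\nmango: 3\n"

# Plain dicts keep insertion order, so the loaded keys follow the document.
loaded = yaml.safe_load(doc)

# By default, dumping sorts the keys alphabetically.
sorted_text = yaml.safe_dump(loaded, default_flow_style=False)

# sort_keys=False keeps the order of the dictionary.
ordered_text = yaml.safe_dump(loaded, default_flow_style=False, sort_keys=False)

print(sorted_text)
print(ordered_text)
```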

Q #2) How to store an image in YAML?

Answer: You can base64 encode an image and keep it in YAML, as shown below.

image: !!binary |
 iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI7wAAAABJRU5ErkJggg==
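In Python, PyYAML decodes a !!binary value into a bytes object. The sketch below assumes the PyYAML package is installed and, to stay short, encodes only the eight-byte PNG signature instead of a full image.

```python
import base64

import yaml  # PyYAML

png_signature = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file

# Build a small YAML document with a base64-encoded binary value.
encoded = base64.b64encode(png_signature).decode("ascii")
doc = f"image: !!binary |\n  {encoded}\n"

# Loading decodes the base64 text back into bytes.
image_bytes = yaml.safe_load(doc)["image"]

# Dumping a bytes value produces a !!binary node again.
dumped = yaml.safe_dump({"image": image_bytes})
print(dumped)
```

For a real image, the bytes would typically come from reading the file in binary mode.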

Q #3) What is the difference between the > and | indicators in YAML?

Answer: Both > and | allow writing values across multiple lines in YAML. The greater-than symbol > folds the text: single line breaks inside the value are replaced by spaces. The pipe symbol | keeps the text literal: line breaks are preserved exactly as written. Values written using | need not be escaped. For example, we can store HTML by using |.

template: |

  <p>This is a test paragraph</p>
  <blockquote>
    <p> This is another paragraph</p>
  </blockquote>
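The difference is easiest to see by loading the same two lines with each indicator. This sketch assumes the PyYAML package is installed; the keys folded and literal are illustrative.

```python
import yaml  # PyYAML

doc = """
folded: >
  line one
  line two
literal: |
  line one
  line two
"""

result = yaml.safe_load(doc)

print(repr(result["folded"]))   # the line break between the lines becomes a space
print(repr(result["literal"]))  # the line break between the lines is kept
```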

Q #4) What is the significance of ... at the end of the YAML file?

Answer: Three periods (...) are an optional document end marker. They can be used to mark the end of a document in a stream.
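A short sketch shows the end marker on both the reading and the writing side. It assumes the PyYAML package is installed; explicit_start and explicit_end are keyword arguments of its dump functions.

```python
import yaml  # PyYAML

# Two documents, each closed with the optional end marker.
stream = "first: 1\n...\n---\nsecond: 2\n...\n"
docs = list(yaml.safe_load_all(stream))
print(docs)

# Ask the dumper to write both the start and the end marker.
text = yaml.safe_dump({"first": 1}, explicit_start=True, explicit_end=True)
print(text)
```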

Q #5) How to write comments in the YAML file?

Answer: We use # to write a single line comment. YAML doesn’t support multi-line comments. Thus, we need to use # in multiple lines, as shown below.

# this is 
# a single line as well as multi-line 
# comment
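Comments are dropped when a file is loaded, and a # only starts a comment at the beginning of a line or after whitespace. The sketch below assumes the PyYAML package is installed; the keys and the example.com URL are placeholders.

```python
import yaml  # PyYAML

doc = """
# full-line comment
answer: 42  # trailing comment
url: http://example.com/#section
note: "# not a comment"
"""

data = yaml.safe_load(doc)
print(data)
```

The # inside the URL and inside the quoted string is kept as part of the value.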

Conclusion

In this guide, we covered the steps of preparing the development environment on Windows to get started with YAML. We discussed nearly all the concepts of YAML’s basic data types, the YAML editor, and the YAML Parser.

We have also highlighted the benefits of using YAML vis-a-vis other markup languages and provided code examples with the help of a supporting sample project. We hope that now the learners can use YAML to abstract data from application logic to write efficient and maintainable code.

Happy Learning!!