19 December 2021

How to create your own custom Docker images


In a previous article we covered the basics of using Docker images, but there we only used images built by others. That's great if you find the image you are looking for, but what happens if none fits your needs?

In this article we are going to explain how to create your own images and upload them to Docker Hub, so they can be easily downloaded in your project environments and shared with other developers.


To cook you need a recipe

Provided that you followed the previous tutorial, you'll already have Docker installed on your Linux box. Once Docker is installed you need a recipe to tell it how to "cook" your image. That recipe is a file called Dockerfile, which you create in a folder of your choice alongside any files you want to include in your image.

What do you cook with that recipe? An image. But what is a Docker image? A Docker image is a ready-to-use virtual Linux operating system with a specific configuration and set of dependencies installed. How you use that image is up to you. Some images are provided "as-is", as a base where you are supposed to install your application and run it. Other images are more specific and contain an app that is executed as soon as you run the image. Just keep in mind that a Docker image is like what an OOP language calls a "class", while a container is an instance of that class (what an OOP language would call an "object"). You may have many containers running at once, all started from the same image.

To understand how a Dockerfile works, we are going to go through some examples.

Let's start with this Dockerfile from the vdist project. The image you build with that Dockerfile is intended to compile a Python distribution and run an application called fpm over a folder that is created at runtime, when a container is started from the image. So this Dockerfile has to install every dependency needed to compile Python and run fpm.

As a first step, every Dockerfile starts by defining which image you are going to use as a starting point. Following the OOP metaphor, if our image is a class, we need to define which class it inherits from.

In this case our image is derived from the ubuntu:latest image, but you can use any image available at Docker Hub. Just check that image's Docker Hub page to find out which tag to use. Once you upload your own image to Docker Hub, others may in turn use it as a base for their images.

Every art piece must be signed. Your image is no different so you should define some metadata for it:
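As a rough sketch of those first two steps, the shell snippet below writes the opening lines of a Dockerfile into a scratch folder. The base image ubuntu:latest comes from the text above; the label values are hypothetical placeholders, not the ones the vdist project uses.

```shell
# Scratch folder, so the sketch does not touch a real project.
demo_dir="$(mktemp -d)"

# The quoted 'EOF' keeps the heredoc literal (no variable expansion).
cat > "$demo_dir/Dockerfile" <<'EOF'
# Base image our image "inherits" from.
FROM ubuntu:latest

# Metadata that "signs" the image (placeholder values).
LABEL maintainer="your-name (you@example.com)"
LABEL description="Image to compile Python and run fpm."
EOF

cat "$demo_dir/Dockerfile"
```

LABEL takes key="value" pairs, so you can add as many as you need (a homepage, a version and so on).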

The real meat comes with RUN commands. Those are what you use to configure your image.

The first time they try to build a Docker image, some people misunderstand what RUN commands are for. RUN commands are not executed when a container is created from an image with the "docker run" command. Instead they are run only once, by "docker build", to create an image from a Dockerfile.

The exact set of RUN commands will vary depending on your project's needs. In this example, the RUN commands check that the apt sources list is OK, update the apt database and install a bunch of dependencies using both "apt-get" and "gem install".

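You may be tempted to start your bunch of RUN commands with a "RUN set -e", expecting it to make the entire image build fail if any later RUN command returns an error. It does not work that way: every RUN command is executed in its own shell, so an option set in one RUN is gone by the next one. The good news is that "docker build" already stops as soon as any RUN command exits with a non-zero status, so you are not at risk of uploading an image with unnoticed errors. Where "set -e" does help is at the start of a single RUN that separates several shell commands with ";" or newlines, because otherwise only the exit status of the last command counts.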
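A quick way to see how far "set -e" reaches is to imitate what Docker does with each RUN command, which is to hand it to a fresh shell. The sketch below needs no Docker at all: each "sh -c" call stands in for one RUN command, under the assumption that /bin/sh is the build shell (the default for Linux images).

```shell
# First "RUN": sets the option, then this shell exits and the option is lost.
sh -c 'set -e'

# Second "RUN": a fresh shell, so the failing command does not stop it.
second_run=$(sh -c 'false; echo "still running"')
echo "second RUN printed: ${second_run}"

# "set -e" inside the same shell does abort at the first failure.
third_status=0
third_run=$(sh -c 'set -e; false; echo "never printed"') || third_status=$?
echo "third RUN printed: '${third_run}' (exit status ${third_status})"
```

In a Dockerfile the second case is the one that bites: "RUN false; echo done" builds successfully, while "RUN false && echo done" or "RUN set -e; false; echo done" abort the build.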

Besides, when you review Dockerfiles from other projects you will find that many of them include several shell commands inside the same RUN command, as our example does in lines 14-16. Docker's documentation recommends grouping shell commands in the same RUN command when they are related (i.e. if two commands make no sense when executed without each other, they should go in the same RUN command). That's because images are built as a stack of layers, where every RUN command creates a separate layer. Following that advice, your images should be quicker to rebuild when you change their Dockerfile. To include several shell commands inside the same RUN command, put each command on its own line and end every line with "&& \" to chain them (except the last line, as the example shows).
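A minimal sketch of that chaining style follows. The package names are illustrative assumptions (they are not copied from the vdist Dockerfile); what matters is the "&& \" at the end of every line but the last.

```shell
demo_dir="$(mktemp -d)"

cat > "$demo_dir/Dockerfile" <<'EOF'
FROM ubuntu:latest

# One RUN, one layer: updating the package index only makes sense
# together with the installs that follow it.
RUN apt-get update && \
    apt-get install -y build-essential ruby ruby-dev && \
    gem install fpm
EOF

# Three shell commands, but a single RUN instruction (so a single layer).
grep -c '^RUN' "$demo_dir/Dockerfile"   # prints 1
```

Because the commands are joined with "&&", the first one that fails stops the chain and makes the whole RUN (and the build) fail.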

Apart from RUN commands, there are others you should know. Let's review the Dockerfile from my project markdown2man. That image is intended to run a Python script that uses Pandoc to convert a file, using arguments passed by the user when a container is started from the image. To create that image you can find some by now familiar commands:


But, from there things turn interesting with some new commands.

With ENV commands you can create environment variables to be used in the following build commands, so you don't have to repeat the same string over and over again and later modifications are simpler:


Nevertheless, be aware that environment variables created with ENV commands outlast the build phase and persist when a container is created from the image. That can cause collisions with other environment variables created later. If you only need the variable during the build phase and want it gone when a container is created from the image, use ARG commands instead of ENV ones.
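The difference can be sketched with a Dockerfile that declares one variable of each kind. SCRIPT_PATH is the variable markdown2man uses; BUILD_DIR and its value are hypothetical names added only for this illustration.

```shell
demo_dir="$(mktemp -d)"

cat > "$demo_dir/Dockerfile" <<'EOF'
FROM python:3.8

# ARG: visible to later build instructions, not set in running containers.
ARG BUILD_DIR=/tmp/build

# ENV: visible to later build instructions AND set in running containers.
ENV SCRIPT_PATH=/script

# Both can be used while the image is being built.
RUN mkdir -p "$BUILD_DIR" "$SCRIPT_PATH"
EOF

cat "$demo_dir/Dockerfile"
```

If you build that file and then run "docker run --rm <your-image> env", the listing should include SCRIPT_PATH but not BUILD_DIR.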

To include your application files inside the image use COPY commands:


Those commands copy a source file from the host where you are building the image to a path inside the image. The source path is relative to the build context, which is the folder you pass to "docker build" (usually the one that holds the Dockerfile). In this example, we are copying requirements.txt, which is located alongside the Dockerfile, to a folder called /script (as the SCRIPT_PATH environment variable defines) inside the image.

Last, but not least, we find another new command: ENTRYPOINT.


An ENTRYPOINT defines which command to run when a container is started from this image, so arguments passed to "docker run" after the image name are actually passed to this command. The container stays alive until the command defined in ENTRYPOINT returns.

ENTRYPOINTs are great for using Docker containers to run commands without polluting your own system with the packages those commands need.
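The instructions discussed in this section can also be read back, step by step, from the "docker build" output shown in the next section. The sketch below reassembles them into one file; the instructions are taken from that output, while the comments and the line layout are my own guesses.

```shell
demo_dir="$(mktemp -d)"

cat > "$demo_dir/Dockerfile" <<'EOF'
FROM python:3.8

LABEL maintainer="dante-signal31 (dante.signal31@gmail.com)"
LABEL description="Image to run markdown2man GitHub Action."
LABEL homepage="https://github.com/dante-signal31/markdown2man"

# Kept because it appears in the build output; it only affects its own shell.
RUN set -e
RUN apt-get update \
    && apt-get install pandoc -y

# Reused by every instruction below and still set in running containers.
ENV SCRIPT_PATH /script

# Sources are relative to the build context, targets are inside the image.
COPY requirements.txt $SCRIPT_PATH/
RUN pip install --no-cache-dir -r $SCRIPT_PATH/requirements.txt
COPY src/lib/* $SCRIPT_PATH/lib/
COPY src/markdown2man.py $SCRIPT_PATH/

RUN chmod 755 $SCRIPT_PATH/markdown2man.py
RUN ln -s $SCRIPT_PATH/markdown2man.py /usr/bin/markdown2man

# Arguments given to "docker run" after the image name end up here.
ENTRYPOINT ["markdown2man"]
EOF

# Count the instructions; the build output reports 14 steps.
grep -c -E '^(FROM|LABEL|RUN|ENV|COPY|ENTRYPOINT) ' "$demo_dir/Dockerfile"
```

The exec form used for ENTRYPOINT (a JSON array) runs the command directly, without wrapping it in a shell, which is why the arguments from "docker run" reach the script untouched.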

Time to cook

Once your recipe is ready, you must cook something with it.

Use the "docker build" command to create an image from your Dockerfile. Assuming you are in the same folder as your Dockerfile:

dante@Camelot:~/Projects/markdown2man$ docker build -t dantesignal31/markdown2man:latest .
Sending build context to Docker daemon  20.42MB
Step 1/14 : FROM python:3.8
 ---> 67ec76d9f73b
Step 2/14 : LABEL maintainer="dante-signal31 (dante.signal31@gmail.com)"
 ---> Using cache
 ---> ca94c01e56af
Step 3/14 : LABEL description="Image to run markdown2man GitHub Action."
 ---> Using cache
 ---> b749bd5d4bab
Step 4/14 : LABEL homepage="https://github.com/dante-signal31/markdown2man"
 ---> Using cache
 ---> 0869d30775e0
Step 5/14 : RUN set -e
 ---> Using cache
 ---> 381750ae4a4f
Step 6/14 : RUN apt-get update     && apt-get install pandoc -y
 ---> Using cache
 ---> 8538fe6f0c06
Step 7/14 : ENV SCRIPT_PATH /script
 ---> Using cache
 ---> 25b4b27451c6
Step 8/14 : COPY requirements.txt $SCRIPT_PATH/
 ---> Using cache
 ---> 03c97cc6fce4
Step 9/14 : RUN pip install --no-cache-dir -r $SCRIPT_PATH/requirements.txt
 ---> Using cache
 ---> ccb0ee22664d
Step 10/14 : COPY src/lib/* $SCRIPT_PATH/lib/
 ---> d447ceaa00db
Step 11/14 : COPY src/markdown2man.py $SCRIPT_PATH/
 ---> 923dd9c2c1d0
Step 12/14 : RUN chmod 755 $SCRIPT_PATH/markdown2man.py
 ---> Running in 30d8cf7e0586
Removing intermediate container 30d8cf7e0586
 ---> f8386844eab5
Step 13/14 : RUN ln -s $SCRIPT_PATH/markdown2man.py /usr/bin/markdown2man
 ---> Running in aa612bf91a2a
Removing intermediate container aa612bf91a2a
 ---> 40da567a99b9
Step 14/14 : ENTRYPOINT ["markdown2man"]
 ---> Running in aaa4774f9a1a
Removing intermediate container aaa4774f9a1a
 ---> 16baba45e7aa
Successfully built 16baba45e7aa
Successfully tagged dantesignal31/markdown2man:latest

dante@Camelot:~$

If you weren't in the same folder as the Dockerfile, you should replace that "." at the end of the command with the path to the Dockerfile's folder.

The "-t" parameter is used to give a proper name (a.k.a. tag) to your image. If you want to upload your image to Docker Hub try to follow its naming conventions. For an image to be uploaded to Docker Hub its name should be composed like: <docker-hub-user>/<repository>:<version>. You can see in the last console example that docker-hub-user parameter was dantesignal31 while repository was markdown2man and version was latest.
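If you script your builds, it can help to compose that name from its three parts. This is only a small sketch; the variable names are my own, and the values are the ones from the console example above.

```shell
docker_hub_user="dantesignal31"   # your Docker Hub username
repository="markdown2man"         # repository name at Docker Hub
version="latest"                  # tag for this build

image_name="${docker_hub_user}/${repository}:${version}"

# Print the command instead of running it, so the sketch works without Docker.
echo "docker build -t ${image_name} ."
```

If you already built the image under another name, "docker tag <existing-image> <new-name>" adds the new name without rebuilding anything.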

Upload your image to Docker Hub

If the build process ended well, you should be able to find your image registered in your system.

dante@Camelot:~/Projects/markdown2man$ docker images
REPOSITORY                   TAG                 IMAGE ID       CREATED          SIZE
dantesignal31/markdown2man   latest              16baba45e7aa   15 minutes ago   1.11GB


dante@Camelot:~$

But an image that is only available locally is of limited use. To make that image globally available you should upload it to Docker Hub, and for that you first need a Docker Hub account. The sign-up process is similar to that of any other online service.

Once you have signed up, log in to Docker Hub with your new account and create a new repository. Remember that whatever name you give to your repository, it will be prefixed with your username to form the repository's full name.


In this example, full name for the repository would be mobythewhale/my-private-repo. Unless you are using a paid account you'll probably set a "Public" repository.

Remember to tag your image with a name that matches the repository you created at Docker Hub.

Before you can push your image you have to log in to Docker Hub from your console with "docker login":

dante@Camelot:~/Projects/markdown2man$ docker login
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /home/dante/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
dante@Camelot:~$

The first time you log in you will be asked for your username and password.

Once logged in, you can upload your image with "docker push":

dante@Camelot:~/Projects/markdown2man$ docker push dantesignal31/markdown2man:latest
The push refers to repository [docker.io/dantesignal31/markdown2man]
05271d7777b6: Pushed 
67b7e520b6c7: Pushed 
90ccec97ca8c: Pushed 
f8ffd19ea9bf: Pushed 
d637246b9604: Pushed 
16c591e22029: Pushed 
1a4ca4e12765: Pushed 
e9df9d3bdd45: Mounted from library/python 
1271cc224a6b: Mounted from library/python 
740ef99eafe1: Mounted from library/python 
b7b662b31e70: Mounted from library/python 
6f5234c0aacd: Mounted from library/python 
8a5844586fdb: Mounted from library/python 
a4aba4e59b40: Mounted from library/python 
5499f2905579: Mounted from library/python 
a36ba9e322f7: Mounted from library/debian 
latest: digest: sha256:3e2a65f043306cc33e4504a5acb2728f675713c2c24de0b1a4bc970ae79b9ec8 size: 3680

dante@Camelot:~$

Your image is now available at Docker Hub, ready to be used like any other image.

Conclusion

We reviewed the very basics of how to build a Docker image. From here, the only way forward is practice: creating increasingly complex images, starting from the simpler ones. Fortunately Docker has great documentation, so it should be easy to solve any blocker you find along the way.

Hopefully this whole process will be greatly simplified when Docker Desktop is finally available for Linux, as it already is for Windows and Mac.

17 December 2021

How to create your own custom Actions for GitHub Actions


In my article about GitHub Actions we reviewed how you can build entire workflows just using premade bricks, called Actions, that you can find at GitHub Marketplace.

That marketplace has many ready-to-use Actions, but chances are that sooner or later you'll need to do something that no Action at the Marketplace covers. Sure, you can still use your own scripts. In that article I used a custom script (./ci_scripts/get_version.py) in the section "Sharing data between steps and jobs". The problem with that approach is that if you want to do the same task in another project, you need to copy your scripts between project repositories and adapt them. You'd better turn your scripts into Actions, so they are easily reusable not only in your own projects but also in other people's.

Every Action you can find at the Marketplace is made in one of the ways I'm going to explain here. Actually, if you open any Action's Marketplace page you will find a link, on the right-hand side, to that Action's repository, so you can inspect it and learn how it works.

There are 3 main methods to build your own GitHub Actions:

  • Composite Actions: They are the simplest and quickest option, but they require your Action to be based on a self-sufficient script that needs no additional dependency installed. It should run with just what a standard Linux distribution offers.
  • Docker Actions: If your script needs any dependency to work, then you'll need to follow this way.
  • Javascript Actions: Well... you can write your own Actions with JavaScript, but I don't like that language so I'm not going to include it in this article.

The problem with your Action's dependencies is that they can pollute the workflow environment where your Action is going to be used. They can even collide with those of the app being built. That's why we need to encapsulate our Action and its dependencies, so they are independent of the environment of the app being built. Of course, this problem does not apply if your Action is intended to set up the workflow environment by installing something. There are Actions that, for example, install and set up Pandoc to be used by the workflow. The problem arises when your Action is intended to do one specific task unrelated to installing something (for example copying files) and it does install something under the table, as that can compromise the workflow environment. So, if you need to install anything to make your Action work, the best option is to install it in a Docker container and run your Action script inside that container, entirely independent of the workflow environment.

Composite Actions

If your Action just needs a bunch of bash commands, or a Python script that only uses the built-in standard library, then composite Actions are the way to go.

As an example of a composite Action, we are going to review how my Action rust-app-version works. That Action looks for a Rust Cargo.toml configuration file and reads which version is set there for the Rust app. That version string is offered as the Action's output, and you can use that output in your workflow, for instance, to tag a new release at GitHub. This Action only uses modules available in the standard Python distribution, so it does not need to install anything on the user's runner. It's true that there is a requirements.txt in the rust-app-version repository, but those are only dependencies for unit testing.

To have your own composite Action you first need a GitHub repository to host it. There you can place the few files really needed for your action.

At the very root of your repository you need a file called "action.yml". This file is really important, as it models your Action. Your users should be able to think about your Action as a black box: something that takes some inputs and gives back some output. Those inputs and outputs are defined in action.yml.

If we read the action.yml file of rust-app-version, we can see that this Action only needs one input, called "cargo_toml_folder", and that input is actually optional, as it defaults to "." if it is omitted when the Action is called:



Outputs are somewhat different, as they must refer to the output of a specific step in your Action:


In that last section we specify that this Action is going to have just one output, called "app_version", and that it takes its value from the output called "version" of the step whose id is "get-version".

Those inputs and outputs define what your action consumes and offers, i.e. what your action does. How your action does it is defined under "runs:" tag. There you set that your Action is a composite one and you call a sequence of steps. This particular example only has one step but you can have as many steps as you need:
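Putting the pieces together, an action.yml along those lines could look like the sketch below, written out from the shell. The input, output, step id and script name come from the text; the descriptions, branding values, the use of python3 and the way the folder is passed to the script are assumptions of mine. Line numbers mentioned in the text refer to the original file, not to this sketch.

```shell
demo_dir="$(mktemp -d)"

# The quoted 'EOF' keeps ${{ ... }} and $( ... ) literal.
cat > "$demo_dir/action.yml" <<'EOF'
# Metadata shown at the Marketplace (placeholder values).
name: "rust-app-version"
description: "Reads the app version from a Cargo.toml file."
branding:
  icon: "tag"
  color: "orange"

inputs:
  cargo_toml_folder:
    description: "Folder where Cargo.toml is located."
    required: false
    default: "."

outputs:
  app_version:
    description: "Version string found in Cargo.toml."
    value: ${{ steps.get-version.outputs.version }}

runs:
  using: "composite"
  steps:
    - id: get-version
      shell: bash
      run: echo "::set-output name=version::$(python3 ${{ github.action_path }}/rust_app_version/get_version.py ${{ inputs.cargo_toml_folder }})"
EOF

cat "$demo_dir/action.yml"
```

Note that every "run:" step of a composite Action has to declare its "shell:", something regular workflow steps do not require.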



Take note of line 22, where that step receives an id: "get-version". That id is important, as it is how the outputs configuration refers to this step.

Line 24 is where your command is run. I only executed one command. If you need multiple commands to be executed inside the same step, you should use a pipe character after run: "run: |". That pipe marks the next few lines (indented under the "run:" key) as a block of commands, one per line, to be executed sequentially.

Command at line 24 is interesting because of 3 points:
  • It calls a script located in our Action's repository. To refer to our Action repository root, use the github.action_path context variable. The great thing is that although our script is hosted in its own repository, GitHub runs it so that it can see the repository of the workflow that calls it. Our script will see the workflow repository files as if it was run from that repository's root.
  • At the end of the line you can see how inputs are used through the inputs context.
  • The weirdest thing in that line is how you set the step's output. You set a bash step output by doing an echo "::set-output name=<output_name>::<output_value>". In this case the name is version and its value is what get_version.py prints to console. GitHub has since deprecated this command in favour of appending "<output_name>=<output_value>" to the file named by the GITHUB_OUTPUT environment variable. Either way, be aware that output_name is what you use to retrieve that output after the step ends, through ${{ steps.<id>.outputs.<output_name> }}, in this case ${{ steps.get-version.outputs.version }}.
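Both ways of setting a step output can be tried in a plain shell. Keep in mind that GitHub has deprecated the "::set-output" command described above and now expects outputs to be appended to the file named by the GITHUB_OUTPUT environment variable. In this sketch a temporary file stands in for that file, and "1.2.3" is a made-up stand-in for what get_version.py would print.

```shell
# Stand-in for the file GitHub hands to each step through GITHUB_OUTPUT.
github_output_file="$(mktemp)"

# Made-up stand-in for what get_version.py would print.
version="1.2.3"

# Older syntax: a workflow command printed to standard output.
echo "::set-output name=version::${version}"

# Newer syntax: append name=value to the output file.
# Inside a real step you would write: >> "$GITHUB_OUTPUT"
echo "version=${version}" >> "$github_output_file"

cat "$github_output_file"
```

Whichever syntax the step uses, the value is read afterwards in the same way, through the steps context.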
Apart from that, you only need to set up your Action metadata. That is done in the first few lines:


Be aware that "name:" is the name your Action will have at GitHub Marketplace. The other parameter, "description:", is the short explanation that will be shown along with the name in the Marketplace search results. And "branding:" is just the icon (from the Feather icon set) and color that will represent your Action at the Marketplace.

With those 24 lines at action.yml and your script at its respective path (here at rust_app_version/ subfolder), you can use your action. You just need to push the button that will appear in your repository to publish your action at Marketplace. Nevertheless, you'd better read this article to the end because I have some recommendations that may be helpful for you.

Once published, it becomes visible for other GitHub users and a Marketplace page is created for your action. To use an Action like this you only need to include in your workflow a configuration like this:
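As a hedged sketch, the steps of a workflow using such an Action might look like the fragment written below. The input and output names come from the text; the owner/repository in "uses:", the version tag, the step id and the folder value are assumptions, so copy the exact "uses:" line from the Action's Marketplace page.

```shell
demo_dir="$(mktemp -d)"

cat > "$demo_dir/steps-fragment.yml" <<'EOF'
steps:
  - uses: actions/checkout@v2
  - name: Read the app version from Cargo.toml
    id: app-version
    # Owner/repository and version tag are assumptions.
    uses: dante-signal31/rust-app-version@v1
    with:
      cargo_toml_folder: rust/
  - name: Show the version
    run: echo "Version ${{ steps.app-version.outputs.app_version }}"
EOF

cat "$demo_dir/steps-fragment.yml"
```

The step needs an "id:" of its own, because that id is how later steps reach its outputs.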



Docker actions

If your Action needs to install any dependency, then you should package that Action inside a Docker container. That way your Action's dependencies won't mess with your users' workflow dependencies.

As an example of a docker action, we are going to review how my Action markdown2man works. That action takes a README.md file and converts it to a man page. Using it you don't have to keep two sources to document your console application usage. Instead of that you may document your app usage only with README.md and convert that file to a man page.

To do that conversion markdown2man needs the Pandoc package installed. But Pandoc has its own dependencies, so installing them on the user's runner may break their workflow. Instead, we are going to install those dependencies in a Docker image and run our script from that image. Remember that Docker lets you execute scripts from a container while interacting with host files.

As with composite Actions, we need to create an action.yml at Action repository root. There we set our metadata, input and outputs like we do with composite actions. The difference here is that this specific markdown2man Action does not emit any output, so that section is omitted. Section for "runs:" is different too:


In that section we specify that this Action is a Docker one (at "using:"). There are two ways to use a Docker image in your Action:
  • Generate a specific image for that Action and store it at the GitHub Docker registry. In that case you use the "image: Dockerfile" key.
  • Use a prebuilt image from the Docker Hub registry. To do that you use the "image: docker://<dockerhub_user>/<repository>:<tag>" key.

If the image you are going to build is exclusively intended to be used by the GitHub Action, I would follow the Dockerfile option. Here, with markdown2man, we follow the Dockerfile approach, so a Docker image is built any time the Action is run after a Dockerfile update. The generated image is cached at the GitHub registry so it can be served quicker to later runs. Remember that a Dockerfile is a kind of recipe to "cook" an image, so the commands that file contains are only executed when the image is built ("cooked"). Once built, the only command that is run is the one you set as entrypoint, which receives the arguments passed to "docker run".

The "args:" key lists every parameter to be passed to our script in the container. You will probably use your inputs here, to pass them on to our script. Be aware that, as happened with composite Actions, the user's repository files are visible to our container.
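A "runs:" section for a Docker Action could be sketched as follows. The "using:", "image:" and "args:" keys are the ones discussed above; the two input names are hypothetical placeholders, since the real inputs of markdown2man are not listed in this article.

```shell
demo_dir="$(mktemp -d)"

cat > "$demo_dir/runs-fragment.yml" <<'EOF'
runs:
  using: "docker"
  # Build the image from the Dockerfile at the Action repository root.
  image: "Dockerfile"
  # Passed, in order, to the ENTRYPOINT of the image.
  args:
    - ${{ inputs.markdown_file }}
    - ${{ inputs.manpage_name }}
EOF

cat "$demo_dir/runs-fragment.yml"
```

To run a prebuilt image instead, the "image:" value would be a "docker://" reference to it rather than "Dockerfile".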

As you may suspect by now, Docker Actions are more involved than composite Actions because of the added complexity of creating the Dockerfile. The Dockerfile for markdown2man is pretty simple. As the markdown2man script is a Python one, we make our image derive from the official Python Docker image, version 3.8:



Afterwards, we set image metadata:


To configure your image, for example installing things, you use RUN commands.


ENV command generates environment variables to be used in your Dockerfile commands:


You use the COPY command to copy your requirements.txt from your repository and include it in your generated image. Your scripts are copied from your Action repository to the container following the same approach:


After the script files are copied, I like to make them executable and link them from the /usr/bin/ folder so they are in the system path:


After that, you set your script as the image entrypoint so this script is run once image is started and that script is provided with arguments you set at the "args:" tag at action.yml file.



You can try that image at your computer building that image from the Dockerfile and running that image as a container:

dante@Camelot:~/$ docker run -ti -v ~/your_project/:/work/ dantesignal31/markdown2man:latest /work/README.md mancifra


dante@Camelot:~$

For local testing you need to mount your project folder as a volume (-v flag) if your scripts need to process any file from that repository. The last two arguments in the example (/work/README.md and mancifra) are the arguments passed to the entrypoint.

And that's all. Once you have tested everything you can publish your Action and use it in your workflows:


With a call like that a man file called cifra.2.gz should be created at man folder. If manpage_folder does not exist then markdown2man creates it for you.


Your Actions are first class code

Although your Actions will likely be small, you should take care of them as you would with your full-blown apps. Be aware that many people will find your Actions through the Marketplace and use them in their workflows. An error in your Action can break many workflows, so be diligent and test your Action as you would any other app.

So, with my Actions I follow the same approach as in other projects and I set up a GitHub workflow to run tests against any pushes in a staging branch. Only once those tests succeed I merge staging with main branch and generate a new release for the Action.

Let's use the markdown2man workflow as an example. There you can read that we have two test types:

  • Unit tests: They check the python script markdown2man is based on.

  • Integration tests: They check markdown2man behaviour as a GitHub Action. Although your Action was not published yet you can install it from a workflow in the same repository (lines 42-48). So, what I do is calling the Action from the very same staging branch we are testing and I use that Action with a markdown I have ready at test folder. If a proper man page file is generated then integration test is passed (line 53). Having the chance to test an Action against its own repository is great as it lets you test your Action as people would use it without needing to publish it.
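The integration test idea can be sketched as a workflow that checks out the staging branch and calls the Action from that same checkout with "uses: ./". Everything specific in this sketch is an assumption: the workflow name, the input names and values, and the final check are placeholders, not a copy of the real markdown2man workflow.

```shell
demo_dir="$(mktemp -d)"

cat > "$demo_dir/test_staging.yml" <<'EOF'
name: test_staging
on:
  push:
    branches: [staging]

jobs:
  integration-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run the Action from the checked-out branch
        # "./" means: the action.yml at the root of this very repository.
        uses: ./
        with:
          markdown_file: tests/resources/README.md
          manpage_name: cifra
          manpage_folder: man
      - name: Check that a man page was generated
        run: test -n "$(ls man/)"
EOF

cat "$demo_dir/test_staging.yml"
```

The checkout step has to come first, because "uses: ./" looks for the Action's files in the runner's workspace.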

In addition to testing it, you should write a README.md for your action in order to explain in detail how to use your Action. In that document you should include at least this information:

  • A description of what the action does.
  • Required input and output arguments.
  • Optional input and output arguments.
  • Any secret your action needs.
  • Any environment variable your action uses.
  • An example of how to use your action in a workflow.

You should also add a LICENSE file explaining the usage terms for your Action.


Conclusion

The strong point of GitHub Actions is the high degree of reusability and sharing it promotes. Every time you find yourself repeating the same bunch of commands, you are encouraged to make an Action with those commands and share it through the Marketplace. That way you get a piece of functionality that is easier to use throughout your workflows than copy-pasting commands, and you help improve the Marketplace so that others can benefit from that Action too.

Thanks to this philosophy, GitHub Marketplace has grown to a huge number of Actions, ready to use and to save you from implementing that functionality on your own.