14 March 2015

Clean Code

In your life there are not many books that really change your way of thinking or doing things. In my case, I can count on the fingers of one hand the books like that I've come across: Kurose & Ross's "Computer Networking: A Top-Down Approach", Ross J. Anderson's "Security Engineering: A Guide to Building Dependable Distributed Systems", and the book this article is about: Robert C. Martin's "Clean Code: A Handbook of Agile Software Craftsmanship".

I came across this book at one of the PyConES 2013 talks, about how to write code that stays sustainable over time. The topic was very interesting to me because I was worried about a phenomenon every programmer meets sooner or later: even in Python, when your code grows it gets harder to maintain. I had written applications that were hard to understand when I had to revisit them just some months later. Many years before, I had switched to Python to escape that same problem in Java and Perl, but there it was again. The speakers promised that the principles explained in this book helped to prevent the problem. So I read the book, and I have to admit they were right.

Reading this book is shocking. There are so many practices we take for granted that are actually terribly wrong that you read some passages with a mixture of surprise and incredulity. Saying that code comments are an admission of your failure to make your code readable sounds strange on a first read, but afterwards you realize the author is right.

The book's examples are in Java, not Python; nevertheless, I don't think any Python programmer will have trouble grasping the concepts explained there. A few of them are too Java-ish, but many others are useful to Python developers. Some of the main concepts are:

  • Your function names should explain clearly what the function does. No abbreviations allowed in function names.
  • A function should do one thing and one thing only: it should have a single purpose. Of course, a function can have many steps, but all of them should be focused on achieving the function's goal, and every step should be implemented in its own function. That leads to functions that are easier to test.
  • Functions should be short: 2 lines is great, 5 lines is good, 10 lines is average, 15 is poor.
  • Code comments should be restricted to explaining design decisions, not what the code does.
  • Don't mix levels of abstraction in the same function. That means you should not call the Python API directly in one step while other steps of your function call your own custom functions. Instead, wrap your API call inside another custom function (see the sketch after this list).
  • Order your implementations so you can read your code from top to bottom.
  • Reduce as far as possible the number of arguments you pass into functions. Functions with one argument are good, two are average, and three is likely poor.
  • Don't Repeat Yourself (well, at least this concept was known to me before reading this book).
  • Classes should be small.
  • Classes should have only one reason to change (the Single Responsibility Principle). IMHO, this principle is a logical extension of the "single purpose" rule for functions.
  • Ideally, class attributes should be used by all class methods. If you find attributes used by just a small subset of methods, ask yourself whether those attributes and methods could go in a separate class.
  • Classes should be open for extension but closed for modification. That means we incorporate new features by subclassing existing classes, not by modifying them. That way we reduce the risk of breaking things when we add new features.
  • Use TDD, or condemn yourself to the hell of making further changes to your code while fearing you'll break the whole thing.

There are many more concepts, all fully explained with examples, but those are the ones I keep in my head when I write code.
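
To make a few of these ideas concrete (single purpose, one level of abstraction per function, top-to-bottom ordering), here is a minimal Python sketch of my own; the email scenario and every name in it are made up for the example, they are not from the book:

import smtplib
from email.message import EmailMessage


def send_welcome_email(address):
    """Single purpose: welcome a new user by email."""
    message = build_welcome_message(address)
    deliver(message)


def build_welcome_message(address):
    message = EmailMessage()
    message["From"] = "noreply@example.com"
    message["To"] = address
    message["Subject"] = "Welcome!"
    message.set_content("Thanks for joining us.")
    return message


def deliver(message):
    # The only place that touches the standard library's SMTP API, so
    # send_welcome_email() never mixes abstraction levels.
    with smtplib.SMTP("localhost") as server:
        server.send_message(message)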

To test whether the principles of this book were right, I developed an application called Geolocate following these concepts and TDD. In the beginning it was hard to change my coding habits, but as my code grew I realized it was easier than in my previous projects to find errors and fix them. Besides, when my application reached a respectable size, I let it rest for five months to see how easy it would be to resume development after so much time without reading the code. I was amazed. With my previous projects I would have needed some days to understand a codebase that big; this time I had fully recovered control of how my code worked in just an hour.

My conclusion is that this book is a must-read that will dramatically improve your code quality, and your peace of mind when you maintain that same code afterwards.

08 January 2015

Python test coverage tools

I guess there are many metrics out there to measure how effectively your unit tests cover all the possible cases of your developments. In the future I'm going to formalize my knowledge of TDD, but nowadays I'm just playing with the concepts, so I follow a very simple metric: if my tests execute all of my program's code, then I'm doing OK.

How do you know whether your tests execute all your code? When your code grows, so does the amount of your tests, and it becomes easy to miss a fragment of code and leave it untested. That's where test coverage tools come to help you.

Those tools follow your tests while they execute your code and take note of the visited code lines. That way, after the test run you can see statistics about what percentage of your code is actually tested and what part is not.
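
If you are curious about how a tool can know which lines were visited, here is a toy sketch using Python's sys.settrace hook. Real tools like coverage.py are far more sophisticated, but the idea is similar:

import sys

visited = set()

def tracer(frame, event, arg):
    # The interpreter calls this hook while tracing; we record the
    # number of every line that actually runs.
    if event == "line":
        visited.add(frame.f_lineno)
    return tracer

def classify(x):
    if x > 0:
        return "positive"
    return "non-positive"

sys.settrace(tracer)
classify(1)          # only the "positive" branch executes
sys.settrace(None)

print(sorted(visited))  # the "non-positive" line never shows up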

We are going to see two ways to analyze your test coverage: from the console and from your IDE.

If you are on an Ubuntu box, you should install "python3-coverage" to get coverage analysis in the console:
dante@Camelot:~/project-directory$ sudo aptitude install python3-coverage


Once installed, python3-coverage has to be called to run your unit tests:

dante@Camelot:~/project-directory$ python3-coverage run --omit "/usr/*" ./run_tests.py

I use "--omit" flag to keep "/usr" out of coverage reports. Before using that flag calls of my program to external libraries were included in the report. As I don't test external libraries because they are not developed by me, getting their coverage statistics would make my reports harder to read. The script "run_tests.py" is the same I explained in my article about unittest.

Assuming all your tests run correctly, you can generate a report about your coverage in html or xml format. To get a report in html format, just run:

dante@Camelot:~/project-directory$ python3-coverage html

This command generates a folder called "htmlcov" where your html report is stored (index.html is its entry page). The only slightly annoying thing is that htmlcov has to be removed manually before generating a new one. Besides, it's tedious to look for the generated index page to open the report. That's why I prefer a script that automates all those boring steps:

#!/usr/bin/env bash

# Run the test suite under coverage, keeping system libraries out of the stats.
python3-coverage run --omit "/usr/*" ./run_tests.py
echo "Removing previous reports..."
rm -rf htmlcov
echo "Removed."
echo "Building coverage report..."
python3-coverage html
echo "Coverage report built."
echo "Let's show the report."
# Open the report in the browser, detached from this terminal.
(firefox htmlcov/index.html &> /dev/null &)
echo "Bye!"

Running that script (I call it "run_test_and_coverage.sh"), I get an automatically opened Firefox window showing the just-created coverage report.
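
By the way, I mentioned that reports can also be generated in xml format. If you prefer that variant (handy for continuous integration tools), the analogous command should be:

dante@Camelot:~/project-directory$ python3-coverage xml

That writes a coverage.xml file into the current directory.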

If you use an IDE, chances are it includes some sort of coverage tool. PyCharm includes coverage support in its professional edition. With PyCharm you actually get more or less the same as from the console tools, but integrated with your editor in a more comfortable way.

At the application level, the default configuration should be enough:



I guess that if you don't have the "python3-coverage" system package installed, you should check the "Use bundled coverage.py" option to use the native coverage tool included with PyCharm. In my case I haven't noticed any difference whether that option is checked or unchecked (obviously, I have "python3-coverage" installed).

The only tricky thing is remembering that running tests with coverage support has its own button in the PyCharm interface, located next to the "Run" and "Debug" ones:



After using that button you get a summary panel to the right of your main editor, with percentages showing coverage for each folder. The outliner on the left-hand side marks coverage for each source file. Besides, your main editor window gets colored to mark covered lines in green and uncovered ones in red.

Keeping your test coverage as near as possible to 100% is one of the best indicators that your tests are well designed. To control it you can use the console or your IDE tool, it's your choice, but both of them are easy enough to use often.