Dante's Lab.

25 December 2014

Exporting and importing a virtualenv

One nice thing I recently learnt about virtualenv environments is that they ease project exportation.

When you give one of your projects to a friend or collaborator inside a compressed file, you have to tell them which dependencies to install to run it. Fortunately virtualenv (actually pip) gives you an automated way to do that.

Suppose you have a project folder you want to export and suppose you have made a virtualenv for that project. With your virtualenv activated, run:




(env)dante@Camelot:~/project-directory$ pip freeze > requirements.txt

This will create a file called requirements.txt, where pip lists the name and version of every package installed in that virtualenv. The generated file (in our case requirements.txt) should be included in the exported bundle.

To import a project exported that way, the importer should uncompress the project folder and create a virtualenv in its location. With that virtualenv activated, pip should be called this way:




(env)otherguy@host:~/project-directory$ pip install -r requirements.txt

This call to pip will install all packages and versions included in requirements.txt. Easy and efficient.
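To give an idea of what ends up inside requirements.txt, here is a minimal sketch that reads pinned lines of the form name==version, which is what pip freeze normally writes. The function name and the sample packages and versions are just assumptions for illustration, not taken from a real project.

```python
def parse_requirements(text):
    """Return a dict mapping package name to pinned version."""
    pinned = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # pip freeze normally writes "name==version" lines.
        if "==" in line:
            name, version = line.split("==", 1)
            pinned[name.strip()] = version.strip()
    return pinned


# Hypothetical file content, only for illustration.
sample = "requests==2.5.1\n# a comment\nsix==1.8.0\n"
packages = parse_requirements(sample)
print(packages)
```

A script like this can be handy to compare the requirements of two projects before merging them into one environment.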

16 November 2014

Visual Studio Community

Great news for developers. Following the new direction set by Satya Nadella, Microsoft announced on 12 November that a new Visual Studio version was about to be launched with a much more open license, allowing free use by individuals and small development groups for both commercial and non-commercial purposes.

This new version is called Visual Studio Community and arrives with Update 4 for Visual Studio 2013. Unlike the Visual Studio Express edition, Community promises all Visual Studio features for free; plugin and cross-platform support are especially appreciated. With the Express edition you had no access to the more than 5,000 extensions (a.k.a. plugins) available for Visual Studio. The Community edition changes that and lets you install every plugin you need. Besides, the Express edition was focused on platform-specific development (Web version, PC version, etc.). The Community edition, following Microsoft's new guidelines towards device convergence, unifies that and lets you target multiple platforms.

With those features, why download Visual Studio Express? Actually, I don't know. Some say the Community version of Visual Studio will retire the Express version, but Microsoft hasn't done so yet.

According to its license, you can download Visual Studio Community for free for solo development, commercial or not, or for group development with up to 5 individuals if you or your group are not working for an enterprise (which is basically defined as an organization with more than 250 PCs or more than 1 million dollars of yearly revenue). If you fall in the enterprise category you can only use the Community edition for educational and open source purposes.

With its highly integrated tools (designers, debuggers, editors and profilers) and its support for multiple languages like C#, Visual Basic, Visual C++, JavaScript, HTML5 and (best of all) Python, Visual Studio is a great choice for developing in the Windows ecosystem. I'm pretty happy with PyCharm and its cross-platform support, but I guess I'll give Visual Studio a try next time I deal with a .NET project with IronPython.

18 October 2014

Virtual environments for real developments, playing with virtualenv

I don't know about you, but I usually develop multiple projects at the same time. The problem is that sometimes one project needs libraries that are incompatible with those of another. That's why virtualenv exists in the Python world.

Thanks to virtualenv you can keep the Python versions, libraries, packages and dependencies of each project separate from one another. It does so by creating an isolated copy of Python in your project directory, so you don't have to worry about affecting other projects. That is also useful if you have to develop and install libraries on a Linux system where you have no sudo/root access.

Using virtualenv as part of your usual development tools will save you future headaches.

The first step in using virtualenv is to be sure about which version of Python you are going to use in your project. In my experience the original virtualenv didn't run right with Python 3, so if you use it you should stay with Python 2.7. Fortunately, since version 3.3 Python includes its own built-in flavour of virtualenv, called venv. Be aware that venv in 3.3 installs with no pip support, but thankfully that support was added in Python 3.4. We are going to cover both of them in this article: first virtualenv and venv afterwards.

You can install virtualenv using your Linux distribution's native package manager or Python's pip installer. In Ubuntu you can install it through the package manager with:



dante@Camelot:~$ sudo aptitude install python-virtualenv

Using pip, this should be enough:



dante@Camelot:~$ pip install virtualenv

You can check which version you installed with:



dante@Camelot:~$ virtualenv --version

To use it, just navigate in the console to your project directory and run the following command:



dante@Camelot:~/project-directory$ virtualenv --no-site-packages env

New python executable in env/bin/python
Installing setuptools, pip...done.
dante@Camelot:~/project-directory$

That command will create a directory called env where binaries to run virtual environment are placed. The --no-site-packages flag truly isolates your work environment from the rest of your system as it does not include any packages or modules already installed on your system. That way, you have a completely isolated environment, free from any previously installed packages.

Before you start working, you should start your virtual environment by running the activate script:


dante@Camelot:~/project-directory$ source env/bin/activate
(env)dante@Camelot:~/project-directory$

You know you are working in a virtualenv thanks to the directory name surrounded by parentheses to the left of the path in your command line. While you stay in virtualenv all packages you install through pip will be stored in your virtual instance of python leaving your operating system python alone.
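Besides looking at the prompt, you can also ask the interpreter itself. The following is a minimal sketch, assuming the two markers I know of: classic virtualenv sets sys.real_prefix, while venv (Python 3.3 and later) makes sys.prefix differ from sys.base_prefix. The function name is hypothetical.

```python
import sys


def in_virtual_environment():
    """Best-effort check of whether this interpreter runs inside a virtual environment."""
    # Classic virtualenv stores the original prefix in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv (3.3+) keeps the system prefix in sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print(in_virtual_environment())
```

Run it with the environment activated and again after deactivating it, and the answer should change accordingly.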

To leave virtualenv just type deactivate:


(env)dante@Camelot:~/project-directory$ deactivate
dante@Camelot:~/project-directory$

You should create a virtualenv each time you start a new project.

Using venv in Python 3.4 is not so different. Nevertheless be aware that Ubuntu 14.04 comes with a broken version of venv. If you try to create a virtual environment with venv in Ubuntu 14.04, you'll get an error like this:


dante@Camelot:~/project-directory2$ pyvenv-3.4 env
Error: Command '['/home/dante/project-directory2/env/bin/python3.4', '-Im', 'ensurepip', '--upgrade', '--default-pip']' returned non-zero exit status 1
dante@Camelot:~/project-directory2$

The only way I've found to fix this problem is to upgrade the Ubuntu release to 14.10:


dante@Camelot:~/project-directory2$ sudo do-release-upgrade -d

Several hours later you'll have an upgraded Ubuntu 14.10 system. There you may find that venv needs to be installed from a separate package:



dante@Camelot:~/project-directory2$ cat /etc/issue
Ubuntu 14.10 \n \l
dante@Camelot:~/project-directory2$ pyvenv-3.4 env
The program 'pyvenv-3.4' is currently not installed. You can install it by typing:
sudo apt-get install python3.4-venv
dante@Camelot:~/project-directory2$ sudo aptitude install python3.4-venv
The following NEW packages will be installed:
  python3.4-venv 
0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,438 kB of archives. After unpacking 1,603 kB will be used.
Get: 1 http://es.archive.ubuntu.com/ubuntu/ utopic/universe python3.4-venv amd64 3.4.2-1 [1,438 kB]
Fetched 1,438 kB in 0s (1,717 kB/s)
Selecting previously unselected package python3.4-venv.
(Reading database ... 237114 files and directories currently installed.)
Preparing to unpack .../python3.4-venv_3.4.2-1_amd64.deb ...
Unpacking python3.4-venv (3.4.2-1) ...
Processing triggers for man-db (2.7.0.2-2) ...
Setting up python3.4-venv (3.4.2-1) ...
dante@Camelot:~/project-directory2$ pyvenv-3.4 env

That way venv runs with no errors:



dante@Camelot:~/project-directory2$ source env/bin/activate
(env)dante@Camelot:~/project-directory2$
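The pyvenv script is a thin wrapper around the standard library venv module, so an environment can also be created from Python code. This is only a sketch under my own assumptions: the helper name and the temporary target folder are hypothetical, and with_pip=False (available since Python 3.4) simply skips the ensurepip step, so pip would have to be installed in the environment some other way.

```python
import os
import tempfile
import venv


def create_env(env_dir):
    """Create a virtual environment without pip and report whether it looks valid."""
    # with_pip=False means ensurepip is never called.
    venv.create(env_dir, with_pip=False)
    # Every environment made by venv contains a pyvenv.cfg file at its root.
    return os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))


# Use a scratch folder so nothing is left behind after the example.
with tempfile.TemporaryDirectory() as scratch:
    created = create_env(os.path.join(scratch, "env"))
print(created)
```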

Once in your virtual environment you can install every package you need for your development using pip, without messing with the Python libraries of your operating system. Modern Linux distributions make heavy use of Python applications, so it's good practice to keep your operating system's Python libraries clean, with only what your Linux system really needs, and to handle application-specific libraries for your developments through their respective virtual environments. For instance, you would use a virtual environment if you needed to develop in Python 3 while keeping your operating system's default Python interpreter at version 2.7:



dante@Camelot:~/project-directory2$ source env/bin/activate
(env) dante@Camelot:~/project-directory2$ python
Python 3.4.2 (default, Oct  8 2014, 13:08:17) 
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
(env) dante@Camelot:~/project-directory2$ deactivate
dante@Camelot:~/project-directory2$ python
Python 2.7.8 (default, Oct  8 2014, 06:57:53) 
[GCC 4.9.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Don't be shy with virtual environments. You can use them in production; in fact they are what is actually recommended for production situations where you don't want an update of your operating system libraries to break the dependencies of your running application.

23 February 2014

Violent Python

Python is widely used in many fields including maths, physics, engineering, scripting, web programming and, of course, security. Its power as a glue between many tools and programming languages makes it the perfect option for pentesting.

"Violent Python" scratches the surface of Python in the world of security tools programming. It's a decent book, actually a decent cookbook. Decent because, although the example programs are short and simple, they show Python in action in many security fields: geolocation, obfuscation, exploit development, network analysis and forgery, web scraping and a long etcetera.

The problem is that the book is just decent, because the example programs are not very pythonic. Although the code is simple and clear, Python offers smarter ways to do those things. Besides, the example programs are unambitious and don't go beyond mere curiosities. In my opinion, the examples could have been more spectacular and many more fields in security could have been covered.

I don't regret having bought "Violent Python", but maybe I'm a bit disappointed because the book is geared to people at an earlier point than me in the learning journey into security engineering. For those people this book is a fun and direct approach to security tools development.

15 February 2014

Testing your python code with unittest

When you are programming small applications, the development cycle tends to be code->manual_test->code->manual_test. The problem with this method is that, as your project grows in complexity, you have to spend more time testing it to be sure your latest changes don't have side effects in any part of your application. It is common to forget to test things, or to believe they are still OK after the latest changes, only to find that a part of your application that ran fine when you tested it at the beginning of development was broken by a change some cycles ago and you didn't notice.

Manual testing is error-prone and inefficient, so when your project becomes complex you should automate your testing. One of the most used libraries for automated testing is unittest, present in Python since version 2.1. This library lets you prepare small scripts to test the behavior of your program components.

If you want to use unittest to check your code, I'd recommend following the TDD methodology. This method makes you write test cases first: scripts that check a particular section of your code. These test cases are very useful to force you to define the desired behavior and interfaces of the new functions. Once the tests are defined, and only then, you write your code keeping in mind that your target is to pass the tests. When the code is finished you put it to the test: if it passes you can enter your next development cycle (define tests, write code, run tests); if it fails you fix the code and try again until it succeeds.

I know that at first glance this method seems unnecessarily complex. What a developer wants is to code the application, not to spend time coding tests. That's why many developers hate this technique. But once you give it a try you really love it, because it gives you great confidence in your code. Once your tests are defined you only need to run them after a code change to be sure that the change didn't break anything in a remote spot of your code. Besides, if you are working on a project with collaborators, tests are a great way to be sure that a contribution really works.

Tests can check whatever we want in our application: its modules, its functions and classes, the GUI, etc. For example, if we were testing a web application we could combine unittest with Selenium to simulate a browser surfing our site, while if we were testing a Qt-based GUI we should use QTest.

When working with unittest, keep in mind that our main building block will be the test case. A test case should focus on testing a single scenario. In Python a test case is a class that inherits from unittest.TestCase. Test cases have this general structure:

    import unittest

    class TestPartOfCode(unittest.TestCase):

        def setUp(self):
            # <test initialization>
            pass

        def test_something(self):
            # <code to test something>
            # self.assert...  All the asserts you need to check the correctness condition.
            pass

        def test_something_else(self):
            # <code to test something else>
            # self.assert...  All the asserts you need to check the correctness condition.
            pass

        def tearDown(self):
            # <test shutdown>
            pass

You can make a test case execute by itself just appending at the end:

if __name__ == '__main__':
    unittest.main()

If you don't do that you have to call your test case externally.

When unittest is run it looks for all subclasses of unittest.TestCase and then executes every method in those subclasses whose name starts with "test". There are special methods like setUp() and tearDown(): setUp() is run prior to each test to prepare the test context, while tearDown() is run after each test to remove that context.
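To make that structure less abstract, here is a small self-contained sketch. The class and its shopping list are hypothetical, made up only to show that setUp() rebuilds the context before each test and that methods whose name doesn't start with "test" are not collected.

```python
import unittest


class TestShoppingList(unittest.TestCase):

    def setUp(self):
        # Fresh context before every single test.
        self.items = ["bread", "milk"]

    def tearDown(self):
        # Drop the context after every single test.
        self.items = None

    def test_append(self):
        self.items.append("eggs")
        self.assertEqual(len(self.items), 3)

    def test_fresh_context(self):
        # setUp() rebuilt the list, so changes made by other tests are not visible.
        self.assertEqual(self.items, ["bread", "milk"])

    def helper_not_a_test(self):
        # Name does not start with "test", so the loader never runs it.
        raise AssertionError("never collected")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestShoppingList)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Here the suite is loaded and run explicitly instead of calling unittest.main(), just so the result object can be inspected afterwards.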

Usually you don't have just one test case; you have a lot of them, to test every feature in your program. There are many approaches: in GUI applications you could have a test case for each window, and the methods of that test case would check every control in that window. Another good rule of thumb is to group together in a test case all the tests that share the same setUp() and tearDown() logic.

So you usually have many test cases, and it is more efficient to load them externally and run them in batch mode. I think it is good practice to keep your tests in a different folder than your code, for example in a "tests" folder inside your project folder. I usually place an empty "__init__.py" file inside that folder to make it a package. Let's suppose that is our case; to load and run the test cases you need a script to discover them (I usually call it "run_tests.py"):

    import sys
    import unittest

    def run_functional_tests(pattern=None):
        print("Running tests...")
        if pattern is None:
            # Default discovery pattern is "test*.py".
            tests = unittest.defaultTestLoader.discover("tests")
        else:
            # Trailing glob (assumed here) so files whose name starts with the argument match.
            pattern_with_globs = "%s*" % (pattern,)
            tests = unittest.defaultTestLoader.discover("tests", pattern=pattern_with_globs)
        runner = unittest.TextTestRunner()
        runner.run(tests)

    if __name__ == "__main__":
        if len(sys.argv) == 1:
            run_functional_tests()
        else:
            run_functional_tests(pattern=sys.argv[1])

This script is usually placed at the root of your project folder, at the same level as the tests directory. If it is called with no arguments it just enters the tests folder and loads every test case found inside files whose name starts with "test". If you call it with an argument, it uses it as a kind of filter, to load only those test cases placed in Python files whose name starts with the given argument. This way you can run only a subset of your test cases.
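The filtering behaviour can be tried in isolation. The sketch below builds a throwaway tests folder with hypothetical file names and counts what unittest discovery finds with the default pattern and with a narrower one. It is only an illustration of discover(), not part of the real run_tests.py.

```python
import os
import tempfile
import unittest

# Minimal test module written into every generated file.
TEST_TEMPLATE = (
    "import unittest\n"
    "\n"
    "class Sample(unittest.TestCase):\n"
    "    def test_ok(self):\n"
    "        self.assertTrue(True)\n"
)


def count_discovered(start_dir, pattern="test*.py"):
    """Return how many tests discovery finds under start_dir for a file pattern."""
    suite = unittest.TestLoader().discover(start_dir, pattern=pattern)
    return suite.countTestCases()


with tempfile.TemporaryDirectory() as tests_dir:
    # Two test files and one helper that must be ignored (hypothetical names).
    for name in ("test_window_a.py", "test_window_b.py", "helper.py"):
        with open(os.path.join(tests_dir, name), "w") as handle:
            handle.write(TEST_TEMPLATE)
    all_tests = count_discovered(tests_dir)
    filtered = count_discovered(tests_dir, pattern="test_window_a*")

print(all_tests, filtered)
```

The default pattern skips helper.py because its name doesn't start with "test", while the narrower pattern leaves only one of the two test files.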

With unittest you can test console and web applications, and even GUI ones. The latter are harder to test because access to GUI widgets depends on each implementation and the related tools it provides. For instance, the Qt creators offer the QTest module to be used with unittest. This module lets you simulate mouse clicks and key presses.

So, we could use a console or web example to detail how to use unittest, but as QTest tutorials (with PyQt) are so scarce I want to contribute one of my own. That's why in this article we are going to develop test cases to check a PyQt GUI application. As the base for the example we are going to use pyQTmake's source code. You'd better get the whole source code using Mercurial, as I explained in one of my previous articles. To clone the source code and set it to the version we are going to use, type the following in your Ubuntu console:



dante@Camelot:~$ hg clone https://borjalopezm@bitbucket.org/borjalopezm/pyqtmake/ example
requesting all changes
adding changesets
adding manifests
adding file changes
added 10 changesets with 120 changes to 74 files
updating to branch default
67 files updated, 0 files merged, 0 files removed, 0 files unresolved
dante@Camelot:~/Desarrollos$ cd example
dante@Camelot:~/Desarrollos/example$ hg update 9
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
dante@Camelot:~/Desarrollos/example$

OK, now that you have the source code we are going to work with, take a look at the pyqtmake.py code. Focus on the function "connections":

def connections(MainWin):
    ## TODO: This signals are connected using old way. I must change it to new way
    MainWin.connect(MainWin.ui.action_About,  SIGNAL("triggered()"),  MainWin.onAboutAction)
    MainWin.connect(MainWin.ui.actionLanguajes,  SIGNAL("triggered()"),  MainWin.onLanguagesAction)
    MainWin.connect(MainWin.ui.actionOpen,  SIGNAL("triggered()"),  MainWin.onOpenAction)
    MainWin.connect(MainWin.ui.actionPaths_to_compilers,  SIGNAL("triggered()"),  MainWin.onPathsToCompilersAction)
    MainWin.connect(MainWin.ui.actionPyQTmake_Help,  SIGNAL("triggered()"),  MainWin.onHelpAction)
    MainWin.connect(MainWin.ui.actionQuit,  SIGNAL("triggered()"),  MainWin.close)
    MainWin.connect(MainWin.ui.actionSave,  SIGNAL("triggered()"),  MainWin.onSaveAction)
    MainWin.connect(MainWin.ui.actionSave_as,  SIGNAL("triggered()"),  MainWin.onSaveAsAction)
    return MainWin

Looks like this bunch of code could be updated to the new style of PyQt signal connections. The point is that we don't want to break anything, so we are going to develop some test cases to be sure our new code behaves like the old one.

These connections allow MainWin to respond to clicks on widgets by opening the appropriate windows. Our tests should check that these windows still open correctly after our changes in the code.

The complete code for these tests is in the test_main_window.py file inside the tests folder.

To check our application, our test first has to start it. Unittest has two main methods to prepare the context for our tests: setUp() and setUpClass(). The first one, setUp(), is run before every test in our test case, whereas setUpClass() is run only once, before any test of the test case is run.

In this very test case we are going to use setUp() to create the application every time we test one of its components:

    def setUp(self):
        # Initialization
        self.app, self.configuration = run_tests.init_application()
        # Main Window creation.
        self.MainWin = MainWindow()
        # SLOTS
        self.MainWin = pyqtmake.connections(self.MainWin)
        #EXECUTION
        self.MainWin.show()
        QTest.qWaitForWindowShown(self.MainWin)
        # self.app.exec_() # Don't call exec or your qtest commands won't reach
                           # widgets.

The QTest.qWaitForWindowShown() method stops execution until the awaited window is really shown. If we didn't use it, we could call widgets that don't exist yet.

Our first test is going to be really simple:

    def test_on_about_action(self):
        """Push "About" menu option to check if correct window opened."""
        QTest.keyClick(self.MainWin, "h", Qt.AltModifier)
        QTest.keyClick(self.MainWin.ui.menu_Help, 'a', Qt.AltModifier)
        QTest.qWaitForWindowShown(self.MainWin.About_Window)
        self.assertIsInstance(self.MainWin.About_Window, AboutWindow)

QTest.keyClick() sends a key click to the specified widget. It can be used with key modifiers; in this case Qt.AltModifier means that we are simulating the key being pressed at the same time as Alt. Why am I using a key simulation? Can't QTest simulate mouse clicks? Yes, it can. The problem is that QTest.mouseClick() can only interact with widgets, and menu items are not widgets (in Qt) but menu actions, so the only way to call them is to use their keyboard shortcuts (at least as far as I know).

The key call in every test is the "assert..." stuff. This family of functions checks that a specific condition is met: if so the test is declared successful, if not it is declared failed. There is a third exit state for a test, error, which only means that our test didn't run as expected and broke at some point.

In our example self.assertIsInstance() checks, as its name suggests, that the About_Window attribute in MainWin actually is an instance of AboutWindow. If you study the tested slot, MainWin.onAboutAction(), this only happens when the called window is correctly opened, which is what we are testing.

Unittest offers a huge list of assert variants, such as assertEqual(), assertTrue(), assertIn() or assertIsInstance().

Nevertheless, notice that only a small subset of them is included in older versions of Python.
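As a quick taste of that list, this sketch exercises a handful of common variants in a single hypothetical test. It is only a sample, not the complete list; as far as I know assertIn(), assertIsNone() and assertIsInstance() arrived with Python 2.7 on the 2.x line, so older interpreters won't have them.

```python
import unittest


class TestAssertVariants(unittest.TestCase):

    def test_variants(self):
        self.assertEqual(2 + 2, 4)            # a == b
        self.assertNotEqual("a", "b")         # a != b
        self.assertTrue("py".islower())       # bool(x) is True
        self.assertIn(3, [1, 2, 3])           # a in b
        self.assertIsNone(None)               # x is None
        self.assertIsInstance(3.5, float)     # isinstance(a, b)
        # Floats are compared after rounding to the given decimal places.
        self.assertAlmostEqual(0.1 + 0.2, 0.3, places=7)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAssertVariants)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```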

If you want to test that your code raises exceptions as expected you can use assertRaises().
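A minimal sketch of exception testing with assertRaises(), in both its context manager form and its callable form. The parse_port() helper is hypothetical, written only to have something that raises.

```python
import unittest


def parse_port(text):
    """Hypothetical helper: convert text to a TCP port number or raise ValueError."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port


class TestParsePort(unittest.TestCase):

    def test_rejects_out_of_range(self):
        # Context manager form: the block must raise ValueError.
        with self.assertRaises(ValueError):
            parse_port("70000")

    def test_rejects_garbage(self):
        # Callable form: exception class, callable, then its arguments.
        self.assertRaises(ValueError, parse_port, "http")

    def test_accepts_valid(self):
        self.assertEqual(parse_port("8080"), 8080)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestParsePort)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```

If the tested code doesn't raise the expected exception, assertRaises() makes the test fail, just like any other unmet assert.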

At this point, if you run "run_tests.py" the test will be successful. TDD says you have to develop tests that fail at first, but here we are not developing code from scratch but modifying already working code, so it is not wrong to get a successful test here, to be sure our test is correct.

To start modifying our code to include "new style" slot connections we should comment out all the connections we want to change. To simplify our example we are going to modify only the first connection:

def connections(MainWin):
    ## TODO: This signals are connected using old way. I must change it to new way
    #MainWin.connect(MainWin.ui.action_About,  SIGNAL("triggered()"),  MainWin.onAboutAction)
    MainWin.connect(MainWin.ui.actionLanguajes,  SIGNAL("triggered()"),  MainWin.onLanguagesAction)
    MainWin.connect(MainWin.ui.actionOpen,  SIGNAL("triggered()"),  MainWin.onOpenAction)
    MainWin.connect(MainWin.ui.actionPaths_to_compilers,  SIGNAL("triggered()"),  MainWin.onPathsToCompilersAction)
    MainWin.connect(MainWin.ui.actionPyQTmake_Help,  SIGNAL("triggered()"),  MainWin.onHelpAction)
    MainWin.connect(MainWin.ui.actionQuit,  SIGNAL("triggered()"),  MainWin.close)
    MainWin.connect(MainWin.ui.actionSave,  SIGNAL("triggered()"),  MainWin.onSaveAction)
    MainWin.connect(MainWin.ui.actionSave_as,  SIGNAL("triggered()"),  MainWin.onSaveAsAction)
    return MainWin

Here is where "run_tests.py" fails, so we are at the correct point for TDD. From here we have to develop code to make our test succeed again.

def connections(MainWin):
    ## TODO: This signals are connected using old way. I must change it to new way
    #MainWin.connect(MainWin.ui.action_About,  SIGNAL("triggered()"),  MainWin.onAboutAction)
    MainWin.ui.action_About.triggered.connect(MainWin.onAboutAction)
    MainWin.connect(MainWin.ui.actionLanguajes,  SIGNAL("triggered()"),  MainWin.onLanguagesAction)
    MainWin.connect(MainWin.ui.actionOpen,  SIGNAL("triggered()"),  MainWin.onOpenAction)
    MainWin.connect(MainWin.ui.actionPaths_to_compilers,  SIGNAL("triggered()"),  MainWin.onPathsToCompilersAction)
    MainWin.connect(MainWin.ui.actionPyQTmake_Help,  SIGNAL("triggered()"),  MainWin.onHelpAction)
    MainWin.connect(MainWin.ui.actionQuit,  SIGNAL("triggered()"),  MainWin.close)
    MainWin.connect(MainWin.ui.actionSave,  SIGNAL("triggered()"),  MainWin.onSaveAction)
    MainWin.connect(MainWin.ui.actionSave_as,  SIGNAL("triggered()"),  MainWin.onSaveAsAction)
    return MainWin

With this modification our test will succeed again, which is a sign that our code works. You can verify it manually if you want.

Once your test is finished you usually want your test windows closed, so your test case's tearDown() should be:

def tearDown(self):
    #EXIT
    if hasattr(self.MainWin, "About_Window"):
        self.MainWin.About_Window.close()
    self.MainWin.close()
    self.app.exit()

To test more aspects of your code you only have to add more "test" methods into your unittest.TestCase subclasses.

With all of this you are prepared to equip yourself with a pretty bunch of tests to guide you through your development.

25 January 2014

Keep your code safe (Mercurial tutorial)

In one of my previous articles I wrote about some of the options available to keep track of your code as it evolves. We assessed the main options for freelance developers, Git and Mercurial, and the main cloud providers for these two, GitHub and Bitbucket. My conclusion then was that, as I'm currently a Python developer, my logical choice was Mercurial and Bitbucket, very typical in the Python community. In this article we're going to learn the main commands to use Mercurial and keep a cloud repository in Bitbucket.

Mercurial has installers for Windows, Linux and MacOS. Besides, you can choose to use a graphical user interface to manage it (like TortoiseHg) or just console commands. In this tutorial we're going to focus on the Linux version (actually the Ubuntu version) with console commands. Using just console commands has the main advantage that it is easier to explain and the concepts are clearer.

To install Mercurial in Ubuntu you just have to type:



$ sudo aptitude install mercurial

Once you have it installed you can run it as a normal user, but before that you should do a minimal configuration: in the root of your home directory create a file called ".hgrc" (pay attention to the initial dot). This file sets global variables used by Mercurial. The bare minimum needed by Mercurial is the username and email you want to be used to mark each update in your repository. In my case, that file has this content:



$ cat .hgrc
[ui]
username = dante <dante.signal31@gmail.com>

[extensions]
graphlog=
$

Change that content to include your own username and email and you're done; that's all the configuration you need for Mercurial. The graphlog extension will give us useful information when we get to the explanation of branches, later in this article.

Now go to the folder where you have the source code you'd like to track and tell Mercurial you want to create a repository there. Suppose that your source folder is just called "source" and that its content is:



source$ ls
source$

To create a Mercurial repository here, do:



source$ ls
source$ hg init
source$ ls
source$

Wait, nothing changed? Is this normal? Actually yes, because Mercurial hides its own data directory to protect it from accidental deletions:



source$ ls -la
total 12
drwxrwxr-x 3 dante dante 4096 Jan 17 22:11 .
drwxrwxr-x 6 dante dante 4096 Jan 17 22:09 ..
drwxrwxr-x 3 dante dante 4096 Jan 17 22:11 .hg
source$

Now you can see it: Mercurial uses the ".hg" directory. Inside it, Mercurial will store the versions of our code's files. As long as the ".hg" folder stays safe, our code will be too.

With "hg status" we can see what is happening in our repository. If typed in a freshly initialized repository with a still empty folder, "hg status" has nothing to say:



source$ hg status
source$

Instead, if we create two new files:



source$ touch code_1.txt
source$ touch code_2.txt
source$ ls
code_1.txt  code_2.txt
source$ hg status
? code_1.txt
? code_2.txt
source$

Those two question marks in the "hg status" output tell us Mercurial has detected two files in the folder that are not yet tracked in the repository. To add them we must do:



source$ hg add code_1.txt
source$ hg add code_2.txt
source$ hg status
A code_1.txt
A code_2.txt
source$

Now the question marks have changed to "A", which means those files have just been added to the repository. This time we added the files one by one, but we could have added them in one go with "hg add .". We could have used wildcards too. Besides, we can create exclusion lists by creating a ".hgignore" file inside the source folder. This way you can choose in a fine-grained way which files to include in Mercurial tracking and which not. For instance, you will usually keep source code in the repository, but not compiled files (built from that source) or test databases that can be regenerated easily. You'd better store in your Mercurial repository only the files that are really needed, to keep your repository as small as possible. Keep in mind that, if you want to back up a repository to Bitbucket (or any other source code hosting service), your cloud repository will have a maximum size limit if you want to stay a free user.

Changes in our repository won't be really valid until you commit them with "hg commit":



source$ hg commit -m "Two initial files just created empty."
source$ hg status
source$

The "-m" flag in "hg commit" lets us comment this version so we can see at a glance the main changes it contains. Once a change is committed it disappears from "hg status"; that's why in our last example it's empty again. If we modify one of the files:



source$ hg status
source$ echo "Hello" >> code_1.txt
source$ hg status
M code_1.txt
source$

That "M" in "hg status" output means Mercurial has detected that a tracked file has changed compared with the version it has in the repository. To include that modification in repository we must do a commit:



source$ hg commit -m "Code_2 modified."
source$ hg status
source$

Hey! Wait! We have made a mistake! The commit message is incorrect, because the modified file was Code_1, not Code_2. Mercurial lets us fix the last commit with the "--amend" flag:



source$ hg log
changeset:   1:4161fbd0c054
tag:         tip
user:        dante <dante.signal31@gmail.com>
date:        Fri Jan 17 23:09:00 2014 +0100
summary:     Code_2 modified.

changeset:   0:bf50392b0bf2
user:        dante <dante.signal31@gmail.com>
date:        Fri Jan 17 22:43:34 2014 +0100
summary:     Two initial files just created empty.

source$ hg commit --amend -m "Code_1 modified."
saved backup bundle to /home/dante/Desarrollos/source/.hg/strip-backup/4161fbd0c054-amend-backup.hg
source$ hg log
changeset:   1:17759dec5135
tag:         tip
user:        dante <dante.signal31@gmail.com>
date:        Fri Jan 17 23:09:00 2014 +0100
summary:     Code_1 modified.

changeset:   0:bf50392b0bf2
user:        dante <dante.signal31@gmail.com>
date:        Fri Jan 17 22:43:34 2014 +0100
summary:     Two initial files just created empty.

source$

"hg log" shows the commit history. We can see in that history that the last commit message was fixed thanks to the "--amend" flag. Unfortunately, with "--amend" you can only fix the last commit. Changing an older commit is considered dangerous and there is no easy way to do it (actually you can, but it is a very advanced and delicate task).

What happens if you realize you no longer need one of the files in your project that is being tracked by Mercurial? Well, you could just remove it from the source folder...



source$ ls
code_1.txt  code_2.txt
source$ rm code_2.txt
source$ ls
code_1.txt
source$ hg status
! code_2.txt
source$

... but you can see that Mercurial alerts you, through the "!" mark, that it cannot find a tracked file. To tell Mercurial to stop tracking one particular file:



source$ hg status

! code_2.txt

source$ hg remove code_2.txt

source$ hg status
R code_2.txt

source$ hg commit -m "Code_2 removed."

source$ hg status
source$

With "hg remove" a file can be marked for removal from the repository, so "hg status" shows it with an "R", which means that the marked file will be removed from the repository in the next commit.

OK, I've removed a file from the repository but now I realize that I actually need it. Can I recover the code_2 file? Actually you have two ways. The first one is to roll your source folder back to the last state where the file could be found, copy the file to a temp directory, go back to the latest state and add the saved file:



source$ hg log

changeset:   2:88ac7cad647e

tag:         tip

user:        dante <dante.signal31@gmail.com>

date:        Sat Jan 18 00:39:50 2014 +0100

summary:     Code_2 removed.



changeset:   1:17759dec5135

user:        dante <dante.signal31@gmail.com>

date:        Fri Jan 17 23:09:00 2014 +0100

summary:     Code_1 modified.



changeset:   0:bf50392b0bf2

user:        dante <dante.signal31@gmail.com>

date:        Fri Jan 17 22:43:34 2014 +0100

summary:     Two initial files just created empty.



source$ hg update 1

1 files updated, 0 files merged, 0 files removed, 0 files unresolved

source$ ls

code_1.txt  code_2.txt

source$ cp code_2.txt /tmp/code_2.txt

source$ hg update 2

0 files updated, 0 files merged, 1 files removed, 0 files unresolved

source$ ls

code_1.txt

source$ cp /tmp/code_2.txt code_2.txt

source$ hg status

? code_2.txt

source$ hg add code_2.txt

source$ hg status

A code_2.txt

source$

Note that you can use "hg update" to time travel your source folder to the state it had in a particular revision. Just remember that the revision number used by "hg update" is the first part of the changeset id shown by "hg log", the number before the colon. For example, if you want to roll back to this state:

changeset:   1:17759dec5135
user:        dante <dante.signal31@gmail.com>
date:        Fri Jan 17 23:09:00 2014 +0100
summary:     Code_1 modified.

you should use "hg update 1" because of "changeset: 1:...". Do you see it?
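The two halves of the changeset id can be tried out with a small sketch like the following. It runs in a throwaway repository (the folder, user name and commit messages are made up for the example). As far as I know, the number is only meaningful inside your own copy of the repository, while the hash identifies the changeset in every clone.

```shell
# Throwaway repository; HGUSER is a placeholder identity for the commits.
export HGUSER="demo <demo@example.com>"
repo=$(mktemp -d) && cd "$repo" && hg init

touch code_1.txt && hg add code_1.txt && hg commit -m "first"
echo "Hello" >> code_1.txt && hg commit -m "second"

hg log --template '{rev}:{node|short}  {desc}\n'       # number:hash pairs
hg update 0                                            # travel by revision number
hg update "$(hg log -r 1 --template '{node|short}')"   # travel by hash instead
hg id -n                                               # revision number we are at now
```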

The problem with this approach is that it is messy and error prone. A more straightforward approach is to locate the revision in which the desired file was last modified and recover the file from there with "hg revert":



source$ ls

code_1.txt

source$ hg status

source$ hg log -l 1 code_2.txt

changeset:   0:bf50392b0bf2

user:        dante <dante.signal31@gmail.com>

date:        Fri Jan 17 22:43:34 2014 +0100

summary:     Two initial files just created empty.



source$ hg revert -r 0 code_2.txt

source$ hg log

changeset:   2:88ac7cad647e

tag:         tip

user:        dante <dante.signal31@gmail.com>

date:        Sat Jan 18 00:39:50 2014 +0100

summary:     Code_2 removed.



changeset:   1:17759dec5135

user:        dante <dante.signal31@gmail.com>

date:        Fri Jan 17 23:09:00 2014 +0100

summary:     Code_1 modified.



changeset:   0:bf50392b0bf2

user:        dante <dante.signal31@gmail.com>

date:        Fri Jan 17 22:43:34 2014 +0100

summary:     Two initial files just created empty.

source$ hg status

A code_2.txt

source$

source$ hg commit -m "Code_2 recovered."

source$ hg log

changeset:   3:9214d0557080

tag:         tip

user:        dante <dante.signal31@gmail.com>

date:        Sat Jan 18 01:07:24 2014 +0100

summary:     Code_2 recovered.



changeset:   2:88ac7cad647e

user:       dante <dante.signal31@gmail.com>

date:        Sat Jan 18 00:39:50 2014 +0100

summary:     Code_2 removed.



changeset:   1:17759dec5135

user:        dante <dante.signal31@gmail.com>

date:        Fri Jan 17 23:09:00 2014 +0100

summary:     Code_1 modified.



changeset:   0:bf50392b0bf2

user:        dante <dante.signal31@gmail.com>

date:        Fri Jan 17 22:43:34 2014 +0100

summary:     Two initial files just created empty.



source$ ls

code_1.txt  code_2.txt

source$

The main point is that "hg log -l 1 code_2.txt" shows you the last revision in which that file was modified, so the file still existed there. With that revision number you can make Mercurial rescue the desired file from there ("hg revert -r 0 code_2.txt"). Finally, commit the rescue.

Now let's raise the stakes. Sometimes you want to try developing new features but you don't want to mess up your tested files. That's where branches come into play. You create a branch to develop over a separate copy of the main branch (called "default"). When you are sure the branch is ready to go into production you can merge it with the main branch, mixing its changes into the stable files of the main branch.

Suppose you want to develop two features, so let's create two branches, "feature1" and "feature2":



source$ hg branches

default                        0:03e7ab9fb0c6

source$ hg branch feature1

marked working directory as branch feature1

(branches are permanent and global, did you want a bookmark?)

source$ hg branches 

default                        0:03e7ab9fb0c6 

source$ hg status 

source$ hg commit -m "Feature1 branch created." 

source$ hg branches 

feature1                       1:6c061eff633f 

default                        0:03e7ab9fb0c6 (inactive) 

source$

"hg branches" shows the branches in the repository, but a branch is not really created in the repository until you commit (with "hg commit") after "hg branch". That's why the "hg branches" run right after "hg branch" still shows only the default branch.



source$ touch code_feature1.txt

source$ ls

code_1.txt  code_2.txt  code_feature1.txt 

source$ hg status

? code_feature1.txt

source$ hg add code_feature1.txt

source$ hg commit -m "code_feature1.txt created"

To switch from one branch to another use "hg update":



source$ hg update default
0 files updated, 0 files merged, 1 files removed, 0 files unresolved

source$ ls

code_1.txt  code_2.txt

source$

When you switch from one branch to another, files are removed and created to recreate that branch's file layout.



source$ hg branch feature2

marked working directory as branch feature2

(branches are permanent and global, did you want a bookmark?)

source$ hg commit -m "Feature2 branch created"

source$ touch code_feature2.txt

source$ hg add code_feature2.txt

source$ hg commit -m "code_feature2.txt created"

source$ ls

code_1.txt  code_2.txt  code_feature2.txt

source$ hg branches

feature2                       7:42123cefb28c

feature1                       5:09f18d24ae0e

default                        3:9214d0557080 (inactive)

source$ hg update default

0 files updated, 0 files merged, 1 files removed, 0 files unresolved

source$ ls

code_1.txt  code_2.txt

source$

Of course, we can continue working on default branch:



source$ ls

code_1.txt  code_2.txt

source$ touch code_3.txt

source$ ls

code_1.txt  code_2.txt  code_3.txt

source$ hg add code_3.txt

source$ hg commit -m "code_3.txt created"

source$

When working simultaneously with many branches it is natural to feel somewhat lost. To know which branch you are in at any moment, type "hg branch" with nothing else following it. To get a graphical representation of the changes committed to branches you can use "hg log -G":



source$ hg log -G

@  changeset:   8:09e718575633

|  tag:         tip

|  parent:      3:9214d0557080

|  user:        dante <dante.signal31@gmail.com>

|  date:        Sat Jan 18 20:53:06 2014 +0100

|  summary:     code_3.txt created

|

| o  changeset:   7:42123cefb28c

| |  branch:      feature2

| |  user:        dante <dante.signal31@gmail.com>

| |  date:        Sat Jan 18 20:40:56 2014 +0100

| |  summary:     code_feature2.txt created

| |

| o  changeset:   6:52f1c855ba6b

|/   branch:      feature2

|    parent:      3:9214d0557080

|    user:        dante <dante.signal31@gmail.com>

|    date:        Sat Jan 18 20:39:05 2014 +0100

|    summary:     Feature2 branch created

|

| o  changeset:   5:09f18d24ae0e

| |  branch:      feature1

| |  user:        dante <dante.signal31@gmail.com>

| |  date:        Sat Jan 18 20:22:35 2014 +0100

| |  summary:     code_feature1.txt created

| |

| o  changeset:   4:2632a2e93070

|/   branch:      feature1

|    user:        dante <dante.signal31@gmail.com>

|    date:        Sat Jan 18 20:20:28 2014 +0100

|    summary:     Feature1 branch created

|

o  changeset:   3:9214d0557080

|  user:        dante <dante.signal31@gmail.com>

|  date:        Sat Jan 18 01:07:24 2014 +0100

|  summary:     Code_2 recovered.

|

o  changeset:   2:88ac7cad647e

|  user:        dante <dante.signal31@gmail.com>

|  date:        Sat Jan 18 00:39:50 2014 +0100

|  summary:     Code_2 removed.

|

o  changeset:   1:17759dec5135

|  user:        dante <dante.signal31@gmail.com>

|  date:        Fri Jan 17 23:09:00 2014 +0100

|  summary:     Code_1 modified.

|

o  changeset:   0:bf50392b0bf2

   user:        dante <dante.signal31@gmail.com>

   date:        Fri Jan 17 22:43:34 2014 +0100

   summary:     Two initial files just created empty.



source$

To use "-G" flag with "hg log" you have to include these lines in your ".hgrc" (as we did at the very beginning of this article):

[extensions]
graphlog=

When you reach a point in one of your branches at which you'd like to include its features in the main branch, you can use merge:



source$ hg update feature1

1 files updated, 0 files merged, 1 files removed, 0 files unresolved

source$ ls

code_1.txt  code_2.txt  code_feature1.txt

source$ cat code_1.txt

Hello

source$ echo "World" >> code_1.txt

source$ cat code_1.txt

Hello

World

source$ hg status

M code_1.txt

source$ hg commit -m "code_1.txt modified with world"

source$ hg update default

2 files updated, 0 files merged, 1 files removed, 0 files unresolved

source$ ls

code_1.txt  code_2.txt  code_3.txt

source$ cat code_1.txt 

Hello

source$ hg merge feature1

2 files updated, 0 files merged, 0 files removed, 0 files unresolved

(branch merge, don't forget to commit)

source$ ls

code_1.txt  code_2.txt  code_3.txt  code_feature1.txt

source$ cat code_1.txt 

Hello

World

source$

It's important to note that before performing a merge you should switch to the branch where you want the changes inserted. From there you call "hg merge" with the name of the branch you want to import changes from. Of course, the merge is not included in the repository until you commit:



source$ hg status

M code_1.txt

M code_feature1.txt

source$ hg commit -m "Feature1 merged to default branch"

source$
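The whole merge cycle can be rehearsed in a throwaway repository with a sketch like this. File names follow the session above, while the folder and the user name are placeholders. Between "hg merge" and "hg commit" the working folder has two parents, and the second one is the branch being merged in.

```shell
# Throwaway repository; HGUSER is a placeholder identity for the commits.
export HGUSER="demo <demo@example.com>"
repo=$(mktemp -d) && cd "$repo" && hg init
echo "Hello" > code_1.txt && hg add code_1.txt && hg commit -m "initial"

hg branch feature1
echo "World" >> code_1.txt
hg commit -m "code_1.txt modified with world"

hg update default
touch code_3.txt && hg add code_3.txt && hg commit -m "code_3.txt created"

hg merge feature1                              # run from the branch that receives the changes
hg log -r 'p2()' --template '{branch}\n'       # second parent of the pending merge
hg commit -m "Feature1 merged to default branch"
hg log -r tip --template '{parents}\n'         # a merge changeset records both parents
```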

See how log graph has changed to show the merge between branches:



source$ hg log -G

@    changeset:   10:677a88f54dd3

|\   tag:         tip

| |  parent:      8:1b93d501259a

| |  parent:      9:8b55fb7eec71

| |  user:        dante <dante.signal31@gmail.com>

| |  date:        Sun Jan 19 00:07:54 2014 +0100

| |  summary:     Feature1 merged to default branch

| |

| o  changeset:   9:8b55fb7eec71

| |  branch:      feature1

| |  parent:      5:197964afe12f

| |  user:        dante <dante.signal31@gmail.com>

| |  date:        Sat Jan 18 23:57:03 2014 +0100

| |  summary:     code_1.txt modified with world

| |

o |  changeset:   8:1b93d501259a

| |  parent:      3:132c0505c7b2

| |  user:        dante <dante.signal31@gmail.com>

| |  date:        Sat Jan 18 23:56:24 2014 +0100

| |  summary:     code_3.txt created

| |

| | o  changeset:   7:86391749b3c3

| | |  branch:      feature2

| | |  user:        dante <dante.signal31@gmail.com>

| | |  date:        Sat Jan 18 23:55:04 2014 +0100

| | |  summary:     code_feature2.txt created

| | |

+---o  changeset:   6:30decd2ffa21

| |    branch:      feature2

| |    parent:      3:132c0505c7b2

| |    user:        dante <dante.signal31@gmail.com>

| |    date:        Sat Jan 18 23:54:38 2014 +0100

| |    summary:     Feature2 branch created

| |

| o  changeset:   5:197964afe12f

| |  branch:      feature1

| |  user:        dante <dante.signal31@gmail.com>

| |  date:        Sat Jan 18 23:53:43 2014 +0100

| |  summary:     code_feature1.txt created

| |

| o  changeset:   4:4bbf5ca2e0b6

|/   branch:      feature1

|    user:        dante <dante.signal31@gmail.com>

|    date:        Sat Jan 18 23:52:26 2014 +0100

|    summary:     Feature1 branch created

|

o  changeset:   3:132c0505c7b2

|  user:        dante <dante.signal31@gmail.com>

|  date:        Sat Jan 18 23:52:02 2014 +0100

|  summary:     Code_2 recovered.

|

o  changeset:   2:05e0a410c49d

|  user:        dante <dante.signal31@gmail.com>

|  date:        Sat Jan 18 23:51:24 2014 +0100

|  summary:     Code_2 removed.

|

o  changeset:   1:552e1b95fffe

|  user:        dante <dante.signal31@gmail.com>

|  date:        Sat Jan 18 23:49:35 2014 +0100

|  summary:     Code_1 modified.

|

o  changeset:   0:a22ab902f1a7

   user:        dante <dante.signal31@gmail.com>

   date:        Sat Jan 18 23:48:55 2014 +0100

   summary:     Two initial files just created empty.



source$

When you have finished your work in a branch and you don't plan any further improvements in it, you can close that branch so it no longer appears in the "hg branches" list:



source$ hg branches

default                       10:677a88f54dd3

feature2                       7:86391749b3c3

feature1                       9:8b55fb7eec71 (inactive)

source$ hg update feature1

0 files updated, 0 files merged, 1 files removed, 0 files unresolved

source$ hg commit --close-branch -m "Feature1 included in default. No further work planned here"

source$ hg branches

default                       10:677a88f54dd3

feature2                       7:86391749b3c3

source$

Closed branches can be reopened by just jumping into them with "hg update" and then committing.

So far you have learnt the basics to work with Mercurial in your local source code folder. Usually it's hard to accidentally remove a hidden folder like ".hg", but you might lose your hard drive to a hardware malfunction (or you can mistype an "rm -rf" as I did while I wrote this article), and in that case your repository would be lost. Besides, when you are working with a team you will need a central repository where the advances of every member are merged into the main (default) branch. Bitbucket is the answer for both needs. So we are going to see how we can keep a backup of our repository in Bitbucket's cloud.

Once registered in Bitbucket we can create a new repository:

You can configure your repository as public or private, set it to be used with Git or Mercurial, or even include a Wiki in the repository webpage. If you are working with a team of five members or fewer, Bitbucket offers its services for free.

Once the repository is created you can upload your local copy with the "hg push" command:



source$ hg push https://dante@bitbucket.org/dante/sourcecode
pushing to https://dante@bitbucket.org/dante/sourcecode

http authorization required

realm: Bitbucket.org HTTP

user: dante

password: 

searching for changes

remote: adding changesets

remote: adding manifests

remote: adding file changes

remote: added 12 changesets with 7 changes to 5 files (+1 heads)

source$

With the repository uploaded to Bitbucket, all team members can get a local copy of the project with "hg clone":



source2$ ls

source2$ hg clone https://dante@bitbucket.org/dante/sourcecode .
http authorization required

realm: Bitbucket.org HTTP

user: borjalopezm

password: 

requesting all changes

adding changesets

adding manifests

adding file changes

added 12 changesets with 7 changes to 5 files (+1 heads)

updating to branch default

4 files updated, 0 files merged, 0 files removed, 0 files unresolved

source2$ ls

code_1.txt  code_2.txt  code_3.txt  code_feature1.txt

source2$ hg update

0 files updated, 0 files merged, 0 files removed, 0 files unresolved

source2$

Pay attention to the "." after hg clone's URL: if you don't use it, the downloaded files will be placed in a folder called "sourcecode" inside "source2". After a clone, it's good practice to run "hg update" to be sure you are working on the most recent version of the project.

After that, a member can work with his local repository. To upload his advances to Bitbucket he should use "hg push" again, as we did in the initial upload:



source2$ ls

code_1.txt  code_2.txt  code_3.txt  code_feature1.txt

source2$ touch code_4.txt

source2$ hg add code_4.txt

source2$ hg commit -m "Code_4.txt added"

source2$ hg push https://dante@bitbucket.org/dante/sourcecode

pushing to https://dante@bitbucket.org/dante/sourcecode

http authorization required

realm: Bitbucket.org HTTP

user:dante

password: 

searching for changes

remote: adding changesets

remote: adding manifests

remote: adding file changes

remote: added 1 changesets with 1 changes to 1 files

source2$

After their initial clone, other members can get new updates (like code_4.txt) with "hg pull":



source$ hg pull https://dante@bitbucket.org/dante/sourcecode

http authorization required

realm: Bitbucket.org HTTP

user: dante

password: 

pulling from https://dante@bitbucket.org/dante/sourcecode

searching for changes

adding changesets

adding manifests

adding file changes

added 1 changesets with 1 changes to 1 files

(run 'hg update' to get a working copy)

source$ hg update default

2 files updated, 0 files merged, 0 files removed, 0 files unresolved

source$ ls

code_1.txt  code_2.txt  code_3.txt  code_4.txt  code_feature1.txt

source$

What happens if two members make modifications to the same file? Suppose one member does:



source$ ls

code_1.txt  code_2.txt  code_3.txt  code_4.txt  code_feature1.txt

source$ cat code_1.txt 

Hello

World

source$ echo "Hello WWW" > code_1.txt

source$ hg commit -m "One line hello WWW"

source$ cat code_1.txt
Hello WWW

source$ hg push https://dante@bitbucket.org/dante/sourcecode

pushing to https://dante@bitbucket.org/dante/sourcecode

http authorization required

realm: Bitbucket.org HTTP

user: dante

password: 

searching for changes

remote: adding changesets

remote: adding manifests

remote: adding file changes

remote: added 1 changesets with 1 changes to 1 files

source$

And just a bit later, another member does this in his own repository:



source2$ ls

code_1.txt  code_2.txt  code_3.txt  code_4.txt  code_feature1.txt

source2$ cat code_1.txt

Hello

World

source2$ echo "Wide Web" >> code_1.txt

source2$ cat code_1.txt

Hello

World

Wide Web

source2$ hg commit -m "Code_1 added Wide Web"
source2$ hg push https://dante@bitbucket.org/dante/sourcecode

pushing to https://dante@bitbucket.org/dante/sourcecode

http authorization required

realm: Bitbucket.org HTTP

user: dante

password: 

searching for changes

abort: push creates new remote head e716387febe4!

(you should pull and merge or use push -f to force)

source2$

What happened is that Mercurial has detected that the second push would leave Bitbucket with a conflicting version of the code_1.txt file. When the history of a branch diverges into two different latest versions, version control terminology says you have "two heads". By default you are not allowed to push a second head to Bitbucket, and Mercurial recommends that you get the latest updates from Bitbucket with "hg pull" and mix them with your local version with "hg merge":



source2$ hg heads

changeset:   13:e716387febe4

tag:         tip

user:        dante <dante.signal31@gmail.com>

date:        Mon Jan 20 21:46:00 2014 +0100

summary:     Code_1 added Wide Web



changeset:   7:86391749b3c3

branch:      feature2

user:        dante <dante.signal31@gmail.com>

date:        Sat Jan 18 23:55:04 2014 +0100

summary:     code_feature2.txt created



source2$ hg branch

default

source2$

At this point you can see we have one head for each branch. This is the normal situation. But if we pull:



source2$ hg pull https://dante@bitbucket.org/dante/sourcecode

http authorization required

realm: Bitbucket.org HTTP

user: dante

password: 

pulling from https://dante@bitbucket.org/dante/sourcecode

searching for changes

adding changesets

adding manifests

adding file changes

added 1 changesets with 1 changes to 1 files (+1 heads)

(run 'hg heads' to see heads, 'hg merge' to merge)

source2$

Pay attention to the last message, which alerts you that the pull created multiple heads. Indeed, if we run "hg heads":



source2$ hg heads

changeset:   14:c3a688edd25a

tag:         tip

parent:      12:53443797a7da

user:        dante <dante.signal31@gmail.com>

date:        Mon Jan 20 21:46:25 2014 +0100

summary:     One line hello WWW



changeset:   13:e716387febe4

user:        dante <dante.signal31@gmail.com>

date:        Mon Jan 20 21:46:00 2014 +0100

summary:     Code_1 added Wide Web



changeset:   7:86391749b3c3

branch:      feature2

user:        dante <dante.signal31@gmail.com>

date:        Sat Jan 18 23:55:04 2014 +0100

summary:     code_feature2.txt created



source2$

We can see we have two heads at default branch. So it's time to do a merge:



source2$ hg merge

merging code_1.txt

3 files to edit

0 files updated, 1 files merged, 0 files removed, 0 files unresolved

(branch merge, don't forget to commit)

source2$ cat code_1.txt 

Hello WWW

World

Wide Web

source2$ hg commit -m "Code_1 merged with repository"

source2$ hg heads

changeset:   15:fed327662238

tag:         tip

parent:      13:e716387febe4

parent:      14:c3a688edd25a

user:        dante <dante.signal31@gmail.com>

date:        Mon Jan 20 22:06:39 2014 +0100

summary:     Code_1 merged with repository



changeset:   7:86391749b3c3

branch:      feature2

user:        dante <dante.signal31@gmail.com>

date:        Sat Jan 18 23:55:04 2014 +0100

summary:     code_feature2.txt created



source2$ hg push https://dante@bitbucket.org/dante/sourcecode

pushing to https://dante@bitbucket.org/dante/sourcecode

http authorization required

realm: Bitbucket.org HTTP

user: dante

password: 

searching for changes

remote: adding changesets

remote: adding manifests

remote: adding file changes

remote: added 2 changesets with 2 changes to 1 files

source2$

In case of conflicts like this, "hg merge" opens a paneled editor (not shown here) so you can compare versions of the same file and modify your local copy so it does not conflict with the one from Bitbucket. I prefer using Mercurial in console mode to using one of those GUI apps out there (like TortoiseHG), but I have to admit that the console editor Mercurial uses is based on Vim and is rather awkward (I'm fonder of the Nano editor).

Once merged and committed, you can see that the total number of heads is down to two again (one per branch), so this time the push to Bitbucket goes nicely.
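The whole two heads story can be reproduced without a Bitbucket account. In this sketch a local folder called "central" stands in for Bitbucket, and "source" and "source2" are the two members; the user name and the code_5/code_6 files are placeholders. The members touch different files here, so the merge needs no editor at all.

```shell
# "central" plays the role of the Bitbucket repository in this sketch.
export HGUSER="demo <demo@example.com>"       # placeholder identity
central=$(mktemp -d) && hg init "$central"
(cd "$central" && echo "Hello" > code_1.txt && hg add code_1.txt && hg commit -m "initial")

work=$(mktemp -d)
hg clone "$central" "$work/source"
hg clone "$central" "$work/source2"

# First member commits and pushes.
cd "$work/source"
touch code_5.txt && hg add code_5.txt && hg commit -m "code_5 added"
hg push

# Second member commits without pulling first, so the push is refused.
cd "$work/source2"
touch code_6.txt && hg add code_6.txt && hg commit -m "code_6 added"
hg push || echo "push refused: it would create a second head"

hg pull                         # brings the other head into the local repository
hg merge                        # different files were touched: no conflict
hg commit -m "Merged with central repository"
hg push                         # accepted now, only one head is left
```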

With all these tools, a team of developers can work simultaneously without stepping on each other's toes. But Bitbucket also offers you a way to contribute to a project even if you are not part of its developer team and have no write access to its repository. That way is called forking.

When you fork a Bitbucket repository, what happens in the background is that the repository is cloned into your Bitbucket account. Then you have the chance to write and test modifications against your own repository. Once your code is ready, you can send a "pull request" to the original owner. If he accepts, a merge between the two repositories will be performed and your changes will be incorporated into the original repository.

OK, this is the end of the article. You now master the basics of version control with Mercurial and Bitbucket. I'm sorry for the length of the article but I wanted to cover all the usual topics you may meet in an average indie project. Mercurial and Bitbucket have a lot of additional options and refinements, but you would usually meet them only in more complex projects.

And lastly, I don't want to end this article without mentioning that most of its concepts are similar to those used in Git and GitHub. Try this official introductory tutorial to Git and you will realize how similar it is to Mercurial.

13 January 2014

Mercurial vs Git

Not long ago I found myself developing an application to which I was adding new features and changes frequently. At that time I did not use any version control tool, so the only thing I did was back up into a separate directory. In the end, the number of backups was so great that it was not practical. It was hard to know what was done in each version, so the usefulness of that method was reduced to restoring the last backup after a disaster. I realized it was time to learn to use a version control tool. It was an idea that had been on my mind for some time, but I had discarded it by telling myself the effort was not worth it for the size and complexity of my personal projects. However, in the end I took the step forward.

I started by examining the existing options. I did not want to marry any of them, just to decide which was the most appropriate for learning to use a version control tool. In the future I'm open to using another tool if the need arises.

Although I've seen Subversion in corporate environments, I chose to investigate other choices popular among independent developers. Launchpad, the infrastructure built by Canonical to host open source projects, uses Bazaar, but I've read bad reviews saying it is getting old and is too tied to projects focused on Ubuntu. As my project does not necessarily focus on Ubuntu, I decided to discard Bazaar for now. The two remaining options were Mercurial and Git.

Choosing between Mercurial and Git is far from easy. The Internet is full of controversy about which one is better. The truth is that there are many arguments in favor of both of them. These are two very powerful tools you should know, since depending on the situation one can be more suitable than the other. Actually their origins are very similar: some time ago the working group that developed the Linux kernel decided to write its own version control tool. They opened two lines of development, one led by Linus Torvalds, who developed Git using C, Bash and Perl, and the other led by Matt Mackall, who developed Mercurial with C and Python. In the end they chose Git, in part because its development ended a few days earlier and partly, evil tongues say, because it was Linus' work.

In a rather funny blog I found an analogy that was written in 2008 but seems to apply still: Git is like MacGyver while Mercurial is like James Bond.

Before someone goes into shock I will explain those last lines. Git follows the Unix approach of specializing executables in particular tasks, so that complex tasks are performed by combining the individual executables. Accordingly, installing Git involves the installation of over 100 small specialized executables. This increases the difficulty of learning Git but greatly increases its flexibility, allowing it to be configured to support the most complex development workflows we might have. This approach of combining simple elements to get more powerful systems is what makes Git the MacGyver of version control tools. As we said, a project that makes active use of Git in its development is the Linux kernel.

Mercurial, however, is much easier. It installs just one executable, which is used in every situation with different arguments. This simplicity makes it much easier to learn and, in fact, it is said that those who know Subversion find Mercurial really easy to learn because the main commands are very similar. It is easy to see that Mercurial is pretty intuitive and clean. In the end, 80% of the time you use just a few commands in everyday jobs with Mercurial. Faced with the flexibility of Git, Mercurial offers simplicity. Mercurial is like James Bond because if you use it in the right situation it will solve it smartly and still leave you plenty of time to drink a vodka martini ;-) . However, this simplicity does not mean that Mercurial lacks power: large projects of the free software community use it. For example, it is used by the very development team of Python. Many projects of the Mozilla Foundation use it too. Actually, for some reason the general trend is that Python developers prefer Mercurial, perhaps because it is closer to the Zen of Python when it says: "Simple is better than complex".

If you work on a project where the development model is complex because it involves many people and many work fronts, maybe it would make sense to choose Git. However, if the organization of your project's development is simple, Mercurial will probably allow you to move more quickly and effectively.

Another element to assess is the support given to each version control tool when uploading our repositories to the cloud to facilitate collaborative work. For Bazaar, the iconic place to upload projects is Launchpad. The problem is that, as we said, Launchpad is exclusively focused on Ubuntu projects.

For Git, the most famous place to upload our repository is GitHub, which has gained tremendous popularity in part thanks to its interesting social features that make it very easy to share code with others. Their pricing plan charges per private repository. For up to 5 private repositories we pay $7 a month. However, we can have all the public repositories we want, with unlimited collaborators (people with write access to the repository). That's why projects like Django have chosen GitHub for their public repository.

For Mercurial, the reference site is BitBucket. Unlike GitHub, they support both Mercurial and Git. Their functionality is similar to GitHub's, although the latter does have more followers. However, their pricing plan is different from GitHub's because BitBucket charges by the number of collaborators, so that with fewer than 5 we can have all the repositories we want for free, both public and private. That makes BitBucket especially interesting for developers who make many solo projects. Two example projects that used BitBucket were Sphinx and PyPi (see previous article).

From what I've seen out there, many developers admit to using both portals: they keep their personal developments in BitBucket, and when they want to make one public and open it to the collaboration of the community they rely on GitHub.

In my case, my developments are small and private so I'm going to start using Mercurial and BitBucket. That way I will be able to familiarize myself with the typical version control procedures. In the future we'll see if it is worth learning Git (and GitHub).

=======================================

Note: As of 2021, Bitbucket no longer offers Mercurial support because it has become Git only. If you are looking for free Mercurial repository hosting you can go to Perforce.

=======================================