During the last year, my team and I migrated over twenty software projects, along with the teams that develop them, from Mercurial to Git.
While this seemed like an easy job to me at first, I quickly discovered the hidden challenges in this undertaking. In this post, I will summarize our reasons, our approach and the experience I gathered while migrating the code base and training the teams on the new Git workflow in parallel.
This first part concentrates on the technical migration, while the second part focuses on the training.
Reasons for the Migration
At first glance, Git and Mercurial are very similar. Both are distributed version control systems. Both have branches and tags, and committing and pushing are concepts known in both systems.
On closer inspection, however, Git has several advantages over Mercurial.
First, it is easier to find developers who are used to Git. Mercurial is more or less dead. Almost every developer we hire and almost every external development partner has already worked with Git, but not necessarily with Mercurial.
Furthermore, our internal Bitbucket Server only supports Git at the moment. While there might be support for Mercurial in the future (Bitbucket Cloud already supports it), this was one of the main drivers at the end of 2016, since Bitbucket enables us to use a better development workflow, including pull requests and reviews.
Another reason is speed. Git supports shallow cloning and sparse checkout out of the box, which can considerably reduce the time needed to clone large repositories. This is incredibly useful for CI pipelines.
Last but not least, Git is very flexible. Tags and branches are much lighter concepts in Git than in Mercurial. This may not seem like an advantage at first but in practice it increases our development speed.
Migrating the Code Bases
The most obvious task for the migration work group was the migration of the code itself. Existing Mercurial repositories had to be converted to Git repositories without losing any data or metadata.
Our goal was to preserve the entire commit history with all branches, tags and so on. Luckily, we were not the first company that decided to switch to Git, so there are excellent tools out there that reduced our effort a lot.
We used hg-fast-export, a script that makes use of git fast-import to generate Git commits from Mercurial commits.
Hence, the whole problem of migrating the repository metadata (commit messages, authors, dates, tags, …) as well as the commit data itself was already solved.
Since most of our software is safety-critical, it was not acceptable for us to rely solely on a script from a GitHub repo. Therefore, we forked the repo and did a full code review internally. This gave us enough confidence to use the script for the migration.
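To illustrate, a single conversion run with hg-fast-export could be driven roughly like this. This is a minimal sketch, not our actual tooling: the helper names and all paths are hypothetical, and it assumes the tool is started through its hg-fast-export.sh wrapper, with -r pointing at the Mercurial repository, from inside a freshly initialized Git repository.

```python
import subprocess
from pathlib import Path


def conversion_steps(hg_repo, git_repo, exporter):
    """Return the (working directory, command) pairs for one conversion run.

    Hypothetical helper: it only builds the commands, so they can be
    reviewed or logged before anything is executed.
    """
    hg_repo, git_repo = Path(hg_repo), Path(git_repo)
    return [
        # 1. Create the empty target repository.
        (git_repo.parent, ["git", "init", str(git_repo)]),
        # 2. Replay the Mercurial history (the exporter feeds git fast-import).
        (git_repo, [str(exporter), "-r", str(hg_repo)]),
        # 3. Populate the working tree from the imported history.
        (git_repo, ["git", "checkout", "HEAD"]),
    ]


def run_conversion(hg_repo, git_repo, exporter):
    """Execute the steps in order and stop at the first failing command."""
    for cwd, command in conversion_steps(hg_repo, git_repo, exporter):
        subprocess.run(command, cwd=cwd, check=True)
```

Keeping the command list separate from its execution makes it easy to log exactly what was run for each repository, which helps when the result has to be justified later.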
Furthermore, we developed a verification script. It takes the input and the output of the migration and proves that they are equal.
While migrating our many Mercurial repositories, we ran into several impediments that had to be resolved first.
First of all, some Mercurial repositories contained multiple heads for some branches. This is highly discouraged, as it can be confusing and lead to lost commits. Furthermore, multiple heads per branch are simply not possible in Git, where a branch always points to exactly one commit. So this issue had to be solved first by identifying the obsolete head and either removing it in Mercurial or merging it without using any of its changes. The latter option preserves the (dangling) commits without altering the current HEAD (tip in Hg).
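Finding the affected branches up front can be automated. The following is a minimal sketch under the assumption that hg heads, called without arguments, lists every open branch head and that its template prints one branch name per head; the function names are made up for this example.

```python
import subprocess
from collections import Counter


def branches_with_multiple_heads(head_branches):
    """Given one branch name per open head, return the branches listed more than once."""
    counts = Counter(head_branches)
    return {branch: n for branch, n in counts.items() if n > 1}


def list_head_branches(hg_repo):
    """Ask Mercurial for the branch name of every open head, one per line."""
    result = subprocess.run(
        # The template receives a literal backslash-n, which hg expands to a newline.
        ["hg", "heads", "--template", "{branch}\\n"],
        cwd=hg_repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

For a repository whose heads belong to the branches default, release 1.0, default and feature_x, branches_with_multiple_heads returns {'default': 2}, so only default needs manual cleanup before the conversion.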
In some projects, Mercurial hooks were used to replace keywords in the header comments of files during checkout.
As far as I know, there is nothing directly comparable in Git. We have postponed this issue, since having this information in the files is no longer as important: all the metadata is still contained in the Git repository. If it is needed in the files later on, we can still extract it with a script.
Branches with similar names
In Mercurial, branches and tags can be oddly named. They may contain spaces and other symbols that are not allowed in Git branches and tags.
Luckily, the migration script automatically sanitizes these names and replaces spaces with underscores. Surprisingly, this did not work in all of our repositories, because some of them contain branches with very similar names: one containing spaces and one containing underscores. After sanitizing, both names are identical, which leads to a conflict during migration and can cause data loss. Our verification script detected this.
This problem was easily solved by passing a mapping file for branches and tags to the migration script, which allows us to rename them during migration.
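Such collisions can be found before the migration is started. The sketch below assumes the simplified rule described above (spaces become underscores); the real script may sanitize more characters, and the function names are hypothetical.

```python
from collections import defaultdict


def sanitize(name):
    """Simplified stand-in for the migration script's sanitizing step."""
    return name.replace(" ", "_")


def find_collisions(names):
    """Group original names that end up with the same sanitized name."""
    groups = defaultdict(list)
    for name in names:
        groups[sanitize(name)].append(name)
    # Only targets claimed by more than one original name are a problem.
    return {target: originals for target, originals in groups.items() if len(originals) > 1}
```

For the branches release 1.0, release_1.0 and default, find_collisions returns {'release_1.0': ['release 1.0', 'release_1.0']}. Every name reported this way needs an explicit, unique target name in the mapping file.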
Verifying the Migration Result
As stated earlier, we wanted to be very certain that neither the version control meta information (authors, commit messages, timestamps) nor the files in the repository were changed. In other words: for every commit in Mercurial, there must be a commit in Git that is identical, with the commit hash as the only exception.
For this purpose, we first did a formal review of hg-fast-export to understand what it does and how it works.
After that, we developed a verification script in-house that simply checks out every commit in the Mercurial repo and compares the metadata and all files in the workspace with the corresponding Git commit. The result of this script is a log file that lists all deviations, if any.
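The file comparison at the heart of such a check can be sketched as follows. This is not our in-house script, only an illustration of the idea for a single pair of checked-out commits: it hashes every file in both workspaces and reports the differences. Iterating over all commits and comparing authors, messages and timestamps is left out, and all names are hypothetical.

```python
import hashlib
from pathlib import Path

# Version control bookkeeping, not project content. Any path containing
# one of these names as a component is skipped.
IGNORED_DIRS = {".hg", ".git"}


def snapshot(workspace):
    """Map each file's relative path to the SHA-256 digest of its content."""
    root = Path(workspace)
    digests = {}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if IGNORED_DIRS.intersection(relative.parts) or not path.is_file():
            continue
        digests[relative.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def deviations(hg_workspace, git_workspace):
    """Return one log line per file that is missing or differs between the workspaces."""
    hg_files, git_files = snapshot(hg_workspace), snapshot(git_workspace)
    lines = []
    for name in sorted(hg_files.keys() | git_files.keys()):
        if name not in git_files:
            lines.append(f"missing in Git: {name}")
        elif name not in hg_files:
            lines.append(f"missing in Mercurial: {name}")
        elif hg_files[name] != git_files[name]:
            lines.append(f"content differs: {name}")
    return lines
```

An empty result means the two workspaces hold identical files. Comparing raw bytes rather than text keeps the check independent of encodings and line endings.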
To be continued…
You can find the second part here. In that part, I write about the migration of the people: how we trained our folks and which challenges we faced.