Git submodules, subtree, and subtrac.

At summitto we get to enjoy dependency management in C++, and more often than not, vendoring things using git submodules is the easiest way to do it. However, submodules are not without their problems, and nicer tools exist. In this blog post we’re going to look at two tools that are supposed to make our lives easier when working with git submodules.

The shortcomings

Before we start looking at potential solutions, let’s first quickly examine the problem. What do submodules do and what are their shortcomings?

Git submodules allow you to add other code to your own repository, while allowing the external code to exist independently. You can use this to include external tools in your own repo (vendoring) or to ease reuse of your own code.

It is possible to use the existing methods of pulling and checking out to update these nested repositories. According to various sources and our own experience, however, submodules are not the nicest thing to work with:

  1. Submodules introduce additional steps in checking out the code. After checking out the repository, one still needs to (possibly recursively) initialize the submodules. After pulling some changes where the submodules were updated, you need to do it again. This is an easy thing to mess up which will break your build in mysterious ways, especially for newcomers to your software.

  2. Submodules cannot easily be reviewed in changesets. They will simply show up as a SHA1 hash of the new commit that will be checked out. In order to check what changes are actually in this commit, you will have to check out these commits yourself.

  3. Moving submodules requires manual intervention. While it’s not exactly the most common thing to do, occasionally you do want to move your submodule to a different part of the tree, and this is where it gets messy.

    When a commit moves a git submodule from one place to another, the existing submodule will stay, now as a collection of untracked files, and in the new location there’s an uninitialized entrypoint. The “fix” is then to run git submodule update --init --recursive again, and to manually remove the files at the previous entrypoint.

  4. Working on submodules is awkward. You can edit files in your submodule and commit them, but your submodule is always in a “detached HEAD” state, so if you then want to push your changes, you first need to pull, check out an actual branch, and then push. Then you need to commit and push the parent repo, otherwise the ref for the submodule will not be updated. If you mess up either of these steps, you will break the build for everyone but yourself.

    One could argue that you’re not supposed to work from inside a submodule, but rather from a cloned version somewhere else, but the convenience of just making small changes directly so you can see your project compile is worth a lot.

  5. Depending on submodules creates external dependencies. Your submodules are in another repo, which can cease to exist. You wouldn’t expect a major project like GoogleTest to just suddenly disappear, but it can happen, and then you have your very own left-pad debacle.
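For reference, the extra steps from points 1 and 2 can be sketched as a few small shell helpers. The function names are made up for this post; the git commands and flags they wrap are standard.

```shell
#!/bin/sh
# Hypothetical helper names; only the wrapped git commands are real.

# Clone a repository and initialize all (nested) submodules in one go.
# $1: repository URL, $2: target directory
clone_with_submodules() {
    git clone --recurse-submodules "$1" "$2"
}

# Re-sync submodules after a pull or branch switch changed the recorded commits.
sync_submodules() {
    git submodule update --init --recursive
}

# Review a changeset with submodule changes listed as commit subjects
# instead of a bare pair of hashes. Extra arguments go straight to git diff.
diff_with_submodule_log() {
    git diff --submodule=log "$@"
}
```

For example, diff_with_submodule_log HEAD~1 HEAD shows which submodule commits the last commit of the parent repository pulled in, which takes some of the sting out of the review problem without solving it.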

Now that we know what problems there are to solve, let’s see how we can solve them.

Git subtree

Git subtree does away with actually having submodules, and instead includes the tree of the subproject into the main tree. It therefore doesn’t include metadata files such as .gitmodules in the project tree. Using the git-subtree family of commands you can then update the tree as needed, and potentially split off parts of the existing tree into a new separate repository.

Adding a subtree is similar to adding a submodule: git subtree add --prefix=its/entrypoint https://your.repo/url <ref>, where <ref> is the branch or tag you want to import. After that, the remote tree is a subtree of your project!
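As a sketch, adding a subtree could look like the helper below. The function name and its arguments are placeholders of ours; note that git subtree add takes the entry point through --prefix and also needs to know which branch or tag to import.

```shell
#!/bin/sh
# Hypothetical wrapper around git subtree add.
# $1: repository URL, $2: branch or tag to import, $3: entry point in your repo
add_subtree() {
    git subtree add --prefix="$3" "$1" "$2"
}

# Example call with placeholder values (commented out so nothing is fetched here):
# add_subtree https://your.repo/url main its/entrypoint
```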

You can commit, push, and pull changes in subprojects as normal. git-subtree itself is only needed when syncing a subproject with its upstream, for example when pulling in upstream changes. GitHub has a nice cheat sheet for all the things that are nicer with git subtrees.

A nice benefit of having everything in-tree is that you can see (an optionally squashed version of) the repository history in your main project history, so you can have some insight into what changed in the submodules. By squashing the history, you can ignore large changelogs that you are not interested in, so they don’t clutter your history.
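Pulling upstream changes with a squashed history can be sketched like this. Again, the helper name and its arguments are placeholders, not something git ships.

```shell
#!/bin/sh
# Hypothetical wrapper: merge upstream changes into the subtree,
# collapsing the upstream history into a single squashed commit.
# $1: repository URL, $2: branch or tag to pull, $3: entry point of the subtree
update_subtree() {
    git subtree pull --prefix="$3" --squash "$1" "$2"
}
```

Without --squash, the same command brings the full upstream history into your main project history instead.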

Unfortunately, there are some issues when the project you are including has submodules itself. Since (according to git) the subtree root is not the root of a git module, its .gitmodules is not considered. You can sort of work around this by adding the necessary submodules to your main repository, but this creates a weird dependency. The solution would be to convert all your repositories using submodules to subtrees, but this may not always be feasible.

To sum it up, git subtree does a decent job of solving all the problems listed above, but it also creates an entirely new one: nested submodules.

A more detailed description can be found in the man page for git-subtree.

| Pros | Cons |
| --- | --- |
| Actually part of git (albeit in git-contrib) so you don’t need to install anything to have access to git-subtree, it can already be part of your git installation. | Dependency objects are included in the tree, so repos are larger on the remote host. Locally they will be the same. |
| No separate submodule commands needed for common use cases. | Moving the entry point of a subtree from one path in your repository to another breaks the ability to easily push to the original repository. |
| Normal git workflow can be followed when making changes to dependencies | Upstream changes in the subtree will clutter the history of your main project. |
| Repository can be checked-out and built without knowledge of git-subtree | Nested submodules are painful. |
| Merges in subprojects are supported (but require some work) | |
| No passive dependencies on other repositories, the data is all stored in-tree. | |

Git subtrac

Git subtrac is a more recent effort that also tries to simplify git submodule management, but unlike subtree, it leaves the submodule workflow unchanged for most people. Instead, it imports the external data from your submodules into your own repository. For each branch, a $branch.trac branch is created which tracks the changes to your submodules.

The tracking branch and the main branch will not automatically stay in sync. Instead, you have to run git subtrac update whenever you make changes to the submodules, and then you must remember to push the .trac branch.
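In practice this means an extra step before every push that touches a submodule. A minimal sketch, assuming git-subtrac is installed, the remote is called origin, and the tracking branch follows the $branch.trac naming described above (the helper name is ours):

```shell
#!/bin/sh
# Hypothetical helper; "origin" is an assumed remote name.
push_with_trac() {
    # Name of the branch that is currently checked out.
    branch=$(git symbolic-ref --short HEAD) || return 1
    # Refresh the tracking branch so it contains the submodule commits.
    git subtrac update || return 1
    # Push the branch together with its tracking branch.
    git push origin "$branch" "${branch}.trac"
}
```

Wrapping both steps in one command at least makes it harder to push one branch and forget the other.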

Realistically, this only solves the second and fifth problems that we outlined, and also creates a new synchronisation step. This doesn’t make our lives much easier. A final potential downside is that this project is barely a month old, and therefore not that battle-tested yet.

A more detailed description can be found in the git-subtrac README.

| Pros | Cons |
| --- | --- |
| Normal submodule workflow can be mostly followed. | Dependency objects are included in the tree, so repos are larger on the remote host. Locally they will be the same. |
| Transition is invisible while checking out the repository, as subtrac is only needed when modifying submodules. | Very little benefit over simply using submodules. |
| Can easily check out isolated history of dependencies through normal git tools. | External tool, so additional installations required. |

In conclusion

We’ve seen that git submodules have some issues, and we have seen how both subtree and subtrac try to solve them. Subtree does a decent job of addressing all the problems I outlined, but dealing with nested submodules is too much of a pain for us to consider moving to it. Subtrac mainly solves the external dependency problem and does little for the others. It is an interesting idea nonetheless, so I’m curious to see where it’s going.

In the end, we decided that we’re just sticking with regular ol’ submodules for now. If we could start over from scratch, maybe we’d start with git-subtree, and in the future, when we’re more annoyed with submodules than we are right now, we might reconsider it. Nevertheless, git-subtree is a wonderful tool that can greatly simplify your git workflow.