Every project with more than one contributor needs a Version Control System. In my opinion, it's a good idea to use a VCS for all non-trivial projects, even if you are the only developer.

According to Wikipedia:

[...] version control, also known as revision control or source control, is the management of changes to documents, computer programs, large web sites, and other collections of information

In my life as a developer, I've used only three VCS:

  • Concurrent Versions System, or CVS for short
  • Borland StarTeam
  • Git (with various online services: GitHub, GitLab, Bitbucket)

From the title of this post, you should have understood that I believe the last one is the best. Git has a lot of features that help to manage big projects run by hundreds of developers, since it was created by Linus Torvalds to maintain the source files of the Linux kernel. But don't be misled: it's not overkill for small projects.

How It Works

Git is a distributed VCS: when you clone a repository, you get the whole history and all the branches on your PC. The repository is compressed before being transferred, but the operation may still take several minutes on huge repositories (like the following one).

$ git clone https://github.com/torvalds/linux.git

After the clone operation, you can work offline on your local copy of the repository. You have the whole history of the project on your PC, so there is no need to connect to the remote repository to check a previous version of a file. Then, when you are satisfied with your modifications, you can push them to the remote repository.

$ git push origin

The above command is OK in many cases but, if you are working on someone else's repository, you should open a pull request instead. You can read here to understand how pull requests work in GitHub.

The remote repository you push to does not have to be the main repository[1]. It can be the repo of another developer (let's say a colleague named Alice Smart) who is working with you on a specific feature.

$ git remote add alice_repo git://github.com/alice_smart/fancy_project
$ git push alice_repo

In this way two or more programmers can accomplish their work without impacting the whole dev team.
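If you want to try this workflow without touching a real server, here is a minimal sketch that uses two local bare repositories in place of the remotes. The directory names (seed, origin.git, alice.git, my_copy) and the throwaway identity are made up for the example, and the branch is named explicitly when pushing just to keep the script independent from your Git configuration.

```shell
set -eu
# Throwaway identity, so the commits work even on an unconfigured machine.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"

# A seed repository with one commit, published as the bare "main" repository.
git init -q seed
cd seed
echo "hello" > README
git add README
git commit -q -m "Initial commit"
cd "$work"
git clone -q --bare seed origin.git
git init -q --bare alice.git      # Alice's repository, empty for now

# Clone: from now on the whole history is available locally.
git clone -q origin.git my_copy
cd my_copy
branch=$(git symbolic-ref --short HEAD)
git log --oneline                 # answered locally, no remote involved

# Commit locally, then publish to the main repository.
echo "more" >> README
git commit -q --all -m "Extend README"
git push -q origin "$branch"

# Publish the same branch to a colleague's repository as well.
git remote add alice_repo "$work/alice.git"
git push -q alice_repo "$branch"

# All three repositories now point to the same commit.
local_head=$(git rev-parse HEAD)
origin_head=$(git --git-dir="$work/origin.git" rev-parse "$branch")
alice_head=$(git --git-dir="$work/alice.git" rev-parse "$branch")
```

Nothing here is specific to local paths: with a real project you would only replace the two bare repositories with the URLs of the actual remotes.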

Commit & Revert

How often does a new feature or a bug fix involve just a single file? Usually, a single change is split across several files. Git does not version files one by one: each commit records the state of the whole project, so changes to multiple files stay grouped together. That makes it easier to identify everything that belongs to a change and, if needed, to revert it with just one command.

$ git commit --all

Uh-oh, I think I did something wrong... let's undo the last commit:

$ git revert HEAD

It's important to note that the reverted commit is not deleted. Instead, Git creates a new commit that applies the inverse of its changes. The cool thing is that you can revert any commit, not just the last one.
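To see this in practice, here is a small sketch you can run in a scratch directory. It commits one change that touches two files and then reverts it. The file names (a.txt, b.txt) and the identity are invented for the example; the --no-edit option only tells revert to keep the default commit message instead of opening an editor.

```shell
set -eu
# Throwaway identity, so the commits work even on an unconfigured machine.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'v1\n' > a.txt
printf 'v1\n' > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

# One logical change that touches two files.
printf 'v2\n' > a.txt
printf 'v2\n' > b.txt
git commit -q --all -m "Update both files"

# Undo the whole change with a single command.
git revert --no-edit HEAD

# The history keeps both the original commit and the revert.
git log --oneline
commit_count=$(git rev-list --count HEAD)
a_content=$(cat a.txt)
b_content=$(cat b.txt)
```

After the revert both files are back to their first version, and the log lists three commits: nothing has been erased.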

The commit command above (with --all) includes all the modified files that Git already tracks. However, as you can imagine, it's possible to select exactly which files should be included, using the add command. This is called a two-stage commit and it's explained here.
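A minimal sketch of the two-stage commit, assuming a scratch repository with two invented files (a.txt and b.txt): both are modified, but only the one staged with add ends up in the commit.

```shell
set -eu
# Throwaway identity, so the commits work even on an unconfigured machine.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'v1\n' > a.txt
printf 'v1\n' > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

# Modify both files...
printf 'v2\n' > a.txt
printf 'v2\n' > b.txt

# ...but stage and commit only one of them (note: no --all here).
git add a.txt
git commit -q -m "Update a.txt only"

committed=$(git diff --name-only HEAD~1 HEAD)   # files in the last commit
pending=$(git diff --name-only)                 # files still modified
```

The change to b.txt stays in the working copy, ready for a later commit.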

One of the features of Git I greatly appreciate is the possibility to include in a commit just some lines of a modified file, thanks to the -p flag (for example, git add -p). This feature is well explained in this post: Git’s Patch Mode All the Way.

The Time Machine (Repository Version)

You can view the history of all the commits and easily go back to a known repository state, even without having previously set a tag. The next command moves your working copy back by 3 commits (again, without deleting anything).

$ git checkout HEAD~3

Of course you can also jump to any tag...

$ git checkout version_1.2

...or any branch...

$ git checkout beta_branch

...without losing any commit, even those you haven't pushed yet.
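Here is a small sketch of the time machine at work, in a scratch repository with four commits. The file name (file.txt) and the position of the version_1.2 tag are chosen just for the example.

```shell
set -eu
# Throwaway identity, so the commits work even on an unconfigured machine.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

# Four commits, each one overwriting the same file.
for n in 1 2 3 4; do
    echo "version $n" > file.txt
    git add file.txt
    git commit -q -m "Version $n"
done
branch=$(git symbolic-ref --short HEAD)
git tag version_1.2 HEAD~1        # tag the third commit

# Go back three commits from the tip.
git checkout -q HEAD~3
at_first=$(cat file.txt)

# Jump to the tag.
git checkout -q version_1.2
at_tag=$(cat file.txt)

# Return to the branch: the latest commit is still there.
git checkout -q "$branch"
at_tip=$(cat file.txt)
```

Checking out a commit or a tag leaves you in what Git calls a "detached HEAD" state; going back to the branch, as in the last step, brings you to the tip again.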

Branching

In a Git book I've read, the author notes that books on other VCSs explain branching in the last chapters, while every Git book covers branches within the first three or four chapters. What is the reason behind this difference?

Simply because branching is a very powerful feature, and in Git creating a new branch is a very cheap operation. By "cheap" I mean that there is no data duplication or hidden overhead: a branch is just a pointer to a commit, and two branches simply share a common ancestor.

$ git branch new_feature
$ git checkout new_feature

The two commands above create a new branch named new_feature and select it as the current one. The operation is immediate and no data is duplicated. The same result can be achieved with a single command:

$ git checkout -b new_feature
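To check how cheap a branch really is, the sketch below creates one in a scratch repository and compares the commit it points to with the commit of the original branch. The file name and the identity are invented for the example.

```shell
set -eu
# Throwaway identity, so the commits work even on an unconfigured machine.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "hello" > README
git add README
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)   # the default branch, whatever its name

# Create the new branch and switch to it in one step.
git checkout -q -b new_feature

# Both branch names resolve to the very same commit: nothing was copied.
original_commit=$(git rev-parse "$branch")
feature_commit=$(git rev-parse new_feature)
current=$(git symbolic-ref --short HEAD)
```

The two branches start to differ only when you commit on one of them.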

When you work with branches (in whatever VCS), a typical need is to apply the same modification (for example, an urgent bug fix) to different branches. Git provides the cherry-pick command to do exactly this: it replays a commit (i.e. the changes made to one or more files) on another branch.

$ git cherry-pick 34f7ae0
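In a real project you would pass the hash shown by git log. The sketch below builds a scratch repository so the whole scenario can be tried end to end. The branch name (release) and the file names are made up: a fix committed on the default branch is replayed on release, while the unrelated feature commit is left behind.

```shell
set -eu
# Throwaway identity, so the commits work even on an unconfigured machine.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)

# The release branch starts from the initial commit.
git branch release

# Development goes on: a feature first, then an urgent fix.
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
echo "bug fixed" > fix.txt
git add fix.txt
git commit -q -m "Urgent fix"
fix_commit=$(git rev-parse HEAD)

# Replay only the fix on the release branch.
git checkout -q release
git cherry-pick "$fix_commit"

picked_subject=$(git log -1 --format=%s)
```

The release branch now contains fix.txt but not feature.txt. The cherry-picked commit gets a new hash, because its parent is different from the parent of the original one.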

Conclusions

This post is only an overview of Git, but I hope it's enough to show you how powerful this tool is. If you want to know more, take a look at the official Git website.

I've written other posts about Git: you can find them all here.



Image credits: Logo for Git by Jason Long licensed under CC BY 3.0

Post last updated on 2020/08/29


  1. The concept of "main repository" does not exist in Git. It is up to the maintainer to designate a repository as "main" for the other developers. ↩︎