Introduction to Source Code Management with Git
Brent: In working on various programming projects in ISYS 403 this semester, I have noticed how stinking useful source code management is when doing homework. This guide is largely for my cohorts in the program who have never used version control before and are interested in making their programming lives 100% better and more productive.
What is it?
Source Code Management (SCM), also called Revision Control, is all about maintaining revisions of any collection of related files. This includes websites, wikis, computer source code, and even books or other literature. SCM emerged from the serious need in the programming industry to keep track of all the incremental changes made in the development and evolution of a computer program over time, especially when that program is developed by a team of programmers. The Open Source movement has particularly championed intelligent and efficient source code management, since the whole point of open source software is that anyone can get a copy and contribute. If anyone can mess with the code, how do you keep track of changes, audit revisions, and make sure everything stays organized? From these myriad needs, Source Code Management was born.
What is Git?
There are various source code management systems available today, each with its own strengths and weaknesses. This article focuses on the virtues of Git.
Git was initially created by Linus Torvalds, the same fellow who created Linux. Linus once said in jest, “I name all my projects after myself.” Git is a distributed, decentralized version control system. We’ll explore what this means in more detail throughout this article. In a nutshell, Git is the tool you use to manage your source code, make revisions, and synchronize what you’re working on with what others may be working on.
For an exhaustive list of source code management tools, see the List of Revision Control Software on Wikipedia.
How does Git work?
Git is built around a repository. A repository is the place where a project’s files and the full history of their changes are kept. Once a user creates a repository, they can make any changes they like to the files in it - adding, editing, rearranging, renaming, and even completely deleting files. These changes are tracked by Git. The user can then pick which changed files they want to commit into the repository. A commit is a snapshot of changes saved into the repository, along with a commit message that the user enters to describe the changes in the commit.
When the user is ready to share their commits with others (like people on their team), they can connect their local repository with a remote repository that is shared between team members. The user can then push their changes to this central repository, and other users can pull these changes into their own repositories, thus bringing the team’s work into their own copy of the project. Some projects (like open source projects, including Git) are open to anyone - anyone can download the code, make changes, and share them with others. Other projects are private and protected so that only certain individuals can access and manipulate the repository. As members of the team push their commits and pull changes, the team and the project move forward and everyone is happy.
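The cycle described above (create a repository, commit, push, pull) can be sketched on the command line. This is only a minimal illustration that runs entirely in a temporary directory: the names shared.git, alice, and bob are made up for the example, and a local bare repository stands in for the shared remote that a real team would usually keep on a server or hosting service.

```shell
# Work in a throwaway directory so nothing outside it is touched.
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the shared remote (hypothetical name).
git init -q --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/master

# Create a local repository and tell Git who is committing.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master   # match the branch name used in this article
git config user.name "Alice Example"
git config user.email "alice@example.com"

# Make a change, pick the file to include, and commit it with a message.
echo "Hello, Git" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"

# Connect the local repository to the shared one and push the commit.
git remote add origin "$work/shared.git"
git push -q origin master

# A teammate clones the shared repository and receives that commit.
cd "$work"
git clone -q shared.git bob

# Alice commits and pushes again...
cd "$work/alice"
echo "Meeting notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"
git push -q origin master

# ...and Bob pulls her new commit into his own repository.
cd "$work/bob"
git pull -q --ff-only origin master
git log --oneline
```

After the last command, Bob’s log should list both of Alice’s commits, newest first.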
Google is, of course, your best friend in learning nearly anything, especially technology-related subjects like Git. Other great resources for Git include (ordered by usefulness for beginners):
- learn.github.com - An excellent guide to Git for newcomers
- try.github.com - Learn how to use Git in your browser
- git-scm.com - The official source of Git-related information
- gitready.com - A great, albeit slightly less user friendly approach
- git-scm.com/book - Literally the owner’s manual. This is a deep-dive approach that will make you an expert.
A few tidbits before we dive into Advanced Git. Virtually any time you are working on a professional programming project, you will use version control. Alternatives for collaborating with others are frightfully complicated and useless compared to a good version control system. Any time I have to collaborate with at least one other person, I always use one (usually Git). Even when I’m going solo on projects like homework assignments, my website, personal projects, or anything else, I find that Git is so easy to use and set up, and the benefits are so great, that there is no reason not to use it while I work.
Finally, if you are interested in showcasing your development skills, few things are as impressive as a well-groomed, documented, open and published portfolio of repositories on GitHub or Bitbucket (more on those below). When you publish code, would-be employers and others can see how you solve problems, the tools you know, and what you can actually do. If they’re particularly inquisitive, they can even explore comments and feedback you may garner on your projects, or they may notice that you have given back to various open source projects. Even if all you did was correct a spelling mistake in the Readme, there’s a significant amount of “street cred” that comes from actively giving back to the community.
Advanced Git (A glossary)
Git is celebrated for implementing a relatively common SCM feature with great elegance and ease - branching. A branch is an independent line of development within your Git repository: it behaves as though you had taken a local copy of the project and given it a unique name. A new repository has just one branch, called master. Users can create any number of branches to help organize their project. When a user creates a new branch and makes any number of changes, other branches are untouched. A great use case for this feature is making significant changes to the product that may destabilize the main branch while still in development. A developer can branch the master branch into a new feature branch just to focus on this particular effort. If the developer’s changes are too disruptive, the new branch can be deleted, reverted, or otherwise managed completely independently of the master branch and other branches. Also, because the developer’s code lives locally and is not shared until pushed, the developer’s changes don’t touch any other team member or the central repository until they are pushed.
After branching and making changes to the new branch, the developer can then merge the new branch into any other branch(es), including the master branch, effectively bringing the work that was isolated in the branch into the rest of the project.
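Branching and merging as described above might look like this on the command line. It is a minimal sketch in a throwaway repository; the branch name new-feature and the file names are invented for the example.

```shell
# Set up a throwaway repository with one commit on master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in this article
git config user.name "Example User"
git config user.email "user@example.com"
echo "stable code" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Create a feature branch, switch to it, and commit some work there.
git checkout -q -b new-feature
echo "experimental code" > feature.txt
git add feature.txt
git commit -q -m "Start new feature"

# Back on master, the feature work is nowhere to be seen.
git checkout -q master
ls

# Merge the feature branch into master, then delete the finished branch.
git merge -q new-feature
ls
git branch -d new-feature
```

The first ls shows only app.txt, and the second shows feature.txt as well, because the merge has brought the branch’s commit into master.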
Remote branches are also available. These are the same as normal branches, except that a remote branch exists on a remote repository that is used by a team to coordinate and synchronize development. Sometimes open source projects will use branches to separate major versions of a project that are in development at the same time, or to isolate an alpha/beta feature that needs community support but isn’t ready for inclusion in production code.
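As a rough sketch of how a branch ends up on a remote repository: the example below publishes a branch and then lists the remote branches. The names team.git, dev, and beta-feature are hypothetical, and a local bare repository plays the part of the team’s remote.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the team's remote (hypothetical name).
git init -q --bare team.git
git --git-dir=team.git symbolic-ref HEAD refs/heads/master

# A developer's local repository with one commit on master.
git init -q dev
cd dev
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "version 1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git remote add origin "$work/team.git"
git push -q origin master

# Start a beta feature on its own branch and publish that branch.
git checkout -q -b beta-feature
echo "beta work" > beta.txt
git add beta.txt
git commit -q -m "Begin beta feature"
git push -q -u origin beta-feature   # -u links the local branch to the remote one

# List the branches Git knows about on the remote.
git branch -r
```

The listing should include both origin/master and origin/beta-feature, so teammates can fetch the beta work without it touching master.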
As mentioned in the discussion of branching, when changes have been made to a new branch and that branch is ready to be tied in to another branch like master, the user performs a merge. Merging involves taking all of the commits that differ between two branches of code and reconciling them. Git will perform most merges automatically; however, if the same part of a file has been modified in different ways in two separate branches, the file must be merged by hand. A user must review the conflicted files and determine how best to combine the sets of changes into one uniform result. Apple bundles a tool called FileMerge that helps users merge files quickly and easily. Any text editor can be used to perform a merge, however: Git simply inserts markers into the file around each conflicted segment, designating that portion of the file as conflicted, and Git even includes both versions of the conflict in the file so they can be compared immediately. The merging user must then edit the file by hand to resolve the conflict. Typically on teams, merging is a responsibility held by a team or project lead, or some other individual with enough experience to wisely merge work from two potentially different individuals.
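To make the conflict case concrete, here is a small sketch that deliberately creates one and then resolves it by hand. The branch name casual and the file greeting.txt are invented for the example, and the final echo stands in for the careful editing a person would normally do in a text editor or merge tool.

```shell
# Set up a throwaway repository with one file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # match the branch name used in this article
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Change the same line in two different ways on two branches.
git checkout -q -b casual
echo "Hi there" > greeting.txt
git commit -q -a -m "Make greeting casual"
git checkout -q master
echo "Good day" > greeting.txt
git commit -q -a -m "Make greeting formal"

# Git cannot reconcile the two edits, so the merge stops.
git merge casual || echo "Merge stopped: greeting.txt needs a manual merge"

# The file now holds both versions between conflict markers
# (<<<<<<<, =======, >>>>>>>).
cat greeting.txt

# Resolve by hand: write the combined result, stage it, and finish the merge.
echo "Good day, and hi there" > greeting.txt
git add greeting.txt
git commit -q -m "Merge branch 'casual', combining both greetings"
```

The resulting merge commit has two parents, one from each branch, which is how Git records that the two lines of work were reconciled.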
A tag in Git can be used to bring special attention to a particular commit. A common use of tags is to designate a particular commit as a version milestone like v1.0.0 or v1.0.1. Tags should be unique (no calling two commits v1.0). Some hosted Git repository services like GitHub or Bitbucket will find tags in your repository and list them separately so that users can find a particular snapshot of a repository in time quickly and easily.
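Tagging a version milestone might look like the sketch below, which uses a throwaway repository with a made-up file and tag message. It also shows that Git refuses to create a second tag with a name that is already in use.

```shell
# Set up a throwaway repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "First release" > README.txt
git add README.txt
git commit -q -m "Prepare first release"

# Mark the current commit as a version milestone.
git tag -a v1.0.0 -m "First stable release"

# List the tags in the repository.
git tag

# Tag names are unique: reusing one is rejected.
git tag -a v1.0.0 -m "Duplicate" || echo "v1.0.0 is already taken"
```

Tags are not sent along with an ordinary push; a command such as git push origin v1.0.0 publishes one to a remote named origin.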
In a nutshell, a hook is a piece of code that a user hooks into a Git repository (often a central repository) so that when certain events occur, Git will run the custom code. One example would be a script that is triggered when code is pushed to a central repository and that emails team members with information about the commit, including messages, authors, and changes made. Some projects that use continuous integration may use a hook that triggers a build, test, and deploy process any time changes are made to the codebase. Another interesting use of hooks is documented here. While most projects will never need to use hooks, they are worth mentioning and are worthy of a Google search for more information and instructions on how to write a hook and how to attach it to a project.
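Here is one way a hook on a central repository could be sketched. It uses Git’s post-receive hook, which runs after a push has been accepted. To keep the example self-contained, the hook appends a line to a file called push.log instead of sending email; that file and the names central.git and dev are assumptions made for illustration.

```shell
work=$(mktemp -d)
cd "$work"

# The central repository that team members push to (hypothetical name).
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master

# Install a post-receive hook. Git runs it inside central.git after a push
# and feeds it one line per updated ref: <old-id> <new-id> <ref-name>.
mkdir -p central.git/hooks
cat > central.git/hooks/post-receive <<'EOF'
#!/bin/sh
while read -r old new ref; do
    # Stand-in for emailing the team: record the update in push.log.
    echo "$ref updated to $new" >> push.log
done
EOF
chmod +x central.git/hooks/post-receive

# A developer makes a commit and pushes it, which triggers the hook.
git init -q dev
cd dev
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "some work" > work.txt
git add work.txt
git commit -q -m "Add work.txt"
git remote add origin "$work/central.git"
git push -q origin master

# See what the hook recorded.
cat "$work/central.git/push.log"
```

A real notification hook would replace the echo line with whatever mail or build command the team uses; the surrounding loop stays the same.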
Forking occurs when an existing project is copied completely by a user, giving them complete ownership of the copy: its project files, who can see or access them, and how the project is managed. This distinct and independent copy of the code is called a fork. Forking is common with Open Source projects, and is sometimes even employed by companies to help isolate developer work from the “canonical” core code of the company. For example, if a Linux user experiences a bug or a void where there should be a feature, this user could fork the Linux project and implement the improvement on their fork of Linux. When the user feels their change is adequately tested and implemented, they can offer the changes in their fork back to the core Linux project, whereupon those maintaining the core Linux project can choose to accept or reject the changes. Forking differs from simply checking out or cloning a repository in that a fork is set apart as belonging to the user that forked it - the fork isn’t just a linked copy of the original repository, and changes made to a fork cannot be pushed into the original repository without the approval of that repository’s owner.
A pull request is the request made by the developer of a forked repository to the maintainers of the original repository from which the fork originated, asking them to pull in the fork’s changes. In the Linux example given above in the discussion of forking, the request from the user with the improvements to Linux is a pull request, and the core Linux maintenance team can accept or reject the pull request. Pull requests are often used alongside commentary on issues, bug reports, and other discussions about a project, which are typically managed by a web-based code repository hosting service (see below).
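Forks and pull requests are features of hosting services, but the underlying Git mechanics can be approximated locally. In this rough sketch, a clone named fork plays the contributor’s fork, and the maintainer accepts the contribution by pulling the contributor’s branch into the original repository. All names are hypothetical, and a hosting service would wrap the final step in a review and discussion page.

```shell
work=$(mktemp -d)
cd "$work"

# "upstream" stands in for the original project, owned by its maintainer.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # match the branch name used in this article
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "Welcom to the project" > README.txt
git add README.txt
git commit -q -m "Add README"

# The contributor makes an independent copy and fixes a typo on a branch.
cd "$work"
git clone -q upstream fork
cd fork
git config user.name "Contributor"
git config user.email "contributor@example.com"
git checkout -q -b fix-typo
echo "Welcome to the project" > README.txt
git commit -q -a -m "Fix spelling mistake in README"

# The maintainer reviews the request and accepts it by pulling
# the contributor's branch into the original repository.
cd "$work/upstream"
git pull -q --ff-only "$work/fork" fix-typo
cat README.txt
```

Rejecting the request would simply mean never running the pull, which leaves the original repository exactly as it was.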
Repository hosting services
Earlier in this article I mentioned GitHub and Bitbucket as options for hosting repositories online. Web-based repository hosting companies provide tools for storing and publishing code, documentation, bugs, wikis, commits and tags, branches, and everything else that Git (or other SCM tools) can do. Such services are popular for hosting team projects, private repositories, and coordinating group efforts in software development. One unique feature of many hosting services, for example, is issue tracking. When a bug report is filed, a unique ID number is generated for that report. Users can discuss issues with developers and other team members much like in a forum, and code commits that include the bug report number in the commit message are automatically attached to the discussion, thus enabling users and developers to talk about issues and features using their own comments and their code. Many services even allow commenting on individual lines of code, enabling managers and project coordinators to review code and highlight issues or points of interest to the developer. In addition to linking normal commits to an issue by including the issue’s ID number in the commit message, many hosting services treat pull requests as discussions or “issues” of their own, where developers and other users can comment and discuss a particular improvement to a project that has been provided from a fork of the project by another user.
By using these various communication, issue tracking, and discussion mechanisms, repository hosting services foster an incredible environment for collaborating on projects that may be distributed around the world. These projects range from typical code projects like Linux or even Git itself to books and even legal documents. One Git user (see the English translation of the ReadMe towards the bottom of the page) has created an automated script that downloads changes to the legal code of Germany and publishes the code and changes on GitHub. Some political activists have forked his repository and made their own changes, initiating discussions and pull requests to help create drafts of improvements for German legislators to review and potentially implement themselves. This notion of literally putting legal code, along with the ability to suggest amendments, improvements, and proposals for law, in the hands of the public is incredible, and is made possible by tools like Git and GitHub.
GitHub and Bitbucket do most of the same things; however, a few differences in community and pricing model make them attractive to different audiences.
GitHub is one of the more popular hosting services online. GitHub allows users to publish as many public repositories as they like and charges for private repositories starting at $7/month for 5 private repositories with unlimited collaborators. They also offer plans specifically for businesses. Many open source projects including Ruby on Rails, Django, open source projects from Twitter and Facebook, and countless other projects are hosted on GitHub. GitHub includes advanced feature tracking, a simple interface for creating hooks, wikis, and great collaboration tools for working together in teams.
Bitbucket is less popular than GitHub although it supports essentially the same features for collaboration, issue tracking, wikis, forks, and more. The primary difference between Bitbucket and GitHub is in their pricing structures. Bitbucket is free for unlimited repositories - public or private - and only charges for collaboration. Private repositories can have up to 5 users working together. The initial paid tier costs $10/month and allows for 10 users working together.
Git by itself runs on the command line (like a DOS prompt or a terminal shell). While I strongly encourage fluency on the command line (it opens doors!), using a graphical client is often preferable for many tasks like browsing repositories and commits, and otherwise navigating big projects. Below is a list of my personal favorites:
I use GitBox primarily. Free for one repository, currently $20 on the Mac App Store for unlimited repositories. Students get 50% off if you email them first. See the website for details. GitBox is my preferred client because it just works, it’s simple, and it makes sense. It also exposes some of the more complex features of Git in very simple and easy to use ways.
Similar to GitBox but it feels more like a Swiss Army knife, SourceTree is free. What it lacks in elegance it makes up for in power - SourceTree features just about everything GitBox can do. It also supports Mercurial, a cousin of Git in the SCM world.
GitHub (Mac, Windows)
GitHub has two free clients for Mac and Windows. Both are great, albeit not as polished as GitBox and not as powerful as SourceTree. That said, they get the job done and integrate extremely well with GitHub. And, they’re free!
There are tons of other Git clients on the market, too. Here’s a Wikipedia article listing them. Most IDEs like NetBeans, Eclipse, Xcode, and Visual Studio integrate tightly with version control as well, and typically have their own clients built in or offered as plugins.
There you have it! Feel free to contact me with any questions or feedback regarding this article. I hope you’ve found it useful!