Git and the Modern Web Workflow
Version control and web development are like bread and butter. Nutella and pretzels. Apple and iThings. They just go great together. A friend of mine from school recently asked me about running different environments and web development workflows, and seeing as I couldn’t find a practical guide on the subject that I liked, I thought I’d do what any self-respecting writer would do - write my own.
This guide is not meant to be a comprehensive exploration of Git. It does assume at least some high-level familiarity with version control systems like Git. Because I use Git almost exclusively, I’ll be using Git terminology; however, the following workflow should be achievable with most modern version control systems. For a more complete treatment of Git, check out the article I wrote last fall on the subject.
What’s this “modern web workflow”?
Let’s walk through what I call the “modern web workflow” and explore what that looks like. To be clear, I’m not advocating any particular technology stack (LAMP? NodeJS & Express? Meteor? Ruby? Python? Tomcat? Choose your poison - this works with them all). I’m also not going to dive into CSS frameworks, things like Bower or NPM or any of that stuff.
With that out of the way, I go through the following steps when I fire up a project:
- I create the basic framework for the project. This could be as simple as `express create` or `rails new` or your own blend of files and formatting. Whatever works for you and your client.
- I initialize this as a git repository (`git init`) and I commit the initial version of the project.
- I go to [Bitbucket](http://www.bitbucket.org) and create a new repository for my project, filling in the information as I go. It’s free, so there’s no reason not to sign up.
- I take the URL from Bitbucket and add it as a remote to my project (`git remote add origin [path to the project]`). Bitbucket provides lots of instructions on how to do this, so I’ll defer to their instructions.
- I push my initial commit up to the project. Wahoo!
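The setup steps above can be sketched as a short shell session. This is only a rough sketch under a few assumptions: a local bare repository stands in for the Bitbucket URL (so it runs without an account or network access), and the project name, file contents, and commit identity are placeholders I made up.

```shell
set -eu

# Stand-in for the hosted repository: a local bare repo instead of a real
# Bitbucket URL (an assumption made so the sketch runs offline).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"

# Create the project skeleton, initialize it, and commit the first version.
mkdir "$sandbox/myproject"
cd "$sandbox/myproject"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch "master"
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
echo "<h1>Hello</h1>" > index.html
git add index.html
git commit -q -m "Initial commit"

# Register the remote and push the initial commit up to it.
git remote add origin "$sandbox/remote.git"
git push -q -u origin master

# List the configured remotes to confirm the link.
git remote -v
```

With a real hosted repository, the only line that changes is the `git remote add origin` argument, which becomes the URL your host gives you.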
At this point, the first part of my workflow is running. I can now get into that happy development loop of coding and testing, checking my work as I go and adding features and detail in small increments. Whenever I have my project in a stable position, I `commit` my work to git and then `push` my changes to Bitbucket so that I have a permanent snapshot of that moment in time. For more significant commits like actual version bumps, I tag the relevant commit as `v1.0` or whatever makes sense. Git branching and all other git features are at my disposal as well.
This is where we transition to the server environment(s) for my project. The following instructions can be applied to a staging, testing, or production server environment. For every environment you add, you can use the following as a template for creating a replica of your project for testing and deployment purposes.
- `ssh` into the server that will be running the app
- Follow the instructions at Bitbucket for deployment keys and set up your server’s account with a deployment key that’s linked to your Bitbucket repository
- From your server’s command line, run `git clone git@bitbucket.org:yourUsername/yourProjectName.git` and watch the magic happen
- If you have submodules, be sure to `git submodule init` and `git submodule update`
- Wherever you configure your webserver (often it’s cPanel or some equivalent), be sure to point your domain’s root directory to your project’s public web directory
Boom. Your website is now running out of a cloned git repository.
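Here is a rough sketch of the server-side steps, assuming a local repository plays the part of the hosted one. On a real server the clone source would be the SSH URL from your host, reachable through the deployment key. The `public` directory and the `www` checkout name are placeholders.

```shell
set -eu
sandbox=$(mktemp -d)

# Stand-in for the hosted repository (hypothetical content).
git init -q "$sandbox/origin"
cd "$sandbox/origin"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
mkdir public
echo "<h1>Live</h1>" > public/index.html
git add public
git commit -q -m "Initial site"

# "On the server": clone the project, then set up submodules.
# Both submodule commands are harmless no-ops when the project has none.
git clone -q "$sandbox/origin" "$sandbox/www"
cd "$sandbox/www"
git submodule init
git submodule update

# This is the directory the webserver's document root would point at.
ls "$sandbox/www/public"
```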
Hang on, why is that helpful?
With all that work out of the way, let’s actually talk benefits. With this setup, my workflow now looks like this:
- Work on code. Fix bugs, make features, build cool stuff.
- `git commit` code when it’s stable or when I at least want a consistent snapshot.
- Use more advanced git workflows to use branches or to collaborate with my team as appropriate (see below)
- `git tag [tag-name]` when I have a version number to snapshot, as appropriate (e.g. `v1.0`, `v0.5-rc2`, etc.)
- `git push` my changes to the central repository frequently
- `ssh` into the environment I want to update - testing, staging, production, etc.
- `git pull` inside of my repository
- To use a specific version of the site (like a tagged release), `git checkout [tag name goes here]`
- To use the latest, most recent version of the site, `git checkout master` (or replace `master` with the name of the branch you want to use)
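The loop above can be sketched end to end on one machine. As assumptions for illustration, a local bare repository plays the central repository, and two directories named `dev` and `server` play my laptop and the deployment environment. File contents and the tag name are placeholders.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/central.git"   # stand-in for the hosted repo

# Developer machine: commit, tag a release, and push both.
git init -q "$sandbox/dev"
cd "$sandbox/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$sandbox/central.git"
echo "version 1" > index.html
git add index.html
git commit -q -m "First release"
git tag v1.0
git push -q origin master --tags

# Server: the initial clone.
git clone -q -b master "$sandbox/central.git" "$sandbox/server"

# Developer machine: more work lands after the release.
echo "version 2" > index.html
git commit -q -am "Second version"
git push -q origin master

# Server: pull the changes, then choose what to serve.
cd "$sandbox/server"
git pull -q --ff-only
git checkout -q v1.0                 # pin the site to the tagged release
served_release=$(cat index.html)
git checkout -q master               # or follow the latest commit on master
served_latest=$(cat index.html)

echo "tag v1.0 serves: $served_release"
echo "master serves:   $served_latest"
```

Checking out a tag leaves the server copy in a detached HEAD state, which is fine for serving files; switch back to a branch before pulling again.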
That might sound like a fair bit of work, but here are the tangible benefits:
- A great version history. If I deploy and something breaks, I can go back with one command in seconds.
- No more dealing with usernames, passwords, etc. The SSH keys keep things secured while enabling me to pull in changes as they happen.
- If my site happens to get hacked (horrors!) I can use `git diff` or `git status` to see exactly what’s changed since I pulled.
- Using a deploy key, I can only `checkout` - if my site is compromised, my code doesn’t have to be contaminated in the repository. I can rebuild the site quickly and effectively, no problem.
- It’s fast - all of the commands involved take seconds to run.
- It’s super easy to collaborate with a team - that’s what version control is for in the first place!
- It’s also very easy to start building more sophisticated workflows like continuous integration and testing.
All of the above encourages developers to build more robust and flexible development systems - because it’s very easy to do continuous integration and testing, you are more likely to write better tests. You can enforce code review policies more easily. You are less likely to include uploaded user content with your code and will instead use a storage system like S3 - keeping your assets and your users’ files safer and better managed.
There are a lot of other benefits; these are the ones that come to my mind most readily. Continuous integration (aka “Your app tests and deploys itself whenever you push”) via Jenkins or other `post-commit` actions is another great feature that comes from a git-powered workflow.
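To make the hacked-site benefit concrete, here is a small sketch that simulates tampering in a deployed checkout and then inspects it. The file names and the "intruder" edits are invented for illustration. The restore commands at the end are one possible cleanup, not the only one.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/site"
cd "$sandbox/site"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "<h1>Welcome</h1>" > index.html
git add index.html
git commit -q -m "Known-good deploy"

# Simulate an intruder editing a tracked file and dropping in a new one.
echo "<script src=evil.js></script>" >> index.html
echo "payload" > backdoor.php

# See which files differ from the last commit, and exactly how.
changes=$(git status --short)
printf '%s\n' "$changes"
git --no-pager diff

# Put the checkout back to the committed state.
git checkout -- index.html   # restore the tracked file
git clean -q -f              # delete untracked files such as backdoor.php
```

`git clean -f` permanently deletes untracked files, so on a real server I would look at the `git status` output carefully before running it.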
A word of warning about configuration: Most apps have some configuration involved. Whether it’s a database connection, API keys to other services, or email server settings, projects have configuration files that make it easy to tweak application behavior. Most applications also have to run in multiple environments like a developer’s machine, a testing system, a staging area that is almost production but not quite, and of course production. Each environment may require different configurations. This is especially true with databases: a developer should not be using production’s database all the time, and likewise testing or staging should have their own databases so that as the product’s database changes, testing and preparation can happen without messing with production.
In order to achieve a seamless workflow experience, developers should build and use frameworks that are aware of their environment. Sometimes this is achieved using an environment variable. For some web products, the hostname or IP address can be an appropriate clue that informs the application whether it is running in staging, a development machine, testing, production, or some other environment. These clues enable the application’s logic to load a different configuration file for each environment, meaning that when the app is in stage, it behaves like it’s in stage, and without any changes by the developer the same code will behave like production when it’s running in production. This makes workflows very consistent and smooth, with deployments happening through a quick “git pull” or “git checkout” command. Very handy.
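A minimal sketch of the environment-variable approach, written as a shell helper. The variable name `APP_ENV`, the function `config_for_env`, and the `config/<environment>.conf` layout are all assumptions for illustration; your framework will have its own conventions.

```shell
set -eu

# Hypothetical helper: map an environment name to the config file the app
# should load. Unknown names are rejected instead of silently falling back.
config_for_env() {
    case "$1" in
        production|staging|testing|development)
            echo "config/$1.conf"
            ;;
        *)
            echo "unknown environment: $1" >&2
            return 1
            ;;
    esac
}

# Default to development when APP_ENV is not set on this machine.
env_name="${APP_ENV:-development}"
config_file=$(config_for_env "$env_name")
echo "Environment: $env_name -> $config_file"
```

Because the choice depends only on where the code runs, the same commit can be pulled onto staging and production without edits.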
The modern web workflow when working with other people
When working with a team, we all get our own `cloned` version of the Bitbucket repository and hack away as expected, but with one slight twist: we introduce a bit of team overhead with a task list, typically managed in Bitbucket.
When planning out activity and things to do, I’ll sit down with my customer and team and we’ll scope out what the user actually wants. Each of these becomes a task in Bitbucket’s issues list. You can approach this using user stories, descriptions of features, or whatever floats your boat. We then assign these tasks out for a given sprint or week or whatever unit of work is best for the project (yes, we standardize on practice in reality - for this post, however, I’m keeping things general since each project requires slightly different approaches to planning and execution).
When I sit down to work on an issue (and the same goes for my teammates), I’ll branch my repository into a feature branch. This branch is mine and I can do whatever I want with it. From here, I’ll add commits, make changes, and get this branch to a state where it solves the problem in the issue I’m working on.
From here, I’ll merge my feature branch down into a development branch. The development branch exists so that we have a consistently inconsistent place to work from. When the development branch gets close to being ready for release, I (or whoever is tasked with managing and deploying a release) will branch the development branch into a staging branch so that no new code is introduced, perform final testing and other work, generate documentation, and push the finished product into its own commit on the master branch.
The development branch and master are never deleted. Staging branches can be deleted after they are merged into the master branch. Feature branches should be deleted by the developer who created them once they are merged into the development branch. Each release should be tagged with the version number associated with the release. As appropriate, developers are welcome to publish their local branches to remote branches on the central repository. These remote branches are cleaned up by the developer who created them as they become obsolete.
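The branch lifecycle described above can be sketched in a single local repository. The branch name `feature/issue-12`, the file, and the version number are placeholders, and using `--no-ff` to force a dedicated merge commit is my assumption about how to get the release "into its own commit".

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/project"
cd "$sandbox/project"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v0" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# The long-lived development branch.
git checkout -q -b development

# A feature branch for one issue.
git checkout -q -b feature/issue-12
echo "new feature" >> app.txt
git commit -q -am "Add feature for issue 12"

# Merge the feature down into development, then delete the feature branch.
git checkout -q development
git merge -q --no-ff -m "Merge feature/issue-12" feature/issue-12
git branch -q -d feature/issue-12

# Freeze a staging branch for final testing, then release it to master.
git checkout -q -b staging
git checkout -q master
git merge -q --no-ff -m "Release v1.0" staging
git tag v1.0
git branch -q -d staging

# Only the two permanent branches remain.
git branch --list
```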
For larger teams, this approach can be replaced with the Fork and Pull request workflow. It is similar, but instead of pushing to and pulling from one repository, each developer has their own “forked” copy of the remote repository that is uniquely theirs. Developers are welcome to handle their own personal workflow however they like; once they have a consistent state in their remote repository, they can issue a pull request to the original master repository. The maintainer of the master repository reviews the request (basically the differences between the repositories created by the developer), can add comments and requests to the pulling developer, and ultimately accepts the request in order to merge the developer’s work into the master repository. This is much like the other workflow, except that instead of developers controlling a merge into the development branch, a master repository manager must approve merges into the development branch of the master repository. This provides opportunities for code review, discussion, improvements from other developers (who can add their own code to another developer’s pull request), and other collaboration.
For more information on various git workflow approaches with teams, check out Bitbucket’s great guide on the subject. I personally use a simplified version of their “Gitflow Workflow”.
One last note about GitHub: I personally love GitHub and I think it’s fantastic. I also love a free market economy that provides an alternative to GitHub that is free for small private repositories. Everything in this article can be accomplished using GitHub. It can also be accomplished using self-hosted git repositories running on SSH.
Introduction to Source Code Management with Git
Brent: In working on various programming projects in ISYS 403 this semester, I have noticed how stinking useful source code management is when doing homework. This guide is largely for my cohorts in the program that have never used version control before and are interested in making their programming lives 100% better and more productive.
What is it?
Source Code Management (SCM), also called Revision Control, is all about maintaining revisions of any collection of related files. This includes websites, wikis, computer source code, and even books or other literature. SCM emerged from the serious need in the programming industry to keep track of all the incremental changes made in the development and evolution of a computer program over time, especially when that program is developed by a team of programmers. The Open Source movement has particularly championed intelligent and efficient source code management, since the whole point of open source software is that anyone can get a copy and contribute. If anyone can mess with the code, how do you keep track of changes, audit revisions, and make sure everything stays organized? From these myriad needs, Source Code Management was born.
What is Git?
There are various source code management systems available today, each with its own strengths and weaknesses. This article focuses on the virtues of Git.
Git was initially created by Linus Torvalds, the same fellow who created Linux. Linus once said in jest, “I name all my projects after myself.” Git is a distributed, decentralized version control system. We’ll explore what this means in more detail throughout this article. In a nutshell, Git is the tool you use to manage your source code, make revisions, and synchronize what you’re working on with what others may be working on.
For an exhaustive list of source code management tools, see the List of Revision Control Software on Wikipedia.
How does Git work?
Git is built around a repository. A repository is a central place where code is kept. Once a user creates a repository, they can make any changes they like to the files in it - adding, editing, modifying, rearranging, renaming, and even completely deleting files. These changes are tracked by Git. The user can then pick the changed files that they want to commit into the repository. A commit is a snapshot of changes saved into the repository along with a commit message that the user enters to describe the changes in the commit. When the user is ready to share their commits with others (like people on their team), the user can connect their local repository with a remote repository that is shared between team members. The user can then push their changes to this central repository, and other users can pull these changes into their own repositories, thus bringing the team’s work into their own copy of the project. Some projects (like open source projects, including Git) are open to anyone - anyone can download the code, make changes, and share them with others. Other projects are private and protected so that only certain individuals can access and manipulate the repository. As members of the team push their commits and pull changes, the team and the project move forward and everyone is happy.
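As a small illustration of picking which changes go into a commit, here is a sketch with two new files where only one is committed. The file names and contents are made up for the example.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Two new files: one ready to share, one still a scratch pad.
echo "print('hi')" > app.py
echo "draft notes" > notes.txt

# Pick only app.py for this commit and describe the change.
git add app.py
git commit -q -m "Add app entry point"

# notes.txt is still untracked; the history holds one commit.
git status --short
git --no-pager log --oneline
```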
Google is, of course, your best friend in learning nearly anything, especially technology-related subjects like Git. Other great resources for Git include (ordered by usefulness for beginners):
- learn.github.com - An excellent guide to Git for newcomers
- try.github.com - Learn how to use Git in your browser
- git-scm.com - The official source of Git-related information
- gitready.com - A great, albeit slightly less user friendly approach
- git-scm.com/book - Literally the owner’s manual. This is a deep-dive approach that will make you an expert.
A few tidbits before we dive into Advanced Git. Virtually any time you are working on a professional programming project, you will use version control. Alternatives for collaborating with others are frightfully complicated and useless compared to a good version control system. Any time I have to collaborate with at least one other person, I always use one (usually git). Even when I’m going solo on projects like homework assignments, my website, personal projects, or anything else, I find that it’s so easy to use Git and so easy to set up, and that the benefits are so great that there is no reason not to use it while I work.
Finally, if you are interested in showcasing your development skills, few things are as impressive as a well-groomed, documented, open and published portfolio of repositories on GitHub or Bitbucket (more on those below). When you publish code, would-be employers and others can see how you solve problems, the tools you know, and what you can actually do. If they’re particularly inquisitive, they can even explore comments and feedback you may garner on your projects, or they may notice that you have given back to various open source projects. Even if all you did was correct a spelling mistake in the Readme, there’s a significant amount of “street cred” that comes from actively giving back to the community.
Advanced Git (A glossary)
Git is celebrated for implementing a relatively common SCM feature with great elegance and ease - branching. Creating a branch is like taking your git repository, making a local copy of it, and calling it something unique (under the hood, Git only records a lightweight pointer to a commit, which is why branches are so cheap). A new repository has just one branch called master. Users can create any number of branches in their projects to help organize their work. When a user creates a new branch and makes any number of changes, other branches are untouched. A great use-case for this feature would be making significant changes to the product that may destabilize the main branch while still in development. A developer can branch the master branch into a new feature branch just to focus on this particular effort. If the developer’s changes are too disruptive, the new branch can be deleted or reverted or managed in any way, completely independently of the master branch and other branches. Also, because the developer’s code lives locally and is not shared until pushed, any changes by the developer don’t touch any other team member or the central repository until those changes are pushed.
After branching and making changes to the new branch, the developer can then merge the new branch into any other branch(es), including the master branch, effectively bringing in work that was otherwise isolated in the branch.
Remote Branches are also available. These are the same as normal branches, however a remote branch exists on a remote repository that is used by a team to coordinate and synchronize development. Sometimes open source projects will use branches to separate major versions of a project that are in development at the same time, or to isolate an alpha/beta feature that needs community support but isn’t ready for inclusion in production code.
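To show how a branch keeps risky work away from master, here is a brief sketch in which an experiment is committed on its own branch and then thrown away. The branch name `experiment` and the file contents are placeholders.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable baseline"

# Do the disruptive work on a separate branch.
git checkout -q -b experiment
echo "risky change" >> app.txt
git commit -q -am "Try a risky change"

# Back on master the file is exactly as it was.
git checkout -q master
cat app.txt

# The experiment did not pan out: discard the unmerged branch.
git branch -q -D experiment
```

The capital `-D` is needed because the branch was never merged; the lowercase `-d` refuses to delete unmerged work.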
As mentioned above, when changes have been made to a new branch and that branch is ready to be tied in to another branch like master, the user performs a merge. Merging involves taking all of the commits that are different between two branches of code and reconciling them. Git will perform most merges automatically, however if the same part of a file has been modified in two separate branches in different ways, the file must be merged by hand. A user must review the conflicted files and determine how to best combine the sets of changes into one uniform result. Apple bundles a tool called FileMerge that helps users merge files quickly and easily. Any text editor can be used to perform a merge, however: git simply inserts conflict markers into the file around each conflicted segment, and it includes both versions of the conflict in the file so they can be compared immediately. The merging user must then edit the file by hand to resolve the conflict. Typically on teams, merging is a responsibility held by a team or project lead, or some other individual with enough experience to wisely merge work from two potentially different individuals.
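A hand-resolved merge can be sketched as follows. Two branches change the same line in different ways, the merge stops, and the file is edited and committed to finish. The file, the branch name `feature`, and the final value are invented for the example, and the marker layout shown by `cat` assumes git's default conflict style.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "color: blue" > style.txt
git add style.txt
git commit -q -m "Start with blue"

# Two branches change the same line in different ways.
git checkout -q -b feature
echo "color: green" > style.txt
git commit -q -am "Use green"
git checkout -q master
echo "color: red" > style.txt
git commit -q -am "Use red"

# The merge stops with a conflict; "|| true" lets the script continue.
git merge feature >/dev/null 2>&1 || true

# The file now holds both versions between <<<<<<<, =======, >>>>>>> markers.
conflicted=$(cat style.txt)
printf '%s\n' "$conflicted"

# Resolve by hand: write the final content, stage it, and finish the merge.
echo "color: teal" > style.txt
git add style.txt
git commit -q -m "Merge feature, settle on teal"
```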
A tag in git can be used to bring special attention to a particular commit. A common use of tags is to designate a particular commit as a version milestone like v1.0.0 or v1.0.1. Tags should be unique (no calling two commits v1.0). Some hosted git repository services like GitHub or Bitbucket will find tags in your repository and list them separately so that users can find a particular snapshot of a repository in time quickly and easily.
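A quick sketch of tagging version milestones, including what happens when a tag name is reused. The version numbers and file are placeholders. Using an annotated tag (`-a`) for the first release is my choice here, not something the workflow requires.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "release one" > app.txt
git add app.txt
git commit -q -m "First release"

# Mark this commit as a milestone with an annotated tag.
git tag -a v1.0.0 -m "First stable release"

echo "bug fix" >> app.txt
git commit -q -am "Fix a bug"

# Git refuses to create a second tag with a name that is already taken.
if git tag v1.0.0 2>/dev/null; then
    duplicate_allowed=yes
else
    duplicate_allowed=no
fi
echo "duplicate tag allowed: $duplicate_allowed"

# The fix gets its own version number instead.
git tag v1.0.1
git tag --list
```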
In a nutshell, a hook is a piece of code that a user hooks into a git repository (often a central repository) so that when certain events occur, git will run the custom code. One example would be a script that is triggered when code is pushed to a central repository and emails team members with information about the commit, including messages, authors, and changes made. Some projects that use continuous integration may use a hook that triggers a build, test, and deploy process any time changes are made to the codebase. Another interesting use of hooks is documented here. While most projects will never need to use hooks, they are worth mentioning and are worthy of a Google search for more information and instructions on how to write a hook and how to attach a hook to a project.
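Here is a minimal local sketch of a hook. Instead of emailing the team on a push, it uses a `post-commit` hook that appends each new commit's short hash and message to a log file. The log file name is made up, and a central repository would more likely use a server-side hook.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# A hook is just an executable script with a specific name in .git/hooks.
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
# Runs after every commit: record the short hash and the subject line.
git log -1 --format='%h %s' >> .git/commit-notifications.log
EOF
chmod +x .git/hooks/post-commit

# Make a commit; git runs the hook automatically afterwards.
echo "hello" > README
git add README
git commit -q -m "Add README"

cat .git/commit-notifications.log
```

Hooks live inside `.git`, so they are not copied by `git clone`; each repository that needs one has to install it.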
Forking occurs when an existing project is copied completely by a user, giving them complete ownership of the copy: the project files, who can see or access them, and how the project is managed. This distinct and independent copy of the code is called a fork. Forking is common with Open Source projects, and is sometimes even employed by companies to help isolate developer work from the “canonical” core code of the company. For example, if a Linux user experiences a bug or a void where there should be a feature, this user could fork the Linux project and implement the improvement on their fork of Linux. When the user feels their change is adequately tested and implemented, they can offer the changes in their fork back to the core Linux project, whereupon those maintaining the core Linux project can choose to accept or reject the changes. Forking differs from simply checking out or cloning a repository in that a fork is set apart as belonging to the user that forked it - the fork isn’t just a linked copy of the original repository, and changes made to a fork cannot be pushed into the original repository without approval of that repository’s owner.
A pull request is the request made by the developer of a forked repository to the maintainers of the repository from which the fork originated. In the Linux example given above in the discussion of forking, the request from the user with the improvements to Linux is a pull request, and the core Linux maintenance team can accept or reject the pull request. Pull requests are often used in commentary on issues, bug reports, and other discussions about a project that are typically managed by a web-based code repository hosting service (see below).
Repository hosting services
Earlier in this article I mentioned GitHub and Bitbucket as options for hosting repositories online. Web-based repository hosting companies provide tools for storing and publishing code, documentation, bugs, wikis, commits and tags, branches, and everything else that Git (or other SCM tools) can do. Such services are popular for hosting team projects, private repositories, and coordinating group efforts in software development. One unique feature of many hosting services, for example, is issue tracking. When a bug report is filed, a unique ID number is generated for that report. Users can discuss issues with developers and other team members much like in a forum, and code commits that include the bug report number in the commit message are automatically attached to the discussion, thus enabling users and developers to talk about issues and features using their own comments and their code. Many services even allow commenting on individual lines of code, enabling managers and project coordinators to review code and highlight issues or points of interest to the developer. In addition to linking normal commits to an issue by including the issue’s ID number in the commit message, many hosting services treat pull requests as discussions or “issues” of their own, where developers and other users can comment and discuss a particular improvement to a project that has been provided from a fork of the project by another user.
By using these various communication, issue tracking, and discussion mechanisms, repository hosting services foster an incredible environment for collaborating on projects that may be distributed around the world. These projects range from typical code projects like Linux or even Git itself to books and even legal documents. One Git user (see the English translation of the ReadMe towards the bottom of the page) has created an automated script that downloads changes to the legal code of Germany and publishes the code and changes on GitHub. Some political activists have forked his repository and made their own changes, initiating discussions and pull requests to help create drafts of improvements for German legislators to review and potentially implement themselves. The notion of publishing legal code this way, along with the ability to make suggestions, amendments, improvements, and proposals for law, is incredible, and is made possible by tools like Git and GitHub.
GitHub and Bitbucket do most of the same things, however a few differences in community and pricing model make them attractive to different audiences.
GitHub is one of the more popular hosting services online. GitHub allows users to publish as many public repositories as they like and charges for private repositories starting at $7/month for 5 private repositories with unlimited collaborators. They also offer plans specifically for businesses. Many open source projects including Ruby on Rails, Django, open source projects from Twitter and Facebook, and countless other projects are hosted on GitHub. GitHub includes advanced feature tracking, a simple interface for creating hooks, wikis, and great collaboration tools for working together in teams.
Bitbucket is less popular than GitHub although it supports essentially the same features for collaboration, issue tracking, wikis, forks, and more. The primary difference between Bitbucket and GitHub is in their pricing structures. Bitbucket is free for unlimited repositories - public or private - and only charges for collaboration. Private repositories can have up to 5 users working together. The initial paid tier costs $10/month and allows for 10 users working together.
Git by itself runs on the command line (like a DOS prompt or a terminal shell). While I strongly encourage fluency on the command line (it opens doors!), using a graphical client is often preferable for many tasks like browsing repositories, commits, and otherwise navigating big projects. Below is a list of my personal favorites:
I use GitBox primarily. Free for one repository, currently $20 on the Mac App Store for unlimited repositories. Students get 50% off if they email the developer first. See the website for details. GitBox is my preferred client because it just works, it’s simple, and it makes sense. It also exposes some of the more complex features of git in very simple and easy to use ways.
SourceTree is similar to GitBox but feels more like a Swiss Army knife, and it’s free. What it lacks in elegance it makes up for in power - SourceTree features just about everything GitBox can do. It also supports Mercurial, a cousin of Git in the SCM world.
GitHub (Mac, Windows)
GitHub has two free clients for Mac and Windows. Both are great, albeit not as polished as GitBox and not as powerful as SourceTree. That said, they get the job done and integrate extremely well with GitHub. And, they’re free!
There are tons of other Git clients on the market, too. Here’s a Wikipedia article listing them. Most IDEs like NetBeans, Eclipse, Xcode, and Visual Studio integrate tightly with version control as well, and typically have their own clients built-in or offered as plugins.
There you have it! Feel free to contact me with any questions or feedback regarding this article. I hope you’ve found it useful!