I am a big fan of Git, but my love for it only really started when I learned about its inner workings. I also feel that only the Git CLI, as opposed to GUI programs, can provide the perfect workflow for any Git repo.
Therefore, I will start by describing my mental model of Git and how to navigate its different stages with the CLI. In the last part of this post, I am going to look at the regular workflow of most Git users.
Why version source code at all?
I should not need to explain this, but code versioning is really useful whenever you are either:
- building several features in software and need to track your progress
- working with a team of 2+ people on the same code
Except for simple scripts, all my software projects live inside one or more Git repos. Git and other Version Control Systems (VCS) have useful features for code (text) files. Whenever two people change the same file, a VCS helps with merging these changes. It assigns an ID to every change you make, and then there is the most spectacular feature: you can go back in time and load a previous state of your software. I think of it as a savegame for your code.
How does Git work?
My assumption here is that you know other VCS or just started using Git, so I will not make comparisons with the other tools. Git is a command-line tool that saves several versions of your code. Different parts of the software do different things. We will go through the most important ones and introduce the vocabulary that Git uses.
The repository (repo) is the main location of your code. You often have different developers working on different things in the same repo. If developer A and developer B modify a file at the same time, which version should end up in the repo? If there were no local copies, every developer would access the same files and overwrite each other’s contributions.
A repository is basically a folder that contains files. When you init a new repository, Git creates the folder .git, which contains all the information Git needs. A repository can be local (on your computer) or remote (on GitHub, GitLab, etc.). The development setup of Git is: you make a copy of a remote repository, modify your local copy as you wish, and then upload the changes to the remote location. Making a local copy is called a clone, while making a copy on the hosting platform (for example in your own GitHub account) is called a fork. So whenever you find a public GitHub project and fork it, you get a copy of that project in your own account. This is like copying software from a CD onto a new CD and putting your name on it.
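To see the .git folder and a clone on your own machine, here is a minimal shell sketch. It assumes Git 2.28 or newer (needed for git init -b); the folder names original and my-copy, the user name and the e-mail are hypothetical placeholders. A local folder stands in for the remote repository here, whereas a fork is created on the hosting platform (for example with GitHub's Fork button) and is not a plain Git command.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory so nothing else on your machine is touched.
work=$(mktemp -d)
cd "$work"

# "git init" creates the repository; everything Git needs lives in original/.git.
git init -q -b master original
cd original
git config user.name "Demo User"          # placeholder identity for this repo only
git config user.email "demo@example.com"
echo "# Hello" > README.md
git add README.md
git commit -q -m "Add README"
cd ..

# "git clone" makes a complete local copy, including its own .git folder.
# A local path works as the source here, just like an https:// URL would.
git clone -q original my-copy
ls -a my-copy                              # lists .git next to README.md
```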
An upload to the repo is called a push. The special thing about Git is: before you push your version, you might need to synchronize your local repository, as another developer might have pushed a new version to the shared remote repository in the meantime. This is where merge conflicts can happen: if both of you changed the same lines, Git cannot decide on its own which version wins.
Branches let us work in different directions in the code and only worry about putting everything together later on. Besides having a local copy of the repo, you can create a new branch in the repository, which lets you push your code without anyone else interfering.
Branches are called that because a repo’s history is usually visualized as a tree structure. Most projects declare one master branch that holds the main source code of the project. In most workflows, whatever is in the master branch is not the most recent, but the most stable version of your code. If you are a lone coder and the only user of a repo, branches can still help organize and track different features (so-called feature branches).
You can check out branches, which means that you switch the files in your folder to the version stored in that branch, so you have a specific version of the code on your computer. If a branch only exists on the remote, checking it out by name creates a local copy of it. When you clone a repo, you automatically check out its main branch (e.g. master). Whenever you are done pushing changes to a branch, you can switch back to another branch and merge them together. Merging means that the changes made in one branch are transferred to another one.
So if you created file A in branch X and file B in branch Y, you can merge Y into X, which results in X containing files A and B, while Y still only contains file B.
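The merge example can be replayed with a short shell script. It is a sketch assuming Git 2.28+; X, Y, A and B are the placeholder names from the text, and the user name and e-mail are hypothetical. Because X and Y each have a commit the other lacks, git merge records a merge commit, and --no-edit accepts Git's default message for it instead of opening an editor.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q -b master demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# An empty starting commit that both branches grow from.
git commit -q --allow-empty -m "Initial commit"

# Branch X gets file A.
git checkout -q -b X
echo "A" > A
git add A
git commit -q -m "Add file A"

# Branch Y starts again from master and gets file B.
git checkout -q -b Y master
echo "B" > B
git add B
git commit -q -m "Add file B"

# Merge Y into X: switch to X first, then merge Y's changes in.
git checkout -q X
git merge -q --no-edit Y

ls                            # X now contains both A and B
git ls-tree --name-only Y     # Y still only contains B
```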
Changes saved to a branch are called commits. A commit is like a savefile in a game: you can come back later and restore your repository to the state of a given commit. Making a commit is like saying “I want to save this change for myself and/or share it with my co-workers.”
A commit contains a number of files (that were modified), a message, a timestamp, an ID and information about the author of the change. So if you modify file A and commit, the commit will contain the changes to file A only. If another developer, who has a local copy of file A, receives your commit and applies (merges) it, their local file A will change to match the commit changes.
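To see what a commit stores, you can print the raw commit object with git cat-file -p. One detail behind the simplified description above: Git records a snapshot of the whole file tree (the tree line) plus a pointer to the parent commit, and commands like git show --stat compute which files changed compared to that parent. A sketch in a throwaway repository (Git 2.28+; the name and e-mail are placeholders):

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q -b master demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit: two files.
echo "a" > A
echo "b" > B
git add A B
git commit -q -m "Add files A and B"

# Second commit: only file A changes.
echo "more a" >> A
git commit -q -am "Change file A"

# The commit ID (a SHA-1 hash unless the repo was created with SHA-256).
git rev-parse HEAD

# The raw commit object: tree (snapshot), parent, author and committer
# (each with a timestamp), then the message.
git cat-file -p HEAD

# The changes relative to the parent commit: only A is listed.
git show --stat --oneline HEAD
```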
The latest changes to a branch are fetched or pulled from a remote repo: fetching only downloads them, while pulling also merges them into your local branch. If you mess up, you can reset a branch and discard all your local changes.
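The difference between fetch, pull and reset is easiest to see with two developers sharing one repository. In this sketch (Git 2.28+), a local bare repository named remote.git stands in for GitHub or GitLab, and alice and bob are hypothetical clones; all names and e-mails are placeholders.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# A bare repository (no working files) plays the role of GitHub/GitLab.
git init -q --bare -b master remote.git

# Alice creates the project and pushes a first commit.
git init -q -b master alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git remote add origin "$work/remote.git"
git push -q origin master
cd ..

# Bob clones the shared repository.
git clone -q "$work/remote.git" bob

# Alice pushes a second commit.
cd alice
echo "v2" >> notes.txt
git commit -q -am "Update notes"
git push -q origin master
cd ../bob

# fetch downloads the new commit into origin/master; Bob's master does not move.
git fetch -q origin
git log --oneline master..origin/master    # shows the "Update notes" commit

# pull = fetch + merge; --ff-only refuses anything but a fast-forward.
git pull -q --ff-only origin master

# Bob breaks his copy of the file; reset --hard discards the local change.
echo "oops" > notes.txt
git reset -q --hard origin/master
cat notes.txt                              # v1 and v2 again
```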
There is one more step between changing a file and pushing to a repo. When you modify or create a file, you have to add it to the stage (also called the staging area or index). The stage is an intermediate step between “files were changed” and “I want to save this change”. In my work, I often change a number of different things at once.
Maybe I am adding a new function to file A, and at the same time I notice that I gave another function in file B the wrong name. So when I add the new function and fix the name, I have two changes in two files.
Now, there are some best practices around commits. Every commit should change exactly one thing, so I should not add functionality and fix a bug in the same commit.
So I first add file B to the staging area and make a commit. The message describes what is happening in the commit: Fix wrong function name.
After that, I add file A to the staging area and commit again. Now I can add a second message that says Add feature XYZ and now I have two local commits that contain different changes to the code, all by selectively adding files to a commit.
So the staging area is a state that tracks all changes that you want to include in the next commit.
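The two-commit example can be replayed in a throwaway repository. This is a sketch assuming Git 2.28+: fileA.txt and fileB.txt are hypothetical stand-ins for files A and B, their one-line contents only pretend to be code, and the name and e-mail are placeholders.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q -b master demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# Starting point: file A and file B, committed once.
echo "function existing" > fileA.txt
echo "function wrnog_name" > fileB.txt
git add fileA.txt fileB.txt
git commit -q -m "Initial commit"

# Two unrelated edits: a new function in A, a fixed name in B.
echo "function xyz" >> fileA.txt
echo "function right_name" > fileB.txt
git status --short            # both files are listed as modified

# Stage and commit only file B ...
git add fileB.txt
git commit -q -m "Fix wrong function name"

# ... then file A, in its own commit.
git add fileA.txt
git commit -q -m "Add feature XYZ"

# Newest first: Add feature XYZ, Fix wrong function name, Initial commit
git log --oneline
```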
All the vocabulary above is used in Git, both in the manual and in the commands. Let’s assume there already is a project that you need to work on. Usually you have a URL that points to a remote repo, such as: https://github.com/octocat/Hello-World.git
Your first step is cloning the repo:
```shell
git clone https://github.com/octocat/Hello-World.git
```
This creates a local copy of the repository in the folder Hello-World/, with a .git folder inside.
Now enter the repo and look at which “remote” branches it contains:
```shell
cd Hello-World/
git branch --remotes
```
If you forget the parameter --remotes, you will only see your local branches. By default, only the master branch has a local copy, but we can create a local copy of any other branch by checking it out by name. First, though, look at the files in the folder: in our example, you will only see README.md. Now let’s switch to the test branch:
```shell
git checkout test
git branch
```
With the branch command, you can now see that you have a new local branch. Check the files now and you will see that a new file appeared. If you run git checkout master now, the file will disappear again, as it does not exist on the master branch.
Make sure you are on the test branch for the next step. The following command is your new best friend: git status.
Status shows some useful information about the current state of your repo. In this case, you can see that we are working on the test branch and there are no changes. If you see master instead, you need to switch branches by using git checkout again.
Now add your name to CONTRIBUTING.md and look at what happened to the status:
```
On branch test
Your branch is up to date with 'origin/test'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   CONTRIBUTING.md

no changes added to commit (use "git add" and/or "git commit -a")
```
You can see that Git detected our modification, but no change was staged yet (marked for the next commit). We can look at our modification with git diff:
```diff
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 340edab..77d0ee6 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1 +1,2 @@
 ## Contributing
+Davide
```
Now we want to add our change to a commit. We first add it to the staging area with git add CONTRIBUTING.md. Another git status now shows:

```
On branch test
Your branch is up to date with 'origin/test'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   CONTRIBUTING.md
```
If you run git diff now, it prints nothing, because by default it only shows unstaged changes. Run git diff --staged to show the diff above again.
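To try the two diff views without the example repository, here is a sketch in a throwaway repository. It assumes Git 2.28+; the name and e-mail are placeholders, while the file and its content mirror the walkthrough.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q -b master demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "## Contributing" > CONTRIBUTING.md
git add CONTRIBUTING.md
git commit -q -m "Create CONTRIBUTING.md"

# Modify the file: the change is unstaged, so plain git diff shows it.
echo "Davide" >> CONTRIBUTING.md
git diff

# Stage it: plain git diff is now empty, git diff --staged shows the change.
git add CONTRIBUTING.md
git diff                  # prints nothing
git diff --staged         # prints the "+Davide" line again
```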
At this point, our change is staged, but not committed yet. Create a commit with git commit. Your default editor will open and present you with a file similar to this one:
```
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch test
# Your branch is up to date with 'origin/test'.
#
# Changes to be committed:
#	modified:   CONTRIBUTING.md
#
```
This is the commit view. The first line is a short title that describes the changes (e.g. Update contributions). By convention, you leave the second line empty and continue on line 3 with a more descriptive message. In my example:
```
Update contributions

As a new developer on this repo, Davide was added to the CONTRIBUTING file in order to reflect the current changes in the team structure.

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch test
# Your branch is up to date with 'origin/test'.
#
# Changes to be committed:
#	modified:   CONTRIBUTING.md
#
```
If you save the file and exit, your commit will be created. You can see the last two commits with git log -n 2:
```
commit 6a22ecfbc1d288a967f75be872aeba35c21a0e62 (HEAD -> refs/heads/test)
Author: Davide Bove <firstname.lastname@example.org>
Date:   Fri Dec 11 18:25:21 2020 +0100

    Update contributions

    As a new developer on this repo, Davide was added to the CONTRIBUTING file in order to reflect the current changes in the team structure.

commit b3cbd5bbd7e81436d2eee04537ea2b4c0cad4cdf (refs/remotes/origin/test)
Author: The Octocat <email@example.com>
Date:   Tue Jun 10 15:22:26 2014 -0700

    Create CONTRIBUTING.md
```
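If you prefer not to open an editor, git commit also accepts the message on the command line. Each -m option becomes its own paragraph, so a first -m gives the title and a second -m gives the body, with Git adding the empty line between them. A sketch in a throwaway repository (Git 2.28+; name and e-mail are placeholders), reusing the message from the walkthrough:

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q -b master demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "## Contributing" > CONTRIBUTING.md
echo "Davide" >> CONTRIBUTING.md
git add CONTRIBUTING.md

# First -m: the short title. Second -m: the descriptive body.
git commit -q -m "Update contributions" \
  -m "As a new developer on this repo, Davide was added to the CONTRIBUTING file in order to reflect the current changes in the team structure."

git log -1 --format=%s    # the title line
git log -1 --format=%b    # the body
```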
If you now run git push, Git tries to push your changes to the remote branch origin/test. Since you do not have write permission on the example repo, you should get an error message:
```
ERROR: Permission to octocat/Hello-World.git denied.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
```
The best workflow
There are best practices for working with Git, but which of these standards are used ultimately depends on the repository and the team.
The most basic workflow, and what you should learn and use as a beginner, is one that has no name yet, so I will be the first and simply call it BCPM. It stands for:
- Branch: Create a new branch for your work
- Commit: Add your changes to this branch through several commits
- Push: Push the changes regularly to the remote server so that others can see your code and review it if necessary. It is also a great backup for your code.
- Merge: GitHub calls this step a Pull Request, GitLab a Merge Request. It is a process where other developers can review the code, comment on your changes and merge your code into the main development branch.
This is a repeating pattern and therefore the best solution until you learn more advanced Git usage.
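The BCPM loop can be sketched in the shell. Assumptions: Git 2.28+, a local bare repository (remote.git) stands in for GitHub or GitLab, feature/contributors is a hypothetical branch name, and the name and e-mail are placeholders. The last step normally happens in the web interface of your hosting platform, so the final git merge only roughly imitates what happens when a reviewer merges your request.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q --bare -b master remote.git

# Set up a project with one commit on master and publish it.
git init -q -b master project
cd project
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Project" > README.md
git add README.md
git commit -q -m "Initial commit"
git remote add origin "$work/remote.git"
git push -q -u origin master

# B: create a new branch for your work.
git checkout -q -b feature/contributors

# C: add changes through several small commits.
echo "## Contributing" > CONTRIBUTING.md
git add CONTRIBUTING.md
git commit -q -m "Create CONTRIBUTING.md"
echo "Davide" >> CONTRIBUTING.md
git commit -q -am "Update contributions"

# P: push regularly; -u remembers origin/feature/contributors as upstream.
git push -q -u origin feature/contributors

# M: normally a Pull/Merge Request on the platform. Locally, roughly:
git checkout -q master
git merge -q --no-ff --no-edit feature/contributors
git push -q origin master
```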
If you are asking yourself how “more advanced Git” works in practice, see the popular “git-flow” model: https://nvie.com/posts/a-successful-git-branching-model/
What I did not tell you about
I left out several details that, in my opinion, you do not need to know as a beginner. HEAD points to the commit you currently have checked out (usually the latest commit of your current branch), and you can define tags to mark specific versions of your code. This is advanced stuff and only useful to know if you work in bigger teams. The same goes for other ways of combining changes, such as cherry-picking and rebasing. As a beginner, and in a team with more advanced developers, you should keep your main workflow simple.
In the end, Git is “learning-by-doing” all the way through. You can find several great tutorials on the Internet. I collected a few resources that helped me understand more about Git, so if you are interested, check these out:
- git – the simple guide: https://rogerdudler.github.io/git-guide/
- GitHub has a few resources on Git usage that are not too bad: https://try.github.io/
- The general GitHub flow: https://guides.github.com/introduction/flow/
- oh shit, git!?! – A wonderful cheat sheet for when you fuck up your Git repo: https://ohshitgit.com/
- Git from the inside out – an essay about the inner data structures of Git. Definitely for people that need all the details and do not have time to read the source code: https://codewords.recurse.com/issues/two/git-from-the-inside-out
Something’s missing? Do you know a great resource for learning? Please comment below with your insights!