Going further with git

Introduction

Git was introduced by Linus Torvalds in 2005, some years into the life of Linux, to handle the huge number of patches submitted by its developers. It is designed to be a patch database.

Git helps developers easily create patches (called commits) and share them with others. Previously, this was done by mailing the patches to the maintainer of the project. Each commit also contains an explanation of the patch. This is super handy for understanding some parts of the code, because each patch effectively carries its own documentation. Here is an example of a patch from the Linux kernel repository:

From 40249c6962075c040fd071339acae524f18bfac9 Mon Sep 17 00:00:00 2001
From: Peter Oberparleiter <oberpar@linux.ibm.com>
Date: Thu, 10 Sep 2020 14:52:01 +0200
Subject: [PATCH] gcov: add support for GCC 10.1

Using gcov to collect coverage data for kernels compiled with GCC 10.1
causes random malfunctions and kernel crashes.  This is the result of a
changed GCOV_COUNTERS value in GCC 10.1 that causes a mismatch between
the layout of the gcov_info structure created by GCC profiling code and
the related structure used by the kernel.

Fix this by updating the in-kernel GCOV_COUNTERS value.  Also re-enable
config GCOV_KERNEL for use with GCC 10.

Reported-by: Colin Ian King <colin.king@canonical.com>
Reported-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Tested-and-Acked-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/gcov/Kconfig   | 1 -
 kernel/gcov/gcc_4_7.c | 4 +++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/gcov/Kconfig b/kernel/gcov/Kconfig
index bb4b680e8455..3110c77230c7 100644
--- a/kernel/gcov/Kconfig
+++ b/kernel/gcov/Kconfig
@@ -4,7 +4,6 @@ menu "GCOV-based kernel profiling"
 config GCOV_KERNEL
        bool "Enable gcov-based kernel profiling"
        depends on DEBUG_FS
-       depends on !CC_IS_GCC || GCC_VERSION < 100000
        select CONSTRUCTORS if !UML
        default n
        help
diff --git a/kernel/gcov/gcc_4_7.c b/kernel/gcov/gcc_4_7.c
index 908fdf5098c3..53c67c87f141 100644
--- a/kernel/gcov/gcc_4_7.c
+++ b/kernel/gcov/gcc_4_7.c
@@ -19,7 +19,9 @@
 #include <linux/vmalloc.h>
 #include "gcov.h"
 
-#if (__GNUC__ >= 7)
+#if (__GNUC__ >= 10)
+#define GCOV_COUNTERS                  8
+#elif (__GNUC__ >= 7)
 #define GCOV_COUNTERS                  9
 #elif (__GNUC__ > 5) || (__GNUC__ == 5 && __GNUC_MINOR__ >= 1)
 #define GCOV_COUNTERS                  10
-- 
2.43.1

As you can see, the patch note starts with a one-line summary after the Subject tag, followed by an extensive explanation of the reason for and the goal of the patch. You will also find other interesting tags, like Reported-by and Signed-off-by. Git itself does not require these tags, but they are useful in huge, massively distributed projects like Linux. They are documented here.
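To see where this format comes from, here is a minimal sketch, assuming a POSIX shell and Git 2.28 or later (for git init -b); the repository, identity and file names are hypothetical. git commit -s appends the Signed-off-by tag, and git format-patch exports a commit in the same mail-style format as the patch above:

```shell
#!/bin/sh
# Minimal sketch: produce a mail-style patch like the Linux one above.
# Identity, repository and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git commit -q --allow-empty -m "initial commit"

echo "hello" > hello.txt
git add hello.txt
# -s appends "Signed-off-by:" built from the committer identity;
# the second -m becomes the body (the explanation) of the patch note
git commit -q -s -m "hello: add a greeting file" \
  -m "Explain here why the change is needed and what it does."

# Export the last commit as an email-style patch
git format-patch -1 --stdout > last.patch
cat last.patch
```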

New features were introduced to further increase developer productivity: working on branches (local forks of the code) and merging or rebasing them, cherry-picking a commit from another branch, squashing, splitting and reordering commits in a branch, rewording the patch note, and so on. These are essential techniques to understand and master in order to develop code efficiently.

To finish this introduction, let's have a look at a Git database. If you dive into a cloned Git repository (not a bare one), you will find this kind of structure:

/all_your_code_and_subdirectories
/.git
    /branches/        # Legacy storage for remote URL shorthands, mostly unused nowadays
    /config           # The local configuration of the repository (the global one is in ~/.gitconfig)
    /description      # The repository description, mostly unused nowadays
    /HEAD             # Points to the currently checked-out branch (or commit, when HEAD is detached)
    /hooks/           # Scripts git can run before or after some commands to perform complementary actions
    /index            # The staging area: what will go into the next commit
    /info/            # Notably info/exclude, for ignore patterns you don't want to put in .gitignore
    /logs/            # History of where HEAD and the branches pointed (local only, never shared). Used by git reflog
    /objects/         # Where the data is stored: commits, directory trees and file contents
    /ORIG_HEAD        # The previous HEAD, saved by operations like reset, rebase or merge
    /packed-refs      # Refs packed together into a single file for efficiency
    /refs/            # Named pointers to commits: branches (heads/), remote-tracking branches (remotes/), tags (tags/)

As you can see, the database is easy to explore, and good documentation about it is available here, in section 10, Git Internals. By the way, you can download the Git documentation as a PDF here.
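A quick way to poke at some of these files is a throwaway repository. A minimal sketch, assuming a POSIX shell, Git 2.28 or later and the default file-based ref storage; all names are hypothetical:

```shell
#!/bin/sh
# Minimal sketch: look at HEAD, refs, objects and the index of a
# throwaway repository. File names and identity are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b master
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "hello: add a greeting file"

# HEAD is a symbolic reference to the checked-out branch
cat .git/HEAD                   # ref: refs/heads/master
# The branch is only a name pointing to a commit
git rev-parse refs/heads/master
# That commit is an object stored in .git/objects
git cat-file -t HEAD            # commit
git cat-file -p HEAD            # tree, author, committer and message
# The index lists the files (and their blobs) for the next commit
git ls-files --stage
```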

Understanding the remote

As said in the introduction, Git was designed to share code bases, which is where the notion of a remote comes in. A remote is basically another repository we can interact with. Several transport layers can be used to communicate with another repository. The most common ones are:

Filesystem: another repository on the same computer (e.g. /home/me/my/remote)
SSH: a remote server serving files through SSH (e.g. git@github.com:my/remote)
HTTP/HTTPS: a remote server serving files through an HTTP/HTTPS server (e.g. https://github.com/my/remote)

The remote repository can be either bare (only the database, without working files) or non-bare (the kind developers work in, with the files checked out). If a remote repository is used to share code, a bare one is preferable, because by default Git refuses a push to the branch currently checked out in a non-bare repository. Pushing to a non-bare repository is still possible, though, and sometimes handy (e.g. to quickly push a patch to an embedded device to test something).
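The sketch below, assuming a POSIX shell and Git 2.28 or later, uses plain filesystem paths as remotes: a bare repository that accepts pushes, and a non-bare one standing in for the embedded device, which refuses a push to its checked-out branch but accepts one to another branch. All paths and names are hypothetical:

```shell
#!/bin/sh
# Minimal sketch: bare vs non-bare remotes over the filesystem transport.
# All paths and names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Shared repository: bare, so it only holds the database
git init -q --bare -b master "$work/shared.git"

# Developer repository: first commit, pushed to the shared one
git init -q -b master "$work/dev"
cd "$work/dev"
echo "hello" > README
git add README
git commit -q -m "readme: add a README"
git remote add origin "$work/shared.git"
git push -q origin master

# Non-bare repository (think: the embedded device) with master checked out
git clone -q "$work/shared.git" "$work/device"
git remote add device "$work/device"

# A new commit cannot be pushed to the device's checked-out branch...
echo "world" >> README
git commit -q -am "readme: greet the world"
if git push -q device master; then
  device_push=accepted
else
  device_push=refused
fi
echo "push to the device's checked-out master: $device_push"

# ...but pushing it to another branch of the device works
git push -q device master:from-dev
```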

You can see the currently registered remotes of your repository by using the command:

git remote -v

It should display something like this if you originally cloned your repository and didn't add other remotes:

origin  git@gitlab.com:my/repo/origin.git (fetch)
origin  git@gitlab.com:my/repo/origin.git (push)

Interacting with the remote

As the output of git remote -v shows, there are basically only two things you can do with a remote:

fetch: update your local image of the remote from the remote
push: submit patches to the remote

And as you can guess, the fetch URL and the push URL can differ. This feature is used in some workflows, even though it is not really common nowadays.
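For example, a remote can fetch from one URL and push to another with git remote set-url --push. A minimal sketch; the URLs are hypothetical and nothing is contacted, since only the configuration changes:

```shell
#!/bin/sh
# Minimal sketch: one remote with distinct fetch and push URLs.
# Both URLs are hypothetical; no network access happens here.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

git remote add origin https://example.com/upstream/project.git
# Only the push URL changes; fetches keep using the first URL
git remote set-url --push origin git@example.com:me/project.git
git remote -v
```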

Push

push is the simplest action to explain. It basically asks the remote whether the requested update is allowed, sends the commits the remote is missing, and then updates the requested remote ref.

The basic command to push to the remote is:

git push [remote] [local_ref]:[remote_ref]

If [local_ref] and [remote_ref] are the same, you can write just one and omit the colon. You can also omit [remote] and the following parameters, but only if an upstream (tracking) remote branch is set for your current branch.
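These forms can be tried against a throwaway bare repository. A minimal sketch, assuming a POSIX shell and Git 2.28 or later; paths and branch names are hypothetical. git push -u (--set-upstream) is one way to set the tracking branch so that a plain git push works afterwards:

```shell
#!/bin/sh
# Minimal sketch: refspec forms of git push. Paths and names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q --bare -b master "$work/origin.git"
git init -q -b master "$work/repo"
cd "$work/repo"
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "first commit"

# Explicit form: local master goes to a remote branch named review
git push -q origin master:review

# Same name on both sides: the colon can be omitted
git push -q origin master

# -u records origin/master as the upstream of master...
git push -q -u origin master

# ...so the short form works from now on
git commit -q --allow-empty -m "second commit"
git push -q

git branch -vv
```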

To see which branch tracks which remote branch in your repository, you can use the command:

git branch -vv

It should display something like:

  feat-integration-docker daeda0f [origin/feat-integration-docker: gone] sequencer-back: put secret in env
  master                  922ea57 [origin/master] gitmodules: use relative path to allow http or ssh cloning methods
  protobuf                db20f3e [origin/protobuf] readme: add doc for setting up dev env
* test                    922ea57 gitmodules: use relative path to allow http or ssh cloning methods

In this example, there are 3 kinds of branches:

One without remote tracking: the test branch
One whose tracked remote branch no longer exists: the feat-integration-docker branch
And the others, whose tracked remote branches currently exist: the master and protobuf branches.
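A branch gets a gone upstream when its remote branch is deleted and the deletion is fetched with --prune. A minimal sketch reproducing it, assuming a POSIX shell and Git 2.28 or later; the paths and the feat-demo branch are hypothetical, and the deletion is simulated directly in the bare repository:

```shell
#!/bin/sh
# Minimal sketch: how a branch ends up with a "gone" upstream.
# Paths and branch names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q --bare -b master "$work/origin.git"
git init -q -b master "$work/repo"
cd "$work/repo"
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "first commit"
git push -q -u origin master

# A feature branch pushed with an upstream
git checkout -q -b feat-demo
git push -q -u origin feat-demo
git checkout -q master

# Someone deletes the branch on the remote (simulated in the bare repo)
git --git-dir="$work/origin.git" branch -D feat-demo
# --prune drops the stale origin/feat-demo, so the upstream is "gone"
git fetch -q --prune origin
git branch -vv
```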

Fetch

fetch is a bit more subtle. It synchronizes the local representation of the remote with the remote's actual content, without modifying your own branches.

You can see the local representation of the remote by using the command:

git branch -a

It should display something like:

  feat-integration-docker
  master
  protobuf
* test
  remotes/origin/HEAD -> origin/master
  remotes/origin/ik-service
  remotes/origin/master
  remotes/origin/meshHelper
  remotes/origin/multiple-device
  remotes/origin/protobuf
  remotes/origin/selectParts

The branches starting with remotes/ (remote-tracking branches) are the local representation of the remote: after remotes/ comes the name of the remote, then the name of the branch as stored on the remote.

You can inspect remote-tracking branches like any other branch. For example, to see the content of the master branch of origin, use this command:

git log origin/master

Now let's synchronize the local representation of the remote by using:

git fetch [remote]

The [remote] argument is optional: without it, Git fetches from the remote of your current branch's upstream, or from origin by default.

Now that we have fetched the remote, we can either merge the remote branch into our branch or rebase our branch onto the remote branch. This is done as with any other branch, using one of the following commands:

git merge [remote]/[branch]
git rebase [remote]/[branch]
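To see that fetch only moves the remote-tracking branches, here is a minimal sketch with two hypothetical clones, alice and bob, of a throwaway bare repository (POSIX shell and Git 2.28 or later assumed). Bob publishes a commit; Alice fetches it, checks that her own master did not move, then rebases onto origin/master:

```shell
#!/bin/sh
# Minimal sketch: fetch, then rebase onto the remote branch.
# Repositories, files and messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q --bare -b master "$work/origin.git"

# Alice creates the project and publishes it
git init -q -b master "$work/alice"
cd "$work/alice"
echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git remote add origin "$work/origin.git"
git push -q -u origin master
git clone -q "$work/origin.git" "$work/bob"

# Bob publishes a commit
cd "$work/bob"
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "bob: add bob.txt"
git push -q

# Meanwhile, Alice works locally
cd "$work/alice"
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "alice: add alice.txt"

# fetch updates origin/master only; Alice's master is untouched
git fetch -q origin
git log --oneline master..origin/master   # Bob's commit

# Replay Alice's work on top of the remote branch
git rebase -q origin/master
git log --oneline
```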

If a tracking remote branch is set for our branch, we can also perform fetch+rebase or fetch+merge in one step by using one of the following commands:

git pull --rebase
git pull --no-rebase

You can omit the --rebase or --no-rebase argument, and Git will use the default pull behaviour set in your configuration. If you didn't set one, older versions of Git merge, whereas recent versions only fast-forward and ask you to choose when your branch and the remote one have diverged. You can set the default pull behaviour by using the command:

git config --global pull.rebase [yes|no]

Use the --global argument to set it in your global configuration; otherwise, it only applies to the current repository.

Multiple remotes

Adding a remote

Sometimes you want to use some code from another remote repository. Maybe someone published that code on a fork of your repository, or your repository is itself a fork. This is very common when working on the Linux kernel, for example.

In that case, you have to set up another remote and know how to interact with it. First, let's add a new remote by using the command:

git remote add [name] [URI]

[name] will be the name of this new remote in your local repository, and [URI] is the location of the target remote (see the transport layers listed in Understanding the remote).

For testing purposes, you can reuse the origin URI of your local repository.

Now, you can fetch it by using:

git fetch [name]
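A minimal sketch of this setup, assuming a POSIX shell and Git 2.28 or later; the colleague's fork and its fix-typo branch are hypothetical and simulated with a local clone:

```shell
#!/bin/sh
# Minimal sketch: register a second remote (a hypothetical colleague's
# fork, simulated with a local clone) and fetch its branches.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q -b master "$work/mine"
cd "$work/mine"
echo "v1" > app.txt
git add app.txt
git commit -q -m "app: first version"

# The colleague's fork, with an extra branch
git clone -q "$work/mine" "$work/colleague"
cd "$work/colleague"
git checkout -q -b fix-typo
echo "v2" > app.txt
git commit -q -am "app: fix a typo"

# Back in our repository: add the fork as a remote, then fetch it
cd "$work/mine"
git remote add colleague "$work/colleague"
git fetch -q colleague
git remote -v
git log --oneline colleague/fix-typo
```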

Use a branch from a specific remote

Once you have set up a new remote, you can merge or rebase with this remote as explained previously. You can also cherry-pick a single commit from the remote by using the command:

git cherry-pick [commit-sha1]

You can find the commit-sha1 that you want to cherry-pick by logging a branch that contains it:

git log [remote]/[branch]

The commit-sha1 is the hexadecimal sequence displayed after the word commit, on top of the commit message. Example:

commit 41bac37b0b3ce15358aba1b3c366488fe7897e27  <== THIS IS THE SHA1
Author: Franck Duriez <franck@duriez.info>
Date:   Thu Feb 15 12:11:19 2024 +0100

    article/git: add todo markers in commits section
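Cherry-picking works the same with any branch, so this minimal sketch uses a local experiment branch as a stand-in for [remote]/[branch] (POSIX shell and Git 2.28 or later assumed; all names are hypothetical). Note that the picked commit gets a new SHA1, since its parent changes:

```shell
#!/bin/sh
# Minimal sketch: pick one commit out of another branch.
# The experiment branch stands in for [remote]/[branch]; names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b master
echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b experiment
echo "wip" > wip.txt
git add wip.txt
git commit -q -m "wip: unfinished work"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "fix: useful fix"
git checkout -q master

# Find the SHA1 of the wanted commit by logging the branch
sha1=$(git log --format=%H --grep="useful fix" experiment)
git cherry-pick "$sha1"
git log --oneline
```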

Handle the commit

Before starting, let's clarify what Git is and what it is not. As said in the introduction, Git is a patch database: you start from an empty repository and gradually stack patches to reach its current state.

Git is not a backup system. If you see it that way, you will tend to produce huge commits from time to time, with commit messages like "New version", "WIP" or "Backup 20 sept 2006", which miss the point of documenting what's going on.

If you look back at the patch example in the introduction, you will see that there is actually as much documentation as code modification. It is super important to chunk commits into small, coherent sets of modifications and to explain what is going on. This is even more important when there are many developers, and even more so if they do not all work for the same company, because the author of that nasty line of code may not be within reach to ask them what the hell is going on.

Commit with the GUI

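Git comes with a powerful GUI tool to shape your commits easily. You can launch it by running the command git gui. A window should then appear, with the unstaged changes at the top left, the staged changes at the bottom left, the selected modifications at the top right and the commit message area at the bottom right.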

This tool allows you to stage a file for commit from the Unstaged Changes section: you have to click on the file icon to do so; otherwise, the click selects the file and displays its modifications in the top right section.

If you want to stage only part of a file, select the modification in the top right section, right-click and choose the option Stage Lines For Commit.

This tool also allows you to unstage a file from the Staged Changes section: again, you have to click on the file icon; otherwise, the click selects the file and displays its staged modifications in the top right section.

If you want to unstage only part of a file, select the modification in the top right section, right-click and choose the option Unstage Lines From Commit.

Once everything is ok, you only have to write the commit message in the bottom right section and press the Commit button.

Oops, I did it again

During development, you may change the way you solve your problem. Maybe you did something three commits ago and removed it later. Maybe you forgot a file five commits ago. Maybe you made a WIP commit in your branch and forgot to split it later. Or maybe you're just not satisfied with the way you chunked your modifications. Shit happens...

That's why you have to understand how to fix it. There are actually several ways to do it. But first, make sure you understand how to interact with a remote Git server, so as not to mess everything up. If that is not the case, please read and understand the Understanding the remote section.

Case 0: Amend a commit

Let's start with an easy problem: I want to modify the last commit, for whatever reason. If you just have to add something to the last commit, you can simply do something like:

git add some_file_to_add_to_the_previous_commit
git commit --amend --no-edit

If you want to do something more complex, the simplest way is to open the GUI again and select the Amend Last Commit radio button. Once everything is done, just press the Commit button to validate the change.
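A minimal sketch of the command-line amend, assuming a POSIX shell and Git 2.28 or later; the files are hypothetical. The last commit is replaced, so the history still counts a single commit, with its original message:

```shell
#!/bin/sh
# Minimal sketch: add a forgotten file to the last commit.
# File names and messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b master
echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "main: add the program"

# Oops: main.h should have been part of that commit
echo "/* header */" > main.h
git add main.h
git commit -q --amend --no-edit
git show --stat --format=%s HEAD
```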

Case 1: I forgot a file 3 commits away

The simplest way is to:

Step 1: create a new temporary commit with the missing file. The commit message does not matter here; you can simply write fixup. But if you have several fixes for several commits, it's better to write an indicative message so that you know which commit fixes which one.

Step 2: interactively rebase your branch onto the remote branch (if any, but let's start simple) by using the command:

  git rebase -i [remote]/[branch_name]

For example, if you work on the master branch and have a single remote called origin, you should use:

  git rebase -i origin/master

This will open your default editor with the list of commits to replay, one pick line per commit, oldest first.

Instructions are written as comments at the bottom of the file. In our case, we have to move the line of the temporary commit (the last one) right below the commit it fixes, replace its pick keyword by fixup (or just f), then save and close the editor. Et voilà!
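Git can also do the moving and renaming for you, as an alternative to editing the list by hand: git commit --fixup=[commit-sha1] creates a commit whose message starts with fixup!, and git rebase -i --autosquash places it and marks it as fixup automatically. In this minimal sketch (POSIX shell and Git 2.28 or later assumed, names hypothetical), a base tag stands in for [remote]/[branch_name] and GIT_SEQUENCE_EDITOR=: accepts the generated list without opening an editor:

```shell
#!/bin/sh
# Minimal sketch: fix a commit 3 commits away with --fixup/--autosquash.
# The base tag stands in for [remote]/[branch_name]; names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b master
echo "a" > a.txt && git add a.txt && git commit -q -m "feature: add a.txt"
git tag base

# b-extra.txt should have been part of the "add b.txt" commit
echo "b" > b.txt && git add b.txt && git commit -q -m "feature: add b.txt"
echo "c" > c.txt && git add c.txt && git commit -q -m "feature: add c.txt"
echo "d" > d.txt && git add d.txt && git commit -q -m "feature: add d.txt"

# Step 1: commit the forgotten file as a fixup of the target commit
echo "extra" > b-extra.txt
git add b-extra.txt
git commit -q --fixup="$(git log --format=%H -1 --grep='add b.txt')"

# Step 2: ':' keeps the todo list that --autosquash generated
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash base
git log --oneline base..
```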

Case 2: I forgot to chunk a WIP commit

To do so, use the rebase -i command again, but instead of fixup, use the edit command on the commit you want to split, then save and exit the editor. Git will stop on that commit so that you can amend it, with the GUI for example. In the GUI, use the Amend Last Commit option and unstage all the changes. Now you only have to make new commits with those changes. Once it's done, use the command:

git rebase --continue

This will apply the following patches and it's done!
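The same split can be done without the GUI, using git reset HEAD~ while the rebase is stopped: it undoes the commit but keeps its changes in the files. A minimal sketch, assuming a POSIX shell and Git 2.28 or later; the base tag, the files and the small edit-first.sh helper that turns the first pick into edit are all hypothetical:

```shell
#!/bin/sh
# Minimal sketch: split a WIP commit during an interactive rebase.
# The base tag stands in for [remote]/[branch_name]; names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q -b master "$work/repo"
cd "$work/repo"
git commit -q --allow-empty -m "initial commit"
git tag base

echo "parser" > parser.txt
echo "docs" > docs.txt
git add parser.txt docs.txt
git commit -q -m "wip"                 # the commit to split
echo "tests" > tests.txt
git add tests.txt
git commit -q -m "tests: add tests"

# Hypothetical sequence editor: turn the first "pick" (the WIP one) into "edit"
cat > "$work/edit-first.sh" <<'EOF'
#!/bin/sh
sed '1s/^pick/edit/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF
chmod +x "$work/edit-first.sh"
GIT_SEQUENCE_EDITOR="$work/edit-first.sh" git rebase -q -i base

# Git stopped on the WIP commit: undo it, keeping its changes on disk
git reset -q HEAD~
git add parser.txt
git commit -q -m "parser: add parser"
git add docs.txt
git commit -q -m "docs: document the parser"
git rebase --continue

git log --oneline base..
```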

Case 3: Everything is a mess, let's rechunk everything

If you want to recreate all the commits made since the last commit pushed to the remote, or since the remote main/master branch, you can do the following (this may require a push -f to the remote afterwards, which is ok on a work branch):

git reset --soft [remote]/[branch]

After this command, your current branch will be reset to the [remote]/[branch] branch without changing the files in your working directory. It means that if you run git status, you will see all the modifications as staged but uncommitted.

You then only have to recreate all your commits. Et voilà!
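A minimal sketch of this re-chunking, assuming a POSIX shell and Git 2.28 or later; a base tag stands in for [remote]/[branch] and all names are hypothetical. After the soft reset, a plain git reset unstages everything (like unstaging all in the GUI) before committing piece by piece:

```shell
#!/bin/sh
# Minimal sketch: squash a messy history back to base, then re-chunk it.
# The base tag stands in for [remote]/[branch]; names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git commit -q --allow-empty -m "initial commit"
git tag base

# A messy history
echo "one" > a.txt && git add a.txt && git commit -q -m "wip"
echo "two" > b.txt && git add b.txt && git commit -q -m "fix"
echo "three" >> a.txt && git commit -q -am "more wip"

# Move the branch back to base; every change stays on disk, staged
git reset --soft base
git status --short

# Unstage everything, then commit coherent chunks
git reset -q
git add a.txt && git commit -q -m "a: add a.txt"
git add b.txt && git commit -q -m "b: add b.txt"
git log --oneline base..
```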

OMG, I messed it all up by trying to modify my commits

While learning, you may mess things up. Keep calm, Git has backed you up.

As said in the introduction, the .git directory contains a logs directory, which is not shared with the remotes. It records every position that HEAD and your branches took in your local repository. As long as something was committed at least once, it can be recovered, until its entry expires (after 30 to 90 days by default).

To see the history of modifications, use the following command:

git reflog

It will display the history in the style of the git log command. Example:

57b30fd (HEAD -> master, origin/master, origin/HEAD) HEAD@{0}: commit: article/git: explain how to amend commit higher in the commit stack
c822059 HEAD@{1}: commit: article/git: finish the reote subsection
a96b6ee HEAD@{2}: commit: article/git: finish the remote interaction subsection
7746639 HEAD@{3}: pull --rebase (finish): returning to refs/heads/master
7746639 HEAD@{4}: pull --rebase (pick): article/git: start the remotes section
41bac37 HEAD@{5}: pull --rebase (pick): article/git: add todo markers in commits section
f91d636 HEAD@{6}: pull --rebase (start): checkout f91d6365dd8b44eacdb2991af4c28a722fc84976
6a1930e HEAD@{7}: commit: article/git: start the remotes section
bab8ca5 HEAD@{8}: commit: article/git: add todo markers in commits section
0de9e98 HEAD@{9}: rebase (finish): returning to refs/heads/master
0de9e98 HEAD@{10}: rebase (pick): article/git: finish case1 in commit-oops section
5e96165 HEAD@{11}: rebase (start): checkout origin/master
2813ca8 HEAD@{12}: reset: moving to HEAD
2813ca8 HEAD@{13}: commit (amend): article/git: finish case1 in commit-oops section
5401ac7 HEAD@{14}: commit: article/git: finish case1 in commit-oops section
f6ef017 HEAD@{15}: rebase (finish): returning to refs/heads/master
f6ef017 HEAD@{16}: rebase (start): checkout origin/master
f6ef017 HEAD@{17}: reset: moving to HEAD
f6ef017 HEAD@{18}: rebase (abort): returning to refs/heads/master
f6ef017 HEAD@{19}: rebase (start): checkout origin/master
f6ef017 HEAD@{20}: commit: fix
5e96165 HEAD@{21}: commit (amend): article/git: add a commit section

As you can see, each modification is referenced by a short SHA1 like bab8ca5. If you want to reset your current branch to the state referenced by a given SHA1, simply use the following command:

git reset --hard [SHA1]

This will also reset the files to that state, discarding any uncommitted change. If you only want to reset the branch but not the files, replace --hard by --soft. But if you messed it up, chances are that you want the --hard version.
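A minimal sketch of such a rescue, assuming a POSIX shell and Git 2.28 or later; the repository and files are hypothetical. A hard reset drops a commit from the branch, and the reflog entry HEAD@{1} (the state just before the reset) brings it back:

```shell
#!/bin/sh
# Minimal sketch: recover a commit lost by a hard reset, via the reflog.
# Files and messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git commit -q --allow-empty -m "first commit"
echo "precious" > precious.txt
git add precious.txt
git commit -q -m "precious: add important work"
lost=$(git rev-parse HEAD)

# Oops: the last commit disappears from the branch
git reset -q --hard HEAD~1

# HEAD@{1} is where HEAD was right before the reset
git reflog
git reset -q --hard 'HEAD@{1}'
```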

Conclusion

As with everything in life, these things take time to learn, but once you master the concepts presented in this article, you will fully enjoy a rich experience. It may also give you ideas to craft the Git workflow that fits the needs of your projects.

I hope you enjoyed it. See you next time!