Understanding Git Diff Output Thoroughly
After git commit, git pull and git push, git diff is the Git command I (probably) use most often. Running git diff before doing a git commit has become a routine I always follow, to make sure there are no mistakes before I record my changes in a commit.
The git diff command tells us in detail exactly what changes have occurred between two Git reference points. By default, when we run git diff without any arguments, Git prints the changes between the index (the staging area) and the working tree, that is, the changes that have not been staged yet. With --staged it compares HEAD with the index instead, and git diff HEAD compares HEAD with the working tree. The output is very informative, but we often miss most of it because we only care about the additions and deletions in our code. Let’s examine exactly what information we can extract from git diff.
In the case below, I renamed puisi/aku-chairil-anwar.md to puisi/aku-chairil-anwar-new.md and made some changes. For clarity, I have also included the commands I ran from a clean working tree state.
mv puisi/aku-chairil-anwar.md puisi/aku-chairil-anwar-new.md
vim puisi/aku-chairil-anwar-new.md
chmod +x puisi/aku-chairil-anwar-new.md
git add .
git diff --staged
diff --git a/puisi/aku-chairil-anwar.md b/puisi/aku-chairil-anwar-new.md
old mode 100644
new mode 100755
similarity index 86%
rename from puisi/aku-chairil-anwar.md
rename to puisi/aku-chairil-anwar-new.md
index fdcc0f3..1bcc061
--- a/puisi/aku-chairil-anwar.md
+++ b/puisi/aku-chairil-anwar-new.md
@@ -1,9 +1,10 @@
# Aku
## oleh Chairil Anwar
+---
Kalau sampai waktuku
Ku mau tak seorang kan merayu
-Tidak juga kau
+Tidak juga kau!
Tak perlu sedu sedan itu
@@ -20,3 +21,7 @@ Hingga hilang pedih peri
Dan aku akan lebih tidak peduli
Ku mau hidup seribu tahun lagi
+
+---
+Diperbarui pada 21 Juli 2017
+
Let’s break it down section by section. Starting from the top, with the diff header line and the two mode lines that follow it, we have:
diff --git a/puisi/aku-chairil-anwar.md b/puisi/aku-chairil-anwar-new.md
old mode 100644
new mode 100755
The diff --git line marks the beginning of the comparison for one file and names the two paths being compared. The a/ and b/ prefixes in front of the file names correspond to the two sides of the comparison: a/ is the "before" side and b/ is the "after" side, following the order of the references we pass to git diff.
In this case, because we called git diff --staged without any reference arguments, HEAD is in position a/, while the index (the staged changes) is in position b/. A plain git diff would instead put the index in position a/ and the working tree in position b/. If we were to call git diff with reference arguments, such as git diff HEAD~ HEAD, then HEAD~ would occupy position a/ while HEAD would occupy position b/.
The old mode and new mode lines indicate changes to the file mode (permissions), represented in octal notation as in Unix systems. 100644 means a regular file, while 100755 means an executable file.
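To make the two mode values concrete, here is a minimal Python sketch (my own illustration, not something Git provides) that decodes them with the standard stat module. The helper name describe_mode is hypothetical.

```python
import stat

def describe_mode(mode_text):
    """Decode a mode string as printed on the old mode / new mode lines."""
    mode = int(mode_text, 8)  # the digits are octal
    return {
        # The leading "100" marks a regular file.
        "regular_file": bool(stat.S_ISREG(mode)),
        # ls-style permission string, e.g. -rw-r--r--
        "permissions": stat.filemode(mode),
        # True when the owner may execute the file.
        "owner_executable": bool(mode & stat.S_IXUSR),
    }

for text in ("100644", "100755"):
    print(text, describe_mode(text))
```

The only difference between the two modes in our diff is the execute bits, which is exactly what chmod +x changed.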
Next, on the following lines, we see:
similarity index 86%
rename from puisi/aku-chairil-anwar.md
rename to puisi/aku-chairil-anwar-new.md
These lines belong to the extended header (as do the old mode and new mode lines above) and appear only when Git detects that a file has been renamed. The similarity index line gives the percentage of the file that is unchanged between the two states we are comparing. We get 86%, meaning roughly 14% of the file changed between the two sides.
The next two lines are fairly self-explanatory: rename from indicates the previous file name, while rename to indicates the new file name.
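If we wanted to pull these values out programmatically, a small Python sketch could look like the one below. It assumes plain paths like the ones in this example (Git quotes paths that contain unusual characters, which this sketch does not handle), and parse_rename_header is a name I made up for illustration.

```python
import re

def parse_rename_header(lines):
    """Collect similarity and rename details from extended header lines."""
    info = {}
    for line in lines:
        match = re.fullmatch(r"similarity index (\d+)%", line)
        if match:
            info["similarity"] = int(match.group(1))
        elif line.startswith("rename from "):
            info["old_path"] = line[len("rename from "):]
        elif line.startswith("rename to "):
            info["new_path"] = line[len("rename to "):]
    return info

header = [
    "similarity index 86%",
    "rename from puisi/aku-chairil-anwar.md",
    "rename to puisi/aku-chairil-anwar-new.md",
]
print(parse_rename_header(header))
```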
The next line, which is still part of the extended header, reads:
index fdcc0f3..1bcc061
This line contains the abbreviated hashes that identify the version of the file before and after, known as the preimage and postimage hashes. They are similar to the commit hashes used to identify commits, but these hashes identify individual objects in Git: in this case, the blobs that store the file's content.
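As far as I know, in a repository that uses the default SHA-1 object format, the hash of a file's content is computed over a small header ("blob", the size in bytes, and a NUL byte) followed by the content itself. The sketch below reproduces that in Python; git_blob_hash is a hypothetical helper name, and I cannot reproduce fdcc0f3 or 1bcc061 here because the full file content is not shown in this article.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Hash file content the way a SHA-1 Git repository names a blob."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

full = git_blob_hash(b"")   # hash of an empty file
print(full)                 # full 40-character hash
print(full[:7])             # abbreviated form, as seen on the index line
```

For a real file, the result should match what git hash-object prints for that file, which is a handy way to check which version of the file a diff is talking about.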
Moving on to the next section, which reads:
--- a/puisi/aku-chairil-anwar.md
+++ b/puisi/aku-chairil-anwar-new.md
This is the part that is often misunderstood. Many people assume that the --- marker indicates deleted content, while +++ indicates added content. That’s close, but strictly speaking these two lines label the two versions of the file being compared. In this case, --- marks the version at a/puisi/aku-chairil-anwar.md, while +++ marks the version at b/puisi/aku-chairil-anwar-new.md, following the order of the references we pass to the git diff command.
Why do I say it’s imprecise to interpret those two markers as ‘deletion’ and ‘addition’? Because if we reverse the order of the references we pass to git diff, the two indicators are reversed as well. Imagine: we make a commit that adds several lines of code, and then we run git diff HEAD HEAD~ (newest first, the reverse of the usual order). The lines we just added will be labeled -. It doesn’t make sense to call that a deletion rather than a difference between two versions, right?
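The same effect can be shown without a repository. The sketch below uses Python's difflib as a stand-in for Git (it produces the same unified format, but it is not how Git computes diffs), with two made-up versions of a file. The helper changed_lines is hypothetical.

```python
import difflib

old = ["line one", "line two"]
new = ["line one", "line two", "line three"]

def changed_lines(a, b):
    """Return only the +/- lines of a unified diff from a to b."""
    diff = list(difflib.unified_diff(
        a, b, fromfile="a/file", tofile="b/file", lineterm=""))
    # Skip the two file header lines (--- and +++), keep marked lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]

print(changed_lines(old, new))  # old on the a/ side: ['+line three']
print(changed_lines(new, old))  # sides swapped:      ['-line three']
```

Nothing about the content changed between the two calls; only the order of the two sides did.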
The next section is called a hunk — a description of the differences between file versions. A hunk consists of a line called a hunk header, which explains the location and length of the code excerpt that will be shown, followed by a chunk — the excerpt of code along with a description of the differences. Here is the hunk header we will break down:
@@ -1,9 +1,10 @@
There are numbers inside it that we can split into two parts: first -1,9, second +1,10. As we now know, the - and + symbols are references to the versions of the file being compared. The -1,9 notation means the chunk being shown starts at line 1 for 9 lines in the state of the file referenced by the - symbol (a/puisi/aku-chairil-anwar.md).
Meanwhile, +1,10 indicates that the same chunk is at line 1 for 10 lines in the state of the file represented by the + symbol (b/puisi/aku-chairil-anwar-new.md). This means there is a one-line difference in that chunk.
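Here is a short Python sketch of how a hunk header can be taken apart. The regular expression and the name parse_hunk_header are my own; the one detail worth knowing is that the unified format omits the count when it is 1, so the sketch treats a missing count as 1.

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def parse_hunk_header(line):
    """Split a hunk header into start and length for each side."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return {
        "old_start": int(old_start),
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
    }

first = parse_hunk_header("@@ -1,9 +1,10 @@")
print(first)
print(first["new_count"] - first["old_count"])  # net lines gained: 1
```

Any text after the closing @@ (such as "Hingga hilang pedih peri" in the second hunk header) is extra context for the reader and is ignored by this sketch.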
Moving on to the chunk itself:
# Aku
## oleh Chairil Anwar
+---
Kalau sampai waktuku
Ku mau tak seorang kan merayu
-Tidak juga kau
+Tidak juga kau!
Tak perlu sedu sedan itu
Here we can directly see the differences between the two versions of our file, marked with the - and + symbols; lines without a marker are context that is identical in both versions. Once again, bear in mind that the two symbols stand for the two versions of our file rather than for deletions and additions as such. We can see in this chunk that b/puisi/aku-chairil-anwar-new.md has --- on its third line and Tidak juga kau! on its seventh line, while a/puisi/aku-chairil-anwar.md has Tidak juga kau on its sixth line, one line earlier, because the --- line exists only in b/.
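To see how the line numbers on each side are derived, here is a minimal Python sketch that walks through a hunk and numbers every line. The hunk in it is a small made-up example, not the poem above, and number_hunk_lines is a hypothetical helper. It assumes the raw output format, in which context lines start with a single space, and it does not handle the "\ No newline at end of file" marker.

```python
def number_hunk_lines(old_start, new_start, lines):
    """Return (old_no, new_no, text) per line; None where a side lacks it."""
    old_no, new_no = old_start, new_start
    numbered = []
    for line in lines:
        marker, text = line[:1], line[1:]
        if marker == "-":    # present only on the a/ side
            numbered.append((old_no, None, text))
            old_no += 1
        elif marker == "+":  # present only on the b/ side
            numbered.append((None, new_no, text))
            new_no += 1
        else:                # context line: present on both sides
            numbered.append((old_no, new_no, text))
            old_no += 1
            new_no += 1
    return numbered

# Made-up hunk with header @@ -1,4 +1,5 @@
hunk = [" title", "+---", " first", "-second", "+second!", " third"]
for row in number_hunk_lines(1, 1, hunk):
    print(row)
```

Notice that once a + line has appeared, every later line sits one position further down on the b/ side than on the a/ side.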
The next section is a description of the next hunk containing differences, which is read in the same way as the one we just went through.
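One detail ties the hunks together: the second hunk starts at line 20 on the a/ side but at line 21 on the b/ side, because the first hunk made the file one line longer. A rough Python sketch of that bookkeeping follows; expected_new_starts is a hypothetical name, and the sketch ignores the edge case of hunks whose count is 0.

```python
def expected_new_starts(hunks):
    """Predict each hunk's b/ start from its a/ start and earlier hunks.

    Each hunk is (old_start, old_count, new_start, new_count).
    """
    offset = 0  # net lines added by the hunks seen so far
    starts = []
    for old_start, old_count, _new_start, new_count in hunks:
        starts.append(old_start + offset)
        offset += new_count - old_count
    return starts

# The two hunk headers from the diff above:
# @@ -1,9 +1,10 @@ and @@ -20,3 +21,7 @@
hunks = [(1, 9, 1, 10), (20, 3, 21, 7)]
print(expected_new_starts(hunks))  # [1, 21]
```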
That is, more or less, how to properly read the output of git diff. There is a great deal of information that can be extracted from this command, and although the format is described in the Git documentation, those details are easy to overlook. I hope you find this information as interesting as I do!