blog.meain.io
Open in
urlscan Pro
2a05:d014:275:cb01::c8
Public Scan
URL:
https://blog.meain.io/2023/what-is-in-dot-git/
Submission: On October 11 via api from US — Scanned from DE
Submission: On October 11 via api from US — Scanned from DE
Form analysis
0 forms found in the DOMText Content
MEAIN/BLOG * About * RSS Oct 06, 2023 . 8 min WHAT IS IN THAT .GIT DIRECTORY? Well, I think most of you reading this blog use git more or less on a daily basis, but have you ever looked into what is in the .git folder that git creates? Let's explore it together and understand what is going on in there. This is a blog version of a talk that I recently gave. Unfortunately I can't link to the recording :(. > git at a basic level is just a bunch of text files linked to each other by > filenames. LET'S START INIT # As you all know, we start our git journey with a git init. This gives the message that we all are probably used to by now, especially if you start and abandon a lot of side projects. Copy Initialized empty Git repository in /home/meain/dev/src/git-talk/.git/ Let's look at what is in the .git repo as of now. Copy $ tree .git .git ├── config ├── HEAD ├── hooks │ └── prepare-commit-msg.msample ├── objects │ ├── info │ └── pack └── refs ├── heads └── tags It seems to create a bunch of files and folders. What are all these? Let's go over them one by one. * config is a text file that contains your git configuration for the current repo. If you look into it, you would see some basic settings for you repo like the author, filemode etc. * HEAD contains the current head of the repo. Depending on what you have set your "default" branch to be, it will be refs/heads/master or refs/heads/main or whatever else you had set to. As you might have guessed, this is pointing to refs/heads folder that you can see below, and into a file called master which does not exist as of now. This file master will only show up after you make your first commit. * hooks contain any scripts that can be run before/after git does anything. I have written another blog here which goes a bit more into how git hooks work. * objects contains the git objects, ie the data about the files, commits etc in your repo. We will go in depth into this in this blog. * refs as we previously mentioned, stores references(pointers). refs/heads contains pointers to branches and refs/tags contains pointers to tags. We will go into what these files look like soon. NOW WE ADD A FILE # Now that you have an idea of what the initial set of files in .git is, let's perform the first action that adds something into the .git directory. Let's create a file and add it(we are not committing it yet). Copy echo 'meain.io' > file git add file This does the following: Copy --- init 2024-07-02 15:14:00.584674816 +0530 +++ add 2023-07-02 15:13:53.869525054 +0530 @@ -3,7 +3,10 @@ ├── HEAD ├── hooks │ └── prepare-commit-msg.msample +├── index ├── objects +│ ├── 4c +│ │ └── 5b58f323d7b459664b5d3fb9587048bb0296de │ ├── info │ └── pack └── refs This causes two main changes as you can see. The first thing it modifies is the index file. The index is what stores the information about what is currently staged. This is used to signify that the file named file has been added to the index. The second and more important change is the addition of a new folder objects/4c and a file 5b58f323d7b459664b5d3fb9587048bb0296de inside it. BUT WHAT IS IN THAT FILE? # Here is where we get into the details of how git stores things. Let's start with looking at what kind of data is present in that. Copy $ file .git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de .git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de: zlib compressed data Hmm, but what is the zlib compressed data? Copy $ zlib-flate -uncompress <.git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de blob 9\0meain.io Looks like it contains the type, size and data of the file named file that we did a git add on. In this case, the data says that it is a blob of size 9 and the content is meain.io. OK, BUT WHAT IS WITH THAT FILENAME? # Well, good question. It comes from the sha1 of the content. If you take the zlib compressed data and pipe it through sha1sum, you get the filename. Copy $ zlib-flate -uncompress <.git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de|sha1sum 4c5b58f323d7b459664b5d3fb9587048bb0296de git takes the sha1 of the content to be written, takes the first two characters, in this case 4c, creates a folder and then uses the rest of it as the filename. git creates folders from the first two chars to make sure we don't have too many files under the single objects folder. SAY HELLO TO GIT CAT-FILE # In fact, since this is one of the more important parts of git, git also has a plumbing command to view the content of an object. You can use git cat-file with -t for type, -s for size and -p for content. Copy $ git cat-file -t 4c5b58f323d7b459664b5d3fb9587048bb0296de blob $ git cat-file -s 4c5b58f323d7b459664b5d3fb9587048bb0296de 9 $ git cat-file -p 4c5b58f323d7b459664b5d3fb9587048bb0296de meain.io LET'S COMMIT # Now that we know what changes when we add a file, let's take this to the next level by committing. Copy $ git commit -m 'Initial commit' [master (root-commit) 4c201df] Initial commit 1 file changed, 1 insertion(+) create mode 100644 file Here is what changed: Copy --- init 2024-07-02 15:14:00.584674816 +0530 +++ commit 2023-07-02 15:33:28.536144046 +0530 @@ -1,11 +1,25 @@ .git +├── COMMIT_EDITMSG ├── config ├── HEAD ├── hooks │ └── prepare-commit-msg.msample ├── index +├── logs +│ ├── HEAD +│ └── refs +│ └── heads +│ └── master ├── objects +│ ├── 3c +│ │ └── 201df6a1c4d4c87177e30e93be1df8bfe2fe19 │ ├── 4c │ │ └── 5b58f323d7b459664b5d3fb9587048bb0296de +│ ├── 62 +│ │ └── 901ec0eca9faceb8fe0a9870b9b6cde75a9545 │ ├── info │ └── pack └── refs ├── heads + │ └── master └── tags Woah, looks like there are a bunch of changes. Let's walk through them one by one. The first one is a new file COMMIT_EDITMSG. As the name might suggest, it contains the (last) commit message. If you where to run the git commit command without the -m flag, the way git gets a commit message is to open an editor with the COMMIT_EDITMSG file to let the user edit the commit message and once the user has updated it and exited the editor, git uses the contents of the file as the commit message. It also added a whole new folder logs. This is a way for git to log all the commits changes in a repo. You will be able to see the changes in commits for all refs and HEAD here. The object dir also got some changes, but I want you to first look into the refs/heads directory where we now have the file master. This as you might have guessed is the reference to the master branch. Let's see what is in it. Copy $ cat refs/heads/master 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19 Looks like it is pointing to one of the new objects. We know how to look at objects, let's do that. Copy $ git cat-file -t 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19 commit $ git cat-file -p 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19 tree 62902ec0eca9faceb8fe0a9870b9b6cde75a9545 author Abin Simon <mail@meain.io> 1688292123 +0530 committer Abin Simon <mail@meain.io> 1688292123 +0530 Initial commit > You could have also done git cat-file -t refs/heads/master Well, looks like that is new kind of object. This seems to be a commit object. The contents of the commit object tells us that it contains a tree object with the hash 62902ec0eca9faceb8fe0a9870b9b6cde75a9545, which looks like the other object that got added when we did the commit. The commit object also has the information about who the author and committer is, which in this case is both me. Lastly is also shows what the commit message for this commit was. Now let's look at what the tree object contains. Copy $ git cat-file -t 62902ec0eca9faceb8fe0a9870b9b6cde75a9545 tree $ git cat-file -p 62901ec0eca9faceb8fe0a9870b9b6cde75a9545 100644 blob 4c5b58f323d7b459664b5d3fb9587048bb0296de file A tree object will contain the state of working directory in the form of other tree and blob objects. In this case, since we just have a single file named file, you will just see a single object. If you see, the file is pointing to the original object that got added when we did a git add file. Here is what a tree for a more mature repo look like. More tree objects are used inside tree object linked from the commit object to denote folders. Copy $ git cat-file -p 2e5e84c3ee1f7e4cb3f709ff5ca0ddfc259a8d04 100644 blob 3cf56579491f151d82b384c211cf1971c300fbf8 .dockerignore 100644 blob 02c348c202dd41f90e66cfeb36ebbd928677cff6 .gitattributes 040000 tree ab2ba080c4c3e4f2bc643ae29d5040f85aca2551 .github 100644 blob bdda0724b18c16e69b800e5e887ed2a8a210c936 .gitignore 100644 blob 3a592bc0200af2fd5e3e9d2790038845f3a5cf9b CHANGELOG.md 100644 blob 71a7a8c5aacbcaccf56740ce16a6c5544783d095 CODE_OF_CONDUCT.md 100644 blob f433b1a53f5b830a205fd2df78e2b34974656c7b LICENSE 100644 blob 413072d502db332006536e1af3fad0dce570e727 README.md 100644 blob 1dd7ed99019efd6d872d5f6764115a86b5121ae9 SECURITY.md 040000 tree 918756f1a4e5d648ae273801359c440c951555f9 build 040000 tree 219a6e58af53f2e53b14b710a2dd8cbe9fea15f5 design 040000 tree 5810c119dd4d9a1c033c38c12fae781aeffeafc1 docker 040000 tree f09c5708676cdca6562f10e1f36c9cfd7ee45e07 src 040000 tree e6e1595f412599d0627a9e634007fcb2e32b62e5 website MAKING A CHANGE # Let's make a change to the file and see how that works. Copy $ echo 'blog.meain.io' > file $ git commit -am 'Use blog link' [master 68ed5aa] Use blog link 1 file changed, 1 insertion(+), 1 deletion(-) Here is what it does: Copy --- commit 2024-07-02 15:33:28.536144046 +0530 +++ update 2023-07-02 15:47:20.841154907 +0530 @@ -17,6 +17,12 @@ │ │ └── 5b58f323d7b459664b5d3fb9587048bb0296de │ ├── 62 │ │ └── 901ec0eca9faceb8fe0a9870b9b6cde75a9545 +│ ├── 67 +│ │ └── ed5aa2372445cf2249d85573ade1c0cbb312b1 +│ ├── 8a +│ │ └── b377e2f9acd9eaca12e750a7d3cb345065049e +│ ├── e5 +│ │ └── ec63cd761e6ab9d11e7dc2c4c2752d682b36e2 │ ├── info │ └── pack └── refs Well, we added 3 new objects. One of them would be a blob object with the new contents of the file, one would be a tree object and the last one will be a commit object. Let's trace them again from the HEAD or refs/heads/master. Copy $ git cat-file -p refs/heads/master tree 9ab377e2f9acd9eaca12e750a7d3cb345065049e parent 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19 author Abin Simon <mail@meain.io> 1688292975 +0530 committer Abin Simon <mail@meain.io> 1688292975 +0530 Use blog link $ git cat-file -p 9ab377e2f9acd9eaca12e750a7d3cb345065049e 100644 blob e5ec63cd761e6ab9d11e7dc2c4c2752d682b36e2 file $ git cat-file -p e6ec63cd761e6ab9d11e7dc2c4c2752d682b36e2 blog.meain.io Those paying attention might have noticed that the commit object now has an additional key called parent which links to the previous commit as this commit is created on top of the previous commit. CREATING A BRANCH # About time we created a branch. Let's do that with git branch fix-url. Copy --- update 2024-07-02 15:47:20.841154907 +0530 +++ branch 2023-07-02 15:55:25.165204941 +0530 @@ -27,5 +28,6 @@ │ └── pack └── refs ├── heads + │ ├── fix-url │ └── master └── tags This adds a new file under the folder refs/heads with a file as the branch name and the content as the id of the latest commit. Copy $ cat .git/refs/heads/fix-url 68ed5aa2372445cf2249d85573ade1c0cbb312b1 This is pretty much all there is to creating a branch. Branches in git are really cheap. Tags also behave the same way, except that they are created under refs/tags. A file is also added under the logs directory to store the commit history data similar to master branch. CHECKING OUT A BRANCH # Checking out in git is git getting the tree object of a commit and updating the files in your worktree to match the state recorded in it. In this case, since we are switching from master to fix-url, both of which point to the same commit and underlying tree object, git does not have anything to do in the working tree. Copy git checkout fix-url The only change that happens when you do a checkout inside .git is that the .git/HEAD file will now point to fix-url. Copy $ cat .git/HEAD ref: refs/heads/fix-url Wile we are here, let me make a commit. I'm gonna need this to show what merging does later. Copy $ echo 'https://blog.meain.io'>file $ git commit -am 'Fix url' MERGING # There are primarily 3 ways of merging. 1. The simplest and the most easiest is a fast forward merge. In this case you just update the commit a branch is pointing to a commit another branch is pointing to. This pretty much involves copying the hash in refs/heads/fix-url to refs/heads/master. 2. The second one is rebase merge. In this case we first apply our changes on top of what main is currently pointing to one commit at a time and then perform something similar to a fast forward merge. 3. The last one would be to just merge two branches using a separate merge commit. This is a bit different in that it will have two parent entries in its commit object. We will go a bit more into this towards the end. First let's see what the graph looks like before a merge. Copy git log --graph --oneline --all * 42c6318 (fix-url) Fix url * 67ed5aa (HEAD -> master) Use blog link * 3c201df Initial commit Now to perform the merge: Copy $ git merge fix-url # updates refs/heads/master to the hash in refs/heads/fix-url Copy $ git log --graph --oneline --all * 42c6318 (HEAD -> master) (fix-url) Fix url * 67ed5aa Use blog link * 3c201df Initial commit PUSHING # Now that we have been playing around with our local git repo for some time, let's see what happen when we push it. What is being sent to the git repo on the other side? To show this, first let me create another git repo which can be used as remote for this repo. Copy $ mkdir git-talk-2 $ cd git-talk-2 && git init --bare $ cd ../git-talk && git remote add origin ../git-talk-2 Btw, this change of adding a new remote is a config change and you can see that change in .git/config file. I'm gonna let you go look what the change was on your own. Now let's push. Copy $ git push origin master Let's see what changed in our repo. Copy --- branch 2023-07-02 15:55:25.165204941 +0530 +++ remote 2023-07-02 17:41:05.170923141 +0530 @@ -22,12 +29,18 @@ │ ├── e5 │ │ └── ec63cd761e6ab9d11e7dc2c4c2752d682b36e2 │ ├── info │ └── pack ├── ORIG_HEAD └── refs ├── heads │ ├── fix-url │ └── master + ├── remotes + │ └── origin + │ └── master └── tags It added a new refs/remotes to store the information on what all is available in different remotes. But what gets sent to the other git repo? It is everything that is in objects, and all the branches and tags under refs that you explicitly push. That is all the other git instance needs to get your entire git history. REFERENCES # * https://git-scm.com/book/en/v3/Git-Internals-Git-Objects * https://matthew-brett.github.io/curious-git/reading_git_objects.html * https://blog.meain.io/2020/bunch-of-git-stuff/ -------------------------------------------------------------------------------- Feel free to checkout the discussion at Hacker News and Lobsters. ← Home I know you need more sleep! 😴 🙌 | meain