Introduction

I started using RCS in the 1990s because version control is a dang good idea. I have had a lot of professional experience with CVS and Subversion. Eventually (in about 2008) I personally settled on Mercurial. However, the rest of the world seems to have just figured out that version control is good and, thanks to GitHub, they have all gravitated towards Git. Like this guy I do have some minor problems with Git/GitHub, etc., etc., but overall it’s fine. If you think you should use it, then you definitely should. Here are some notes to myself about Git for when I’m working with others who require it.

John Carmack tweets…

I need to work on my knee-jerk hostility to learning about source control systems and build systems, but I just prefer working on actual programming…

2017-11-11
— John Carmack

And.

Anyone that has me on too high of a pedestal should see me fumbling around with git.

2017-11-11
— John Carmack

The Normal Thing

XKCD

Here are the two commands I use 95% of the time.

git clone https://github.com/ros/ros_comm
git commit -a

The -a stages every modified or deleted file that Git already tracks, so you can skip git add for those. Brand new files still need an explicit git add, which is also how you add specific files individually to a commit.
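Here’s a little sketch to convince myself of that. It assumes a throwaway repository made with mktemp; the file names and the "Demo" identity are made up for the illustration.

```shell
# Placeholder identity so the commits work even with no global config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo" && git init -q .

echo one > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo two >> tracked.txt     # Modify a file Git already tracks.
echo new > untracked.txt    # Create a file Git has never seen.
git commit -q -a -m "Update tracked file"

# The tracked change was committed; the new file is still listed
# as untracked (the "??" prefix).
status=$(git status --porcelain)
echo "$status"
```

The status should show only untracked.txt, which means -a quietly ignored it.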

Here’s what I’m doing a lot of recently because I am often rebuilding a group project. This also hints at how to do all the other stuff.

ssh -i ~/.ssh/gh_rsa -T git@github.com  # Test my key (and maybe load it into an agent).
GIT_SSH_COMMAND="ssh -i /home/xed/.ssh/gh_rsa" git clone ssh://git@github.com/cxed/my_lil_project

This is typical. The project has been cloned locally and it is being worked on daily. Edit something and make the changes available to others.

vim ~/myproject/README.md
git commit -a
GIT_SSH_COMMAND="ssh -i /home/xed/.ssh/gh_rsa" git push origin master

What if it still really insists that you use passwords? This can happen when you clone the project using https but then want to fix something and push a commit with SSH.

I had some annoying struggles with it wanting to default to passwords, but I think this is a pretty explicit syntax that probably will be more robust for normal GitHub work. Note redefining SSH to include the special key set you want to use. Also note the protocol of the URL and the weird use of the git login name therein.

export GIT_SSH_COMMAND="ssh -i /home/xed/.ssh/special_rsa"
export REPO="customer/customer_special_project"
export GH_URL="ssh://git@github.com/${REPO}"
git clone $GH_URL

Another normal task is to get your old local copy updated with what your collaborators have checked in. Remember that there is no "update" command in Git. See this section for details on using pull instead.

git pull $GH_URL
git push $GH_URL

Commit Message Style

XKCD shows us what not to do.

XKCD-1296

I kind of like this convention system for commit messages. Start commit message with one of these.

  • feat: a new feature

  • fix: a bug fix

  • docs: changes to documentation

  • style: formatting, missing semicolons, etc; no code change

  • refactor: refactoring production code

  • test: adding tests, refactoring test; no production code change

  • chore: updating build tasks, package manager configs, etc; no production code change

It is also recommended to style the commit messages with something like this.

Add bozo detection

Avoid stuff like this.

Add bozo detection.
Added bozo detection
We're detecting bozos now

I think of it as answering the question, if this commit is applied, what will transpire? This first line is really the "subject" line of the commit message which is often all that is needed.

fix: Remove periods from commit message subject

This is a body and many commits don't really need one. Keep its lines
under 72 characters. Keep the subject under 50! Do keep a blank line
between the subject (first line) and the body if one is included. The
subject line is what one-line views such as `git log --oneline` show.

More paragraphs are ok; separate with a blank. The final section, also
optional, is the footer which can contain references to other things.
A sample footer follows this paragraph.

See also: #123, that_other_thing
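A rough sketch of checking the subject line rules mechanically. The function name check_subject is my own invention (it is not a Git command), and it only tests the two rules that are easy to automate: length and the trailing period.

```shell
# check_subject is a hypothetical helper, not part of Git.
check_subject() {
    subject=$1
    # The subject should be 50 characters or fewer.
    if [ "${#subject}" -gt 50 ]; then
        echo "too long"
        return 1
    fi
    # The subject should not end with a period.
    case $subject in
        *.) echo "trailing period"; return 1 ;;
    esac
    echo "ok"
}

check_subject "fix: Remove periods from commit message subject"
check_subject "Add bozo detection."
```

Something like this could presumably be called from a commit-msg hook, but I have not bothered to wire that up.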

Andrey’s style is interesting too. He starts messages with a symbol.

  • + some new thing or feature (code added)

  • = minor change (mostly equal to previous stuff)

  • * fix the bug (think of the asterisk as a squished bug)

  • - deprecated feature (code subtracted)

That may be a more compact approach.

Some Other Fun Things

Note that commit IDs can be shortened to the shortest unambiguous first characters.

git commit -a -m "Just like CVS."
git log
git log --stat
git diff 73af53a43bb09dcc6600478898c18cae53e57cf5 292faae8b1d912f93f3194f107acaabc6194ef96
git diff 73af 292f # Or shorten these hashes as you like/can.
git checkout b067 # Get the commit specified.
git status # Shows changed files.
git show 292faae8
git diff # Shows changes between working dir and staging area.
git diff --staged # Changes between staging and most recent commit.
git reset --hard # Ditches any changes in pwd or staging area. Caution!
git checkout master # Returns to the tip (latest commit) of master.

Ever wonder how many lines of code are in your project? Or bytes? GitHub does a crap job of sharing that with you (presumably because worrying about code bloat is not something tech bro hipsters in Silicon Valley ever do). But if you have the repo locally, this can work.

$ git ls-files | xargs wc -l
    2 README.md
  120 ashellscript.sh
  207 monitor_thething.py
  482 importantbit.py
  181 datawrangler.py
  196 actuator.py
  129 masterplan.py
 1317 total

Did you just obtain a giant sprawling repo and you need to make sense of it? My first thought was: what are the other contributors working on right now? I don’t know if I’m doing something wrong but when I clone a repo, I get all of the files unhelpfully timestamped right at the moment of cloning. Wouldn’t it make a lot more sense to have them time stamped as when they were committed? Oh well. How do you even see those commit times? This command can give you a list of all the files currently in your repo in chronological order.

git ls-tree -r --name-only HEAD | \
while read -r F; do
   echo "$(git log -1 --format="%ad" --date=iso-strict -- "$F") $F"
done | sort

If you want to temporarily mess up your tracked code, git stash can save temporary snapshots in what looks like a stack. There are git stash push (the push is optional there for that effect), git stash pop, and git stash drop commands. You can list what has been stashed with git stash list and see the changes in an entry with git stash show. Details about it can be found with git stash --help.
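A minimal sketch of the stash round trip, assuming a scratch repository; the file name notes.txt and the "Demo" identity are placeholders.

```shell
# Placeholder identity; stashing creates commits behind the scenes.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo" && git init -q .

echo stable > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo experiment >> notes.txt      # Temporarily mess up tracked code.
git stash push -q                 # Snapshot the mess and clean the working tree.
clean=$(cat notes.txt)            # Only the committed content is left.
git stash list                    # Shows the saved entry, stash@{0}.

git stash pop -q                  # Reapply the snapshot and drop it from the stack.
restored=$(tail -n 1 notes.txt)   # The experimental line is back.
remaining=$(git stash list | wc -l | tr -d ' ')
```

After the pop the stack should be empty again, so remaining counts zero entries.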

Configurations

git config --global --edit

After doing this, you may fix the identity used for this commit with:

git commit --amend --reset-author

Another thing you can do is create a file called ${MYREPO}/.git/info/attributes which contains something like this.

*.png binary
*.jpg binary
*.jpeg binary
*.zip binary
*.gz binary

In theory this should keep Git from going crazy diffing binaries. I’m not sure this is necessary. Git normally detects binary files on its own by inspecting their content, so this is mostly useful for files it guesses wrong about.
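One way to check that such an attributes file is being picked up is git check-attr. This is a sketch in a scratch repository; logo.png and main.c are hypothetical file names and do not even need to exist.

```shell
demo=$(mktemp -d) && cd "$demo" && git init -q .
mkdir -p .git/info
printf '%s\n' '*.png binary' '*.jpg binary' > .git/info/attributes

# The binary macro switches off diffing, so "diff" reports as unset.
png_diff=$(git check-attr diff -- logo.png)
# A file matching no pattern is left unspecified.
c_diff=$(git check-attr diff -- main.c)
echo "$png_diff"
echo "$c_diff"
```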

Here’s one that probably will be taken care of in sensible environments with good .bashrc files. But if the EDITOR environment variable is not set, consider this.

git config --global core.editor "vim"

Sometimes if you are unlucky you get some guff like this when you’re trying to push some changes.

$ GIT_SSH_COMMAND="ssh -p222" git push
...
remote: error: refusing to update checked out branch: refs/heads/master

This happened when just trying to coordinate two versions being worked on by two people (no GitHub nonsense) on two computers.

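I fixed that by running this cryptic nonsense in the repository being pushed to. It tells Git to treat that repository as bare, i.e. one with no checked out working files that a push could clobber.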

git config --bool core.bare true

Create A New Repository

With some extant files.

cd my_lil_project
git init
git add file1
git commit

Branching

Show the local branches; the currently checked out one is marked with an asterisk.

git branch

Show the whole branch graph topology using something like this.

git log --graph --oneline mybranch master
git log --graph --all

Start a new branch.

git branch mynewbranch
git checkout mynewbranch # Will output "Switched to branch 'mynewbranch'"

Or this is a shortened equivalent syntax.

git checkout -b mynewbranch

Merge

Ok, you’ve branched. How do you bring things back together? Check out the branch that should receive the changes and merge the other one into it.

git checkout master
git merge mynewbranch

Delete branch. This only deletes the branch name. The commits are preserved.

git branch -d mynewbranch

Merges with conflicts can be spotted when the git merge command produces a message like this.

CONFLICT (content): Merge conflict in myfile.cpp

This can leave a bunch of redundant stuff which needs to be cleaned up.

<<<<<<< HEAD
pi= 3.14159265359 # My change.
||||||| merged common ancestors
pi= 3.141 # The code that was here originally.
=======
pi= 3.141592 # Someone else's change.
>>>>>>>

Make sure when resolving conflicts to remove any of the redundant code and any of the section delimiters. Use git status to monitor conflict issues. Fix the issue by removing all the diff clutter and settling on correct content. Then do git add to put this new version into the staging area. Then git commit; you shouldn’t have to enter a commit message since it will automatically default to remind you that this commit was basically about resolving the merge problems.

Fast-Forward Merges

Many operations use merges behind the scenes. If A is an ancestor of B, then merging B into A is a "fast-forward merge": the A label can simply be moved forward to B, since B’s history already contains A and the topology points straight back to it.
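A small sketch of a fast-forward in a scratch repository. The branch name feature, the file name, and the "Demo" identity are all made up for the demo; --ff-only just makes Git refuse anything that is not a fast-forward.

```shell
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo" && git init -q .
git symbolic-ref HEAD refs/heads/master   # Call the first branch master regardless of defaults.

echo a > file.txt
git add file.txt
git commit -q -m "A"                      # Commit A, on master.

git checkout -q -b feature
echo b >> file.txt
git commit -q -a -m "B"                   # Commit B, on feature, child of A.

git checkout -q master
# A (master) is an ancestor of B (feature), so this test succeeds.
git merge-base --is-ancestor master feature && echo "fast-forward possible"

git merge -q --ff-only feature            # Just moves the master label to B.
master_id=$(git rev-parse master)
feature_id=$(git rev-parse feature)
```

Both labels should now name the same commit, and no extra merge commit gets created.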

Merge With Another Project

Let’s say you’ve started a project and then you realize someone has a project that is similar to yours with a lot of stuff you like and you want their project to be incorporated into yours.

cd MyBigProject
git remote add BigProjectTemplate https://github.com/otherguy/BigProjectTemplate
git fetch BigProjectTemplate
git merge --allow-unrelated-histories BigProjectTemplate/master
vim fixconflicts
git add newfile # Maybe created to avoid a conflict, maybe a second `README.md`.
git commit -a
GIT_SSH_COMMAND="ssh -i /home/xed/.ssh/gh_rsa" git push origin master

Later maybe the BigProjectTemplate folks have fixed some important things that you’ve pulled in. You might get a message like this.

This branch is 9 commits ahead, 15 commits behind BigProjectTemplate:master.

You need to do git push to get your 9 commits up to GitHub. Checkout your local master branch, do git pull. This will pull the 15 commits that you’re behind. Switch back to your development branch, and do git merge master. Then you’ll want to do another git push to get the merged version into your GitHub branch.

This is also the same situation when you have a collaborator who doesn’t register with your project as an official collaborator but rather keeps their own repository. Here’s an example of coordinating a merge, basically the same as shown before.

git clone https://github.com/cxed/coolproject.git
cd coolproject
git pull https://github.com/ThatGuy/coolproject.git master
# Fix any merge problems if any.
GIT_SSH_COMMAND="ssh -i /home/xed/.ssh/gh_rsa" git push origin master

Rename Files

It’s mostly as simple as taking your up-to-date repository and doing something like this.

git mv badfilename goodfilename
git commit
git push

Renaming a GitHub repository is more involved and requires admin privileges. Check the "settings" tab of the project.

What if your project is becoming more complicated and you want to push some things into subdirectories? Renaming also works for that just fine. Here I’m tracking a Python module where I started collecting so many examples using it that I wanted to separate them from the main module code.

$ mkdir examples
$ git mv location.py examples/
$ git mv timestamps.py examples/
$ git mv weather.py examples/
$ git add examples
$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        renamed:    location.py -> examples/location.py
        renamed:    timestamps.py -> examples/timestamps.py
        renamed:    weather.py -> examples/weather.py
$ git commit -a
$ git push

Note that git mv actually does the moving in your file system too.

GitHub

Where Did This Code Come From?

Ever find yourself with a GitHub repository and no idea where it came from? Obviously Git knows and here’s one way to make it tell you.

$ grep url .git/config

Email

GitHub has a nasty habit of pumping personal email addresses into the wild web. There is a "privacy" email that you can use. While logged in go to /settings/emails and enable Keep my email address private and Block command line pushes that expose my email. Then for good measure you can configure your command line local user account’s git settings like this.

git config --global user.email "cxed@users.noreply.github.com"

And check it like this.

$ git config --global user.email
cxed@users.noreply.github.com

Actually it can be a bit more involved now! If you get a message saying, push declined due to email privacy restrictions you can fix it following the advice described here. First find your magic GitHub "privacy" email address. This can be found here: https://github.com/settings/emails. Then change to it in your CLI configuration. Mine looks something like this (but this is not exactly it!).

git config --global user.email "49999999+cxed@users.noreply.github.com"
git commit --amend --reset-author

If you do that and try your push again it should now work out. Still, you pretty much should assume a complete breakdown of email privacy.

SSH

Go to your identity icon and then "Settings". Then choose "SSH and GPG keys". Then "New SSH key". When you paste the public key don’t worry about all the messy hard returns. It sensibly strips that all out. GitHub will then show you the md5 fingerprint if the key was successfully accepted. You can check locally if it matches with this.

$ ssh-keygen -l -E md5 -f /home/xed/.ssh/github_rsa
2048 MD5:b7:63:d6:03:8c:91:63:04:dd:e3:be:5a:c5:61:1e:1f xed@usb64 (RSA)

(Looks like you don’t need to have the -E md5 option now. The default hash output seems to match better with what GitHub shows.)

Once GitHub has your SSH key you can test it. Note that the user to use is git, not your real GitHub username.

:->[usb64][~/.ssh]$ ssh -T git@github.com
Host key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspxxxxxxxxxxxxxxxxxxxx
+---[RSA 2048]----+
| =+o...+=o..     |
|o++... *o .      |
|*.o.  *o.        |
|o..  ..o.= .     |
|o..  ..o.= .     |
|o..  ..o.= .     |
|OE .  . o        |
| o     .         |
|                 |
+----[SHA256]-----+
Hi chrisxed! You've successfully authenticated, but GitHub does not provide shell access.

If everything goes very well, this may even load your decrypted SSH key into an agent so you won’t have to worry about typing its passphrase again during this session.

The Problem Of Multiple Keys

I had problems because I had two keys that went to two separate GitHub accounts. This is how one can clarify which key to use.

$ ssh -i ~/.ssh/gh_rsa -T git@github.com
$ GIT_SSH_COMMAND="ssh -i /home/xed/.ssh/gh_rsa" git push origin master

Well, that’s super annoying.

Git Refuses To Use SSH

What if you set up SSH and it still refuses to work? If it insists on an HTTPS password it might be because you cloned the repo using HTTPS but then want to work with it with SSH. The cure I found was to look in .git/config. Here’s a tiny excerpt from a configuration that does work with SSH.

[remote "origin"]
    url = ssh://git@github.com/cxed/ateam

Basically replace https:// with ssh://git@.
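Instead of editing .git/config by hand, git remote set-url should make the same change. A sketch in a scratch repository, reusing the cxed/ateam URL from above; nothing here touches the network.

```shell
demo=$(mktemp -d) && cd "$demo" && git init -q .

# Pretend the project was originally cloned over HTTPS.
git remote add origin https://github.com/cxed/ateam

# Point the same remote at the SSH form of the URL.
git remote set-url origin ssh://git@github.com/cxed/ateam

url=$(git config --get remote.origin.url)
echo "$url"
```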

Creating A New Repository

It seems like to create a repository on GitHub, you have to use the web interface and do it from the web site. All subsequent operations can be done using the command line. While logged in look for the "+" sign in the top right menu. Click that and choose "New repository". Here I’m calling it "probe" (i.e. just a test); interestingly I do not have to call the working directory the same name.

$ git remote add origin https://github.com/chrisxed/probe.git
$ git remote
origin
$ git remote -v
origin  https://github.com/chrisxed/probe.git (fetch)
origin  https://github.com/chrisxed/probe.git (push)
$ git push origin master
Username for 'https://github.com': chrisxed
Password for 'https://chrisxed@github.com':

That’s how you do it for using the web authentication credentials… But I wanted to use SSH keys. Make sure you select SSH and not HTTPS from the GitHub website after you make your repository there. Git rid of the incorrect "remote" named "origin".

$ git remote remove origin
$ git remote add origin git@github.com:chrisxed/probe.git
$ git remote -v
origin  git@github.com:chrisxed/probe.git (fetch)
origin  git@github.com:chrisxed/probe.git (push)
$ git push origin master
Host key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCsxxxxxxxxxxxxxxxxxxxxx
+---[RSA 2048]----+
| =+o...+=o..     |
|o++... *o .      |
|*.o.  *o.        |
|oo.  ..o.= .     |
|.+o. .. S =      |
|.+o. .. S =      |
|.+o. .. S =      |
| o     .         |
|                 |
+----[SHA256]-----+
Counting objects: 6, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (6/6), 1.13 KiB | 0 bytes/s, done.
Total 6 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), done.
To github.com:chrisxed/probe.git
 * [new branch]      master -> master

Since the key is decrypted and loaded in the agent, it runs immediately with no more fuss.

Files From Elsewhere

You can create new files directly in the GitHub website. Or maybe someone else (or you from a different computer) committed some new stuff. Here’s how to update what’s in your local copy.

git pull

And if you want more control this is the explicit way to phrase the command.

git pull origin master

Remember that origin is the (arbitrary but conventional) name of the remote and master is the branch to acquire. Basically push and pull move things from remotes to branches and vice versa.

Forking

A fork is just a clone that GitHub makes for you on their own servers. GitHub also keeps track of the number of people who have made forks of your repository. If you don’t want to keep tentacles of the original project, just clone it locally. Then create a new project on GitHub, set a new remote for that new project and push it. Not having to do this is what forking is all about (but sometimes maybe that’s how you do want to play it).

Here’s some terminology topology where A,B,C,D are commits.

Branch
Local:
      /----C
A<--B+
      \----D
Clone (locally)
Local:       Local:
 - - - - -    - - - - -
|A<--B<--C|  |A<--B<--C|
 - - - - -    - - - - -
Clone
GitHub:      Local:
 - - - - -    - - - - -
|A<--B<--C|  |A<--B<--C|
 - - - - -    - - - - -
Fork
GitHub:      GitHub:
 - - - - -    - - - - -
|A<--B<--C|  |A<--B<--C|
 - - - - -    - - - - -

Downgrading To Older Version

A subtopic of forking is how to use an old version or "downgrading". For example, if a project compiles against the latest CUDA libraries (ahem Meshroom) and Debian is taking its careful time about adopting those, you may want the older version of the project. To do this you’ll need the commit hash of the version you want.

Start by cloning the project’s ordinary repo, which will give you the latest.

git clone https://github.com/alicevision/meshroom/

Then check out the specific version you need.

cd meshroom
git checkout f676b40

Or here is a sensible variation on the theme described here:

git checkout dev

That should be it. You can check that did what you wanted with git log.

Recovering A Deleted File

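Let’s say you accidentally delete a perfectly good repo tracked file that was minding its own business, and you committed the deletion. If you need to get it back do this. The first command finds the last commit that touched the file (the deletion). Note the circumflex in the second, which selects that commit’s parent, where the file still existed.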

$ git rev-list -n 1 HEAD -- whoops.txt
9689634848980ac73b724c759c5271a8dd30e6a1
$ git checkout 9689634848980ac73b724c759c5271a8dd30e6a1^ -- whoops.txt
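To see the whole thing work end to end, here is a sketch in a scratch repository. The file content and the "Demo" identity are placeholders; whoops.txt is the same name used above.

```shell
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo" && git init -q .

echo precious > whoops.txt
git add whoops.txt
git commit -q -m "Add whoops.txt"

git rm -q whoops.txt
git commit -q -m "Delete whoops.txt by accident"

# The last commit that touched the file is the one that deleted it.
last=$(git rev-list -n 1 HEAD -- whoops.txt)

# Its parent (the circumflex) still has the file, so take it from there.
git checkout "$last^" -- whoops.txt
content=$(cat whoops.txt)
echo "$content"
```

The restored file lands in the working directory and the staging area, ready to be committed again.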

Obtaining Only A Subdirectory

Often there’s a giant project and it will contain something like an examples directory filled with different examples all in their own directories. I don’t want all the examples I don’t want and I definitely don’t want the entire project (which may be living comfortably on my system as a distro binary). So how does one get a subdirectory?

Well… It’s ugly. Very ugly. There are three ways I have heard of.

First is the absurd export to (from?) subversion. I’m not even kidding. Check out the details here. But what if you don’t have easy access to subversion?

The second method is probably the official GitHub method. This involves doing a sparse checkout. Here is a description of the process. I find that procedure kind of unnerving because it makes configuration changes, which doesn’t seem quite right. Maybe you want a one time thing but normally you don’t want any surprising nonstandard behavior everywhere else you use GitHub. So I don’t know.

The third method is to give up. Don’t try to save bandwidth. If you’re not on a ship, you can take comfort in knowing that Microsoft will pick up the bill for moving those extra bytes.

git clone https://github.com/someone/cool_project
mv cool_project/examples/extracool cool_project-repo-extracool-example
rm -frv cool_project

If someone knows the less awful way to do this, let me know!

Adding Collaborators

On the project’s repository page, go to "Settings" (right side menu). Then "Collaborators" (left side menu). Add the GitHub username of other contributors and click "Add collaborator".

If local changes have been made and it is likely that remote changes from other collaborators have also been made, then you can safely update the local copy of the remote branch (origin/master) while leaving the local master alone.

git fetch # Updates the local copy of the remote branch (origin/master).

So if you wanted to have access to all the changes that have been applied remotely without incorporating them immediately into your current commit use fetch. Note that these are equivalent.

git pull origin master             # Is the same as the following...
git fetch origin ; git merge origin/master
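A sketch of the fetch then merge sequence with no GitHub involved. It assumes a local bare repository (hub.git) standing in for the remote and two clones with the made up names alice and bob.

```shell
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d) && cd "$root"

# hub.git plays the part of GitHub.
git init -q --bare hub.git
git --git-dir=hub.git symbolic-ref HEAD refs/heads/master
git clone -q hub.git alice 2>/dev/null   # Hide the "empty repository" warning.
cd alice
git symbolic-ref HEAD refs/heads/master
echo one > f.txt && git add f.txt && git commit -q -m "First commit"
git push -q origin master
cd .. && git clone -q hub.git bob

# A collaborator (alice) publishes another commit.
cd alice
echo two >> f.txt && git commit -q -a -m "Second commit"
git push -q origin master

# bob fetches: origin/master moves, the local master does not.
cd ../bob
git fetch -q origin
behind=$(git rev-list --count master..origin/master)
echo "master is $behind commit(s) behind origin/master"

# Merging origin/master finishes the job git pull would do in one step.
git merge -q origin/master
```

Between the fetch and the merge you can look at what is coming with something like git log master..origin/master before committing to it.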

Pull Requests

Basically I’m asking some project owner to pull in my changes. Or something like that. Probably it would have been better named merge request. There are a lot of tools on the web interface for requesting this and approving it. I don’t know exactly what the story is on the command line, but it’s probably not critical to me right now.

Operation                 Affects
vim A                     local working directory
git add A                 staging area
git commit                local master branch
git pull origin master    local master branch
git push origin master    GitHub master branch
merge alt pull request    GitHub master branch

To send a new branch to GitHub as a new branch.

git checkout -b mynewbranch # Now on a new local branch.
git push origin mynewbranch # Make branch on GitHub.

Often when you’re mucking around with some code cloned from someone else, it’s important to change the "base fork:" property. In the "Open a pull request" page, find that selector and change it to something less meddlesome to the official project (unless you really want to request that the official project pull your changes into it).

When a pull request has been made on your project, assuming no problematic conflicts, you will be able to do one of the following.

  • Create a merge commit (default)

  • Squash and merge

  • Rebase and merge

Once you’ve done the merge and seen the "Pull request successfully merged and closed" message, you can delete the branch by clicking the button that encourages you to do so.

Pull Request Conflicts

Sometimes you fork some project, clone it to your local system, make some changes, and push them back to your fork. But meanwhile, someone has made conflicting changes to the project you forked, messing up your intentions of a pull request. You may need to fetch not just your fork’s situation but also the new situation on the main project. Note I’m not calling it the "original" project because "origin" is Git’s conventional name for the remote you cloned from, which is the one used by default in push and pull. The name "upstream" is often given to the project you forked from. So to sort that out do something like this.

git branch mynewstuff
git checkout mynewstuff
# ... Make mynewstuff changes and commit them ...
git remote add upstream <source_project_URL>
git checkout master
git pull upstream master
git checkout mynewstuff
git merge master
# ... Manually resolve conflict if needed ...
git add file_fixed
git commit
git push origin mynewstuff

This will push an updated version of mynewstuff to your fork which will then stand a better chance of being accepted. Note you can’t send it directly to upstream since that is probably not yours. You can just send the "globally aware" version of your changes to your fork and that fork’s status should get updated to say, no conflicts.

Delete Repo From GitHub

Sometimes you’re just testing (or some researchers at your university are caught in a vicious turf war which turns some of your little programs into hotly contested "intellectual property" which you are no longer allowed to benefit the world by making public) and you need to just completely get rid of a repo. This kind of URL will take you to the settings page.

At the bottom of that you’ll see a red section called "Danger Zone". There you will find a button to ditch the whole mess. "This action CANNOT be undone. This will permanently delete the $MYREPONAME repository, wiki, issues, and comments, and remove all collaborator associations." You must type in the name of the repository to confirm.

And one of my least favorite things about GitHub (a complete and valid reason not to use it) is that if you accidentally check in some sensitive information, it can be damn difficult to feel confident that you have cleaned up the mess. I know that BTC pirates were crawling GitHub for AWS API keys and AWS had to start preemptively doing that too. I’ve never had that problem, but I have accidentally checked in source code that contained a personal email that I really didn’t want to make public. I just nuked the repo as described above and recreated. But there may be slightly better ways. They sure look bleak though.

GitHub CLI

It turns out that if you’re doing a bunch of fancy stuff with a bunch of other fancy people and you’re all merrily using Microsoft GitHub you will almost certainly be forced to use GitHub’s horrendous web interface, for example to generate pull requests. But there may be a way around that problem. There is an official GitHub project called GitHub CLI that is intended to be a way to avoid the horrid web mess.

Installing

To obtain it you’d think it would be only sane to build it from source. But I don’t really want Golang mucking up my hard drive, so a binary package it is! I don’t trust just putting a Microsoft repo in my apt sources. But this method seems fine.

BINPKG=https://github.com/cli/cli/releases/download/v2.8.0/gh_2.8.0_linux_amd64.tar.gz
cd /tmp ; wget -qO- $BINPKG | tar -xvzf - ; cp /tmp/gh_2.8.0_linux_amd64/bin/gh ~/bin/

Look for the latest version here. You can skip saving the man pages because they can be found here.