Branches and Merging in Open Source Development

Introduction

Welcome to another week of my Open Source blog. In our Open Source Development course at Seneca Polytechnic this week, we were tasked with creating two issues in our codebase and then creating a separate branch to solve each of them. The idea is to simulate the workflow of open source projects, where multiple people work on different branches that each pertain to a different issue. The exercise is meant to get us used to fixing merge conflicts and to teach us about the different ways Git merges branches. We will talk about all of this in detail in the later sections of this blog.

Creating issues in the GitHub Repository

If you have read other posts in this series, you may know that I have been working on a command line tool, GitHub Echo, that extracts actionable insights about a GitHub repository when you pass the link to the repo as an argument.

I decided to open up two new issues, each implementing a different feature.

Add Support for Groq AI Provider

This issue was a little complicated. I first tried to use the Cohere API, but my input token count was too high and the API rejected my calls because of it. Eventually, I switched to the Groq API so that I could provide larger inputs to the LLM. This was important because I need to send the LLM a lot of data about the GitHub repo that the user wants to analyze: you can only analyze trends in something if you know a lot about its history. I also had to rework my entire file structure to make my CLI tool more scalable, and I updated the docs to reflect the change.

Ensure Proper Exit Codes and Error Messages

While I initially implemented several error handling mechanisms in my code, I noticed that there were specific scenarios where potential errors were not addressed. This oversight caused the program to exit from various points, resulting in inconsistent behavior. To improve this, my objective was to raise errors from all parts of the program and handle them in a centralized manner.

Fixing the issues one by one

Adding support for the Groq AI Provider

To accomplish this, I first branched off main and named my branch issue-45, since it pertained to issue 45 on GitHub. After adding support for the Groq AI provider on this branch, I added all my changes to the staging area and committed them. It was a single commit that had all the changes for the new feature, including the updated documentation. As mentioned earlier, I had to change most of my files, and this gave me some issues down the road. I will talk about them later in this blog.

As you can see in the picture above, we are now ahead of the main branch.

Adding proper Error Handling

In our course this week, we were told to develop these two features on separate branches, to imitate open source projects, and then to merge the two branches into main one by one. Normally, I would have merged my issue-45 branch first and then started work on this second feature. However, due to the specs of our assignment this week, I switched back to the main branch and then checked out a new branch, issue-44, to fix this issue.

The following image is the state of our git log right now. Notice how we do not see our issue-45 branch: its commit is ahead of main and is not part of the history of the branch we are on. Currently, main and issue-44 are pointing towards the same commit hash, 796ffa.

Once I completed the changes and added error handling to my tool, I added everything to the staging area and committed all the updates at once. Similar to the last change, this was a single commit that included all the changes for the new feature, along with the updated documentation. The only difference this time was that I did not have to modify most of the files; I only changed my GitHub API handling file, as it required additional error handling.

I also updated my program's entry point file, _main.py, to modify how errors are displayed to the user. Additionally, I added comments to explain that error handling is centralized and that all errors thrown by the program are caught here. This will help future programmers understand how error handling works in this project.
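To make the idea of centralized error handling concrete, here is a minimal sketch of what such an entry point could look like. It is not the actual code from GitHub Echo: the class names CliError and GitHubApiError, the function fetch_repo_data, and the exit code values are all assumptions made up for illustration.

```python
import sys


class CliError(Exception):
    """Hypothetical base error; carries the exit code the CLI should use."""

    exit_code = 1


class GitHubApiError(CliError):
    """Hypothetical error raised by the GitHub API layer."""

    exit_code = 2


def fetch_repo_data(repo_url: str) -> dict:
    # Lower layers only raise errors; they never exit the program themselves.
    if not repo_url.startswith("https://github.com/"):
        raise GitHubApiError(f"Not a GitHub repository URL: {repo_url}")
    return {"url": repo_url}


def main(argv: list) -> int:
    # The single place where errors are turned into messages and exit codes.
    try:
        if len(argv) != 1:
            raise CliError("Expected exactly one argument: the repository URL")
        fetch_repo_data(argv[0])
    except CliError as err:
        print(f"Error: {err}", file=sys.stderr)
        return err.exit_code
    return 0
```

A real entry point would finish with something like sys.exit(main(sys.argv[1:])), so that the value returned by main becomes the exit code of the process. Because every other module only raises, the program leaves from exactly one place.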

This is what the log looks like right now.

We still do not see issue-45 because issue-45 and issue-44 have now diverged. Both share a common starting point but contain completely different code at this point.

Merging Everything Together

Now that issue-45 and issue-44 are one commit ahead of main, we are good to start merging our branches into the main branch one by one.

Let us run git log on the main branch to see everything properly. To make the output more descriptive and visual, I’ll be using git log --all --graph.

You might be wondering why the commit history looks a little messed up and why we see a reference to a stash here. That is because when I was doing my work for issue-44, I initially did it on the main branch by mistake. To fix this, I stashed the changes, switched to issue-44 and applied the changes there.

Next, let us merge issue-45 into main first. The only reason I’m merging it first is that it is what we worked on before we started issue-44, and I just felt like merging it first. No special reason.

git merge issue-45

As you can see, the changes have been merged. Next, I ran the git log --all --graph command to see what was going on.

As you can see, both the main branch and the issue-45 branch point to the same commit hash, 32abf5. This situation represents a fast-forward merge. A fast-forward merge is possible when the issue-45 branch is ahead of main and no new commits have been added to main since the branching occurred, so the tip of main is an ancestor of the tip of issue-45.

In this case, we can simply update the pointer for the main branch to match the pointer of issue-45. This effectively incorporates all the changes from issue-45 into main without creating a new merge commit, maintaining a linear project history.

Fast-forward merges are advantageous as they keep the commit history cleaner and more straightforward, making it easier to understand the progression of changes.
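The pointer update described above can be sketched with a toy model in Python. This is not how Git is implemented; it only mimics the idea that a branch is a name pointing at a commit. The hashes 796ffa and 32abf5 come from this post, while the assumption that 796ffa is the direct parent of 32abf5 and all function names are mine.

```python
# Toy model: each commit maps to the list of its parent commits.
# Assumption: 32abf5 (the single issue-45 commit) was made directly on top of 796ffa.
parents = {
    "796ffa": [],
    "32abf5": ["796ffa"],
}

# A branch is just a name that points at one commit.
branches = {"main": "796ffa", "issue-45": "32abf5"}


def is_ancestor(older: str, newer: str) -> bool:
    """Return True if `older` can be reached from `newer` through parent links."""
    stack = [newer]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == older:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return False


def can_fast_forward(target: str, source: str) -> bool:
    # Fast-forward is possible only if the target tip is an ancestor of the source tip.
    return is_ancestor(branches[target], branches[source])


def fast_forward(target: str, source: str) -> None:
    if not can_fast_forward(target, source):
        raise ValueError(f"{target} has diverged from {source}")
    # No new commit is created; only the branch pointer moves.
    branches[target] = branches[source]
```

Calling fast_forward("main", "issue-45") in this model leaves both names pointing at 32abf5 and adds nothing to the parents dictionary, which mirrors the log shown earlier.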

Next up, I will try to merge my issue-44 branch into this new and updated main branch.

git merge issue-44

And sure enough, we run into a merge conflict. Git says there was a merge conflict in _main.py and that I need to fix it and then commit the result.

When I switch over to _main.py in my IDE, I see the current state of the file.

Remember when I mentioned that I changed some error-handling code in my issue-44 branch? That code conflicts with the code now in the main branch, because the issue-45 changes we just merged touched the same lines of the same file.

To resolve this, I open my merge editor and choose to accept the incoming changes from the issue-44 branch, which means I want to keep the code from that branch rather than the existing code in main.

The picture below illustrates what my _main.py file looks like right now.

After this, when I run git status, I see that all the other changes that had no conflicts have already been added to the staging area, while the _main.py file is marked as an unmerged path.

Now that the conflicts are resolved, I will use git add _main.py to add this file to the staging area, as I am satisfied with its contents.

At this point, you can see that everything is in the staging area. Next, I will commit these changes with a message stating “merge issue-44 branch.” After this commit, our main branch will point to the new merge commit, while the issue-44 branch will remain on its last commit, and issue-45 will still point to 32abf5, the commit main was fast-forwarded to.

If I run git merge again, you’ll see that everything is up to date.

This type of merge is known as a three-way recursive merge.

A three-way recursive merge is a specific type of merge strategy used by Git when merging branches. It involves three key points:

Common Ancestor: The last common commit that both branches share before they diverged.
Source Branch: The branch you are merging from (in this case, issue-44).
Target Branch: The branch you are merging into (in this case, main).

During the merge process, Git looks at these three points to determine how to combine the changes. It analyzes the differences between the target branch and the source branch relative to their common ancestor. This allows Git to intelligently merge changes, resolving conflicts when necessary. If there are no conflicting changes, it can create a new commit that reflects the combined state of both branches. In our case, there was a conflicting change so we had to do this manually.
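The decision rule behind a three-way merge can be illustrated with a deliberately simplified Python sketch. Real Git aligns changes using diffs and handles inserted and deleted lines; this toy version assumes all three versions have the same number of lines and compares them slot by slot. The function name and the sample file contents are hypothetical and are not taken from _main.py.

```python
def three_way_merge(base, ours, theirs):
    """Merge three equal-length lists of lines; return (merged, conflicts).

    `base` is the common ancestor, `ours` the target branch and
    `theirs` the source branch. Conflicting slots are set to None.
    """
    merged, conflicts = [], []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)        # both sides agree
        elif o == b:
            merged.append(t)        # only the source branch changed this line
        elif t == b:
            merged.append(o)        # only the target branch changed this line
        else:
            merged.append(None)     # both changed it differently: conflict
            conflicts.append(index)
    return merged, conflicts


# Hypothetical contents, for illustration only.
base = ["import sys", "run()", "print('done')"]
ours = ["import sys", "run(provider='groq')", "print('done')"]
theirs = ["import sys", "try: run()", "print('finished')"]

merged, conflicts = three_way_merge(base, ours, theirs)
```

In this example, the last line merges cleanly because only one side changed it, while the middle line is reported as a conflict because both sides changed it in different ways. That second case is the kind of situation that had to be resolved by hand above.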

Another question that you may have is, “Why don’t all the branches point towards the same commit, just like they did in the fast-forward merge?”

Here is an answer to that question:

Each branch in Git maintains its own commit history. When you merge a branch (like issue-44) into another (like main), the main branch will advance to a new commit that represents the merged state. However, the issue-44 branch remains unchanged and still points to its last commit before the merge.
A commit in Git is immutable and identified by a unique SHA-1 hash. When a branch is merged, Git creates a new commit that combines the changes from the merged branch. This new commit is referenced by the target branch (in this case, main), while the merged branch (issue-44) retains its original commit reference.
During a three-way merge, Git uses the latest commits from the target branch (main), the source branch (issue-44), and their common ancestor. The resulting merge commit records the two branch tips as its parents, but the source branch’s pointer does not move to this new commit. Instead, it continues to point to the last commit made on that branch before the merge.
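The same point can be shown with a small toy model in Python, assuming the history described in this post. The hashes 796ffa and 32abf5 are from the post; "issue44-tip" and "merge-commit" are placeholder names, since the post does not give those hashes, and the function name is made up for this sketch.

```python
# Toy model: commit id -> list of parent commit ids.
commits = {
    "796ffa": [],
    "32abf5": ["796ffa"],        # the issue-45 commit
    "issue44-tip": ["796ffa"],   # placeholder id for the issue-44 commit
}

# State after the fast-forward merge of issue-45 into main.
branches = {"main": "32abf5", "issue-45": "32abf5", "issue-44": "issue44-tip"}


def merge_with_commit(target: str, source: str, new_id: str) -> None:
    # A merge commit has two parents: the target tip and the source tip.
    commits[new_id] = [branches[target], branches[source]]
    # Only the target branch moves; the source branch pointer is untouched.
    branches[target] = new_id


merge_with_commit("main", "issue-44", "merge-commit")
```

After this call, main points at the new merge commit, whose parents are the tips of issue-45 and issue-44, while issue-44 and issue-45 still point where they did before.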

If you look carefully at the picture below, the latest commit references the commits that issue-45 and issue-44 are pointing to.

Oops! Forgot something..

After I was done with the merge, I realized that I forgot to document that users who want to use the Cohere API from my first feature need a Cohere API key. So, I made that change directly on my main branch and committed it. After committing this, the following is an image of the log.

Conclusion

After everything was done, I tested the tool, fixed the remaining problems, and verified that everything worked until I was fully satisfied with it. I tagged the latest change and pushed it to GitHub.

git push origin main

I also published the latest version 0.3.0 for this package on PyPI.

poetry publish --build

That concludes this week’s blog.

See ya in the next one 👋