The Bus Factor

From Wikipedia: The bus factor is a measurement of the risk resulting from information and capabilities not being shared among team members, derived from the phrase "in case they get hit by a bus". It is also known as the bus problem, truck factor, bus/truck number or circus factor.

Motivation

Every company I've worked at (structural and software) has at some point raised the issue of the "bus factor" in managing project development.

As a structural engineer, this was immensely difficult to estimate because our deliverables were spread across so many employees, and documentation was scarce. The only time it became evident was after someone had quit and an urgent RFI (request for information) arrived six months later on their calculation package (though often the official calculation package would carry the EOR's (engineer of record's) name, rather than that of the design engineers directly responsible for the calculations). Following such incidents there would be promises of better documentation, which would invariably fall by the wayside as all team members migrated to new projects without any debriefing. I've seen 100% turnover in design engineers on long-term projects, so this is one heck of an antipattern.

As a software engineer, I see a lot of parallels in the industry, but by the nature of the work, shipped code is a deliverable that offers one way to measure the bus factor. At least, that's what a number of researchers have examined, including this paper, which has gathered quite a number of citations (156 according to Google Scholar!) since it was first published in 2016 (with a preprint made available in 2015). Shae sent me the paper, and once we discovered that the original data and source code were readily available, it made the perfect weekend project to at least get an idea of some interesting open source metrics.

The paper relies on the concept of degree of authorship (DOA), as calculated by the following formula:
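
DOA(d, f) = 3.293 + 1.098 · FA(d, f) + 0.164 · DL(d, f) − 0.321 · ln(1 + AC(d, f))

Here FA(d, f) is 1 if developer d created file f (first authorship) and 0 otherwise, DL(d, f) is the number of changes d made to f (deliveries), and AC(d, f) is the number of changes made to f by all other developers (acceptances). DOA values are then normalized per file, and a developer only counts as an author of a file if their normalized DOA clears a threshold (0.75 in the paper).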

The estimation of the bus factor "relies on a coverage assumption: a system will face serious delays or will be likely discontinued if its current set of authors covers less than 50% of the current set of files in the system". So, starting from the author sets derived from the DOA, the algorithm greedily removes authors until fewer than 50% of the files still have at least one author; the number of authors removed is the estimated truck factor.
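
To make the pruning loop concrete, here's a minimal shell sketch of the greedy step, assuming a precomputed authors.tsv with one tab-separated file/author pair per line (the file name and format are my stand-ins, not the tool's own):

```bash
# Greedy truck-factor sketch over a hypothetical authors.tsv
# holding one tab-separated "file<TAB>author" pair per line.
total=$(cut -f1 authors.tsv | sort -u | wc -l)  # files in the system
cp authors.tsv remaining.tsv
tf=0
# While at least 50% of the files still have an author, remove the
# author who currently covers the most files.
while (( $(cut -f1 remaining.tsv | sort -u | wc -l) * 2 >= total )); do
  top=$(cut -f2 remaining.tsv | sort | uniq -c | sort -rn | head -1 | sed 's/^ *[0-9]* //')
  awk -F'\t' -v a="$top" '$2 != a' remaining.tsv > tmp.tsv && mv tmp.tsv remaining.tsv
  tf=$((tf + 1))
done
echo "TF = $tf"
```

In the real tool the hard part is producing those author sets from the normalized DOA; the pruning loop itself is simple.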

Methods

The first step was to see if the source code would build at all. The algorithm itself hasn't changed since 2018, but I rarely build anything with Java. Fortunately, the README instructions worked, for the most part. The README also shows Docker commands alongside the local build instructions, but I opted to just do a local build.

Some of the commands were finicky about working directories. In particular, commit_log_script.sh needs to be run from inside the scripts directory; otherwise the relative path to log.awk won't resolve and you'll get this error:

awk: can't open file /Users/mclare/workspaces/Truck-Factor/gittruckfactor/log.awk
 source line number 1 source file /Users/mclare/workspaces/Truck-Factor/gittruckfactor/log.awk
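
The fix is just to change into that directory first (assuming the repo's scripts directory sits at the top level):

```bash
# Run from inside the scripts directory so the relative path to log.awk resolves.
cd Truck-Factor/scripts
./commit_log_script.sh ...   # arguments as per the README
```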

Then, to run the program itself:

cd Truck-Factor/gittruckfactor
mvn package
java -jar ./target/gittruckfactor-1.0.jar ~/workspaces/numpy ~/workspaces/numpy

The output (if successful) will look something like this:

TF = 9 (coverage = 48.74%)
TF authors (Developer;Files;Percentage):
Charles Harris;206;15.28
Bas van Beek;204;15.13
Sebastian Berg;142;10.53
Sayed Adel;138;10.24
Travis Oliphant;116;8.61
Rohit Goswami;87;6.45
Mateusz Sokół;82;6.08
David Cournapeau;75;5.56
Matti Picus;70;5.19

After trying this on a single package, I wanted to run it against all the repos in the original paper, which involved GNU parallel and a few other command-line tools.

I figured out all the repos from the paper using the accompanying website, which loads interactive D3 graphics based on downloaded CSVs. All I needed to do was extract the first column of the loaded CSV, which I saved in a file (repo_list.txt) for later use.
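
That extraction is a one-liner; something along these lines, with the CSV filename as a stand-in for whatever the site actually serves:

```bash
# Keep only the first column (the repo name) of the downloaded CSV.
cut -d, -f1 downloaded_data.csv > repo_list.txt
```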

Cloning Repos

parallel -j 8 git clone ::: $(cat ../meta/repo_list.txt)

These repos are about a 64 GB download!

Cloning the repos was resource-intensive. Despite passing a job count of 8, the cloning/building process ate almost all of my CPU resources immediately.

HTOP Output

Running the analysis

ls ../cloned_repos | xargs -I {} echo "java -jar ./target/gittruckfactor-1.0.jar /Users/mclare/Truck-Factor/cloned_repos/{} {} > results_linguist/{}.txt" | parallel -j 8

I recommend running pipelines like this with `echo` first, to double-check that the generated commands are what you expect.
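
Here that just means leaving off the final stage, so the generated commands are printed rather than executed:

```bash
# Dry run: print the commands that parallel would execute.
ls ../cloned_repos | xargs -I {} echo "java -jar ./target/gittruckfactor-1.0.jar /Users/mclare/Truck-Factor/cloned_repos/{} {} > results_linguist/{}.txt"
```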

Running this took only about 4 minutes, thanks to the parallelization (one repo, platform_framework_base, lagged the rest by about 2 minutes).

Initial Research Questions

In the interest of finishing this in the two-day time frame, the initial research questions for this exploration were as follows:

- How have the truck factors of the repos from the original paper changed since 2016?
- How does filtering the file list with linguist change the computed truck factor and coverage?

Results

When running the analysis with linguist (GitHub's library for classifying files and flagging vendored code and documentation), I did see some significant changes in both code coverage and the truck factor itself. Overall, I expected linguist to reduce the bus factor, since it should remove non-critical files such as documentation. However, in a few cases the linguist-filtered analysis didn't return any bus factor at all, and in other cases the bus factor actually increased. This was surprising to me, and merits further analysis of what linguist is doing, and which files it's removing in those repos.

I expected that the bus factor would decrease in many of the tracked repos. The past decade has shown how difficult it is to be an open source maintainer, especially given the current economic climate. The most surprising change was in the Linux source repo: the original paper reported a bus factor of 57, which dropped to 12 in my initial analysis and to 8 in the linguist analysis. About 30% of repos saw no change at all (mostly those with a bus factor of 1), 15% increased by 1 (though those started at TF 1-3), and 20% decreased by 1 or more.

You can explore the results yourself by selecting two different data sets in the dropdowns.

The following grouped bar chart compares the three data sets; it also suggests there hasn't been much overall improvement in reducing the number of low-bus-factor projects.

Further Work

This is really just an initial exploration into how tools like this can be used to assess the health of open source projects.

One missing aspect that jumped out at me is that this only looks at authorship as a metric, rather than review. One would hope that for significant changes the code review process results in shared knowledge, but I don't know whether that is captured, or can easily be captured, from git logs.
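
Some of it can be mined from commit messages when a project records review trailers (the Linux kernel's Reviewed-by: lines being the classic example). A rough sketch, with the repo path as a placeholder:

```bash
# Count Reviewed-by trailers per reviewer (requires a reasonably recent git).
git -C /path/to/repo log --format='%(trailers:key=Reviewed-by,valueonly)' \
  | grep -v '^$' | sort | uniq -c | sort -rn | head
```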

Other research questions for another day:

Other notes