The Catalyst Effect in FLOSS Repositories

Posted on April 26, 2010


In the course of my PhD studies, I proposed that when a project makes a transition from one repository to another, you could expect to see significant changes to a project’s evolutionary characteristics. Indeed, I covered this in earlier posts, discussing the transition from SourceForge to Debian. There, we saw that the number of developers and the rate of activity tend to increase significantly after the transition. The reasoning is that once a project makes such a transition, it benefits from a wider audience and greater usage, and so attracts more development effort (an argument motivated by Lehman’s work on software evolution).

To lend further empirical weight to this, my thesis extended the study to product metrics as well. I hypothesised that the transition of a project, P, into a repository with a proven capacity for attracting greater amounts of evolutionary activity would coincide with an increase in the rate of anti-regressive work done to control P’s complexity. I took a sample of six projects that began life in some arbitrary repository and were later included in the Debian distribution, and for each of them I examined the rate of anti-regressive work done to control complexity over time.

(For an explanation of “anti-regressive work” and the two measures used, see my previous post. Recall that I take each project and measure how the complexity of each of its components evolves over time.)

In this sample, five of the six projects showed an increase in the rate of anti-regressive work after being added to Debian. Taken together with the earlier study on process metrics, this points strongly to a “catalyst effect” caused by a transition between repositories. The exact figures can be found in my PhD thesis.

So I observed this phenomenon in the case of Debian, but does it generalise? There are many other repositories that incorporate existing projects, so I studied a handful of them as well. Fig 1 shows the number of commits per month on the GRASS project, which began life in its own repository and was included in Debian after its 37th month of development:

Fig 1: Commits per month on GRASS project. Dashed line denotes time of transition into Debian.

The median rate goes from 7 commits per month before inclusion to 48 afterwards.
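For readers who want to reproduce this kind of before/after comparison on their own projects, here is a minimal sketch, assuming the commit dates have already been extracted from the version-control history. The function names `commits_per_month` and `median_rates` are hypothetical and are not the tooling used in the thesis. Months with no commits are counted as zeros, and months are indexed from the month of the first commit.

```python
from collections import Counter
from datetime import date
from statistics import median


def commits_per_month(commit_dates: list[date]) -> list[int]:
    """Hypothetical helper: count commits in each month since the first commit.

    Index i of the result is the i-th month of the project's history
    (0-based); months with no commits appear as zeros.
    """
    if not commit_dates:
        return []
    start = min(commit_dates)
    # Month offset of each commit relative to the month of the first commit.
    offsets = Counter(
        (d.year - start.year) * 12 + (d.month - start.month) for d in commit_dates
    )
    n_months = max(offsets) + 1
    return [offsets.get(i, 0) for i in range(n_months)]


def median_rates(monthly: list[int], transition_month: int) -> tuple[float, float]:
    """Hypothetical helper: median commits per month before and after a transition.

    Months with index < transition_month count as "before"; the rest as "after".
    """
    before = monthly[:transition_month]
    after = monthly[transition_month:]
    if not before or not after:
        raise ValueError("transition must leave at least one month on each side")
    return median(before), median(after)
```

For a project like GRASS, included after its 37th month, the call would be `median_rates(commits_per_month(dates), 37)`. The median is a natural choice here because monthly commit counts tend to be bursty, and a few unusually busy months would distort a mean.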

Compare this with Fig 2, which shows the same measure for KMouth, a KDE project that began life in a repository of its own before being moved into the official KDE repository:

Fig 2: Commits per month on KMouth project.

The median rate goes from 40 commits per month before the transition to 67 afterwards.

We can begin to argue, then, that there is an observable catalyst effect in FLOSS repositories generally. A picture begins to emerge of some kind of evolutionary “ecosystem”, a network of FLOSS repositories, each with varying characteristics and different evolutionary effects on the software projects it incorporates. But what of the details? And what would it mean for the individual projects wanting to maximise the benefit they derive from their environment?

I will begin to bring the strands together in the next post.

Posted in: Research