Continuing the Empirical Results: Anti-Regressive Work

Posted on April 19, 2010


It has been a while since I wrote about my research into FLOSS, for which there are a few reasons. When last I wrote I was approaching my PhD defence, and for the whole PhD process to come to an end I had to wait until January this year, when my thesis was officially approved. Throughout all this time, I was making the move to Freie Universität Berlin (moving countries is more time-consuming and stressful than you think, but thankfully it has been very rewarding so far).

Now that I am settled in Berlin and my thesis is done and dusted (and available to all for deeper reading), I’ve decided to cover the remaining parts of the research. Previous posts in the category of ‘Research’ have discussed the earlier parts.

I left it, all the way back in September 2009, having discussed various process measures of six popular FLOSS repositories and demonstrated that there were both significant differences and notable similarities amongst these repositories. Debian projects tended to attract more developers and a greater amount of effort than GNOME and KDE projects (these two repos had similar values to each other). In turn, this pair attracted more developers and effort than RubyForge, Savannah and SourceForge projects (which also shared similarities), hence:

\{D\} > \{G,K\} > \{R, Sa, Sf\}

(which is a blatant excuse to try out the \LaTeX function in WordPress).

The next test was whether similar findings could be observed in the product metrics. The hypothesis was that there would be evidence of anti-regressive work in the same repositories following the same pattern. Anti-regressive work would take the form of reductions in software complexity over successive versions; the greater the proportion of reductions over time, the more anti-regressive work done.

For example, say we have two versions of project P: v1 and v2 (where v2 is a later version than v1). For each function in v1 we measure the cyclomatic complexity (Mc), then compare it to the Mc of the same function in v2. Every time it is lower in v2 than in v1, this is counted as an instance of anti-regressive work. We can obtain a proportion by dividing the total number of reductions by the total number of functions.
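To make the calculation concrete, here is a minimal Python sketch of the proportion for a single pair of versions. The function name, the dictionary layout and the toy values are all hypothetical, made up for illustration rather than taken from the thesis. It also assumes that "the total number of functions" means the functions present in both versions, which the description above does not pin down.

```python
def anti_regressive_proportion(older, newer):
    """Fraction of functions whose metric value dropped between two versions.

    `older` and `newer` map function names to a metric value
    (for example, cyclomatic complexity).

    Assumption: only functions present in both versions are counted,
    so the denominator is the number of shared functions.
    """
    shared = older.keys() & newer.keys()
    if not shared:
        return 0.0
    reductions = sum(1 for name in shared if newer[name] < older[name])
    return reductions / len(shared)


# Toy values, invented for illustration only.
v1 = {"parse": 12, "render": 7, "save": 4, "load": 9}
v2 = {"parse": 8, "render": 7, "save": 5, "load": 6}

# "parse" and "load" got simpler, "render" is unchanged, "save" got worse.
proportion = anti_regressive_proportion(v1, v2)
print(proportion)  # 0.5
```

Two of the four functions have a lower value in v2, so the proportion of anti-regressive work for this pair is 0.5.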

This was done for five projects from each of Debian, GNOME, KDE, Savannah and SourceForge. Each project had code snapshots taken at three points in its history, and the anti-regressive work done to cyclomatic complexity and functional coupling was measured as per the example above. We end up with a results table of this form:

     Complexity               Coupling
v1 -> v2    v2 -> v3    v1 -> v2    v2 -> v3

In each cell of the table there is a set of proportions (for the exact values, refer to my thesis, or the paper “Structural Complexity and Decay in FLOSS Systems: An Inter-Repository Study.” Proceedings of the 13th European Conference on Software Maintenance and Reengineering 2009 — available at all good outlets).
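As a rough sketch of how one project's contribution to such a table could be assembled, the Python below computes the proportion of reduced values for each metric and each pair of consecutive snapshots. The names `proportion_reduced` and `results_row`, the data layout and the numbers are hypothetical, and the denominator is assumed to be the functions present in both snapshots. None of this reflects the actual tooling or data used in the study.

```python
def proportion_reduced(older, newer):
    """Share of functions (present in both snapshots) whose value went down."""
    shared = older.keys() & newer.keys()
    if not shared:
        return 0.0
    return sum(1 for f in shared if newer[f] < older[f]) / len(shared)


def results_row(snapshots, metrics=("complexity", "coupling")):
    """Build one project's cells, keyed by (metric, transition label).

    `snapshots` is a list ordered from oldest to newest; each entry maps
    a metric name to a {function name: value} dictionary.
    """
    row = {}
    for metric in metrics:
        for i in range(len(snapshots) - 1):
            label = f"v{i + 1} -> v{i + 2}"
            row[(metric, label)] = proportion_reduced(
                snapshots[i][metric], snapshots[i + 1][metric]
            )
    return row


# Toy snapshots for one imaginary project; the values are invented.
snapshots = [
    {"complexity": {"f": 10, "g": 4}, "coupling": {"f": 3, "g": 2}},
    {"complexity": {"f": 6, "g": 4}, "coupling": {"f": 3, "g": 1}},
    {"complexity": {"f": 6, "g": 3}, "coupling": {"f": 2, "g": 0}},
]

row = results_row(snapshots)
for (metric, transition), value in row.items():
    print(f"{metric:<10} {transition}: {value:.2f}")
```

Collecting one such row per project gives the set of proportions that sits in each cell for a repository.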

What we find is that the average anti-regressive work performed in Debian, GNOME, and KDE projects far outweighs that done in Savannah and SourceForge projects. Furthermore, there is an arguable, slight improvement in Debian projects over GNOME and KDE projects. The pattern still follows that observed above:

\{D\} > \{G,K\} > \{R, Sa, Sf\}

(Mmm, I like \LaTeX maths.)

Even if one does not accept a division between Debian on the one hand and GNOME and KDE on the other, the results so far still show a pattern like this one:

I’ll be explaining and expounding on this diagram, and the terms within, later…

Posted in: Research