Debian vs. SourceForge – Round 2

Posted on February 10, 2009


And so, we revisit the posers put up in a previous post:

  1. Are Debian’s evolutionary characteristics significantly different to those of SourceForge?
  2. Does Debian act as a catalyst?

To answer these questions, we took a closer look at the software inside them. I’ll briefly explain the method here, but details of the steps will be part of later posts in the “Research Methods” strand.

We chose a mutually exclusive sample of 50 packages from Debian and 50 projects from SourceForge.  In both cases they were taken from the pool of “stable” projects only. They were all downloaded and each project’s activity was extracted from their version control system (using log commands) and recorded in a file. Then we delved into our little toolbox and used some nifty tools to extract the information we needed, that information being the project’s:

  • Age (time between first and last commit to the version control system)
  • Size (in lines of code)
  • Number of developers (monthly average)
  • Number of commits (monthly average)

Each attribute can be aggregated from the 50 projects into a summary value for the repository. So, for example, we can take the ages of the 50 Debian projects and use them to get a mean or a median age. If we do the same thing for SourceForge we can compare them.

And that’s just what we did.

And here’s just what we found:

Boxplots of measured attributes

Boxplots of measured attributes

Using statistical significance testing (again, I’ll cover this in a “Research Methods” post) we found that Debian projects had larger values for each attribute, i.e. they were older, larger, and attracted more developers who peformed a greater amount of work, all to a significant degree.

This leads us to our second question, is Debian responsible? Is it somehow a driver for these larger values? Our answer to this question comes in round 3.

Posted in: Research