The Problem with Software Analytics

Software Analytics is the marriage of data science and software engineering.  It hopes to use data generated from software and software engineering processes to provide insights for creating better software.

The following quote comes from a 2013 roundtable discussion on software analytics. All of the roundtable members are leading academics at prestigious universities, chosen because they are accomplished and know the field.  Now, on to the quote.

Modern software services such as GitHub, BitBucket, Ohloh, Jira, FogBugz, and the like employ wide use of visualization and even bug effort estimation. We can pat ourselves on the backs even if those developers never read a single one of our papers.

Here is the source in IEEE Computer (which you most likely cannot access unless you are an academic): “Roundtable: What’s Next in Software Analytics”. For non-academics, an InfoQ reprint is available free online.

The academic research community cannot take credit for what GitHub, BitBucket, and others have done.  Yes, that community is doing some excellent work, but most software practitioners never see it because the research is hidden in academic journals. The advancements in industry and academia might have occurred simultaneously, but there is no clear causal relationship between them.  Unfortunately, the academic research is not getting into the hands of software practitioners.

I would like to think the target audience of software engineering research would be software engineers, project managers, and developers. However, as this quote points out, those practitioners hardly ever see the research. If the research does not reach the intended audience, then there is a clear problem.  A problem that needs to be fixed.

Unfortunately, I do not yet know what the fix is. If you have any ideas, please leave a comment below.

If there is enough interest, maybe I will start something (I just don’t know yet what that something is).


5 responses to “The Problem with Software Analytics”

  1. Hai

    I am thinking of starting something that uses historical data and
    machine learning techniques to predict effort, the expected number of
    bugs, and so on. Let’s keep in touch.

    1. Ryan Swanstrom

      True, that makes sense. I am just not sure how many people actually understand the problem.


    2. Ray Li

      Number of bugs is a tricky metric. Here’s an interesting Stack Overflow thread describing why:

      The key is that bug count isn’t indicative of software quality.

      Bugs can impact software in many different ways. Some bugs can be ignored, but others might make the software unusable.

      By counting bugs, we’ve hidden one important aspect of bugs… the bugs’ impact on a user.

    3. Ray Li

      Number of bugs is a tricky metric. Here’s a Stack Overflow thread that describes why:

      The key is that number of bugs is not indicative of software quality.

      Number of bugs summarizes bugs, but also hides the most important aspect of any bug… its impact on a user.

      For example:
      100 cosmetic bugs != 1 crashing bug

      1. Ryan Swanstrom

        Thanks for commenting. I agree. Bugs are not a perfect measurement. However, bugs can be assigned a severity. This addition makes the measurement much more usable, but still not perfect. Few metrics ever are perfect, but number of bugs along with severity can be used as a measurement for quality. You can probably think of other possibilities as well.
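
The severity idea from this thread can be sketched in a few lines of Python. This is only an illustration: the severity labels, the weights, and the `weighted_bug_score` function are hypothetical and do not come from the roundtable or from any tool mentioned above. The point is simply that a raw bug count and a severity-weighted score can rank two projects in opposite order.

```python
from collections import Counter

# Hypothetical severity weights, chosen only for illustration.
# A real team would have to calibrate these against user impact.
SEVERITY_WEIGHTS = {"cosmetic": 1, "minor": 5, "major": 25, "crash": 200}


def weighted_bug_score(severities):
    """Sum the weight of every bug's severity; a higher score is worse."""
    counts = Counter(severities)
    return sum(SEVERITY_WEIGHTS[label] * n for label, n in counts.items())


# Two hypothetical projects: one with many cosmetic bugs, one with a single crash.
cosmetic_heavy = ["cosmetic"] * 100
single_crash = ["crash"]

for name, bugs in [("cosmetic_heavy", cosmetic_heavy), ("single_crash", single_crash)]:
    print(name, "count:", len(bugs), "weighted score:", weighted_bug_score(bugs))
```

With these made-up weights, the project with 100 cosmetic bugs has the higher count but the lower weighted score. That is one way to read "100 cosmetic bugs != 1 crashing bug", though the result depends entirely on how the weights are chosen.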

