Hack Better

with

SCIENCE

Gemma Lynn / @ellotheth

Hi, I'm Gemma

I write software for a living.

  • Cornell University's NASA research station
  • Postal mailing industry
  • Health research
  • WonderMinion

Wait, people study this?

When was the last time

you read a

peer-reviewed study

in an

academic journal?

How many times

have you used

the same software process

and seen

different results?

Software engineering is not math

  • Finding the right formula is hard
  • The 'right' formula isn't always right

Software engineering research

has more in common with

behavioral science

than

computer science

Caveats

Confounds are confounding

Drawing general conclusions from empirical studies in software engineering is difficult because any process depends to a large degree on a potentially large number of relevant context variables. For this reason, we cannot assume a priori that the results of a study generalize beyond the specific environment in which it was conducted.

There's no money in replication

    Economics, 2015: 38 of 67 papers could not be replicated
    Medicine, 2012: 47 of 53 "landmark" papers could not be replicated
    Psychology, 2015: 61 of 100 papers could not be replicated
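
To put those counts on a common scale, here is a quick sketch that turns each one into a failure rate. The counts are copied from the slide; the dictionary layout and function name are only illustrative.

```python
# Counts from the slide: (field, year) -> (papers not replicated, papers tested)
replication_failures = {
    ("Economics", 2015): (38, 67),
    ("Medicine", 2012): (47, 53),
    ("Psychology", 2015): (61, 100),
}


def failure_rate(failed, total):
    """Share of papers that could not be replicated, as a percentage."""
    return 100 * failed / total


for (field, year), (failed, total) in replication_failures.items():
    rate = failure_rate(failed, total)
    print(f"{field} ({year}): {rate:.0f}% could not be replicated")
```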

AND YET

  • We know more about economics now than we did 50 years ago
  • We know more about medicine now than we did 50 years ago
  • We know more about psychology now than we did 50 years ago
  • We know more about software engineering now than we did 50 years ago

You (probably) won't find universal truths in empirical software engineering research.

That's OK!

Instead, look for environments similar to your own.

Ok, how do I read the research?

Anatomy of a study

Abstract: The TL;DR
Introduction: The Hook
Method: The Nitty Gritty
Results: The Raw Facts
Conclusion: The Spin

Abstract: The TL;DR


  • What did you study?
  • Why?
  • What did you find?

Introduction: The Hook


  • What's the problem, behavior, observation?
  • Why is it interesting?
  • What are you looking for?

Method: The Nitty Gritty


  • How did you set up the environment?
  • What did you measure?

Results: The Raw Facts


  • What were your measurements?
  • What might have gotten in the way of your measurements?

Conclusion: The Spin

  • What do the measurements mean?

Don't be linear!

  • You don't have to read the sections in order
  • You don't have to read all the sections
  • Peer review means it's ok if you don't understand the details

To the Ivory Tower!

Microsoft, IBM and TDD

Context

Abstract

What did you study?

Test-driven development (TDD)
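
If the practice is new to you, here is a minimal sketch of one test-first cycle. It is not taken from the study: `normalize_tag` and its rules are hypothetical, chosen only to show the order of work (test first, then just enough code to pass).

```python
import unittest


# Step 1: write the tests first. Until normalize_tag exists, they fail.
class TestNormalizeTag(unittest.TestCase):
    def test_strips_whitespace_and_lowercases(self):
        self.assertEqual(normalize_tag("  Python "), "python")

    def test_empty_tag_is_rejected(self):
        with self.assertRaises(ValueError):
            normalize_tag("   ")


# Step 2: write just enough code to make the tests pass.
def normalize_tag(tag):
    cleaned = tag.strip().lower()
    if not cleaned:
        raise ValueError("tag must not be empty")
    return cleaned


# Step 3: rerun the tests, then refactor with them as a safety net.
```

Saved as a module, the tests can be run with `python -m unittest`.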

Why?

Little empirical evidence supports or refutes the utility of this practice

What did you find?

Defect density of the four products decreased between 40% and 90%

The teams experienced a 15–35% increase in initial development time

Introduction

What's the observation?

One team at IBM and three teams at Microsoft, all using TDD or TDD-inspired practice.

Why is it interesting?

The teams operated in very different contexts. If their results using TDD line up, we can draw conclusions about TDD as a practice.

What are you looking for?

Any indication that TDD impacts software quality.

Method

What did the teams look like?
  • 5 to 9 people per team
  • 3 colocated, 1 distributed
  • Inexperienced to very experienced
What did the projects look like?
  • Java, C++, C#
  • 6 to 155 KLOC
  • 62% to 95% unit test coverage

Results

What were the measurements?
Team                  Defect density reduction   Time increase
IBM (drivers)         39%                        15-20%
Microsoft (Windows)   62%                        25-35%
Microsoft (MSN)       76%                        15%
Microsoft (VS)        91%                        20-25%
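
The ranges quoted in the abstract can be checked against the per-team figures. A minimal sketch, with the numbers copied from the table and the tuple layout chosen only for illustration:

```python
# (team, defect density reduction %, (min, max) development time increase %)
results = [
    ("IBM (drivers)", 39, (15, 20)),
    ("Microsoft (Windows)", 62, (25, 35)),
    ("Microsoft (MSN)", 76, (15, 15)),
    ("Microsoft (VS)", 91, (20, 25)),
]

reductions = [reduction for _, reduction, _ in results]
time_lows = [low for _, _, (low, _) in results]
time_highs = [high for _, _, (_, high) in results]

# Overall ranges across all four teams
defect_range = (min(reductions), max(reductions))
time_range = (min(time_lows), max(time_highs))

print(f"Defect density reduction: {defect_range[0]}% to {defect_range[1]}%")
print(f"Development time increase: {time_range[0]}% to {time_range[1]}%")
```

The table's 39% to 91% is what the abstract rounds to "between 40% and 90%"; the 15% to 35% time increase matches exactly.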

Results

What could have gotten in the way of the measurements?

Developers using TDD might have been more motivated to produce higher quality code.

The projects developed using TDD might have been easier to develop.

Comparisons made via case studies can never be perfect due to the complex contexts of both the compared projects.

Conclusion

What do the results mean?

TDD seems to be applicable in various domains and can significantly reduce the defect density of developed software without significant productivity reduction of the development team.

Future releases of these products, as they continue using TDD, will also experience low defect densities due to the use of these test assets.

camelCase or under_scores

Abstract

What did you study?

The impact of program identifier style on human comprehension

Why?

The underlying hypothesis is that identifier style affects the speed and accuracy of comprehending source code.

What did you find?

Experienced software developers appear to be less affected by identifier style; however, beginners benefit from the use of camel casing

Introduction

What's the observation?

Program identifier names are at the core of program comprehension. ... The two dominant identifier styles are camel case and underscore.
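
To make the two styles concrete, here is a small sketch that converts an identifier from one to the other. The helper names are made up for illustration, and the camel-case splitter is deliberately simple: it does not try to break up runs of capitals such as acronyms.

```python
import re


def camel_to_underscore(name):
    """Convert a camelCase identifier to under_score style."""
    # Put an underscore before each capital that follows a lowercase
    # letter or digit, then lowercase the whole identifier.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def underscore_to_camel(name):
    """Convert an under_score identifier to camelCase style."""
    first, *rest = name.split("_")
    return first + "".join(word.capitalize() for word in rest)


print(camel_to_underscore("moveSourceFile"))
print(underscore_to_camel("move_source_file"))
```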

Why is it interesting?

Research in cognitive psychology suggests that the use of underscores should increase readability and hence improve comprehension.

What are you looking for?

If a particular style significantly increases the speed of code comprehension, use of this style would have a tremendous impact.

Method

How did you set up the environment?

Groups of college students, both programmers and non-programmers, were presented with reading tasks in 5 studies.

What did you measure?

Eye tracking, response times, and responses to SAT-style reading comprehension questions.

Results

What were your measurements?

Subjects produce more accurate results using camel-case identifiers but at a cost [to] time and effort.

Expert programmers exhibit little difference in accuracy between the two styles, and any difference could most likely be mitigated through training.

Style appears to impact readability in simple tasks, but not in the context of reading programs.

While in a natural-language context underscores provide better readability, in a software context, camel casing seems to provide better readability.

Conclusions

What do the measurements mean?

The accumulated evidence leads to the conclusion that camel case is the better choice, especially for beginning programmers.

Reading natural language and source code appear to be quite different.

Attack of the Clones!

Abstract

What did you study?

The relationship between cloning and defect proneness.

Why?

Clones are generally considered bad programming practice in software engineering folklore.

What did you find?

Our findings do not support the claim that clones are really a "bad smell".

Introduction

What's the problem?

Maintenance and evolution might comprise up to 80% of the overall cost and effort.

Martin Fowler et al. suggest that code duplication or cloning is one of the major indicators of poor maintainability.

Why is it interesting?

Another body of research presents evidence that clones improve productivity.

What are you looking for?

Do clones contribute a very small proportion of bugs, or the vast majority?

Method

How did you set up the environment?

Researchers chose four major open-source software projects to analyze: Apache httpd, Nautilus, Evolution, and GIMP.

What did you measure?
  • What is the bug to cloned code ratio?
  • Are there more clones in buggy code?
  • Is cloned code more buggy than non-cloned code?
  • Are scattered clones buggier than colocated clones?
  • Do bugs with cloned code take more effort to fix?
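
Two of those questions come down to simple ratios. The sketch below shows one plausible way to compute them. The definitions are an assumption for illustration, not necessarily the ones the researchers used, and the line numbers are made up.

```python
def clone_ratio(changed_lines, cloned_lines):
    """Fraction of a bug fix's changed lines that fall inside cloned code."""
    changed = set(changed_lines)
    if not changed:
        return 0.0
    return len(changed & set(cloned_lines)) / len(changed)


def defect_density(bug_count, lines_of_code):
    """Bugs per thousand lines of code (KLOC)."""
    return 1000 * bug_count / lines_of_code


# Hypothetical bug fix touching lines 10-13, where lines 12-14 are cloned
print(clone_ratio({10, 11, 12, 13}, {12, 13, 14}))

# Hypothetical: 3 bugs found in 1,500 lines of cloned code
print(defect_density(3, 1500))
```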

Results

What were your measurements?
  • Most bugs contained hardly any cloned code.
  • Clones are not a major source of bugs.
  • The more copies, the lower the observed defect density.
  • File-scattered clones seem to have lower defect density.
  • Bugs with high clone ratio require smaller bug fixing changes.

Results

What might have gotten in the way of your measurements?
  • Bugs were collected from Bugzilla only, so the bug sets may be incomplete.
  • An automated tool did the bug linking, so it may not be completely accurate.
  • Analyses were run on monthly snapshots instead of every project revision, which may have introduced some imprecision.
  • All the projects were written in C.

Conclusions

What do the measurements mean?

Clones smell less bad than you thought they did.

i can haz

MOAR learnings?!

Research organizations

Search engines

"Social" media

Lots of researchers host their own papers independently:

I'm totally published, yo

Papers We Love

In the great outdoors

  • Libraries
  • Colleges

Go forth and SCIENCE

https://ramblinations.com/hack-better-with-science

Gemma Lynn / @ellotheth
