Hack Better with SCIENCE

Gemma Lynn / @ellotheth

Hi, I'm Gemma

I write software for a living.

Cornell University's NASA research station
Postal mailing industry
Health research
WonderMinion

Wait, people study this?

When was the last time you read a peer-reviewed study in an academic journal?

How many times have you used the same software process and seen different results?

Software engineering is not math

Finding the right formula is hard
The 'right' formula isn't always right

Software engineering research has more in common with behavioral science than computer science

Caveats

Confounds are confounding

Drawing general conclusions from empirical studies in software engineering is difficult because any process depends to a large degree on a potentially large number of relevant context variables. For this reason, we cannot assume a priori that the results of a study generalize beyond the specific environment in which it was conducted.

There's no money in replication

Economics, 2015: 38 of 67 papers could not be replicated

Medicine, 2012: 47 of 53 "landmark" papers could not be replicated

Psychology, 2015: 61 of 100 papers could not be replicated
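
The three counts above are easier to compare as percentages. A minimal sketch in Python that uses only the numbers quoted on these slides; the names `replication_failures` and `failure_rate` are illustrative and not taken from any of the cited studies.

```python
# Counts as quoted on the slides:
# (papers that could not be replicated, papers examined).
replication_failures = {
    "Economics, 2015": (38, 67),
    "Medicine, 2012": (47, 53),
    "Psychology, 2015": (61, 100),
}


def failure_rate(failed: int, total: int) -> float:
    """Return the share of papers that could not be replicated, in percent."""
    return 100.0 * failed / total


for field, (failed, total) in replication_failures.items():
    print(f"{field}: {failure_rate(failed, total):.0f}% could not be replicated")
```

Run as written, this reports roughly 57%, 89%, and 61% for the three fields.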

AND YET

We know more about economics now than we did 50 years ago
We know more about medicine now than we did 50 years ago
We know more about psychology now than we did 50 years ago
We know more about software engineering now than we did 50 years ago

You (probably) won't find universal truths in empirical software engineering research.

That's OK!

Instead, look for environments similar to your own.

OK, how do I read the research?

Anatomy of a study

Abstract	The TL;DR
Introduction	The Setup
Method	The Nitty Gritty
Results	The Raw Facts
Conclusion	The Spin

Abstract: The TL;DR

What did you study?
Why?
What did you find?

Introduction: The Setup

What's the problem, behavior, observation?
Why is it interesting?
What are you looking for?

Method: The Nitty Gritty

How did you set up the environment?
What did you measure?

Results: The Raw Facts

What were your measurements?
What might have gotten in the way of your measurements?

Conclusion: The Spin

What do the measurements mean?

Don't be linear!

You don't have to read the sections in order
You don't have to read all the sections
Peer review means it's OK if you don't understand the details

To the Ivory Tower!

Microsoft, IBM and TDD

Context

Abstract

What did you study?

Test-driven development (TDD)

Why?

Little empirical evidence supports or refutes the utility of this practice

What did you find?

Defect density of the four products decreased between 40% and 90%

The teams experienced a 15–35% increase in initial development time

Introduction

What's the observation?: One team at IBM and three teams at Microsoft, all using TDD or TDD-inspired practices.
Why is it interesting?: Each team operated in a very different context. If their results using TDD line up, we can draw conclusions about TDD as a practice.
What are you looking for?: Any indication that TDD impacts software quality.

Method

What did the teams look like?

5 to 9 people per team
3 colocated, 1 distributed
Inexperienced to very experienced

What did the projects look like?

Java, C++, C#
6 to 155 KLOC
62% to 95% unit test coverage

Results

What were the measurements?

Team	Defect density reduction	Time increase
IBM (drivers)	39%	15-20%
Microsoft (Windows)	62%	25-35%
Microsoft (MSN)	76%	15%
Microsoft (VS)	91%	20-25%
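
As a quick cross-check, the per-team figures in the table can be reduced to the ranges quoted in the abstract. A minimal sketch in Python, assuming only the numbers shown in the table; the `results` layout and the `summarize` helper are illustrative, not something the study provides.

```python
# Figures as shown in the results table: defect density reduction (%) and
# the (low, high) range of the increase in development time (%).
results = {
    "IBM (drivers)": (39, (15, 20)),
    "Microsoft (Windows)": (62, (25, 35)),
    "Microsoft (MSN)": (76, (15, 15)),
    "Microsoft (VS)": (91, (20, 25)),
}


def summarize(table):
    """Return the overall (min, max) reduction and (min, max) time increase."""
    reductions = [reduction for reduction, _ in table.values()]
    lows = [low for _, (low, high) in table.values()]
    highs = [high for _, (low, high) in table.values()]
    return (min(reductions), max(reductions)), (min(lows), max(highs))


reduction_range, time_range = summarize(results)
print(f"Defect density reduction: {reduction_range[0]}% to {reduction_range[1]}%")
print(f"Development time increase: {time_range[0]}% to {time_range[1]}%")
```

The table spans 39% to 91% for defect density and 15% to 35% for development time, which lines up with the rounded "between 40% and 90%" and "15–35%" in the abstract.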

Results

What could have gotten in the way of the measurements?

Developers using TDD might have been more motivated to produce higher quality code.

The projects developed using TDD might have been easier to develop.

Comparisons made via case studies can never be perfect due to the complex contexts of both the compared projects.

Conclusion

What do the results mean?

TDD seems to be applicable in various domains and can significantly reduce the defect density of developed software without significant productivity reduction of the development team.

Future releases of these products, as they continue using TDD, will also experience low defect densities due to the use of these test assets.

camelCase or under_scores

Abstract

What did you study?: The impact of program identifier style on human comprehension
Why?: The underlying hypothesis is that identifier style affects the speed and accuracy of comprehending source code.
What did you find?: Experienced software developers appear to be less affected by identifier style; however, beginners benefit from the use of camel casing

Introduction

What's the observation?: Program identifier names are at the core of program comprehension....The two dominant identifier styles are camel case and underscore.
Why is it interesting?: Research in cognitive psychology suggests that the use of underscores should increase readability and hence improve comprehension.
What are you looking for?: If a particular style significantly increases the speed of code comprehension, use of this style would have a tremendous impact.
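
For readers who have not seen the two styles side by side, here is a minimal sketch in Python that converts a simple identifier from one style to the other. The helper names and the sample identifier are hypothetical and do not come from the study; the sketch only handles plain identifiers and does not try to deal with acronyms such as `HTTPServer`.

```python
import re


def camel_to_underscore(name: str) -> str:
    """Convert a camelCase identifier to under_score style."""
    # Put an underscore before every uppercase letter that follows a
    # lowercase letter or digit, then lowercase the whole identifier.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def underscore_to_camel(name: str) -> str:
    """Convert an under_score identifier to camelCase style."""
    # Keep the first word as-is and capitalize each following word.
    head, *rest = name.split("_")
    return head + "".join(word.capitalize() for word in rest)


print(camel_to_underscore("defectDensity"))
print(underscore_to_camel("defect_density"))
```

The first call prints `defect_density` and the second prints `defectDensity`, the same name written in each of the two styles the study compares.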

Method

How did you set up the environment?: Groups of college students, both programmers and non-programmers, were presented with reading tasks in 5 studies.
What did you measure?: Eye tracking, response times, and responses to SAT-style reading comprehension questions.

Results

What were your measurements?

Subjects produce more accurate results using camel-case identifiers but at a cost [to] time and effort.

Expert programmers exhibit little difference in accuracy between the two styles, and through training any difference could most likely be mitigated.

Style appears to impact readability in simple tasks, but not in the context of reading programs.

While in a natural-language context underscores provide better readability, in a software context, camel casing seems to provide better readability.

Conclusions

What do the measurements mean?

The accumulated evidence leads to the conclusion that camel case is the better choice, especially for beginning programmers.

Reading natural language and source code appear to be quite different.

Attack of the Clones!

Abstract

What did you study?: The relationship between cloning and defect proneness.
Why?: Clones are generally considered bad programming practice in software engineering folklore.
What did you find?: Our findings do not support the claim that clones are really a "bad smell".

Introduction

What's the problem?

Maintenance and evolution might comprise up to 80% of the overall cost and effort.

Martin Fowler et al. suggest that code duplication or cloning is one of the major indicators of poor maintainability.

Why is it interesting?

Another body of research presents evidence that clones improve productivity.

What are you looking for?

Do clones contribute a very small proportion of bugs, or the vast majority?

Method

How did you set up the environment?

Researchers chose four major Open-Source Software projects to analyze: Apache httpd, Nautilus, Evolution, and Gimp.

What did you measure?

What is the bug to cloned code ratio?
Are there more clones in buggy code?
Is cloned code more buggy than non-cloned code?
Are scattered clones buggier than colocated clones?
Do bugs with cloned code take more effort to fix?

Results

What were your measurements?

Most bugs contained hardly any cloned code.
Clones are not a major source of bugs.
The more copies, the lower the observed defect density.
File-scattered clones seem to have lower defect density.
Bugs with high clone ratio require smaller bug fixing changes.

Results

What might have gotten in the way of your measurements?

Bugs were collected from Bugzilla only, so the bug sets may be incomplete.
An automated tool did the bug linking, so it may not be completely accurate.
Analyses were run on monthly snapshots instead of every project revision, which may have introduced some imprecision.
All the projects were written in C.

Conclusions

What do the measurements mean?: Clones smell less bad than you thought they did.

i can haz

MOAR learnings?!

Research organizations

Institute of Electrical and Electronics Engineers: http://ieee.org
Association for Computing Machinery: https://acm.org
Microsoft Research: https://research.microsoft.com
First Monday: http://firstmonday.org

Search engines

Google Scholar: https://scholar.google.com
Springer: http://link.springer.com

"Social" media

Lots of researchers host their own papers independently:

Academia: https://academia.edu
ResearchGate: https://researchgate.net

I'm totally published, yo

Papers We Love

In-person meetups: http://paperswelove.org
Repository: https://github.com/papers-we-love
Presentation videos: https://youtube.com/user/PapersWeLove
Twitter: https://twitter.com/papers_we_love

In the great outdoors

Libraries
Colleges

Go forth and SCIENCE

https://ramblinations.com/hack-better-with-science

Gemma Lynn / @ellotheth

References

Baker, M. (n.d.). First results from psychology’s largest reproducibility test. Retrieved January 15, 2016, from http://www.nature.com/news/first-results-from-psychology-s-largest-reproducibility-test-1.17433
Begley, C., & Ellis, L. M. (n.d.). Drug development: Raise standards for preclinical cancer research. Retrieved January 15, 2016, from http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
Binkley, D., Davis, M., Lawrie, D., Maletic, J. I., Morrell, C., & Sharif, B. (2012). The impact of identifier style on effort and comprehension. Empirical Software Engineering, 18(2), 219-276. https://ramblinations.com/hack-better-with-science/studies/2013 - Impact of identifier style on effort and comprehension.pdf

Chang, A. C., & Li, P. (n.d.). Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say 'Usually Not'. SSRN Electronic Journal.
Nagappan, N., Maximilien, E. M., Bhat, T., & Williams, L. (2008). Realizing quality improvement through test driven development: results and experiences of four industrial teams. Empirical Software Engineering, 13(3), 289-302. Retrieved from http://research.microsoft.com/en-us/groups/ese/nagappan_tdd.pdf
Rahman, F., Bird, C., & Devanbu, P. (2011). Clones: What is that smell? Empirical Software Engineering, 17(4-5), 503-530. https://ramblinations.com/hack-better-with-science/studies/2012 - Clones - What is that smell.pdf