In the A16z podcast episode on “Feedback Loops” they talk about the controversial, and frankly just tricky, topic of measuring productivity. I found the whole episode interesting, but you might want to listen to the 5-10 minutes starting at about 20:00, or just read my notes and trust I’ve heard them right 😃
My notes:
- Don’t use lines of code to measure productivity; you end up with pointless lines of code
- Don’t use agile velocity (i.e. story point burn-down); this is more a measure of effort for capacity planning purposes (and it can be gamed)
- The interviewees (Nicole Forsgren and Jez Humble) recommend:
  - Lead time (code commit to code deploy). They describe this as an “outcome” rather than an “output”. I’m not sure I entirely follow the distinction, though I do see it as a measure of the whole team’s ability to deliver value to a client (assuming value is what the code commit delivers)
  - Release frequency
  - Time to restore
  - Change fail rate
- …but Nicole and Jez admit there are gaps here in measuring whether the change delivers value (i.e. effectiveness, efficiency, customer satisfaction, delivering mission goals)
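To make the four recommended measures concrete, here is a minimal Python sketch of how they might be calculated from a list of releases. Nothing in it comes from the podcast: the `Deployment` record, its field names and the `delivery_metrics` function are all hypothetical, time to restore is assumed to run from deploy to recovery, and what counts as a “failed” change is left for the team to decide.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """One production release (hypothetical record, for illustration only)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # the team decides what counts as a fail
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def _hours(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


def delivery_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Summarise lead time, release frequency, time to restore and change fail rate."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    lead_times = [_hours(d.committed_at, d.deployed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: restore time is measured from the failing deploy to recovery.
    restore_times = [
        _hours(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "median_lead_time_hours": median(lead_times),
        "releases_per_day": len(deployments) / period_days,
        "change_fail_rate": len(failures) / len(deployments),
        # None when nothing failed (or nothing has been restored yet).
        "median_time_to_restore_hours": median(restore_times) if restore_times else None,
    }
```

Calling `delivery_metrics(deployments, period_days=7)` on a week of releases gives all four numbers side by side. None of them says whether a change delivered any value, which is exactly the gap noted above.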
I wonder if there’s a “time to positive outcome” metric (using the outcomes defined when the opportunity was identified), which would encourage tight iteration to allow frequent course correction? Measuring a team on outcomes they define themselves requires a team of high moral fibre, as there’s an opportunity to game it right there.
Photo credit: © Travis Wise, 2014, Dials and Gauges
Hmm tricky one.
It seems to me that most attempts to measure these things quantitatively encourage gaming the system.
Of course it’s possible to pointlessly split a task into smaller tasks to make it seem more significant. Likewise, it’s possible to game a system by selectively picking off the tasks you know in advance will get “more points”.
Also, release frequency very much needs to be decided carefully and agreed with other parties. We can’t just say “faster is better”, or we’d end up deploying a load of crappy untested code (for more points?) or spending 90% of our time doing release engineering for tiny releases.
Do you think your concerns are addressed with a blend of metrics, e.g. working to improve and balance release frequency, time to restore, and change fail rate?
I’m not convinced that measuring those things is massively useful. Change fail rate is a tricky one: what counts as a fail? A minor bug? Does it depend on what the change was?
I did know one company where management calculated release “failness” to decide how many engineers to dismiss… let’s not go there.