51

In his 1975 software project management book, The Mythical Man-Month: Essays on Software Engineering, Fred Brooks states that, no matter which programming language is chosen, a professional developer will write an average of 10 lines of code (LoC) per day.

Productivities in [the] range of 600-800 debugged instructions per man-year were experienced by control program groups. Productivities in the [range of] 2000-3000 debugged instructions per man-year were achieved by [OS/360] language translator groups. These include planning done by the group, coding, component test, system test, and some support activities.

-Page 93 of "The Mythical Man-Month" (1975)

The book quotes other figures for other projects as well, noting, for example, that an O/S is more complicated and therefore slower to write than other types of software. However, a figure of 2000 statements/year is virtually identical to the often-claimed 10 LoC/day. Further, it appears on nearly the same page as the other part of the claim: that LoC/day seems independent of the programming language being used.

Productivity [in LoC] seems constant [across languages] in terms of elementary statements [so it's better to use a higher-level language if you can], a conclusion that is reasonable in terms of the thought a statement requires and the errors it may include.

-Page 94 of "The Mythical Man-Month" (1975)

This claim was still being made as recently as 2006. From Jeff Atwood's blog, Coding Horror:

Project Size         Lines of code (per year)     COCOMO average
    10,000 LOC           2,000 - 25,000               3,200
   100,000 LOC           1,000 - 20,000               2,600
 1,000,000 LOC             700 - 10,000               2,000
10,000,000 LOC             300 -  5,000               1,600

Dividing the COCOMO averages by 250 work days per year gives 6.4 to 12.8 LoC/day, a range that encompasses the 10 LoC/day claim from the 1975 book.
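Spelling that arithmetic out (a rough sketch in Python, using only the figures quoted above and the same 250 work-day year):

```python
# Back-of-the-envelope check of the figures quoted above, assuming 250 work days
# per year (the divisor used in the paragraph above, not a figure from either source).
WORK_DAYS_PER_YEAR = 250

# Brooks's 1975 range for language translator groups (statements per man-year)
for statements_per_year in (2000, 3000):
    print(f"{statements_per_year}/yr -> {statements_per_year / WORK_DAYS_PER_YEAR:.1f} LoC/day")

# COCOMO nominal averages from the Coding Horror table (LoC per developer per year)
for loc_per_year in (3200, 2600, 2000, 1600):
    print(f"{loc_per_year}/yr -> {loc_per_year / WORK_DAYS_PER_YEAR:.1f} LoC/day")

# 2000-3000/yr gives 8.0-12.0 LoC/day; the COCOMO averages give 6.4-12.8 LoC/day,
# both bracketing the 10 LoC/day figure.
```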

Is it true that professional programmers produce code at around 10 LoC/day?

Nat
  • 4,111
  • 2
  • 27
  • 36
travisbartley
  • 1,274
  • 1
  • 11
  • 20
  • Somewhat related question at SO: [Mythical man month 10 lines per developer day - how close on large projects?](http://stackoverflow.com/questions/966800/mythical-man-month-10-lines-per-developer-day-how-close-on-large-projects) – Martin Aug 06 '13 at 09:36
  • 3
    I had a quick search through the book. I cannot find any claim of this form. He does reference some studies of lines-per-year. Please include a quote of the claim from the book, so we are not targeting a strawman. As this is going to attract poor answers before this is resolved, I am putting it on hold. – Oddthinking Aug 06 '13 at 10:28
  • 3
    There's a lot of data on this subject, with references, in http://www.amazon.com/Software-Estimation-Demystifying-Practices-Microsoft/dp/0735605351 I don't have my copy of that book with me, but I copied the following data from it two years ago: `For a project whose size is between 1000 and 100,000 lines of code, expect from 400 to 833 lines of code per month`. Effort varies non-linearly with project size and complexity, but 400 LOC/month is in the same ballpark as 10 LOC/day. Note that this estimate of average LOC/day includes the project's non-coding activities, for example ... – ChrisW Aug 06 '13 at 10:59
  • ... `To deliver a project with 30,000 lines of code, expect this to take more than 20 months of coding, plus more than 20 months of non-coding activities (e.g. design, documentation, and management), plus more than 15 months of testing`. – ChrisW Aug 06 '13 at 10:59
  • 2
    The question now has appropriate quotes for notability. Thank you. Let's take the rest of the discussion about notability demands to chat or meta. – Oddthinking Aug 06 '13 at 13:20
  • "Is there any weight to this claim?" Yes. It was published in a seminal book by one of the most respected software writers. – DJClayworth Aug 06 '13 at 15:13
  • 3
    @DJClayworth The 'seminal book' was first published in 1975. IMO a good answer would say whether the quoted metric is still true today, and state the sources of the (more recent) data, instead of simply 'appealing to authority'. – ChrisW Aug 06 '13 at 15:29
  • The question wasn't "is this still true" it was "was Brooks right in what he wrote". – DJClayworth Aug 06 '13 at 15:45
  • 1
    @DJClayworth The present tense is used everywhere, in the question's title and text. I would expect the past tense, if the OP were asking whether it used to be true. – ChrisW Aug 06 '13 at 16:00
  • I imagine it strongly depends on the project. I'd expect a lot of new LoC created in a greenfield project, and very little when maintaining a production project. Also it's worth remembering that *less is more*: it can take a lot more effort to write the same logic in 10 lines than to not care about efficiency and write it in 100 lines. Another thing is that programming languages vary in [expressive power](http://en.wikipedia.org/wiki/Comparison_of_programming_languages#Expressiveness); for example, 1 line of Python would be the equivalent of 6.5 lines of C. – vartec Aug 06 '13 at 16:08
  • @vartec That is half of the claim which is being questioned: whether it takes as long to write a line of C as it does to write a line of Python, which results in greater productivity using Python, to whatever extent (you allege "6.5") Python is more expressive. – ChrisW Aug 06 '13 at 16:21
  • @chrisw, then we need a notable claim that it is still true today. –  Aug 06 '13 at 16:27
  • @Sancho FWIW, there are many such claims; e.g. the `COCOMO averages` on http://www.codinghorror.com/blog/2006/07/diseconomies-of-scale-and-lines-of-code.html imply 10 to 20 LOC /day, depending on the project size. Can you improve the question? – ChrisW Aug 06 '13 at 16:51
  • @ChrisW Thanks, that looks good. I'll work that more current claim into the question. –  Aug 06 '13 at 17:01
  • @vartec Don't forget that the staying power of the LoCs is also a factor as well. I might technically write more code when doing blue sky development, but if almost all of that code is thrown away then the actual LoC might be quite low. – rjzii Aug 06 '13 at 17:38
  • Thanks for the critiques and edits, everyone. You really made the question a lot better than the original. My intention was not to fact check the book, but to provide a factual reference for the "10 lines of code/day" myth, because I know it gets bounced around a lot with no real thought to the facts. I want people to find this and get the facts when they search "programming 10 lines per day." I understand why we must use a notable claim. Also, I would like to clarify that the question here was not intended to be "was the book accurate," but "is the book *still* accurate today?" – travisbartley Aug 08 '13 at 02:20
  • Programming consists of two phases: Development and maintenance. Based on experience I'd say that during development programmers write hundreds, if not more, lines of code per day. While during maintenance output probably falls into the 10s and less. – Karlth Aug 09 '13 at 09:31
  • 8
    @user357320 You're forgetting to factor in the lines that are written but don't make it to the final product. When programming a project I could very easily wind up writing 300 lines of code in the space of two hours, then spend the rest of the week and next week debugging, tweaking, yelling at the compiler, realizing that two of the functions I'd originally written won't actually work, then erase half of them and restructure the rest into something that /does/ work. End result: two weeks of work, 150 lines of code. – Shadur Aug 09 '13 at 13:06
  • A datapoint: The Linux kernel has ca. 15,000,000 LOC and is estimated to be "worth" ca. 50,000 "man"-years. – Martin Schröder Aug 09 '13 at 21:48
  • I just took ALL the code I've worked on ever, in just source and headers, I've got 923 kilobytes (on a Mac so base 10, not 2) of code, which equals 11,531.725 lines (I divided the byte count by 80 because I'm lazy), or about 20 lines of code for me per day. – MarcusJ Sep 20 '17 at 06:20
  • That's the current revisions of my source; if I were to go through all of my commits (my biggest project has had over 51,722 lines written and replaced throughout most of its history (I didn't start out using version control)) my average would be much higher: 94.2 repeating lines per day by that measure. – MarcusJ Sep 20 '17 at 06:26
  • Someone should combine this answer and this one to find how many software engineers worked at Ford https://skeptics.stackexchange.com/q/39559/11686 – daniel Dec 13 '17 at 21:35
  • Anecdotally this seems right. I'm like a freight train. Slow to start, but once I get going, can't stop. Some days a lot, some very little. Recently I had to implement Stripe payment processing using wire transfers. Half the time was spent reading and figuring out the docs - getting a handle on their API. A lot of lines were written, only to be thrown away during testing. – Chloe Dec 17 '17 at 01:33
  • A lot of the effort is spent changing existing lines rather than adding new ones. – user253751 Dec 30 '17 at 08:25

3 Answers

25

According to Capers Jones, productivity across programming languages is not as constant as the claim suggests, but the evidence supports the claim that "it's better to use a higher-level language if you can." As you note in your question, productivity varies dramatically depending on project size, so it's probably not helpful to try to pin down an "average LoC / engineer / month" averaged across all different project sizes. But for projects of similar size...

The following tables from "The Economics of Software Quality" show productivity on 10 PBX systems of approximately the same size:

None of the projects were exactly 1,500 function points in size, and the original sizes ranged from about 1,300 to 1,750 function points in size. Here, too, the data normalization feature was used to make all 10 versions identical in factors that would conceal the underlying similarities of the examples. -- Capers Jones; Olivier Bonsignour. The Economics of Software Quality (Kindle Locations 2251-2253). Addison-Wesley Professional. Published 2011.

[Table from The Economics of Software Quality: LoC per developer-month and function points per developer-month for 10 PBX implementations, by language]

The data show Lines-of-Code productivity decreasing in higher-level languages (ranging from 480 LoC/Developer-Month with Assembly to 162 LoC/Developer-Month with Smalltalk) but Function Point productivity increasing with higher-level languages (1.92 FP / Developer-Month with Assembly to 7.71 FP / Developer-Month with Smalltalk). This is a chart of the data:

[Chart: LoC per developer-month versus function points per developer-month, by language]
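To put those per-month figures on the same per-day footing as the original claim, here is a rough sketch; the divisor of roughly 20.8 working days per month is my assumption, not something taken from Jones's data:

```python
# Rough per-day conversion of the two endpoints read off the Jones table above.
# The 20.8 working days per month figure is an assumption for illustration,
# not something taken from Jones's book.
WORKING_DAYS_PER_MONTH = 20.8

endpoints = {
    # language: (LoC per developer-month, function points per developer-month)
    "Assembly":  (480, 1.92),
    "Smalltalk": (162, 7.71),
}

for language, (loc_per_month, fp_per_month) in endpoints.items():
    print(f"{language}: {loc_per_month / WORKING_DAYS_PER_MONTH:.1f} LoC/day, "
          f"{fp_per_month / WORKING_DAYS_PER_MONTH:.3f} FP/day")

# Assembly:  ~23.1 LoC/day but only ~0.092 FP/day
# Smalltalk: ~7.8 LoC/day  yet  ~0.371 FP/day
```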

Larry OBrien
  • 15,105
  • 2
  • 70
  • 97
  • 10
    Not too many modern languages on the list. It was published 2011, but when was the research data collected? 80s-90s? – vartec Aug 07 '13 at 06:42
  • 5
    Jones has been collecting his productivity stuff since the 80s, and it's ongoing. I agree that a lot of his analysis shows a strong lead-time effect, but I think his methodology is very well established and for the purposes of the question, appropriate. While I'd be interested in seeing how Erlang, Clojure, and F# stack up, I don't doubt they'd support the "lower LoC productivity, higher FP productivity" result the charts show. – Larry OBrien Aug 07 '13 at 17:31
  • 1
    @LarryOBrien, great data, and it would be nice to see some more with other languages as vartec pointed out. Also, it would be great if you could give a brief summary of the data like, "assuming an 8 hour day, the LOC metric ranged from 9.84 for Smalltalk to 27.04 for Assembly. The data also shows that the LOC productivity metric strongly depends on the language..." – travisbartley Aug 08 '13 at 02:26
  • @trav1s This is merely my opinion but, if you exclude the two lowest (Objective C and Smalltalk), I find it remarkable how little difference there is between the others. And even a factor of 3 is less than one order of magnitude (10); other project factors (size, complexity, reliability) can IIRC also amount to an expected/historical variation of a least a factor of 3. – ChrisW Aug 08 '13 at 10:26
  • 8
    Strange that there's Ada95 *(which I've never heard of anyone actually using)* and "CHILL" *(which I've never even heard of)*, but none the more popular "modern" languages like Python, Ruby, C#, or Java *(or Erlang, which I hear is widely used in PBX now)*; and none of the outdated-but-still-widely-used-in-legacy-applications languages like COBOL, RPG, and JCL... And "Assembly" isn't even one language. **I get the strong impression that this data was cherry-picked to make the graph work.** – BlueRaja - Danny Pflughoeft Aug 09 '13 at 22:44
  • 5
    Well... Jones started his company Software Productivity Research in 1984 (http://en.wikipedia.org/wiki/Capers_Jones), literally wrote the book(s) on software economics, and is published and cited regularly throughout ACM and IEEE. His data shouldn't be dismissed breezily. CHILL, for instance, was an important telecom language in the 80s and 90s and has an ISO standard -- it's not at all suspicious to see it in a list of languages used to implement a substantial (15 years of effort) PBX system. – Larry OBrien Aug 10 '13 at 19:12
8

The 2nd edition of Code Complete cites Cusumano et al. (2003) for the best-known industry figures available at the time. The paper is based on a survey of 104 projects in four regions of the globe. The authors' LOC-productivity summary is in the table below.

[Table: LOC-per-programmer-month productivity summary by region, from Cusumano et al. (2003)]

The wide variance does raise an eyebrow. The authors attribute it to different characteristics of the projects, in a rather undetailed manner, but they do also posit that

US programmers often have different objectives and development styles. They tend to emphasize shorter or more innovative programs and spend more time optimizing code, which ultimately reduces the number of LOC and increases effort.

There's substantial data on project characteristics in the paper (two tables: one for area/requirements/platform/customer and one for the development practices involved; both are a bit large to reproduce here), but the authors did not conduct any regression analysis to test whether any of the characteristics explain the variance in a statistically significant way, at least for this sample and in this paper. They did do something like that in a companion paper for a smaller, pilot subsample of 29 US projects:

[Table: regression results for productivity predictors in the 29-project US subsample]

The regression model was (alas) obtained by a stepwise backward procedure. There is obvious multicollinearity at work, since daily builds became explanatory/significant in the final regression while having a functional specification was no longer explanatory.

The same companion paper on the 29 US projects also reported more detailed descriptive statistics on productivity (mean 26.4 LOC/day, median 17.6):

[Table: descriptive statistics of LOC/day productivity for the 29 US projects]

There's some obvious distribution asymmetry to be inferred from the non-parametric skew here, but the actual sample skewness cannot be calculated from the summary they reported. Annoyingly, the inclusion of two extra US projects in the first paper makes it risky to compare the median outputs: 270/17.6 gives 15.34 working days per month, which probably means a different project was selected as the median in the two papers/samples.
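That consistency check is just a division; spelled out as a sketch, using the two medians as reported:

```python
# Consistency check described above: if the same project were the median in both papers,
# dividing the median monthly output from the 104-project paper (270 LOC/programmer-month)
# by the median daily output from the 29-project companion paper (17.6 LOC/day) should
# give a plausible number of working days per month (roughly 20-22).
median_loc_per_month = 270   # 104-project sample
median_loc_per_day = 17.6    # 29-project US subsample
print(median_loc_per_month / median_loc_per_day)  # ~15.34 -> probably different median projects
```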


Some newer data is reported in the DoD Software Factbook v1.1 (a sample of 208 projects from 2004 to 2011, though most were from 2008 to 2010).

Their results:

• real-time software 12.2 PM per KESLOC

• engineering software 8.8 PM per KESLOC

• mission support software 5.1 PM per KESLOC

• automated information system 2.8 PM per KESLOC

PM = person-month; ESLOC = equivalent source lines of code, which is higher than simply new SLOC because it also counts modified SLOC, reused SLOC, and auto-generated SLOC, each of these three adjusted by a certain factor less than one. (And 1 KESLOC = 1,000 ESLOC.)
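To make the ESLOC definition concrete, here is a minimal sketch; the weighting factors below are placeholders for illustration only, not the Factbook's actual adjustment factors:

```python
# Minimal sketch of the ESLOC idea described above. The weights are hypothetical
# placeholders; the DoD Factbook applies its own (sub-unity) adjustment factors,
# which are not reproduced in this answer.
def esloc(new, modified, reused, auto_generated,
          w_modified=0.5, w_reused=0.1, w_auto=0.2):
    """Equivalent SLOC: new lines count in full, the rest are down-weighted."""
    return new + w_modified * modified + w_reused * reused + w_auto * auto_generated

# Example: 10,000 new, 4,000 modified, 20,000 reused, 5,000 auto-generated lines
print(esloc(10_000, 4_000, 20_000, 5_000) / 1000, "KESLOC")  # 15.0 KESLOC with these weights
```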

This is based on a log-log regression with the plot below:

[Figure: log-log regression of effort (person-months) against project size (ESLOC), from the DoD Software Factbook]

The black vertical line in the graph corresponds to an "average" project of 25 KESLOC. This is not a true average even for log-transformed data (which is much closer to normality), but a reasonable approximation (averaging in log-units corresponds to 23,442 ESLOC and the median is 28,840 ESLOC).

Using the mythical 20.8 working days per month, the DoD numbers translate to 3.94 ESLOC/day (RT software) to 17.17 ESLOC/day ("automated information system"), with the in-between figures of 5.46 ESLOC/day for engineering software and 9.42 ESLOC/day for "mission support".
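That conversion is plain arithmetic; here it is spelled out as a sketch, using only the figures already quoted:

```python
# Reproducing the PM-per-KESLOC -> ESLOC-per-day conversion from the text,
# using the same 20.8 working days per person-month.
WORKING_DAYS_PER_MONTH = 20.8

pm_per_kesloc = {
    "real-time software": 12.2,
    "engineering software": 8.8,
    "mission support software": 5.1,
    "automated information system": 2.8,
}

for category, pm in pm_per_kesloc.items():
    print(f"{category}: {1000 / (pm * WORKING_DAYS_PER_MONTH):.2f} ESLOC/day")

# real-time 3.94, engineering 5.46, mission support 9.43, AIS 17.17 ESLOC/day
```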

Fizz
  • 57,051
  • 18
  • 175
  • 291
  • In that top table, the formula for "median output" is wrong, unless their "no of programmer-months" is actually average per staff—if they actually used that formula, the number is meaningless. – Kevin Dec 13 '17 at 18:06
  • @Nat: I tried to do that. Hopefully my math isn't wrong. – Fizz Dec 13 '17 at 19:30
  • 1
    Looks good to me! Might want to consider a short summary at the top, for a less-technical perspective. – Nat Dec 13 '17 at 19:33
  • 1
    I have to agree with the short summary. There is a lot of technical information and acronyms to wade through. A conclusion paragraph that restates the claim in terms ESLOC/day would be helpful. – BobTheAverage Dec 13 '17 at 19:44
  • What is KESLOC? You define ESLOC, but never KESLOC. – BobTheAverage Dec 13 '17 at 19:44
  • @BobTheAverage 1,000 ESLOC: I've edited that in now. – Fizz Dec 13 '17 at 19:54
  • 1
    Number of defects should probably be "number of reported/identified defects", which are very different things and subject to many factors, including how the defects are counted and acquired. – Igor Soloydenko Dec 13 '17 at 20:39
  • Good job. I didn't check the math, but what I expected to find, I found at the end: the split by the nature (rather than "size") of the projects. That gives it more credibility for me than the maths :) – Walfrat Dec 14 '17 at 13:11
-4

To understand what you're asking, you have to contrast the software situation in 1975 with that of today.

In 1975, you would probably have been writing in COBOL on an IBM 360. The input/output situation was very simple: text-based displays, printers, and either disk or punch-card data sources. Because everyone was writing essentially the same code, the same way, performing very simple tasks, lines of code per day/month/year might have been an applicable measure, although defect counts and how well the solution addressed the need still had to be considered.

Given the verbosity of COBOL to accomplish even the simplest task, I'd say the 10 lines of code per day sounds low. 10 lines of COBOL code won't even describe one input record.

Today, in 2017, the situation is far more complex. In 1975, most people never got near a computer terminal, and the few that did had a text display connected to a single mainframe. Today, everyone has a computer in their pocket, with a full graphical user interface, connected to the world.

The efficiency of a developer today is more determined by the system and platforms they are using, than how many lines of code they turn out. On a simple level, a web app system like PHP takes far fewer lines of code than a complex web app system like ASP.NET or JSP. Because it's simpler, a developer can produce more lines of code per day. However, as the complexity of the task rises, PHP may not be able to deliver the functionality. It's not efficient if it can't meet the need and your project ends up as a failure.

Unlike 1975, when no one was ever fired for buying IBM, today you can choose a system and platform that looks good by simple metrics, but ends up failing to deliver the functionality needed. Can't be scaled up to meet the need, can't be secured, can't be quickly updated, etc...

Also consider that in 1975, the concept of reusable blocks of code... as in objects and lately web services, didn't exist. There was a lot of duplication of effort back then that simply doesn't happen today. How many lines of code per day did a developer produce when they reused a known, tested web service rather than rolling that from scratch? Instead of 1,000 lines of code, they produced one line to call the service, so lines of code per day paints a very misleading picture in that regard.

Lines of code per day might have been applicable in 1975, because everyone was using the same methods and performing the same simple tasks. In 2017, the situation is far more diverse and far more complex. It's like judging a 787 airliner by Wright Flyer metrics and concluding the 787 isn't very good because it only has one set of wings instead of two.

So to answer your question... it can't be answered because lines of code per day isn't worth establishing in the 2017 computing environment.

tj1000
  • 195
  • 1
  • 4
  • There are so many inaccuracies in this that I don't know where to begin. (But, for starters, in 1975 I was writing in assembler for an IBM 5100. On an IBM 2260 video terminal.) – Daniel R Hicks Dec 21 '17 at 19:40
  • Please enlighten me... COBOL was the typical language, though assembler was used - my older brother was a lead developer on CICS, written in assembler. As I recall, assembler could be horridly inefficient to write. 200 lines of assembler might translate to one complex COBOL command, so lines of code per day would be very inaccurate... one might conclude that an assembler programmer was 'more efficient' because they turned out more lines of code. – tj1000 Dec 21 '17 at 19:57
  • If you look at the answer from Larry O Brien, developer productivity is roughly equal across a range of languages (within about a factor of two, which is about as accurate as such measures can be). Yes, some measures make C++ more efficient than assembler, but this often fails to take into account the complexities (eg, writing kernel code) which drove the choice of assembler over C++ in the first place. And the main reason why the numbers come out as close as they do is that *writing* the code is the easy part. The time goes into design, debug, documentation, et al. – Daniel R Hicks Dec 21 '17 at 20:28