75

This infographic shows the average number of lines of code in various software applications (or general types of application).

The number of lines shown for an average modern high-end car strikes me as implausible. I know modern cars put many aspects of the vehicle under software control, and I also know that the languages used to program them (C and assembly) are more verbose than higher-level languages. Still, I don't see car software taking notably more code than a huge social network like Facebook, a full-featured operating system like Windows Vista, or a professional IDE like Microsoft Visual Studio. There seem to be far fewer things to control in car software.

Maybe the figure refers to lines of assembler code, in which case it would seem plausible to me. But then the number of lines actually written would be, let's say, around 6 times lower, which would put it alongside the Boeing 787 software, and I think that would make more sense.

Do programmers really have to write around 100 million lines of source code, on average, to create the software for an average high-end car?

Jesús Gómez
  • 4
    100M lines of code (LOC) doesn't seem unreasonable, but this question might be unanswerable since a lot is missing. We don't know if this is cumulative across all programmable devices in a car, we don't know the languages involved (some languages are more verbose than others), and we don't know if the code in the car includes accessories (e.g., on-board GPS) or not. – rjzii Sep 25 '17 at 14:13
  • I think we can assume that the code refers to the software installed in the car when it arrives at the dealership directly from the factory. – Jesús Gómez Sep 25 '17 at 14:27
  • 110
    A lot of large embedded systems have either a whole linux kernel or a windows nt kernel running. I'm pretty sure they counted that – Sklivvz Sep 25 '17 at 14:28
  • 12
    I don't know if this claim's particularly meaningful. Whenever a program's compiled, the compiler can select a trade-off between code size and performance (e.g., as in [this question](https://stackoverflow.com/questions/19470873/why-does-gcc-generate-15-20-faster-code-if-i-optimize-for-size-instead-of-speed) from StackOverflow). For example, you can precompile a square root function into a look-up table for typical 64-bit floats, resulting in over 1.8-quintillion LOC (because, yes, you included negatives mapping to `NaN` 'cause why not?). – Nat Sep 25 '17 at 15:21
  • 1
    @Nat usually you measure source code though, so pre-compilation - of course it's also a notoriously useless measure that tells you almost nothing – jk. Sep 25 '17 at 15:46
  • 16
    Based on my (limited) experience this number may be even less meaningful than usual, because a fair bit of safety critical code is generated from formally verified models via model-driven development. So maybe there is a bunch of C or Assembly code that no human is ever allowed to directly touch. – xLeitix Sep 25 '17 at 15:47
  • 4
    Software on airplanes are real-time systems and they don't have complex, generic kernels like general computers do. Though this may be true for some cars, planes have very specific code and hardware which allows for smaller code bases. [Related question on Aviation.SE](https://aviation.stackexchange.com/questions/36853/do-safety-critical-avionics-systems-run-linux) – RomaH Sep 25 '17 at 15:56
  • @jk. Most of the code in the car's auto-generated. I like the incremental compilation perspective, but to avoid semantics, what I meant was that it's all about how the auto-generation (which I call "compilation") is done. The big issue is that cars tend to use low-power processors with real-time constraints; pre-compiling stuff that you might dynamically perform on a home computer's a big deal. – Nat Sep 25 '17 at 16:03
  • 33
    Source code that is generated is not source code – jk. Sep 25 '17 at 16:05
  • 5
    @jk. If you're auto-generating a bunch of C/C++ (often in a limited subset that's microcontroller/real-time friendly), people'll call it code, and often source code. The idea's that the code itself being generated by an automated program isn't fundamentally distinct from code being generated by a human doing the same thing. – Nat Sep 25 '17 at 16:06
  • 7
    For example, [this promotional flier](https://www.absint.com/tum_absint.pdf) advertises: **"_Right now the FCC software has some 125k lines of code, of which 92% are auto-generated._"**. – Nat Sep 25 '17 at 16:08
  • 46
    One should note, however, that LOC is not an especially useful measure for the inherent complexity of some system. – Daniel Jour Sep 25 '17 at 16:20
  • ["Translation validation for stateflow to C"](http://ieeexplore.ieee.org/abstract/document/6881350/): "_Code generators play a critical role in the Model Based Development of complex software systems. This is particularly true in the automotive domain, where the code auto-generated from Simulink/Stateflow models is directly flashed onto embedded controllers._" – Nat Sep 25 '17 at 16:22
  • To me that claim is highly suspicious. Even if we assume we are talking LOC, not LLOC, that's still a lot of code. I've worked on the software for a large AG equipment manufacturer (some of that runs embedded) and while I don't have exact numbers, the LLOC was probably 1 or 2 orders of magnitude less than what's claimed here. – ventsyv Sep 25 '17 at 17:19
  • 5
    Just to note it, the trade-off I meant above was the [space–time tradeoff](https://en.wikipedia.org/wiki/Space-time_tradeoff), and the specific example about >1.8-quintillion LOC for a square-root function is an example of [lookup tables vs. recalculation](https://en.wikipedia.org/wiki/Space-time_tradeoff#Lookup_tables_vs._recalculation). The gist is that, in real-time systems, you can help ensure time-constrained performance by bloating the code; who wants a [halting problem](https://en.wikipedia.org/wiki/Halting_problem) in their car's braking system? (Sorry, had to.) – Nat Sep 25 '17 at 17:27
  • 1
    This is an absurd comparison. Based on my quick estimate (although it doesn't have a confident trend-line), if the average high-end car **had** 100M SLOC it would need to have 50GB of storage. – transistor09 Sep 25 '17 at 17:32
  • 1
    @transistor09 *"This is an absurd comparison. [...] the average high-end car would need to have 50GB of storage."* How is that absurd? 50 GB of storage isn't much at all, even with solid-state storage. Even if you increase that by one or two powers of ten, it's *still* not a whole lot. – user Sep 25 '17 at 17:37
  • 1
    @MichaelKjörling it doesn't happen because it would be absurd. The code (or the program that's been compiled from it) is never in the car. The number has to include all the utilities that were used to design and build it. – transistor09 Sep 25 '17 at 17:47
  • 2
    @transistor09 15m lines in the linux kernel in 2011 and it can run on I think tens of MBs of RAM. – djechlin Sep 25 '17 at 23:33
  • 1
    @transistor09: 100M SLOC and 50 GB of storage? Tinkertoys. I can walk into Walmart and buy 64GB memory sticks for $12. Back In The Day (tm) that was A LOT. Nowadays, not so much... – Bob Jarvis - Слава Україні Sep 26 '17 at 02:21
  • @djechlin: 15 million lines in the Linux kernel? Maybe in the whole project, when you include drivers for every possible piece of hardware, but any particular kernel uses only a small fraction of those. And you can optimize your kernel to use a minimum set for your hardware. – jamesqf Sep 26 '17 at 04:32
  • 1
    Is that last one just a smack to healthcare? I don't see legitimacy in that. Also, that much code just makes me think it could be done better if the programmer programmed his/her scripts a bit more dynamically. – sfxworks Sep 26 '17 at 16:29
  • @DanielJour alas, they *are* correlated with bug count ;) – Tobia Tesan Sep 27 '17 at 08:04
  • 1
    Side question: do empty lines and comments count? What about those **horrible people** that write brackets using the Allman notation rather than K&R? – Andrea Lazzarotto Sep 27 '17 at 10:11
  • Here is a link to the original http://www.informationisbeautiful.net/visualizations/million-lines-of-code/ – seth10 Sep 27 '17 at 18:39
  • 1
    Clearly a marketing claim. Edsger Dijkstra pointed out in the 1960s that LOC is a cost, not an asset. – user207421 Sep 28 '17 at 00:43
  • 1
    Lines of code is a kinda useless metric. Take a look at "code golf", where functions that would otherwise be ~50 lines fit in a few bytes – Caterpillaraoz Sep 28 '17 at 13:40
  • 2
    My car has *so* many molecules. Waay more molecules than your puny car.... – Jared Smith Sep 28 '17 at 15:13
  • 1
    I just noticed that the question itself neglects to mention that the infographic lists sizes of ***codebases***. That should not be confused with ***source code***, because consensus on their definitions appears to be that source code is *directly compiled* to be a program, while codebase is all the code that *participates in development* (both are human written though). For all we know, car codebases could include a full-fledged car simulator to fully test and validate something that will fit in a few kilobytes on the embedded chips. – transistor09 Sep 28 '17 at 20:37

4 Answers

75

Ford has said that the F150 pickup has 150 million lines of code.

According to the New York Times:

Twenty years ago, cars had, on average, one million lines of code. The General Motors 2010 Chevrolet Volt had about 10 million lines of code — more than an F-35 fighter jet. Today, an average car has more than 100 million lines of code.

So, even if the car isn't particularly high end, it could have that many lines.

According to Embedded Systems Security: Practical Methods for Safe and Secure Software (2012):

One of the first embedded systems within an automobile was the 1978 Cadillac Seville's trip computer, run by a Motorola 6802 microprocessor with 128 bytes of RAM and two kilobytes of ROM. ...
In contrast, even the lowest-end automobile today contains at least a dozen microprocessors; the highest-end cars are estimated to contain approximately 100 microprocessors. With infotainment systems running sophisticated operating systems such as Microsoft Windows and Linux, the total embedded software content can easily exceed 100 million lines of code.

DavePhD
  • I would argue that especially if the car is a high-volume mass produced car it is likely to have that many lines of code. The F150 is in the running for the best-selling vehicle of all time, it's pretty easy to justify that kind of development when you sell millions of units. – wedstrom Sep 25 '17 at 15:58
  • 51
    Keep in mind that this often includes open source projects, not all of which are used in the operation of the vehicle, but are likely compiled along with the rest of the binaries. If you check the back of the owner's manual for modern cars, there are often pages and pages of open source license acknowledgements. – Nate Diamond Sep 25 '17 at 16:12
  • 25
    Judging by how slow they are, more than half of that is in the typical new car entertainment/navigation system. – T.E.D. Sep 25 '17 at 16:25
  • 11
    It might be worth separating the explicit claim (about lines of code) from the implicit claim (about human-written lines of code), since the great bulk of the reported metric's auto-generated, e.g. by [Stateflow](https://en.wikipedia.org/wiki/Stateflow) ([image](https://www.mathworks.com/content/mathworks/www/en/products/stateflow/features/_jcr_content/productFeaturesParsys/feature5_copy/imageEnhancedParsys/image_copy.img.jpg/1505955386284.jpg)). – Nat Sep 25 '17 at 16:32
  • 1
    The NYTimes article has: **"_Today, an average car has more than 100 million lines of code. Automakers predict it won’t be long before they have 200 million. When you stop to consider that, on average, there are 15 to 50 defects per 1,000 lines of software code, the potentially exploitable weaknesses add up quickly._"** Ugh, they're interpreting it as human-written code, too! – Nat Sep 25 '17 at 16:42
  • 8
    @Nat: I'm personally more worried about *mixing* the different uses of code. I hope the code controlling direction/speed is bullet-proof (with the appropriate standard of development), but it should be relatively small; on the other hand I could care less for the AC/GPS/entertainment system, and they are likely much more verbose (full Linux/Windows kernel, JVM, ...). Then again, if we look at Toyota's cruise control system... – Matthieu M. Sep 25 '17 at 16:49
  • 5
    To put this number in perspective, the Linux kernel has grown from 3.4M LOC in early 2001 (Linux kernel version 2.4.0); to 5.9M LOC in late 2003 (2.6.0); to 15.8M LOC in 2013 (3.10); to 19.5M LOC in mid-2015 (4.1). Figures from [Wikipedia](https://en.wikipedia.org/wiki/Linux_kernel#History), but should be easy to corroborate against [the official Linux kernel source code tree](https://www.kernel.org/) if someone really cares... The point here isn't the exact number of lines of code, but that the kernel and its parts *alone* constitute a significant fraction of those 150M lines of code. – user Sep 25 '17 at 17:47
  • 8
    My Ford Focus with MyFordTouch runs an entire WindowsCE system just to play the radio and bluetooth connections... It even crashes occasionally like a regular windows system, showing Windows-like crash messages before rebooting itself. A fighter plane, like the F-35 can't have that, and also has verified code - so no, it won't be as bloated and unnecessary as my car's entertainment system - it's going to be far more streamlined. Not to mention a fighter jet's systems are going to be Real Time systems, not general purpose OS's with crap bolted on. – SnakeDoc Sep 25 '17 at 17:59
  • 98
    Also, the number of lines of code in a system isn't a sign of sophistication or something to be impressed about - it's often a sign of bloat and poor design. So basically Ford is bragging about having poor engineering... 10+ MLOC to run a radio and touchscreen is absurd, even if you include the kernel LOC count, etc. – SnakeDoc Sep 25 '17 at 18:01
  • 2
    The quote from Ford is valuable, but I fail to see how a NYT article (or any other major newspaper article) could possibly be taken as a primary source for the answer to such a question. There are far too many cases of misinformation in mass journalism, especially regarding technical topics such as this. – Reverse Engineered Sep 25 '17 at 18:39
  • 29
    Ford *sounds like* a reliable source, but the quote is probably written by some expertise-free dude in the marketing department who believes that more of something is better. That is a pretty good reason to discount it as a reliable source. I think this answer needs better sources. – matt_black Sep 25 '17 at 18:44
  • @matt_black there are quotes from specific BlackBerry and Ford people about it here: http://business.financialpost.com/technology/blackberry-ltd-inks-its-first-direct-automative-deal-with-ford-motor-co-for-qnx-car-software – DavePhD Sep 25 '17 at 18:52
  • 4
    @DavePhD But that looks like the same quote and devoid of any sign that it comes from an expert. What about some triangulation of the memory capacity of the hardware systems used in cars? Or quotes that don't look like marketing puff? – matt_black Sep 25 '17 at 19:07
  • 1
    @NateDiamond Right. To clarify for those who may not deal with this on a daily basis: you might reference a library that contains a million lines of code only to call one method. If you're using a tool like Proguard, the 999,800 lines you're not using might be stripped out of the final binary. – Kevin Krumwiede Sep 25 '17 at 19:47
  • @matt_black John Wall seems like an expert to me. – DavePhD Sep 25 '17 at 19:54
  • @SnakeDoc - having my car BSOD would kind of wreck my day - especially at 70 MPH... – Bob Jarvis - Слава Україні Sep 26 '17 at 02:24
  • @SnakeDoc please don't try to minimize someone else's work, even if you don't understand it. – TankorSmash Sep 26 '17 at 04:03
  • @MatthieuM. you should be VERY concerned if the entertainment system's code is low quality: http://www.bbc.com/news/technology-33622298 – Erik Sep 26 '17 at 07:13
  • @Erik: Ah yeah... we should also talk about *separating the subcomponents at the physical layer*! – Matthieu M. Sep 26 '17 at 11:53
  • 8
    I work in the auto industry and the number of lines is generally due to the fact that the number of software engineers available is very low. It's mainly bloated code that more than 90% of the time is someone pulling down an entire library and only using a specific portion of it. The industry doesn't like to spend too much money on the software side; in fact most of the money in this industry is allocated to mechanical / controls engineering as well as r&m. Yes, controls is software, but its usage is not for GPS, navigation, or all the items we are discussing here. – JonH Sep 26 '17 at 15:25
  • 3
    I worked on the model year 2009 body control module of a big 3 manufacturer. @Nat is correct - the vast majority of lines of code are autogenerated from state machine diagrams in a very verbose manner, and the compiler is expected to optimize that. Compile times, therefore, were extraordinarily long - and the joke about engineers waiting around the water cooler during compiles is painfully true, particularly since the supplier that I worked for had no incentive to provide decent computers. Furthermore, MISRA C requirements also had a lot of requirements tied to writing verbose code. – Adam Davis Sep 26 '17 at 19:05
  • 3
    I can't post this as an answer (it's not backed up by a valid source), but I can confirm that a single compute module targeting a small embedded processor with memory measured in the hundreds of kB could readily have millions of lines of code. Remove comments and blank lines, though, and you'd remove nearly 75% of the lines. Each state machine had its own file, and contained the exact text of the requirement it fulfills from the manufacturer. You could very nearly remove the code and have the entire spec left over. So a lot of it is ***"what counts as a line?"*** – Adam Davis Sep 26 '17 at 19:12
  • That said, I can't speak toward the more complex systems that run Linux and other OSs - I worked on the bare metal side of development, where critical safety issues require significantly more oversight. – Adam Davis Sep 26 '17 at 19:17
  • Do you think you can please dig out how much of it is the system kernel and how much of it is linked open source code? I.e., how much of it is actually Ford's or their subcontractors' code? – yo' Sep 28 '17 at 06:52
  • @yo' I'll try, but I doubt I'll be successful. – DavePhD Sep 28 '17 at 13:03
45

LOC is a particularly bad metric, because it raises the question of what is a line of code. Do you include whitespace and comments? Compiler directives? Preprocessor definitions? How about lines containing only braces? Do you include makefiles or whatever scripts do the building? And in the end, does the number of LOC truly relate to the complexity of the code? This white paper provides a summary of many of the issues around this question, of which I've quoted a few above.
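To make the ambiguity concrete, here is a minimal Python sketch that classifies the physical lines of a small C snippet. The function name `count_lines`, the category names, and the sample program are all hypothetical, and the comment detection is deliberately naive (it ignores comments that trail code on the same line), so treat it as an illustration rather than a real counting tool.

```python
def count_lines(source: str) -> dict:
    """Classify each physical line of C-like source text (naive sketch)."""
    counts = {"total": 0, "blank": 0, "comment": 0,
              "brace_only": 0, "preprocessor": 0, "code": 0}
    in_block_comment = False
    for raw in source.splitlines():
        line = raw.strip()
        counts["total"] += 1
        if in_block_comment:
            # Still inside a /* ... */ block that started on an earlier line.
            counts["comment"] += 1
            if "*/" in line:
                in_block_comment = False
        elif not line:
            counts["blank"] += 1
        elif line.startswith("//"):
            counts["comment"] += 1
        elif line.startswith("/*"):
            counts["comment"] += 1
            if "*/" not in line:
                in_block_comment = True
        elif line in ("{", "}", "};"):
            counts["brace_only"] += 1
        elif line.startswith("#"):
            counts["preprocessor"] += 1
        else:
            counts["code"] += 1
    return counts


# Hypothetical sample, written with braces on their own lines.
SAMPLE = """\
#include <stdio.h>

/* Print a greeting.
   Hypothetical example. */
int main(void)
{
    // say hello
    printf("hello\\n");
    return 0;
}
"""

if __name__ == "__main__":
    print(count_lines(SAMPLE))
```

On this sample, the same file is 10 lines by one definition and 3 by another, depending on whether blanks, comments, brace-only lines and preprocessor directives are counted.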

Also consider that some languages lend themselves to shorter code than others, whether due to the common C style convention of parentheses on separate lines, the standard language library providing additional features by default, or the language itself including features which in other languages are handled by library functions. This is the core of work by Halstead amongst others. This comparison between Perl and VB.net is one example. This comparison across multiple languages demonstrates what the author calls the "expressiveness" of languages, where bugfixes in more "expressive" languages such as Python appear to need fewer LOC changes on average than languages such as C.

Where it most spectacularly runs into trouble though is when you consider how much is now under software control - and what that software might be. The satnav is one separate module; the email/phone/data interface is another module; the radio is a third module; the dash is a fourth module; aircon is a fifth module; and there's often a central display which coordinates them all. And that's just for the dashboard. This EENews article estimates over 50 electronic control units in a modern car.

Back when I worked on a Ford email/text/data interface, we were using WinCE. I wouldn't be at all surprised to find that they now have Linux and/or Android in there today. And some modules will roll their own OS, if they're running on a tiny microcontroller which doesn't warrant anything else, or if they are safety-related and require a greater level of scrutiny. The LOC for WinCE appears to be unknown, but WinNT had around 10 million and WinXP was up to 40 million. It seems reasonable to assume WinCE is of the same order of magnitude. The Linux kernel itself (as maintained by Linus) is now over 20 million LOC. If you multiply that by the number of devices in the car which might use these OSes (and that actually could be reasonable if they're all using different versions of OSes) then you're easily into the hundreds of millions.
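The multiplication above can be sketched as a back-of-envelope calculation. The 20 million figure is the Linux kernel size quoted in the paragraph; the module counts and the helper `total_loc` are purely hypothetical, chosen only to show how quickly the total grows when every module's OS is counted separately.

```python
# "Over 20 million" LOC for the Linux kernel, as quoted in the paragraph above.
LINUX_KERNEL_LOC = 20_000_000


def total_loc(os_loc: int, modules_with_os: int, own_loc_per_module: int = 0) -> int:
    """Total LOC if every module's OS copy is counted separately.

    own_loc_per_module is a hypothetical allowance for application code
    written specifically for each module.
    """
    return modules_with_os * (os_loc + own_loc_per_module)


if __name__ == "__main__":
    # Hypothetical module counts; the real number per car is not known here.
    for modules in (1, 5, 10):
        print(modules, "modules:", total_loc(LINUX_KERNEL_LOC, modules), "LOC")
```

Under these assumptions, five modules that each ship their own kernel already reach 100 million lines before any manufacturer-written code is added.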

If you only count LOC in C/C++ written by the car manufacturer, it's almost certainly a lot lower. But then the same logic would give the paradox of an Android phone which might have no unique LOC for a manufacturer, even though all manner of stuff has been tweaked in the build. You can't realistically say "this phone has zero LOC" just because it's all built from off-the-shelf libraries; but equally the total LOC in the OS and libraries does not reflect the engineering effort required by a manufacturer.

Graham
  • 11
    While this is a good point to raise amongst us Engineers, for the layperson there's no reason whatsoever they should care about the *precise* accuracy of a "LOC" report. The factors of 10 differences being reported are going to wash out any piddly <50%ish differences in how the LOC measurements are taken. – T.E.D. Sep 26 '17 at 13:59
  • 3
    @T.E.D. True, but it's important for the layperson to grasp how broken the metric is, even before we decide where to apply it. If they simply assume "well, we can standardise on *this* is the type of line we'll count, and now life is good", they need to know that the flaws run more deeply than that. Everyone accepts that these kind of metrics are just for ballpark figures, but if you've got 50% error on your ballpark figure then it's clearly not much use. – Graham Sep 27 '17 at 10:47
  • It's not "broken", you just have to know how to use it...like any tool. SLOC is best used for delta comparisons using similar codebases and identical calculation methods. There is *no* code metric that is without problems, and IMHO SLOC at least has the virtue that it's trivial to calculate. It's true there are many things we software professionals need to do with code metrics where using SLOC would be like driving a screw with a hammer. However, this is not one of those cases, so there's not much point in raising it as an issue. – T.E.D. Sep 27 '17 at 13:44
  • 1
    Agreed, if you have similar code, in the same language, written under the same coding standard and style guidelines (and ideally by the same person), then you may get valid numbers. The more you deviate from this narrow niche though, the less valid it becomes. (As a principal engineer, I often could not usefully compare LOC metrics from a junior engineer and from a senior engineer, because the two would solve problems in different ways.) And when it gets used as a proxy for the complexity of software, as used by the articles the OP references, I think we'd both agree it's badly misused. – Graham Sep 27 '17 at 15:39
  • Yes, pretty much all of that is true. However, if you want to come up with a simple number for "weight" of a codebase to throw at a layman, and you are dealing in factors of 10, it is perfectly adequate for that purpose. A "misuse" would be more like using it to compare the desirability of programming languages with 10-30% differences (which you annoyingly see people doing all the time). – T.E.D. Sep 27 '17 at 16:54
  • Also a car has many different CPUs scattered around it, so if the same OS or library goes into each then they are probably counting that multiple times. – Paul Johnson Sep 28 '17 at 06:50
  • 1
    @PaulJohnson Which is what I said... – Graham Sep 28 '17 at 16:15
  • @T.E.D. "if you want to come up with a simple number for 'weight' of a codebase to throw at a layman..." No, this is not valid, either. The aggregate number doesn't tell us how much of it is code they wrote themselves vs. code they took from somewhere but don't really do much with other than compile it. If I run the Linux kernel with 100 file modifications, this isn't the same as writing a 20 million line OS myself, and it's a several order of magnitude difference in how much code I'm actually modifying vs. compiling. – jpmc26 Sep 28 '17 at 23:57
  • I think you mean braces, not parenthesis, when you mention a C style convention of putting them on separate lines. – David Conrad Oct 18 '17 at 21:03
15

The NASA report on the Toyota Camry unintended acceleration investigation mentions 463,473 lines of code in the engine control module alone.

Rsf
  • 3
    Not sure what that implies; that's about 0.5% of 100 million lines of code. – Sklivvz Sep 26 '17 at 05:25
  • @Sklivvz: Compare engine control with a navigation system. Database, route finding, display / touchscreen driver, 3D rendering, GPS, ... – DevSolar Sep 26 '17 at 11:13
  • 7
    +1, it's sourced and doesn't include lines for extras like navigation, or the robot arm used during manufacture. – daniel Sep 26 '17 at 12:01
  • 1
    That is well sourced LOC for a safety critical module, where the code was written well before 2005 (i.e. 15 year old code). Although the code is probably arithmetically generated, it is also performance critical (so you could assume there is not too much bloat). – Sean Houlihane Sep 28 '17 at 12:02
9

That infographic provides sources for all the data it contains. The number in the particular claim about car software comes from this article, which specifically states that they counted LOC used in infotainment systems, which are typically based on a customized Linux kernel and include popular media codecs and communication stacks like Bluetooth.

From personal experience: I work on software used in exhaust sensors. We don't use the LOC metric anywhere in the project, but a quick line count on the repository gives around 250'000 lines for just one sensor.
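A "quick line count" of this kind might look like the Python sketch below. It is not the command actually used on that project; the function `count_repo_lines` and the set of file suffixes are assumptions for illustration, and the count is of raw physical lines, including blanks and comments.

```python
from pathlib import Path

# Assumption: which file suffixes count as "source" is a judgment call.
SOURCE_SUFFIXES = {".c", ".h", ".cpp", ".hpp"}


def count_repo_lines(root, suffixes=SOURCE_SUFFIXES) -> int:
    """Sum the physical lines of every matching file under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            # errors="replace" keeps the count going on odd encodings.
            with path.open("r", encoding="utf-8", errors="replace") as fh:
                total += sum(1 for _ in fh)
    return total


if __name__ == "__main__":
    # Counts the current directory; point it at a checkout of interest.
    print(count_repo_lines("."))
```

Changing `SOURCE_SUFFIXES` (for example, to include generated files or build scripts) can change the result substantially, which is part of why such numbers are hard to compare.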

Dmitry Grigoryev
  • 1
    ... and may also include Android, Java... Media codecs for sure... – Volker Siegel Sep 28 '17 at 10:26
  • How much of those 250k are tests? – transistor09 Sep 28 '17 at 20:38
  • 1
    @transistor09 none, we don't keep tests in source code. – Dmitry Grigoryev Sep 28 '17 at 21:57
  • I worked on a car infotainment platform 10 years ago. I can assure you that just the code to display the songs on your iPod (remember iPods?) was many thousand lines of code. And of course the album covers are displayed as icons, probably using a JPEG library... Another thing is: What is a high end car? If you look at Teslas, the LOC count likely jumped an order of magnitude, or two. It probably just jumped by a million after they realized that they better detect white semis doing u-turns on highways. – Peter - Reinstate Monica Sep 29 '17 at 08:55