30

I read in a book that NASA's Mars Climate Orbiter was lost on September 23, 1999, at a cost of $125 million, because one engineering team used metric units, while another one used inches for a key spacecraft operation.

Was it really a conversion problem in software or were reports oversimplified?

Sklivvz
  • 78,578
  • 29
  • 321
  • 428
splattne
  • 409
  • 3
  • 10

2 Answers2

28

I asked Peter Norvig, who was on the review board, and he gave me the following more nuanced answer (the question was asked in the context of type-safety in programming language choice):

The problem involved reading data from a file and a miscommunication about what the numbers in the file were. I don't know of any language, no matter how type-strict, that forces you to tag the string "123.45" in a file with the units of force (newtons vs foot-pounds), nor do I know of any language, no matter how type-loose, in which you could not impose such a convention if you wanted to.

Beyond the initial error, the reasons why the error proved fatal were more around organizational structure than around language choice:

(1) An anomaly was detected early on, but was not entered into an official issue-tracking database. Better practices would force all such things to be tracked.

(2) The team was separated between JPL in California and Lockheed-Martin in Colorado, so there were no lunch-time discussions about "hey, did you get that anomaly straighten out? No? Well, let's look into it more carefully..."

(3) The faulty code was not carefully code-reviewed, because of improper code re-use. On the previous mission, this file was just a log file, not used during flight operations, and so was not subject to careful scrutiny. In MCO, the file and surrounding code was re-used, but then at some point they promoted it to be a part of actual navigation, and unfortunately nobody went back and subjected the relevant code to careful review.

(4) Bad onboarding process of new engineers: The faulty code was written by a new engineer -- first week (or maybe first month or so -- on the job. This was deemed ok because originally it was "just a log file", not mission-critical.

(Personal communication 2011-06-14)

Larry OBrien
  • 15,105
  • 2
  • 70
  • 97
  • 2
    This is why people use semantic text files for data, e.g. XML `` – Sklivvz Dec 18 '11 at 20:21
  • @Sklivvz Yes, you can move units into your type-specification, but as Norvig says, there's always going to be some way around it (convert to string). Similarly, you can always test for such things, even in a language with arbitrarily loose types. – Larry OBrien Dec 18 '11 at 23:01
  • 8
    @Sklivvz in systems where data volume is critical (like low bandwidth, as this likely was) you eliminate as much as possible, and likely go for fixed length records with no delimiters or units, describing those in a design document (which apparently was lacking). I've worked on such systems in the past, cell phone transmission towers and now highway traffic management systems, where communications channels are severely limited. – jwenting Dec 19 '11 at 08:50
  • 1
    @jwenting I know. Sometimes people think it's a good thing to gain a little bandwidth for some risk. Generally they are wrong :-) – Sklivvz Dec 19 '11 at 13:58
  • I like this answer better than the official one because it explores the nuanced human factors. Official results, in order to be on solid ground, often say less. Like aviation accidents often conclude "failure to maintain airspeed", or "controlled flight into terrain", when the real reason was farther back in the chain of decision-making (or lack of it), and you have to read between the lines. – Mike Dunlavey Dec 19 '11 at 17:44
  • 1
    @jwenting, sklivvz - Last time some idiots tried to skim two bytes off of a date by dropping the first two digits from a year for the sake of having less data, the entire civilization nearly got destroyed. :) [ good point though. You do NOT use semantic data formats to solve the format conversion problems, as a rule ]. And yes, having had to spend 48 hours of 2000 new years on-site, with 40 degree fever, makes me call them idiots :) – user5341 Dec 19 '11 at 18:44
  • 1
    @jwenting But the interesting thing here is that the original file was not optimized for size (although I'd think everything on a space mission would have that consideration). As a logfile, "human readable" use-case trumps semantic markup (although I've come to think that's wrong, but that's another whole thing...). – Larry OBrien Dec 19 '11 at 19:06
  • @PeterNorvig - Its called XML! And its the data transmission standar – Chad Dec 20 '11 at 19:03
  • 4
    @Chad Not in 1998 it wasn't. And it's still not the standard for log files. – Larry OBrien Dec 20 '11 at 23:12
  • well said Larry. System I work on now depends on packing as much data as possible into a serial communications channel. No xml, bit manipulation so we can drive several pieces of hardware using a single byte of data. Only way to get the throughput needed to control thousands of electronic roadsigns with no more than a few seconds' delay. – jwenting Dec 21 '11 at 06:28
  • @LarryOBrien - Actually the XML Standard was adopted in 1998 from the SGML Standard. I did not say anything about log files I said data transmission. – Chad Dec 21 '11 at 15:05
  • 2
    @jwenting Yes, I too work on a system (telescope) where the idea of using XML for the RT elements is ridiculous. But at the higher level, the point I take from the story is that they didn't foresee that this was going to be a control file, so they didn't expend effort controlling / validating it. Once in place (as a log file) they didn't put the same thought & effort into evolving it they would have if it were a new requirement. – Larry OBrien Dec 21 '11 at 20:36
24

The MCO MIB has determined that the root cause for the loss of the MCO spacecraft was the failure to use metric units in the coding of a ground software file, “Small Forces,” used in trajectory models. Specifically, thruster performance data in English units instead of metric units was used in the software application code titled SM_FORCES (small forces). A file called Angular Momentum Desaturation (AMD) contained the output data from the SM_FORCES software. The data in the AMD file was required to be in metric units per existing software interface documentation, and the trajectory modelers assumed the data was provided in metric units per the requirements.

(emphasis mine)

Source: Mars Climate Orbiter Mishap Investigation Board Phase I Report

Project Cost:

$327.6 million total for both orbiter and lander (not including Deep Space 2). $193.1 million for spacecraft development, $91.7 million for launch, and $42.8 million for mission operations.

Source

Mission Logo:

Mission Logo

Ami
  • 615
  • 5
  • 10