As pro bono President of Engineers Ireland for the last 12 months, I had the privilege of preparing and giving my “Presidential Address”, in fact on numerous occasions across the country and in London.
I chose as my theme the question: what is it about software which makes it so difficult to engineer correctly ? At a time when the entire engineering profession is moving towards higher standards and regulation of Chartered Engineers across all disciplines, can software engineering achieve the high standards and rigour demanded of Chartered Engineers in the other engineering sectors ?
I wrote the talk for non-specialists: engineers from other disciplines, non-engineers and members of the public. As a software engineer myself, I have thus admittedly taken some liberties and tried to explain things a little more simply than they often are. The speech is also quite long, but I hope you may find it interesting.
A video recording of me giving this talk appears on Engineers TV, with my slide set. Aspects of my talk were inspired by Scott Rosenberg’s epic “Dreaming in Code”, and do read this if you haven’t already done so. The various software disasters which I list are quoted multiple times by multiple sources on the internet…
In late September 1983, the world almost came to an end in nuclear annihilation. One man saved us.
Three weeks earlier the Soviets had deliberately shot down Korean Airlines Flight 007, a Boeing 747, which had strayed into sensitive Soviet air space over Sakhalin island. 269 passengers and crew were killed. NATO were just about to start the Able Archer exercises that involved raising alert levels across Europe to simulate a Soviet attack. In fact there was a series of psychological warfare exercises aimed at Moscow, including naval maneuvers in the Barents Sea near Soviet submarine bases. President Andropov was concerned about a surprise nuclear attack by President Reagan, and had thus requested his spy networks to collect evidence of any preparations.
44 year old Lt. Col. Stanislav Petrov was duty officer at Serpukhov-15, an early warning satellite system post, just outside of Moscow. Shortly after midnight, alarms started sounding. Usually, an alarm about a single missile launch in the US did not immediately go up the command chain to general staff. But in this case, the satellite reported one intercontinental missile launch, then another, then another. Soon, five Minuteman ballistic missiles had apparently been launched from the US and were speeding towards Russia. The report of a missile salvo had quickly generated an automatic alert at general staff headquarters. They in turn had full authority to respond, because there was not yet a system available to give President Andropov a remote-control role in such time critical decisions. Further, I am told that the Soviet missile systems at the time were primarily liquid fuelled, and needed a certain amount of time to be prepared for launch: time was absolutely of the essence in launching a counterstrike before the inbound US missiles destroyed the Soviet counterstrike capability.
Petrov had been warned that a US missile strike would be massive, an onslaught designed to overwhelm the Soviet system. Petrov was trained to recommend an immediate counter-launch, but he wavered: he said afterwards that he had a funny feeling in his gut, and that he didn’t want to make a mistake. Less than five minutes after the alert began, Petrov told his superiors that the launch reports must be false. If the US wanted to start a war, he reasoned, it would have launched more than five missiles.
His guess was right. Later it transpired that the software in the Russian early warning system had mistaken sunlight reflecting off clouds for missile plumes. The false software alarm was unknown to the West at the time, and occurred during one of the most tense periods of the Cold War.
Safety is a critical ethos of engineering. Engineering works which are not developed and subsequently maintained in a professional manner can cause not just inconvenience, but material damage and even loss of life. Computer software is now a technological foundation for society – especially military systems, but also security and intelligence systems, financial trading systems, telecommunications and mobile phones, aircraft avionics, car and truck and ship control and management, power generation and distribution, logistics and management of goods in transit – the list seems almost endless.
Software failures can cause, and have caused, substantial damage and loss of life. Why is it so difficult to correctly engineer software ? Can software ever be designed as safely and rigorously as the works of the traditional engineering disciplines ?
To answer these questions, it may be helpful to first have a basic understanding of the boundary between computer hardware and software. The basic structure of most general purpose computers, including the laptop or desktop computer which you may routinely use, is of a central processing unit (or CPU); a memory, which in most computers loses its content when the power is turned off; one or more disks, which provide longer term memory which survives power downs, but which are slower than the main memory; a keyboard and mouse device; and a screen. There may well be other peripherals too such as network devices, camera, speakers and microphone and so on. All of these, and in particular the CPU and main memory, are interconnected by a fast backplane, or “bus”.
Let’s focus on the CPU and main memory. The main memory holds software programs which are being executed by the CPU, together with data also in the memory. The CPU itself consists of an arithmetic unit, capable of carrying out in hardware at least addition and subtraction, together with other hardware operations such as testing for positive or negative, and logical operations with patterns of bits. The CPU also has a program counter which identifies the location within the main memory which holds the current machine instruction. The CPU executes a program by stepping through the indicated list of instructions, including making use of subroutines to which certain tasks can be subcontracted before handing back control to their caller.
The design of a CPU includes specifying what specific machine instructions can be carried out in the hardware of the CPU. This portfolio of instructions defines the boundary between the hardware and the software for the machine. A software program specifies a list of specific instructions in a program; the hardware engineer has to ensure that the CPU accurately implements every instruction. Many CPU chip manufacturers design their own instruction sets for their own CPU chips – thus Intel chipsets have different instruction sets to say Motorola chipsets.
Each CPU instruction set works on batches of bits at a time; 8-bits for example, representing any number from 0 to 255; or 16 bits, representing any number from 0 to 65,535. The size of the batch of bits manipulated at a time can be important….
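For readers who like to see the idea in motion, the stepping of a program counter through a list of instructions can be sketched in a few lines of Python. This is only a toy: the instruction names (LOAD, ADD, SUB, STORE, HALT) are invented for illustration and do not match any real CPU, and for simplicity the program is held in its own list rather than in the same memory as the data. The arithmetic is limited to batches of 8 bits, so results wrap around once they pass 255.

```python
# Toy machine, for illustration only. The instruction names are hypothetical
# and the program is kept separate from the data memory for simplicity.
WORD_MASK = 0xFF  # an 8-bit word can hold 0..255


def run(program, memory):
    """Step through `program`, a list of (opcode, address) pairs."""
    accumulator = 0
    program_counter = 0
    while True:
        opcode, address = program[program_counter]  # fetch the instruction
        program_counter += 1                        # point at the next one
        if opcode == "LOAD":
            accumulator = memory[address]
        elif opcode == "ADD":
            accumulator = (accumulator + memory[address]) & WORD_MASK
        elif opcode == "SUB":
            accumulator = (accumulator - memory[address]) & WORD_MASK
        elif opcode == "STORE":
            memory[address] = accumulator
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction: {opcode}")


# Add the values in cells 0 and 1 and put the result in cell 2.
ADD_TWO_CELLS = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", 0)]

small = run(ADD_TWO_CELLS, [20, 22, 0])    # 20 + 22 fits in 8 bits
large = run(ADD_TWO_CELLS, [200, 100, 0])  # 300 does not fit, so it wraps
```

In the second run the true sum, 300, cannot be held in 8 bits, and the toy machine quietly stores 300 - 256 = 44 instead. Nothing in the hardware complains: it is up to the programmer to know the size of the batch of bits.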
Differences between different computers
In 1991, 28 died and over 100 were injured when a US Patriot missile battery in Saudi Arabia failed to intercept an incoming Iraqi Scud missile, which then struck a US army barracks. A software rounding error had caused a mistake in accurately calculating time, leading to the Patriot system ignoring the incoming Scud. A hardware clock counted the number of tenths of a second since the ground system had been powered up over 4 days earlier. The software guidance system multiplied the clock value by 1/10 to obtain the time in seconds. The calculation was performed in a 24 bit fixed point register. The binary expansion of 1/10 is 0.00011001100110011 recurring. Using 24 fixed bits gave an error of 0.0000000000000000000000011001100… binary, or about 0.000000095 decimal, on every tenth of a second counted. Over 100 hours of operation, that is a timing drift error of 0.34 seconds. A Scud travels at about 1,600 meters per second, and so covers about half a kilometer in that time. In turn this caused the Patriot to consider the Scud outside of the Patriot’s “range gate” and thus ignorable.
The software designers of the Patriot system had not taken rounding errors into account. Their computer used 24 bits: ironically, had a more modern 32 bit machine been used, they would have gotten away with it: the extra 8 bits would have given 256 times as many hours before the rounding error would have been significant, ie 25,600 hours or almost 3 years after powering up the system…
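The drift arithmetic can be reproduced in a few lines. The sketch below is not the Patriot code, only a check of the figures quoted above. It assumes that the register kept 23 bits after the binary point (which is what the quoted error of about 0.000000095 implies) and that the extra bits were simply chopped off; the function names are my own.

```python
from fractions import Fraction

TICKS_PER_SECOND = 10       # the hardware clock counted tenths of a second
FRACTION_BITS_24 = 23       # assumption: bits kept after the binary point
FRACTION_BITS_32 = 23 + 8   # the same layout with 8 extra bits


def chopped_tenth(fraction_bits):
    """The value of 1/10 once its binary expansion is cut off."""
    scale = 2 ** fraction_bits
    return Fraction(scale // 10, scale)


def drift_seconds(hours, fraction_bits):
    """Accumulated clock error after `hours` of continuous operation."""
    ticks = hours * 3600 * TICKS_PER_SECOND
    error_per_tick = Fraction(1, 10) - chopped_tenth(fraction_bits)
    return ticks * error_per_tick


error_per_tick = float(Fraction(1, 10) - chopped_tenth(FRACTION_BITS_24))
drift = float(drift_seconds(100, FRACTION_BITS_24))
distance_m = drift * 1600   # at the 1,600 meters per second quoted above
improvement = drift_seconds(100, FRACTION_BITS_24) / drift_seconds(100, FRACTION_BITS_32)

print(f"error per tick: {error_per_tick:.3e}")
print(f"drift after 100 hours: {drift:.2f} seconds, about {distance_m:.0f} meters")
print(f"improvement with 8 more bits: {improvement}x")
```

Exact fractions are used rather than floating point numbers so that the sketch does not introduce rounding errors of its own while measuring one.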
Over time, the capacity, speed and cost of CPUs have all improved. Different CPUs, from different manufacturers, may have different instruction sets. A software program is a list of specific instructions, and thus must be written for a particular instruction set. In moving to a different computer, software may therefore need to be re-written. In the early days of computing this was indeed the case, although this problem has now been solved to a greater or lesser extent.
Writing software by listing the specific machine instructions required is tedious and error prone, although it was the way in which computers were originally programmed. Then, in 1953, John Backus created the Fortran programming language. Fortran enabled software programs to work at a more abstract level than machine instructions. Inspired by Grace Hopper and others, Backus wrote a remarkable program, called the Fortran compiler, which is a software program which reads any Fortran software program and converts it into an equivalent and longer list of machine instructions, for any of several different instruction sets.
Fortran gave a quantum step in computing. Instead of talking to a machine at its own level of the hardware, a programmer could work at something closer to human language. In fact the programmer did not have to concern herself about which particular instruction set might be used for a specific computer made by a particular manufacturer – programs could be written to be independent of any particular hardware. As Scott Rosenberg said, rather than “hand to hand combat with the machine”, programmers could now communicate much more directly. It was as if instead of t-a-l-k-i-n-g i-n l-e-t-t-e-r-s, programmers could now talk in syllables and words.
With the advent of Fortran, it became much easier for software engineers to share their work. A programmer could write a library of useful subroutines – for example of hyperbolic and trigonometric functions – and share them with others, even if they were using a different computer with a different instruction set. As long as there was a Fortran compiler available, then the library could be re-used in new situations, and automatically translated to the idiosyncrasies of any particular instruction set.
Equivalency between programming languages
Fifty years later, there are now myriad programming languages, probably running to several hundred, each specialised in some way. In the accounting field, Cobol and Basic remain popular. In numeric processing, Fortran still makes its mark. But amongst the general mass of programmers, particularly those working with the web, Java, Perl scripting, and Ruby are very popular.
However it is interesting to note that whatever programming language you use, it is technically possible to re-implement your work in any other programming language – this is the Church-Turing thesis of algorithm logic, and indeed is the theoretical basis for why the Fortran compiler (or any other programming language translator) is even possible.
If that is the case, then in principle and at some abstract level, all programming languages are equal, although one may run faster on a given piece of hardware than another. In principle a German speaker can learn Chinese and vice versa, it is just that a German speaker speaking Chinese may be slower than a native Chinese speaker. Programming languages differ in speed of execution and in how well the constructs of the language directly support particular fields – scientific computations, financial processing, the web, and so on.
Given that in theory any two programming languages can be implemented in each other’s terms, and given that so much software has accumulated over the years using different programming languages, then it should be possible for any of the newer languages to use libraries built with any of the older ones, right ? If I am a Java programmer, then surely I should be able to use, say, a Fortran library of trigonometric and hyperbolic functions ? In general, today’s answer is “yes”: software tools and middleware have been developed which automatically convert the format of data and numbers, and the invocation conventions, of one programming language to another.
So is it relatively easy for different programming teams to work on different components of a complex software system, possibly using differing languages, and then just “glue” the system together ? Actually, not always: let me tell you of a case where NASA subcontracted some work to Lockheed Martin for the Mars Climate Orbiter.
Re-use of libraries and components
In September 1999, the 125M$ Mars Climate Orbiter completed its 286 day journey from earth and fired its engines to go into Martian orbit. The engines fired as commanded, but the approach was too low and the spacecraft burned up in the Martian atmosphere at a height of about 60 km above the planet, instead of the 160km planned, and about 25km lower than the height at which it could still operate properly. The software controlling the thrusters, written by Lockheed Martin, used imperial units rather than the metric units used by NASA in the rest of the spacecraft.
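One common defence against this kind of mismatch is to stop passing bare numbers between components, and instead pass values which carry their unit with them. The sketch below illustrates the idea only; it is not how the Orbiter software was written. The choice of quantity (thruster impulse), the class and the function names are my own assumptions for the example.

```python
from dataclasses import dataclass

NEWTONS_PER_POUND_FORCE = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A thruster impulse, always stored internally in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value):
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value):
        return cls(value * NEWTONS_PER_POUND_FORCE)


def total_impulse(burns):
    """Add up burns, whichever unit each one was originally reported in."""
    return Impulse(sum(burn.newton_seconds for burn in burns))


# One team reports in imperial units, the other in metric (made-up figures).
imperial_report = 10.0   # pound-force seconds
metric_report = 20.0     # newton-seconds

naive_total = imperial_report + metric_report   # bare numbers: silently wrong
safe_total = total_impulse([
    Impulse.from_pound_force_seconds(imperial_report),
    Impulse.from_newton_seconds(metric_report),
])
```

With bare numbers the two reports add up to 30, and nothing warns that the sum is meaningless. With the unit made explicit at the boundary between the two teams, the total is about 64.5 newton-seconds, and a caller cannot create an `Impulse` without stating which unit the number is in.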
So, you are probably thinking, software should therefore be built using standardised plug-in parts which are well specified and well understood. Surely the software industry has defined a collection of small, indivisible and substitutable software components, rather like Lego-bricks? They then could be snapped together to make arbitrarily complex combinations. After all, this is what electronics engineers do with their resistors, transistors, capacitors and chips; as do civil engineers, mechanical engineers, aeronautical engineers and so on. In other engineering disciplines, there are standard components used in each industry – bridge construction, mobile phone network roll-out, engine development and so on – which can be assembled into complex structures. Each component is well defined, and each has a specific, well-understood function. There may be different manufacturers who can supply a specific component, but each one of them functions similarly, although each may differ in cost or lifetime and so on.
Does software use standardised parts ?
Components of actual software programs vary tremendously in size, in structure and function. As Rosenberg has said, it is as if different components for a bridge ranged from a centimetre to ten kilometres long. Software components can also vary tremendously in their degree of coupling to other components: just like a single component of a bridge requiring very many precisely fitting joints to connect to other components. Software components are rarely substitutable: there are no widely accepted industry standard specifications of a basic, common, industry-wide set of components available from a range of suppliers. It is as if each component of a bridge were available only from a single manufacturer, and different components from different manufacturers very rarely ever simply join together. Programming is often largely a craft in which each software artifact is custom designed and built, rather than an organisational enterprise like mass production manufacturing.
Larry Constantine in the 1990s observed that “most programmers like to program. Some of them would rather program than eat or bathe. Most of them would much rather cut code than chase documentation or search catalogues or try and figure out some other stupid programmer’s idiotic work…Other things being equal, programmers design and build from scratch rather than re-cycle”. Constantine noted “if it takes the typical programmer more than two minutes and twenty seven seconds to find something, they will conclude it does not exist and therefore will reinvent it.”
Today, search engines help programmers search for existing code, usually in less than two minutes twenty-seven seconds. Each modern language has numerous re-useable components. But still re-using software in new situations is hard. Let me tell you what happened when the software engineers who built the space flight software for the European Ariane 4 rocket installed their work into the next generation Ariane 5 in 1996.
Just 39 seconds into the maiden launch of the 300M euro European Ariane 5 rocket, with a cargo of four satellites, at an altitude of 2.5 miles, the rocket swerved off course and the strain of the aerodynamic forces on the boosters and the rocket core caused a self-destruction. The rocket had been making an abrupt course correction for a wrong turn which had not in fact happened. The inertial guidance system, using gyroscopes and accelerometers, had generated bizarre and impossible data, which were not in fact data at all. They were just a diagnostic message to say that the guidance system had switched itself off. The guidance system had shut down because it tried to convert the lateral velocity of the rocket from a 64 bit format to a 16 bit format, and there had been numeric overflow. A second, backup, guidance system had then automatically taken over but failed in the same way because it ran the same software. The software designers had assumed that the particular velocity value would never be larger than the 16 bits allocated. And in the past, on the Ariane 4 rocket, it never had been. But Ariane 5 was more powerful and faster. Further: the calculation with the error, which shut down the guidance system, which confused the on board main computer, which forced the rocket into an abrupt course correction, which almost ripped the boosters from the rocket causing a self-destruction, in fact served no purpose after lift-off: its purpose was to initialise the system prior to launch. But engineers some time earlier in the Ariane programme had decided to leave the function running for the first 40 seconds of flight because it made it easier to resume a countdown sequence if there was a brief hold in a countdown.
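The conversion at the heart of the Ariane 5 failure can be illustrated in miniature. The sketch below is in Python rather than the language used on the rocket, the velocity figures are invented, and the two helper functions are my own; it simply shows two ways a designer might handle a value which does not fit into 16 bits, instead of assuming that it always will.

```python
INT16_MIN = -32768
INT16_MAX = 32767   # the largest value a signed 16 bit integer can hold


def to_int16_checked(value):
    """Convert to a 16 bit integer, refusing values which do not fit."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return result


def to_int16_saturating(value):
    """Convert to a 16 bit integer, clamping out-of-range values to the limit."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


slower_rocket_reading = 1234.5    # hypothetical reading on the older, slower rocket
faster_rocket_reading = 40000.0   # hypothetical reading on the newer, faster rocket

fits = to_int16_checked(slower_rocket_reading)
clamped = to_int16_saturating(faster_rocket_reading)
try:
    to_int16_checked(faster_rocket_reading)
    overflow_detected = False
except OverflowError:
    overflow_detected = True
```

Neither choice is automatically the right one. Refusing the value forces someone to decide what the system should do next, while clamping keeps the system running with a number which is known to be wrong. The point is that the decision has to be made deliberately for each new situation in which the component is re-used.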
Students of architecture are taught the classic forms and structures of design. They study the great works and icons of their profession, and learn the careers of the great protagonists of their art. Classical engineering is often instructed in the same way: civil engineers study not only the great buildings, but also the great bridges and transport systems. Mechanical engineers learn of the great engine designs which advanced their profession. Electronics engineers study model circuits, by which their profession advanced.
Until the last fifteen years or so, students of software engineering rarely studied the great works of their profession. The great works were unavailable to be studied: software companies jealously guarded the software code of their products, only allowing a limited number of employees to see their inner workings. It was as if the great works of literature were locked away, unavailable for the next generation to study. The open source movement, originally promoted only by certain academics in the mid 1980s, then started publishing software code. Some of these works are classic works of software, in particular the Linux system and environment.
But even when works of software are studied, the sheer size of a software system can be daunting. The English translation of Tolstoy’s “War and Peace” has about 43,000 lines. One of the longest pieces of literature in any language is “Tokugawa Ieyasu” by Sohachi Yamaoka, completed after 17 years in 1967 in 40 volumes with over 10 million Japanese characters, and about 770,000 lines. The Linux kernel has about 11 million lines. Windows XP has approximately 40 million lines for the operating system. The full release of Debian version 4 for Linux has about 283 million lines of code, which is thus about 6,500 times as big as “War and Peace”.
How can the construction of large software systems, running to millions of lines of code, many software developers, and potentially large sums of money, be managed ?
Software Project Management
In July 2007 an agency of the Irish Department of Health and Children abandoned the Personnel, Payroll and Related system for 120,000 health care workers nationwide. The system used the SAP human resources package, under a managed service agreement with IBM. The project rollout was managed by consulting firm Deloitte. It was originally budgeted at 9Meuro in 1999. By the time it was abandoned, it had cost some 220Meuro. In one incident with the system, one employee was overpaid by 1Meuro as part of an electronic funds transfer error.
How can you measure and manage the progress of a software project ? The fruit produced each day by a software engineer is software code. As a manager, you can of course measure the lines of software code written so far, and thus how productive your software developers are. However, this is misleading: large cumbersome bloated code may contain many lines, but a more concise carefully written version may perform better and often will be easier for others to understand. There are other metrics: how many features have been developed so far – but then features can vary tremendously in scope and effort required to implement; how many defects have been detected and repaired – but sometimes as the defect rate rises, paradoxically you may be closer to a finished project; and so on.
How long will a software project take ? How do you know when a piece of software is finished ? It’s like asking how did Joyce know that “Ulysses” was finished ? Is there no single chapter, no single paragraph, no single sentence that Joyce might have wished he had time to gently re-work ? Most pieces of literature can always be improved, and certainly all software programs can be improved to clarify their structure, improve their performance, simplify their use and so on. So although a piece of software may apparently do what it is supposed to do, and may actually perform in the way it is supposed to, it usually is never finished. Software engineering professionals thus usually have to resign themselves to accepting that although each of their written works could probably be improved and made more elegant, it nevertheless has to be published and used in its current state.
If it is published in its current state, how do you know that the software works ? Let me tell you about an embarrassment for AT&T in the USA.
In the afternoon of Jan 15th, 1990, over 9 hours, 75 million phone calls failed and 200,000 airline reservations were lost in the USA. The nationwide AT&T telephone network collapsed because a single line in a complex software program was wrong. The entire network of 114 4ESS switching systems – the world’s first fully digital switch, launched in 1976 – was undergoing a software upgrade and the new program had been installed in all of them. One of these telephone exchanges then had a minor mechanical problem and automatically shut itself down in an orderly way, just as it was supposed to. The fault was then manually repaired by a technician, and the exchange put back online. On its first attempt to place a call via another switch, the neighbouring switch automatically detected that the first switch was operational again, just as it was supposed to do. The neighbouring switch began re-setting its internal data and call routing maps accordingly, when a second call attempt came from the first switch. A single line software defect then caused confusion in the neighbour, which was still in the middle of resetting its internal data and call routing maps. The neighbour then automatically switched itself off, and then automatically re-started. In turn this caused identical failures across the entire network, in a chain reaction.
How do you know whether a program is correct and works, after it has been written ? How do you know that it does what it’s supposed to do ? The obvious way of course is simply to test it: give it data and inputs, and see whether it produces the correct behaviour and results. Usually there are a large number, or even an infinite number, of test scenarios, but only a finite amount of time and resources, with the consequence that it is impossible to test everything. Frequently, one is reduced to testing for some representative sample of inputs, with the risk that that implies. Consider a mobile phone: it may have a limited (if large) number of functions, but the number of sequences of functions you can ask it to carry out is infinite: can all of these sequences be tested ? Further, the relative timing of inputs and events may produce incorrect behaviour: in the AT&T network example, a second incoming telephone call before a switch had fully reset itself caused the failure.
Software engineers have frequently operated under instruction that their work be “good enough” – that a software system has no major problems, and should be OK to use, even though it may not be error free. But what is a definition of “good enough” ? Is it by consensus across the team ? Is it when predetermined criteria are met, so that testing can then be stopped ? Should testing cease once the costs of testing (including salary costs) have begun to climb above what is acceptable ? Do you stop testing when the rate of discovery of defects drops below an agreed level ? Or, as happens persuasively and usually indisputably, do you stop testing when your manager says “stop testing and ship it” ?
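A little arithmetic shows how quickly the number of test scenarios gets out of hand, even before timing is considered. The figures in this sketch are invented purely for illustration: a phone with 20 functions, sequences of at most 10 steps, and a test rig which can try 1,000 sequences every second.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def count_sequences(num_functions, max_length):
    """How many different sequences of 1 to `max_length` function calls exist."""
    return sum(num_functions ** length for length in range(1, max_length + 1))


def years_to_test(num_sequences, tests_per_second):
    """How long it would take to try every sequence exactly once."""
    return num_sequences / (tests_per_second * SECONDS_PER_YEAR)


# Hypothetical phone: 20 functions, sequences of at most 10 steps.
sequences = count_sequences(20, 10)
years = years_to_test(sequences, tests_per_second=1000)
print(f"{sequences:,} sequences would need about {years:,.0f} years of testing")
```

Even with these modest assumptions, exhaustive testing would take centuries, and each extra step allowed in a sequence multiplies the total by another factor of 20. This is why testers are reduced to choosing a representative sample.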
Rather than testing for some subset of possible inputs and timing conditions, could the software test itself ? Self-testing software is of course common, and it is also common for software engineers to ensure that their software verifies that certain design assertions are valid as the software progresses. [In my presentation, I then had a slide showing a logic proof for a program calculating the greatest common divisor of two positive integers]. If such assertions are provided by the programmer, the software can of course automatically check whether the assertion is actually valid, each time the software is run.
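To give a flavour of what such assertions look like, here is a minimal sketch of the greatest common divisor calculation with its design assertions written into the code. It is not the proof from my slide, only an illustration in Python, and the last check is deliberately simple-minded rather than efficient.

```python
def greatest_common_divisor(a, b):
    """Greatest common divisor of two positive integers, by repeated subtraction."""
    # Precondition: the design only covers positive integers.
    assert a > 0 and b > 0, "both inputs must be positive"
    x, y = a, b
    while x != y:
        # Design assertion: both values stay positive on every pass, and every
        # common divisor of a and b is still a common divisor of x and y.
        assert x > 0 and y > 0
        if x > y:
            x -= y
        else:
            y -= x
    # Postcondition, part 1: the result divides both inputs.
    assert a % x == 0 and b % x == 0
    # Postcondition, part 2: no larger number divides both inputs.
    # (A slow check, acceptable only for small demonstration values.)
    assert all(a % d != 0 or b % d != 0 for d in range(x + 1, min(a, b) + 1))
    return x


answer = greatest_common_divisor(48, 18)
```

Each time the function runs, the assertions are checked automatically, and the program stops with an error if any of them turns out to be false. Note that an assertion can only check what the programmer thought to write down: it says nothing about assumptions which nobody realised they were making.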
Then there is of course the issue of what should be done if the software discovers that a design assertion is actually invalid. If the assertion is suddenly found to be invalid, how should the system deal with the failure in its design ?
Some theoretical computer scientists have explored whether it is possible to prove that an arbitrary software program is correct, in the sense of a logical mathematical proof. If it were possible to do so, then in principle a software program could be written which would read any other software program, and using mathematical proof techniques, deduce a proof for this program. In a way, this is reminiscent of the enormous step forward made by Backus: a program (the FORTRAN compiler) which could read any other program (written in FORTRAN) and automatically understand it.
Program proving remains an important research topic, but for many substantial programs, the proofs necessary become almost intractable for humans to follow, and likewise the mathematical statement of what the outcome of the program should be becomes difficult to understand. In Douglas Adams’s “Hitchhiker’s Guide to the Galaxy”, the theme of the entire book is finding the ultimate answer to the universe. On the last page, the answer is in fact eventually discovered: it is 42. An automatically generated program proof might be seen in somewhat the same light: the real question is how did the answer come about, and how can we understand how the answer was formed…
Design, and the lack of a physics of software
On 2nd January 2002, the UK Public Records Office launched access to the 1901 census of England and Wales, containing records for over 32 million people. The new site was well advertised, and there was immense interest: at one point, more than 1.2 million people simultaneously tried to access the site. The site collapsed and failed, and was only intermittently available during the rest of January. After two weeks in February, the service was switched off, and then entirely re-designed. The service did not go live again until the following September.
In March 2008, Ryanair’s web site underwent a massive upgrade and was shut off for an entire weekend. It then stalled under load once it was powered up. In October 2008, the Aer Lingus web site crashed and was unavailable for a couple of hours, under the load created by the airline’s first “no fare” offer.
A component can be specified by a function: give this value, or signal, as input, then this will be the corresponding output. If you load this beam in a bridge, then this will be the expected deflection, and stress; if you increase the air flow, then this will be the exhaust manifold pressure; if you increase the inductance, then this will be the output voltage; and so on. Indeed software tools, such as CAD systems, help engineers design and build machines and artifacts, predicting how for example a specific design for a bridge, an airframe, a powerplant, or a transformer substation will behave under various operating conditions.
But can a CAD system predict how a design for a software system will behave ? Software engineers build CAD systems for other engineering disciplines: have we software engineers built a CAD system for ourselves ?
The answer in general, and sadly, is no, not yet with our current state of the art.
Simple software components can behave in a similar fashion to other engineering components, such as beams, manifolds, pumps, induction coils and so on – when you change the input, this is what the output will be. Most software components, like many other engineering components, in fact have more than one input variable – ie more than one degree of freedom – and more than one output value. So: again, why cannot software components be managed like other engineering components – what precisely is the issue preventing this ?
The biggest issue is that software components almost invariably have state: they can record values, and their outputs in the future can depend on what inputs they have seen in the past. The concept of time and state is intrinsic in many software components as they model the real world: not only, “what is my bank balance now ?”, but also “what was my bank balance 2 weeks ago, or at 4.30pm on the 15th November 2005 ?” If you are a structural engineer, can you imagine working with say a beam whose deflection response could be a complex derivation of all previous loadings, or whose deflection response now was some function of a load at an arbitrary specific date and time in the past, such as 4.30pm on the 15th November 2005 ?
Software mathematics and theory have yet to produce a tractable methodology for the physics of state and how past behaviour modifies current and future behaviour. Z-transforms in mathematics do greatly assist digital signal processing, by extracting key properties and decomposing complex composite sequences. Temporal logic extends classical predicate logic with modal operators, but it is difficult to apply these to prove practical programs. Markov chains and stochastic Petri nets both help model event sequences and concurrent operations, but likewise have proved difficult to apply to practical programs. State transition diagrams work well for software control systems, and statecharts (as used in UML) help structure these into higher level views, but it remains uncertain whether they can be applied to general programs.
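The contrast can be made concrete with a small sketch. The first component below behaves like an idealised beam: its output depends only on the input it is given now. The second is a toy bank account whose answer depends on everything it has been told in the past. Both are invented for illustration (the stiffness figure and the transactions are made up), and neither is meant as a realistic model.

```python
from datetime import datetime


def beam_deflection(load_newtons, stiffness_newtons_per_mm=2000.0):
    """Stateless: the same load always gives the same deflection (in mm)."""
    return load_newtons / stiffness_newtons_per_mm


class Account:
    """Stateful: the answer depends on the whole history of past inputs."""

    def __init__(self):
        self._transactions = []   # list of (time, amount) pairs

    def record(self, when, amount):
        self._transactions.append((when, amount))

    def balance_at(self, when):
        return sum(amount for time, amount in self._transactions if time <= when)


account = Account()
account.record(datetime(2005, 11, 1, 9, 0), 100)     # a deposit
account.record(datetime(2005, 11, 20, 12, 0), -30)   # a later withdrawal

balance_then = account.balance_at(datetime(2005, 11, 15, 16, 30))
balance_later = account.balance_at(datetime(2005, 12, 1, 0, 0))
same_every_time = beam_deflection(5000.0) == beam_deflection(5000.0)
```

To predict what `beam_deflection` will return, you need to know only its input. To predict what `balance_at` will return, you need to know every call to `record` which has ever been made, and in a large system those calls may come from many other components, each with state of its own.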
As software engineers, we are getting better than we were a decade ago at capturing the specification for a system; we are getting better at designing an implementation of that specification; we are getting better at predicting the performance response of that implementation; we are getting better at building that implementation by re-using other components; and we are getting better at describing our implementation so that other software engineers after us can maintain and extend what we have done.
But nevertheless we still lack a tractable physics of software that allows our profession to reason and predict how complex assemblies of varieties of software components will interact and behave under all the potential operating conditions which they may encounter during their lifetime. Without such a physics, it appears difficult to analytically evaluate alternative designs; without such a physics, it appears difficult to understand our systems, how they actually operate and sometimes why they fail; without such a physics, it appears difficult to confidently provide design guidelines; and until we have such a physics, software engineering may continue to be a creative undertaking based on heuristics and experience, rather than an applied science with a sound analytical base.
I hope it is clear from the case studies which I have given you in this presentation, that software failures are dangerous. I hope too that I have given you a few insights about the state of the software profession, and why there are deep issues about the safety of software.
It is Engineers Ireland’s view that all engineering projects which may affect the health and safety of the public, or may damage property, should be certified by a professional engineer. Currently the Irish engineering profession is weakly regulated, and compares unfavourably to certain other jurisdictions in which certification of all engineering works is legally required. Engineers Ireland is taking initiatives to safeguard the public against poor engineering judgement and unprofessional analysis.
For software, it seems natural to expect similar regulation and certification of software systems. For example the EU medical devices directive requires certification of devices for healthcare – thus not only of electrical switches, plugs, cables, electromechanical devices and so on, but also of the software embedded into intelligent medical devices.
Certifying that an engineering work is safe has reputational, fiscal, and ethical liabilities. Certification that a software system is safe raises deep issues about the nature of software itself. This is a global challenge but also an opportunity for smart and innovative engineers. Those who can reflect sufficiently deeply on the nature of software itself, and then derive a solid and pragmatic framework for ensuring that software can be safely designed and then certified as free from risk, will benefit mankind.