Electric Alchemy: 2010

This week we have seen an extraordinary number of articles claiming that malware is at least partly to blame for the fatal crash of Spanair Flight 5022 which killed 154 of the 172 souls aboard.

Reading the headlines from this story leads one to believe that a trojan was used in a deliberate attempt to bring down an airliner. Digging further into the facts of the story, nothing could be further from the truth.

The aircraft in question is a 150,000lb twin turbofan MD-82 airliner. Aircraft like these do not ordinarily make "no flap" takeoffs. The pilots failed to follow the checklists properly and applied takeoff power with the flaps retracted, and the aircraft stalled on leaving ground effect.

Source: CIAIAC

The image on the left shows the layout of the forward pedestal on the MD-82. There are multiple tactile and visual indications of flap/slat deflection in the cockpit of the MD-82.

The aircraft's crew made two passes through the pre-takeoff checklist. On the first pass, the flaps were correctly set to 11°. However, before reaching the runway the crew noticed an abnormal RAT (ram air temp) indication and returned to the ramp for maintenance. The ground crew "resolved" the issue with the RAT sensor by disabling it, and the MD-82 again began taxiing for takeoff. However, the cockpit voice recorder captured several signs of trouble, indicative of a coming cascade of failures.

Source: CIAIAC

The official report [1] from CIAIAC (the Spanish equivalent of the NTSB) shows that the flight was delayed, the cabin was hot, and the copilot had his mind on his dinner plans. The pilot interrupted the pre-takeoff checklist to ask the copilot to call for takeoff clearance. The copilot called for clearance on the wrong frequency. This is understandable; people make mistakes. However, what's unforgivable is that when the copilot ran through the "takeoff imminent" checklist, the pilot was "anticipating" rather than verifying, and read back a flap deflection of "11" from memory rather than actually confirming the position of the flaps.

Source: CIAIAC

Upon applying takeoff power, the TOWS (takeoff warning system) should have provided an audible warning that takeoff flaps/slats were not set properly. This did not happen. So the TOWS was infected with malware/trojans, right? NO. The TOWS was indeed inoperative, but it is an onboard aircraft system with no IP connection, no USB ports, and no operating system familiar to everyday malware authors. There is quite a bit of misinformation being spread on this point, with security boffins and AV vendors latching on to the malware angle. No, it was not the onboard TOWS which had been infected. So why did the TOWS fail to call out a warning?

The report goes into significant technical detail on this point, and it's a bit more complicated than an open circuit breaker or a stray bit of malware.

Source: CIAIAC

The issue with the energized ram air temp heater was indicative of a relay failure that would put the aircraft's systems in "flight mode". This would explain not only the high RAT (the probe has a heating element which is enabled in flight but disabled on the ground) but also the silent TOWS (which only operates in "ground mode").
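The coupling described above can be sketched as a toy model. This is only an illustration of the reasoning, not the MD-82's actual logic: the names `GroundDependentSystems`, `systems_for` and `tows_sounds` are hypothetical, and the real ground sensing system drives many more systems than the two shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundDependentSystems:
    tows_armed: bool            # takeoff warning system is able to sound
    rat_probe_heater_on: bool   # ram air temp probe heater is energized


def systems_for(sensed_mode: str) -> GroundDependentSystems:
    """Toy model (assumption): both systems follow one shared air/ground signal."""
    if sensed_mode not in ("ground", "flight"):
        raise ValueError(f"unknown mode: {sensed_mode!r}")
    in_flight = sensed_mode == "flight"
    return GroundDependentSystems(
        tows_armed=not in_flight,
        rat_probe_heater_on=in_flight,
    )


def tows_sounds(sensed_mode: str, flaps_deg: float, takeoff_power: bool) -> bool:
    """Warn only if the TOWS is armed, power is applied, and flaps are retracted."""
    state = systems_for(sensed_mode)
    return state.tows_armed and takeoff_power and flaps_deg == 0


# The aircraft is physically on the ground, but the signal has failed to "flight".
failed = systems_for("flight")
```

In this model a single wrong signal produces both symptoms at once: `failed.rat_probe_heater_on` is true while `failed.tows_armed` is false, so `tows_sounds("flight", 0, True)` stays silent even with the flaps retracted.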

So, in a nutshell, if the ground sensing system fails to flight mode, we would have a situation where the TOWS would be disabled, and the ram air temp heater (among other things) would be enabled. So how would we know if the ground sensing system had failed?

Well, digging deeper, we find that on this particular aircraft, there were numerous instances of high RAT readings on the ground:

Source: CIAIAC

So in the three days prior to the accident we had six abnormal RAT readings while the aircraft was on the ground, as recorded by the digital flight data recorder (DFDR). Surely the crew would start to pick up on the fact that something was not right.

Source: CIAIAC

No, because the three RAT abnormalities that were actually entered into the aircraft's technical log book (ATLB) were reported by three different crews. Doh!

So this is (finally) where the malware issue starts to enter the picture. The off-board system in question was responsible for correlating, scoring and alerting on situations just like this. A computer system is ideal for this type of scenario, where the air crews and maintenance personnel rotate frequently. So why didn't the system fire an alert?

On this issue the official report from the CIAIAC is silent, though perhaps the forthcoming inquiry [2] will shed some light. Perhaps the threshold for RAT anomalies wasn't reached. Maybe this was because the system was only made aware of the 3 anomalies that were actually entered into the ATLB, rather than the 6 that were detected by the DFDR on board the aircraft. Or perhaps the malware present on the scoring and alerting system prevented the system from working as expected.
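To make the threshold hypothesis concrete, here is a minimal sketch of per-aircraft anomaly correlation. Everything in it is an assumption for illustration: the `Anomaly` record, the `aircraft_to_flag` function, the tail identifier "AC-1", the crew labels, the day offsets and the threshold of 4 are all made up, and nothing here describes how the airline's real system worked.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Anomaly(NamedTuple):
    aircraft: str   # hypothetical tail identifier
    days_ago: int   # days before the reference date
    kind: str       # e.g. "high_rat_on_ground"
    crew: str       # reporting crew; deliberately ignored when counting


def aircraft_to_flag(events: Iterable[Anomaly], kind: str,
                     window_days: int, threshold: int) -> List[str]:
    """Count anomalies per aircraft, regardless of which crew reported them."""
    counts = Counter(
        e.aircraft for e in events
        if e.kind == kind and 0 <= e.days_ago <= window_days
    )
    return sorted(a for a, n in counts.items() if n >= threshold)


KIND = "high_rat_on_ground"

# Illustrative data only: three log book entries from three different crews...
logged = [Anomaly("AC-1", d, KIND, c) for d, c in [(3, "A"), (2, "B"), (1, "C")]]
# ...versus six events as a flight data recorder might have captured them.
recorded = [Anomaly("AC-1", d, KIND, "n/a") for d in (3, 3, 2, 2, 1, 1)]

flag_from_log = aircraft_to_flag(logged, KIND, window_days=3, threshold=4)
flag_from_dfdr = aircraft_to_flag(recorded, KIND, window_days=3, threshold=4)
```

With the assumed threshold of 4, `flag_from_log` is empty while `flag_from_dfdr` contains "AC-1". The same aircraft is flagged or missed depending only on which data feed the system sees, which is the first of the possibilities raised above.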

In any case, this tragic event is typical of so many in aviation. Its primary cause, in the opinion of this security consultant and pilot, is what we aviators call "Get There Itis". The flight was delayed, the cabin was hot, the copilot had dinner plans, and the passengers and flight attendants were grumpy. The pilot breezed through the "takeoff imminent" checklist, repeating from memory "11" degrees of flap deflection rather than verifying the position of the flaps/slats on the numerous indicators present in the cockpit.

The throttles were pushed forward, the stick was pulled back, and the jet momentarily became airborne before stumbling out of ground effect into a fireball that killed 154 people.

Did malware have anything to do with this tragedy? Perhaps. But it's certainly a tertiary factor. A number of recommendations came out of the post-accident investigation, and these were specifically to:

Recommend that TOWS systems are checked for proper operation before each flight, rather than once per day
Recommend that checklists be streamlined to ensure that critical items (such as setting flap/slat deflection for takeoff) are performed without interruption, and verified

Source: CIAIAC

As to malware on maintenance systems? Of course that's undesirable. However, modifications to checklists and operational procedures are our best and most important defense against similar accidents going forward. Keeping airlines' computer systems free of malware is a completely reasonable requirement. Would a malware-free maintenance system have prevented this accident? Perhaps. But properly trained and professional aviators certainly would have mitigated this tragedy.

[1] http://www.skybrary.aero/bookshelf/books/777.pdf
[2] http://www.theregister.co.uk/2010/08/20/spanair_malware/

EA Principal Consultant David Campbell will be delivering a presentation at the 4th annual Rocky Mountain Information Security Conference tomorrow, 5 May 2010 in Denver, CO.

The slide deck (pdf) is posted and the abstract of the presentation is as follows:

For many years now penetration testing has been an essential component of an effective information security risk management program. PCI DSS even requires that penetration testing be performed at regular intervals. The “blackbox” or “zero knowledge” approach used by most internal penetration testing teams and external consultancies seeks to identify exploitable vulnerabilities which could lead to a compromise. This testing methodology was born out of a desire to emulate the attacker’s view of the enterprise, in order to identify high impact vulnerabilities and remediate them with priority.

While blackbox testing captures many attributes of the “attacker’s eye view” of the threat landscape, it fails to adequately account for the fact that an Advanced Persistent Threat may have considerably more time to spend penetrating a target than a security consultancy which charges by the day or by the hour. In order to level the playing field, a paradigm shift is necessary in the industry. Penetration testing performed using a “greybox” approach, in which the security assessor is provided with source code or other relevant software artifacts during an engagement, is a highly effective alternative to “zero knowledge” testing. This presentation will showcase real world cases where critical vulnerabilities were identified during greybox engagements after being overlooked by multiple previous blackbox assessments.

Attendees of this session should expect to gain an understanding of why not all penetration testing activities are created equal, and how to choose the right type of penetration testing to satisfy their risk mitigation requirements while managing budget, time and compliance constraints.

Electric Alchemy

Friday, December 17, 2010

EA Research Drives WSJ's "What They Know" Smartphone Investigation

Wednesday, August 25, 2010

Malware in Spanair Fatal Air Crash Case: FUD or a real factor?

Tuesday, June 1, 2010

FROC2010

Tuesday, May 4, 2010

Presentation: "The Future of Penetration Testing" at RMISC
