The Discounted Failure Pitfall for autonomous system safety

The Discounted Failure Pitfall:
Arguing that something is safe because it has never failed before doesn't work if you keep discounting its failures based on that very reasoning.


A particularly tricky pitfall occurs when a proven in use argument is based upon a lack of observed field failures when field failures have been systematically under-reported or even not reported at all. In this case, the argument is based upon faulty evidence.

One way that this pitfall manifests in practice is that faults resulting in low consequence failures tend to go unreported, with system redundancy tending to reduce the consequences of a typical incident. It takes time and effort to report failures, and there is little incentive to report each incident if the consequence is small, the system can readily continue in service, and the subjective impression is that the system is safe overall. Perversely, experiencing numerous recovered malfunctions or warnings can actually increase user confidence in a system even as those events go unreported. This can be a significant issue when removing backup systems such as mechanical interlocks based on a lack of reported loss events. If the interlocks were keeping the system safe but interlock or other failsafe engagements go unreported, removing those interlocks can be deadly. Various aspects of this pitfall came into play in the Therac-25 loss events (Leveson 1993).

An alternate way that this pitfall manifests is when there is a significant economic or other incentive for suppressing or mischaracterizing the cause of field failures. For example, there can be significant pressure and even systematic approaches to failure reporting and analysis that emphasize human error over equipment failure in mishap reporting (Koopman 2018b). Similarly, if technological maturity is being measured by a trend of reduced safety mechanism engagements (e.g. autonomy disengagements during road testing (Banerjee et al. 2018)), there can be significant pressure to artificially reduce the number of events reported. Claiming proven in use integrity for a component subject to artificially reduced or suppressed error reports is again basing an argument on faulty evidence.

Finally, arguing that a component cannot be unsafe because it has never caused a mishap before can result in systems that are unsafe by induction. More specifically, if each time a mishap occurs you argue it wasn't that component's fault, and therefore never attribute cause to that component, then you can continue that reasoning forever, all the while denying that an unsafe component is in fact unsafe.

(This is an excerpt of our SSS 2019 paper:  Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019.  Read the full text here)

  • Leveson, N. (1993) An investigation of the Therac-25 Accidents, IEEE Computer, July 1993, pp. 18-41.
  • Koopman, P. (2018b), "Practical Experience Report: Automotive Safety Practices vs. Accepted Principles," SAFECOMP, Sept. 2018.
  • Banerjee, S., Jha, S., Cyriac, J., Kalbarczyk, Z., Iyer, R., (2018) “Hands Off the Wheel in Autonomous Vehicles?: A Systems Perspective on over a Million Miles of Field Data,” 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2018.

The “Small” Change Fallacy for autonomous system safety

The “small” change fallacy:
In software, even a single-character change to source code can cause catastrophic failure.  Short of a very rigorous change analysis process, there is no such thing as a "small" change to software.


Headline: July 22, 1962: Mariner 1 Done In By A Typo

Another threat to validity for a proven in use strategy is permitting “small” changes without re-validation (e.g. as implicitly permitted by (NHTSA 2016), which requires an updated safety assessment only for significant changes). Change analysis is difficult for software in general, and can be ineffective for software that has high coupling between modules and the resulting complex interdependencies.

Because even one line of bad code can result in a catastrophic system failure, it is highly questionable to argue that a change is “small” because it only affects a small fraction of the source code base. Rigorous analysis should be performed on the change and its possible effects, which generally requires analysis of design artifacts beyond just source code.
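
As a purely hypothetical illustration of why "small" is misleading, the sketch below shows a one-character edit to a comparison operator that silently weakens an emergency braking check. The function names and numbers are invented for illustration, not drawn from any real system or incident.

    # Hypothetical illustration: a one-character source change with
    # safety-critical consequences. Names and values are invented.

    BRAKING_MARGIN_M = 2.0  # buffer beyond the computed stopping distance

    def needs_emergency_brake(distance_to_obstacle_m, stopping_distance_m):
        # Correct version: brake as soon as the margin is consumed.
        return distance_to_obstacle_m <= stopping_distance_m + BRAKING_MARGIN_M

    def needs_emergency_brake_after_small_change(distance_to_obstacle_m, stopping_distance_m):
        # "Small" change: '<=' became '<' (one character). The boundary case
        # no longer commands braking, yet almost none of the code base changed.
        return distance_to_obstacle_m < stopping_distance_m + BRAKING_MARGIN_M

    # Exactly at the margin: the original brakes, the "small" change does not.
    print(needs_emergency_brake(42.0, 40.0))                     # True
    print(needs_emergency_brake_after_small_change(42.0, 40.0))  # False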

A variant of this pitfall is arguing that a particular type of technology is generally “well understood” and implying based upon this that no special attention is required to ensure safety. There is no reason to expect that a completely new implementation – potentially by a different development team with a different engineering process – has sufficient safety integrity simply because it is not a novel function to the application domain.

(This is an excerpt of our SSS 2019 paper:  Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019.  Read the full text here)
  • NHTSA (2016) Federal Automated Vehicles Policy, National Highway Traffic Safety Administration, U.S. Department of Transportation, September 2016.

Assurance Pitfalls When Using COTS Components

Assurance Pitfalls When Using COTS Components:
Using a name-brand, familiar component doesn't automatically ensure safety.



It is common to repurpose Commercial Off-The-Shelf (COTS) software or components for use in critical autonomous vehicle applications. These include components originally developed for other domains such as mine safety, low volume research components such as LIDAR units, and automotive components such as radars previously used in non-critical or less critical ADAS applications.

Generally such COTS components are being used in a somewhat different way than the original non-critical commercial purpose, and are often modified for use as well. Moreover, even field proven automotive components are typically customized for each vehicle manufacturer to conform to customer-specific design requirements. When arguing that a COTS item is proven in use, it is important to account for at least whether there is in fact sufficient field experience, whether the field experience is for a previous or modified version of the component, and other factors such as potential supply-chain changes, manufacturing quality fade, and the possibility of counterfeit goods.

In some cases we have seen proven in use arguments attempted for which the primary evidence relied upon is the reputation of a manufacturer based on historical performance on other components. While purchasing from a reputable manufacturer is often a good start, a brand name label by itself does not necessarily demonstrate that a particular component is fit for purpose, especially if a complex supply chain is involved.

COTS components can be problematic if they don't come with the information needed for safety assessment. (Hint: source code is only a starting point. But often even that isn't provided.) While third party certification certainly is not a panacea, looking for independent evaluation that relevant technical, development process, and assurance process activities have been performed is a good start to making sure COTS components are fit for purpose.

(This is an excerpt of our SSS 2019 paper:  Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019.  Read the full text here)

Proven In Use: The Violated Assumptions Pitfall for Autonomous Vehicle Validation

The Violated Assumptions Pitfall:
Arguing something is Proven In Use must address whether the new use has assumptions and operational conditions sufficiently similar to those of the old one.


The proven in use argumentation pattern uses field experience of a component (or, potentially, an engineering process) to argue that field failure rates have been sufficiently low to assure a necessary level of integrity.

Historically this has been used as a way to grandfather existing components into new safety standards (for example, in the process industry under IEC 61508). The basis of the argument is generally that if a huge number of operational hours of experience in the actual application have been accumulated with a sufficiently low number of safety-relevant failures, then there is compelling experimental data supporting the integrity claim for a component. We assume in this section that sufficient field experience has actually been obtained to make a credible argument. A subsequent section on field testing addresses the issue of how much experience is required to make a statistically valid claim.
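
As a rough sense of scale (a sketch using a standard statistical rule of thumb, not a calculation from the paper): if zero safety-relevant failures are observed in N independent operating hours, the 95% upper confidence bound on the failure rate is only about 3/N per hour, so the required exposure grows inversely with the claimed rate.

    import math

    def required_failure_free_hours(target_rate_per_hour, confidence=0.95):
        """Zero-failure operating hours needed so the upper confidence bound on
        the failure rate reaches the target (Poisson approximation; at 95%
        confidence this reduces to the familiar 'rule of three')."""
        return -math.log(1.0 - confidence) / target_rate_per_hour

    # Example: supporting a claim of 1e-8 dangerous failures per hour at 95%
    # confidence requires roughly 3e8 failure-free operating hours.
    print(f"{required_failure_free_hours(1e-8):.3g} hours")  # about 3e+08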

A plausible variant could be that a component is initially deployed with a temporary additional level of redundancy or other safety backup mechanism, such as a Doer/Checker pair. (By analogy, the Checker is a set of training wheels for a child learning to ride a bicycle.) When sufficient field experience has been accumulated to attain confidence regarding the safety of the main system, the additional safety mechanism might be removed to reduce cost.

For autonomous vehicles, the safety backup mechanism frequently manifests as a human safety driver, and the argument being made is that a sufficient amount of safe operation with a human safety driver enables removal of that safety driver for future deployments. The ability of a human safety driver to adequately supervise autonomy is itself a difficult topic (Koopman and Latronico 2019, Gao 2016), but is beyond the scope of this paper.

A significant threat to validity for proven in use argumentation is that the system will be used in an operational environment that differs in a safety-relevant way from its field history. In other words, a proven in use argument is invalidated if the component in question is exposed to operational conditions substantively different than the historical experience.

Diverging from historical usage can happen in subtle ways, especially for software components. Issues of data value limits, timing, and resource exhaustion can be relevant and might be caused by functionally unrelated components within a system. Differences in fault modes, sensor input values, or the occurrence of exceptional situations can also cause problems.

A proven in use safety argument must address how the risk of the system being deployed outside the historically experienced operational profile is mitigated (e.g. by detecting distributional shifts in the environment, by external limitations on the environment, or by procedural mitigations). It is important to realize that an operational profile for a safety critical system must include not only typical cases, but also rare but expected edge cases. An even more challenging issue is that there might be operational environment assumptions that designers do not realize are being made, resulting in a residual risk of violated assumptions even after a thorough engineering review of the component reuse plan.
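
One simple (and by itself insufficient) mitigation is to continuously compare operating conditions against the envelope actually covered by the historical field data, so that excursions are detected rather than silently folded into the proven in use claim. The sketch below is illustrative only; the parameter names and bounds are assumptions.

    # Illustrative sketch: flag operation outside the historically proven
    # envelope. Parameters and bounds are invented for illustration.

    FIELD_EXPERIENCE_ENVELOPE = {
        "speed_mps":         (0.0, 25.0),    # speeds covered by field history
        "ambient_lux":       (50.0, 1.0e5),  # daylight only; no night operation
        "sensor_latency_ms": (0.0, 40.0),    # timing seen in historical data
    }

    def outside_proven_envelope(sample):
        """Return the parameters in this sample that fall outside the range
        covered by historical field experience."""
        violations = []
        for name, value in sample.items():
            low, high = FIELD_EXPERIENCE_ENVELOPE.get(name, (float("-inf"), float("inf")))
            if not (low <= value <= high):
                violations.append(name)
        return violations

    # Night-time operation violates the historical envelope even though the
    # speed itself is familiar.
    print(outside_proven_envelope({"speed_mps": 12.0, "ambient_lux": 2.0}))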

A classic example of this pitfall is the failure of Ariane 5 flight 501. That mishap involved reusing an inertial navigation system from the Ariane 4 design employing a proven in use argument (Lyons 1996). A contributing factor was that software functionality used by Ariane 4 but not required for Ariane 5 flight was left in place to avoid having to requalify the component.

(This is an excerpt of our SSS 2019 paper:  Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019.  Read the full text here)
  • Koopman, P. and Latronico, B., (2019) Safety Argument Considerations for Public Road Testing of Autonomous Vehicles, SAE WCX, 2019.
  • Gao, F. (2016) Modeling Human Attention and Performance in Automated Environments with Low Task Loading, Ph.D. Dissertation, Massachusetts Institute of Technology, Feb. 2016.

Pitfall: Arguing via Compliance with an Inappropriate Safety Standard

For a standard to help ensure you are actually safe, not only do you have to follow it, but it must also be a suitable standard in domain, scope, and level of integrity assured.  Proprietary safety standards often don't actually make you safe.


Historically some car makers have used internal, proprietary software safety guidelines as their way of attempting to assure an appropriate level of safety (Koopman 2018b). If a safety argument is based in part upon conformance to a safety standard, it is essential that the comprehensiveness of the safety standard itself be supported by evidence. For public standards that argument can involve group consensus by experts, public scrutiny, assessments of historical effectiveness, improvement in response to loss events, and general industry acceptance.

On the other hand, proprietary standards lack at least public scrutiny, and often lack other aspects such as general industry acceptance, while also falling short of accepted practices in technically substantive ways. One potential approach to arguing that a proprietary safety standard is adequate is to map it to a publicly available standard, although other viable argumentation approaches might exist.

Sometimes arguing compliance with a well-known and documented industry safety standard may be inadequate for a particular safety case. Although all passenger vehicle subsystems in the United States must comply with the Federal Motor Vehicle Safety Standards (FMVSS) issued by the National Highway Traffic Safety Administration (NHTSA), those standards concentrate on mechanical and electrical issues and only high-level electronics functionality. FMVSS compliance alone is insufficient for a safety case that must encompass software (Koopman 2018b).

Similarly, use of particular standards might be an accepted or recommended practice, but not a means of ensuring safety. For example, the use of AUTOSAR (Autosar.org 2018) and Model Based Design techniques are sometimes proposed as indicators of safe software, but use of these technologies has essentially no predictive power with respect to system safety.

 For automotive systems with safety-critical software, ISO 26262 is typically an appropriate standard, although other standards such as IEC 61508 (IEC 1998) can be relevant as well. For some autonomous system functions, conformance to ISO PAS 21448 (ISO 2018) might well be appropriate. A new standard, UL 4600, is specifically intended to cover autonomous vehicles and is expected to be introduced in 2019.

Arguments of the form "technology X, standard Y, or methodology Z is adequate because it has historically produced safe systems in the past" should address the potential issues with "proven in use" argumentation considered in the following section.

(This is an excerpt of our SSS 2019 paper:  Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019.  Read the full text here)

  • Autosar.org (2018) The Standard Software Framework for Intelligent Mobility, https://www.autosar.org/ accessed Nov. 5, 2018.
  • Koopman, P. (2018b), "Practical Experience Report: Automotive Safety Practices vs. Accepted Principles," SAFECOMP, Sept. 2018.
  • IEC (1998) IEC 61508: Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems, International Electrotechnical Commission, December, 1998.
  • ISO (2018) Road vehicles – Safety of the Intended Function, ISO/PRF PAS 21448 (under development), International Organization for Standardization, 2018.


The Implicit Controllability Pitfall for autonomous system safety

The Implicit Controllability Pitfall:
A safety case must account for not only failures within the autonomy system, but also failures within the vehicle. With a fully autonomous vehicle, responsibility for managing potentially unsafe equipment malfunctions that were previously mitigated by a human driver falls to the autonomy.


A subtle pitfall when arguing based on conformance to a safety standard is neglecting that assumptions made when assessing the subsystem might have been violated or changed by the use of autonomy. Of particular concern for ground vehicles is the “controllability” aspect of an ISO 26262 ASIL analysis. (Severity and exposure might also change for an autonomous vehicle due to different usage patterns and should also be considered, but are beyond the scope of this discussion.)

The risk analysis of an underlying conventional vehicle according to ISO 26262 requires taking into account the severity, exposure, and controllability of each hazard (ISO 2011). The controllability aspect assumes a competent human driver is available to react to and mitigate equipment malfunctions. Taking credit for some degree of controllability generally reduces the integrity requirements of a component. This idea is especially relevant for Advanced Driver-Assistance Systems (ADAS) safety arguments, in which it is assumed that the driver will intervene in a timely manner to correct any vehicle misbehavior, including potential ADAS malfunctions.
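
To make the effect of the controllability credit concrete, the sketch below reproduces the ISO 26262-3 ASIL determination table in simplified form (the sum-threshold encoding is an equivalent shorthand, not the standard's own notation) and shows how losing the assumed driver intervention raises the required integrity level.

    def asil(severity, exposure, controllability):
        """Simplified reproduction of the ISO 26262-3 ASIL determination table.
        severity: 1-3 (S1-S3), exposure: 1-4 (E1-E4), controllability: 1-3 (C1-C3).
        Thresholding the sum of the three classes reproduces the table."""
        total = severity + exposure + controllability
        return {7: "ASIL A", 8: "ASIL B", 9: "ASIL C", 10: "ASIL D"}.get(total, "QM")

    # A hazard rated S3/E4 with credit for an alert human driver (C1) lands at
    # ASIL B; the same hazard with no driver available to intervene (C3) is ASIL D.
    print(asil(3, 4, 1))  # ASIL B
    print(asil(3, 4, 3))  # ASIL D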

With a fully autonomous vehicle, responsibility for managing potentially unsafe equipment malfunctions that were previously mitigated by a human driver falls to the autonomy. That means that all the assumed capabilities of a human driver that have been built into the safety arguments regarding underlying vehicle malfunctions are now imposed upon the autonomy system.

If the autonomy design team does not have access to the analysis behind underlying equipment safety arguments, there might be no practical way to know what all the controllability assumptions are. In other words, the makers of an autonomy kit might be left guessing what failure response capabilities they must provide to preserve the correctness of the safety argumentation for the underlying vehicle.

The need to mitigate some malfunctions is likely obvious, but we have found that “obvious” is in the eye of the beholder. Some examples of assumed human driver interventions we have noted, or even experienced first-hand, include:
  • Pressing hard on the brake pedal to compensate for loss of power assist 
  • Controlling the vehicle in the event of a tire blowout 
  • Path planning after catastrophic windshield damage from debris impact 
  • Manually pumping brakes when anti-lock brake mechanisms time out due to excessive activation on slick surfaces
  • Navigating by ambient starlight after a lighting system electrical failure at speed, while bringing the vehicle to a stop
  • Attempting to mitigate the effects of uncommanded propulsion power 
To the degree that the integrity level requirements of the vehicle were reduced by the assumption that a human driver could mitigate the risk inherent in such malfunction scenarios, the safety case is insufficient if an autonomy kit does not have comparable capabilities.

Creating a thorough safety argument will require either obtaining or reverse engineering all the controllability assumptions made in the design of the underlying vehicle. Then, the autonomy must be assessed to have an adequate ability to provide the safety relevant controllability assumed in the vehicle design, or an alternate safety argument must be made.

For cases in which the controllability assumptions are not available, there are at least two approaches that should both be used by a prudent design team. First, FMEA, HAZOP, and other appropriate analyses should be performed on vehicle components and safety relevant functions to ensure that the autonomy can react in a safe way to malfunctions. Such an analysis will likely struggle with whether or not it is safe to assume that the worst types of malfunctions will be adequately mitigated by the vehicle without autonomy intervention.

Second, defects reported on comparable production vehicles should be considered as credible malfunctions of the non-autonomous portions of any vehicle control system since they have already happened in production systems. Such malfunctions include issues such as the drivetrain reporting the opposite of the current direction of motion, uncommanded acceleration, significant braking lag, loss of headlights, and so on (Koopman 2018a).

(This is an excerpt of our SSS 2019 paper:  Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019.  Read the full text here)
  • ISO (2011) Road vehicles – Functional Safety – Management of functional safety, ISO 26262, International Organization for Standardization, 2011.
  • Koopman, P., (2018a), Potentially deadly automotive software defects, https://betterembsw.blogspot.com/2018/09/potentially-deadly-automotive-software.html, Sept. 25, 2018.

Edge Cases and Autonomous Vehicle Safety -- SSS 2019 Keynote

Here is my keynote talk for SSS 2019 in Bristol UK.

Edge Cases and Autonomous Vehicle Safety

Making self-driving cars safe will require a combination of techniques. ISO 26262 and the draft SOTIF standards will help with the vehicle control and trajectory stages of the autonomy pipeline. Planning might be made safe using a Doer/Checker architectural pattern that uses deterministic safety envelope enforcement of non-deterministic planning algorithms. Machine-learning based perception validation will be more problematic. We discuss the issue of perception edge cases, including the potentially heavy-tail distribution of object types and brittleness to slight variations in images. Our Hologram tool injects modest amounts of noise to cause perception failures, identifying brittle aspects of perception algorithms. More importantly, in practice it is able to identify context-dependent perception failures (e.g., false negatives) in unlabeled video.



Command Override Anti-Pattern for autonomous system safety

Command Override Anti-Pattern:
Don't let a non-critical "Doer" override the safety-critical "Checker."  If you do this, the Doer can tell the Checker that something is safe when it isn't.



(First in a series of postings on pitfalls and fallacies we've seen used in safety assurance arguments for autonomous vehicles.)

A common pitfall when identifying safety relevant portions of a system is overlooking the safety relevance of sensors, actuators, software, or some other portion of a system. A common example is creating a system that permits the Doer to perform a command override of the Checker. (In other words, the designers think they are building a Doer/Checker pattern in which only the Checker is safety relevant, but in fact the Doer is safety relevant due to its ability to override the Checker’s functionality.)

The usual claim being made is that a safety layer will prevent any malfunction of an autonomy layer from creating a mishap. This claim generally involves arguing that an autonomy failure will activate a safing function in the safety layer, and that an attempt by the autonomy to do something unsafe will be prevented. A typical (deficient) scheme is to have autonomy failure detected via some sort of self-test combined with the safety layer monitoring an autonomy layer heartbeat signal. It is further argued that the safety layer is designed in conformance with a suitable functional safety standard, and therefore acts as a safety-rated Checker as part of a Doer/Checker pair.

The flaw in that safety argumentation approach is that the autonomy layer has been presumed to fail silent via a combination of self-diagnosed fault detection and lack of heartbeat. However, self-diagnosis and heartbeat detection methods provide only partial fault detection (Hammett 2001). For example, there is always the possibility that the checking function itself has been compromised by a fault that leads to false negatives when checking for incorrect functionality.

As a simple example, a heartbeat signal might be generated by a timer interrupt in the autonomy computer that continues to function even if significant portions of the autonomy software have crashed or are generating incorrect results. In general, such an architectural pattern is unsafe because it permits a non-silent failure in the autonomy layer to issue an unsafe vehicle trajectory command that overrides the safety layer. Fixing this fault requires making the autonomy layer safety-critical, which defeats a primary purpose of using a Doer/Checker architecture.
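
A minimal sketch of that failure mode is shown below (the structure and names are assumptions for illustration, not anyone's actual design): the heartbeat is produced by a periodic background timer, so the safety layer's liveness test keeps passing even while the planning logic is hung or producing garbage.

    import threading, time

    # Illustrative sketch: a timer-driven heartbeat that keeps firing
    # regardless of whether the autonomy logic is actually healthy.

    last_heartbeat = time.monotonic()

    def heartbeat_loop():
        global last_heartbeat
        while True:
            last_heartbeat = time.monotonic()  # fires like a timer interrupt
            time.sleep(0.1)

    threading.Thread(target=heartbeat_loop, daemon=True).start()

    def checker_sees_live_heartbeat(timeout_s=0.5):
        # The safety layer's (insufficient) liveness check.
        return (time.monotonic() - last_heartbeat) < timeout_s

    time.sleep(0.3)  # suppose the planner hangs or corrupts its output here...
    print(checker_sees_live_heartbeat())  # ...yet the check still reports True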

In practice, safety layer logic is usually less permissive than the autonomy layer. By less permissive, we mean that it under-approximates the safe state space of the system in exchange for simplifying computation (Machin et al. 2018). As a practical example, the safety layer might leave a larger spatial buffer area around obstacles to simplify computations, resulting in a longer total path length for the vehicle or even denying the vehicle entry into situations such as a tight alleyway that is only slightly larger than the vehicle.

A significant safety compromise can occur when vehicle designers attempt to increase permissiveness by enabling a non-safety-rated autonomy layer to say “trust me, this is OK” to override the safety layer. This creates a way for a malfunctioning autonomy layer to override the safety layer, again subverting the safety of the Doer/Checker pair architecture.
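
The override anti-pattern can be made concrete with a short sketch (names, values, and structure are invented for illustration): a conservative clearance check in the Checker is short-circuited by a "trust me" flag that the non-safety-rated Doer controls.

    # Illustrative sketch of the command override anti-pattern.
    # Names, values, and structure are invented for illustration.

    SAFE_CLEARANCE_M = 1.5  # conservative buffer the Checker insists on

    def checker_approves(clearance_m, doer_override):
        if doer_override:
            # Anti-pattern: a lower-integrity component bypasses the safety
            # logic, so a malfunctioning Doer can approve its own unsafe command.
            return True
        return clearance_m >= SAFE_CLEARANCE_M

    # A trajectory with 0.2 m clearance is rejected by the Checker alone, but
    # a faulty Doer asserting the override flag gets it through anyway.
    print(checker_approves(0.2, doer_override=False))  # False
    print(checker_approves(0.2, doer_override=True))   # True (unsafe)
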
Eliminating this command override anti-pattern requires that the designers accept that there is an inherent trade-off between permissiveness and simplicity. A simple Checker tends to have limited permissiveness. Increasing permissiveness makes the Checker more complex, increasing the fraction of the system design work that must be done with high integrity. Permitting a lower integrity Doer to override the safety-relevant behavior of a high integrity Checker in an attempt to avoid Checker complexity is unsafe.

There are related pitfalls. The first is a system in which the Checker covers only a subset of the safety properties of the system. This implicitly trusts the Doer not to have certain classes of defects, including potentially requirements defects. If the Checker does not actually check some aspects of safety, then the Doer is in fact safety relevant. A second pitfall is having the Checker supervise a diagnosis operation as a Doer health check. Even if the Doer passes a health check, that does not mean its calculations are correct. At best it means the Doer is operating as designed – which might be unsafe, since the Doer has not been verified with the level of rigor required to assure safety.

We have found it productive to conduct the following thought experiment when evaluating Doer/Checker architectural patterns and other systems that rely upon assuring the integrity of only a subset of safety-related functions. Ask this question: “Assume the Doer (the portion of the system, including software, sensors, and actuators, that is not ‘safety related’) maliciously attempts to compromise safety in the worst way possible, with full and complete knowledge of every aspect of the design. Could it compromise safety?” If the answer is yes, then the Doer is in fact safety relevant, and must be designed to a sufficiently high level of integrity. Attempts to argue that such an outcome is unlikely in practice must be supported by strong evidence.

(This is an excerpt of our SSS 2019 paper:  Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019.  Read the full text here)
  • Hammett, R., (2001) “Design by extrapolation: an evaluation of fault-tolerant avionics,” 20th Conference on Digital Avionics Systems, IEEE, 2001.
  • Machin, M., Guiochet, J., Waeselynck, H., Blanquart, J-P., Roy, M., Masson, L., (2018) “SMOF – a safety monitoring framework for autonomous systems,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 48(5), May 2018, pp. 702-715.


Credible Autonomy Safety Argumentation Paper

Here is our paper on pitfalls in safety argumentation for autonomous systems for SSS 2019.  My keynote talk will mostly be about perception stress testing but I'm of course happy to talk about this paper as well at the meeting.

Credible Autonomy Safety Argumentation
Philip Koopman, Aaron Kane, Jen Black
Carnegie Mellon University and Edge Case Research, Pittsburgh, PA, USA

Abstract   A significant challenge to deploying mission- and safety-critical autonomous systems is the difficulty of creating a credible assurance argument. This paper collects lessons learned from having observed both credible and faulty assurance argumentation attempts, with a primary emphasis on autonomous ground vehicle safety cases. Various common argumentation approaches are described, including conformance to a non-autonomy safety standard, proven in use, field testing, simulation, and formal verification. Of particular note are argumentation faults and anti-patterns that have shown up in numerous safety cases that we have encountered. These observations can help both designers and auditors detect common mistakes in safety argumentation for autonomous systems.

Download the full paper:
https://users.ece.cmu.edu/~koopman/pubs/Koopman19_SSS_CredibleSafetyArgumentation.pdf