Proven In Use: The Violated Assumptions Pitfall for Autonomous Vehicle Validation
The Violated Assumptions Pitfall:
Arguing that something is Proven In Use must address whether the new use has assumptions and operational conditions sufficiently similar to those of the old one.
The proven in use argumentation pattern uses field experience of a component (or, potentially, an engineering process) to argue that field failure rates have been sufficiently low to assure a necessary level of integrity.
Historically this has been used as a way to grandfather existing components into new safety standards (for example, in the process industry under IEC 61508). The basis of the argument is generally that if a huge number of operational hours of experience in the actual application have been accumulated with a sufficiently low number of safety-relevant failures, then there is compelling experimental data supporting the integrity claim for a component. We assume in this section that sufficient field experience has actually been obtained to make a credible argument. A subsequent section on field testing addresses the issue of how much experience is required to make a statistically valid claim.
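As a rough illustrative sketch of the underlying statistics (the field testing section treats this question properly), assume a constant failure rate λ and T failure-free operational hours under a simple exponential failure model; the symbols and the 95% example are assumptions for illustration, not taken from the paper:

```latex
% Zero-failure confidence bound sketch (illustrative assumptions):
% P(\text{zero failures in } T \text{ hours} \mid \lambda) = e^{-\lambda T}.
% Setting this probability equal to \alpha gives a one-sided upper bound:
\lambda \;\le\; \frac{-\ln \alpha}{T} \;\approx\; \frac{3}{T}
\quad \text{at } 1-\alpha = 95\% \text{ confidence.}
```

Under these assumptions, supporting a claimed rate better than one safety-relevant failure per N hours at 95% confidence requires roughly 3N failure-free hours, which is why the amount of accumulated field experience is the crux of the argument.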
A plausible variant could be that a component is initially deployed with a temporary additional level of redundancy or other safety backup mechanism, such as a Doer/Checker pair. (By analogy, the Checker is a set of training wheels for a child learning to ride a bicycle.) When sufficient field experience has been accumulated to attain confidence regarding the safety of the main system, the additional safety mechanism might be removed to reduce cost.
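As a concrete sketch of the Doer/Checker idea, here is a minimal Python illustration; the class names, the speed and steering limits, and the fallback action are assumptions made for the sketch rather than anything prescribed by the paper:

```python
# Minimal Doer/Checker sketch: the Doer proposes actuation commands and the
# Checker independently enforces a simple safety envelope, substituting a
# safe fallback when the proposal violates it. Names and limits are illustrative.

from dataclasses import dataclass

@dataclass
class Command:
    speed_mps: float      # requested vehicle speed
    steering_deg: float   # requested steering angle

class Doer:
    """Complex, possibly ML-based planner whose integrity is not yet proven."""
    def propose(self, sensor_state: dict) -> Command:
        # Placeholder for the real planner.
        return Command(speed_mps=sensor_state.get("target_speed", 0.0),
                       steering_deg=sensor_state.get("target_steering", 0.0))

class Checker:
    """Simple, independently assured monitor with a conservative envelope."""
    MAX_SPEED_MPS = 10.0
    MAX_STEERING_DEG = 25.0

    def enforce(self, cmd: Command) -> Command:
        if cmd.speed_mps > self.MAX_SPEED_MPS or abs(cmd.steering_deg) > self.MAX_STEERING_DEG:
            # Fall back to a safe action (here: controlled stop, wheels straight).
            return Command(speed_mps=0.0, steering_deg=0.0)
        return cmd

def control_step(doer: Doer, checker: Checker, sensor_state: dict) -> Command:
    return checker.enforce(doer.propose(sensor_state))

# Example step: a proposal outside the envelope is replaced by the safe fallback.
cmd = control_step(Doer(), Checker(), {"target_speed": 20.0, "target_steering": 5.0})
print(cmd)  # Command(speed_mps=0.0, steering_deg=0.0)
```

Removing the Checker once enough field experience has been accumulated is exactly the proven in use argument described here, and it inherits the violated-assumptions pitfall discussed below.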
For autonomous vehicles, the safety backup mechanism frequently manifests as a human safety driver, and the argument being made is that a sufficient amount of safe operation with a human safety driver enables removal of that safety driver for future deployments. The ability of a human safety driver to adequately supervise autonomy is itself a difficult topic (Koopman and Latronico 2019, Gao 2016), but is beyond the scope of this paper.
A significant threat to validity for proven in use argumentation is that the system will be used in an operational environment that differs in a safety-relevant way from its field history. In other words, a proven in use argument is invalidated if the component in question is exposed to operational conditions substantively different from those of its historical experience.
Diverging from historical usage can happen in subtle ways, especially for software components. Issues of data value limits, timing, and resource exhaustion can be relevant and might be caused by functionally unrelated components within a system. Differences in fault modes, sensor input values, or the occurrence of exceptional situations can also cause problems.
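To make the data-value-limit case concrete, the following hedged Python sketch (the 16-bit message field, scaling factor, and function names are hypothetical) shows how a reused component can carry an implicit range assumption that a new operational environment silently violates:

```python
# Hypothetical reused module: converts a physical quantity to a 16-bit fixed-point
# field in a legacy bus message. The implicit assumption is that the scaled value
# always fits in a signed 16-bit integer -- true in the original operational
# history, but not necessarily in a new deployment.

import struct

SCALE = 100  # counts per unit; legacy calibration

def encode_reading(value: float) -> bytes:
    counts = int(round(value * SCALE))
    # struct raises struct.error if counts is outside [-32768, 32767],
    # surfacing the violated assumption only at run time in the new environment.
    return struct.pack(">h", counts)

# Original operational profile: |value| < 300, so counts stay below 30000 -- always fits.
encode_reading(299.0)

# New operational environment with a larger dynamic range: the same "proven" code
# now fails (in languages with silent wraparound, it would corrupt the message instead).
try:
    encode_reading(400.0)   # 40000 counts does not fit in a signed 16-bit field
except struct.error as exc:
    print("violated assumption:", exc)
```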
A proven in use safety argument must address how the risk of the system being deployed outside this operational profile is mitigated (e.g. by detecting distributional shifts in the environment, by external limitations on the environment, or by procedural mitigations). It is important to realize that an operational profile for a safety-critical system must include not only typical cases but also rare, expected edge cases. An even more challenging issue is that there might be operational environment assumptions that designers do not realize are being made, resulting in a residual risk of violated assumptions even after a thorough engineering review of the component reuse plan.
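One possible form of such a mitigation, sketched here with purely illustrative attribute names and bounds, is a runtime monitor that compares observed conditions against the operational envelope actually covered by the field history:

```python
# Sketch of a runtime operational-profile monitor: each incoming observation is
# checked against the envelope that the proven-in-use field history covered.
# Attribute names and bounds are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Observation:
    speed_mps: float
    rainfall_mm_per_hr: float
    ambient_temp_c: float

# Envelope derived (hypothetically) from the conditions seen during field history.
ENVELOPE = {
    "speed_mps": (0.0, 15.0),
    "rainfall_mm_per_hr": (0.0, 2.0),
    "ambient_temp_c": (-10.0, 40.0),
}

def outside_operational_profile(obs: Observation) -> list:
    """Return the names of any attributes outside the assumed envelope."""
    violations = []
    for name, (lo, hi) in ENVELOPE.items():
        value = getattr(obs, name)
        if not (lo <= value <= hi):
            violations.append(name)
    return violations

obs = Observation(speed_mps=12.0, rainfall_mm_per_hr=6.5, ambient_temp_c=21.0)
violations = outside_operational_profile(obs)
if violations:
    # Trigger a procedural mitigation, e.g., hand off to a fallback behavior.
    print("operational profile violated:", violations)
```

Note that a monitor of this kind only covers assumptions that designers knew to write down; it does not address the unrecognized-assumption risk described above.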
A classic example of this pitfall is the failure of Ariane 5 flight 501. That mishap involved reusing an inertial navigation system from the Ariane 4 design, employing a proven in use argument (Lions 1996). A contributing factor was that software functionality used by Ariane 4 but not required for Ariane 5 flight was left in place to avoid having to requalify the component.
(This is an excerpt of our SSS 2019 paper: Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019. Read the full text here)
- Koopman, P. and Latronico, B. (2019) Safety Argument Considerations for Public Road Testing of Autonomous Vehicles, SAE WCX, 2019.
- Gao, F. (2016) Modeling Human Attention and Performance in Automated Environments with Low Task Loading, Ph.D. Dissertation, Massachusetts Institute of Technology, Feb. 2016.