BEAUTIFUL VIEW

Home » Archives for 2019

The Lesson Learned from the Tempe Arizona Autonomous Driving System Testing Fatality NTSB Report

Posted by badrut Sunday, December 8, 2019 0 comments

Now that the press flurry over the NTSB's report on the Autonomous Driving System (ADS) fatality in Tempe has subsided, it's important to reflect on the lessons to be learned. Hats off to the NTSB for absolutely nailing this. Cheers to the Press who got the messaging right. But not everyone did. The goal of this essay is to help focus on the right lessons to learn, clarify publicly stated misconceptions, and emphasize the most important take-aways.

I encourage everyone in the AV industry to watch the first 5 and a half minutes of the NTSB board meeting video ( Youtube: Link // NTSB Link). Safety leadership should watch the whole thing. Probably twice. Then present a summary at your company's lunch & learn.

Pay particular attention to this part from Chairman Sumwalt: "If your company tests automated driving systems on public roads, this crash -- it was about you. If you use roads where automated driving systems were being tested, this crash -- it was about you."

I live in Pittsburgh and these public road tests happen near my place of work and my home. I take the lessons from this crash personally. In principle, every time I cross a street I'm potentially placed at risk by any company that might be cutting corners on safety. (I hope that's none. All the companies testing here have voluntarily submitted compliance reports for the well-drafted PennDOT testing guidelines. But not every state has those, and those guidelines were developed largely in response to the fatality we’re discussing.)

I also have long time friends who have invested their careers in this technology. They have brought a vibrant and promising industry to Pittsburgh and other cities. Negative publicity resulting from a major mishap can threaten the jobs of those employed by those companies.

So it is essential for all of us to get safety right.

The first step: for anyone in charge of testing who doesn't know what a Safety Management System (SMS) is: (A) Watch that NSTB hearing intro. (B) Pause testing on public roads until your company makes a good start down that path. (Again, the PennDOT guidelines are a reasonable first step accepted by a number of companies. LINK) You’ll sleep better having dramatically improved your company’s safety culture before anyone gets hurt unnecessarily.

A major implication of the NTSB report is that if there is even a single company road testing without an adequate SMS -- that is everyone's problem.

Clearing up some misconceptions

I’ve seen some articles and commentary that missed the point of all of this.Large segments of coverage emphasized technical shortcomings of the system - That's not the point. Other coverage highlighted test driver distraction - That's not the point either. The fatal mishap involved technical shortcomings, and the test driver was not paying adequate attention. Both contributed to the mishap, and both were bad things.

But the lesson to learn is that solid safety culture is without a doubt necessary to prevent avoidable fatalities like these. That is the Point.

To make the most of this teachable moment let's break things down further. These discussions are not really about the particular test platform that was involved. The NTSB report gave that company credit for significant improvement. Rather, the objective is to make sure everyone is focused on ensuring they have learned the most important lesson so we don’t suffer another avoidable ADS testing fatality.

A self-driving car killed someone - NOT THE POINT

This was not a self-driving car. It was a test platform for Automated Driving System (ADS) technology. The difference is night and day. Any argument that this vehicle was safe to operate on public roads hinged on a human driver not only taking complete responsibility for operational safety, but also being able to intervene when the test vehicle inevitably made a mistake. It's not a fully automated self-driving car if a driver is required to hover with hands above the steering wheel and foot above the brake pedal the entire time the vehicle is operating.

It's a test vehicle. The correct statement is: a test vehicle for developing ADS technology killed someone.

The pedestrian was initially said to jump out of the dark in front of the car - NOT THE POINT

I still hear this sometimes based on the initial video clip that was released immediately after the mishap. The pedestrian walked across almost 4 lanes of road in view of the test vehicle before being struck. The test vehicle detected the pedestrian 5.6 seconds before the crash. That was plenty of time to avoid the crash, and plenty of time to track the pedestrian crossing the street to predict that a crash would occur. Attempting to claim that this crash was unavoidable is incorrect, and won't prevent the next ADS testing fatality.

It's the pedestrian's fault for jaywalking - NOT THE POINT

Jaywalking is what people do when it is 125 yards to the nearest intersection and there is a paved walkway on the median. Even if there is a sign saying not to cross. Tearing up the paved walkway might help a little on this particular stretch of road, but that's not going to prevent jaywalking as a potential cause of the next ADS testing fatality.

Victim's apparent drug use - NOT THE POINT

It was unlikely that the victim was a fully functional, alert pedestrian. But much of the population isn't in this category for other reasons. Children, distracted walkers, and others with less than perfect capabilities and attention cross the street every day, and we expect drivers to do their best to avoid hitting them.

There is no indication that the victim’s medical condition substantively caused the fatality. (We're back to the fact that the pedestrian did not jump in front of the car.) It would be unreasonable to insist that the public has the responsibility to be fully alert and ready to jump out of the way of an errant ADS test platform at all times they are outside their homes.

Tracking and classification failure - NOT THE POINT

The ADS system on the test vehicle suffered some technical issues that prevented predicting where the pedestrian would be when the test vehicle got there, or even recognizing the object it was sensing was a pedestrian walking a bicycle. However, the point of operating the test vehicle was to find and fix defects.

Defects were expected, and should be expected on other ADS test vehicles. That's why there is a human safety driver. Forbidding public road testing of imperfect ADS systems basically outlaws road testing at this stage. Blaming the technology won't prevent the next ADS testing fatality, but it could hurt the industry for no reason.

It's the technology's fault for ignoring jaywalkers - NOT THE POINT

This idea has been circulating, but apparently this isn't quite true. Jaywalkers aren't ignored, but rather according to the information presented by the NTSB a pedestrian isn't expected to cross the street at first. Once the pedestrian moves for a while a track is built up that could indicate street crossing, but until then movement into the street is considered unexpected if the pedestrian is not at a designated crossing location. A deployment-ready ADS could potentially use a more sophisticated approach to predict when a pedestrian would enter the roadway.

Regardless of implementation, this did not contribute to the fatality because the system never actually classified the victim as a pedestrian. Again, improving this or other ADS technical features won't prevent the next ADS testing fatality. That’s because testing safety is about the safety driver, not which ADS prototype functions happen to be active on any particular test run.

ADS emergency braking behavior - NOT THE POINT

The ADS emergency braking function had behaviors that could hinder its ability to provide backup support to the safety driver. Perhaps another design could have done better for this particular mishap. However, it wasn't the job of the ADS emergency braking to avoid hitting a pedestrian. That was the safety driver's job. Improving ADS emergency braking capabilities might reduce the probability of an ADS testing fatality, but won't entirely prevent the next fatality from happening sooner than it should.

Native emergency braking disabled - NOT THE POINT

It looks bad to have disabled the built-in emergency braking system on the passenger vehicle used as the test platform. The purpose of such systems is to help out after the driver makes a mistake. In this case there is a good, but not perfect, chance it would have helped. But as with the ADS emergency braking function, this simply improves the odds. Any safety expert is going to say your odds are better with both belt and suspenders, but enabling this function alone won't entirely prevent the next ADS testing fatality from happening before it should.

Inattentive safety driver - NOT THE POINT

There is no doubt that an inattentive safety driver is dangerous when supervising an ADS test vehicle. And yet, driver complacency is the expected outcome of asking a human to supervise an automated system that works most of the time. That’s why it’s important to ensure that driver monitoring is done continually and used to provide feedback. (In this case a form of driver monitoring equipment was installed, but data was apparently not used in a way that assured effective driver alertness.)

While enhanced training and stringent driver selection can help, effective analysis and action taken upon monitoring data is required to ensure that drivers are actually paying attention in practice. Simply firing this driver without changing anything else won't prevent the next ADS testing fatality from happening to some other driver who has slipped into bad operational habits.

A fatality is regrettable, but human drivers killed about 100 people that same day with minimal news attention - NOT THE POINT

Some commentators point out the ratio of fatalities caused by test vehicles vs. general automotive fatality rates. They then generally argue that a few deaths in comparison to the ongoing carnage of regular cars is a necessary and appropriate price to pay for progress. However, this argument is not statistically valid.

Consider a reasonable goal that ADS testing (with highly qualified, alert drivers presumed) should be no more dangerous than the risk presented by normal cars. For normal US cars that's ballpark 500 million road miles per pedestrian fatality. This includes mishaps caused by drunk, distracted, and speeding drivers. Due to the far smaller number of miles being driven by current test platform fleet sizes, the "budget" for fatal accidents due to ADS road testing phase should, at this early stage, still be zero.

The fatality somehow “proves” that self-driving car technology isn't viable - NOT THE POINT

Some have tried to draw conclusions about the viability of ADS technology from the fact that there was a testing fatality. However, the issues with ADS technical performance only prove what we already knew. The technology is still maturing, and a human needs to intervene to keep things safe. This crash wasn't about the maturity of the technology; it was about whether the ADS public road testing itself was safe.

Concentrating on technology maturity (for example, via disclosing disengagement rates) serves to focus attention on a long term future of system performance without a safety driver. But the long term isn’t what’s at issue.

The more pressing issue is ensuring that the road testing going on right now is sufficiently safe. At worst, continued use of disengagement rates as the primary metric of ADS performance could hurt safety rather than help. This is because disengagements, if gamed, could incentivize safety drivers to take chances by avoiding disengagements in uncertain situations to make the numbers look better. (Some companies no doubt have strategies to mitigate this risk. But those are probably the companies with an SMS, which is back to the point that matters.)

THE POINT: The safety culture was broken

Safety culture issues were the enabler for this particular crash. Given the limited number of miles that can be accumulated by any current test fleet, we should see no fatalities occur during ADS testing. (Perhaps a truly unavoidable fatality will occur. This is possible, but given the numbers it is unlikely if ADS testing is reasonably safe. So our goal should be set to zero.) Safety culture is critical to ensure this.

The NTSB rightly pushes hard for a safety management system (SMS). But be careful to note that they simply say that this is a part of safety culture, not all of it. Safety culture means, among other things, taking responsibility for ensuring that their safety drivers are actually safe despite the considerable difficulty of accomplishing that. Human safety drivers will make mistakes, but a strong safety culture accounts for such mistakes in ensuring overall safety.

It is important to note that the urgent point here is not regulating self-driving car safety, but rather achieving safe ADS road testing. They are (almost) two entirely different things. Testing safety is about whether the company can consistently put an alert, able-to-react safety driver on the road. On the other hand, ADS safety is about the technology. We need to get to the technology safety part over time, but ADS road testing is the main risk to manage right now.

Perhaps dealing with ADS safety would be easier if the discussions of testing safety and deployment safety were more cleanly separated.

THE TAKEAWAYS:

Chairman Sumwalt summed it up nicely in the intro. (You did watch that 5 and half minute intro, right?) But to make sure it hits home, this is my take:

One company's crash is every company's crash. You'll note I didn't name the company involved, because really that's irrelevant to preventing there from being a next fatality and the potential damage it could do to the industry’s reputation.

The bigger point is every company can and should institute good safety culture before further fatalities take place if they have not done so already. The NTSB credited the company at issue with significant change in the right direction. But it only takes one company who hasn’t gotten the message to be a problem for everyone. We can reasonably expect fatalities involving ADS technology in the future even if these systems are many times safer than human drivers. But there simply aren’t that many vehicles on the road yet for a truly unavoidable mishap to be likely to occur. It’s far too early.

If your company is testing (or plans to test) autonomous vehicles, get a Safety Management System in place before you do public road testing. At least conform to the details in the PennDOT testing guidelines, even if you’re not testing in Pennsylvania. If you are already testing on public roads without an SMS, you should stand down until you get one in place.

Once you have an SMS, consider it a down-payment on a continuing safety culture journey.

The NTSB full report can be accessed here: https://www.ntsb.gov/investigations/AccidentReports/Reports/HAR1903.pdf

Prof. Philip Koopman, Carnegie Mellon University

Author Note: The author and his company work with a variety of customers helping to improve safety. He has been involved with self-driving car safety since the late 1990s. These opinions are his own, and this piece was not sponsored.

Autonomous Vehicle Testing Safety Needs More Transparency

Posted by badrut Monday, July 22, 2019 0 comments

Last week there were two injuries involving human-supervised autonomous test shuttles on different continents, with no apparent connection other than random chance. (For example: Link) As deployment of this work-in-progress technology scales up in public, we know that we can expect more high-profile accidents. Fortunately this time nobody was killed or suffered life-altering injuries. (But we still need to find out what actually happened.) And to be sure, human-driven vehicles are far from accident-free. But what about next time?

The bigger issue for the industry is: will the next autonomous vehicle testing mishap be due to rare and random problems that are within the bounds of reasonable risk? Or will they be due to safety issues that could and should have been addressed beforehand?

Public trust in autonomous vehicle technology has already eroded in the past year. Each new mishap has the unfortunate potential to make that situation worse, regardless of the technical root cause. While no mode of transportation is perfectly safe, it's important that the testing of experimental self driving car technology not expose the public to reasonably avoidable risk. And it's equally important that the public's perception matches the actual risk.

Historically the autonomous vehicle industry has operated under a cloak of secrecy. As we've seen, that can lead to boom and bust cycles of public perception, with booms of optimism followed by backlash after each publicized accident. But in fairness, if there is no information about public testing risk other than hype about an accident-free far-flung future, what is the public supposed to think? Self-driving cars won't be perfect. The goal is to make them better than the current situation. One hopes that along the way things won't actually get worse.

Some progress in public safety disclosure has been made, albeit with low participation rates. One of the two vehicles involved in injuries this past week has a public safety report available. The other does not. In fact, a significant majority of testing organizations have not taken the basic step of making a Voluntary Safety Self-Assessment report available to NHTSA. And to be clear, that disclosure process is more about explaining progress toward production maturity rather than the specific topic of public testing safety.

The industry needs to do better at providing transparent, credible safety information while testing this still-experimental technology. Long term public education and explanation are important. But the more pressing need revolves around what's happening on our roads right now during testing operations. That is what is making news headlines, and is the source of any current risk.

At some point either autonomous vehicle testers are actually doing safe, responsible public operations or they aren't. If they aren't, that is bound to catch up with them as operations scale up. From the point of view of a tester:

- It's a problem if you can't explain to yourself why you are acceptably safe in a methodical, rigorous way. (In that case, probably you're unsafe.)
- It's a problem if you expect human safety drivers to perform with superhuman ability. They can't.
- It's a problem if you aren't ready to explain to authorities why are still acceptably safe after a mishap.
- It's a problem if you can't explain to a jury that you used reasonable care to ensure safety.
- It's a problem if your company's testing operations get sidelined by an accident investigation.
- It's a problem for mishap victims if accidents occur that were reasonably avoidable, especially involving vulnerable road users.
- It's a problem for the whole industry if people lose trust in the technology's ability to operate safely in public areas.

Therefore:
- It's a problem if you can't explain to the public -- with technical credibility -- why they should believe you are safe. Preferably before you begin testing operation on public roads.

Some companies are working on transparent safety more aggressively than others. Some are working on safety cases that contain detailed chains of reasoning and evidence to ensure that they have all the bases covered for public road testing. Others might not be. But really, we don't know.

And that is an essential part of the problem -- we really have no idea who is being diligent about safety. Eventually the truth will out, but bad news is all too likely to come in the form of deaths and injuries. We as an industry need to keep that from happening.

It only takes one company to have a severe mishap that, potentially, dramatically hurts the entire industry. While bad luck can happen to anyone, it's more likely to happen to a company that might be cutting corners on safety to get to market faster.

The days of "trust us, we're smart" are over for the autonomous vehicle industry. Trust has to be earned. Transparency backed with technical credibility is a crucial first step to earning trust. The industry has been given significant latitude to operate on public roads, but that comes with great responsibility and a need for transparency regarding public safety.

Safety should truly come first. To that end, every company testing on public roads should immediately make a transparent, technically credible statement about their road testing safety practices. A simple "we have safety drivers" isn't enough. These disclosures can form a basis for uniform practices across the industry to make sure that this technology survives its adolescence and has a chance to reach maturity and the benefits that promises.

Author:
Dr. Philip Koopman is co-founder and CTO of Edge Case Research, which helps companies make their autonomous systems safer. He is also a professor at Carnegie Mellon University. He is a principle technical author of the UL 4600 draft standard for autonomous system safety, and has been working on self-driving car safety for more than 20 years. Contact: pkoopman@ecr.ai

Autonomous Vehicle Fault and Failure Management

Posted by badrut Monday, July 1, 2019 0 comments

When you build an autonomous vehicle you can't count on a human driver to notice when something's wrong and "do the right thing." Here is a list of faults, system limitations, and fault responses AVs will need to get right. Did you think of these?

https://www.godlikeproductions.com/sm/custom/z/l/mppijgrr.jpeg

System Limitations:

Sometimes the issue isn't that something is broken, but rather simply that all vehicles have limitations. You have to know your system's limitations.

Current capabilities of sensors and actuators, which can depend upon the operational state space.
Detecting and handling a vehicle excursion outside the operational state space for which it was validated, including all aspects of {ODD, OEDR, Maneuver, Fault} tuples.
Desired availability despite fault states, including any graceful degradation plan, and any limits placed upon the degraded operational state space.
Capability variation based on payload characteristics (e.g. passenger vehicle overloaded with cargo, uneven weight distribution, truck loaded with gravel, tanker half filled with liquid) and autonomous payload modification (e.g. trailer connect/disconnect).
Capability variation based on functional modes (e.g. pivot vs. Ackerman vs. crab steering, rear wheel steering, ABS or 4WD engaged/disengaged).
Capability variation based on ad-hoc teaming (e.g. V2V, V2I) and planned teaming (e.g. leader-follower or platooning vehicle pairing).
Incompleteness, incorrectness, corruption or unavailability of external information (V2V, V2I).

System Faults:

Perception failure, including transient and permanent faults in classification and pose of objects.
Planning failures, including those leading to collision, unsafe trajectories (e.g., rollover risk), and dangerous paths (e.g., roadway departure).
Vehicle equipment operational faults (e.g., blown tire, engine stall, brake failure, steering failure, lighting system failure, transmission failure, uncommanded engine power, autonomy equipment failure, electrical system failure, vehicle diagnostic trouble codes).
Vehicle equipment maintenance faults (e.g., improper tire pressure, bald tires, misaligned wheels, empty sensor cleaning fluid reservoir, depleted fuel/battery).
Operational degradation of sensors and actuators including temporary (e.g., accumulation of mud, dirt, dust, heat, water, ice, salt spray, smashed insects) and permanent (e.g., manufacturing imperfections, scratches, scouring, aging, wear-out, blockage, impact damage).
Equipment damage including detecting and mitigating catastrophic loss (e.g., vehicle collisions, lighting strikes, roadway departure), minor losses (e.g., sensor knocked off, actuator failures), and temporary losses (e.g., misalignment due to bent support bracket, loss of calibration).
Incorrect, missing, stale, and inaccurate map data.
Training data incompleteness, incorrectness, known bias, or unknown bias.

Fault Responses:

Some of the faults and limitations fall within the purview of safety standards that apply to non-autonomous functions. However, a unified system-level view of fault detection and mitigation can be useful to ensure that no faults are left unaddressed. More importantly, to the degree that credit has been taken for a human driver participating in fault mitigation by safety standards, that places fault mitigation obligations upon the autonomy.

How the system behaves when encountering an exceptional operational state space, experiencing a fault, or reaching a system limitation.
Diagnostic gaps (e.g., latent faults, undetected faults, undetected faulty redundancy).
How the system re-integrates failed components, including recovery from transient faults and recovery from repaired permanent faults during operation and/or after maintenance.
Response and policies for prioritizing or otherwise determining actions in inherently risky or certain-loss situations.
Withstanding an attack (system security, compromised infrastructure, compromised other vehicles), and deterring inappropriate use (e.g., malicious commands, inappropriately dangerous cargo, dangerous passenger behavior).
How the system is updated to correct functional defects, security defects, safety defects, and addition of new or improved capabilities.

Is there anything we missed?

(This is an excerpt of Koopman, P. & Fratrik, F., "How many operational design domains, objects, and events?" SafeAI 2019, AAAI, Jan 27, 2019.)

Event Detection Considerations for Autonomous Vehicles (OEDR -- part 2)

Posted by badrut Wednesday, June 26, 2019 0 comments

Object and Event Detection and Recognition (OEDR) also involves making predictions about what might happen next. Is that pedestrian waiting for a bus? or about to walk out into the crosswalk right in front of my car? Did you think of all of these aspects?

https://www.post-gazette.com/local/neighborhood/2018/09/06/Pittsburgh-Left-question-jagoff-neighborhood-vote-hearken-tradition-culture/stories/201809030108

The infamous Pittsburgh Left. The first vehicle at a red light
will (sometimes) turn left when the light turns to green.

Some factors to consider when deciding what events and behaviors your system needs to predict include:

Determining expected behaviors of other objects, which might involve a probability distribution and is likely to be based on object classification.
Normal or reasonably expected movements by objects in the environment.
Unexpected, incorrect, or exceptional movement of other vehicles, obstacles, people, or other objects in the environment.
Failure to move by other objects which are reasonably expected to move.
Operator interactions prior to, during, and post autonomy engagement including: supervising driver alertness monitoring, informing occupants, interaction with local or remote operator locations, mode selection and enablement, operator takeover, operator cancellation or redirect, operator status feedback, operator intervention latency, single operator supervision of multiple systems (multi-tasking), operator handoff, loss of operator ability to interact with vehicle.
Human interactions including: human commands (civilians performing traffic direction, police pull-over, passenger distress), normal human interactions (pedestrian crossing, passenger entry/egress), common human rule-breaking (crossing mid-block when far from an intersection, speeding, rubbernecking, use of parking chairs, distracted walking), abnormal human interactions (defiant jaywalking, attacks on vehicle, attempted carjacking), and humans who are not able to follow rules (children, impaired adults).
Non-human interactions including: animal interaction (flocks/herds, pets, dangerous wildlife, protected wildlife) and delivery robots.

Is there anything we missed? (Previous post had the "objects" part of OEDR.)

(This is an excerpt of Koopman, P. & Fratrik, F., "How many operational design domains, objects, and events?" SafeAI 2019, AAAI, Jan 27, 2019.)

Object Detection Considerations for Autonomous Vehicles (OEDR -- part 1)

Posted by badrut Wednesday, June 19, 2019 0 comments

Object and Event Detection and Recognition (OEDR) involves having an autonomous vehicle detect and classify various types of objects so that it can plan a response. Detection is only the first step; you need to also be able to classify the obstacle to predict what might happen next. Pedestrians tend to walk into the roadway. Bushes, not so much. Did you think of all of these aspects?

Q: Why did the Mr. Rogers-saurus cross the street?
A: Trick question; he doesn't actually move because he is part of the Pittsburgh Dinosaur Parade.

Some factors to consider when deciding what objects your system needs to detect and recognize include:

Ability to detect and identify (e.g. classify) all relevant objects in the environment.
Processing and thresholding of sensor data to avoid both false positives (e.g., bouncing drink can, steel bridge joint, steel road construction cover plate, roadside sign, dust cloud, falling leaves) and false negatives (e.g., highly publicized partially automated vehicle collisions with stationary vehicles)
Characterizing the likely operational parameters of other road users (e.g., braking capability of leading and following vehicle, or whether another vehicle is behaving erratically enough that there is a likely control fault.)
Permanent obstacles such as structures, curbs, median dividers, guard rails, trees, bridges, tunnels, berms, ditches, roadside and overhanging signage.
Temporary obstacles such as transient keep-out zones, spills, floods, water-filled potholes, landslides, washed out bridges, overhanging vegetation, and downed power lines. (For practical purposes, “temporary” might mean obstacles not included on maps, with some vehicle having to be the first vehicle to detect an obstacle for placement even on a dynamic map.)
People, including cooperative people, uncooperative people, malicious behaviors, and people who are unaware of the operation of the autonomous system.
At-risk populations which might be unable, incapable, or exempt from following established rules and norms, such as children as well as injured, ability-impaired, or under-the-influence people.
Other cooperative and uncooperative human-driven and autonomous vehicles.
Other road users including special purpose vehicles, temporary structures, street dining, street festivals, parades, motorcades, funeral processions, farm equipment, construction crews, draft animals, farm animals, and endangered species.
Other non-stationary objects including uncontrolled moving objects, falling objects, wind-blown objects, in-traffic cargo spills, and low-flying aircraft.

Is there anything we missed? (Next post will have the "events" part of OEDR.)

(This is an excerpt of Koopman, P. & Fratrik, F., "How many operational design domains, objects, and events?" SafeAI 2019, AAAI, Jan 27, 2019.)

Operational Design Domain (ODD) for Autonomous Systems

Posted by badrut Wednesday, June 12, 2019 0 comments

The Operational Design Domain (ODD) is the set of environmental conditions that an autonomous system is designed to work in. Typically an ODD is thought of as some sort of geo-fencing plus a obvious weather conditions (rain, snow, sun). But, it's a lot more than that. Did you think of all of these?

Canton Avenue, the unofficial steepest street in the world, is less than 4 miles from downtown Pittsburgh.
Note cobblestone on the top half and the sidewalk stairs. Cars slide (sometimes backwards) down the street in winter.
Geo-fencing is more complicated than drawing a circle around a city center.
[Wikipedia]

Characterizing the system operational environment should include at least the following:

Operational terrain, and associated location-dependent characteristics (e.g., slope, camber, curvature, banking, coefficient of friction, road roughness, air density) including immediate vehicle surroundings and projected vehicle path. It is important to note that dramatic changes can occur in relatively short distances.
Environmental and weather conditions such as surface temperature, air temperature, wind, visibility, precipitation, icing, lighting, glare, electromagnetic interference, clutter, vibration, and other types of sensor noise.
Operational infrastructure, such as availability and placement of operational surfacing, navigation aids (e.g., beacons, lane markings, augmented signage), traffic management devices (e.g., traffic lights, right of way signage, vehicle running lights), keep-out zones, special road use rules (e.g., time-dependent lane direction changes) and vehicle-to-infrastructure availability.
Rules of engagement and expectations for interaction with the environment and other aspects of the operational state space, including traffic laws, social norms, and customary signaling and negotiation procedures with other agents (both autonomous and human, including explicit signaling as well as implicit signaling via vehicle motion control).
Considerations for deployment to multiple regions/countries (e.g., blue stop signs, “right turn keep moving” stop sign modifiers, horizontal vs. vertical traffic signal orientation, side-of-road changes).
Communication modes, bandwidth, latency, stability, availability, reliability, including both machine-to-machine communications and human interaction.
Availability and freshness of infrastructure characterization data such as level of mapping detail and identification of temporary deviations from baseline data (e.g., construction zones, traffic jams, temporary traffic rules such as for hurricane evacuation).
Expected distributions of operational state space elements, including which elements are considered rare but in-scope (e.g. toll booths, police traffic stops), and which are considered outside the region of the state space in which the system is intended to operate.

Special attention should be paid to ODD aspects that are relevant to inherent equipment limitations, such as the minimum illumination required by cameras.

Are there any other aspects of ODD we missed?

(This is an excerpt of Koopman, P. & Fratrik, F., "How many operational design domains, objects, and events?" SafeAI 2019, AAAI, Jan 27, 2019.)

Ethical Problems That Matter for Self Driving Cars

Posted by badrut Tuesday, May 28, 2019 0 comments

It's time to get past the irrelevant Trolley Problem and talk about ethical issues that actually matter in the real world of self driving cars. Here's a starter list involving public road testing, human driver responsibilities, safety confidence, and grappling with how safe is safe enough.

Public Road Testing. Public road testing clearly puts non-participants such at pedestrians at risk. Is it OK to test on unconsenting human subjects? If the government hasn't given explicit permission to road test in a particular location, arguably that is what is (or has been) happening. An argument that simply having a "safety driver" mitigates risk is clearly insufficient based on the tragic fatality in Tempe AZ last year.

In general, see: https://grants.nih.gov/policy/humansubjects.htm
A credible, independently reviewed argument that a road testing campaign provides adequate safety might be enough. But how transparent should that process be? Should we just take a company's word for it that they are doing responsible, safe road testing? For an example of a safety argument approach, see: https://safeautonomy.blogspot.com/2019/04/safety-argument-consideration-for.html

Expecting Human Drivers to be Super-Human. High-end driver assistance systems might be asking the impossible of human drivers. Simply warning the driver that (s)he is responsible for vehicle safety doesn't change the well known fact that humans struggle to supervise high-end autonomy effectively, and that humans are prone to abusing highly automated systems. This gives way to questions such as:

At what point is it unethical to hold drivers accountable for tasks that require what amount to super-human abilities and performance?
Are there viable ethical approaches to solving this problem? For example, if a human unconsciously learns how to game a driver monitoring system (e.g., via falling asleep with eyes open -- yes, that is a thing) should that still be the human driver's fault if a crash occurs?
Is it OK to deploy technology that will result in drivers being punished for not being super-human if result is that the total death rate declines?

Confidence in Safety Before Deployment. There is work that advocates even slightly better than a human is acceptable (https://www.rand.org/blog/articles/2017/11/why-waiting-for-perfect-autonomous-vehicles-may-cost-lives.html). But there isn't a lot of discussion about the next level of what that really means. Important ethical sub-topics include:

Who decides when a vehicle is safe enough to deploy? Should that decision be made by a company on its own, or subject to external checks and balances? Is it OK for a company to deploy a vehicle they think is safe based just on subjective criteria alone: "we're smart, we worked hard, and we're convinced this will save lives"
What confidence is required for the actual prediction of casualties from the technology? If you are only statistically 20% confident that your self-driving car will be no more dangerous than a human driver, is that enough?
Should limited government resources that could be used for addressing known road safety issues (drunk driving, driving too fast for conditions, lack of seat belt use, distracted driving) be diverted to support self-driving vehicle initiatives using an argument of potential public safety improvement?

How Safe is Safe Enough? Even if we understand the relationship between an aggregate safety goal and self-driving car technology, where do we set the safety knob? How will the following issues affect this?

Will risk homeostatis apply? There is an argument that there will be pressure to turn up the speed/traffic volume dials on self-driving cars to increase permissiveness and traffic flow until the same risk as manual driving is reached. (Think more capable cars resulting in crazier roads with the same net injury and fatality rates.)
Is it OK to deploy initially with a higher expected death rate than human drivers under an assumption that systems will improve over time, long term reducing the total number of deaths? (And is it OK for this improvement to be assumed rather than proven to be likely?)
What redistribution of demographics for victims is OK? If fewer passengers die but more pedestrians die, is that OK if net death rate is the same? Is is OK if deaths disproportionately occur to specific sub-populations? Did any evaluation of safety before deployment account for these possibilities?

I don't purport to have the definitive answers to any of these problems (except a proposal for road testing safety, cited above). And it might be that some of these problems are more or less answered. The point is that there is so much important, relevant ethical work to be done that people shouldn't be wasting their time on trying to apply the Trolley Problem to AVs. I encourage follow-ups with pointers to relevant work.

If you're still wondering about Trolley-esque situations, see this podcast and the corresponding paper. The short version from the abstract of that paper: Trolley problems are "too contrived to be of practical use, are an inappropriate method for making decisions on issues of safety, and should not be used to inform engineering or policy." In general, it should be incredibly rare for a safely designed self-driving car to get into a no-win situation, and if it does happen they aren't going to have information about the victims and/or aren't going to have control authority to actually behave as suggested in the experiments any time soon, if ever.

Here are some links to more about applying ethics to technical systems in general (@IEEESSIT) and autonomy in particular (https://ethicsinaction.ieee.org/), as well as the IEEE P7000 standard series (https://www.standardsuniversity.org/e-magazine/march-2017/ethically-aligned-standards-a-model-for-the-future/).

Car Drivers Do More than Drive

Posted by badrut Wednesday, May 22, 2019 0 comments

How will self-driving cars handle all the non-driving tasks that drivers also perform? How will they make unaccompanied kids stop sticking their head out the window?

https://pixabay.com/photos/stuffed-animals-classic-car-driving-2798333/

Hey Kids -- Don't stick your heads out the window!

The conversation about self-driving cars is almost all about whether a computer can safely perform the "dynamic driving task." As well it should be -- at first. If that part isn't safe, then there isn't much to talk about.

But, looking forward, human drivers do more than drive. They also provide adult supervision (and, on a good day, mature judgement) about the operation of the vehicle in other respects. If you've never heard the phrase "stop doing that right now or I swear I'm going to stop the car!" then probably you've never ridden in a car with multiple children. And yet, we're already talking about sending kids to school in an automated school bus. Presumably the point is to avoid the cost of the human supervision.

But is putting a bunch of kids in a school bus without an adult a good idea? Will the red-faced person on the TV monitor yelling at the kids really be effective? Or just provide entertainment for already screaming kids?

But there's more than that to consider. Here's my start at a list of things human drivers (including vehicle owners, taxi drivers, and so on) do that isn't really driving.

Some tasks will arguably be done by a fleet maintenance function:

Preflight inspection of vehicle. (Flat tires, structural damage.)
Preflight correction of issues. (Cleaning off snow and ice. Cleaning windshield.)
Ensure routine maintenance has been performed. (Vehicle inspections, good tires, fueling/charging, fluid top-off if needed.)
Maintain vehicle interior cleanliness. And we're not just talking about empty water bottles here. (Might require taking vehicle out of service for cleaning up motion sickness results. But somehow the maintenance crew needs to know there has been a problem.)

But some things have to happen on the road when no human driver is present. Examples include:

Ensure vehicle occupants stay properly seated and secured.
Keep vehicle occupants from doing unsafe things. (Hand out window, head out sunroof, fighting, who knows what. Generally providing adult supervision. Especially if strangers or kids are sharing a vehicle.)
Responding to cargo that comes loose.
Emergency egress coordination (e.g., getting sleeping children, injured, and mobility impaired passengers out of vehicle when a dangerous situation occurs such as a vehicle fire)

Anyone who seriously wants to build vehicles that don't have a designated "person in charge" (which is the driver in conventional vehicles) will need to think through all these issues. And likely more. Any argument that a self driving vehicle is safe for unattended service will need to deal with all these issues, and more. (UL 4600 is intended to cover all this ground.)

Can you think of any other non-driving tasks that need to be handled?

Evolution of the motor vehicle

Posted by badrut Friday, May 10, 2019 0 comments

1860 The Frenchman Lenoir constructs the first internal-combustion engine; this powerplant relies on city gas as its fuel source. Thermal efficiency is in the 3% range. 1867 Otto and Langen display an improved inter- nal-combustion engine at the Paris Interna- tional Exhibition. Its thermal efficiency is ap- proximately 9%. 1876 Otto builds the first gas-powered engine to uti- lise the four-stroke compression cycle. At virtu- ally the same time Clerk constructs the first gas- powered two-stroke engine in England. 1883 Daimler and Maybach develop the first high- speed four-cycle petrol engine using a hot- tube ignition system. 1885 The first automobile from Benz (patented in 1886). First self-propelled motorcycle from Daimler (Fig. 1). 1886 First four-wheeled motor carriage with petrol engine from Daimler (Fig. 2). 1887 Bosch invents the magneto ignition. 1889 Dunlop in England produces the first pneu- matic tyres. 1893 Maybach invents the spray-nozzle carburet- tor . Diesel patents his design for a heavy oil- burning powerplant employing the self-igni- tion concept. 1897 MAN presents the first workable diesel engine. 1897 First Electromobile from Lohner-Porsche (Fig. 2). 1913 Ford introduces the production line to auto- motive manufacturing. Production of the Tin Lizzy (Model T, Fig. 3). By 1925, 9,109 were leaving the production line each day. 1916 The Bavarian Motor Works are founded. 1923 First motor lorry powered by a diesel engine produced by Benz-MAN (Fig. 4). 1936 Daimler-Benz inaugurates series-production of passenger cars propelled by diesel engines. 1938 The VW Works are founded in Wolfsburg. 1949 First low-profile tyre and first steel-belted ra- dial tyre produced by Michelin. 1954 NSU-Wankel constructs the rotary engine (Fig. 4). 1966 Electronic fuel injection (D-Jetronic) for stan- dard production vehicles produced by Bosch . 1970 Seatbelts for driver and front passengers. 1978 Mercedes-Benz installs the first Antilock Brak- ing System (ABS) in vehicles. 1984 Debut of the airbag and seatbelt tensioning system . 1985 Advent of a catalytic converter designed for operation in conjunction with closed-loop mix- ture control, intended for use with unleaded fuel. 1997 Electronic suspension control systems (ESP) . Toyota builds first passenger car with a hybrid drive. Alfa Romeo introduces the common-rail direct injection (CRDI) system for diesel engines. As of Advanced driver assistance systems , such as 2000 parking assistance, distance warning systems, lane change assistance.

Modern
Automotive Technology
Fundamentals, service, diagnostics
2nd English edition
The German edition was written by technical instructors, engineers and technicians
Editorial office (German edition): R. Gscheidle, Studiendirektor, Winnenden – Stuttgart
VERLAG EUROPA-LEHRMITTEL · Nourney, Vollmer GmbH & Co. KG
Düsselberger Straße 23 · 42781 Haan-Gruiten, Germany

Other Autonomous Vehicle Safety Argument Observations

Posted by badrut Wednesday, May 1, 2019 0 comments

Other AV Safety Issues:
We've seen some teams get it right. And some get it wrong. Don't make these mistakes if you're trying to ensure your autonomous vehicle is safe.

Defective disengagement mechanisms. Generally this involves the ability of an arbitrary fail-active autonomy failure to prevent successful disengagement by a human supervisor. As a concrete example, a system might read the state of the disengagement activation mechanism (the “big red button”) as an I/O device fed directly into the primary autonomy computer rather than using an independent safing mechanism. This is a special case of a single point of failure in the form of the autonomy computer.

Assuming perception failures are independent. Some arguments assume independent failures of multiple perception modes. While there is clearly utility in creating a safety case for the non-perception parts of an autonomous vehicle, one must argue rather than assume the safety of perception to create a credible safety case at the vehicle level.

Requiring perfect human supervision of autonomy. Humans are well known to struggle when assigned such monitoring tasks. Koopman et al. (2019) cover this topic in more detail as it relates to autonomous vehicle road testing safety.

Dismissing a potential fault as “unrealistic” without supporting data.For example, argumentation might state that a lightning strike on a moving vehicle is unrealistic or could not happen in the “real world,” despite data to the contrary (e.g. Holle 2008). To be sure, this does not mean that something like a lightning strike must be completely mitigated via keeping the vehicle fully operational. Rather, such faults must be considered in risk analysis. Dismissing hazards without risk analysis based on a subjective assertion that they are “unrealistic” results in a safety case with insufficient evidence.

Using multi-channel comparison approaches for autonomy. In general autonomy algorithms are nondeterministic, sensitive to initial conditions, and have many acceptable (or at least safe) behaviors for any given situation. Architectural approaches based on voting diverse autonomy algorithms tend to run into a problem of deciding whether the outputs are close enough to be valid. Averaging and other similar approaches are not necessarily appropriate. As a simple example, the average of veering to the right and veering to the left to avoid an obstacle could result in hitting the obstacle dead-on.

Confusion about fault vs. failure. While there is a widely recognized terminology document for dependable system design (Avizienis 2004), we have found that there is widespread confusion about the terms fault and failure in practical use. This is especially true when discussing malfunctions that are not due to a component fault, but rather a requirements gap or an excursion from the intended operational environment. It is beyond the scope of this paper to attempt to resolve this, but we note it as an area worthy of future work and particular attention in interdisciplinary discussions of autonomy safety.

(This is an excerpt of our SSS 2019 paper: Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019. Read the full text here)

Avizienis, A., Laprie, J.-C., Randell B., Landwehr, C. (2004) “Basic concepts and taxonomy of dependable and secure computing,” IEEE Trans. Dependability, 1(1):11-33, Oct. 2004.
Holle, R. (2008) Lightning-caused deaths and injuries in the vicinity of vehicles, American Meteorological Society Conference on Meteorological Applications of Light-ning Data, 2008.
Koopman, P. and Latronico, B., (2019) Safety Argument Considerations for Public Road Testing of Autonomous Vehicles, SAE WCX, 2019.

Human Test Scenario Bias in Autonomous Vehicle Validation

Posted by badrut Wednesday, April 24, 2019 0 comments

Human Test Scenario Bias:
Machine Learning perceives the world differently than you do. That means your intuition is not necessarily a good source for test planning.

Simulation-based testing (including especially closed-course testing of real vehicles) can suffer from a test planning bias. The problem is that a test plan is often made according to human perception of the scenario being tested. For example, a test scenario might be “child crossing in a painted cross-walk.” Details of the test scenario might explore various corner cases involving child clothing, size, weather conditions, scene clutter, and so on.

Commonly test scenarios map to a human-interpretable taxonomy of the system and environmental state space. However, autonomy systems might have a different internal state space representation than humans, meaning that they classify the world in ways that differ from how humans do so. This in turn can lead to a situation in which a human believes apparent complete coverage via a testing plan has been achieved, while in reality significant aspects of the autonomy system have not been tested.

As a hypothetical example, the autonomy system might have deduced that a human’s shirt color is a predictor of whether that human will step into a street because of accidental correlations in a training data set. But the test plan might not specify shirt color as a variable, because test designers did not realize it was a relevant autonomy feature for pedestrian motion prediction. That means that a human-designed vision test for an autonomous vehicle can be an excellent starting point to make sure that obstacles, pedestrians, and other objects can be detected by the system. But, more is required.

Machine-learning based systems are known to be vulnerable to learning bias that is not recognized by human testers, at least initially. Some such failures have been quite dramatic (e.g. Grush 2015). Thus, simplistic tests such as having an average body size white male in neutral summer clothing cross a street to test pedestrian avoidance do not demonstrate a robust safety capability. Rather, such tests tend to demonstrate a minimum performance capability.

Interpreting the results of human-constructed test designs, including humans interpreting why a particular on-road scenario failed, are also subject to human test scenario bias. A credible safety argument that relies upon human-constructed tests or human interpretation of root cause analysis in claiming that test failures have been fixed should address this pitfall.

(This is an excerpt of our SSS 2019 paper: Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019. Read the full text here)

Grush, L. (2015) “Google engineer apologizes after Photos app tags two black people as gorillas,” The Verge, July 1, 2015. https://www.theverge.com/2015/7/1/8880363/google-apologizes-photos-app-tags-two-black-people-gorillas (Accessed Oct. 27, 2018)

Safety Argument Consideration for Public Road Testing of Autonomous Vehicles

Posted by badrut Wednesday, April 10, 2019 0 comments

Beth Osyk and I are presenting our paper at SAE WCX today on how to argue sufficient road test safety for self-driving car technology.

See below slideshare or follow this link for presentation slides.

Preprint of paper here:
https://users.ece.cmu.edu/~koopman/pubs/koopman19_TestingSafetyCase_SAEWCX.pdf

Abstract:
Autonomous vehicle (AV) developers test extensively on public roads, potentially putting other road users at risk. A safety case for human supervision of road testing could improve safety transparency. A credible safety case should include: (1) the supervisor must be alert and able to respond to an autonomy failure in a timely manner, (2) the supervisor must adequately manage autonomy failures, and (3) the autonomy failure profile must be compatible with effective human supervision.\

Human supervisors and autonomous test vehicles form a combined human-autonomy system, with the total rate of observed failures including the product of the autonomy failure rate and the rate of unsuccessful failure mitigation by the supervisor. A difficulty is that human ability varies in a nonlinear way with autonomy failure rates, counter-intuitively making it more difficult for a supervisor to assure safety as autonomy maturity improves. Thus, road testing safety cases must account for both the expected failures during testing and the practical effectiveness of human supervisors given that failure profile. This paper outlines a high level safety case that identifies key factors for credibly arguing the safety of an onroad AV test program. A similar approach could be used to analyze potential safety issues for high capability semiautonomous production vehicles.

Safety Argument Consideration for Public Road Testing of Autonomous Vehicles from Philip Koopman

Nondeterministic Behavior and Legibility in Autonomous Vehicle Validation

Posted by badrut Wednesday, April 3, 2019 0 comments

Nondeterministic Behavior and Legibility:
How do you know your autonomous vehicle passed the test for the right reason? What if it just got lucky, or is gaming the test?

The nature of the algorithms used by autonomy systems creates problems for modelling and testing that go beyond typical safety critical software. Some autonomy algorithms, such as randomized path planning, are inherently non-deterministic. Others can be brittle, failing dramatically with subtle variations in data, such as perception false negatives induced by adversarial attacks (Szegedy at al. 2013) or false negatives induced by slight image degradation due to haze or defocus (Pezzementi et al. 2018).

A related issue is over-fitting to the test, in which an autonomy system over-fits and learns how to beat a fixed test. By analogy, this is the pitfall of the system cheating by having memorized the correct answers. A proposed way to deal with this risk is by randomly varying aspects of test cases.

In such a fuzzing or variable testing approach it is important to randomly vary all relevant aspects of a problem. For example, varying geometries for traffic situations can be helpful, but probably does not address potential over-fitting for perception algorithms that perform object classification.

The use of potentially non-deterministic test scenarios combined with non-deterministic system behaviors and opaque system designs means it is difficult to know whether a system has passed a test, because there is no single correct answer. Rather, there must be some algorithmic way to determine whether a particular system response is acceptable or not, making that test oracle algorithm safety critical.

Moreover, it is possible that a system has passed a particular test by chance. For example, a pedestrian might be avoided due to a properly functioning detection and avoidance algorithm. But a pedestrian might also be avoided merely because a random path planner by chance picked a path that did not intersect the pedestrian, or responded to a completely unrelated aspect of the environment that caused it to pick a fortuitously safe path. Similarly, a pedestrian might be detected in one image, but undetected in another that differs in ways that are essentially imperceptible to a human.

It is unclear if resolving this issue requires solving the difficult problem of explainable AI (Gunning 2018). As a minimum, a credible safety argument will need to address the problem of how plans to test vehicles with less than a statistically valid amount of real-world exposure data can avoid these pitfalls. It seems likely that a credible argument will also have to establish that each type of test has been passed due to safe operation of the system rather than simply by chance (Koopman & Wagner 2018).

Gunning, D. (2018), Explainable Artificial Intelligence (XAI), Defense Advanced Research Projects Agency, https://www.darpa.mil/program/explainable-artificial-intelligence (accessed October 27, 2018).
Koopman, P. & Wagner, M., (2018) "Toward a Framework for Highly Automated Vehicle Safety Validation," SAE World Congress, 2018. SAE-2018-01-1071.
Pezzementi, Z., Tabor, T., Yim, S., Chang, J., Drozd, B., Guttendorf, D., Wagner, M., Koopman, P., "Putting image manipulations in context: robustness testing for safe perception," IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Aug. 2018.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fer-gus, R. (2013) "Intriguing properties of neural networks." arXiv preprint arXiv:1312.6199 (2013).

Missing Rare Events in Autonomous Vehicle Simulation

Posted by badrut Wednesday, March 27, 2019 0 comments

Missing Rare Events in Simulation:
A highly accurate simulation and system model doesn't solve the problem of what scenarios to simulate. If you don't know what edge cases to simulate, your system won't be safe.

It is common, and generally desirable, to use vehicle-level simulation rather than on-road operation is used as a proxy field testing strategy. Simulation offers a number of potential advantages over field testing of a real vehicle including lower marginal cost per mile, better scalability, and reduced risk to the public from testing. Ultimately, simulation is based upon data that generates scenarios used to exercise the system under test, commonly called the simulation workload. The validity of the simulation workload is just as relevant as the validity of the simulation models and software.

Simulation-based validation is often accomplished with a weighting of scenarios that is intentionally different than the expected operational profile. Such an approach has the virtue of being able to exercise corner cases and known rare events with less total exposure than would be required by waiting for such situations to happen by chance in real-world testing (Ding 2017). To the extent that corner cases and known rare events are intentionally induced in physical vehicle field testing or closed course testing, those amount to simulation in that the occurrence of those events is being simulated for the benefit of the test vehicle.

A more sophisticated simulation approach should use a simulation “stack” with layered levels of abstraction. High level, faster simulation can explore system-level issues while more detailed but slower simulations, bench tests, and other higher fidelity validation approaches are used for subsystems and components (Koopman & Wagner 2018).

Regardless of the mix of simulation approaches, simulation fidelity and realism of the scenarios is generally recognized as a potential threat to validity. The simulation must be validated to ensure that it produces sufficiently accurate results for aspects that matter to the safety case. This might include requiring conformance of the simulation code and model data to a safety-critical software standard.

Even with a conceptually perfect simulation, the question remains as to what events to simulate. Even if simulation were to cover enough miles to statistically assure safety, the question would remain as to whether there are gaps in the types of situations simulated. This corresponds to the representativeness issue with field testing and proven in use arguments. However, representativeness is a more pressing matter if simulation scenarios are being designed as part of a test plan rather than being based solely on statistically significant amounts of collected field data.

Another way to look at this problem is that simulation can remove the need to do field testing for rare events, but does not remove determine what rare events matter. All things being equal, simulation does not reduce the number of road miles needed for data collection to observe rare events. Rather, it permits a substantial fraction of data collection to be done with a non-autonomous vehicle. Thus, even if simulating billions of miles is feasible, there needs to be a way to ensure that the test plan and simulation workload exercise all the aspects of a vehicle that would have been exercised in field testing of the same magnitude.

As with the fly-fix-fly anti-pattern, fixing defects identified in simulation requires additional simulation input data to validate the design. Simply re-running the same simulation and fixing bugs until the simulation passes invokes the “pesticide paradox.” (Beizer 1990) This paradox holds that a system which has been debugged to the point that it passes a set of tests can’t be considered completely bug free. Rather, it is simply free of the bugs that the test suite knows how to find, leaving the system exposed to bugs that might involve only very subtle differences from the test suite.

Beizer, B. (1990) Software Testing Techniques, 2nd Ed., 1990.
Koopman, P. & Wagner, M., (2018) "Toward a Framework for Highly Automated Vehicle Safety Validation," SAE World Congress, 2018. SAE-2018-01-1071.

Dealing with Edge Cases in Autonomous Vehicle Validation

Posted by badrut Wednesday, March 20, 2019 0 comments

Dealing with Edge Cases:

Some failures are neither random nor independent. Moreover, safety is typically more about dealing with unusual cases. This means that brute force testing is likely to miss important edge case safety issues.

A significant limitation to a field testing argument is the assumption of random independent failures inherent in the statistical analysis. Arguing that software failures are random and independent is clearly questionable, since multiple instances of a system will have identical software defects.

Moreover, arguing that the arrival of exceptional external events is random and independent across a fleet is clearly incorrect in the general case. A few simple examples of correlated events between vehicles in a fleet include:

· Timekeeping events (e.g. daylight savings time, leap second)

· Extreme weather (e.g. tornado, tsunami, flooding, blizzard white-out, wildfires) affecting multiple systems in the same geographic area

· Appearance of novel-looking pedestrians occurring on holidays (e.g. Halloween, Mardi Gras)

· Security vulnerabilities being attacked in a coordinated way

For life-critical systems, proper operation in typical situations needs to be validated. But this should be a given. Progressing from baseline functionality (a vehicle that can operate acceptably in normal situations) to a safe system (a vehicle that safely handles unusual situations and unexpected situations) requires dealing with unusual cases that will inevitably occur in the deployed fleet.

We define an edge case as a rare situation that will occur only occasionally, but still needs specific design attention to be dealt with in a reasonable and safe way. The quantification of “rare” is relative, and generally refers to situations or conditions that will occur often enough in a full-scale deployed fleet to be a problem but have not been captured in the design or requirements process. (It is understood that the process of identifying and handling edge cases makes them – by definition – no longer edge cases. So in practice the term applies to situations that would not have otherwise been handled had special attempts not be made to identify them during the design and validation process.)

It is useful to distinguish edge cases from corner cases. Corner cases are combinations of normal operational parameters. Not all corner cases are edge cases, and the converse. An example of a corner case could be a driving situation with an iced over road, low sun angle, heavy traffic, and a pedestrian in the roadway. This is a corner case since each item in that list ought to be an expected operational parameter, and it is the combination that might be rare. This would be an edge case only if there is some novelty to the combination that produces an emergent effect with system behavior. If the system can handle the combination of factors in a corner case without any special design work, then it’s not really an edge case by our definition. In practice, even difficult-to-handle corner cases that occur frequently will be identified during system design.

Only corner cases that are both infrequent and present novelty due to the combination of conditions are edge cases. It is worth noting that changing geographic location, season of year, or other factors can result in different corner cases being identified during design and test, and leave different sets of edge cases unresolved. Thus, in practice, edge cases that remain after normal system design procedures could differ depending upon the operational design domain of the vehicle, the test plan, and even random chance occurrences of which corner cases happened to appear in training data and field trials.

Classically an edge case refers to a type of boundary condition that affects inputs or reveals gaps in requirements. More generally, edge cases can be wholly unexpected events, such as the appearance of a unique road sign, or an unexpected animal type on a highway. They can be a corner case that was thought to be impossible, such as an icy road in a tropical climate. They can also be an unremarkable (to a human), non-corner case that somehow triggers an autonomy fault or stumbles upon a gap in training data, such as a light haze that results in perception failure. The thing that makes something an edge case is that it unexpectedly activates a requirements, design, or implementation defect in the system.

There are two implications to the occurrence of such edge cases in safety argumentation. One is that fixing edge cases as they arrive might not improve safety appreciably if the population of edge cases is large due to the heavy tail distribution problem (Koopman 2018c). This is because removing even a large number of individual defects from an essentially infinite-size pool of rarely activated defects does not materially improve things. Another implication is that the arrival of edge cases might be correlated by date, time, weather, societal events, micro-location, or combinations of these triggers. Such a correlation can invalidate an assumption that losses from activation of a safety defect will result in small losses between the time the defect first activates and the time a fix can be produced. (Such correlated mishaps can be thought of as the safety equivalent of a “zero day attack” from the security world.)

It is helpful to identify edge cases to the degree possible within the constraints of the budget and resources available to a project. This can be partially accomplished via corner case testing (e.g. Ding 2017). The strategy here would be to test essentially all corner cases to flush out any that happen to present special problems that make them edge cases. However, some edge cases also require identifying likely novel situations beyond combinations of ordinary and expected scenario components. And other edge cases are exceptional to an autonomous system, but not obviously corner cases in the eyes of a human test designer.

Ultimately, it is unclear if it can ever be shown that all edge cases have been identified and corresponding mitigations designed into the system. (Formal methods could help here, but the question would be whether any assumptions that needed to be made to support proofs were themselves vulnerable to edge cases.) Therefore, for immature systems it is important to be able to argue that inevitable edge cases will be dealt with in a safe way frequently enough to achieve an appropriate level of safety. One potential argumentation approach is to aggressively monitor and report unusual operational scenarios and proactively respond to near misses and incidents before a similar edge case can trigger a loss event, arguing that the probability of a loss event from unhandled edge cases is sufficiently low. Such an argument would have to address potential issues from correlated activation of edge cases.

(This is an excerpt of our SSS 2019 paper: Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019. Read the full text here)

Koopman, P. (2018c) "The Heavy Tail Safety Ceiling," Automated and Connected Vehicle Systems Testing Symposium, June 2018.
Ding, Z., “Accelerated evaluation of automated vehicles,” http://www-personal.umich.edu/~zhaoding/accelerated-evaluation.html on 10/15/2017.