
The Lesson Learned from the Tempe, Arizona Autonomous Driving System Testing Fatality NTSB Report




Now that the press flurry over the NTSB's report on the Autonomous Driving System (ADS) fatality in Tempe has subsided, it's important to reflect on the lessons to be learned. Hats off to the NTSB for absolutely nailing this. Cheers to the press outlets that got the messaging right. But not everyone did. The goal of this essay is to help focus on the right lessons to learn, clarify publicly stated misconceptions, and emphasize the most important takeaways.

I encourage everyone in the AV industry to watch the first 5 and a half minutes of the NTSB board meeting video (YouTube: Link // NTSB Link). Safety leadership should watch the whole thing. Probably twice. Then present a summary at your company's lunch & learn.

Pay particular attention to this part from Chairman Sumwalt: "If your company tests automated driving systems on public roads, this crash -- it was about you.  If you use roads where automated driving systems were being tested, this crash -- it was about you."

I live in Pittsburgh and these public road tests happen near my place of work and my home. I take the lessons from this crash personally. In principle, every time I cross a street I'm potentially placed at risk by any company that might be cutting corners on safety. (I hope that's none. All the companies testing here have voluntarily submitted compliance reports for the well-drafted PennDOT testing guidelines. But not every state has those, and those guidelines were developed largely in response to the fatality we’re discussing.)

I also have long-time friends who have invested their careers in this technology. They have brought a vibrant and promising industry to Pittsburgh and other cities. Negative publicity resulting from a major mishap can threaten the jobs at those companies.

So it is essential for all of us to get safety right.

The first step: for anyone in charge of testing who doesn't know what a Safety Management System (SMS) is: (A) Watch that NTSB hearing intro. (B) Pause testing on public roads until your company makes a good start down that path. (Again, the PennDOT guidelines are a reasonable first step accepted by a number of companies. LINK)  You’ll sleep better having dramatically improved your company’s safety culture before anyone gets hurt unnecessarily.


Clearing up some misconceptions
I’ve seen some articles and commentary that missed the point of all of this. Large segments of coverage emphasized technical shortcomings of the system - that's not the point. Other coverage highlighted test driver distraction - that's not the point either. The fatal mishap involved technical shortcomings, and the test driver was not paying adequate attention. Both contributed to the mishap, and both were bad things.

But the lesson to learn is that solid safety culture is without a doubt necessary to prevent avoidable fatalities like these. That is the Point.

To make the most of this teachable moment let's break things down further. These discussions are not really about the particular test platform that was involved. The NTSB report gave that company credit for significant improvement. Rather, the objective is to make sure everyone is focused on ensuring they have learned the most important lesson so we don’t suffer another avoidable ADS testing fatality.

A self-driving car killed someone - NOT THE POINT
This was not a self-driving car. It was a test platform for Automated Driving System (ADS) technology. The difference is night and day.  Any argument that this vehicle was safe to operate on public roads hinged on a human driver not only taking complete responsibility for operational safety, but also being able to intervene when the test vehicle inevitably made a mistake. It's not a fully automated self-driving car if a driver is required to hover with hands above the steering wheel and foot above the brake pedal the entire time the vehicle is operating.

It's a test vehicle. The correct statement is: a test vehicle for developing ADS technology killed someone.

The pedestrian was initially said to jump out of the dark in front of the car - NOT THE POINT
I still hear this sometimes based on the initial video clip that was released immediately after the mishap. The pedestrian walked across almost 4 lanes of road in view of the test vehicle before being struck. The test vehicle detected the pedestrian 5.6 seconds before the crash. That was plenty of time to avoid the crash, and plenty of time to track the pedestrian crossing the street to predict that a crash would occur. Attempting to claim that this crash was unavoidable is incorrect, and won't prevent the next ADS testing fatality.

It's the pedestrian's fault for jaywalking - NOT THE POINT
Jaywalking is what people do when it is 125 yards to the nearest intersection and there is a paved walkway on the median, even if there is a sign saying not to cross. Tearing up the paved walkway might help a little on this particular stretch of road, but that's not going to prevent jaywalking as a potential cause of the next ADS testing fatality.

Victim's apparent drug use - NOT THE POINT
It was unlikely that the victim was a fully functional, alert pedestrian. But much of the population isn't in that category for one reason or another. Children, distracted walkers, and others with less than perfect capabilities and attention cross the street every day, and we expect drivers to do their best to avoid hitting them.

There is no indication that the victim’s medical condition substantively caused the fatality. (We're back to the fact that the pedestrian did not jump in front of the car.) It would be unreasonable to insist that the public has the responsibility to be fully alert and ready to jump out of the way of an errant ADS test platform at all times they are outside their homes.

Tracking and classification failure - NOT THE POINT
The ADS on the test vehicle suffered some technical issues that prevented it from predicting where the pedestrian would be when the test vehicle got there, or even from recognizing that the object it was sensing was a pedestrian walking a bicycle. However, the point of operating the test vehicle was to find and fix defects.

Defects were expected, and should be expected on other ADS test vehicles. That's why there is a human safety driver. Forbidding public road testing of imperfect ADS systems basically outlaws road testing at this stage. Blaming the technology won't prevent the next ADS testing fatality, but it could hurt the industry for no reason.

It's the technology's fault for ignoring jaywalkers - NOT THE POINT
This idea has been circulating, but it isn't quite true. Jaywalkers aren't ignored. Rather, according to the information presented by the NTSB, the system does not initially expect a pedestrian to cross the street. Once the pedestrian has moved for a while, a track is built up that can indicate a street crossing; until then, movement into the street is considered unexpected if the pedestrian is not at a designated crossing location. A deployment-ready ADS could potentially use a more sophisticated approach to predict when a pedestrian would enter the roadway.

Regardless of implementation, this did not contribute to the fatality because the system never actually classified the victim as a pedestrian. Again, improving this or other ADS technical features won't prevent the next ADS testing fatality. That’s because testing safety is about the safety driver, not which ADS prototype functions happen to be active on any particular test run.

ADS emergency braking behavior - NOT THE POINT
The ADS emergency braking function had behaviors that could hinder its ability to provide backup support to the safety driver. Perhaps another design could have done better for this particular mishap. However, it wasn't the job of the ADS emergency braking to avoid hitting a pedestrian. That was the safety driver's job. Improving ADS emergency braking capabilities might reduce the probability of an ADS testing fatality, but won't entirely prevent the next fatality from happening sooner than it should.

Native emergency braking disabled - NOT THE POINT
It looks bad to have disabled the built-in emergency braking system on the passenger vehicle used as the test platform. The purpose of such systems is to help out after the driver makes a mistake. In this case there is a good, but not perfect, chance it would have helped. But as with the ADS emergency braking function, this simply improves the odds. Any safety expert is going to say your odds are better with both belt and suspenders, but enabling this function alone won't entirely prevent the next ADS testing fatality from happening before it should.

Inattentive safety driver - NOT THE POINT
There is no doubt that an inattentive safety driver is dangerous when supervising an ADS test vehicle. And yet, driver complacency is the expected outcome of asking a human to supervise an automated system that works most of the time. That’s why it’s important to ensure that driver monitoring is done continually and used to provide feedback. (In this case a form of driver monitoring equipment was installed, but data was apparently not used in a way that assured effective driver alertness.)

While enhanced training and stringent driver selection can help, effective analysis of monitoring data, and action taken on it, are required to ensure that drivers are actually paying attention in practice. Simply firing this driver without changing anything else won't prevent the next ADS testing fatality from happening to some other driver who has slipped into bad operational habits.

A fatality is regrettable, but human drivers killed about 100 people that same day with minimal news attention - NOT THE POINT
Some commentators compare the fatality rate of test vehicles to general automotive fatality rates. They then generally argue that a few deaths in comparison to the ongoing carnage of regular cars is a necessary and appropriate price to pay for progress. However, this argument is not statistically valid.

Consider a reasonable goal that ADS testing (with highly qualified, alert drivers presumed) should be no more dangerous than the risk presented by normal cars. For normal US cars that's ballpark 500 million road miles per pedestrian fatality. This includes mishaps caused by drunk, distracted, and speeding drivers. Given the far smaller number of miles being driven by current test platform fleets, the "budget" for fatal accidents during the ADS road testing phase should, at this early stage, still be zero.
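To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The 500 million miles per pedestrian fatality figure is the ballpark quoted above; the annual test fleet mileage is a made-up illustrative number, not data about any particular company:

```python
# Back-of-the-envelope check: if ADS road testing were exactly as safe as
# ordinary US driving, how many pedestrian fatalities should we expect?
# The fleet mileage below is a hypothetical example value.

HUMAN_MILES_PER_PED_FATALITY = 500e6  # ~500M road miles per pedestrian fatality (ballpark from the text)
TEST_FLEET_MILES_PER_YEAR = 2e6       # assumed annual mileage for a hypothetical ADS test fleet

expected_fatalities = TEST_FLEET_MILES_PER_YEAR / HUMAN_MILES_PER_PED_FATALITY
print(f"Expected pedestrian fatalities per year at human-driver risk: {expected_fatalities:.3f}")
# Prints 0.004 -- so even a single fatality in that mileage is roughly 250x the
# human-equivalent "budget". At current fleet sizes the statistically expected
# number of ADS testing pedestrian fatalities is effectively zero.
```

In other words, observing even one testing fatality at today's fleet mileages suggests that something other than ordinary road risk is at work.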

The fatality somehow “proves” that self-driving car technology isn't viable - NOT THE POINT
Some have tried to draw conclusions about the viability of ADS technology from the fact that there was a testing fatality. However, the issues with ADS technical performance only prove what we already knew. The technology is still maturing, and a human needs to intervene to keep things safe. This crash wasn't about the maturity of the technology; it was about whether the ADS public road testing itself was safe.

Concentrating on technology maturity (for example, via disclosing disengagement rates) serves to focus attention on a long term future of system performance without a safety driver. But the long term isn’t what’s at issue.

The more pressing issue is ensuring that the road testing going on right now is sufficiently safe. At worst, continued use of disengagement rates as the primary metric of ADS performance could hurt safety rather than help. This is because the metric, if gamed, could incentivize safety drivers to take chances by avoiding disengagements in uncertain situations to make the numbers look better. (Some companies no doubt have strategies to mitigate this risk. But those are probably the companies with an SMS, which is back to the point that matters.)

THE POINT: The safety culture was broken
Safety culture issues were the enabler for this particular crash. Given the limited number of miles that can be accumulated by any current test fleet, we should see no fatalities occur during ADS testing. (Perhaps a truly unavoidable fatality will occur. This is possible, but given the numbers it is unlikely if ADS testing is reasonably safe. So our goal should be set to zero.) Safety culture is critical to ensure this.

The NTSB rightly pushes hard for a safety management system (SMS). But be careful to note that they simply say that this is a part of safety culture, not all of it. Safety culture means, among other things, a company taking responsibility for ensuring that its safety drivers are actually safe despite the considerable difficulty of accomplishing that. Human safety drivers will make mistakes, but a strong safety culture accounts for such mistakes in ensuring overall safety.

It is important to note that the urgent point here is not regulating self-driving car safety, but rather achieving safe ADS road testing. They are (almost) two entirely different things. Testing safety is about whether the company can consistently put an alert, able-to-react safety driver on the road. On the other hand, ADS safety is about the technology. We need to get to the technology safety part over time, but ADS road testing is the main risk to manage right now.

Perhaps dealing with ADS safety would be easier if the discussions of testing safety and deployment safety were more cleanly separated.

THE TAKEAWAYS:

Chairman Sumwalt summed it up nicely in the intro. (You did watch that 5 and a half minute intro, right?)  But to make sure it hits home, this is my take:

One company's crash is every company's crash.  You'll note I didn't name the company involved, because that's really irrelevant to preventing the next fatality and the potential damage it could do to the industry’s reputation.

The bigger point is that every company can and should institute a good safety culture before further fatalities take place, if they have not done so already. The NTSB credited the company at issue with significant change in the right direction.  But it only takes one company that hasn’t gotten the message to be a problem for everyone. We can reasonably expect fatalities involving ADS technology in the future even if these systems are many times safer than human drivers. But there simply aren’t enough vehicles on the road yet for a truly unavoidable mishap to be likely to occur. It’s far too early.

If your company is testing (or plans to test) autonomous vehicles, get a Safety Management System in place before you do public road testing. At least conform to the details in the PennDOT testing guidelines, even if you’re not testing in Pennsylvania. If you are already testing on public roads without an SMS, you should stand down until you get one in place.

Once you have an SMS, consider it a down-payment on a continuing safety culture journey.



Prof. Philip Koopman, Carnegie Mellon University

Author Note: The author and his company work with a variety of customers helping to improve safety. He has been involved with self-driving car safety since the late 1990s. These opinions are his own, and this piece was not sponsored.

Ethical Problems That Matter for Self-Driving Cars

It's time to get past the irrelevant Trolley Problem and talk about ethical issues that actually matter in the real world of self-driving cars.  Here's a starter list involving public road testing, human driver responsibilities, safety confidence, and grappling with how safe is safe enough.


  • Public Road Testing. Public road testing clearly puts non-participants such as pedestrians at risk. Is it OK to test on unconsenting human subjects? If the government hasn't given explicit permission to road test in a particular location, arguably that is what is (or has been) happening. An argument that simply having a "safety driver" mitigates risk is clearly insufficient based on the tragic fatality in Tempe, AZ last year. 
  • Expecting Human Drivers to be Super-Human. High-end driver assistance systems might be asking the impossible of human drivers. Simply warning the driver that (s)he is responsible for vehicle safety doesn't change the well-known fact that humans struggle to supervise high-end autonomy effectively, and that humans are prone to abusing highly automated systems. This gives rise to questions such as:
    • At what point is it unethical to hold drivers accountable for tasks that require what amount to super-human abilities and performance?
    • Are there viable ethical approaches to solving this problem? For example, if a human unconsciously learns how to game a driver monitoring system (e.g., via falling asleep with eyes open -- yes, that is a thing) should that still be the human driver's fault if a crash occurs?
    • Is it OK to deploy technology that will result in drivers being punished for not being super-human if the result is that the total death rate declines?
  • Confidence in Safety Before Deployment.  There is work arguing that even slightly better than a human driver is an acceptable threshold for deployment (https://www.rand.org/blog/articles/2017/11/why-waiting-for-perfect-autonomous-vehicles-may-cost-lives.html). But there isn't a lot of discussion about the next level of what that really means. Important ethical sub-topics include:
    • Who decides when a vehicle is safe enough to deploy? Should that decision be made by a company on its own, or subject to external checks and balances? Is it OK for a company to deploy a vehicle they think is safe based on subjective criteria alone: "we're smart, we worked hard, and we're convinced this will save lives"?
    • What confidence is required for the actual prediction of casualties from the technology? If you are only statistically 20% confident that your self-driving car will be no more dangerous than a human driver, is that enough? (A rough numerical sketch of what such confidence levels imply follows this list.)
    • Should limited government resources that could be used for addressing known road safety issues (drunk driving, driving too fast for conditions, lack of seat belt use, distracted driving) be diverted to support self-driving vehicle initiatives using an argument of potential public safety improvement?
  • How Safe is Safe Enough? Even if we understand the relationship between an aggregate safety goal and self-driving car technology, where do we set the safety knob?  How will the following issues affect this?
    • Will risk homeostasis apply? There is an argument that there will be pressure to turn up the speed/traffic volume dials on self-driving cars to increase permissiveness and traffic flow until the same risk as manual driving is reached. (Think more capable cars resulting in crazier roads with the same net injury and fatality rates.)
    • Is it OK to deploy initially with a higher expected death rate than human drivers under an assumption that systems will improve over time, reducing the total number of deaths in the long term?  (And is it OK for this improvement to be assumed rather than proven to be likely?)
    • What redistribution of victim demographics is OK? If fewer passengers die but more pedestrians die, is that OK if the net death rate is the same? Is it OK if deaths disproportionately occur to specific sub-populations? Did any evaluation of safety before deployment account for these possibilities?
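As a rough illustration of the confidence question raised above, here is a small Python sketch using the simplest possible model: treat fatalities as a Poisson process, assume zero fatalities are observed during testing, and ask how many failure-free miles are needed to support a claim at a given one-sided confidence level. The baseline rate of one fatality per 100 million miles is a round illustrative figure, and real safety cases are far more involved than this:

```python
import math

# Assumed baseline: roughly one fatality per 100 million miles of human driving
# (a round illustrative figure, not a precise statistic).
HUMAN_FATALITY_RATE = 1 / 100e6

def miles_needed(confidence: float, target_rate: float = HUMAN_FATALITY_RATE) -> float:
    """Failure-free miles needed to claim, at the given one-sided confidence,
    that the true fatality rate is no worse than target_rate.

    Uses the standard zero-failure Poisson bound: miles >= -ln(1 - C) / rate.
    """
    return -math.log(1.0 - confidence) / target_rate

for c in (0.20, 0.50, 0.95):
    print(f"{c:.0%} confidence: about {miles_needed(c) / 1e6:,.0f} million failure-free miles")
# 20% -> ~22 million miles; 50% -> ~69 million; 95% -> ~300 million.
```

The point of the sketch is only that "20% confident" corresponds to a surprisingly small amount of failure-free exposure, which is why the question of what confidence level society should demand is an ethical question, not just a statistical one.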
I don't purport to have the definitive answers to any of these problems (except a proposal for road testing safety, cited above). And it might be that some of these problems are more or less answered. The point is that there is so much important, relevant ethical work to be done that people shouldn't be wasting their time trying to apply the Trolley Problem to AVs. I encourage follow-ups with pointers to relevant work.

If you're still wondering about Trolley-esque situations, see this podcast and the corresponding paper. The short version from the abstract of that paper: Trolley problems are "too contrived to be of practical use, are an inappropriate method for making decisions on issues of safety, and should not be used to inform engineering or policy." In general, it should be incredibly rare for a safely designed self-driving car to get into a no-win situation, and if one does, it isn't going to have information about the victims and/or the control authority to actually behave as suggested in the experiments any time soon, if ever.

Here are some links to more about applying ethics to technical systems in general (@IEEESSIT) and autonomy in particular (https://ethicsinaction.ieee.org/), as well as the IEEE P7000 standard series (https://www.standardsuniversity.org/e-magazine/march-2017/ethically-aligned-standards-a-model-for-the-future/).


How Road Testing Self-Driving Cars Gets More Dangerous as the Technology Improves

Safe road testing of autonomous vehicle technology assumes that human "safety drivers" will be able to prevent mishaps. But humans are notoriously bad at supervising autonomy. Ensuring that road testing is safe requires designing the test platform to have high "supervisability." In other words, it must be easy for a human to stay in the loop and compensate for autonomy errors, even when the autonomy gets pretty good and the supervisor's job gets pretty boring. This excerpt from a draft paper explains the concept and why it matters.

(Update: full paper here: https://users.ece.cmu.edu/~koopman/pubs/koopman19_TestingSafetyCase_SAEWCX.pdf)
Figure 1.

An essential observation regarding self-driving car road testing is that it relies upon imperfect human responses to provide safety. There is some non-zero probability that the supervisor (a "safety driver") will not react in a timely fashion, and some additional probability that the supervisor will react incorrectly. Either of these outcomes could result in an incident or mishap. Such a non-zero probability of unsuccessful failure mitigation means it is necessarily the case that the frequency of autonomy failures will influence on-road safety outcomes.

However, lower autonomy failure rates are not necessarily better. The types and frequencies of autonomy failures will affect the supervisability of the system. Therefore, the field failure rate and types of failures must be compatible with the measures being taken to ensure supervisor engagement. Thus, the failure profile must be “appropriate” rather than low.

Non-Linear Autonomy/Human Interactions

A significant difficulty in reasoning about the effect of autonomy failure on safety is that there is a non-linear response of human attentiveness to autonomy failure. We propose that there are five different regions of supervisability of autonomy failures, with two different hypothetical scenarios based on comparatively lower and higher supervisability trends illustrated in the figures.

1. Autonomy fails frequently in a dangerous way. In essence this is autonomy which is not really working. A supervisor faced with an AV test platform that is trying to run off the road every few seconds should terminate the testing and demand more development. We assume that such a system would never be operated on public roads in the first place, making a public risk assessment unnecessary. (Debugging of highly immature autonomy on public roads seems like a bad idea, and presents a high risk of mishaps.)

2. Autonomy fails moderately frequently but works or is benign most of the time. In this case the supervisor is more likely to remain attentive since an autonomy failure in the next few seconds or minutes is likely. The risk in this scenario is probably dominated by the ability of the supervisor to plan and execute adequate fault responses, and eventual supervisor fatigue.

3. Autonomy fails infrequently. In this case there is a real risk that the supervisor will lose focus during testing, and fail to respond in time or respond incorrectly due to loss of situational awareness. This is perhaps the most difficult situation for on-road testing, because the autonomy could be failing frequently enough to present an unacceptably high risk, but so infrequently that the supervisor is relatively ineffective at mitigation. This dangerous situation corresponds to the “valley of degraded supervision” in Figure 1.

4. Autonomy fails very infrequently, with high diagnostic coverage. At a high level of maturity, the autonomy might fail so infrequently that it is almost safe enough, and even a relatively disengaged driver can deal with failures well enough to result in a system that is overall acceptably safe. High coverage failure detection that prompts the driver to take over in the event of a failure might help improve the effectiveness of such a system. The ultimate safety of such a system will likely depend upon its ability to detect a risky situation with sufficient advance warning for the supervisor to re-engage and take over safely. (This scenario is generally aligned with envisioned production deployment of SAE Level 3 autonomy.)

5. Autonomy essentially never fails. In this case the role of the supervisor is to be there in case the expectation of “never fails” turns out to be incorrect in testing. It is difficult to know how to evaluate the potential effectiveness of a supervisor, other than that the supervisor will have the same tasks as in the preceding “very infrequently” case, but is expected not to have to perform them.
Perhaps counter-intuitively, the probability of a supervisor failure is likely to increase as the autonomy failure rate decreases from regions 1 to 5 above (from left to right along the horizontal axis of the figures). In other words, the less often the autonomy fails, the less reliable supervisor intervention becomes. The most dangerous operational region is #3, in which the autonomy is failing often enough to present a significantly elevated risk, but not often enough to keep the supervisor alert and engaged. This is a well-understood risk that must be addressed in a road testing safety case.
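To make the shape of this relationship concrete, here is a purely illustrative toy model in Python. It is not the math from the full paper, and every number in it is invented; it simply encodes the qualitative claim above: residual risk is roughly the autonomy failure rate multiplied by the probability that the supervisor fails to mitigate, and that miss probability climbs as failures become rare.

```python
# Toy model of the supervisability effect (NOT the paper's math; all numbers
# are invented for illustration). Residual mishap risk of the test platform:
#
#     residual_rate = autonomy_failure_rate * P(supervisor fails to mitigate)
#
# where the supervisor miss probability rises as autonomy failures become
# rarer (complacency), unless supervisability countermeasures flatten that rise.

import math

HUMAN_BASELINE = 2e-6  # assumed mishaps per mile for a conventional human driver

def miss_probability(failure_rate: float, base: float, slope: float) -> float:
    """Hypothetical P(supervisor fails to mitigate a given dangerous failure).

    Grows with the log of miles between failures: the rarer the failures, the
    more complacent the supervisor. 'base' and 'slope' characterize the
    supervisability curve (smaller values = better monitoring, training,
    shift limits, and test platform feedback to the driver).
    """
    miles_between_failures = 1.0 / failure_rate
    return min(1.0, base + slope * math.log10(miles_between_failures))

def residual_rate(failure_rate: float, base: float, slope: float) -> float:
    """Expected unmitigated dangerous events per mile of the test platform."""
    return failure_rate * miss_probability(failure_rate, base, slope)

if __name__ == "__main__":
    curves = {"lower supervisability": (0.02, 0.25),
              "higher supervisability": (0.005, 0.02)}
    print(f"Residual risk vs. baseline of {HUMAN_BASELINE:.0e} mishaps/mile:")
    for fr in (1e-3, 1e-4, 1e-5, 1e-6, 1e-7):
        cells = []
        for name, (base, slope) in curves.items():
            r = residual_rate(fr, base, slope)
            verdict = "LESS safe" if r > HUMAN_BASELINE else "OK"
            cells.append(f"{name}: {r:.1e} ({verdict})")
        print(f"  {fr:.0e} dangerous failures/mile -> " + "; ".join(cells))
```

With these made-up curves, the lower-supervisability platform stays less safe than the assumed conventional-driving baseline across roughly an extra order of magnitude of autonomy maturity, which gives the flavor of the "valley of degraded supervision" shown in the figures.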

Figure 2 illustrates this effect with hypothetical performance data that results in an overall test platform safety value in accordance with [math in the full paper]. A hypothetical lower supervisability curve results in a region in which the vehicle is less safe than a conventional vehicle driven by a human driver.

Safe testing requires a comparatively higher supervisability curve to ensure that the overall test platform safety is sufficiently high, as shown by Figure 2.

Figure 2.


Because autonomy capabilities are generally expected to mature over time, the safety argument must be revisited periodically during test and development campaigns as the autonomy failure rate decreases from region 2 to 3 above. An intuitive – but dangerously incorrect – approach would be to assume that the requirements for test supervision can be relaxed as autonomy becomes more mature. Rather, it seems likely that the rigor of ensuring supervisors are vigilant and continually trained to maintain their ability to react effectively needs to be increased as autonomy technology transitions from immature to moderately mature. This effect only diminishes when the AV technology starts approximating the road safety of a conventional human driver all on its own (regions 4 & 5).

If you are actively doing self-driving car testing on public roads, please contact me for a preprint of the full paper that includes a GSN safety argumentation structure for ensuring road testing safety. I plan to present the full paper at SAE WCX 2019 in April.

-- Phil Koopman, Edge Case Research & Carnegie Mellon University