Three Safety Bins: road testing doesn't get you all the way to "safe"

There are three bins for self-driving car safety: 
(1) Obviously dangerous
(2) NOT obviously dangerous, and
(3) Safe.

Road testing and driving scenario coverage can help you get from bin 1 to bin 2.

Getting to Safe (bin 3) requires a whole lot more. That's because it requires handling rare, unexpected, and novel events in both the equipment and the environment.



FiveAI Report on Autonomous Vehicle Safety Certification

FiveAI has published an autonomous vehicle safety approach that includes independent verification, transparency, and data sharing. (I provided inputs to the @_FiveAI  authors.)

Here is a pointer to the summary on Medium
https://medium.com/@_FiveAI/we-need-an-industry-wide-safety-certification-framework-for-autonomous-vehicles-fiveai-publishes-1139dacd5a8c

It's worth jumping through the registration hoop to read the full version.
https://five.ai/certificationpaper


Webinar on Robustness Testing of Perception

Zachary Pezzementi and Trenton Tabor have done some great work on perception systems in general, and on how image degradation affects perception performance.  I'd previously posted information about their paper, but now there is a webinar available here:
    Webinar home page with details & links:  http://ieeeagra.com/events/webinar-november-4-2018/

This includes pointers to the slides, the recorded webinar, the paper, and related papers.

My robustness testing team at NREC worked with them on the perception stress testing, so here are quick links to the sections covering that work:


Potential Autonomous Vehicle Safety Improvement: Less Hype, More Data (OESA 2018)


I enjoyed being on a panel at the Annual OESA Suppliers Conference today. My intro talk covered setting reasonable expectations about how much safety benefit autonomous vehicles can provide in the near-term to mid-term. Spoiler: when you hear that 94% of all road fatalities are caused by bad and impaired drivers, that number doesn't mean what you think it means!







Uber ATG Safety Report

Summary: Uber's reports indicate that they are serious about improving their safety culture. Their new approach to public road testing seems reasonable in light of current practices. Whether they can achieve an appropriate level of system safety and software quality for the final production vehicles remains an open question -- just as it does for the self-driving car industry in general.

Uber ATG has released a set of materials regarding their in-development self-driving car technology and testing, including a NHTSA-style safety report, as well as reports of a safety review in light of the tragic death in Tempe AZ earlier this year. (See: https://www.uber.com/info/atg/safety/)

Generally I have refrained from critical analysis of other company safety reports because this is still something everyone is sorting out. Anyone putting out a safety report is automatically in the top 10% for transparency (because about 90% of the companies haven't even released one yet). So by this metric Uber looks good.  In fact, their report has a lot more detail than we've been seeing in general, so kudos to them for improving transparency. The other companies who haven't even published a report at all should get with the program.

But, Uber has also had a fatality as a result of their on-road test program. If any company should be put under increased scrutiny for safety it should be them.  I fully acknowledge that many of the critique points apply to other companies as well, so this is not about whether they are ahead or behind, but rather how they stand on their own merits.  (And, if anyone at Uber thinks I got something wrong, please let me know.)

Overall

It seems that Uber's development and deployment plan is generally what we're seeing from other companies.  They plan to operate on public roads to build a library of surprises and teach their system how to handle each one they encounter. They plan to have safety drivers (Mission Specialists) intervene when the vehicle encounters something it can't handle.  As a result of the fatal mishap they plan to improve safety culture, improve safety drivers, and do more pre-testing simulation. There is every reason to believe that at least some other companies were already doing those things, so this generally puts Uber on a par with where all companies doing road testing should be.

Clearly the tragic death in Tempe got Uber's attention, as it should have. Let's hope that other companies pay attention to the lessons learned before there is another fatality.

Doing the math, there should be no fatalities in any reasonable pre-deployment road test program. That's because there simply won't be enough miles accumulated in road testing with a small fleet to reach a level at which an average human driver would be likely to have experienced a fatal accident. (It is not zero risk, just as everyday driving is not risk free. But a fatality should be unlikely.)
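Here is a minimal sketch of that math, assuming the commonly quoted human-driver baseline of roughly one fatality per 100 million miles; the fleet size and mileage figures below are hypothetical examples, not any company's actual numbers.

```python
# A rough sketch of the "doing the math" argument above. The one-fatality-per-
# 100-million-miles baseline is the figure commonly quoted for average human
# drivers; the fleet size and mileage are hypothetical.

BASELINE_FATALITIES_PER_MILE = 1 / 100_000_000

def expected_fatalities(vehicles: int, miles_per_vehicle_per_year: float,
                        years: float) -> float:
    """Expected fatalities if the test fleet merely matched average human drivers."""
    total_miles = vehicles * miles_per_vehicle_per_year * years
    return total_miles * BASELINE_FATALITIES_PER_MILE

# Example: 100 test vehicles, 30,000 miles each per year, for 3 years is
# 9 million miles, or an expectation of roughly 0.09 fatalities.
print(expected_fatalities(100, 30_000, 3))  # ~0.09
```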

The Good

  • This is perhaps the most thorough set of safety reports yet. We've been seeing a trend that more recent reports often include areas not touched on by earlier reports. I hope this results in a competitive dynamic in which each company wants to raise the bar for safety transparency. We'll see how this turns out. Uber is certainly doing their part.
  • The materials place significant emphasis on improving safety culture, including excellent recommendations of good practices from the external report. Safety culture is essential. I'm glad to see this.
  • There are detailed discussions about Mission Specialist roles, responsibilities, and training.  This is important. Supervising autonomy is a difficult, demanding role, and gets more difficult as the autonomy gets better. Again, I'm glad to see this.
  • There is quite a bit about hardware quality for both computer hardware and vehicle hardware. It is hard to tell how far down the ISO 26262 hardware safety path they are for life critical functions such as disengaging autonomy for Mission Specialist takeover.  They mention some external technical safety reviews, but none recently. This is a good start, but more work is required here. They say they plan more external reviews, which is good.
  • They state concrete goals for each of their five safety principles. This is also good.

Jury Still Out on Fully Autonomous Operation System Safety:

  • The section on system safety is for the most part aspirational. Even the section that is not forward looking is mostly about plans, not current capabilities. This is consistent with currently using Mission Specialists to ensure testing safety.  In other words, assuming the Mission Specialist can avoid mishaps, and the vehicle always responds to human driver takeover, this isn't a problem yet. So we'll have to wait to see how this turns out.
  • The external review document concentrated on safety culture and road testing supervision. That would be consistent with a conclusion that the fatality root causes were poor safety culture and ineffective road testing supervision. (Certainly it would be no surprise if this hypothetical analysis were true, but we'll see what the final NTSB report says to know for sure.)
  • In general, we have no idea how they plan to prove that their vehicles are safe to deploy other than by driving around until they feel they have sufficiently infrequent road failures. Simulation will improve quality before they do road testing, but they are counting on road testing to find the problems. To be clear, road testing and that type of simulation can help, but I don't believe they are enough. (This is the same as for many other developers, so this is not a particular criticism of Uber.)
  • Uber says they are working on a safety case, perhaps using a GSN-based approach. This is an excellent idea. But I don't see anything that looks like a formal safety case in these documents. Hopefully we'll get to see something like that down the road.

Software Quality and Software Safety:

  • The software development process shown on page 47 of the report emphasizes fixing bugs found in testing. I don't know of any application domain in which that alone will actually get you acceptably safe life-critical software. Again, for now their primary safety strategy for testing is Mission Specialists, so this is an issue for the future. Maybe we'll find out if they are doing more in a later edition of this report. 
  • The information on software quality and software development process description is a bit skimpy in general.  It is difficult to tell if that is a reflection of their process or they just didn't want to talk about it.  For example, there is a box in their process diagrams on page 47 that says "peer review" with no description as to which of the many well-known review techniques they are using, whether they review everything, etc. There are no boxes in their software process for requirements, architecture, and design. There isn't an SQA function described (Software Quality Assurance, which deals with development process quality monitoring).  For Agile fans, there are plenty of boxes missing from whichever methodology you like. The point is that this is an incomplete software process model compared to what I'd expect to see for life-critical software. The question is whether the pieces are there and not drawn. Again, there is no other industry in which the approach shown would be sufficient or acceptable for creating life-critical software. It is possible there is more to their process than they are revealing, or they have some plan to address this before they remove their Mission Specialists from the vehicles. 
  • Perception (having the system recognize objects, obstacles, and so on) is notoriously difficult to get right, and probably the hardest problem out of many difficult problems to make self driving cars safe. They talk about how they use perception, but not how they plan to validate it, other than via road testing and possibly some aspects of simulating scenarios observed during road testing. But then again, other developers don't say much about this either.
  • It's easy to believe that at least some other organizations are following similar software approaches and will face the same challenges.  Again, because they currently have safety drivers these are forward-looking issues that are not the primary concerns for Uber road testing safety in the near term.
  • It is worth noting that they plan to have a secondary fail-over computer that they say will be developed at least taking into account software quality and safety standards such as ISO 26262 and MISRA C. (Safety Report Page 35.) But they don't seem to say if this is what they are doing for their basic control software that controls normal operation. Again, perhaps there is more to this they haven't revealed.

Is It Enough?

Overall the reports seem to put them on a par with other developers in terms of road testing safety. Whether they operate safely on public roads will largely depend upon maintaining their safety culture and Mission Specialist proficiency. I'd suggest an independent monitor within the organization to make sure that happens.


What I'd Like to See

There are a number of things I'd like to see from Uber to help in regaining public trust. (These same recommendations go for all the other companies doing public road testing.)
  • Uber should issue periodic report cards in which they tell us about adopting the recommendations in their various reports and their safety plans in general.  Are they staying on track? Did the safety culture initiative really work? Are they following their Mission Specialist procedures?
  • I'd like to see metrics that track the effectiveness of Mission Specialists. Nobody is perfect, but I'd be happier having data about how often they get distracted to see whether the break schedule and monitoring are working as intended. This should be something all companies do, since in the end they are putting the public at risk. The effectiveness of Mission Specialists who have been assigned a difficult job is their stated way to mitigate that risk -- but we have no insight as to whether that approach is really working until a crash is in the news. (A sketch of what such metrics might look like follows this list.)
  • They have promised safety metrics that are better than disengagements and miles driven. That's a great idea. We'll have to see how that turns out. (They sponsored a RAND report on this topic that was recently released. That will have to be the topic of another post.)
  • We should track whether they establish their external safety advisory board -- and whether it has appropriate autonomy and software safety technical expertise as well as areas such as safety culture and human/machine interface.
  • They should also have an independent internal monitor making sure their safety-relevant operational and design processes are being followed. This seems in line with their plans.
  • They need a much stronger story about how they plan to ensure software safety and system safety when they remove their Mission Specialists from the vehicle downstream. Hopefully they'll make public a high level version of the safety case and have it externally evaluated.
  • I hope that they work with PennDOT to comply with PA AV testing safety guidelines before resuming operation in Pittsburgh, where I live. From the materials I've seen that should be straightforward, but they should still do it. As of right now, they'd only be the second company to do so.
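As referenced in the metrics bullet above, here is a rough sketch of what Mission Specialist engagement metrics might look like. The record fields, names, and example numbers are hypothetical; a real program would presumably draw on logs from camera-based driver monitoring.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ShiftLog:
    """Hypothetical per-shift record from a driver monitoring system."""
    hours_on_task: float
    distraction_alerts: int   # e.g., eyes-off-road events flagged by an in-cab camera
    missed_prompts: int       # scripted attention prompts the specialist failed to acknowledge

def engagement_metrics(logs: List[ShiftLog]) -> Dict[str, float]:
    """Aggregate distraction and missed-prompt rates across shifts."""
    total_hours = sum(log.hours_on_task for log in logs)
    return {
        "distraction_alerts_per_hour": sum(log.distraction_alerts for log in logs) / total_hours,
        "missed_prompts_per_shift": sum(log.missed_prompts for log in logs) / len(logs),
    }

print(engagement_metrics([ShiftLog(4.0, 3, 0), ShiftLog(3.5, 5, 1)]))
```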
Dr. Philip Koopman is a faculty member at Carnegie Mellon University. He is an internationally recognized expert in the area of self-driving car safety. He is also Co-Founder of Edge Case Research, which provides products and services relating to autonomy safety. 
koopman@cmu.edu

Automotive Safety Practices vs. Accepted Principles (SAFECOMP paper)

I'm presenting this paper at SAFECOMP today.

2018 SAFECOMP Paper Preprint

Abstract. This paper documents the state of automotive computer-based system safety practices based on experiences with unintended acceleration litigation spanning multiple vehicle makers. There is a wide gulf between some observed automotive practices and established principles for safety critical system engineering. While some companies strive to do better, at least some car makers in the 2002-2010 era took a test-centric approach to safety that discounted nonreproducible and “unrealistic” faults, instead blaming driver error for mishaps. Regulators still follow policies from the pre-software safety assurance era. Eight general areas of contrast between accepted safety principles and observed automotive safety practices are identified. While the advent of ISO 26262 promises some progress, deployment of highly autonomous vehicles in a nonregulatory environment threatens to undermine safety engineering rigor.

See the full paper here:
https://users.ece.cmu.edu/~koopman/pubs/koopman18_safecomp.pdf

Note that there is some pretty interesting stuff to be seen by following the links in the paper reference section.
Also see the expanded list of (potentially) deadly automotive defects.

Here are the accompanying slides:  https://users.ece.cmu.edu/~koopman/pubs/koopman18_safecomp_slides.pdf







Victoria Australia Is Winning the Race to ADS Testing Safety Regulations

Victoria Australia has just issued new guidelines regarding Automated Driving System (ADS) testing.  These should be required reading for anyone doing on-road testing elsewhere in the world. There is just too much good stuff here to miss.  And, the guidelines are accompanied by actual laws that are designed to make autonomy testing safe.

A look through the regulations and guidelines shows that there is a lot to like. The most intriguing points I noticed were:
  • It provides essentially unlimited technical flexibility to the companies building the ADS vehicles while still providing a way to ensure safety. The approach is a simple two-parter:
    1. The testing permit holders have to explain why they will be safe via a safety management plan.
    2. If the vehicle testing doesn't follow the safety management plan or acts unsafely on the roads, the testing permit can be revoked.
  • The permit holder rather than the vehicle supervisor (a.k.a. "safety driver" in the US) is liable when operating in autonomous mode.  In other words, if the safety driver fails to avoid a mishap, liability rests with the company running the tests, not the safety driver. That sounds like an excellent way to avoid a hypothetical strategy of companies using safety drivers as scapegoats (or expendable liability shields) during testing.
  • The permitting process requires a description of ODD/OEDR factors including not just geofencing, but also weather, lighting, infrastructure requirements, and types of other road users that could be encountered.
  • The regulators have broad, sweeping powers to inspect, assess, require tests, and in general do the right thing to ensure that on-road testing is safe. For example, a permit can be denied or revoked if the safety plan is inadequate or not being followed.
There are many other interesting and on-target discussions in the guidelines.  They include the need to reduce risk as low as reasonably practicable (ALARP); accounting during testing for the Australian road safety approach of safe speeds, safe roads, safe vehicles, and safe people; transition issues between the ADS and the supervisor; the need to drive in a predictable way to interact safely with human drivers; and a multi-page list of issues to be considered by the safety plan. There is also a list of other laws that come into play.

Here are some pointers for those who want to look further.
There are some legal back stories at work here as well. For example, it seems that under previous law a passenger in an ADS could have been found responsible for errors made by the ADS, and this has been rectified with the new laws.

The regulations were created according to the following criteria from a 2009 Transportation bill:
  • Transportation system objectives:
    • Social and economic inclusion
    • Economic prosperity
    • Environmental sustainability
    • Integration of transport and land use
    • Efficiency, coordination and reliability
    • Safety and health and well being
  •  Decision making principles:
    • Principle of integrated decision making
    • Principle of triple bottom line assessment
    • Principle of equity
    • Principle of the transport system user perspective
    • Precautionary principle
    • Principle of stakeholder engagement and community participation
    • Principle of transparency. 
(The principle of transparency is my personal favorite.)

Here is a list of key features of the Road Safety (Automated Vehicles) Regulations 2018:

  1. The purpose of the ADS permit scheme (see regulation 5):
    • For trials of automated driving systems in automated mode on public roads
    • To enable a road authority to monitor and manage the use and impacts of the automated driving system on a highway
    • To enable VicRoads to perform its functions under the Act and the Transport Integration Act
  2. The permit scheme requires the applicant to prepare and maintain a safety management plan that (see regulation 9 (2)):
    • Identifies the safety risks of the ADS trials
    • Identifies the risks to the reliability, security and operation of the automated driving system to be used in the ADS trial
    • Specifies what the applicant will do to eliminate or reduce those risks so far as is reasonably practicable; and
    • Addresses the safety criteria set out in the ADS guidelines
  3. The regulations will require the ADS permit holder to report any serious incident within 24 hours (see regulations 13 and 19). A serious incident means any:
    • accident
    • speeding, traffic light, give way and level crossing offence
    • theft or carjacking
    • tampering with, unauthorised access to, modification of, or impairment of an automated driving system
    • failure of an automated driving system of an automated vehicle that would impair the reliability, security or operation of that automated driving system.
I hope that US states (and the US DOT) have a look at these materials.  Right now I'd say VicRoads is ahead of the US in the race to comprehensive but reasonable autonomous vehicle safety regulations.

(I would not at all be surprised if there are issues with these regulations that emerge over time. My primary point is that it looks to me like responsible regulation can be done in a way that does not pick technology winners and does not unnecessarily hinder innovation. This looks to be excellent source material for other regions to apply in a way suitable to their circumstances.)


AAMVA Slides on Self-Driving Car Road Testing Safety

These are the slides I presented at the AAMVA International Conference, August 22, 2018 in Philadelphia PA.

It's an update of my PennDOT AV Summit presentation from earlier this year.  A key takeaway is the lesson we should be learning from the tragic Uber fatality in Tempe AZ earlier this year:
- Do NOT blame the victim
- Do NOT blame the technology
- Do NOT blame the driver
INSTEAD -- figure out how to make sure the safety driver is actually engaged even during long, monotonous road testing campaigns.   AND actually measure driver engagement so problems can be fixed before there is another avoidable testing fatality.

Even better is to use simulation to minimize the need for road testing, but given that testers are out on the road operating, there needs to be a credible safety argument that they will be no more dangerous than other conventional vehicles while operating on public roads.






ADAS Code of Practice

One of the speakers at AVS last month mentioned that there was a Code of Practice for ADAS design (basically, level 1 and level 2 autonomy).  And that there is a proposal to update it over the next few years for higher autonomy levels.

A written set of uniform practices is generally worth looking into, so I took a look here:
https://www.acea.be/uploads/publications/20090831_Code_of_Practice_ADAS.pdf


The main report sets forth a development process with a significant emphasis on controllability. That makes sense, because for ADAS typically the safety argument ultimately ends up being that the driver will be responsible for safety, and that requires an ability for the driver to assert ultimate control over a potentially malfunctioning system.

The part that I actually found more interesting in many respects was the set of Annexes, which include quite a number of checklists for controllability evaluation, safety analysis, and assessment methods as well as Human-Machine Interface concept selection.

I'd expect that this is a useful starting point for those working on higher levels of autonomy, and most critically anyone trying to take on the very difficult human/machine issues involved with level 2 and level 3 systems.  (Whether it is sufficient on its own is not something I can say at this point, but starting with something like this is usually better than a cold start.)

If you have any thoughts about this document please let me know via a comment.

The Case for Lower Speed Autonomous Vehicle On-Road Testing

Every once in a while I hear about a self-driving car test or deployment program that plans to operate at lower speeds (for example, under 25 mph) to lower risk. Intuitively that sounds good, but I thought it would be interesting to dig deeper and see what turns up.

There have been a few research projects over the years looking into the probability of a fatality when a conventionally driven car impacts a pedestrian. As you might expect, faster impact speeds increase fatalities. But it's not linear -- it's an S-shape curve. And that matters a lot:

(Source: WHO http://bit.ly/2uzRfSI )

Looking at this data (and other similar data), impacts at less than 20 miles an hour have a flat curve near zero, and are comparatively survivable. Above 30 mph or so is a significantly bigger problem on a per-incident basis.  Hmm, maybe the city planners who set 25 mph speed limits have a valid point!  (And surely they must have known this already.) In conventional vehicles the flat curve at and below 20 mph has led to campaigns to nudge urban speed limits lower, with slogans such as "20 is plenty."
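To illustrate why the S-shape matters (rather than a linear relationship), here is a toy logistic model of fatality risk versus impact speed. The functional form and parameters are assumptions chosen only to reproduce the general shape of curves like the one above; they are not fitted to the WHO data.

```python
import math

def illustrative_fatality_risk(impact_speed_mph: float,
                               midpoint_mph: float = 40.0,
                               steepness: float = 0.2) -> float:
    """Toy logistic (S-shaped) pedestrian fatality risk vs. impact speed.

    The parameters are illustrative only, not fitted to any real data set.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (impact_speed_mph - midpoint_mph)))

for mph in (10, 20, 30, 40, 50):
    print(f"{mph} mph -> roughly {illustrative_fatality_risk(mph):.0%} (toy model)")
# Note how risk stays near zero below ~20 mph but climbs steeply above ~30 mph.
```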

For on-road autonomous vehicle testing there's a message here. Low speed testing and deployment carries far less risk of a fatality, and that risk climbs steeply as speed increases.

For systems with a more complex "above 25 mph" strategy there still ought to be plenty that is either reused from the slow system or able to be validated at low speeds.  Yes, slow is different than fast due to the physics of kinetic energy.  But a strategy that validates as much as possible below 25 mph and then reuses significant amounts of that validation evidence as a foundation for higher speed validation could present less risk to the public.  For example, if you can't tell the difference between a person riding a bike and a person walking next to a bike at 25 mph, you're going to have worse problems at 45 mph.  (You might say "but that's not how we do it."  My point is maybe the AV industry should be optimizing for validation, and this is the way it should get done.)

It's clear that many companies are on a "race" to autonomy. But sometimes slow and steady can win the race. Slow speed runs might be less flashy, but until the technology matures slower speeds could dramatically reduce the risk of pedestrian fatalities due to a test platform or deployed system malfunction. Maybe that's a good idea, and we ought to encourage companies who take that path now and in the future as the technology continues to mature.



The "above 25 mph" paragraph was added in response to social media comments 8/9/2018.  And despite that I still got comments saying that systems below 25 mph are completely different than higher speed systems.  So in case that point isn't clear enough, here is more on that topic:

I'm not assuming that slow and fast systems are designed the same. Nor am I advocating for limiting AV designs only to slow speeds (unless that fits the ODD).

I'm saying when you build a high-speed capable AV, it's a good idea to initially test at below 25 mph to reduce the risk to the public for when something goes wrong.  And something WILL go wrong.  There is a reason there are safety drivers.

If a system is designed to work properly at speeds of 0 mph to 55 mph (say), you'd think it would work properly at 25 mph.  And you could design it so that at 25 it's using most or all of the machinery that is being used at 55 mph (SW, HW, sensors, algorithms, etc.)  Yes, you can get away with something simpler at low speed.  But this is low speed testing, not deployment.  Why go tearing around town at high speed with a system that hasn't even been proven at lower speeds?  Then bump up speed once you've built confidence.

If you design to validate as much as possible at lower speeds, you lower the risk exposure.  Sure, investors probably want to see max. speed operation as soon as possible.  But not at the cost of dead pedestrians because testing was done in a hurry.


Notes for those who like details:

There is certainly room for reasonable safety arguments at speeds above 20 mph. I'm just pointing out that testing time spent at/below 20 mph is inherently less risky if a pedestrian collision does occur. So minimizing the exposure to high speed operation is a way to improve overall safety in the event a pedestrian impact does occur.

The impact speed is potentially different than vehicle speed. If the vehicle has time to shed even 5 or 10 mph of speed at the last second before impact that certainly helps, potentially a lot, even if the vehicle does not come to a complete stop before impact. But a slower vehicle is less dependent upon that last second braking (human or automated) working properly in a crisis.
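Here is a small sketch of that point, assuming simple constant-deceleration braking. The 0.7 g figure is an assumed hard-braking value and the distances are hypothetical; the point is that a slower vehicle is far less dependent on last-second braking to reach a survivable impact speed.

```python
import math

def impact_speed_mph(initial_mph: float, braking_distance_ft: float,
                     decel_g: float = 0.7) -> float:
    """Speed remaining at impact after braking over a given distance.

    Uses v^2 = v0^2 - 2*a*d with an assumed constant deceleration (decel_g).
    """
    v0 = initial_mph * 5280 / 3600              # mph -> ft/s
    a = decel_g * 32.2                          # g -> ft/s^2
    v_squared = v0 * v0 - 2 * a * braking_distance_ft
    return math.sqrt(max(0.0, v_squared)) * 3600 / 5280

# With only 20 ft of braking room before impact, a 25 mph vehicle slows to
# roughly 14 mph (comparatively survivable), while a 45 mph vehicle is still
# doing about 40 mph at impact.
print(impact_speed_mph(25, 20), impact_speed_mph(45, 20))
```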

The actual risk will depend upon circumstances. For example, since the 1991 data shown it seems likely that emergency medical services have improved, reducing fatality rates. On the other hand, increasing prevalence of SUVs might increase fatality rates depending upon impact geometries. And so on.   A study that compares multiple data sets is here:
https://nacto.org/docs/usdg/relationship_between_speed_risk_fatal_injury_pedestrians_and_car_occupants_richards.pdf
But, all that aside, all the data I've seen shows that traditional city speed limits (25 mph or less) help with reducing pedestrian fatalities.


Putting image manipulations in context: robustness testing for safe perception

UPDATE 8/17 -- added presentation slides!

I'm very pleased to share a publication from our NREC autonomy validation team that explains how computationally cheap image perturbations and degradations can expose catastrophic perception brittleness issues.  You don't need adversarial attacks to foil machine learning-based perception -- straightforward image degradations such as blur or haze can cause problems too.

Our paper "Putting image manipulations in context: robustness testing for safe perception" will be presented at IEEE SSRR August 6-8.  Here's a submission preprint:

https://users.ece.cmu.edu/~koopman/pubs/pezzementi18_perception_robustness_testing.pdf

Abstract—We introduce a method to evaluate the robustness of perception systems to the wide variety of conditions that a deployed system will encounter. Using person detection as a sample safety-critical application, we evaluate the robustness of several state-of-the-art perception systems to a variety of common image perturbations and degradations. We introduce two novel image perturbations that use “contextual information” (in the form of stereo image data) to perform more physically-realistic simulation of haze and defocus effects. For both standard and contextual mutations, we show cases where performance drops catastrophically in response to barely perceptible changes. We also show how robustness to contextual mutators can be predicted without the associated contextual information in some cases.

Fig. 6: Examples of images that show the largest change in detection performance for MS-CNN under moderate blur and haze. For all of them, the rate of FPs per image required to detect the person increases by three to five orders of magnitude. In each image, the green box shows the labeled location of the person. The blue and red boxes are the detections produced by the SUT before and after mutation, respectively, and the white-on-blue text is the strength of that detection (ranging from 0 to 1). Finally, the value in white-on-yellow text shows the average FP rate per image that a sensitivity threshold set at that value would yield, i.e., the FP rate required to still detect the person.
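For readers who want the flavor of this kind of testing, below is a highly simplified sketch of degradation-based robustness checking. It is not the paper's actual mutators (in particular, the paper's contextual haze and defocus use stereo depth, which this ignores); run_detector and the image handling are placeholders for whatever perception system is under test.

```python
import cv2
import numpy as np

def blur(image: np.ndarray, kernel: int = 9) -> np.ndarray:
    """Simple defocus-style degradation via Gaussian blur."""
    return cv2.GaussianBlur(image, (kernel, kernel), 0)

def haze(image: np.ndarray, strength: float = 0.5) -> np.ndarray:
    """Crude uniform haze: blend the image toward a light gray veil.
    (Unlike the paper's contextual haze, this ignores scene depth.)"""
    veil = np.full_like(image, 200)
    return cv2.addWeighted(image, 1.0 - strength, veil, strength, 0)

def robustness_check(image_path: str, run_detector) -> None:
    """Compare detector confidence on the original vs. degraded images.

    run_detector is a placeholder: any callable that returns a detection
    confidence (e.g., for the labeled person) given an image.
    """
    original = cv2.imread(image_path)
    baseline = run_detector(original)
    for name, mutate in (("blur", blur), ("haze", haze)):
        score = run_detector(mutate(original))
        print(f"{name}: confidence {baseline:.2f} -> {score:.2f}")
```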




Alternate slide download link: https://users.ece.cmu.edu/~koopman/pubs/pezzementi18_perception_robustness_testing_slides.pdf

Citation:
Pezzementi, Z., Tabor, T., Yim, S., Chang, J., Drozd, B., Guttendorf, D., Wagner, M., & Koopman, P., "Putting image manipulations in context: robustness testing for safe perception," IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Aug. 2018.

Pennsylvania's Autonomous Vehicle Testing Guidelines

PennDOT has just issued new Automated Vehicle Testing Guidance:
       July 2018 PennDOT AV Testing Guidance (link to acrobat document)
(also, there is a press release.)


It's only been a short three months since the PA AV Summit in which PennDOT took up a challenge to improve AV testing policy. Today PennDOT released a significantly revised policy as promised. And it looks like they've been listening to safety advocates as well as AV companies.

At a high level, there is a lot to like about this policy. It makes it clear that a written safety plan is required, and suggests addressing, one way or another, the big three items I've proposed for AV testing safety:
  • Make sure that the driver is paying attention
  • Make sure that the driver is capable of safing the vehicle in time when something goes wrong
  • Make sure that the Big Red Button (disengagement mechanism) is actually safe

There are a number of items in the guidance that look like a good idea. Here is a partial list of ones that catch my eye as being on the right track (many other ideas in the document are also good):

Good Ideas:
  • Submission of a written safety plan
  • Must have a safety driver in the driver seat who is able to take immediate physical control as required
  • Two safety drivers above 25 mph to ensure that the safety drivers are able to tend to both the safety driving and the testing
  • Validation "under controlled conditions" before on-road testing
  • Disengagement technology complies with industry standards
  • Safety driver training is mandatory, and has a nice list of required topics
  • Data recording for post-mishap analysis
  • Mitigate cybersecurity risk
  • Quality controls to ensure that major items are "adhered to and measured to ensure safe operation"
There are also some ideas that might or might not work out well in practice. I'm not so sure how these will work out, and they seem in some cases to be compromises:

Not Sure About These:
  • Only one safety driver required below 25 mph. It's true that low speed pedestrian collisions are less lethal, and there can be more time to react, so the risk is somewhat lower. But time will tell if drivers are able to stay sufficiently alert to avoid mishaps even if they are lower speed.
  • It's not explicit about the issue of ensuring that there is enough time for a safety driver to intervene when something goes wrong. It's implicit in the parts about a driver being able to safe the vehicle. It's possible that this was considered a technical issue for developers rather than regulators, but in my mind it is a primary concern that can easily be overlooked in a safety plan. This topic should be more explicitly called out in the safety plan.
  • The data reporting beyond crashes is mostly just tracking drivers, vehicles, and how much testing they are doing.  I'd like to see more reporting regarding how well they are adhering to their own safety plan. It's one thing to say things look good via hand waving and "trust us, we're smart." It's another to report metrics such as how often drivers drop out during testing and what corrective actions are taken in response to such data. (The rate won't be a perfect zero; continual improvement should be the goal, as well as mishap rates no worse than conventional vehicles during testing.) I realize picking metrics can be a problem -- so just let each company decide for themselves what they want to report. The requirement should be to show evidence that safety is actually being achieved during testing. To be fair, there is a bullet in the document requiring quality controls. I'd like that bullet to have more explicit teeth to get the job done.
  • The nicely outlined PennDOT safety plan can be avoided by instead submitting something following the 2017 NHTSA AV Guidance. That guidance is a lot weaker than the 2016 NHTSA AV Guidance was. Waymo and GM have already created such public safety disclosures, and others are likely coming. However, it is difficult for a reader to know if AV vendors are just saying a lot of buzzwords or are actually doing the right things to be safe. Ultimately I'm not comfortable with "trust us, we're safe" with no supporting evidence. While some disclosure is better than no disclosure, the public deserves better than NHTSA's rather low bar in safety plan transparency, which was not intended to deal specifically with on-road testing. We'll have to see how this alternative option plays out, and what transparency the AV testers voluntarily provide. Maybe the new 2018 NHTSA AV Guidance due later this summer will raise the bar again.
Having said nice things for the most part, there are a few areas which really need improvement in a future revision. I realize they didn't have time to solve everything in three months, and it's good to see the progress they made. But I hope these areas are on the list for the next iteration:

Not A Fan:
  • Only one safety driver above 25 mph after undergoing "enhanced driver safety training." It's unclear what this training might really be, or if more training will really result in drivers that can do solo testing safely. I'd like to see something more substantive demonstrating that solo drivers will actually be safe in practice. Training only goes so far, and no amount of hiring only experienced drivers will eliminate the fact that humans have trouble staying engaged when supervising autonomy for long stretches of time. I'm concerned this will end up being a loophole that puts solo drivers in an untenable safety role.
  • No independent auditing. This is a big one, and worth discussing at length.
The biggest issue I see is no requirement for independent auditing of safety. I can understand why it might be difficult to get testers on board with such a requirement, especially a requirement for third party auditing. The AV business is shrouded in secrecy. Nobody wants PennDOT or anyone else poking around in their business. But every other safety-critical domain is based on an approach of transparent, independent safety assessment.

A key here is that independent auditing does NOT have to include public release of information.  The "secret sauce" doesn't even have to be revealed to auditors, so long as the system is safe regardless of what's in the fancy autonomy parts of the system. There are established models to keep trade secrets a secret used in other industries while still providing independent oversight of safety. There's no reason AVs should be any different. After all, we're all being put at risk by AV testing when we share public roads with them, even as pedestrians. AV testing ought to have transparent, independent safety oversight.

Overall, I think this guidance is excellent progress from PennDOT that puts us ahead of most, if not all locations in the US regarding AV safety testing. I hope that AV testers take this and my points above to heart, and get ahead of the safety testing problem.

Road Sign Databases and Safety Critical Data Integrity

It's common for autonomous vehicles to use road map data, sign data, and so on for their operation. But what if that data has a problem?

Consider that while some data is being mapped by the vehicle manufacturers, they might be relying upon other data as well.  For example, some companies are encouraging cities to build a database of local road signs  (https://www.wired.com/story/inrix-road-rules-self-driving-cars?mbid=nl_071718_daily_list3_p4&CNDID=23351989)

It's important to understand the integrity of the data. What if there is a stop sign missing from the database and the vehicle decides to believe the database if it's not sure whether a stop sign in the real world is valid?  (Perhaps it's hard to see the real world stop sign due to sun glare and the vehicle just goes with the database.) If the vehicle blows through a stop sign because it's missing from the database, whose fault is that?  And what happens next?

Hopefully such databases will be highly accurate, but anyone who has worked with any non-trivial database knows there is always some problem somewhere. In fact, there have been numerous accidents and even deaths due to incorrect or corrupted data over the years.

Avoiding "death by road sign database" requires managing the safety critical integrity of the road sign data (and map data in general).  If your system uses it for guidance but assumes it is defective with comparatively high probability, then maybe you're fine. But as soon as you trust it to make a safety-relevant decision, you need to think about how much you can trust it and what measures are in place to ensure it is not only accurately captured, but also dependably maintained, updated, and delivered to consumers.

Fortunately you don't need to start from scratch.  The Safety-Critical Systems Club has been working on this problem for a while, and recently issued version 3 of their guidelines for safety critical data. You can get it for free as a download here: https://scsc.uk/scsc-127c

The document includes a broad range of background information, guidance, and a worked example.  It also describes quite a number of data integrity incidents in Appendix H that are worth looking at if you need some war stories about what happens if you get data integrity wrong.  Highly recommended.


https://scsc.uk/r127C:2



Latest version as of May 2021:
https://scsc.uk/scsc-127F

A Safe Way to Apply FMVSS Principles to Self-Driving Cars

As the self-driving car industry works to create safer vehicles, it is facing a significant regulatory challenge.  Complying with existing Federal Motor Vehicle Safety Standards (FMVSS) can be difficult or impossible for advanced designs. For conventional vehicles the FMVSS structure helps ensure a basic level of safety by testing some key safety capabilities. However, it might be impossible to run these tests on advanced self-driving cars that lack a brake pedal, steering wheel, or other components required by test procedures.

While there is industry pressure to waive some FMVSS requirements in the name of hastening progress, doing so is likely to result in safety problems. I’ll explain a way out of this dilemma based on the established technique of using safety cases. In brief, auto makers should create an evidence-based explanation as to why they achieve the intended safety goals of current FMVSS regulations even if they can’t perform the tests as written. This does not require disclosure of proprietary autonomous vehicle technology, and does not require waiting for the government to design new safety test procedures.

Why the Current FMVSS Structure Must Change

Consider an example of FMVSS 138, which relates to tire pressure monitoring. At some point many readers have seen a tire pressure telltale light, warning of low tire pressure:

FMVSS 138 Low Tire Pressure Telltale

This light exists because of FMVSS, which specifies tests to make sure that a driver-visible telltale light turns on for under-inflation and blow-out conditions with specified road surface conditions, vehicle speed, and so on.

But what if an unmanned vehicle doesn’t have a driver seat?  Or even a dashboard for mounting the telltale? Should we wait years for the government to develop an alternate self-driving car FMVSS series? Or should we simply waive FMVSS compliance when the tests don’t make sense as written?

Simplistic, blanket waivers are a bad idea. It is said that safety standards such as FMVSS are written in the blood of past victims. Self-driving cars are supposed to improve safety. We shouldn’t grant FMVSS waivers that will result in having more blood spilled to re-learn well understood lessons for self-driving cars.

The weakness of the FMVSS approach is that the tests don’t explicitly capture the “why” of the safety standard. Rather, there is a very prescriptive set of rules, operating in a manner similar to building codes for houses. Like building codes, they can take time to update when new technology appears. But just as it is a bad idea to skip a building inspection on your new house, you shouldn’t let vehicle makers skip FMVSS tests for your new car – self-driving or otherwise. Despite the fear of hindering progress, something must be done to adapt the FMVSS framework to self-driving cars.

A Safety Case Approach to FMVSS

A way to permit rapid progress while still ensuring that we don’t lose ground on basic vehicle safety is to adopt a safety case approach. A safety case is a written explanation of why a system is appropriately safe. Safety cases include: a safety goal, a strategy for meeting the goal, and evidence that the strategy actually works.

To create an FMVSS 138 safety case, a self-driving car maker would first need to identify the safety goals behind that standard. A number of public documents that precede FMVSS 138 state safety goals of detecting low tire pressure and avoiding blowouts. Those goals were, in turn, motivated by dozens of deaths resulting from tire blowouts that provoked the 2000 TREAD Act.

The next step is for the vehicle maker to propose a safety strategy compatible with its product. For example, vehicle software might set internal speed and distance limits in response to a tire failure, or simply pull off the road to await service. The safety case would also propose tests to provide concrete evidence that the safety strategy is effective. For example, instead of demonstrating that a telltale light illuminates, the test might instead show that the vehicle pulls to the side of the road within a certain timeframe when low tire pressure is detected. There is considerable flexibility in safety strategy and evidence so long as the safety goal is adequately met.
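Here is a sketch of how such a safety case might be captured as structured data for the FMVSS 138 example. The goal/strategy/evidence breakdown follows the description above, but the specific wording of the strategy and evidence items is illustrative, not an actual filing.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SafetyCase:
    goal: str            # why: the safety intent behind the FMVSS requirement
    strategy: str        # how: the vehicle maker's approach to meeting that intent
    evidence: List[str] = field(default_factory=list)   # proof the strategy works

# Illustrative only -- the strategy and evidence wording is hypothetical.
fmvss_138_case = SafetyCase(
    goal="Detect under-inflation and blowouts so the vehicle does not keep "
         "operating on a failing tire",
    strategy="On detected low pressure or blowout, limit speed and pull off "
             "the road to await service",
    evidence=[
        "Test: deflate a tire below the FMVSS 138 threshold; vehicle exits the "
        "travel lane within a defined time window",
        "Test: simulated blowout at speed; vehicle sheds speed and stops safely",
        "Analysis: pressure sensor failure modes are detected and treated as low pressure",
    ],
)
```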

Regulators will need a process for documenting the safety case for each requested FMVSS deviation. They must decide whether they should evaluate safety cases up front or employ less direct feedback approaches such as post-mishap litigation. Regardless of approach, the safety cases can be made public, because they will describe a way to test vehicles for basic safety, and not the inner workings of highly proprietary autonomy algorithms.

Implementing this approach only requires vehicle makers to do extra work for FMVSS deviations that provide their products with a competitive advantage. Over time, it is likely that a set of standardized industry approaches for typical vehicle designs will emerge, reducing the effort involved. And if an FMVSS requirement is truly irrelevant, a safety case can explain why.

While there is much more to self-driving car safety than FMVSS compliance, we should not be moving backward by abandoning accepted vehicle safety requirements. Instead, a safety case approach will enable self-driving car makers to innovate as rapidly as they like, with a pay-as-you-go burden to justify why their alternative approaches to providing existing safety capabilities are adequate.

Author info: Prof. Koopman has been helping government, commercial, and academic self-driving developers improve safety for 20 years.
Contact: koopman@cmu.edu

Originally published in The Hill 6/30/2018:
http://thehill.com/opinion/technology/394945-how-to-keep-self-driving-cars-safe-when-no-one-is-watching-for-dashboard

AVS 2018 Panel Session

It was great to have the opportunity to participate in a panel on autonomous vehicle validation and safety at AVS in San Francisco this past week.  Thanks especially to Steve Shladover for organizing such an excellent forum for discussion.

The discussion was the super-brief version. If you want to dig deeper, you can find much more complete slide decks attached to other blog posts.
The first question was to spend 5 minutes talking about the types of things we do for validation and safety.  Here are my slides from that very brief opening statement.



Robustness Testing of Autonomy Software (ICSE 2018)

Our Robustness Testing team at CMU/NREC presented a great paper at ICSE on what we learned over five years of the Automated Stress Testing for Autonomy Systems (ASTAA) project, spanning 11 projects and finding 150 significant bugs.




The team members contributing to the paper were:
Casidhe Hutchison, Milda Zizyte, Patrick E. Lanigan, David Guttendorf, Michael Wagner, Claire Le Goues, and Philip Koopman.

Special thanks to Cas for doing the heavy lifting on the paper, and to Milda for the conference presentation.



Safety Validation and Edge Case Testing for Autonomous Vehicles (Slides)

Here is a slide deck that expands upon the idea that the heavy tail ceiling is a problem for AV validation. It also explains ways to augment image sensor inputs to improve robustness.



Safety Validation and Edge Case Testing for Autonomous Vehicles from Philip Koopman

(If slideshare is blocked for you, try this alternate download source)

Heavy Tail Ceiling Problem for AV Testing

I enjoyed participating in the AV Benchmarking Panel hosted by Clemson ICAR last week.  Here are my slides and a preprint of my position paper on the Heavy Tail Ceiling problem for AV safety testing.

Abstract
Creating safe autonomous vehicles will require not only extensive training and testing against realistic operational scenarios, but also dealing with uncertainty. The real world can present many rare but dangerous events, suggesting that these systems will need to be robust when encountering novel, unforeseen situations. Generalizing from observed road data to hypothesize various classes of unusual situations will help. However, a heavy tail distribution of surprises from the real world could make it impossible to use a simplistic drive/fail/fix development process to achieve acceptable safety. Autonomous vehicles will need to be robust in handling novelty, and will additionally need a way to detect that they are encountering a surprise so that they can remain safe in the face of uncertainty.
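A small simulation conveys the core of the argument. The Zipf-like distribution of surprise types below is an assumption chosen purely for illustration, but it shows the drive/fail/fix problem: even after "fixing" every surprise encountered so far, later test phases keep turning up events never seen before.

```python
import bisect
import itertools
import random

random.seed(0)
NUM_SURPRISE_TYPES = 1_000_000
# Heavy-tailed (Zipf-like) frequencies: a few common surprises, a vast tail of rare ones.
weights = [1.0 / (i ** 1.1) for i in range(1, NUM_SURPRISE_TYPES + 1)]
cumulative = list(itertools.accumulate(weights))

def draw_surprise() -> int:
    """Sample one surprise type according to the heavy-tailed distribution."""
    return bisect.bisect(cumulative, random.uniform(0.0, cumulative[-1]))

fixed = set()
for phase in range(1, 6):
    novel = 0
    for _ in range(10_000):              # surprises encountered during this test phase
        surprise = draw_surprise()
        if surprise not in fixed:
            novel += 1
            fixed.add(surprise)          # drive/fail/fix: never bitten by this one again
    print(f"phase {phase}: {novel} of 10,000 surprises were still novel")
```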

Paper Preprint:
http://users.ece.cmu.edu/~koopman/pubs/koopman18_heavy_tail_ceiling.pdf

Presentation:




A Reality Check on the 94 Percent Human Error Statistic for Automated Cars

Automated cars are unlikely to get rid of all the "94% human error" mishaps that are often cited as a safety rationale. But there is certainly room for improvement compared to human drivers. Let's sort out the hype from the data.

You've heard that the reason we desperately need automated cars is that 94% of crashes are due to human error, right?  And that humans make poor choices such as driving impaired, right?  Surely, then, autonomous vehicles will give us a factor of 10 or more improvement simply by not driving stupid, right?

Not so fast. That's not actually what the research data says. It's important for us to set realistic expectations for this promising new technology. Probably it's more like 50%.  Let's dig deeper.



The US Department of Transportation publishes an impressive amount of data on traffic safety -- which is a good thing. And, sure enough, you can find the 94% number in DOT HS 812 115 Traffic Safety Facts, Feb. 2015.  (https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812115) which says: "The critical reason was assigned to drivers in an estimated 2,046,000 crashes that comprise 94 percent of the NMVCCS crashes at the national level. However, in none of these cases was the assignment intended to blame the driver for causing the crash" (emphasis added). There’s the 94% number.  But wait – they’re not actually blaming the driver for those crashes! We need to dig deeper here.

Before digging, it's worth noting that this isn't really 2015 data, but rather a 2015 summary of data based on a data analysis report published in 2008.  (DOT HS 811 059 National Motor Vehicle Crash Causation Survey, July 2008 https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/811059)  If you want to draw your own conclusions you should look at the original study to make sure you understand things.

Now that we can look at the primary data, first we need to see what assigning a "crash reason" to the driver really means. Page 22 of the 2008 report sheds some light on this. Basically, if something goes wrong and the driver should have (arguably) been able to avoid the crash, the mishap “reason” is assigned to the driver. That's not at all the same thing as the accident being the driver's fault due to an overt error that directly triggered the mishap. To its credit, the original report makes this clear. This means that the idea of crashes or fatalities being “94% due to driver error” differs from the report's findings in a subtle but critical way.

Indeed, many crashes are caused by drunk drivers. But other crashes are caused by something bad happening that the driver doesn't manage to recover from. Still other crashes are caused by the limits of human ability to safely operate a vehicle in difficult real-world situations despite the driver not having violated any rules. We need to dig still deeper to understand what's really going on with this report.

Page 25 of the report sheds some light on this. Of the 94% of mishaps attributed to drivers, there are a number of clear driver misbehaviors listed, including distracted driving, illegal maneuvers, and sleeping. But the #1 problem is "Inadequate surveillance 20.3%." In other words, a fifth of mishaps blamed on drivers are the driver not correctly identifying an obstacle, missing some road condition, or other problem of that nature. While automated cars might have better sensor coverage than a human driver's eyes, misclassifying an object or being fooled by an unusual scenario could happen with an automated car just as it can happen to a human.  (In other words, sometimes driver blame could be assigned to an automated driver, even if part of the 94%.) This biggest bin isn’t drunk driving at all, but rather gets to a core reason of why building automated cars is so hard. Highly accurate perception is difficult, whether you're a human or a machine.

Other driver bins in the analysis include "False assumption of other's action 4.5%," "Other/unknown recognition error 2.5%," "Other/unknown decision error 6.2%," and "Other/unknown driver error 7.9%".  That’s another 21% that might or might not be impaired driving, and might be a mistake that could also be made by an automated driver.

So in the end, the 94% human attribution for mishaps isn't all impaired or misbehaving drivers. Rather, many of the reasons assigned to drivers sound more like imperfect drivers. It's increasingly clear that autonomous vehicles can also be imperfect. For example, they can misclassify objects on the road. So we can't blithely claim that automated cars won't have any of the failures that the study attributes to human error. Rather, at least some of these problems will likely change from being assigned to "human driver error" to instead being "robot driver error."   Humans aren't perfect. Neither are robots. Robots might be better than humans in the end, but that area is still a work in progress and we do not yet have data to prove that it will turn out in the robot driver's favor any time soon.

A more precise statement of the study's findings is that while it is indeed true that 94% of mishaps might be attributed to humans, significantly less than that number is attributable to poor human choices such as driving drunk. While I certainly appreciate that computers don't drive drunk, they just might be driving buggy. And even if they don’t drive buggy, making self driving cars just as good as an unimpaired driver is unlikely to get near the 94% number so often tossed around. Perception, modeling expected behavior of other actors, and dealing with unexpected situations are all notoriously difficult to get right, and are all cited sources of mishaps.  So this should really be no surprise. It is possible automated cars might be a lot better than people eventually, but this data doesn't support that expectation at the 94% better level.

We can get a bit more clarity by looking at another DOT report that might help set more realistic expectations. Let's take a look at 2016 Fatal Motor Vehicle Crashes: Overview (DOT HS 812 456; access via https://www.nhtsa.gov/press-releases/usdot-releases-2016-fatal-traffic-crash-data). The most relevant numbers are below (note that there is overlap, so the categories add up to more than 100%):
  • Total US roadway fatalities: 37,461
  • Alcohol-impaired-driving fatalities: 10,497
  • Unrestrained passenger vehicle fatalities (not wearing seat belts): 10,428
  • Speeding-related fatalities: 10,111
  • Distraction-affected fatalities: 3,450
  • Drowsy driving fatalities: 803
  • Non-occupant fatalities (sum of pedestrians, cyclists, other): 7,079

I'm not going to attempt a detailed analysis, but certainly we can get the broad brush strokes from this data.  I come up with the following three conclusions:

1. Clearly impaired driving and speeding contribute to a large number of fatalities (ballpark twenty thousand, although there are overlaps in the categories).  So there is plenty of low hanging fruit to go after if we can create an automated vehicle that is as good as an unimpaired human. But it might be more like a 2x or 3x improvement than a 20x improvement. Consider the typical 100 million miles between fatalities that is quoted for human drivers. If you remove the impaired drivers, based on this data you get more like 200 million miles between fatalities. It will take a lot to get automated cars that good. Don't get me wrong; a 2x or so improvement is potentially a lot of lives saved. But it's not zero fatalities, and it's nowhere near a 94% reduction. (A back-of-envelope sketch of this arithmetic follows these conclusions.)

2. Almost one-fifth of the fatalities are pedestrians, cyclists, and other at-risk road users. Detecting and avoiding that type of crash is notoriously difficult for autonomous vehicle technology. Occupants have all sorts of crash survivability features. Pedestrians -- not so much. Ensuring non-occupant safety has to be a high priority if we're going to deploy this technology without the unintended consequence of a rise in pedestrian fatalities offsetting gains in occupant safety.

3. Well over a quarter of the vehicle occupant fatalities are attributed to not wearing seat belts. Older generations have lived through the various technological attempts to enforce seat belt use. (Remember motorized seat belts?)  The main takeaway has been that some people will go to extreme lengths to avoid wearing seat belts. It's difficult to see how automated car technology alone will change that. (Yes, if there are zero crashes the need for seat belts is reduced, but we're not going to get there any time soon. It seems more likely that seat belts will play a key part in reducing fatalities when coupled with hopefully less violent and less frequent crashes.)
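
To make the 2x-to-3x ballpark in point 1 concrete, here is a rough back-of-envelope sketch in Python. This is my own illustrative arithmetic using the 2016 numbers quoted above, not part of the NHTSA report; it ignores the overlap between categories and optimistically assumes that every alcohol- and speeding-related fatality would be avoided, so it overstates the benefit:

# Back-of-envelope estimate using the 2016 FARS numbers listed above.
# Optimistic assumption: an automated vehicle as good as an unimpaired,
# non-speeding human avoids essentially all alcohol- and speeding-related
# fatalities. Category overlap is ignored, so this overstates the benefit.

total_fatalities = 37_461
alcohol_impaired = 10_497
speeding_related = 10_111

avoided = alcohol_impaired + speeding_related      # ~20,600 (double-counts overlap)
remaining = total_fatalities - avoided             # ~16,850

improvement = total_fatalities / remaining         # ~2.2x, i.e., the "2x-ish" ballpark
print(f"Estimated improvement: {improvement:.1f}x")

# Restated against the ~100 million miles per fatality often quoted for humans:
miles_per_fatality = 100e6 * improvement           # roughly 200 million miles
print(f"Implied miles between fatalities: {miles_per_fatality / 1e6:.0f} million")

Even with those generous assumptions, the result lands near a 2x improvement and roughly 200 million miles between fatalities -- nowhere near a 94% reduction.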

So where does that leave us?

There is plenty of potential for automated cars to help with safety. But the low hanging fruit is more like cutting fatalities perhaps in half, if we can achieve parity with an average well-behaved human driver. Reaching the 94% reduction so often quoted will take a lot more than that. Over time, hopefully, automated cars can continue to improve further. ADAS features such as automatic emergency braking can likely help too. And for now, even getting rid of half the traffic deaths is well worth doing, especially if we make sure to consider pedestrian safety.

Driving safely is a complex task for man or machine. It would be a shame if a hype roller coaster ride triggered disillusionment with technology that can, in the end, improve safety. Let's set reasonable expectations so that automated car technology is asked to provide near-term benefits that it can actually deliver.


Update: Mitch Turk pointed out another study from 2016 that is interesting.
http://www.pnas.org/content/pnas/113/10/2636.full.pdf

This has data from monitoring drivers who experienced crashes. The ground rules for the experiment were a little different, but it has data explaining what was going on in the vehicle before a crash.  A significant finding is that driver distraction is an issue. (Note that this data was collected several years later than the previous study, so that makes some sense.)

For our purposes an interesting finding is that 12.3% of crashes were NOT distracted / NOT impaired / NOT human error.


Beyond that, it seems likely that some of the other categories contain scenarios that could be difficult for an AV, such as undistracted, unimpaired errors (16.5%).



Update 7/18/2018:  Laura Fraade-Blanar from RAND sent me a nice paper from 2014 on exactly this topic:
https://web.archive.org/web/20181121213923/https://www.casact.org/pubs/forum/14fforum/CAS%20AVTF_Restated_NMVCCS.pdf

The study looked at the NMVCCS data from 2008 and asked what this data means for autonomy, accounting for issues that autonomy is going to have trouble addressing. They say that "49% of accidents contain at least one limiting factor that could disable the [autonomy] technology or reduce its effectiveness."  They also point out that autonomy can create new risks not present in manually driven vehicles.

So this data suggests that using autonomy to eliminate perhaps half of today's vehicle crashes is a more realistic goal.


NOTES:

For an update on how many organizations are misusing this statistic, see:
  https://usa.streetsblog.org/2020/10/14/the-94-solution-we-need-to-understand-the-causes-of-crashes/

The studies referenced, as with other similar studies, don’t attempt to identify mishaps that might have been caused by computer-based system malfunction. In studies like this, potential computer-based faults and non-reproducible faults in general are commonly attributed to driver error unless there is compelling evidence to the contrary. So the human error numbers must be taken with a grain of salt. However, this doesn’t change the general nature of the argument being made here.

The data supporting the conclusion is more than 10 years old, so it would be no surprise if ADAS technology has been changing things. In any event, analyzing current safety expectations should arguably require separating the effects of ADAS features such as automatic emergency braking (AEB) from functions such as lane keeping and speed control.  ADAS can potentially improve safety even for well-behaved human drivers.

Some readers will want to argue for a more aggressive safety target than 2x to 3x safer than average humans. I'm not saying that 2x or 3x is an acceptable safety target -- that's a whole different discussion. What I'm saying is that it is a much more likely near-term success target than the 94% number so often tossed around.

There will no doubt be different takes on how to interpret data from these and other reports.  That’s a vitally important discussion to have. But the point of this essay is not to argue how safe is safe enough.  Rather, the point is to have a discussion about realistic expectations based on data and engineering, not hype. So if you want to argue a different outcome than I propose, that's great. But please bring data instead of marketing claims to the discussion. Thanks.

Can Mobileye Validate ‘True Redundancy’?

I'm quoted in this article by Junko Yoshida on Mobileye's approach to AV safety.

Can Mobileye Validate ‘True Redundancy’?
Intel/Mobileye’s robocars start running in Jerusalem
Junko Yoshida
5/22/2018 02:01 PM EDT

...
Issues include how to achieve “true redundancy” in perception, how to explain objectively what “safe” really means, and how to formulate “a consistent and complete set of safety rules” agreeable to the whole AV industry, according to Phil Koopman, professor of Carnegie Mellon University.
...

Read the story here:
  https://www.eetimes.com/document.asp?doc_id=1333308

Did Uber Do Enough to Make Test AVs Safe?

I'm quoted in this article by Junko Yoshida regarding on-road AV testing safety:

Did Uber Do Enough to Make Test AVs Safe?
Junko Yoshida, Chief International Correspondent
5/26/2018 05:01 PM EDT

"The NTSB preliminary report exposes two issues. One is the immaturity of Uber's AV software stack. Another is the absence of an Uber safety strategy in creating its AV testing platform."

Read more:
  https://www.eetimes.com/author.asp?section_id=36&doc_id=1333325

AutoSens 2018 slides

I enjoyed presenting at AutoSens 2018 today.   The audience was very engaged and asked great questions.

Here are my slides. (If you've seen my other recent slide decks there probably aren't a lot of surprises, but I remixed things to emphasize perception validation.)


Slides from US-China Transportation Forum Presentation

On Thursday I had the honor of presenting to two Secretaries of Transportation at the 2018 U.S.-China Transportation Forum in Beijing, China.  (US Secretary Chao and China Secretary Yang were in the front row -- as well as a huge room full of US and China delegates.)  It was a really interesting experience, and I truly appreciate the support and hospitality shown by the organizers and both governments.  It's not often that a hard-core techie gets face time with cabinet members!

I was the US Autonomous Vehicle technical expert in one of two technology sessions.  My topic was autonomous vehicle testing safety.  I gave the short version of my PA AV Summit talk.  The slides are here for anyone who is interested in seeing how I tried to boil that message down to a 10 minute slot (with simultaneous translation into Chinese).



Toward a framework for Highly Automated Vehicle Safety Validation

I'm presenting a paper on AV safety validation at the 2018 SAE World Congress.  Here's the unofficial version of the presentation and a preprint of the paper.

Toward a Framework for Highly Automated Vehicle Safety Validation
Philip Koopman & Michael Wagner
2018 SAE World Congress / SAE 2018-01-1071

Abstract:
Validating the safety of Highly Automated Vehicles (HAVs) is a significant autonomy challenge. HAV safety validation strategies based solely on brute force on-road testing campaigns are unlikely to be viable. While simulations and exercising edge case scenarios can help reduce validation cost, those techniques alone are unlikely to provide a sufficient level of assurance for full-scale deployment without adopting a more nuanced view of validation data collection and safety analysis. Validation approaches can be improved by using higher fidelity testing to explicitly validate the assumptions and simplifications of lower fidelity testing rather than just obtaining sampled replication of lower fidelity results. Disentangling multiple testing goals can help by separating validation processes for requirements, environmental model sufficiency, autonomy correctness, autonomy robustness, and test scenario sufficiency. For autonomy approaches with implicit designs and requirements, such as machine learning training data sets, establishing observability points in the architecture can help ensure that vehicles pass the right tests for the right reason. These principles could improve both efficiency and effectiveness for demonstrating HAV safety as part of a phased validation plan that includes both a "driver test" and lifecycle monitoring as well as explicitly managing validation uncertainty.

Paper Preprint:        http://users.ece.cmu.edu/~koopman/pubs/koopman18_av_safety_validation.pdf



Ensuring The Safety of On-Road Self-Driving Car Testing (PA AV Summit Talk Slides)

This is the slide version of my op-ed on how to make self-driving car testing safe.

The take-away is to create a test vehicle with a written safety case that addresses these topics:
  • Show that the safety driver is paying adequate attention
  • Show that the safety driver has time to react if needed
  • Show that AV disengagement/safing actually works when things go wrong
(An abbreviated version was also presented in April 2018 at the US-China Transportation Forum in Beijing, China.)


What can we learn from the UK guidelines on self-driving car testing?


The UK already has a pretty good answer for how to do self-driving car testing safely. US stakeholders could learn something from it.


You can see the document for yourself at: 
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/446316/pathway-driverless-cars.pdf

As industry and various governmental organizations decide what to do in response to the tragic Tempe, Arizona pedestrian accident, it's worth looking abroad to see what others have done.  As it turns out, the UK Department for Transport issued a 14-page document in July 2015: "The Pathway to Driverless Cars: A Code of Practice for testing."   It covers test drivers, vehicle equipment, licensing, insurance, data recording, and more. So far so good, and kudos for specifically addressing the topic of test platform safety that long ago!

As I'd expect from a UK safety document, there is a lot to like.  I'm not going to try to summarize it all, but here are some comments on specific sections that are worth noting. Overall, I think the content is useful and will help improve safety for testing Autonomous Vehicle (AV) technology on public roads.  My only criticism is that it doesn't go quite far enough in a couple of places.

First, it is light on making sure that the safety process is actually performing as intended. For example, they say it's important to make sure that test drivers are not fatigued, which is good. But they don't explicitly say that you need to take operational data to make sure that the procedures intended to mitigate fatigue problems are actually resulting in alert drivers. Similarly, they say that the test drivers need time to react, but they don't require feedback to make sure that on-road vehicles are actually leaving the drivers enough time to react during operation.  (Note that this is a tricky bit to be sure you get right because distracted drivers take longer to react, so you really need to ensure that field operations are leaving sufficient reaction time margin.)

In fairness, they say "robust procedures," and for a safety person, taking data to make sure the procedures are actually working should be obvious. Nonetheless, I've found in practice that it's important to spell out the need for feedback to correct safety issues. In a high-stakes environment such as the autonomy race to market, it's only natural that testers will be under pressure to cut corners. The only way I know of to ensure that "aggressive" technology maturation doesn't cross the line to being unsafe is to have continual feedback from field operations to ensure that the assumptions and strategy underlying the safety plan are actually effective and working as intended. For example, you should detect and correct systemic problems with safety driver alertness long before you experience a pedestrian fatality.
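
As a purely illustrative sketch of what that kind of operational feedback might look like (hypothetical event fields and threshold value; this is not drawn from the UK document or from any company's actual monitoring system), one could periodically check logged takeover events against the reaction-time budget assumed by the safety case:

# Hypothetical sketch: compare field takeover data against the safety case's
# assumed driver reaction-time budget. Field names and the threshold value
# are invented for illustration.

from dataclasses import dataclass
from typing import List

@dataclass
class TakeoverEvent:
    alert_time_s: float      # when the hazard appeared or the driver was alerted
    takeover_time_s: float   # when the safety driver actually took control

ASSUMED_REACTION_BUDGET_S = 1.5  # example value assumed by the safety case

def check_reaction_margins(events: List[TakeoverEvent]) -> None:
    reaction_times = [e.takeover_time_s - e.alert_time_s for e in events]
    violations = [t for t in reaction_times if t > ASSUMED_REACTION_BUDGET_S]
    print(f"{len(events)} takeover events; "
          f"{len(violations)} exceeded the {ASSUMED_REACTION_BUDGET_S} s budget")
    if violations:
        # A systemic violation means the safety case assumption is not holding
        # in the field -- fix the process, don't just swap out drivers.
        print("Safety case assumption violated; feed back into the safety process.")

# Example with made-up data (the second event takes 2.2 s, exceeding the budget):
check_reaction_margins([
    TakeoverEvent(alert_time_s=10.0, takeover_time_s=10.9),
    TakeoverEvent(alert_time_s=42.0, takeover_time_s=44.2),
])

The exact mechanism matters far less than the principle: field data closes the loop on whether the safety plan's assumptions actually hold during operation.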

Second, although they say it's important for the takeover mechanism to work, they don't specifically require designing it according to a suitable functional safety standard.  Again, for a safety person this should be obvious, and quite possibly it was so obvious to the authors of this document that they didn't bother mentioning it.  But again it's worth spelling out.

To be clear, it's important that any on-road testing of AV technology should be no more dangerous than the normal operation of a human-driven non-autonomous vehicle. That's the whole purpose of having a safety driver!  But getting safety drivers to be that good in practice can be a challenge. However, rather than succumb to pessimism about whether testing can actually be safe, I say let the AV developers prove that they can handle this challenge with a transparent, public safety argument.  (See also my previous posting on safe AV testing for a high level take on things.)

The UK testing safety document is well worth considering by any governmental agency or AV company contemplating how on-road testing of AV technology should be done.



Below are some more detailed notes. The bullets are from the source document, with some informal comments after each bullet:
  • 1.3: ensure that on-road testing "is carried out with the minimum practicable risk"
This appears to be invoking the UK legal concepts of ALARP ("As Low As Reasonably Practicable") and SFAIRP ("So Far As Is Reasonably Practicable").  These are technical concepts, not intuitive ones.  You can't simply say "this ought to be OK because I think it's OK." Rather, you need to demonstrate via a rigorous engineering process that you've done everything reasonably practicable to reduce risk. 
  • 3.4 Testing organisations should:
    • ... Conduct risk analysis of any proposed tests and have appropriate risk management strategies.
    • Be conscious of the effect of the use of such test vehicles on other road users and plan trials to manage the risk of adverse impacts. 
It's not OK to start driving around without having done some work to understand and mitigate risks.
  • 4.16 Testing organisations should develop robust procedures to ensure that test drivers and operators are sufficiently alert to perform their role and do not suffer fatigue. This could include setting limits for the amount of time that test drivers or operators perform such a role per day and the maximum duration of any one test period. 
The test drivers have to stay alert.  Simply setting the limits isn't enough. You have to actually make sure the limits are followed, that there isn't undue pressure for drivers to skip breaks, and in the end you have to make sure that drivers are actually alert.  Solving alertness issues by firing sleepy drivers doesn't fix any systemic problem with alertness -- it just gives you fresh drivers who will have just as much trouble staying alert as the drivers you just fired.
  • 4.20 Test drivers and operators should be conscious of their appearance to other road users, for example continuing to maintain gaze directions appropriate for normal driving. 
This appears to address the problem of other road users interacting with an AV. The theory seems to be that if for example the test driver makes eye contact with a pedestrian at a crosswalk, that means that even if the vehicle makes a mistake the test driver will intervene to give the pedestrian right of way. This seems like a sensible requirement, and could help the safety driver remain engaged with the driving task.
  • 5.3 Organisations wishing to test automated vehicles on public roads or in other public places will need to ensure that the vehicles have successfully completed in-house testing on closed roads or test tracks.
  • 5.4 Organisations should determine, as part of their risk management procedures, when sufficient in-house testing has been completed to have confidence that public road testing can proceed without creating additional risk to road users. Testing organisations should maintain an audit trail of such evidence.
You should not be doing initial development on public roads. You should be using extensive analysis and simulation to be pretty sure everything is going to work before you ever get near a public road.  On-road testing should be to check that things are OK and there are no surprises. (Moreover, surprises should be fed back to development to avoid similar surprises in the future.)  You should have written records that you're doing the right amount of validation before you ever operate on public roads. (emphasis added)
  • 5.5 Vehicle sensor and control systems should be sufficiently developed to be capable of appropriately responding to all types of road user which may typically be encountered during the test in question. This includes more vulnerable road users for example disabled people, those with visual or hearing impairments, pedestrians, cyclists, motorcyclists, children and horse-riders. 
Part of your development should include making sure the system can deal with at-risk road users.  This means there should be a minimal chance that a pedestrian or other at-risk road user will be put into danger by the AV even without safety driver intervention.  (The safety driver should be handling unexpected surprises, and not be relied upon as a primary control mechanism during road testing.)
  • 5.8 This data should be able to be used to determine who or what was controlling the vehicle at the time of an incident. The data should be securely stored and should be provided to the relevant authorities upon request. It is expected that testing organisations will cooperate fully with the relevant authorities in the event of an investigation
With regard to data recording, there should be no debate over whether the autonomy was in control at the time of the mishap.  (How can it possibly be that a developer says "we're not sure if the autonomy was in control at the time of the mishap"? Yet I've heard this on the news more than once.)  It's also important to be transparent about the role of autonomy at times just before any mishap.  For example, if autonomy disengages a fraction of a second before impact, it's unreasonable to just blame the human driver without a more thorough investigation. (A minimal sketch of what this kind of control-authority logging could look like appears after this list of notes.)
  • 5.18 Ensuring that the transition periods between manual and automated mode involve minimal risk will be an important part of the vehicle development process and one which would be expected to be developed and proven during private track testing prior to testing on public roads or other public places. 
It's really important that manual takeover by a safety driver actually works. As mentioned above, the takeover system should be designed to a suitable level of safety (e.g., according to ISO 26262).
  • 5.21 ... All software and revisions have been subjected to extensive and well documented testing. This should typically start with bench testing and simulation, before moving to testing on a closed test track or private road. Only then should tests be conducted on public roads or other public places. 
Again, testing should be used to confirm that the design is right, not as an iterative drive-fix-drive approach to gradually beating the system into proper operation via brute force road testing.
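
Regarding the data recording point (5.8) above, here is a minimal illustrative sketch of control-authority logging. The field names are invented, and this is not a statement of what any regulation or event data recorder standard actually requires; the point is simply that every transition between autonomy and human control is timestamped and retained, so "who was driving at the time" is answerable from data rather than debate:

# Minimal illustrative sketch (invented fields) of control-authority logging.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ControlTransition:
    timestamp_s: float
    new_mode: str   # e.g., "AUTONOMY" or "HUMAN"
    reason: str     # e.g., "planned engagement", "autonomy disengaged", "driver override"

@dataclass
class ControlAuthorityLog:
    initial_mode: str = "HUMAN"
    transitions: List[ControlTransition] = field(default_factory=list)

    def record(self, timestamp_s: float, new_mode: str, reason: str) -> None:
        self.transitions.append(ControlTransition(timestamp_s, new_mode, reason))

    def mode_at(self, t: float) -> str:
        """Which mode was in control at time t (assumes transitions appended in order)."""
        mode = self.initial_mode
        for tr in self.transitions:
            if tr.timestamp_s <= t:
                mode = tr.new_mode
            else:
                break
        return mode

# Example: autonomy disengages 0.3 s before a mishap at t = 100.0 s.
log = ControlAuthorityLog()
log.record(10.0, "AUTONOMY", "planned engagement")
log.record(99.7, "HUMAN", "autonomy disengaged")
print(log.mode_at(100.0))   # "HUMAN" -- but the disengagement just beforehand matters too

Note how the example also captures the "disengaged a fraction of a second before impact" situation: the raw record shows a human nominally in control at impact, which is exactly why the transitions just before a mishap need to be part of any honest investigation.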

These comments are based on a preliminary reading of the document. I might change my thoughts on this document over time.