Putting image manipulations in context: robustness testing for safe perception

UPDATE 8/17 -- added presentation slides!

I'm very pleased to share a publication from our NREC autonomy validation team that explains how computationally cheap image perturbations and degradations can expose catastrophic brittleness in perception systems. You don't need adversarial attacks to foil machine learning-based perception -- straightforward image degradations such as blur or haze can cause problems too.

Our paper "Putting image manipulations in context: robustness testing for safe perception" will be presented at IEEE SSRR August 6-8.  Here's a submission preprint:

https://users.ece.cmu.edu/~koopman/pubs/pezzementi18_perception_robustness_testing.pdf

Abstract—We introduce a method to evaluate the robustness of perception systems to the wide variety of conditions that a deployed system will encounter. Using person detection as a sample safety-critical application, we evaluate the robustness of several state-of-the-art perception systems to a variety of common image perturbations and degradations. We introduce two novel image perturbations that use “contextual information” (in the form of stereo image data) to perform more physically-realistic simulation of haze and defocus effects. For both standard and contextual mutations, we show cases where performance drops catastrophically in response to barely perceptible changes. We also show how robustness to contextual mutators can be predicted without the associated contextual information in some cases.
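The contextual mutations in the paper use per-pixel depth (recovered from stereo) to drive physically-motivated degradations. As a rough illustration of the general idea only -- not the paper's actual implementation -- here is a minimal Python sketch of a depth-dependent haze model (standard atmospheric scattering) and a crude depth-dependent defocus. All parameter names and values are illustrative assumptions, and it assumes NumPy and OpenCV are available.

    # Minimal sketch (NOT the paper's implementation) of depth-aware image mutations.
    # Assumes a per-pixel depth map in meters is available (e.g., from stereo).
    import numpy as np
    import cv2

    def hazify(image, depth_m, beta=0.05, airlight=0.9):
        """Simple atmospheric scattering model: I = J*t + A*(1 - t),
        with per-pixel transmission t = exp(-beta * depth)."""
        img = image.astype(np.float32) / 255.0
        t = np.exp(-beta * depth_m)[..., None]         # transmission, shape (H, W, 1)
        hazy = img * t + airlight * (1.0 - t)
        return (np.clip(hazy, 0.0, 1.0) * 255).astype(np.uint8)

    def defocus(image, depth_m, focus_m=10.0, gain=0.5, max_kernel=15):
        """Crude depth-dependent defocus: blur strength grows with distance
        from the focal plane. Real defocus modeling is more involved."""
        out = image.copy()
        blur_px = np.clip(gain * np.abs(depth_m - focus_m), 0, max_kernel - 1)
        for k in range(3, max_kernel + 1, 2):           # odd Gaussian kernel sizes
            mask = (blur_px >= k - 2) & (blur_px < k)
            if mask.any():
                blurred = cv2.GaussianBlur(image, (k, k), 0)
                out[mask] = blurred[mask]
        return out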

Fig. 6: Examples of images that show the largest change in detection performance for MS-CNN under moderate blur and haze. For all of them, the rate of FPs per image required to detect the person increases by three to five orders of magnitude. In each image, the green box shows the labeled location of the person. The blue and red boxes are the detections produced by the SUT before and after mutation, respectively, and the white-on-blue text is the strength of that detection (ranging from 0 to 1). Finally, the value in white-on-yellow text shows the average FP rate per image that a sensitivity threshold set at that value would yield, i.e., the FP rate required to still detect the person.
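The "required FP rate" in that caption is essentially: if the sensitivity threshold were lowered just enough to accept the (now weakened) true detection, how many false positives per image would come along with it? A minimal sketch of that computation, with made-up variable names rather than the paper's evaluation code:

    import numpy as np

    def required_fp_rate(person_score, fp_scores, num_images):
        """False positives per image admitted if the sensitivity threshold is
        lowered to person_score so the person is still detected. fp_scores are
        the confidences of all false detections across the evaluation set."""
        admitted = np.sum(np.asarray(fp_scores) >= person_score)
        return admitted / float(num_images)

A detection whose confidence drops from, say, 0.9 to 0.03 after mutation can require accepting orders of magnitude more false positives per image to recover it, which is the effect Fig. 6 illustrates.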




Alternate slide download link: https://users.ece.cmu.edu/~koopman/pubs/pezzementi18_perception_robustness_testing_slides.pdf

Citation:
Pezzementi, Z., Tabor, T., Yim, S., Chang, J., Drozd, B., Guttendorf, D., Wagner, M., & Koopman, P., "Putting image manipulations in context: robustness testing for safe perception," IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Aug. 2018.

Pennsylvania's Autonomous Vehicle Testing Guidelines

PennDOT has just issued new Automated Vehicle Testing Guidance:
       July 2018 PennDOT AV Testing Guidance (link to acrobat document)
(also, there is a press release.)


It's only been a short three months since the PA AV Summit, at which PennDOT took up the challenge of improving AV testing policy. Today PennDOT released a significantly revised policy, as promised. And it looks like they've been listening to safety advocates as well as AV companies.

At a high level, there is a lot to like about this policy. It makes it clear that a written safety plan is required, and it suggests addressing, one way or another, the big three items I've proposed for AV testing safety:
  • Make sure that the driver is paying attention
  • Make sure that the driver is capable of safing the vehicle in time when something goes wrong
  • Make sure that the Big Red Button (disengagement mechanism) is actually safe

There are a number of items in the guidance that look like a good idea. Here is a partial list of the ones that catch my eye as being on the right track (many other ideas in the document are also good):

Good Ideas:
  • Submission of a written safety plan
  • Must have a safety driver in the driver seat who is able to take immediate physical control as required
  • Two safety drivers required above 25 mph, to ensure that the drivers are able to tend to both the safety driving and the testing tasks
  • Validation "under controlled conditions" before on-road testing
  • Disengagement technology complies with industry standards
  • Safety driver training is mandatory, and has a nice list of required topics
  • Data recording for post-mishap analysis
  • Mitigate cybersecurity risk
  • Quality controls to ensure that major items are "adhered to and measured to ensure safe operation"

There are also some ideas that might or might not work out well in practice. I'm not sure how these will play out, and in some cases they seem to be compromises:

Not Sure About These:
  • Only one safety driver required below 25 mph. It's true that low-speed pedestrian collisions are less lethal, and there can be more time to react, so the risk is somewhat lower. But time will tell whether solo drivers are able to stay sufficiently alert to avoid mishaps, even at lower speeds.
  • It's not explicit about the issue of ensuring that there is enough time for a safety driver to intervene when something goes wrong. It's implicit in the parts about a driver being able to safe the vehicle. It's possible that this was considered a technical issue for developers rather than regulators, but in my mind it is a primary concern that can easily be overlooked in a safety plan. This topic should be more explicitly called out in the safety plan.
  • The data reporting beyond crashes is mostly just tracking drivers, vehicles, and how much testing they are doing.  I'd like to see more reporting regarding how well they are adhering to their own safety plan. It's one thing to say things look good via hand waving and "trust us, we're smart." It's another to report metrics such as how often drivers drop out during testing and what corrective actions are taken in response to such data. (The rate won't be a perfect zero; continual improvement should be the goal, as well as mishap rates no worse than conventional vehicles during testing.) I realize picking metrics can be a problem -- so just let each company decide for themselves what they want to report. The requirement should be to show evidence that safety is actually being achieved during testing. To be fair, there is a bullet in the document requiring quality controls. I'd like that bullet to have more explicit teeth to get the job done.
  • The nicely outlined PennDOT safety plan can be avoided by instead submitting something following the 2017 NHTSA AV Guidance. That guidance is a lot weaker than the 2016 NHTSA AV Guidance was. Waymo and GM have already created such public safety disclosures, and others are likely coming. However, it is difficult for a reader to know if AV vendors are just saying a lot of buzzwords or are actually doing the right things to be safe. Ultimately I'm not comfortable with "trust us, we're safe" with no supporting evidence. While some disclosure is better than no disclosure, the public deserves better than NHTSA's rather low bar in safety plan transparency, which was not intended to deal specifically with on-road testing. We'll have to see how this alternative option plays out, and what transparency the AV testers voluntarily provide. Maybe the new 2018 NHTSA AV Guidance due later this summer will raise the bar again.

Having said mostly nice things, I see a few areas that really need improvement in a future revision. I realize they didn't have time to solve everything in three months, and it's good to see the progress they made. But I hope these areas are on the list for the next iteration:

Not A Fan:
  • Only one safety driver above 25 mph after undergoing "enhanced driver safety training." It's unclear what this training might really be, or if more training will really result in drivers that can do solo testing safely. I'd like to see something more substantive demonstrating that solo drivers will actually be safe in practice. Training only goes so far, and no amount of hiring only experienced drivers will eliminate the fact that humans have trouble staying engaged when supervising autonomy for long stretches of time. I'm concerned this will end up being a loophole that puts solo drivers in an untenable safety role.
  • No independent auditing. This is a big one, and worth discussing at length.

The biggest issue I see is that there is no requirement for independent auditing of safety. I can understand why it might be difficult to get testers on board with such a requirement, especially a requirement for third-party auditing. The AV business is shrouded in secrecy. Nobody wants PennDOT or anyone else poking around in their business. But every other safety-critical domain is based on an approach of transparent, independent safety assessment.

A key here is that independent auditing does NOT have to include public release of information. The "secret sauce" doesn't even have to be revealed to auditors, so long as the system is safe regardless of what's in the fancy autonomy parts of the system. Other industries have established models for keeping trade secrets secret while still providing independent oversight of safety. There's no reason AVs should be any different. After all, we're all being put at risk by AV testing when we share public roads with them, even as pedestrians. AV testing ought to have transparent, independent safety oversight.

Overall, I think this guidance is excellent progress from PennDOT that puts us ahead of most, if not all, locations in the US regarding AV safety testing. I hope that AV testers take this and my points above to heart, and get ahead of the safety testing problem.

Road Sign Databases and Safety Critical Data Integrity

It's common for autonomous vehicles to use road map data, sign data, and so on for their operation. But what if that data has a problem?

Consider that while some data is being mapped by the vehicle manufacturers, they might be relying upon other data as well. For example, some companies are encouraging cities to build a database of local road signs (https://www.wired.com/story/inrix-road-rules-self-driving-cars?mbid=nl_071718_daily_list3_p4&CNDID=23351989).

It's important to understand the integrity of the data. What if a stop sign is missing from the database, and the vehicle, unsure whether a stop sign it sees in the real world is valid, decides to believe the database? (Perhaps the real-world stop sign is hard to see due to sun glare, and the vehicle just goes with the database.) If the vehicle blows through a stop sign because it's missing from the database, whose fault is that? And what happens next?

Hopefully such databases will be highly accurate, but anyone who has worked with any non-trivial database knows there is always some problem somewhere. In fact, there have been numerous accidents and even deaths due to incorrect or corrupted data over the years.

Avoiding "death by road sign database" requires managing the safety critical integrity of the road sign data (and map data in general).  If your system uses it for guidance but assumes it is defective with comparatively high probability, then maybe you're fine. But as soon as you trust it to make a safety-relevant decision, you need to think about how much you can trust it and what measures are in place to ensure it is not only accurately captured, but also dependably maintained, updated, and delivered to consumers.

Fortunately, you don't need to start from scratch. The Safety-Critical Systems Club has been working on this problem for a while, and recently issued version 3 of their guidelines for safety-critical data. You can get it for free as a download here: https://scsc.uk/scsc-127c

The document includes a broad range of information, guidance, and a worked example. It also describes quite a number of data integrity issues in Appendix H that are worth looking at if you need some war stories about what happens when you get data integrity wrong. Highly recommended.


https://scsc.uk/r127C:2



Latest version as of May 2021:
https://scsc.uk/scsc-127F

A Safe Way to Apply FMVSS Principles to Self-Driving Cars

As the self-driving car industry works to create safer vehicles, it is facing a significant regulatory challenge.  Complying with existing Federal Motor Vehicle Safety Standards (FMVSS) can be difficult or impossible for advanced designs. For conventional vehicles the FMVSS structure helps ensure a basic level of safety by testing some key safety capabilities. However, it might be impossible to run these tests on advanced self-driving cars that lack a brake pedal, steering wheel, or other components required by test procedures.

While there is industry pressure to waive some FMVSS requirements in the name of hastening progress, doing so is likely to result in safety problems. I’ll explain a way out of this dilemma based on the established technique of using safety cases. In brief, auto makers should create an evidence-based explanation as to why they achieve the intended safety goals of current FMVSS regulations even if they can’t perform the tests as written. This does not require disclosure of proprietary autonomous vehicle technology, and does not require waiting for the government to design new safety test procedures.

Why the Current FMVSS Structure Must Change

Consider an example of FMVSS 138, which relates to tire pressure monitoring. At some point many readers have seen a tire pressure telltale light, warning of low tire pressure:

FMVSS 138 Low Tire Pressure Telltale

This light exists because of FMVSS, which specifies tests to make sure that a driver-visible telltale light turns on for under-inflation and blow-out conditions with specified road surface conditions, vehicle speed, and so on.

But what if an unmanned vehicle doesn’t have a driver's seat?  Or even a dashboard for mounting the telltale? Should we wait years for the government to develop an alternate self-driving car FMVSS series? Or should we simply waive FMVSS compliance when the tests don’t make sense as written?

Simplistic, blanket waivers are a bad idea. It is said that safety standards such as FMVSS are written in the blood of past victims. Self-driving cars are supposed to improve safety. We shouldn’t grant FMVSS waivers that will result in having more blood spilled to re-learn well understood lessons for self-driving cars.

The weakness of the FMVSS approach is that the tests don’t explicitly capture the “why” of the safety standard. Rather, there is a very prescriptive set of rules, operating in a manner similar to building codes for houses. Like building codes, they can take time to update when new technology appears. But just as it is a bad idea to skip a building inspection on your new house, you shouldn’t let vehicle makers skip FMVSS tests for your new car – self-driving or otherwise. Despite the fear of hindering progress, something must be done to adapt the FMVSS framework to self-driving cars.

A Safety Case Approach to FMVSS

A way to permit rapid progress while still ensuring that we don’t lose ground on basic vehicle safety is to adopt a safety case approach. A safety case is a written explanation of why a system is appropriately safe. Safety cases include: a safety goal, a strategy for meeting the goal, and evidence that the strategy actually works.

To create an FMVSS 138 safety case, a self-driving car maker would first need to identify the safety goals behind that standard. A number of public documents that precede FMVSS 138 state safety goals of detecting low tire pressure and avoiding blowouts. Those goals were, in turn, motivated by dozens of deaths resulting from tire blowouts that provoked the 2000 TREAD Act.

The next step is for the vehicle maker to propose a safety strategy compatible with its product. For example, vehicle software might set internal speed and distance limits in response to a tire failure, or simply pull off the road to await service. The safety case would also propose tests to provide concrete evidence that the safety strategy is effective. For example, instead of demonstrating that a telltale light illuminates, the test might instead show that the vehicle pulls to the side of the road within a certain timeframe when low tire pressure is detected. There is considerable flexibility in safety strategy and evidence so long as the safety goal is adequately met.
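To make the "evidence" part concrete: the kind of test a safety case might propose can often be checked automatically against vehicle logs or simulation runs. Here is a purely illustrative Python sketch, with a hypothetical log format and made-up limits; it is not an actual FMVSS procedure.

    def tire_failure_response_ok(log, max_response_s=30.0, stop_speed_mps=0.5):
        """Check a hypothetical FMVSS 138 safety-case claim: after low tire
        pressure is detected, the vehicle reaches a safe stop on the shoulder
        within a time limit. `log` is a time-ordered list of dicts such as
        {"t": 12.0, "event": "low_tire_pressure", "speed_mps": 20.0,
         "on_shoulder": False}."""
        detect_t = next(
            (e["t"] for e in log if e.get("event") == "low_tire_pressure"), None)
        if detect_t is None:
            return False   # the hazard was never detected at all
        for e in log:
            if (e["t"] >= detect_t and e.get("on_shoulder")
                    and e["speed_mps"] <= stop_speed_mps):
                return e["t"] - detect_t <= max_response_s
        return False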

Regulators will need a process for documenting the safety case for each requested FMVSS deviation. They must decide whether they should evaluate safety cases up front or employ less direct feedback approaches such as post-mishap litigation. Regardless of approach, the safety cases can be made public, because they will describe a way to test vehicles for basic safety, and not the inner workings of highly proprietary autonomy algorithms.

Implementing this approach only requires vehicle makers to do extra work for FMVSS deviations that provide their products with a competitive advantage. Over time, it is likely that a set of standardized industry approaches for typical vehicle designs will emerge, reducing the effort involved. And if an FMVSS requirement is truly irrelevant, a safety case can explain why.

While there is much more to self-driving car safety than FMVSS compliance, we should not be moving backward by abandoning accepted vehicle safety requirements. Instead, a safety case approach will enable self-driving car makers to innovate as rapidly as they like, with a pay-as-you-go burden to justify why their alternative approaches to providing existing safety capabilities are adequate.

Author info: Prof. Koopman has been helping government, commercial, and academic self-driving developers improve safety for 20 years.
Contact: koopman@cmu.edu

Originally published in The Hill 6/30/2018:
http://thehill.com/opinion/technology/394945-how-to-keep-self-driving-cars-safe-when-no-one-is-watching-for-dashboard

AVS 2018 Panel Session

It was great to have the opportunity to participate in a panel on autonomous vehicle validation and safety at AVS in San Francisco this past week.  Thanks especially to Steve Shladover for organizing such an excellent forum for discussion.

The discussion was the super-brief version. If you want to dig deeper, you can find much more complete slide decks attached to other blog posts.
The first question asked each panelist to spend 5 minutes talking about the types of things we do for validation and safety. Here are my slides from that very brief opening statement.



Robustness Testing of Autonomy Software (ICSE 2018)

Our Robustness Testing team at CMU/NREC presented a great paper at ICSE on what we learned over five years of the Automated Stress Testing for Autonomy Systems (ASTAA) project, which spanned 11 projects and found 150 significant bugs.




The team members contributing to the paper were:
Casidhe Hutchison, Milda Zizyte, Patrick E. Lanigan, David Guttendorf, Michael Wagner, Claire Le Goues, and Philip Koopman.

Special thanks to Cas for doing the heavy lifting on the paper, and to Milda for the conference presentation.