Planning Metrics (Metrics Episode 8)

Planning metrics should cover whether the planned paths are actually safe. But just as important is whether plans work across the full ODD, and whether the system accounts for safety when external events push it outside the ODD.

Planning metrics deal with how effectively a vehicle can plan a path through the environment, around obstacles and other actors. Often, planning metrics are tied to the scenarios and actors a vehicle might encounter, and to the behaviors, maneuvers, and other considerations involved in dealing with them. In another segment, I’ll talk about how the system builds a model of the external world. For now, let’s assume that the self-driving car knows exactly where all the objects are and what their predicted trajectories and behaviors are. The objective is typically to make progress in navigating through the scenario without hitting things.

Some path planning metrics are likely to be tied closely to the motion safety metrics. A self-driving car that creates a path plan that involves a collision with a pedestrian clearly has an issue, but in practice things come in shades of gray. Typically, it’s not okay to just barely miss something. Rather, you want to leave a sufficient time buffer, space buffer, or some combination around objects and obstacles to provide a safety margin. You need to do better than just not hitting things; you want to give everything else in the environment an appropriate amount of leeway. From a planning point of view, this metric would cover how often and how severely object buffers are violated. It ties in with motion planning metrics, but instead of just asking, “What’s the worst case to avoid a collision?” you have to add in some sort of buffer as well.
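
As a rough illustration, a buffer-violation metric along a planned path might be computed something like the sketch below. The data layout, field names, and the idea of summarizing violations as a count plus a worst-case shortfall are illustrative assumptions, not any particular project’s metric.

```python
from dataclasses import dataclass
import math

@dataclass
class BufferViolation:
    time_step: int
    distance_m: float      # closest approach at this time step
    shortfall_m: float     # how far inside the required buffer we were

def buffer_violations(ego_path, object_paths, required_buffer_m):
    """Count and size buffer violations along a planned path.

    ego_path: list of (x, y) ego positions, one per planning time step.
    object_paths: list of lists of (x, y) predicted positions for each
        object, assumed already time-aligned with ego_path.
    required_buffer_m: minimum clearance the plan is supposed to keep.
    """
    violations = []
    for t, (ex, ey) in enumerate(ego_path):
        for obj in object_paths:
            ox, oy = obj[t]
            d = math.hypot(ex - ox, ey - oy)
            if d < required_buffer_m:
                violations.append(BufferViolation(t, d, required_buffer_m - d))
    # Two summary metrics: how often buffers are violated, and how severely.
    worst_shortfall = max((v.shortfall_m for v in violations), default=0.0)
    return len(violations), worst_shortfall, violations
```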

For safety, it’s important to differentiate between safety boundaries and continuous performance metrics. Here’s an example. Let’s say you have a one-meter hard threshold from bicycles for a certain urban setting at a certain travel speed, and your vehicle leaves 1.1 meters to bicyclists. Well, that’s great; a little bit further than one meter sounds safe. Is two meters better? All things being equal, leaving a bicycle a little more room is probably also a good idea. But on that metric, it’s tempting to say that 0.9 meters is only about 10% worse than one meter, when in fact it’s not. With a one-meter hard threshold, safe is one meter or better. It kind of doesn’t matter how much better; as long as you’re at least one meter away, you’re by definition safe. 0.99 meters is unsafe, because you’ve violated a hard threshold.

There’s a potentially big difference between safety thresholds and general performance indications. For general background risk, sure, leaving a little more room is a good thing. But as soon as you cross a hard safety threshold, that changes it from a little bit worse to a safety violation that requires some sort of response to fix the system’s behavior.

Now, for those who are thinking, “Well, it doesn’t always have to be one meter,” that’s right. What I assumed in this example was that for the particular circumstances, it was determined that one meter was the hard deck; you couldn’t go any closer. It might well be the case that at slower speeds it’s closer and at higher speeds it’s further away. The point here is that in some cases you will have hard cutoffs of safety that are never supposed to be violated. Those are fundamentally different from continuous metrics where a little bit further or a little bit closer is a little bit better or a little bit worse.
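
To make the distinction concrete, here’s a minimal sketch of how a hard threshold could be treated differently from a continuous performance score, including a speed-dependent hard deck. The numbers and function names are made up for the example; a real system would derive its thresholds from its safety case.

```python
def hard_deck_m(speed_mps: float) -> float:
    """Illustrative speed-dependent hard threshold for bicycle clearance.
    (Made-up numbers for the example only.)"""
    return 1.0 if speed_mps < 10.0 else 1.5

def evaluate_clearance(clearance_m: float, speed_mps: float) -> dict:
    deck = hard_deck_m(speed_mps)
    if clearance_m < deck:
        # Crossing the hard threshold is a safety violation, full stop.
        # It is not "10% worse" -- it requires a response to fix behavior.
        return {"safe": False, "violation_m": deck - clearance_m}
    # Above the hard deck, extra clearance is a continuous "nice to have"
    # performance indication, not a safety pass/fail result.
    return {"safe": True, "margin_m": clearance_m - deck}
```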

Other path planning metrics are likely to be based more on coverage. Some self-driving car projects use the concept of a scenario, which is a specific combination of environment, objects, and own-vehicle intended behavior over a relatively short time period. For example, a fairly generic scenario might be making an unprotected left turn in rush hour traffic on a sunny day. The idea is that you come up with a large set of scenarios to cover all the possibilities in your operational design domain, or ODD. If you do that and you can validate that each scenario has been built properly, then you can claim you’ve covered the whole ODD. In practice, development teams tend to build scenario catalogs with varying levels of abstraction, from high level, to parameterized, to very concrete scenarios with specific parameter values that can be fed into a simulator or executed on a track. Then they test the concrete examples to see whether they violate motion safety metrics or whether other bad things happen.
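
As a rough sketch of what that abstraction hierarchy can look like in practice, here is an illustrative parameterized scenario being expanded into concrete test cases. The field names and parameter values are assumptions for the sake of the example, not a real scenario format.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class ConcreteScenario:
    """A single fully specified test case that can be run in simulation."""
    maneuver: str
    weather: str
    traffic_density: str
    oncoming_gap_s: float   # gap accepted for the unprotected left turn

# A parameterized scenario: the generic "unprotected left turn" with
# ranges of parameters that still need concrete values chosen.
PARAMETERIZED_LEFT_TURN = {
    "maneuver": ["unprotected_left_turn"],
    "weather": ["sunny", "rain"],
    "traffic_density": ["light", "rush_hour"],
    "oncoming_gap_s": [2.0, 3.5, 5.0],
}

def expand(parameterized) -> list[ConcreteScenario]:
    """Expand a parameterized scenario into concrete test cases."""
    keys = list(parameterized)
    return [ConcreteScenario(**dict(zip(keys, values)))
            for values in product(*(parameterized[k] for k in keys))]

concrete_cases = expand(PARAMETERIZED_LEFT_TURN)  # 1*2*2*3 = 12 test cases
```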

There are several different ways to look at coverage of a catalog of scenarios. One is how well the high-level and parameterized scenarios cover the ODD. In principle, you would like the scenario catalog to cover all possibilities across all dimensions of the ODD. Now, ODDs are pretty complicated, which we’ll discuss another time, but coverage includes at least weather, types of actors, and road geometries, and there are probably many other considerations. It’s going to be a big catalog: thousands, tens of thousands, maybe more scenarios to really cover a complicated ODD.

A different take on the same topic is how well the concrete scenarios, which are your test cases, actually sample the ODD. Sure, you have these high-level and parameterized scenarios that are supposed to cover everything, but at some point you have to actually run tests on a specific set of geometries, behaviors, and actors. If you don’t sample that properly, there will be corners of the ODD that you didn’t exercise. There may be edge cases where there’s some boundary between two things and you didn’t test at the boundary, even though in principle your more generic scenarios sweep across the entire ODD. When you’re sampling the ODD via these concrete scenarios, you want to make sure you cover both frequent scenarios, which is probably pretty obvious, as well as infrequent but very severe, very high-consequence scenarios that have to be gotten right even though they may not happen often.
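
Continuing the illustrative scenario format from the sketch above, here is a minimal example of checking whether the concrete test cases actually sample each ODD dimension, so untested corners show up explicitly. The dimension names and values are, again, assumptions for the sake of the example.

```python
# Minimal sketch of ODD sampling coverage, using the illustrative
# ConcreteScenario fields from the earlier example.
ODD_DIMENSIONS = {
    "weather": {"sunny", "rain", "snow", "fog"},
    "traffic_density": {"light", "moderate", "rush_hour"},
}

def coverage_report(concrete_cases):
    """For each ODD dimension, report which values the concrete test
    cases exercise and which values were never sampled."""
    report = {}
    for dim, required_values in ODD_DIMENSIONS.items():
        sampled = {getattr(case, dim) for case in concrete_cases}
        report[dim] = {
            "covered": sorted(sampled & required_values),
            "missing": sorted(required_values - sampled),  # untested corners
        }
    return report

report = coverage_report(concrete_cases)
# e.g. report["weather"] -> {'covered': ['rain', 'sunny'], 'missing': ['fog', 'snow']}
```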

For scenarios in which there’s a specific response intended, another metric can be how well the system follows its script. For example, if you’ve designed a self-driving car to always leave two meters of clearance to bicycles, even though one meter is the hard deck for safety, and it’s consistently leaving 1.5 meters, that’s not an immediate safety violation, but there’s something wrong, because it was supposed to leave two meters and it’s consistently leaving 1.5. Something isn’t quite right. The issue is that this might be indicative of a deeper problem that could impact safety at some other time.
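
Here’s a small sketch of what a “follows its script” check might look like, using the illustrative two-meter design target and one-meter hard deck from this example. It separates the safety question from the design-intent question; the function and field names are assumptions.

```python
def adherence_check(observed_clearances_m, design_target_m=2.0, hard_deck_m=1.0):
    """Flag behavior that is safe but not following the design intent.

    The 2.0 m design target and 1.0 m hard deck are the illustrative
    numbers from the example above, not real requirements.
    """
    hard_violations = [c for c in observed_clearances_m if c < hard_deck_m]
    # Rough median as a "typical" clearance for this batch of observations.
    typical = sorted(observed_clearances_m)[len(observed_clearances_m) // 2]
    return {
        "hard_deck_violations": len(hard_violations),        # safety problem
        "meets_design_intent": typical >= design_target_m,   # engineering problem if False
        "typical_clearance_m": typical,
    }

# Consistently 1.5 m: no safety violation, but the design intent is not met.
print(adherence_check([1.5, 1.6, 1.4, 1.5, 1.55]))
```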

Another metric has to do with how well the system deals with scenarios getting more complex: more objects, unpredictable actors, incorrect sensor information that may be presented, minimizing the severity of unavoidable crashes when it’s been put in a no-win situation, and so on. In general, one of the metrics is how well the system reacts to stress.

Another metric that can be useful is the brittleness of the system when encountering novel concrete examples that aren’t used in system training or might be outside the ODD. Remember that even though the system is designed to operate inside the ODD, in the real world, every once in a while, something weird will happen that is outside the ODD. The system may not have to remain operable, but it should remain safe even if that means invoking some sort of safety shutdown procedure. That means it has to know that something weird has happened, even if that something is not part of its designed ODD.
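
As a very rough sketch, an ODD-exit check might look something like the code below, where the point is that detecting “something weird” should lead to a safe response even if the system can’t keep operating. The interface and envelope representation are purely illustrative assumptions.

```python
def check_odd_and_respond(observed_conditions, odd_envelope, request_safe_stop):
    """Sketch of an ODD-exit check: if any observed condition falls outside
    the designed envelope, stay safe rather than trying to stay operational.
    Names and structure are illustrative, not a real interface.
    """
    for condition, value in observed_conditions.items():
        low, high = odd_envelope.get(condition, (float("-inf"), float("inf")))
        if not (low <= value <= high):
            # Something weird happened outside the ODD: remaining operable is
            # optional, but remaining safe is not.
            request_safe_stop(reason=f"{condition}={value} outside ODD")
            return False
    return True
```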

Summing up, planning metrics should include at least two categories. First is whether the planned paths are actually safe. Second is how well the path planner design covers the full scope of the intended operational design domain, as well as what happens when you exit the ODD. All of this depends on the system having an accurate model of the world it’s operating in, which we’ll cover in another segment.

For the podcast version of this posting, see: https://archive.org/details/metrics-09-planning-metrics

Thanks to podcast producer Jackie Erickson.


