ActorNightly parent
> Karpathy is consistently one of the clearest thinkers out there.

Eh, he ran Tesla's self-driving division and took it in a direction that is never going to fully work.

What they should have done is a) trained a neural net to map sequences of frames into a representation of the physical environment, and b) leveraged MuZero, so that the self-driving system builds out parallel simulations into the future and searches for the best course of action to take.

Because that's pretty much what makes humans great drivers. We don't need to know what a cone is - we internally compute that an object on the road that we are driving towards will produce a negative outcome when we collide with it.


AlotOfReading
Aren't continuous, stochastic, partial-knowledge environments where you need long-horizon planning with strict deadlines and limited compute exactly the sort of environments MuZero variants struggle with? Because that's driving.

It's also worth mentioning that humans intentionally (and safely) drive into "solid" objects all the time. Bags, steam, shadows, small animals, etc. We also break rules (e.g. drive on the wrong side of the road), and anticipate things we can't even see based on a theory of mind of other agents. Human driving is extremely sophisticated, not reducible to rules that are easily expressed in "simple" language.

ActorNightly OP
I didn't say use MuZero end to end; I said leverage it.

This is how I would do it:

First, you come up with a compressed representation of the state space of the terrain plus the other objects around your car - one that encodes the current state of everything and its predicted evolution ~5 seconds into the future.

The idea is that you leverage physics - objects have to obey the laws of motion - which means you can greatly compress the representation. For example: a mesh grid for static "terrain" other than empty road, lane lines representing the road, and 3D boxes representing moving objects with a certain mass, each with an initial 6-DOF state (xyz position, orientation), initial 6-DOF velocities, and 6-DOF forcing functions parameterized by time that describe how these objects move.

So given this representation, you can write a program that simulates the evolution of the state space from any initial condition, and essentially simulate collisions.
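As a toy sketch of such a simulator (everything here - the Box class, the constant-velocity Euler integration, the axis-aligned box collision test - is invented for illustration, not anyone's actual stack):

```python
from dataclasses import dataclass

@dataclass
class Box:
    """One moving object: 3D position, velocity, and an optional
    forcing function giving acceleration as a function of time."""
    pos: list          # [x, y, z]
    vel: list          # [vx, vy, vz]
    half_extent: list  # box half-sizes, used for the collision test
    accel: object = None  # callable t -> [ax, ay, az], or None

def step(box, t, dt):
    a = box.accel(t) if box.accel else [0.0, 0.0, 0.0]
    box.vel = [v + ai * dt for v, ai in zip(box.vel, a)]
    box.pos = [p + v * dt for p, v in zip(box.pos, box.vel)]

def collides(a, b):
    # Axis-aligned boxes overlap iff they overlap on every axis.
    return all(abs(pa - pb) <= (ea + eb)
               for pa, pb, ea, eb
               in zip(a.pos, b.pos, a.half_extent, b.half_extent))

def simulate(boxes, horizon=5.0, dt=0.1):
    """Roll the state space forward; return the first collision time, or None."""
    t = 0.0
    while t < horizon:
        for box in boxes:
            step(box, t, dt)
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if collides(boxes[i], boxes[j]):
                    return t
        t += dt
    return None

# Ego car at rest, another car closing at 10 m/s from 40 m behind.
ego = Box(pos=[0, 0, 0], vel=[0, 0, 0], half_extent=[2, 1, 1])
other = Box(pos=[-40, 0, 0], vel=[10, 0, 0], half_extent=[2, 1, 1])
print(simulate([ego, other]))  # time of the predicted collision, or None
```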

Then you divide into 3 teams.

1st team trains a model to translate sensor data into this state-space representation, with continuous updates on every cycle, leveraging things like Kalman filtering because correlations between certain quantities lead to better accuracy. Overall you would get something where, e.g., red brake lights lead to deceleration forcing functions.

(If you wanted to get fancy, instead of a single simulation you build out a probability space - i.e., when you run the program, it spits out a heat map of where each object is most likely to end up.)
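The filtering idea from the team-1 step can be sketched with a g-h (alpha-beta) filter, the simplest cousin of a Kalman filter: fuse a constant-velocity prediction with noisy position measurements. The gains and the noise level below are made up, not tuned for any real sensor:

```python
import random

def track(measurements, dt=0.1, g=0.2, h=0.02):
    """Estimate (position, velocity) of a tracked car from noisy
    position readings, blending model prediction and measurement."""
    pos, vel = measurements[0], 0.0
    for z in measurements[1:]:
        pred = pos + vel * dt           # predict under the motion model
        residual = z - pred             # disagreement with the sensor
        pos = pred + g * residual       # blend prediction and measurement
        vel = vel + (h / dt) * residual
    return pos, vel

random.seed(0)
# Noisy range readings from a car actually moving at 10 m/s.
zs = [10.0 * k * 0.1 + random.gauss(0, 0.3) for k in range(100)]
pos, vel = track(zs)
print(vel)  # the velocity estimate settles near the true 10 m/s
```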

2nd team trains a model on real-world traffic to find correlations between the forcing functions of vehicles - e.g., if a car slows down, the cars behind it slow down. You could do this the way Tesla did: equip all your cars with sensors, treat driver inputs as the forcing function, and observe the state-space change given the model from team 1.
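A toy version of that team-2 idea: fit how a follower's braking responds to the leader's braking from observed traffic. Plain least squares stands in for a learned model here, and the data points are invented:

```python
# Observed (leader deceleration, follower deceleration) pairs, m/s^2.
leader = [0.0, 1.0, 2.0, 3.0, 4.0]
follower = [0.1, 0.9, 2.1, 2.8, 4.2]

# One-variable least-squares fit: follower ≈ my + slope * (leader - mx).
n = len(leader)
mx = sum(leader) / n
my = sum(follower) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(leader, follower))
         / sum((x - mx) ** 2 for x in leader))

def predict(x):
    """Predicted forcing function of the car behind, given the car ahead."""
    return my + slope * (x - mx)

print(slope, predict(2.5))
```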

3rd team trains a MuZero-like model on top of the two above. Given a random initial starting state, the "game" is to choose the sequence of accelerations, decelerations, and steering inputs (quantized to finite values) that gets the highest score by a) avoiding collisions, b) following traffic laws, c) minimizing disturbance to other vehicles, and d) maximizing the space around your own vehicle.
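A minimal sketch of that "game", with an exhaustive depth-limited search standing in for the MuZero-style planner (MCTS would sample this tree instead of enumerating it). The action set, reward weights, and one-obstacle world are all invented for illustration:

```python
from itertools import product

ACTIONS = [-3.0, 0.0, 2.0]   # brake, coast, accelerate (m/s^2)
DT = 1.0

def rollout(pos, vel, obstacle_pos, actions):
    """Score an action sequence: huge penalty for collisions, reward
    for keeping space, small penalty for harsh inputs (disturbance)."""
    score = 0.0
    for a in actions:
        vel = max(0.0, vel + a * DT)
        pos += vel * DT
        gap = obstacle_pos - pos
        if gap <= 0:
            return score - 1000.0          # collision dominates everything
        score += min(gap, 20.0) * 0.1      # maximize space, capped
        score -= abs(a) * 0.05             # minimize disturbance
    return score

def plan(pos, vel, obstacle_pos, depth=3):
    # Exhaustive search over quantized action sequences.
    best = max(product(ACTIONS, repeat=depth),
               key=lambda seq: rollout(pos, vel, obstacle_pos, seq))
    return best[0]  # execute the first action of the best sequence

# Closing fast on a stopped car 30 m ahead: the planner should brake.
print(plan(pos=0.0, vel=15.0, obstacle_pos=30.0))  # → -3.0
```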

What all of this does is let the model compute not only expected behavior, but what is realistically possible. For example, when a collision is imminent - say you are sitting at a red light and the sensors detect a car rapidly approaching from behind - the model can decide to drive into the intersection when no cars are present to avoid getting rear-ended, which is quantifiably better than the average human.

Furthermore, the models from teams 2 and 3 can self-improve in real time, which is the equivalent of humans getting used to the driving habits of others in certain areas. You simply do batch training runs to improve the prediction of other drivers. Then, when your policy model makes a correct decision, you build a shortcut into the MCTS recording that this works. Within the finite compute budget you can then search away from that branch for a more optimal solution; if you don't find one, you still have the best known answer, and next time you can search even more of the space. So you essentially get a processing speed-up the more you use it.
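The shortcut idea can be sketched as a memo table keyed by (discretized) state, so a repeat search at the same state is seeded with a known-good solution and spends its budget elsewhere. The state key, candidate lists, and scoring stub below are invented for illustration:

```python
best_known = {}  # state key -> (score, action sequence)

def search_with_shortcut(state_key, candidates, score_fn, budget):
    """Evaluate up to `budget` candidates, seeded by the remembered
    best, and skip re-scoring the remembered sequence itself."""
    best = best_known.get(state_key, (float("-inf"), None))
    for seq in candidates[:budget]:
        if best_known.get(state_key) and seq == best_known[state_key][1]:
            continue  # don't re-score the shortcut; search away from it
        s = score_fn(seq)
        if s > best[0]:
            best = (s, seq)
    best_known[state_key] = best
    return best

# Toy scoring: prefer gentle braking sequences.
score = lambda seq: -sum(abs(a) for a in seq)
cands = [(-3, -3), (-1, -1), (0, -2)]
print(search_with_shortcut("red_light_car_behind", cands, score, budget=2))
# A later call at the same state keeps the stored best even when the
# budget only covers different candidates.
```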

visarga
> We don't need to know what a cone is

The counter-argument is that you can't zoom in and fix a specific bug in this mode of operation. Everything is mashed together in the same neural-net process. They needed to ensure safety, so testing was crucial, and it is harder to test an end-to-end system than its individual parts.

impossiblefork
I don't think that would have worked either.

But if they'd gone for radars, lidars, and a bunch of other sensors, plus enough processing hardware to actually fuse them, then I think they could have built something with a chance of working.

ActorNightly OP
Think about this: if I gave you GTA 5 traffic in single player with only NPC drivers, could you manually write a policy that gets a player from point A to point B in a car, assuming you have the in-game positions of all cars?

suddenlybananas
That's absolutely not what makes humans great drivers?

ActorNightly OP
Enlighten me please.

tayo42
Is that the approach that Waymo uses?

ActorNightly OP
Dunno what Waymo uses, but they definitely work in 3D space as a start, rather than trying to map sequences of pictures to actions. They also need training on specific areas.