I’m optimistic about humanoid robotics, but I’m curious about the reliability issue. Biological limbs and hands are quite miraculous when you consider that they are able to constantly interact with the world, which entails some natural wear and tear, but then constantly heal themselves.
gene-h
Industrial robots at least are very reliable; MTBF is often upwards of 100,000 hours[0]. Industrial robots are optimized to be as reliable as possible because the longer they last and the less often they need to be fixed, the more profitable they are. In fact, German and Japanese companies came to dominate the industrial robotics market because they focused on reliability. They developed rotary electric actuators that were more reliable. Cincinnati Milacron (US) was outcompeted in the industrial robot market because although its hydraulic robots were strong, they were less reliable.
I am personally a bit skeptical of anthropomorphic hands achieving similarly high reliability. There are just too many small parts that need to withstand high forces.
If you E-stop an industrial robot, it stops immediately and all is OK. If a humanoid were to freeze like that, it would fall over, hurting you and your stuff on the way down, and damaging itself.
Mechanical reliability is not the main concern IMO
marinmania
It does either get very exciting or very spooky thinking of the possibilities in the near future.
I had always assumed that such a robot would be very specific (like a cleaning robot) but it does seem like by the time they are ready they will be very generalizable.
I know they would require quite a few sensors and motors, but compared to self-driving cars their liability would be less and they would use far less material.
fragmede
The exciting part comes when two robots are able to do repairs on each other.
marinmania
I think this is the spooky part. I feel dumb saying it, but is there a point where they are able to coordinate and build a factory to build chips/more of themselves? Or other things entirely?
bamboozled
Of course there is
ta988
But this still has a massive cost. Replacing or repairing an actuator isn't cheap, in materials and in downtime.
jacobaul
To maybe get a little carried away with the sci-fi for a minute, why does the Actuator need to cost anything?
When the tree of costs that make up a product are traced, surely all the leaf nodes are human labour? As in, to make the actuator, I had to pay someone to assemble it and I had to buy the parts. Each part had a materials cost and a labour cost. So it goes for the factory that made the fasteners, the foundry that made the steel, the mine that extracted the ore.
Shudder to think of how to regulate resource extraction in a future where AI humanoid robots are strip mining and logging for free.
david-gpu
> When the tree of costs that make up a product are traced, surely all the leaf nodes are human labour?
What about energy, real estate and taxes?
Even at the extreme end of automation, if you want iron ore, you need to buy a mine from somebody, pay taxes on it, and power the machines to extract the minerals and transport them elsewhere for processing.
pryelluw
2 bots 1 bolt ?
UltraSane
Consumable components could be automatically replaced by other robots.
didip
I think those problems can be solved with further research in materials science, no? Combine that with very responsive but low-torque servos, and I think this is a solvable problem.
michaelt
It's a simple matter of the number of motors you have. [1]
Assume every motor has a 1% failure rate per year.
A boring wheeled roomba has 3 motors. That's a 2.9% failure rate per year, and 8.6% failures over 3 years.
Assume a humanoid robot has 43 motors. That gives you a 35% failure rate per year, and 73% over 3 years. That ain't good.
And not only is the humanoid robot less reliable, it's also 14.3x the price - because it's got 14.3x as many motors in it.
[1] And bearings and encoders and gearboxes and control boards and stuff... but they're largely proportional to the number of motors.
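The compounding math above is just independent-failure probability; a minimal sketch (assuming every motor fails independently at the same annual rate, which is of course a simplification):

```python
def failure_prob(n_motors, per_motor_annual_rate=0.01, years=1):
    """Probability that at least one of n independent motors fails
    within the given number of years."""
    survive_one_year = (1 - per_motor_annual_rate) ** n_motors
    return 1 - survive_one_year ** years

# Wheeled robot with 3 motors vs. humanoid with 43:
print(f"{failure_prob(3):.1%}")            # ~3% per year
print(f"{failure_prob(3, years=3):.1%}")   # ~8.6% over 3 years
print(f"{failure_prob(43):.1%}")           # ~35% per year
print(f"{failure_prob(43, years=3):.1%}")  # ~73% over 3 years
```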
elcritch
With more motors and joints also comes some degree of redundancy, however. Having multiple fingers means one finger dying won't be as big of an impediment. It'd require feedback and the ability for the motion planner / AI to account for it.
Plus they'll likely be modular and able to be replaced.
IMHO, the bigger design issue for humanoids is lowering the need for mechanical precision, which requires lots more metal, and instead using adaptive feedback and sensors to obtain accuracy similar to how humans and animals do it. AIs should be really good at that, eventually. I think the compute will need to be about 10x what it is now, though.
mewpmewp2
Would it be possible to reduce the failure rates?
ac29
The 1%/year failure rate appears to just be made up. There are plenty of electric motors that don't have anywhere near that failure rate, at least during the expected service life (failure rates will probably hit 1%/year or higher eventually).
For example, do the motors in hard drives fail anywhere close to 1% a year in the first ~5 years? Backblaze data gives a total drive failure rate around 1% and I imagine most of those are not due to failure of motors.
For example, an industrial robot arm with 6 motors achieves much higher reliability than a consumer roomba with 3 motors. They do this with more metal parts, more precision machining, much more generous design tolerances, and suchlike. Which they can afford by charging 100x as much per unit.
I'm interested in how differences between robots develop over time. There are a lot of machines in this world that have been patched or "jimmied up" to continue working. Take a mining robot: it would probably get quite heavily contaminated with dust, wear would occur in different places, rock falls might bend parts.
So even though another robot could probably do the "jimmy up", it seems like over time the robots will "drift" into all being a bit different.
Even commercial airliners seem to go through fairly unique repairs from things like collisions with objects, tail strikes, etc.
Maybe it's just easier to recycle robots?
Toritori12
Does anyone know how easy it is to join the "trusted tester program", and whether they offer modules that you can easily plug in to run the SDK?
technotony
There's a sign up button at the bottom of the article...
suyash
What sort of hardware does the SDK run on? Can it run on a modern Raspberry Pi?
ethan_smith
According to the blog post, it requires an NVIDIA Jetson Orin with at least 8GB RAM, and they've optimized for Jetson AGX Orin (64GB) and Orin NX (16GB) modules.
v9v
Could you quote where in the blog post they claim that? CTRL+F "Jetson" gave no results in TFA.
moffkalast
Yeah they didn't really mention anything, I was almost getting my hopes up that Google might be announcing a modernized Coral TPU for the transformer age, but I guess not. It's probably all just API calls to their TPUv6 data centers lmao.
martythemaniak
You can think of these as essentially multi-modal LLMs, which is to say you can have very small/fast ones (SmolVLA - 0.5B params) that are good at specific tasks, and larger/slower more general ones (OpenVLA - a finetuned llama2 7B). So a rpi could be used for some very specific tasks, but even the more general ones could run on beefy consumer hardware.
What is the model architecture? I'm assuming it's far away from LLMs, but I'm curious to know more. Can anyone provide links that describe architectures for VLAs?
KoolKat23
Actually very close to one I'd say.
It's a "visual language action" VLA model "built on the foundations of Gemini 2.0".
As Gemini 2.0 has native language, audio and video support, I suspect it has been adapted to include native "action" data too, perhaps only on output fine-tuning rather than input/output at training stage (given its Gemini 2.0 foundation).
Natively multimodal LLM's are basically brains.
quantumHazer
> Natively multimodal LLM's are basically brains.
Absolutely not.
KoolKat23
Lol keep telling yourself that. It's not a human brain nor is it necessarily a very intelligent brain, but it is a brain nonetheless.
martythemaniak
OpenVLA is basically a slightly modified, fine-tuned llama2. I found the launch/intro talk by lead author to be quite accessible: https://www.youtube.com/watch?v=-0s0v3q7mBk
KoolKat23
In the paper at the bottom of Google's page, this VLA says it is built on the foundations of Gemini 2.0 (hence my quotations). They'd be using Gemini 2.0 rather than Llama.
A more modern one, smolVLA is similar and uses a VLM but skips a few layers and uses an action adapter for outputs. Both are from HF and run on LeRobot.
meanwhile i will drink a coffee while it loads a reply from the API
jagger27
These are going to be war machines, make absolutely no mistake about it. On-device autonomy is the perfect foil to escape centralized authority and accountability. There’s no human behind the drone to charge for war crimes. It’s what they’ve always dreamed of.
Who’s going to stop them? Who’s going to say no? The military contracts are too big to say no to, and they might not have a choice.
The elimination of toil will mean the elimination of humans altogether. That's where we're headed. There will be no profitable life left for you, and you will be liquidated by "AI-Powered Automation for Every Decision"[0]. Every. Decision. It's so transparent. The optimists in this thread are baffling.
MIT spinoff Google-owned Boston Dynamics pledged not to militarize their robots. Which is very hard to believe given they're backed by DARPA, the DoD/Military investment arm.
arcticfox
This pledge would last five seconds in an actual conflict, if it makes it even that far.
paxys
Was owned by Google. Then Softbank. Now Hyundai.
jagger27
Militarize is just bad marketing. Call them cleaning machines and put them to work on dirty things.
JumpCrisscross
> These are going to be war machines, make absolutely no mistake about it
Of course they will. Practically everything useful has a military application. I'm not sure why this is considered a hot take.
jagger27
The difference between this machine and the ones that came before is that there won’t have to be a human in the loop to execute mass murder.
m00x
There's a clear task being given to the robot. If anything this will save lives. There are plenty of soldiers that love to kill for the hell of it; at least this will be easy to trace back to whoever gave the order.
JumpCrisscross
> there won’t have to be a human in the loop to execute mass murder
This looks like an increasingly theoretical concern. (And probably always has been. Wars were far more brutal when folks fought face to face than they are today.)
bamboozled
How would these things be competitive with drones on the battlefield? They probably cost the equivalent of 1000 autonomous drones and 100x the time and materials to make, way more power would be required to make them work too.
Terminator is a good movie but in reality, a cheap autonomous drone would mess one of those up pretty good.
I've seen some of the footage from Ukraine. Drones are deadly, efficient, terrifying on the battlefield. Even if those robots get crazy maneuverable, it's going to be pretty hard to outrun an exploding drone.
Maybe the Terminators will have shotguns, but I could imagine 5 drones per Terminator being pretty easy to achieve, considering they will be built by other autonomous robots.
m00x
Good!
Workaccount2
I continue to be impressed by how Google stealth-releases fairly groundbreaking products, and then (usually) just kind of forgets about them.
Rather than advertising blitz and flashy press events, they just do blog posts that tech heads circulate, forget about, and then wonder 3-4 years later "whatever happened to that?"
This looks awesome. I look forward to someone else building a start-up on this and turning it into a great product.
fusionadvocate
Because the whole purpose of these kinds of projects at Google is to keep regulators at bay. They don't need these products in the sense of making money from them. They will just burn some money and move on, exactly the way they did hundreds of times. But what kind of company has such a free pass to burning money? The kind of company that is a monopoly. Monopolies are THAT profitable.
antonkar
The only way to prevent robots from being jailbroken and set to rob banks is to move GPUs to private SOTA secure GPU clouds
sajithdilshan
I wonder what kind of guardrails (like Three Laws of Robotics) there are to prevent the robots going crazy while executing the prompts
ctoth
The laws of robotics were literally designed to cause conflict and facilitate strife in a fictional setting--I certainly hope no real goddamn system is built like that.
> To ensure robots behave safely, Gemini Robotics uses a multi-layered approach. "With the full Gemini Robotics, you are connecting to a model that is reasoning about what is safe to do, period," says Parada. "And then you have it talk to a VLA that actually produces options, and then that VLA calls a low-level controller, which typically has safety critical components, like how much force you can move or how fast you can move this arm."
conception
Of course someone will. The terror nexus doesn’t build itself, yet, you know.
hlfshell
The generally accepted term for the research around this in robotics is Constitutional AI (https://arxiv.org/abs/2212.08073), which has been cited/experimented with in several robotics VLAs.
JumpCrisscross
Is there any evidence we have the technical ability to put such ambiguous guardrails on LLMs?
hn_throwaway_99
A power cord?
sajithdilshan
what if they are battery powered?
msgodel
Usually I put master disconnect switches on my robots just to make working on them safe. I use cheap toggle switches, though; I'm too cheap for the big red spinny ones.
That's what we use twelve gauge buckshot for, here in America.
asadm
in practice, those laws are bs.
TZubiri
Nice. I work with some students younger than 13, so most cloud LLMs are quite tricky to work with; local-only models are nice for this use case. I will try this as a replacement for ChatGPT as computer vision in robotics like Lego Mindstorms.
zzzeek
THANK YOU.
Please make robots. LLMs should be put to work for *manual* tasks, not art/creative/intellectual tasks. The goal is to improve humanity, not put us to work putting screws inside of iPhones.
(five years later)
what do you mean you are using a robot for your drummer
martythemaniak
I've spent the last few months looking into VLAs and I'm convinced that they're gonna be a big deal, ie they very well might be the "chatgpt moment for robotics" that everyone's been anticipating. Multimodal LLMs already have a ton of built-in understanding of images and text, so VLAs are just regular MMLLMs that are fine-tuned to output a specific sequence of instructions that can be fed to a robot.
OpenVLA, which came out last year, is a Llama 2 fine-tune with extra image encoding that outputs a 7-tuple of integers. The integers are rotation and translation inputs for a robot arm. If you give a vision Llama 2 a picture of an apple and a bowl and say "put the apple in the bowl", it already understands apples and bowls, knows the end state should be the apple in the bowl, etc. What's missing is the series of tuples that will correctly manipulate the arm to do that, and the way they got it is through a large number of short instruction videos.
The neat part is that although everyone is focusing on robot arms manipulating objects at the moment, there's no reason this method can't be applied to any task. Want a smart lawnmower? It already understands "lawn", "mow", "don't destroy toy in path", etc.; it just needs a finetune on how to correctly operate a lawnmower. Sam Altman made some comments about having self-driving technology recently, and I'm certain it's a ChatGPT-based VLA. After all, if you give ChatGPT a picture of a street, it knows what's a car, a pedestrian, etc. It doesn't know how to output the correct turn/go/stop commands, and it does need a great deal of diverse data, but there's no reason why it can't do it. https://www.reddit.com/r/SelfDrivingCars/comments/1le7iq4/sa...
Anyway, super exciting stuff. If I had time, I'd rig a snowblower with a remote control setup, record a bunch of runs and get a VLA to clean my driveway while I sleep.
For completeness, MMLLM = Multimodal Large language model.
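To make the "7-tuple of integers" concrete: OpenVLA-style models discretize each continuous action dimension into bins and emit one token per dimension, which the robot side maps back to continuous deltas. A rough sketch of that de-tokenization step (the bin count and the per-dimension ranges here are illustrative, not OpenVLA's exact values):

```python
import numpy as np

# Illustrative ranges for (dx, dy, dz, droll, dpitch, dyaw, gripper)
ACTION_LOW  = np.array([-0.05, -0.05, -0.05, -0.3, -0.3, -0.3, 0.0])
ACTION_HIGH = np.array([ 0.05,  0.05,  0.05,  0.3,  0.3,  0.3, 1.0])
N_BINS = 256  # each dimension quantized into 256 uniform bins

def detokenize(action_tokens):
    """Map 7 discrete tokens (0..255) back to continuous arm commands,
    taking each bin's center point."""
    t = np.asarray(action_tokens, dtype=np.float64)
    return ACTION_LOW + (t + 0.5) * (ACTION_HIGH - ACTION_LOW) / N_BINS

# Token 128 in every dimension lands just above the midpoint of each range:
print(detokenize([128] * 7))
```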
Workaccount2
I don't think transformers will be viable for self driving cars until they can both:
1) Properly recognize what they are seeing without having to lean so hard on their training data. Go photoshop a picture of a cat and give it a 5th leg coming out of its stomach. No LLM will be able to properly count the cat's legs (they will keep saying 4 legs no matter how many times you insist they recount).
2) Be extremely fast at outputting tokens. I don't know where the threshold is, but it's probably going to be a non-thinking model (at first) and will probably need something like Cerebras or a diffusion architecture to get there.
cgearhart
The current-gen VLA architectures include some tricks (like compressed action tokenization and diffusion decoding) to reach action frequencies between 50-200 Hz. I think they're _more_ efficient this way than regular LLMs trying to do everything through text.
martythemaniak
1. Well, based on Karpathy's talks on Tesla FSD, his solution is to actually make the training set reflect everything you'd see in reality. The tricky part is that if something occurs 0.0000001% of the time IRL and something else occurs 50% of the time, they both need to make up 5% of the training corpus. The thing with multimodal LLMs is that lidar/depth input can just be another input that gets encoded along with everything else, so for driving, "there's a blob I don't quite recognize" is still a blob you have to drive around.
2. Figure has a dual-model architecture which makes a lot of sense: a 7B model that does higher-level planning and control and runs at 8 Hz, and a tiny 0.08B model that runs at 200 Hz and does the minute control outputs. https://www.figure.ai/news/helix
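That dual-rate pattern is easy to sketch: a slow "planner" refreshes a latent goal a few times a second, while a fast controller consumes the most recent latent on every tick. A toy version (the rates mirror Figure's published 8 Hz / 200 Hz split; both "models" are stubs, not real networks):

```python
# Toy dual-rate control loop: slow planner at 8 Hz, fast controller at 200 Hz.
PLANNER_HZ, CONTROLLER_HZ = 8, 200
TICKS_PER_PLAN = CONTROLLER_HZ // PLANNER_HZ  # 25 controller ticks per plan

def slow_planner(observation):
    """Stub for the large model: returns a latent 'intent' vector."""
    return [observation * 0.1] * 4

def fast_controller(latent, tick):
    """Stub for the small model: turns the latest latent into a motor command."""
    return sum(latent) + 0.001 * tick

def run(seconds=1):
    commands = []
    latent = slow_planner(0)
    for tick in range(CONTROLLER_HZ * seconds):
        if tick % TICKS_PER_PLAN == 0:  # planner fires at 8 Hz
            latent = slow_planner(observation=tick)
        commands.append(fast_controller(latent, tick))  # controller at 200 Hz
    return commands

print(len(run()))  # 200 commands per simulated second
```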
generalizations
I will be surprised if VLAs stick around, based on your description. That sounds far too low-level. Better hand that off to the 'nervous system' / kernel of the robot - it's not like humans explicitly think about the rotation of their hip & ankle when they walk. Sounds like a bad abstraction.
MidoriGlow
Elon Musk said in last week’s Starship Update: the very first Mars missions are planned to be flown by Optimus humanoid robots to scout and build basic infrastructure before humans arrive
(full transcript + audio: https://transpocket.com/share/oUKhep6cUl3s/).
If Gemini Robotics On-Device can truly adapt to new tasks with ~50–100 demos, pairing that with mass-produced Optimus bodies and Starship’s lift capacity could be powerful—offline autonomy, zero-latency control, and the ability to ship dozens of robots per launch.
lm28469
Elon Musk said in 2016 that we'd have fully autonomous cars by the end of the year and we'd be on Mars by 2018, with manned missions by 2024.
Fast forward to 2025: we still have no self-driving cars, and nothing is even close to getting to Mars, let alone manned missions.
[0]https://robotsdoneright.com/Articles/what-are-the-different-...
google-deepmind/mujoco_menagerie: https://github.com/google-deepmind/mujoco_menagerie
mujoco_menagerie/aloha: https://github.com/google-deepmind/mujoco_menagerie/tree/mai...
https://arxiv.org/pdf/2503.20020
https://arxiv.org/abs/2506.01844
Explanation by PhosphoAI: https://www.youtube.com/watch?v=00A6j02v450
0: https://www.palantir.com/
Not https://public.nrao.edu/telescopes/VLA/ :(