They failed to provide any facts or examples regarding Devin.
This is like arguing that it’s not fair to critique people claiming to have made superconductors because “some people said they really are superconductors,” yet no one can share a sample with anyone for some reason.
A reasonable counterargument would be:
> Here is evidence of Devin actually doing things.
How, other than by the available evidence, was anyone supposed to evaluate Devin?
There is a broad opportunity for the developers to respond to this, but they haven’t.
Why is that?
It is because he’s right.
Regardless of what Devin can do, that video was deceptive and misleading. There are no two ways about it.
Hence I’m skeptical of people making claims about a product I can’t try out myself. It’s unclear if the tasks they are doing, and the way they are using agents, are relevant to the work I do, which is usually working on a team of engineers shipping code on a complex code base.
For AI I tend to put a lot more weight on benchmarks, such as SWE-bench, which is why I wrote an article about it:
https://www.stepchange.work/blog/why-do-ai-software-engineer...
SWE-bench is mostly small Python tasks, evaluated solely by unit tests, that require changes of fewer than 15 lines to a single file. Devin fails at most of them, and on the ones it gets right it ignores all sorts of libraries and conventions used in the rest of the code base.
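To make that concrete, here’s a toy sketch of the shape these tasks take (not a real SWE-bench instance; the file, function, and test names are all made up): a small bug confined to one file, graded purely by whether the unit tests pass.

    # Hypothetical, simplified illustration of an SWE-bench-style task.
    # None of these names come from the real benchmark; they only show the shape:
    # a few-line fix in a single file, graded solely by unit tests.

    # --- src/dateutils.py (buggy version the agent starts from) ---
    def days_in_month(year: int, month: int) -> int:
        """Return the number of days in the given month."""
        days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
        # BUG: leap years are ignored, so February 2024 returns 28.
        return days[month - 1]

    # --- the kind of ~3-line patch the agent is expected to produce ---
    def days_in_month_fixed(year: int, month: int) -> int:
        days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
        if month == 2 and year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
            return 29  # leap-year February
        return days[month - 1]

    # --- tests/test_dateutils.py (the only thing the grader checks) ---
    def test_february_leap_year():
        assert days_in_month_fixed(2024, 2) == 29

    def test_february_common_year():
        assert days_in_month_fixed(2023, 2) == 28

Notice that nothing in that grading setup rewards following the rest of the codebase’s conventions; passing the tests is all that counts, which is why the score tells you less than you’d hope about real feature work.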
I’m optimistic that agents will improve dramatically in a few years, but today Devin is not good at making larger changes that build on one another, like features.
That's a lie, pure and simple, and no statements made elsewhere can make that lie any less a lie.