Picking apart Devin based solely on the demo video while ignoring all of the primary source testimonials on Twitter as to Devin's effectiveness seems somewhat intellectually dishonest... A demo video will of course cherrypick impressive-looking moments, even if they're not really that impressive.

You can only report on facts.

They failed to provide any examples of facts with regard to Devin.

This is like arguing that it’s not fair to critique people claiming to have made superconductors because “some people said they are really superconductors” but no one can share samples with anyone for some reason.

A reasonable counter argument would be:

> Here is evidence of Devin actually doing things.

How, other than from the available evidence, was anyone supposed to evaluate Devin?

There is a broad opportunity for the developers to respond to this, but they haven’t.

Why is that?

It is because he’s right.

Regardless of what Devin can do, that video was deceptive and misleading. There are no two ways about it.

I don’t trust anecdotes on Twitter because every time I’ve tried an agent that’s been hyped up, it’s been more expensive and time-consuming than just using GitHub Copilot with Claude/ChatGPT and putting up a PR myself.

Hence I’m skeptical of people making claims about a product I can’t try out myself. It’s unclear whether the tasks they are doing and the way they are using agents are relevant to the work I do, which is usually working on a team of engineers shipping code on a complex code base.

For AI I tend to put a lot more weight on benchmarks, such as SWE-bench, which is why I wrote an article about it:

https://www.stepchange.work/blog/why-do-ai-software-engineer...

SWE-bench is mostly small Python tasks, evaluated solely by unit tests, that require changes of fewer than 15 lines to a single file. It fails at most of those, and in the ones it gets right it ignores all sorts of libraries and conventions used in the rest of the code base.

I’m optimistic that agents will improve dramatically in a few years, but today Devin is not good at making larger changes that build on one another, like features.

The company said in the description of the demo video that Devin did something in the demo video that Devin clearly did not do in the demo video.

That's a lie, pure and simple, and no statements made elsewhere can make that lie any less a lie.

That's backwards; depending on "primary source testimonials on Twitter" is grossly intellectually dishonest.
"Primary source testimonials on"...Twitter? Are you serious?
