On a more serious note: I think the high-level structuring of the architecture, and then the breakdown into tactical solutions — weaving the whole program together — is a fundamental limitation. It's akin to theorem-proving, which is just hard. Maybe it's just a scale issue; I'm bullish on AGI, so that's my preferred opinion.
Try this prompt: "Please rate this business plan on a scale of 1-100 and provide bullet points on how it can be improved without rewriting any of it: <business plan>"
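If it helps, here is a minimal sketch of sending that prompt through a chat-completion API. The openai client usage and the model name are my assumptions, not part of the suggestion above; swap in whichever provider and model you actually use.

```python
# Minimal sketch: send the rating prompt to a chat-completion API.
# Assumes the openai>=1.0 Python client and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

business_plan = "..."  # paste the full business plan text here

prompt = (
    "Please rate this business plan on a scale of 1-100 and provide "
    "bullet points on how it can be improved without rewriting any of it: "
    f"{business_plan}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; any chat model should work
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```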
Edit: On second thought, maybe beyond a certain minimum context window size it's possible to phrase the instructions so that, at any point in the process, the LLM works at a suitable level of abstraction, more like humans do.
For human-like learning it would need to update its state (learn) on the fly as it does inference.
Is it token limitations, or accuracy degrading the further you get into the solution?