It's possible; the question is how to choose which submodel handles a given query.
You can use a dedicated router model, or a larger general-purpose LLM, to do this routing.
Also, some work suggests using smaller LLMs to generate multiple candidate responses and a stronger, larger model to rank them (ranking is much cheaper than generating).
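To make the two patterns concrete, here is a minimal sketch. Both the router and the ranker would normally be LLM calls; the keyword lookup and the `score_fn` below are hypothetical stand-ins so the example runs on its own.

```python
def route(query: str, experts: dict[str, list[str]]) -> str:
    """Pick a submodel for the query.

    In practice a small router LLM (or a larger general model) would
    classify the query; a keyword lookup stands in for that call here.
    """
    for name, keywords in experts.items():
        if any(kw in query.lower() for kw in keywords):
            return name
    return "general"  # fallback submodel


def rank_candidates(candidates: list[str], score_fn) -> str:
    """Generate-then-rank: cheap models produce the candidates,
    a stronger model scores them. Scoring one short prompt per
    candidate is far cheaper than generating each response.
    """
    return max(candidates, key=score_fn)


# Hypothetical expert table, for illustration only.
experts = {"code": ["python", "bug", "compile"], "math": ["integral", "prove"]}
print(route("How do I fix this Python bug?", experts))  # -> code

# `len` stands in for the strong model's scoring call.
best = rank_candidates(["draft A", "a longer draft B"], score_fn=len)
print(best)  # -> a longer draft B
```

In a real system, `score_fn` would wrap a single call to the larger model asking it to score or pick among the candidates, which is where the efficiency gain over full generation comes from.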