Open
Description
I'm trying to use GenerateREADME and make full use of the underlying LLM's context window. Unfortunately, I can't easily figure out what the right value is, because model_max_tokens isn't the length of the final input sent to the LLM.
For instance, I'm trying to use the entire 128k context window, and I've run a bunch of trials:
- patchwork GenerateREADME ... model_max_tokens=128_000 ===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 255511 tokens
- patchwork GenerateREADME ... model_max_tokens=64_000 ===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 191511 tokens
- patchwork GenerateREADME ... model_max_tokens=30_000 ===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 157511 tokens
So I need to keep guessing.
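For what it's worth, the three errors above do seem to follow a pattern: in each case the requested count minus model_max_tokens comes out to the same number. Here is a rough sketch of that arithmetic. It assumes the requested total is simply "everything else sent to the model" plus model_max_tokens, which the numbers are consistent with but which I haven't confirmed in the patchwork source. The helper name requested_tokens is just something made up for this example.

```python
import re

CONTEXT_LIMIT = 128_000  # limit quoted in the error messages

# (model_max_tokens passed, error message text as reported, truncated as in the report)
trials = [
    (128_000, "This model's maximum context length is 128000 tokens. However, you requested 255511 tokens"),
    (64_000, "This model's maximum context length is 128000 tokens. However, you requested 191511 tokens"),
    (30_000, "This model's maximum context length is 128000 tokens. However, you requested 157511 tokens"),
]


def requested_tokens(message: str) -> int:
    """Extract the 'you requested N tokens' count from an error message."""
    match = re.search(r"you requested (\d+) tokens", message)
    if match is None:
        raise ValueError("no token count found in message")
    return int(match.group(1))


# Tokens requested beyond model_max_tokens, one value per trial.
overheads = {requested_tokens(msg) - max_tokens for max_tokens, msg in trials}
print(overheads)  # {127511}

# If that overhead really is fixed, this is what is left for model_max_tokens.
overhead = overheads.pop()
headroom = CONTEXT_LIMIT - overhead
print(headroom)  # 489
```

If the assumption holds, the other tokens in these runs add up to 127511, so only about 489 tokens remain for model_max_tokens. That would explain why even 30_000 fails.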
Proposed solution
Add an option, e.g. model_max_tokens=-1, meaning "use the largest window the underlying LLM allows once all the other tokens sent under the hood are accounted for."
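To make the proposal concrete, here is a minimal sketch of how a -1 sentinel could be resolved. None of this is patchwork's actual code: resolve_max_tokens, context_window, and prompt_tokens are hypothetical names, and it assumes the tool can count the tokens it is about to send before making the request.

```python
def resolve_max_tokens(model_max_tokens: int, context_window: int, prompt_tokens: int) -> int:
    """Hypothetical helper: turn model_max_tokens=-1 into the remaining budget.

    context_window: the model's total token limit (e.g. 128_000).
    prompt_tokens:  tokens already taken by everything else in the request.
    """
    available = context_window - prompt_tokens
    if available <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, nothing left in a "
            f"{context_window}-token window"
        )
    if model_max_tokens == -1:
        # Sentinel: take whatever the window still allows.
        return available
    # Explicit values are passed through unchanged in this sketch.
    return model_max_tokens


# With the figures implied by the errors in this issue:
print(resolve_max_tokens(-1, 128_000, 127_511))  # 489
```

Explicit values are left untouched here so existing behavior would not change; clamping them to the available budget would be a separate design decision.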
Alternatives considered
n/a