Description
Is your feature related to a problem? Please describe.
When building applications that process video/images with the Gemini API, I need to implement rate limiting to stay within TPM (Tokens Per Minute) quotas. To do this effectively, I need accurate token estimates before making generateContent() calls.
The countTokens() API is ideal for this, but it doesn't support the mediaResolution config option. Since mediaResolution significantly affects token count (64 tokens/frame for LOW vs 256 tokens/frame for MEDIUM/HIGH on Gemini 2.5), the token estimate from countTokens() doesn't reflect the actual tokens that will be used when generateContent() is called with a specific mediaResolution.
Describe the solution you'd like
Add mediaResolution?: MediaResolution to the CountTokensConfig interface, matching its availability in GenerateContentConfig.
Currently in @google/[email protected]:
// CountTokensConfig - does NOT have mediaResolution
export declare interface CountTokensConfig {
  httpOptions?: HttpOptions;
  abortSignal?: AbortSignal;
  systemInstruction?: ContentUnion;
  tools?: Tool[];
  generationConfig?: GenerationConfig;
}
// GenerateContentConfig - DOES have mediaResolution
export declare interface GenerateContentConfig {
  // ... other options ...
  mediaResolution?: MediaResolution; // ✓ Supported here
}
Expected behavior:
import { GoogleGenAI, MediaResolution } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: 'xxx' });
// Should be able to get accurate token count with mediaResolution
// (videoPart and textPart are the request parts, built elsewhere)
const tokenCount = await ai.models.countTokens({
  model: 'gemini-2.5-flash',
  contents: [{ role: 'user', parts: [videoPart, textPart] }],
  config: {
    mediaResolution: MediaResolution.MEDIA_RESOLUTION_LOW // Currently not supported
  }
});
// Then use that estimate for rate limiting before calling generateContent
await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [{ role: 'user', parts: [videoPart, textPart] }],
  config: {
    mediaResolution: MediaResolution.MEDIA_RESOLUTION_LOW // This IS supported
  }
});
Describe alternatives you've considered
- Formula-based estimation: Using the documented token rates (64/256 tokens per frame + 32 audio tokens/second) to estimate manually. This works but duplicates logic that the API already has, and may drift from actual API behavior.
- Post-hoc adjustment: Using usageMetadata.promptTokenCount from responses to retroactively update rate limiting state. This helps adapt over time but doesn't prevent the initial rate limit violations.
- Conservative over-estimation: Always estimate using the highest resolution rate (256 tokens/frame). This wastes rate limit budget when using LOW resolution.
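For reference, the formula-based workaround can be sketched roughly as below. This is only an illustration of the first alternative, not SDK code: the helper name `estimateVideoTokens`, the `Resolution` type, and the default of one sampled frame per second are assumptions, and the rates are the ones quoted in this issue. Text tokens in the prompt are not included.

```typescript
// Hypothetical helper for illustration; not part of @google/genai.
type Resolution = 'LOW' | 'MEDIUM' | 'HIGH';

// Per-frame rates quoted in this issue for Gemini 2.5.
const TOKENS_PER_FRAME: Record<Resolution, number> = {
  LOW: 64,
  MEDIUM: 256,
  HIGH: 256,
};

// Audio rate quoted in this issue.
const AUDIO_TOKENS_PER_SECOND = 32;

// Estimates video + audio tokens for a clip.
// Assumption: frames are sampled at `framesPerSecond` (default 1).
function estimateVideoTokens(
  durationSeconds: number,
  resolution: Resolution,
  framesPerSecond: number = 1,
): number {
  const frames = Math.ceil(durationSeconds * framesPerSecond);
  const frameTokens = frames * TOKENS_PER_FRAME[resolution];
  const audioTokens = Math.ceil(durationSeconds * AUDIO_TOKENS_PER_SECOND);
  return frameTokens + audioTokens;
}
```

Under these assumptions, a 60-second clip comes to 60 × 64 + 60 × 32 = 5760 tokens at LOW versus 60 × 256 + 60 × 32 = 17280 at MEDIUM/HIGH, which is the gap that countTokens() currently cannot reflect.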
Additional context
- SDK Version: @google/[email protected]
- Documentation reference: Media Resolution docs show significant token differences:

  | MediaResolution | Video (tokens/frame) |
  | --- | --- |
  | LOW | 64 |
  | MEDIUM | 256 |
  | HIGH | 256 |

- Use case: Video processing applications that chunk long videos and need to respect TPM quotas require accurate pre-call token estimation to implement proper rate limiting.
- Related: The GenerationConfig type inside CountTokensConfig also doesn't appear to surface mediaResolution, though the documentation suggests token counts depend on it.
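To make the rate-limiting use case concrete, here is a minimal sketch of the kind of client-side TPM budget this request is meant to feed. The `TokenBudget` class and its methods are hypothetical and not part of @google/genai. It assumes a simple 60-second sliding window, a pre-call estimate as input (ideally from countTokens()), and a later correction from usageMetadata.promptTokenCount.

```typescript
// Hypothetical client-side budget; not part of @google/genai.
interface Reservation {
  at: number;     // timestamp in ms when the tokens were reserved
  tokens: number; // estimated (later actual) prompt tokens
}

class TokenBudget {
  private entries: Reservation[] = [];
  private readonly limitPerMinute: number;
  private readonly now: () => number;

  // `now` is injectable so the window can be tested with a fake clock.
  constructor(limitPerMinute: number, now: () => number = () => Date.now()) {
    this.limitPerMinute = limitPerMinute;
    this.now = now;
  }

  // Sum of tokens reserved within the last 60 seconds.
  private used(): number {
    const cutoff = this.now() - 60_000;
    this.entries = this.entries.filter((e) => e.at > cutoff);
    return this.entries.reduce((sum, e) => sum + e.tokens, 0);
  }

  // Reserves the estimate if it fits in the current window, else returns null.
  tryReserve(estimatedTokens: number): Reservation | null {
    if (this.used() + estimatedTokens > this.limitPerMinute) return null;
    const entry: Reservation = { at: this.now(), tokens: estimatedTokens };
    this.entries.push(entry);
    return entry;
  }

  // Replaces the estimate with the actual count reported by the response.
  reconcile(entry: Reservation, actualTokens: number): void {
    entry.tokens = actualTokens;
  }
}
```

The caller would call tryReserve() with the pre-call estimate, wait and retry when it returns null, and call reconcile() with usageMetadata.promptTokenCount after generateContent() returns. The further the estimate is from the actual count (for example 256 vs 64 tokens/frame), the more budget is wasted or overrun before reconcile() can correct it. This sketch tracks prompt tokens only.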