
countTokens() should support mediaResolution in config for accurate multimodal token estimation #1134

@arun279

Description

Is your feature related to a problem? Please describe.

When building applications that process video/images with the Gemini API, I need to implement rate limiting to stay within TPM (Tokens Per Minute) quotas. To do this effectively, I need accurate token estimates before making generateContent() calls.

The countTokens() API is ideal for this, but it doesn't support the mediaResolution config option. Since mediaResolution significantly affects token count (64 tokens/frame for LOW vs 256 tokens/frame for MEDIUM/HIGH on Gemini 2.5), the token estimate from countTokens() doesn't reflect the actual tokens that will be used when generateContent() is called with a specific mediaResolution.
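
For a rough sense of the gap, a worked example (the frame count and the roughly 1 sampled frame per second of video are assumptions for illustration, not taken from the SDK):

// Illustrative arithmetic only; the sampling rate is an assumption.
const sampledFrames = 60;                  // e.g. a 60-second clip at ~1 fps
const lowEstimate = sampledFrames * 64;    // LOW: 3,840 video tokens
const highEstimate = sampledFrames * 256;  // MEDIUM/HIGH: 15,360 video tokens
// A resolution-unaware estimate can be off by about 4x for the video portion alone.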

Describe the solution you'd like

Add mediaResolution?: MediaResolution to the CountTokensConfig interface, matching its availability in GenerateContentConfig.

Currently in @google/[email protected]:

// CountTokensConfig - does NOT have mediaResolution
export declare interface CountTokensConfig {
    httpOptions?: HttpOptions;
    abortSignal?: AbortSignal;
    systemInstruction?: ContentUnion;
    tools?: Tool[];
    generationConfig?: GenerationConfig;
}

// GenerateContentConfig - DOES have mediaResolution
export declare interface GenerateContentConfig {
    // ... other options ...
    mediaResolution?: MediaResolution;  // ✓ Supported here
}
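
The requested addition would look roughly like this (a sketch only; the field simply mirrors GenerateContentConfig):

// Proposed shape (not in the SDK today):
export declare interface CountTokensConfig {
    httpOptions?: HttpOptions;
    abortSignal?: AbortSignal;
    systemInstruction?: ContentUnion;
    tools?: Tool[];
    generationConfig?: GenerationConfig;
    mediaResolution?: MediaResolution;  // requested addition
}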

Expected behavior:

import { GoogleGenAI, MediaResolution } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: 'xxx' });

// Should be able to get accurate token count with mediaResolution
const tokenCount = await ai.models.countTokens({
  model: 'gemini-2.5-flash',
  contents: [{ role: 'user', parts: [videoPart, textPart] }],
  config: {
    mediaResolution: MediaResolution.MEDIA_RESOLUTION_LOW  // Currently not supported
  }
});

// Then use that estimate for rate limiting before calling generateContent
await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [{ role: 'user', parts: [videoPart, textPart] }],
  config: {
    mediaResolution: MediaResolution.MEDIA_RESOLUTION_LOW  // This IS supported
  }
});
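
For context, this is the kind of TPM gate that would sit between those two calls. The TokenBucket class below is a hypothetical illustration, not part of @google/genai; only totalTokens on the countTokens() response is SDK API.

// Hypothetical TPM gate built on the pre-call estimate.
class TokenBucket {
  private used = 0;
  constructor(private readonly tokensPerMinute: number) {
    setInterval(() => { this.used = 0; }, 60_000);  // reset the window every minute
  }
  async waitFor(tokens: number): Promise<void> {
    while (this.used + tokens > this.tokensPerMinute) {
      await new Promise((resolve) => setTimeout(resolve, 1_000));  // back off until the window resets
    }
    this.used += tokens;
  }
}

const bucket = new TokenBucket(1_000_000);          // e.g. a 1M TPM quota (assumption)
await bucket.waitFor(tokenCount.totalTokens ?? 0);  // gate on the estimate before generateContent()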

Describe alternatives you've considered

  1. Formula-based estimation: Using the documented token rates (64/256 tokens per frame + 32 audio tokens/second) to estimate manually (a sketch follows this list). This works but duplicates logic that the API already has, and may drift from actual API behavior.

  2. Post-hoc adjustment: Using usageMetadata.promptTokenCount from responses to retroactively update rate limiting state. This helps adapt over time but doesn't prevent the initial rate limit violations.

  3. Conservative over-estimation: Always estimate using the highest resolution rate (256 tokens/frame). This wastes rate limit budget when using LOW resolution.
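
A minimal sketch of that first workaround, assuming the documented rates and a default sampling rate of 1 frame per second (both assumptions that can drift from actual API behavior):

import { MediaResolution } from '@google/genai';

// Manual estimate from the documented rates; the rates and 1 fps sampling are assumptions.
function estimateVideoTokens(
  durationSeconds: number,
  resolution: MediaResolution,
  framesPerSecond = 1,
): number {
  const tokensPerFrame =
    resolution === MediaResolution.MEDIA_RESOLUTION_LOW ? 64 : 256;
  const videoTokens = durationSeconds * framesPerSecond * tokensPerFrame;
  const audioTokens = durationSeconds * 32;  // documented audio rate
  return videoTokens + audioTokens;
}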

Additional context

  • SDK Version: @google/[email protected]

  • Documentation reference: Media Resolution docs show significant token differences:

    MediaResolution   Video (tokens/frame)
    LOW               64
    MEDIUM            256
    HIGH              256
  • Use case: Video processing applications that chunk long videos and need to respect TPM quotas require accurate pre-call token estimation to implement proper rate limiting.

  • Related: The GenerationConfig type inside CountTokensConfig also doesn't appear to surface mediaResolution, though the documentation suggests token counts depend on it.

Labels

api:gemini-api
priority: p3 (Desirable enhancement or fix. May not be included in next release.)
status: awaiting user response (issues requiring a response from the user)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)
