Background model MiniMax-M2.5 is expensive and uncontrollable #440

@claell

Summary

Command Code uses background models (MiniMax-M2.5, Kimi-K2.5) for internal tasks such as title generation, tool call naming, and taste learning. These calls are not user-selectable, yet they appear in usage logs and consume credits. In my sessions, background calls account for a large share of total cost, sometimes roughly half the spend for a task, because tool descriptions appear to be generated or processed on about every other request.

Problems

  1. MiniMax-M2.5 is not cost-effective for trivial tasks. Title generation and tool naming are lightweight operations that do not require a 400B+ parameter model. Much smaller and cheaper alternatives (e.g. Qwen 3 Flash, DeepSeek V4 Flash, Gemma 3 4B, Phi-4-mini) would handle these tasks at a fraction of the cost.

  2. Users have no control over the background model. The /model selector only affects the primary conversation model. There is no way to choose, downgrade, or disable the background model — yet the resulting token usage is billed to the user's account.

  3. Frequency of background calls inflates cost. Usage logs show tool-desc/summary calls being made repeatedly (appearing every other request in some sessions). Even at low per-call cost, this adds up significantly over a session.
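To make the cost concern concrete, here is a rough sketch of how the background share could be computed from exported usage rows. The record layout, the `primary-model` name, the purpose labels, and all numbers are made up for illustration; I don't know the actual export format of the usage dashboard.

```python
from collections import defaultdict

# Hypothetical usage records: (model, purpose, cost_in_credits).
# These values are placeholders for illustration, not taken from a real log.
SAMPLE_USAGE = [
    ("primary-model", "chat", 0.040),
    ("MiniMax-M2.5", "tool-desc", 0.018),
    ("primary-model", "chat", 0.035),
    ("MiniMax-M2.5", "title", 0.002),
    ("MiniMax-M2.5", "tool-desc", 0.018),
]


def cost_share_by_model(records):
    """Return each model's fraction of total cost as a dict of model -> share."""
    totals = defaultdict(float)
    for model, _purpose, cost in records:
        totals[model] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing was billed, so there are no shares to report
    return {model: cost / grand_total for model, cost in totals.items()}


# Print the per-model share for the placeholder records above.
for model, share in cost_share_by_model(SAMPLE_USAGE).items():
    print(f"{model}: {share:.0%} of total cost")
```

Grouping by the purpose field instead of the model would show which background task (title, tool-desc, taste learning) drives the cost.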

Expected Behavior

  • Background helper tasks should use the cheapest viable model available, not a mid-tier general-purpose one.
  • Users should be able to see which model was used for each call in the usage dashboard (currently it just shows "MiniMax-M2.5" alongside the selected model without explanation).
  • Ideally, users should be able to opt out of, or at least select the model for, these background operations.

Actual Behavior

  • MiniMax-M2.5 (a 400B+ parameter model) is used for trivial text summarization tasks.
  • These calls are invisible during the session — they only show up in the usage dashboard after the fact.
  • The cumulative cost of these background calls can rival or exceed the cost of the primary model calls.

Suggestions

  • Switch background tasks to a small, fast, cheap model (Qwen 3 Flash, DeepSeek V4 Flash, or similar sub-cent-per-million-token models).
  • Surface background model usage separately in the UI so users understand what they're being billed for.
  • Consider making taste learning and title generation opt-in per session, not just globally via /taste (as suggested in #326, "Unexpected usage of multiple models during a single session").
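The suggestions above could look roughly like the following from the user's side. This is only a sketch of the requested behavior: `BackgroundConfig`, `pick_model`, the task names, and the `small-cheap-model` placeholder are all hypothetical and do not correspond to any existing Command Code setting or API.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical names for the background tasks mentioned in this issue.
BACKGROUND_TASKS = {"title_generation", "tool_naming", "taste_learning"}


@dataclass
class BackgroundConfig:
    # Model used for background tasks; None means "reuse the primary model".
    model: Optional[str] = "small-cheap-model"  # placeholder, not a real model id
    # Background tasks the user has opted out of for this session.
    disabled_tasks: Set[str] = field(default_factory=set)


def pick_model(task: str, primary_model: str, cfg: BackgroundConfig) -> Optional[str]:
    """Return the model to call for a task, or None if the task should be skipped."""
    if task not in BACKGROUND_TASKS:
        return primary_model  # normal conversation turns use the /model selection
    if task in cfg.disabled_tasks:
        return None  # user opted out, so no call is made and nothing is billed
    return cfg.model or primary_model


cfg = BackgroundConfig(disabled_tasks={"taste_learning"})
for task in ("chat", "title_generation", "taste_learning"):
    print(task, "->", pick_model(task, "primary-model", cfg))
```

The point of the design is that the decision is visible and user-controlled: every background call either goes to an explicitly configured model or is skipped.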

Related

Environment

  • Command Code version: latest
  • OS: Windows
  • Shell: PowerShell
