Assumes ~2.5M total tokens per deep-dive (1.5M input, 1.0M output) and a 70% cache hit rate for providers that support prompt caching.
| Model | Provider | Input/MTok | Output/MTok | Cached Input | Quality | Use Case |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | $15.0 | $75.0 | $1.5 | Highest | Deep analysis |
| Claude Sonnet 4.6 | Anthropic | $3.0 | $15.0 | $0.3 | High | Standard analysis |
| GPT-5 | OpenAI | $2.5 | $10.0 | N/A | High | Reasoning tasks |
| GPT-5.2 | OpenAI | $2.5 | $14.0 | N/A | Highest | Allocation/fund prompts |
| GPT-4.1 Mini | OpenAI | $0.4 | $1.6 | $0.1 | Low | Triage/filtering |
| Gemini Flash 3.0 | Google | $0.1 | $0.4 | $0.02 | Low | Triage/filtering |
| Llama 4 Scout | Self-hosted | $0.0 | $0.0 | $0.0 | Medium | Triage (if self-hosted) |
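The per-deep-dive cost implied by the table can be sketched as a small helper. This is an illustration only: the function name `deep_dive_cost` is hypothetical, and it applies the stated assumptions (1.5M input / 1.0M output tokens, 70% cache hit rate where caching exists) to the $/MTok prices above.

```python
def deep_dive_cost(input_mtok, output_mtok, in_price, out_price,
                   cached_price=None, cache_hit=0.70):
    """Estimated USD cost of one deep-dive, with prices in $/MTok.

    If cached_price is None the provider has no prompt caching, so
    all input tokens are billed at the full input rate.
    """
    if cached_price is None:
        input_cost = input_mtok * in_price
    else:
        cached = input_mtok * cache_hit                  # tokens served from cache
        input_cost = cached * cached_price + (input_mtok - cached) * in_price
    return input_cost + output_mtok * out_price

# Claude Opus 4.6 at the assumed 1.5M input / 1.0M output per deep-dive
opus = deep_dive_cost(1.5, 1.0, 15.0, 75.0, cached_price=1.5)   # ~= $83.33
# GPT-5 (no cached-input price listed)
gpt5 = deep_dive_cost(1.5, 1.0, 2.5, 10.0)                      # ~= $13.75
```

Note that output tokens dominate for the premium models: at Opus prices, the 1.0M output tokens alone cost $75 of the ~$83 total, so caching mainly helps models with heavy repeated context rather than long outputs.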
Three operating modes with different cost profiles. The hybrid approach is recommended: cheap triage runs daily, and expensive deep-dives run only when triggered.
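A back-of-envelope check of the hybrid mode, using the table's prices. The workload numbers here (1.0M/0.2M triage tokens per day, deep-dives triggered about twice a week) are illustrative assumptions, not figures from this document:

```python
# Hypothetical daily workload -- the token volumes and trigger rate are
# assumptions for illustration, not values stated in the document.
TRIAGE_IN, TRIAGE_OUT = 1.0, 0.2          # MTok/day through Gemini Flash 3.0 triage
DIVES_PER_WEEK = 2                        # assumed deep-dive trigger rate

# Daily triage cost at Flash prices ($0.1 in / $0.4 out per MTok)
triage_daily = TRIAGE_IN * 0.1 + TRIAGE_OUT * 0.4                 # $0.18/day

# One Sonnet deep-dive (1.5M in @ 70% cached, 1.0M out)
cached, uncached = 1.5 * 0.70, 1.5 * 0.30
dive = cached * 0.3 + uncached * 3.0 + 1.0 * 15.0                 # ~= $16.67

# Average daily spend in hybrid mode
avg_daily = triage_daily + (DIVES_PER_WEEK / 7) * dive            # ~= $4.94/day
```

Under these assumptions the hybrid mode averages roughly $5/day, an order of magnitude under the $50/day ceiling, which is consistent with the budget chart below.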
Step-function cost growth across 4 implementation phases. Red dashed line = $50/day budget ceiling ($1,500/month). All phases stay well below the ceiling.