Real-time countdown, leaked benchmarks, pricing predictions, and frontier model comparison for OpenAI's next model.
Estimate based on pre-training completion (late March 2026) plus a typical post-training timeline.
Earliest estimated date: May 15, 2026 | Latest: June 30, 2026
Countdown shows time to earliest estimate. Actual date may be later.
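A minimal sketch of how the countdown to the earliest estimate could be computed. The two dates come from the estimate above; anchoring them at midnight UTC and the helper name `time_remaining` are assumptions made for illustration, not details of this page's actual implementation.

```python
from datetime import datetime, timezone

# Dates from the estimate above. Midnight UTC is an assumption;
# the page does not state a time of day or a time zone.
EARLIEST = datetime(2026, 5, 15, tzinfo=timezone.utc)
LATEST = datetime(2026, 6, 30, tzinfo=timezone.utc)


def time_remaining(target, now=None):
    """Return (days, hours, minutes, seconds) until target, clamped at zero."""
    if now is None:
        now = datetime.now(timezone.utc)
    total_seconds = max(int((target - now).total_seconds()), 0)
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


if __name__ == "__main__":
    d, h, m, s = time_remaining(EARLIEST)
    print(f"{d}d {h}h {m}m {s}s until the earliest estimate")
    print(f"Estimate window: {(LATEST - EARLIEST).days} days")
```

Clamping at zero keeps the display from going negative if the earliest date passes without a release, which matches the caveat that the actual date may be later.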
Estimate how GPT-5.5 will perform on your use case based on leaked signals and benchmark trends.
Sam Altman confirmed "a strong new base model" has finished pre-training, incorporating two years of research advances.
HIGH CONFIDENCE
Unlike GPT-5.4's modular approach, Spud is reported to have unified text/image/audio/video understanding in a single model architecture.
HIGH CONFIDENCE
Significant improvements in multi-step task execution, tool chaining, and autonomous workflow completion. Built-in computer use abilities.
HIGH CONFIDENCE
Internal codename following OpenAI's tradition of food-related names. Some speculate it references the model's "growth from a seed" architecture.
MEDIUM CONFIDENCE
Expected to support 1 million+ tokens in context, competing with Gemini's long-context capabilities.
MEDIUM CONFIDENCE
Estimated at $10-20 per million input tokens and $30-60 per million output tokens. Mini variant expected at 5-10x cheaper.
SPECULATIVE

| Benchmark | GPT-5.5 Spud (Est.) | GPT-5.4 | Claude Opus 4.7 | Gemini 2.5 Ultra | Llama 4 Behemoth |
|---|---|---|---|---|---|
| MMLU-Pro | ~94.5% | 91.2% | 92.8% | 93.1% | 90.5% |
| HumanEval+ | ~96% | 92.4% | 94.1% | 91.8% | 89.2% |
| MATH-500 | ~98% | 94.8% | 96.2% | 95.5% | 92.1% |
| GPQA Diamond | ~78% | 71.4% | 74.9% | 73.2% | 69.8% |
| SWE-bench Verified | ~72% | 65.3% | 68.7% | 62.1% | 58.4% |
| Agentic Tasks | ~88% | 79.5% | 85.2% | 80.1% | 75.3% |
| Context Length | 1M+ tokens | 128K | 200K | 2M | 512K |
| Price (Input/1M) | ~$15 | $10 | $15 | $12.50 | Open source |
*GPT-5.5 Spud scores are estimates based on leak analysis and trend extrapolation. Actual results may differ.
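To make the pricing estimate concrete, here is a rough sketch that turns the predicted per-million-token ranges ($10-20 input, $30-60 output, Mini 5-10x cheaper) into a cost range for a given workload. The function names and the example token counts are hypothetical, and the prices are the unconfirmed estimates quoted on this page, not published rates.

```python
# Estimated price ranges in USD per million tokens, as quoted above.
# These are predictions, not published prices.
INPUT_PRICE = (10.0, 20.0)
OUTPUT_PRICE = (30.0, 60.0)
MINI_DISCOUNT = (5.0, 10.0)  # "5-10x cheaper"


def cost_range(input_tokens, output_tokens):
    """Return (low, high) estimated cost in USD for the full model."""
    low = (input_tokens * INPUT_PRICE[0] + output_tokens * OUTPUT_PRICE[0]) / 1_000_000
    high = (input_tokens * INPUT_PRICE[1] + output_tokens * OUTPUT_PRICE[1]) / 1_000_000
    return low, high


def mini_cost_range(input_tokens, output_tokens):
    """Return (low, high) for the Mini variant, assuming a flat 5-10x discount."""
    low, high = cost_range(input_tokens, output_tokens)
    # Cheapest case: low price with the largest discount; priciest: the reverse.
    return low / MINI_DISCOUNT[1], high / MINI_DISCOUNT[0]


if __name__ == "__main__":
    # Hypothetical monthly workload: 2M input tokens, 500K output tokens.
    low, high = cost_range(2_000_000, 500_000)
    print(f"Full model: ${low:.2f} to ${high:.2f}")
    mini_low, mini_high = mini_cost_range(2_000_000, 500_000)
    print(f"Mini variant: ${mini_low:.2f} to ${mini_high:.2f}")
```

For the example workload this gives roughly $35 to $70 for the full model and $3.50 to $14 for the Mini variant. The spread comes entirely from the width of the estimated price ranges.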