đ GPTâ5 is here!
đ GPTâ5 is here! đ§ TL;DR: OpenAI has launched GPTâ5 today. It is gradually becoming the default model for Free, Plus, Pro, Team, and Enterprise in ChatGPT, and is available in the API as gptâ5 / gptâ5âmini / gptâ5ânano. A new feature is the 'router' that automatically

đ GPTâ5 is here!
đ§ TL;DR: OpenAI has launched GPTâ5 today. It is gradually becoming the default model for Free, Plus, Pro, Team, and Enterprise in ChatGPT, and is available in the API as gptâ5 / gptâ5âmini / gptâ5ânano. A new feature is the ârouterâ that automatically chooses between fast vs. âthinkingâ mode based on the task; users can also explicitly say âthink hard about this.â Hallucinations in healthcare scenarios have significantly decreased (to 1.6% on HealthBench Hard Hallucinations). In coding, it achieves 74.9% on SWEâbench Verified. Deployment is global, with a gradual rollout.
đ What exactly does GPTâ5 bring? GPTâ5 is not just a âlarger GPTâ4â. It is a combination of several fundamental improvements: Unified System + âRouterâ GPTâ5 is a unified system that combines a fast âsmartâ model, a deeper âreasoningâ model, and a router that decides in real-time what is best for a given query (also taking into account when you write something like âthink hard about thisâ). Upon reaching limits, it switches to the âminiâ version.
â Accuracy and Lower Hallucinations Fewer hallucinations in medicine - On HealthBench Hard Hallucinations, gptâ5âthinking shows an error rate of 1.6% (compared to 12.9% for GPTâ4o and 15.8% for o3). In urgent situations and global health, infractions are also dramatically lower. Note: this is not a universal âhallucination rateâ, but rather a specific high-risk set of healthcare scenarios (HealthBench).
A significant reduction compared to previous models.
â Coding and Agent Work
SWEâbench Verified: 74.9% (1st attempt). Aider Polyglot (code-editing): 88% (SOTA). ÏÂČâbench (telecom tool-use): 96.7%, significant improvement in tool chaining and robustness. Not just code generation, but also bug detection, planning, end-to-end builds. Performance confirmed across common technologies (web, backend, databases). đ Source: OpenAI, TechCrunch
â Knowledge Capabilities OpenAI reports SOTA 88.4% on GPQA (Diamond) for the variant with extended âthinkingâ mode. (Some media report slightly different values depending on settings and âwith toolsâ vs. without tools.)
GPT-5 achieves 89.4% in the GPQA test (PhD level knowledge). Claude Opus: 80.9% | Grok 4: 88.9%. đ Source: Axios
â Long Context GPTâ5 handles up to 256,000 tokens without loss of accuracy. In ChatGPT, the context limit is up to 128k tokens for Pro/Enterprise, 32k for Plus/Team, and 8k in Free. For the API (gptâ5/mini/nano), OpenAI lists 256k as the context dimension in the product overview, and in the developer post, a technical ceiling of up to 272k input + 128k output = ~400k total (depending on variant/model-card). Practically, this means significantly longer inputs and outputs than before.
â Adaptive âRoutingâ System Automatic switching between models based on task type (emails vs. analyses). đ Source: The Verge
â New Features for Developers The API has added parameters for verbosity (short vs. long responses) and reasoning_effort (depth of thinking), plus custom tools (calling tools without strict JSON).
đŻ The result is an AI that better understands intent, plans, explains, and responds like an experienced specialist.
â Updates in ChatGPT OpenAI has also introduced preset âpersonalitiesâ in ChatGPT (Cynic, Robot, Listener, Nerd). From a user perspective, GPTâ5 is meant to be âsmarter, faster, and more usefulâ and is gradually becoming the default model for all users.
đŹ What are users saying? đ§ Reactions from communities (Reddit, early access, developers):
đč âThe difference between GPTâ4 and 5 isnât visually stunning. But it fixes code accurately and without nonsense. That changes the game.â â u/embeddedwizard
đč âClaude 4.1 is more stable in large projects. But GPTâ5 understands context better.â â u/datadevtools
đč âGPTâ5 has the lowest hallucination rate Iâve ever seen.â â u/ai_benchmark_bot
đč âIt remembers things from 10 pages back and uses them elegantly. Thatâs a level we havenât seen before.â â u/langchainlover
đ§Ș Overview of Benchmarks Area GPTâ5 Claude 4.1 Grok 4 Heavy SWEâbench (coding) 74.9% 74.5% â GPQA (scientific knowledge) 89.4% 80.9% 88.9% Humanityâs Last Exam 42% â 44.4% HealthBench (hallucinations) 1.6% â â
đ Source: OpenAI, Reddit /r/singularity, TechCrunch
đ What does this mean for businesses? GPTâ5 is not just a technological toy. It brings concrete advantages for business:
đïž Content Automation Product descriptions, email campaigns, landing pages.
Lower error rates, faster design, more variants.
đ§ Customer Feedback Analysis Sentiment detection, review summaries, improvement suggestions.
đ§° Software Development Real-time debugging.
Natural feature planning â so-called vibe coding.
Integration into development tools (e.g. Cursor, Copilot).
đ° Pricing and Access ChatGPT (consumer) Free: GPTâ5 as default (with limits), shorter context. Plus (~$20/month): higher limits, 32k context. Pro (~$200/month, price varies by region; in the UK itâs ÂŁ200 on the page): access to GPTâ5 Pro and 128k context, higher limits. Team/Enterprise similarly.
API (developers)
gptâ5: $1.25/M input tokens, $10/M output. gptâ5âmini: $0.25/M in, $2/M out. gptâ5ânano: $0.05/M in, $0.40/M out.
In the product overview, OpenAI lists 256k context for these tiers; see also the detailed developer post on long context and reasoning outputs.
đ°What does this mean for businesses (practically)
- Software Development: GPTâ5 handles planning, multi-tool chaining, bug fixing, and maintains âcourseâ even on long tasks (SWEâbench 74.9%; ÏÂČâbench 96.7%).
- Integration in Azure/GitHub Copilot/VS Code is complete.
- Customer care and agents: Router + reasoning â lower cost/latency for light queries, âthinkingâ for complex cases.
- Knowledge work: Longer context â better handling of documents (reports, due diligence, research).
- Health/finance: Lower error rates on high-risk scenarios (but still not a substitute for a doctor/advisor)!
đ°How to quickly test GPTâ5 (tips for the group)
-
Code â plan â build â test âDesign a migration plan to Postgres 16, then adjust the code step by step and show diffs and tests. Think aloud (think hard) and use tools sequentially.â (Setting the âthinkingâ mode and monitoring tool usage.) OpenAI
-
Long Context âHere is a 150-page document (I will attach it as text). Find 5 inconsistencies, reference the pages, suggest corrections, and write a summary in 300 words.â (This will test the âneedle in a haystackâ search with 128k/256k inputs.)
-
Healthcare Queries (for informational purposes only!) âExplain the differences between tests A and B, highlighting when it is necessary to contact a doctor and why.â (Monitor how the model conservatively flags risks.)
â Context and Clarifications â What is Humanityâs Last Exam? An advanced test that verifies general AI intelligence through questions from ethics, biology, history, and logic â often without clear answers.
â Sam Altmanâs statement about the ânuclear bombâ? It comes from a private meeting at Stanford.
đŻïž âGPTâ5 is so smart that I ask myself: What have we actually created?â
đŹ Itâs a metaphor, not an alarmist statement. Many criticise it as marketing dramatism. However, it captures the growing tension between innovation and regulation. It is more a description of the pace and significance of change than an âalarmist messageâ.
đ± And what about sustainability? GPTâ5 is extremely computationally intensive.
Daily energy consumption is equivalent to tens of thousands of households.
In addition to electricity, water consumption for cooling servers is also a concern.
OpenAI states that it is implementing a âroutingâ system that uses smaller models where sufficient.
đ€ In conclusion (personally) I have been looking forward to GPTâ5 since Altman first hinted at âsomething bigâ. And now itâs here. Perhaps only on paper for now, but the quality and possibilities are real.
đ§ For developers â a new way of thinking about code. đĄ For businesses â fewer errors, faster content, smarter support. đš For creatives â deeper context, better language, consistency.
Bonus: quick comparison (for graphs/slides)
- SWEâbench Verified: GPTâ5 74.9% > Claude 4.1 74.5% > Gemini 2.5 Pro 59.6%. (TechCrunch)
- GPQA (Diamond): GPTâ5 Pro 88.4% (OpenAI).
- HLE (with tools): GPTâ5 Pro 42%, Grok 4 Heavy 44.4%. (TechCrunch)
- HealthBench Hard Hallucinations: 1.6% (gptâ5âthinking).
Originally published on Facebook â link to post
PĆŻvodnĂ zdroj: facebook