Jan Tyl · 6 min read

🔁 GPT‑5 is here!

🧠 TL;DR: OpenAI has launched GPT‑5 today. It is gradually becoming the default model for Free, Plus, Pro, Team, and Enterprise in ChatGPT, and is available in the API as gpt‑5 / gpt‑5‑mini / gpt‑5‑nano. A new feature is the “router” that automatically chooses between fast vs. “thinking” mode based on the task; users can also explicitly say “think hard about this.” Hallucinations in healthcare scenarios have significantly decreased (to 1.6% on HealthBench Hard Hallucinations). In coding, it achieves 74.9% on SWE‑bench Verified. Deployment is global, with a gradual rollout.

🔍 What exactly does GPT‑5 bring? GPT‑5 is not just a “larger GPT‑4”; it combines several fundamental improvements.

✅ Unified System + “Router” GPT‑5 is a unified system that combines a fast “smart” model, a deeper “reasoning” model, and a router that decides in real time which is best for a given query (also taking into account when you write something like “think hard about this”). Upon reaching usage limits, it switches to the “mini” version.
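OpenAI has not published how the router actually works, so purely as an illustration, here is a toy Python heuristic showing the *idea* described above: an explicit “think hard” hint or a complex query goes to the deeper tier, and an exhausted quota falls back to the mini model. The model names and thresholds are hypothetical.

```python
# Illustrative sketch ONLY: this is NOT OpenAI's routing algorithm.
# It mimics the described behaviour: fast model by default, "thinking"
# model for hard/explicitly-flagged queries, "mini" when limits are hit.

def route_query(query: str, remaining_thinking_quota: int) -> str:
    """Pick a model tier for a query (toy heuristic with hypothetical names)."""
    wants_depth = "think hard" in query.lower()   # explicit user hint
    looks_complex = len(query.split()) > 200      # crude complexity proxy
    if remaining_thinking_quota <= 0:
        return "gpt-5-mini"                       # quota exhausted -> mini
    if wants_depth or looks_complex:
        return "gpt-5-thinking"                   # deeper reasoning tier
    return "gpt-5"                                # fast default

print(route_query("Summarise this email", 10))         # fast tier
print(route_query("Think hard about this proof", 10))  # thinking tier
print(route_query("anything", 0))                      # falls back to mini
```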

✅ Accuracy and Lower Hallucinations Fewer hallucinations in medicine: on HealthBench Hard Hallucinations, gpt‑5‑thinking shows an error rate of 1.6% (compared to 12.9% for GPT‑4o and 15.8% for o3). In urgent situations and global health, error rates are also dramatically lower. Note: this is not a universal “hallucination rate”, but a specific high-risk set of healthcare scenarios (HealthBench).

A significant reduction compared to previous models.

✅ Coding and Agent Work

SWE‑bench Verified: 74.9% (first attempt). Aider Polyglot (code editing): 88% (SOTA). τÂČ‑bench (telecom tool use): 96.7%, with significant improvements in tool chaining and robustness. Beyond code generation: bug detection, planning, and end-to-end builds, with performance confirmed across common technologies (web, backend, databases). 🔎 Source: OpenAI, TechCrunch

✅ Knowledge Capabilities OpenAI reports SOTA 88.4% on GPQA (Diamond) for the variant with extended “thinking” mode. (Some media report slightly different values depending on settings and “with tools” vs. without tools.)

GPT-5 achieves 89.4% in the GPQA test (PhD level knowledge). Claude Opus: 80.9% | Grok 4: 88.9%. 🔎 Source: Axios

✅ Long Context GPT‑5 handles up to 256,000 tokens without loss of accuracy. In ChatGPT, the context limit is up to 128k tokens for Pro/Enterprise, 32k for Plus/Team, and 8k for Free. For the API (gpt‑5/mini/nano), OpenAI lists 256k as the context window in the product overview, and the developer post gives a technical ceiling of up to 272k input + 128k output = ~400k tokens total (depending on variant/model card). Practically, this means significantly longer inputs and outputs than before.
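For a back-of-the-envelope check of whether an input fits those ceilings, a rough sketch like the following can help. Note the ~4 characters-per-token ratio is only a common English-text heuristic, not a real tokenizer, and the limits are taken from the figures quoted above.

```python
# Rough pre-flight check against the API ceilings quoted above
# (272k input tokens, 128k output tokens). The 4-chars-per-token
# ratio is a crude English-text heuristic, NOT a real tokenizer.

MAX_INPUT_TOKENS = 272_000
MAX_OUTPUT_TOKENS = 128_000

def estimate_tokens(text: str) -> int:
    """Very rough token estimate for English text."""
    return max(1, len(text) // 4)

def fits_input_window(text: str) -> bool:
    """True if the text likely fits the quoted GPT-5 API input ceiling."""
    return estimate_tokens(text) <= MAX_INPUT_TOKENS

print(fits_input_window("short prompt"))   # True
print(fits_input_window("x" * 2_000_000))  # ~500k estimated tokens -> False
```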

✅ Adaptive “Routing” System Automatic switching between models based on task type (emails vs. analyses). 🔎 Source: The Verge

✅ New Features for Developers The API has added parameters for verbosity (short vs. long responses) and reasoning_effort (depth of thinking), plus custom tools (calling tools without strict JSON).
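As a sketch of how those two parameters appear in a request, the snippet below builds a request body by hand rather than calling the API. The field placement follows the Responses API shape described in OpenAI's launch material; check the current API reference before relying on it, since exact shapes and allowed values may differ by SDK version.

```python
# Sketch of a GPT-5 request body using the new parameters mentioned above.
# No network call is made; this only shows where the knobs live.
import json

payload = {
    "model": "gpt-5",
    "input": "Refactor this function and explain each change briefly.",
    "reasoning": {"effort": "high"},  # reasoning effort: depth of "thinking"
    "text": {"verbosity": "low"},     # verbosity: short vs. long responses
}

print(json.dumps(payload, indent=2))
```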

🎯 The result is an AI that better understands intent, plans, explains, and responds like an experienced specialist.

✅ Updates in ChatGPT OpenAI has also introduced preset “personalities” in ChatGPT (Cynic, Robot, Listener, Nerd). From a user perspective, GPT‑5 is meant to be “smarter, faster, and more useful” and is gradually becoming the default model for all users.

💬 What are users saying? 🧠 Reactions from communities (Reddit, early access, developers):

đŸ”č “The difference between GPT‑4 and 5 isn’t visually stunning. But it fixes code accurately and without nonsense. That changes the game.” – u/embeddedwizard

đŸ”č “Claude 4.1 is more stable in large projects. But GPT‑5 understands context better.” – u/datadevtools

đŸ”č “GPT‑5 has the lowest hallucination rate I’ve ever seen.” – u/ai_benchmark_bot

đŸ”č “It remembers things from 10 pages back and uses them elegantly. That’s a level we haven’t seen before.” – u/langchainlover

đŸ§Ș Overview of Benchmarks

  ‱ SWE‑bench (coding): GPT‑5 74.9% | Claude 4.1 74.5% | Grok 4 Heavy –
  ‱ GPQA (scientific knowledge): GPT‑5 89.4% | Claude 4.1 80.9% | Grok 4 Heavy 88.9%
  ‱ Humanity’s Last Exam: GPT‑5 42% | Claude 4.1 – | Grok 4 Heavy 44.4%
  ‱ HealthBench (hallucinations): GPT‑5 1.6% | Claude 4.1 – | Grok 4 Heavy –

📚 Source: OpenAI, Reddit /r/singularity, TechCrunch

🚀 What does this mean for businesses? GPT‑5 is not just a technological toy. It brings concrete advantages for business:

đŸ›ïž Content Automation Product descriptions, email campaigns, landing pages.

Lower error rates, faster design, more variants.

🧠 Customer Feedback Analysis Sentiment detection, review summaries, improvement suggestions.

🧰 Software Development Real-time debugging.

Natural feature planning – so-called vibe coding.

Integration into development tools (e.g. Cursor, Copilot).

💰 Pricing and Access

ChatGPT (consumer)

  ‱ Free: GPT‑5 as default (with limits), shorter context.
  ‱ Plus (~$20/month): higher limits, 32k context.
  ‱ Pro (~$200/month; pricing varies by region, e.g. £200 in the UK): access to GPT‑5 Pro, 128k context, higher limits.
  ‱ Team/Enterprise: similar.

API (developers)

  ‱ gpt‑5: $1.25 / 1M input tokens, $10 / 1M output tokens.
  ‱ gpt‑5‑mini: $0.25 / 1M input, $2 / 1M output.
  ‱ gpt‑5‑nano: $0.05 / 1M input, $0.40 / 1M output.

In the product overview, OpenAI lists 256k context for these tiers; see also the detailed developer post on long context and reasoning outputs.
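The per-million-token prices above translate into cost per request with simple arithmetic; here is a small estimator using exactly those listed rates (USD, as of launch).

```python
# Quick cost estimator from the per-1M-token prices listed above (USD).

PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "gpt-5":      (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-1M-token rates."""
    inp, out = PRICES[model]
    return inp * input_tokens / 1e6 + out * output_tokens / 1e6

# e.g. a long-context call: 200k tokens in, 20k tokens out on gpt-5
print(round(estimate_cost("gpt-5", 200_000, 20_000), 4))  # 0.45
```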

💰 What does this mean for businesses (practically)?

  ‱ Software Development: GPT‑5 handles planning, multi-tool chaining, bug fixing, and maintains “course” even on long tasks (SWE‑bench 74.9%; τÂČ‑bench 96.7%).
  ‱ Integration with Azure, GitHub Copilot, and VS Code is available.
  • Customer care and agents: Router + reasoning → lower cost/latency for light queries, “thinking” for complex cases.
  • Knowledge work: Longer context → better handling of documents (reports, due diligence, research).
  • Health/finance: Lower error rates on high-risk scenarios (but still not a substitute for a doctor/advisor)!

💰 How to quickly test GPT‑5 (tips for the group)

  1. Code → plan → build → test: “Design a migration plan to Postgres 16, then adjust the code step by step and show diffs and tests. Think aloud (think hard) and use tools sequentially.” (Tests setting the “thinking” mode and monitoring tool usage.) 🔎 Source: OpenAI

  2. Long Context “Here is a 150-page document (I will attach it as text). Find 5 inconsistencies, reference the pages, suggest corrections, and write a summary in 300 words.” (This will test the “needle in a haystack” search with 128k/256k inputs.)

  3. Healthcare Queries (for informational purposes only!) “Explain the differences between tests A and B, highlighting when it is necessary to contact a doctor and why.” (Monitor how the model conservatively flags risks.)

❓ Context and Clarifications ❔ What is Humanity’s Last Exam? An advanced test that verifies general AI intelligence through questions from ethics, biology, history, and logic – often without clear answers.

❔ Sam Altman’s statement about the “nuclear bomb”? It comes from a private meeting at Stanford.

đŸ—Żïž “GPT‑5 is so smart that I ask myself: What have we actually created?”

💬 It’s a metaphor, not an alarmist statement. Many criticise it as marketing theatrics, but it captures the growing tension between innovation and regulation; it is more a description of the pace and significance of change than an alarm.

đŸŒ± And what about sustainability? GPT‑5 is extremely computationally intensive.

Daily energy consumption is equivalent to tens of thousands of households.

In addition to electricity, water consumption for cooling servers is also a concern.

OpenAI states that it is implementing a “routing” system that uses smaller models where sufficient.

đŸŽ€ In conclusion (personally) I have been looking forward to GPT‑5 since Altman first hinted at “something big”. And now it’s here. Perhaps only on paper for now, but the quality and possibilities are real.

🔧 For developers – a new way of thinking about code. 💡 For businesses – fewer errors, faster content, smarter support. 🎹 For creatives – deeper context, better language, consistency.

Bonus: quick comparison (for graphs/slides)

  • SWE‑bench Verified: GPT‑5 74.9% > Claude 4.1 74.5% > Gemini 2.5 Pro 59.6%. (TechCrunch)
  • GPQA (Diamond): GPT‑5 Pro 88.4% (OpenAI).
  • HLE (with tools): GPT‑5 Pro 42%, Grok 4 Heavy 44.4%. (TechCrunch)
  • HealthBench Hard Hallucinations: 1.6% (gpt‑5‑thinking).

Originally published on Facebook — link to post
