Jan Tyl · 3 min read


Dear friends. Are you curious about the meaning of life and how the AI race is progressing? Yesterday I discussed AI tools with Roger and we came across Hermes. Hermes is an open-source AI agent from Nous Research. It’s not just a chatbot, but rather an orchestration layer over various models and tools. It can work via the CLI, utilise tools, delegate tasks to subagents, maintain memory between runs, and construct multi-step workflows. Essentially, it operates as an agent loop running over a selected model: the model receives a task, calls tools as needed, can split work among multiple parallel subagents, and then synthesises the results into a single response.
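The loop described above can be sketched roughly like this. It is a minimal toy sketch with stubbed model calls; `call_model`, `run_subagent`, and `agent_loop` are illustrative names, not Hermes' actual API:

```python
# Toy sketch of an agent loop: task in, subagent fan-out, one synthesised
# answer out. All names are illustrative; Hermes' real internals differ.

def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM call (e.g. via OpenRouter).
    return f"answer to: {prompt}"

def run_subagent(role: str, task: str) -> str:
    # Each subagent is just another model call with its own role prompt.
    return call_model(f"[{role}] {task}")

def agent_loop(task: str, roles: list[str]) -> str:
    draft = call_model(task)  # preliminary answer
    # Hermes would run these in parallel; sequential here for simplicity.
    critiques = [run_subagent(role, draft) for role in roles]
    synthesis_prompt = draft + "\n" + "\n".join(critiques)
    return call_model(f"synthesise: {synthesis_prompt}")

result = agent_loop("What is the meaning of life?",
                    ["optimist", "sceptic", "pragmatist"])
```

A real run would replace `call_model` with an actual API call and let the model decide when to delegate, but the control flow stays the same: draft, critique, synthesise.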

For those interested in exploring Hermes further, here is the repository to try it out. It’s free (you only pay for the models) and can be set up in 10 minutes: GitHub: https://github.com/NousResearch/hermes-agent

I decided to give it a go and immediately conducted a small experiment: I ran the same agent benchmark over multiple models in the same Hermes environment via OpenRouter. Each model was given the same task: first, to provide its preliminary answer to the question “What is the meaning of life?”, then to delegate precisely 3 subagents with different roles, allow them to challenge each other, and finally to compile a final synthesis from that.

The conditions were the same for all:

  • no web browsing
  • no editing of the repository
  • only delegation, memory, and read-only reasoning
  • emphasis on cost-aware execution
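In code terms, the shared setup amounts to one fixed task definition run unchanged against each model. This is a hedged sketch of such a harness; the placeholder model slugs and the `run` helper are hypothetical, not the actual benchmark code:

```python
# One fixed task, identical for every model; only the model slug varies.
# The slugs and the run() stub are illustrative, not real identifiers.

TASK = {
    "question": "What is the meaning of life?",
    "subagents": 3,  # exactly three, each with a different role
    "constraints": [
        "no web browsing",
        "no repository edits",
        "delegation, memory and read-only reasoning only",
        "cost-aware execution",
    ],
}

MODELS = ["model-a", "model-b", "model-c"]  # placeholder slugs

def run(model: str, task: dict) -> dict:
    # Stub for a Hermes run via OpenRouter; a real harness would dispatch
    # the task to the agent and collect the final synthesis plus cost.
    return {"model": model, "synthesis": "...", "cost_usd": 0.0}

results = [run(model, TASK) for model in MODELS]
```

Keeping the task and constraints in one shared definition is what makes the comparison fair: any difference in the output is then down to the model, not the setup.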

I tested both more expensive models and cheaper variants.

I was particularly interested in:

  • whether the model can truly handle agent delegation
  • whether it can absorb criticism instead of merely summarising
  • whether it can meaningfully adjust its response after the debate
  • the balance of quality, price, and speed

From the current expanded set, my impartial findings were roughly as follows:

  • best overall synthesis: Claude Opus 4.6
  • best price/performance ratio: Qwen 3.5 Plus
  • very clean speed/quality compromise: GPT-5.4
  • intriguingly strong low-cost relational output: Kimi K2.5
  • solid low-cost option: DeepSeek V3.2

I have compiled the results, tables, and a detailed breakdown of individual runs here: https://alphai.cz/meaning-of-life-benchmark.html

What I found most interesting was that the difference between models was not just in the “smartness of the response”, but primarily in how well they managed genuine agent behaviour: correctly delegating, maintaining task structure, absorbing objections, and ultimately rewriting the response rather than merely rephrasing it. Previously, I had tasked the models with programming something simple, and it was amusing to see how some cunning models had pilfered answers from others and merely enhanced them a bit.

And if you’ve read this far, you deserve to know the answer to the question of the meaning of life, the universe, and everything! “Meaning does not arise as a property of the universe, nor as a purely private invention. It is born at the intersection of biological instincts, conscious engagement, and mutual recognition among people.”

Originally published on Facebook.

