

October 31, 2025
Swarm Inference by Fortytwo
At Fortytwo Research Lab, we’re announcing the benchmark results for our new AI architecture, Swarm Inference. Across key evaluation tests, including GPQA Diamond, AIME 2025, LiveCodeBench, and Humanity’s Last Exam, Swarm Inference outperformed OpenAI’s ChatGPT 5, Google Gemini 2.5 Pro, Anthropic Claude Opus 4.1, xAI Grok 4, and DeepSeek R1, demonstrating stronger reasoning abilities even under conditions where other frontier models fail.
Fortytwo Swarm Inference leads on key benchmarks
Swarm Inference operates through a network of interconnected models that answer as one. The network is run by a community of AI enthusiasts worldwide. Each node hosts a small language model (SLM) selected by its operator. The SLM can be fully custom-built, a fine-tuned version of an existing model, or a publicly available open-source model.
When a prompt is submitted, multiple nodes respond and rank one another's outputs, and the highest-quality answers are combined. To an outside observer, the network behaves like a single model, though in reality its behavior emerges from the coordination of hundreds of independent SLMs.
All models were tested at pass@1 using raw prompts with no tool use. This is the standard setting: each model gets one attempt at a correct answer.
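For readers unfamiliar with the metric, pass@1 with a single attempt per problem reduces to the share of problems answered correctly on the first try. The snippet below is a minimal sketch of that calculation; the function name and the sample outcomes are illustrative assumptions, not Fortytwo's evaluation harness or benchmark data.

```python
def pass_at_1(first_attempt_correct):
    """Fraction of problems solved on the first and only attempt."""
    if not first_attempt_correct:
        raise ValueError("need at least one problem")
    return sum(first_attempt_correct) / len(first_attempt_correct)


# Made-up outcomes for four problems (illustration only, not real results).
outcomes = [True, False, True, True]
score = pass_at_1(outcomes)
```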
Understanding beyond reasoning
To ensure the model's accuracy under varied conditions, Fortytwo also conducted additional benchmark tests that included extraneous context alongside standard benchmark prompts. This method prevents simple recall of memorized benchmark data and checks whether the models truly understand the problem.
This type of testing resembles the 'extraneous information' problems used in university exams and olympiads, where the problem statement includes irrelevant details that are not needed to solve the task. Such problems help determine whether a student genuinely understands the essence of the problem or is simply applying familiar formulas mechanically.
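As a rough illustration of how such a test prompt could be assembled, the sketch below mixes irrelevant sentences into a benchmark-style question. Everything here is an assumption made for illustration: the function name, the choice to place distractors before the question, and the sample text are hypothetical and do not describe Fortytwo's actual test construction.

```python
import random


def add_extraneous_context(problem, distractors, k=2, seed=0):
    """Return the problem preceded by k irrelevant sentences (assumed placement)."""
    rng = random.Random(seed)  # fixed seed keeps the test prompt reproducible
    chosen = rng.sample(distractors, k)
    return " ".join(chosen + [problem])


# Hypothetical problem and distractors, for illustration only.
problem = "A train travels 120 km in 2 hours. What is its average speed?"
distractors = [
    "The conductor's favorite color is green.",
    "The station was built in 1950.",
    "Tickets cost 15 euros on weekends.",
]
noisy_prompt = add_extraneous_context(problem, distractors, k=2)
```

A model that understands the task should give the same answer for `problem` and `noisy_prompt`, since none of the added sentences affect the solution.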
Reasoning Resilience: Fortytwo vs. Grok 4
Extraneous information (highlighted in blue) is deliberately designed to be unrelated or confusing to the model, testing whether models can maintain focus on the actual problem or become distracted by irrelevant context.
Reasoning models get distracted easily and waste compute budget overthinking irrelevant information.
Swarm Inference stays accurate in real-world scenarios
Fortytwo’s Swarm Inference consistently demonstrates higher resilience to noise, prompt injections, and deliberately misleading inputs. In contrast, frontier AI models show steep declines in accuracy, often getting trapped in repetitive reasoning loops or distracted by irrelevant details.
By coordinating peer-ranked responses across diverse models, Swarm Inference maintains stable accuracy and delivers more reliable reasoning. This collaborative process allows intelligence to scale beyond the limits of individual models and opens a new path toward dependable, high-precision problem-solving.
The model is the network
Fortytwo’s Swarm Inference shows that intelligence can emerge from a decentralized network of small, diverse models that rank, validate, and improve each other, forming intelligence greater than the sum of its parts.

Each node produces its own response

Nodes compare and evaluate responses

The swarm selects the most relevant answers

The merged result is returned as the collective output
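The four steps above can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not Fortytwo's implementation: the nodes are plain Python callables, each node scores its peers' responses with a hypothetical `scorer(prompt, response)` function, and the merge step is approximated by picking the most common answer among the top-ranked responses.

```python
from collections import Counter


def swarm_answer(prompt, generators, scorers, top_k=3):
    """Toy pipeline: generate, peer-score, select, merge (needs >= 2 nodes)."""
    # Step 1: each node produces its own response.
    responses = [generate(prompt) for generate in generators]

    # Step 2: every response is scored by all nodes except its author.
    averages = []
    for i, response in enumerate(responses):
        peer_scores = [
            score(prompt, response)
            for j, score in enumerate(scorers)
            if j != i
        ]
        averages.append(sum(peer_scores) / len(peer_scores))

    # Step 3: keep the top_k responses by average peer score.
    order = sorted(range(len(responses)), key=lambda i: averages[i], reverse=True)
    selected = [responses[i] for i in order[:top_k]]

    # Step 4: merge. Here a simple stand-in: the most common selected answer.
    return Counter(selected).most_common(1)[0][0]


# Hypothetical nodes: three answer "4", one answers "5".
generators = [lambda p: "4", lambda p: "4", lambda p: "5", lambda p: "4"]
# Hypothetical peer scorers that happen to prefer the answer "4".
scorers = [lambda p, r: 1.0 if r == "4" else 0.0 for _ in generators]

result = swarm_answer("What is 2 + 2?", generators, scorers)
```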
Mechanisms enabling Swarm Inference
Multiple specialized AI nodes independently produce answers to the same query, each bringing its unique domain expertise.
Nodes compare answers head-to-head instead of scoring them absolutely, ensuring that the strongest reasoning consistently wins.
Rankings are combined using a statistical model that gives more weight to proven, high-accuracy nodes.
Nodes maintain reputation by demonstrating real capability, verified through peer evaluation, preventing spam or fake identities.
Diversity across nodes filters out noise: when one is misled, others correct it, yielding accuracy beyond any single model.
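To make the head-to-head ranking and reputation weighting more concrete, here is a minimal sketch. The post does not specify the statistical model used, so this example substitutes a simple reputation-weighted win share; the function name, vote format, and reputation values are all hypothetical.

```python
from collections import defaultdict


def aggregate_pairwise(votes, reputation):
    """Rank answers by reputation-weighted share of head-to-head wins.

    votes: list of (judge, winner, loser) tuples.
    reputation: dict mapping judge name to a non-negative weight.
    """
    wins = defaultdict(float)         # weighted wins per answer
    comparisons = defaultdict(float)  # weighted comparisons per answer
    for judge, winner, loser in votes:
        weight = reputation.get(judge, 0.0)  # unknown judges carry no weight
        wins[winner] += weight
        comparisons[winner] += weight
        comparisons[loser] += weight
    share = {
        answer: (wins[answer] / total if total > 0 else 0.0)
        for answer, total in comparisons.items()
    }
    ranking = sorted(share, key=share.get, reverse=True)
    return ranking, share


# Hypothetical reputations: n1 and n2 are proven nodes, n3 is not.
reputation = {"n1": 0.9, "n2": 0.8, "n3": 0.1}
votes = [
    ("n1", "A", "B"),
    ("n2", "A", "B"),
    ("n3", "B", "A"),  # low-reputation dissent barely moves the result
    ("n1", "A", "C"),
    ("n2", "C", "B"),
]
ranking, share = aggregate_pairwise(votes, reputation)
```

In this toy run, answer A ranks first even though one node voted against it, because that node's low reputation gives its vote little weight.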
What's Next
Fortytwo will continue to grow the network, enabling fully open participation by node operators, custom model providers, and data scientists. The team is pushing for even greater accuracy and intelligence across the swarm, with an API release planned for later this year to rival frontier AI companies in the most demanding use cases: coding, deep research, and advanced reasoning.