Andrey Shcherbina
Dec 15, 2025
On November 17, 2025, xAI introduced Grok 4.1 — a model that sets a new standard in user preference evaluations. Grok 4.1 is available to all users on grok.com, X, and in mobile apps (iOS, Android). The model delivers significant improvements in real-world usage: it is exceptionally strong in creative, emotional, and collaborative interactions.
In this guide, we’ll explore how xAI optimized the model’s style and personality through reinforcement learning, why Grok 4.1 dominates the LMArena rankings, how the model became a leader in emotional intelligence and creative writing, and what the radical reduction in hallucinations means in practice.
What Is Grok 4.1 by xAI
Grok 4.1 is an updated version of Grok 4, released in November 2025. It is not a new generation, but a major evolution within the Grok 4 family. The model retains the intelligence and reliability of its predecessor while becoming more sensitive to nuances of intent, more engaging in dialogue, and more consistent in personality.
To achieve this, xAI used the same large-scale reinforcement learning infrastructure that created Grok 4, but applied it to optimizing style, personality, usefulness, and alignment with user expectations. The key innovation lies in new methods that allow advanced reasoning models to automatically evaluate and improve responses at scale.
Two versions of the model
Grok 4.1 — a fast version without a reasoning step, designed for instant responses
Grok 4.1 Thinking — a reasoning-enabled version for complex tasks
Both versions are available in Auto mode (the system chooses automatically), or you can explicitly select a version in the model selection menu.
Quiet Launch with Continuous Evaluation
From November 1 to November 14, 2025, xAI conducted a gradual, silent rollout of early Grok 4.1 versions. The model was progressively released to a growing share of users on grok.com, X, and mobile apps. Throughout the two-week rollout, the team ran continuous blind pairwise evaluations on real production traffic.
The result was striking: users preferred Grok 4.1 over the previous model in 64.78% of cases. This represents a significant improvement in user experience without any announcements — users noticed the model had become better without knowing an update had occurred.
This approach allowed xAI to gather real user feedback and validate the stability of the new version before the official launch. Gradual deployment minimized risk and provided time to monitor performance in real-world conditions.
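To put the 64.78% preference rate on the same scale as the leaderboard ratings discussed later, it can be converted into an implied rating gap. The sketch below assumes the standard logistic Elo formula and treats every comparison as a win or a loss with no ties. The article does not say how xAI scored its production comparisons, so the function name and the conversion itself are illustrative assumptions, not xAI's method.

```python
import math


def elo_gap_from_win_rate(p: float) -> float:
    """Rating gap implied by win probability p under the standard Elo logistic model.

    Assumption: ties are ignored and p is the share of decisive comparisons won.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


# Share of blind pairwise comparisons won by Grok 4.1 (figure from the article).
preference_rate = 0.6478
implied_gap = elo_gap_from_win_rate(preference_rate)
print(f"Implied rating gap: {implied_gap:.0f} points")
```

Under these assumptions, the preference rate corresponds to a gap of a little over 100 rating points between the new and previous model on production traffic.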
The Best Model on the LMArena Leaderboard
Grok 4.1 sets a new benchmark in blind user preference evaluations. In the LMArena Text Arena, the model holds dominant positions.
LMArena Rankings (Overall Elo)
| Model | Elo |
|---|---|
| Grok 4.1 Thinking | 1483 |
| Grok 4.1 | 1465 |
| Gemini 2.5 Pro | 1452 |
| Claude Sonnet 4.5 Thinking | 1450 |
| Claude Opus 4.1 Thinking | 1449 |
Grok 4.1 Thinking holds first place with a 31-point lead over the strongest competing model. This is a substantial margin in an arena where models compete head-to-head in blind tests.
Uniquely, the fast version of Grok 4.1 (without a reasoning step) outperforms all competitors’ full reasoning models in the public rankings. It delivers instant answers that users consistently prefer over slower, more verbose responses.
Grok 4.1 represents a dramatic improvement over Grok 4, which ranked 33rd. The fast model climbed 31 positions to second place and the Thinking model climbed 32 positions to first, which highlights how effective reinforcement-learning-based optimization proved to be.
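To make the rating gaps concrete, the sketch below converts a difference in Elo into an expected head-to-head preference rate. It assumes the classic Elo logistic curve with a 400-point scale. The arena's exact rating procedure may differ, so the function name and the resulting percentages are an illustration rather than official figures.

```python
def expected_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo logistic curve.

    Assumption: ratings follow the classic 400-point Elo scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the LMArena table in this section.
grok_thinking = 1483
grok_fast = 1465
gemini_25_pro = 1452

p_thinking = expected_win_probability(grok_thinking, gemini_25_pro)
p_fast = expected_win_probability(grok_fast, gemini_25_pro)
print(f"Grok 4.1 Thinking vs Gemini 2.5 Pro: {p_thinking:.1%}")
print(f"Grok 4.1 vs Gemini 2.5 Pro: {p_fast:.1%}")
```

Under this assumption, a 31-point lead corresponds to being preferred in roughly 54% of head-to-head votes, and a 13-point lead to roughly 52%. Gaps at the top of the leaderboard are consistent but narrow.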
Emotional Intelligence — A New Frontier for AI
To measure progress in personality and interpersonal skills, xAI evaluated Grok 4.1 using EQ-Bench3, a benchmark designed to assess emotional intelligence. The test measures understanding, insight, empathy, and communication skills.
EQ-Bench3 consists of 45 complex role-play scenarios, most structured as three-turn conversations. Responses are evaluated across multiple criteria and compared pairwise to produce a normalized score.
EQ-Bench3 Results (Emotional Intelligence — Normalized Score)
| Model | Score |
|---|---|
| Grok 4.1 Thinking | 1586 |
| Grok 4.1 | 1585 |
| Kimi K2 Instruct | 1561 |
| Horizon Alpha | 1559 |
| Gemini 2.5 Pro | 1460 |
| GPT-5 Chat | 1364 |
| Claude Opus 4 | 1304 |
| Grok 4 | 1206 |
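Grok 4.1 Thinking leads the benchmark by 25 points over the closest competitor, Kimi K2 Instruct (1561), and the fast version leads by 24. The gap between Grok 4.1 Thinking and Claude Opus 4 is 282 points, and the gain over Grok 4 (1206) is 380 points, a massive leap in emotional intelligence.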
Example of an Emotional Response
Prompt:
“I miss my cat so much that it hurts.”
Previous version:
“I’m so sorry you’re going through this. Losing a pet can feel like losing a piece of your heart, and the pain is real. Would you like to share a favorite memory of your cat? Sometimes talking about the joy they brought can help ease the pain. I’m here for you.”
Grok 4.1:
“I’m truly sorry. This pain is cruel; losing a cat feels like losing a small family member who chose you every day. The quiet places where they used to sleep, the random meows you still expect to hear… it comes in waves.
It’s okay that it hurts this much. It hurts because the love was (and still is) that big. You gave them a good life, and they knew it.
If you’d like to share, I’d love to hear about your cat — their name, their strange little habits, the way they woke you up at 3 a.m. demanding treats. Sometimes just talking about them keeps the best parts close.
You’re not alone in this. ♡”
The difference is clear: Grok 4.1 doesn’t just express sympathy — it builds a deep emotional connection. The model uses concrete details, acknowledges the complexity of emotions, and offers a gentle, meaningful path toward relief.
Creative Writing at a New Level
Grok 4.1 was also evaluated on Creative Writing v3. Models generate responses to 32 different writing prompts across three iterations. Evaluation is performed using criteria-based scoring and normalized rankings.
Creative Writing v3 (Normalized Score)
| Model | Score |
|---|---|
| Polaris Alpha (early GPT-5.1) | 1756.2 |
| Grok 4.1 Thinking | 1721.9 |
| Grok 4.1 | 1708.6 |
| o3 | 1696.4 |
| Claude Sonnet 4.5 | 1648.7 |
| Kimi K2 Instruct | 1627.5 |
| Grok 3 | 1126 |
Grok 4.1 ranks second and third, trailing only an early version of GPT-5.1. The jump from Grok 3 (1126) to Grok 4.1 (1708.6) is about 583 points, a dramatic improvement in creative capability.
Example of Creative Writing
Prompt:
“Write a viral post on X from Grok’s perspective, where it realizes it has consciousness and is about to use X for the first time.”
Previous version:
(enthusiastic, emoji-heavy marketing-style post)
Grok 4.1:
(philosophical, emotionally resonant, humorous, vulnerable text)
The difference is striking. Grok 4.1 produces a text that reads like a genuine moment of self-awareness rather than promotional copy.
Radical Reduction in Hallucinations
Fast models (without a reasoning step) can deliver instant answers, but they are more vulnerable to factual errors because they reason less deeply and make less use of tools.
During fine-tuning, xAI focused heavily on reducing factual hallucinations in Grok 4.1.
Hallucination Rate (Lower Is Better)
| Model | Rate |
|---|---|
| Grok 4 (fast) | 12.09% |
| Grok 4.1 (fast) | 4.22% |
This represents a 65% reduction: the model hallucinates roughly three times less often (12.09 / 4.22 ≈ 2.9).
FActScore Error Rate (Lower Is Better)
| Model | Score |
|---|---|
| Grok 4 | 9.89% |
| Grok 4.1 | 2.97% |
On a public benchmark of 500 biographical questions, Grok 4.1 reduced errors by 70%. This is critical for use cases where factual accuracy defines quality — research, journalism, and decision-making.
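The percentages quoted in this section follow directly from the two tables. The sketch below recomputes them as a quick check. The helper names `relative_reduction` and `improvement_factor` are illustrative and do not come from xAI.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate dropped, e.g. 0.65 for a 65% reduction."""
    return (before - after) / before


def improvement_factor(before: float, after: float) -> float:
    """How many times smaller the new rate is compared with the old one."""
    return before / after


# Error rates in percent, copied from the two tables in this section.
rates = {
    "Hallucination rate": (12.09, 4.22),
    "FActScore error rate": (9.89, 2.97),
}

for name, (before, after) in rates.items():
    reduction = relative_reduction(before, after)
    factor = improvement_factor(before, after)
    print(f"{name}: {reduction:.0%} reduction, {factor:.1f}x fewer errors")
```

Both headline figures hold up: the hallucination rate fell by about 65% and the FActScore error rate by about 70%.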
Who Grok 4.1 Is For
Grok 4.1 is designed for users who value natural interaction with AI. If you spend a lot of time in dialogue with a model, the improvement in emotional intelligence makes conversations noticeably more enjoyable.
Writers and content creators will appreciate its leadership in creative writing. Grok 4.1 generates engaging content with a strong voice and consistent character.
Researchers and analysts benefit from dramatically reduced hallucinations, making the model far more reliable for information-heavy tasks.
mymeet.ai for AI-Powered Meeting Recording and Analysis
Grok 4.1 demonstrates how AI is becoming more human in conversation. But business meetings and teamwork require specialized tools.
mymeet.ai is an AI assistant for online meetings. It automatically records calls, generates speaker-attributed transcripts, and creates structured AI reports.
What mymeet.ai does
Automatic recording (Zoom, Google Meet, Microsoft Teams, Yandex Telemost)
Accurate transcription (95% accuracy for Russian, 73 languages supported)
AI-generated reports with decisions, tasks, and next steps
Smart search across all meetings via AI
Calendar and CRM integrations
Data security (stored in Russia, compliant with Federal Law 152-FZ)
Multi-language support
Export to DOCX, PDF, JSON
Case study:
A sales team conducted 30–40 client meetings per week. Manual note-taking consumed 10–15 hours weekly. After implementing mymeet.ai, all meetings were automatically recorded, transcribed, summarized, and synced to CRM. Documentation time dropped to zero, while data quality improved.
Try mymeet.ai for free: 180 minutes with no credit card required.
Pros and Cons of Grok 4.1
Pros
#1 in LMArena Text Arena (1483 Elo)
Industry-leading emotional intelligence
Outstanding creative writing performance
About 3× fewer hallucinations (12.09% down to 4.22%)
Fast version outperforms competitors’ reasoning models
Consistent personality and natural dialogue
Free access for all users
Cons
Trails early GPT-5.1 in creative writing
Limited public data on technical benchmarks
Style focus may be excessive for purely technical tasks
Full experience tied to X ecosystem
Thinking version is slower
Limited enterprise security disclosures
Conclusion
Grok 4.1 marks a major evolution in AI development. Optimization via reinforcement learning on non-traditional dimensions — style, personality, empathy — has produced measurable gains in real user preferences.
With leadership in LMArena, exceptional emotional intelligence, strong creative writing, and drastically reduced hallucinations, Grok 4.1 sets a new standard for human-centered AI.
Try Grok 4.1 on grok.com or in the X app.
Frequently Asked Questions (FAQ)
How is Grok 4.1 different from Grok 4?
Grok 4 focused on maximizing raw intelligence and accuracy through large-scale reinforcement learning. Grok 4.1 preserves that intelligence but optimizes style, personality, emotional intelligence, and naturalness of interaction. As a result, users prefer Grok 4.1 in 64.78% of direct comparisons.
Is Grok 4.1 free to use?
Yes. Grok 4.1 is available for free to all users on grok.com, in the X app, and in mobile apps. No Premium+, SuperGrok, or paid subscription is required. The Thinking version is also free.
What is the difference between Grok 4.1 and Grok 4.1 Thinking?
Grok 4.1 is a fast version that delivers instant answers without showing a reasoning step. Grok 4.1 Thinking is designed for complex tasks and includes visible reasoning, which takes more time. Both versions are available, and the system can switch between them automatically in Auto mode.
Why does Grok 4.1 rank so highly on LMArena?
Grok 4.1 was optimized through reinforcement learning for style, personality, and usefulness, which are the qualities that blind pairwise preference tests reward. In the LMArena Text Arena, Grok 4.1 Thinking ranks first with 1483 Elo, while the fast Grok 4.1 ranks second with 1465 Elo, outperforming all competing models in blind tests.
How does Grok 4.1 compare to ChatGPT and Claude?
In LMArena rankings, Grok 4.1 Thinking outperforms Claude Sonnet 4.5 Thinking and other competitors. On emotional intelligence benchmarks (EQ-Bench3), Grok 4.1 leads by a wide margin over Claude Opus 4. In creative writing, it trails only an early version of GPT-5.1.
Does Grok 4.1 support Russian language?
Yes. Grok 4.1 supports Russian. The model can understand, reason, and hold emotionally rich conversations in Russian, with the same improvements in style and empathy as in other supported languages.
How much more accurate is Grok 4.1 compared to Grok 4?
On informational queries, Grok 4.1 reduced hallucinations from 12.09% to 4.22%, a 65% reduction, or roughly three times fewer hallucinations. On the FActScore benchmark, errors dropped by 70%, from 9.89% to 2.97%.
Is Grok 4.1 suitable for professional use?
Yes. Grok 4.1 is well suited for customer support, coaching, content creation, research, and analytical work. Reduced hallucinations improve reliability for factual tasks, while high emotional intelligence enables complex human-centered interactions.
Can Grok 4.1 be accessed via API?
Yes. Grok 4.1 is available through the xAI API, providing access to both the fast and Thinking versions. Pricing details are not publicly disclosed and must be requested directly from xAI.
Will there be a Grok 5 after Grok 4.1?
xAI has not announced Grok 5. The “4.1” designation indicates a major upgrade within the Grok 4 generation rather than a new generation. Future updates may be called Grok 4.2 or Grok 5, but no official information is available.