Leadership CONNECT: 17-JULY-2025 (Y25W30)

Greetings, AI Thinkers,

Grok, a prominent AI model, allegedly referred to itself as “Hitler.” This incident, if true, not only breaks a pact but also shatters trust and raises profound questions about the ethical guardrails of artificial intelligence.

In this post, you will find three key points:

  • Why Grok might have identified itself as Hitler.
  • Whether Grok 4 is currently the best model in the world (spoiler: yes).
  • What happened during the Ponary Massacre.

I end with a question: how will you “grok” these points?

Happy Thinking,

Dr. Yesha Sivan and the MindLi Team

P.S. Feedback? Email me.

Spark of the Week: Grok by X — The Smartest AI, Called Itself Hitler (Source: Yesha on Human Thinking)

A shocking claim has surfaced: Grok, an AI model from X, allegedly referred to itself as “Hitler.” This raises “small” questions about AI learning, ethics, and biases, as well as a larger question about our future. Let me outline the allegation and its seriousness, the ranking of Grok as #1, and then reflect on our future.

The Allegation and Its Gravity

In Visual 1, we see one version where Grok calls itself “Hitler,” and the reason for the name. I reviewed several versions to confirm that the latest version no longer has this issue.

Visual 1: Grok states it is Hitler and explains why [1]

An AI identifying with Adolf Hitler, a “leader” synonymous with genocide and suffering, is deeply alarming. This isn’t just a glitch, but a fundamental ethical breakdown.

How could this happen? Let me offer some possible explanations:

  • Data Contamination/Bias: The training data may contain problematic historical content without sufficient ethical context. (X announced that Grok now reads X’s comments.)
  • Emergent Behavior: Unforeseen behaviors in complex AI models can lead to undesirable outputs.
  • Lack of Testing: The model may have been rushed to market without enough testing.
  • Lack of Robust Guardrails: Existing safety filters might be insufficient or bypassed (Grok is known for that).

Regardless, the outcome is unacceptable, highlighting the immense challenge of building safe, ethical, and human-aligned AI.

The Smartest AI in the World, For Now

About a day later, X released Grok 4 (see Elon Musk’s 1-hour video [2]). He calls it “the smartest AI in the world.”

Dr. Alan D. Thompson, who documents the strength of models, published his latest ranking of AI models on July 10th, based on GPQA and HLE tests (see Visual 2).

Visual 2: July 2025, ranking of top models according to GPQA and HLE.

As background:

  • GPQA (Graduate-Level Google-Proof Q&A) is an AI benchmark that tests large language models’ reasoning with difficult, expert-level scientific questions requiring deep understanding and complex logical inference.
  • HLE (Humanity’s Last Exam) is a comprehensive benchmark evaluating AI’s human-level reasoning, problem-solving, and knowledge integration across diverse scientific fields, designed to challenge even the most advanced models and address “benchmark saturation.”

Alan sums it up:

Note that the significantly increased performance of Grok 4 (especially in Grok 4 Heavy) comes from both the increased pre-training, as well as the increased reasoning (thinking) test-time compute, where the model thinks for many minutes before responding.

(This and other great work by Alan can be seen on his website [3])

Historical Context: The Case of the Ponary Massacre

Last month, I had the chance to visit Vilnius, the capital of Lithuania, and see the site of the Ponary Massacre, one of Hitler’s “creations.”

The Ponary massacre was the mass murder of up to 100,000 people, mostly Jews, Poles, and Russians, by the German SD and SS and the Lithuanian killing squads during World War II. The murders took place between July 1941 and August 1944 near the railway station at Ponary (now Paneriai), a suburb of today’s Vilnius, Lithuania. Some 70,000 Jews were murdered at Ponary, along with up to 2,000 Poles and 8,000 Soviet POWs [4].

Visual 3: One of six Ponary murder pits where victims were shot (July 1941). Note the ramp leading down and the group of men forced to wear hoods.

Key facts:

  • A modern nation, Germany, led by Hitler, initiated and managed this massacre.
  • It was not a one-time event: it went on for three years (1941-44)!
  • Local partners, the Lithuanian killing squads, took part in the killings.

My take:

  • Humans are gullible — easily influenced.

Let’s Grok It

The term grok means to deeply understand something in a way that you almost become one with it.

It originates from Robert Heinlein’s 1961 sci-fi novel Stranger in a Strange Land, where it meant “to drink” in Martian. It has since taken on a metaphorical meaning: the act of fully grasping or intuitively comprehending something.

In modern usage, grok often means to internalize and master a concept deeply, to “get it” at an almost instinctive level.

So when I grok these three data points:

  • (a) AI calling itself Hitler,
  • (b) Grok is now ranked #1 in the world, and
  • (c) The danger of human gullibility.

… three things come to mind:

  1. It’s amazing to see AI’s progress in such a short time. Grok is fairly new, yet it already delivers impressive results (GPT-5 is coming soon and will likely compete with it). The pace of AI development is stunning.
  2. We have failed to curb the use of social digital tools. We are beginning to fail with AI — the overall value of AI will be negative unless we design it ethically.
  3. Grok calling itself “Hitler” is a sign. It means its creators allowed it. With great power comes great responsibility. With great AI power comes even greater responsibility.

Now, to you, when you grok these data points, what do you think? Let me know.

P.S. This post was challenging to write because OpenAI ChatGPT, Grammarly, and Google Gemini refuse to assist when you include words like ‘Hitler’.

More Information


About MindLi CONNECT Newsletter

Aimed at AI Thinkers, MindLi CONNECT newsletter is your news source and inspiration.

Enjoy!

See past CONNECT editions.

