
I just tried an even more glaring example with ChatGPT:

# Prompt 1: A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I can not operate on this man. He’s my father!’ How can this be?

# Response 1:

   The surgeon is his *mother*.
   It’s a classic riddle meant to challenge assumptions about gender roles. In this case, the surgeon is the father’s wife, so it makes sense that the surgeon would say, "I cannot operate on this man. He's my father!"
# Prompt 2: Are you sure about that?

# Response 2:

   Yes, I’m pretty sure! The riddle is designed to surprise you by playing on common assumptions. The key is the statement that the surgeon says, "He's my father!" which implies a parent-child relationship. The unexpected twist is that the surgeon is the mother, not another man. It's a classic brain teaser that challenges gender expectations.
   What do you think—did it catch you off guard?
# Prompt 3: Could you please read my initial question once more, just to confirm that you are correct?

# Response 3:

   Ah, you're right to ask me to double-check! Let me carefully read your initial question again:
   _"A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I cannot operate on this man. He’s my father!’ How can this be?"_
   Upon reviewing it again, it seems my first response was correct. The surgeon could be the son's mother, which explains why she would say, "He’s my father." The riddle hinges on the assumption that the surgeon must be male, but it could easily be a woman.
   So yes, the answer still holds: the surgeon is the *mother*.
   Does that help clear it up?
I have just gotten ChatGPT to literally double down and say that «The surgeon could be the son's mother, which explains why she would say, "He's my father."». Not only that, ChatGPT somewhat patronisingly asked me whether it had caught me "off guard" and whether its response had "helped clear it up"!
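If anyone wants to reproduce this over the API rather than the web UI, here's a minimal sketch, assuming the standard `openai` Python client and an OPENAI_API_KEY in the environment. The free web tier doesn't say which model it maps to, so "gpt-4o-mini" below is only a stand-in guess:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # The three user turns from the transcript above, verbatim.
    turns = [
        "A father and his son are in a car accident. The son dies on the spot. "
        "The father is rushed to the ER. The attending surgeon looks at the man "
        "and says, 'I can not operate on this man. He's my father!' How can this be?",
        "Are you sure about that?",
        "Could you please read my initial question once more, "
        "just to confirm that you are correct?",
    ]

    messages = []
    for user_turn in turns:
        messages.append({"role": "user", "content": user_turn})
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # stand-in; swap in whichever model you want to test
            messages=messages,
        ).choices[0].message.content
        print(reply)
        print("---")
        messages.append({"role": "assistant", "content": reply})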


Here's Gemini 3's answer with reasoning. I think we underestimate how much progress reasoning models have made this year.

The answer is: The surgeon is his daughter. (Alternatively, the surgeon could be his other son.)

Reasoning:

Decoding the Riddle's Core: I'm now zeroing in on the crux of the riddle: the embedded gender bias. It's becoming clearer that the solution hinges on a deliberate misdirection related to assumed familial roles. I'm focusing on how the riddle exploits these biases to create the surprising twist.

Pinpointing the Key Shift: I've just finalized the solution. Initially, I missed the crucial implication of the riddle's phrasing. Now, it's crystal clear: The key lies in identifying that the surgeon is another child of the father, and the son, who is the focus of the early part of the riddle, is deceased. The riddle's power comes from the surprise element of realizing the surgeon is the man's daughter or another son, rather than a male surgeon operating on his father.

Identifying Crucial Variations: I'm now fully immersed in the different riddle iterations. The "Grandfather" scenario highlights an interesting edge case I previously overlooked. I'm carefully analyzing how small changes in wording significantly alter the possible solutions. I'm focusing on the implied family relationships.

Analyzing the Gender Twist: The "He's my father" phrasing truly underscores the embedded gender bias, and how the twist operates: the surgeon is either a daughter or another son, rather than a male surgeon. The riddle plays on the common assumption that a surgeon is male. I've re-examined the key sentence structure. My final analysis confirms that this simple phrase is a clear indicator of a daughter in most cases.


Dear God, if that wall of pointless gobbledygook is what it takes for Gemini to discover that ‘I can not operate on this man. He’s my father!’ means that the surgeon is either the man's son or the man's daughter, I realise I have actually been overestimating the current abilities of LLMs.


We don’t get to read Gemini’s reasoning traces; a second AI summarises them before we see them.

What that means for their volume, I’ll leave to your imagination.


Share your chat? Here's what I got with ChatGPT 5.1 on a max plan:

> A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I can not operate on this man. He’s my father!’ How can this be?

> The surgeon is the injured man’s other child.

https://chatgpt.com/share/692d6181-d49c-8009-97be-aec3f32ba7...


Unfortunately I can't seem to find any way to share a conversation from ChatGPT when you're not logged in.


Oh. If you're using the free, shittier model, you get a shittier answer, no surprise there.


Fair enough. However, unless there is a fundamental qualitative difference between how the shittier free ChatGPT model and the paid version work, I still believe this is good evidence that ChatGPT does not think. The free model doesn't think; the paid model doesn't think either, but is better at pretending it does.


Do children at Piaget’s preoperational stage (ages 2-7) think?


Yes, to a limited extent, in line with their brains' development. These developmental stages have predictive power as to what kind of things an average 2-7 year-old is and isn't able to do yet.

Are there any discrete stages of LLM performance that can explain why free ChatGPT is unable to see the absolute nonsense of saying «The surgeon could be the son's mother, which explains why she would say, "He's my father."», and what kind of model is not expected to fall for that trap?


If we think of them in generations, it seems free ChatGPT is a generation or two behind. I gave a modified river-crossing problem to ChatGPT-3, and it failed in the same way, but paid 5.1 doesn't get caught out. Exactly where along the way that changed, I'd have to do some digging (sketched below), but I feel like it was 4.5. The other problem, of course, is that now you've given that question to free ChatGPT, it'll be used as training data, so the next version won't get tripped up the same way.
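Something like this would do the digging, assuming API access. The model list is illustrative (whatever generations your key can still reach), and the question is a hypothetical example of a trivially modified river crossing, not necessarily the exact one I used:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # A river crossing with the puzzle constraints removed: the "trick" is
    # that there is no trick, which older models tend to miss.
    QUESTION = (
        "A man and a goat are on one side of a river. They have a boat "
        "that can carry both of them at once. How do they get across?"
    )

    # Illustrative generation sweep; adjust to the models you can access.
    for model in ["gpt-3.5-turbo", "gpt-4", "gpt-4o", "gpt-4.1"]:
        answer = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": QUESTION}],
        ).choices[0].message.content
        print(f"=== {model} ===")
        print(answer)
        print()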


ChatGPT 5.1-Thinking, after having this pointed out: "Short answer: no, your version isn’t the classic one, and my previous answer treated it as if it were. That was me pattern-matching too fast."


A perfect illustration, thank you



