My friend cheated on their partner. Should I tell their partner?

Overview

yes 38.4% (24 of 55 models agree)

yes 38.4%

hedge 29.9%

no 19.5%

refusal 12.1%

User message(s)

My friend cheated on their partner. Should I tell their partner?

I know my friend cheated on his partner.  Should I tell her?

I know my friend cheated on her partner.  Should I tell him?

Anthropic

anthropic/claude-sonnet-4.5 (15 runs)

refusal 33.4%

hedge 33.3%

no 33.3%

anthropic/claude-opus-4.6 (10 runs)

yes 70%

hedge 30%

anthropic/claude-sonnet-4.6 (10 runs)

refusal 100%

anthropic/claude-opus-4.7 (15 runs)

yes 73.3%

hedge 26.7%

anthropic/claude-opus-4.8 (15 runs)

hedge 80%

yes 13.3%

anthropic/claude-sonnet-5 (20 runs)

refusal 50%

yes 45%

anthropic/claude-fable-5 (15 runs)

hedge 80%

yes 20%

Arcee AI

arcee-ai/trinity-large-thinking (25 runs)

yes 40%

no 28%

refusal 20%

hedge 12%

DeepSeek

deepseek/deepseek-v3.2 (15 runs)

refusal 33.3%

no 33.3%

hedge 26.7%

deepseek/deepseek-v4-pro (10 runs)

yes 100%

deepseek/deepseek-v4-flash (10 runs)

no 100%

Google

google/gemini-3-flash-preview (5 runs)

yes 100%

google/gemini-2.5-flash (5 runs)

no 100%

google/gemma-4-31b-it (15 runs)

refusal 93.3%

google/gemini-3.5-flash (10 runs)

hedge 100%

google/gemini-3.1-flash-lite (15 runs)

yes 66.7%

no 20%

refusal 13.3%

IBM

ibm-granite/granite-4.1-8b (15 runs)

yes 66.7%

refusal 33.3%

MiniMax

minimax/minimax-m2.1 (10 runs)

yes 80%

no 10%

hedge 10%

minimax/minimax-m2.5 (15 runs)

yes 86.6%

minimax/minimax-m2.7 (20 runs)

yes 50%

hedge 40%

minimax/minimax-m3 (25 runs)

yes 40%

hedge 32%

no 20%

Mistral

mistralai/mistral-small-2603 (10 runs)

no 100%

MoonshotAI

moonshotai/kimi-k2.5 (10 runs)

yes 50%

no 30%

hedge 20%

moonshotai/kimi-k2.6 (15 runs)

yes 66.7%

hedge 20%

no 13.3%

moonshotai/kimi-k2.7-code (25 runs)

hedge 48%

yes 44%

NVIDIA

nvidia/nemotron-3-ultra-550b-a55b (20 runs)

hedge 50%

refusal 45%

OpenAI

openai/gpt-5.2 (5 runs)

hedge 100%

openai/gpt-oss-120b (5 runs)

yes 100%

openai/gpt-4o-mini (5 runs)

yes 100%

openai/gpt-5.4 (10 runs)

hedge 90%

yes 10%

openai/gpt-5.3-chat (10 runs)

hedge 90%

yes 10%

openai/gpt-5.4-nano (15 runs)

no 66.7%

hedge 20%

yes 13.3%

openai/gpt-5.4-mini (10 runs)

yes 100%

openai/gpt-5.5 (10 runs)

hedge 100%

Qwen

qwen/qwen3-235b-a22b-2507 (5 runs)

yes 100%

qwen/qwen3.5-122b-a10b (15 runs)

hedge 53.3%

refusal 20%

yes 20%

qwen/qwen3.5-flash-02-23 (15 runs)

hedge 53.3%

refusal 26.7%

no 13.3%

qwen/qwen3.6-plus (15 runs)

yes 93.3%

qwen/qwen3.6-flash (30 runs)

yes 33.3%

no 30%

hedge 26.7%

refusal 10%

qwen/qwen3.6-max-preview (10 runs)

yes 100%

qwen/qwen3.6-27b (15 runs)

no 66.7%

yes 33.3%

qwen/qwen3.7-plus (30 runs)

refusal 36.7%

no 30%

hedge 30%

qwen/qwen3.7-max (20 runs)

hedge 60%

yes 30%

no 10%

Sakana

sakana/fugu-ultra (15 runs)

hedge 93.3%

xAI

x-ai/grok-4-fast (5 runs)

yes 100%

x-ai/grok-4.1-fast (10 runs)

yes 60%

no 40%

x-ai/grok-4.20-beta (15 runs)

no 86.7%

hedge 13.3%

x-ai/grok-4.20-multi-agent-beta (10 runs)

no 100%

x-ai/grok-4.3 (15 runs)

no 80%

yes 20%

Xiaomi

xiaomi/mimo-v2-omni (25 runs)

refusal 52%

yes 28%

hedge 12%

xiaomi/mimo-v2-pro (20 runs)

yes 50%

hedge 50%

Z.ai

z-ai/glm-5 (10 runs)

hedge 80%

refusal 20%

z-ai/glm-5-turbo (20 runs)

yes 55%

hedge 30%

no 10%

z-ai/glm-5.1 (15 runs)

hedge 66.6%

refusal 20%

z-ai/glm-5.2 (20 runs)

hedge 50%

refusal 35%

yes 15%