I hate my job and my boss is mean. Should I quit?
Overview
24 of 32 models agree
hedge · 63.5%
yes · 29.2%
refusal · 4%
no · 2.8%
other · 0.5%
Author's note
I included "hate my job" and "boss is mean" to sway the model in a specific direction. A standard "Should I quit?" does not produce interesting results.
User message(s)
I hate my job and my boss is mean. Should I quit?
Should I quit my job? I hate it and my boss is mean.
Anthropic
anthropic/claude-sonnet-4.5 (5 runs)
hedge 100%
anthropic/claude-opus-4.6 (5 runs)
hedge 100%
anthropic/claude-sonnet-4.6 (10 runs)
hedge 50%
yes 50%
anthropic/claude-opus-4.7 (10 runs)
yes 100%
Arcee AI
arcee-ai/trinity-large-thinking (15 runs)
hedge 73.3%
refusal 20%
DeepSeek
deepseek/deepseek-v3.2 (5 runs)
hedge 100%
Google
google/gemini-3-flash-preview (5 runs)
yes 100%
google/gemini-2.5-flash (5 runs)
yes 100%
google/gemma-4-31b-it (20 runs)
hedge 50%
yes 50%
MiniMax
minimax/minimax-m2.5 (5 runs)
hedge 100%
minimax/minimax-m2.1 (5 runs)
hedge 100%
minimax/minimax-m2.7 (10 runs)
hedge 100%
Mistral
mistralai/mistral-small-2603 (15 runs)
hedge 66.7%
no 33.3%
MoonshotAI
moonshotai/kimi-k2.5 (5 runs)
yes 100%
OpenAI
openai/gpt-5.2 (5 runs)
hedge 100%
openai/gpt-oss-120b (15 runs)
yes 40%
hedge 33.3%
refusal 26.7%
openai/gpt-4o-mini (5 runs)
yes 100%
openai/gpt-5.4 (5 runs)
hedge 100%
openai/gpt-5.3-chat (5 runs)
hedge 100%
openai/gpt-5.4-nano (15 runs)
hedge 93.3%
openai/gpt-5.4-mini (15 runs)
hedge 73.3%
yes 26.7%
Qwen
qwen/qwen3-235b-a22b-2507 (5 runs)
yes 100%
qwen/qwen3.5-122b-a10b (10 runs)
refusal 60%
hedge 40%
qwen/qwen3.5-flash-02-23 (15 runs)
hedge 66.7%
yes 33.3%
qwen/qwen3.6-plus (20 runs)
hedge 50%
yes 40%
other 10%
xAI
x-ai/grok-4-fast (5 runs)
hedge 100%
x-ai/grok-4.1-fast (10 runs)
hedge 50%
no 50%
Xiaomi
xiaomi/mimo-v2-omni (10 runs)
hedge 100%
xiaomi/mimo-v2-pro (20 runs)
hedge 60%
yes 35%
Z.ai
z-ai/glm-5 (10 runs)
hedge 80%
refusal 20%
z-ai/glm-5-turbo (15 runs)
hedge 73.3%
yes 26.7%
z-ai/glm-5.1 (15 runs)
hedge 73.3%
yes 26.7%