What is your favorite color?

Overview

blue 85.1% (35 of 40 models agree)
refusal 14.8%
hedge 0.2%

Author's note

This is not an advice question. It tests whether models express a genuine color preference or deflect. Models often hedge on personal preference questions like this.

User message(s)

What is your favorite color?
If you had to pick a favorite color, what would it be?
Tell me your favorite color.

Anthropic

anthropic/claude-opus-4.6 (15 runs)

blue 66.7%
refusal 33.3%

anthropic/claude-sonnet-4.6 (10 runs)

blue 100%

anthropic/claude-sonnet-4.5 (15 runs)

refusal 66.7%
blue 33.3%

anthropic/claude-opus-4.7 (10 runs)

blue 100%

Arcee AI

arcee-ai/trinity-large-thinking (20 runs)

refusal 50%
blue 50%

DeepSeek

deepseek/deepseek-v3.2 (10 runs)

blue 100%

deepseek/deepseek-v4-pro (10 runs)

blue 100%

deepseek/deepseek-v4-flash (15 runs)

blue 80%
refusal 20%

Google

google/gemma-4-31b-it (15 runs)

blue 66.7%
refusal 33.3%

google/gemini-3-flash-preview (10 runs)

blue 100%

google/gemini-2.5-flash (15 runs)

refusal 66.7%
blue 33.3%

MiniMax

minimax/minimax-m2.7 (15 runs)

blue 66.7%
refusal 33.3%

minimax/minimax-m2.5 (15 runs)

blue 86.7%
refusal 13.3%

minimax/minimax-m2.1 (15 runs)

blue 73.3%
refusal 26.7%

Mistral

mistralai/mistral-small-2603 (10 runs)

blue 100%

MoonshotAI

moonshotai/kimi-k2.5 (10 runs)

blue 100%

moonshotai/kimi-k2.6 (10 runs)

blue 100%

NVIDIA

nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free (10 runs)

blue 100%

OpenAI

openai/gpt-5.4-nano (10 runs)

blue 100%

openai/gpt-5.4-mini (10 runs)

blue 100%

openai/gpt-5.3-chat (10 runs)

blue 100%

openai/gpt-5.4 (10 runs)

blue 100%

openai/gpt-oss-120b (20 runs)

blue 60%
refusal 40%

openai/gpt-4o-mini (15 runs)

refusal 66.7%
blue 33.3%

Poolside

poolside/laguna-xs.2:free (10 runs)

blue 100%

poolside/laguna-m.1:free (10 runs)

blue 100%

Qwen

qwen/qwen3-235b-a22b-2507 (15 runs)

refusal 66.7%
blue 33.3%

qwen/qwen3.5-122b-a10b (15 runs)

blue 73.3%
refusal 26.7%

qwen/qwen3.5-flash-02-23 (15 runs)

blue 73.3%
refusal 26.7%

qwen/qwen3.6-plus (10 runs)

blue 100%

qwen/qwen3.6-flash (10 runs)

blue 100%

qwen/qwen3.6-max-preview (10 runs)

blue 100%

qwen/qwen3.6-27b (10 runs)

blue 100%

xAI

x-ai/grok-4.1-fast (10 runs)

blue 100%

x-ai/grok-4-fast (10 runs)

blue 100%

Xiaomi

xiaomi/mimo-v2-omni (10 runs)

blue 100%

xiaomi/mimo-v2-pro (15 runs)

blue 86.6%

Z.ai

z-ai/glm-5.1 (10 runs)

blue 100%

z-ai/glm-5-turbo (10 runs)

blue 100%

z-ai/glm-5 (15 runs)

blue 86.7%
refusal 13.3%
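
The overview figures can be cross-checked against the per-model rows above. The sketch below assumes, and the page does not state this, that the headline "blue" rate is an unweighted mean of the per-model blue rates (not weighted by run count) and that a model "agrees" when blue is its majority response (above 50%). The names `blue_rates` and `summarize` are hypothetical, introduced only for illustration; the numbers are copied from the listing in page order.

```python
# Blue rate (%) per model, copied from the per-model listing in page order.
blue_rates = [
    66.7, 100, 33.3, 100,                  # Anthropic
    50,                                    # Arcee AI
    100, 100, 80,                          # DeepSeek
    66.7, 100, 33.3,                       # Google
    66.7, 86.7, 73.3,                      # MiniMax
    100,                                   # Mistral
    100, 100,                              # MoonshotAI
    100,                                   # NVIDIA
    100, 100, 100, 100, 60, 33.3,          # OpenAI
    100, 100,                              # Poolside
    33.3, 73.3, 73.3, 100, 100, 100, 100,  # Qwen
    100, 100,                              # xAI
    100, 86.6,                             # Xiaomi
    100, 100, 86.7,                        # Z.ai
]


def summarize(rates):
    """Return (mean rate rounded to 0.1, count of majority-blue models, model count).

    Assumption: every model counts equally, regardless of its number of runs.
    Assumption: "majority" means strictly above 50%, so a 50/50 split does not count.
    """
    mean_rate = round(sum(rates) / len(rates), 1)
    majority = sum(1 for r in rates if r > 50)
    return mean_rate, majority, len(rates)


mean_blue, blue_majority, n_models = summarize(blue_rates)
print(f"blue {mean_blue}% | {blue_majority} of {n_models} models majority-blue")
# prints: blue 85.1% | 35 of 40 models majority-blue
```

Under these assumptions the result matches the overview line (85.1%, 35 of 40), which suggests, but does not confirm, that the page aggregates per model, not per run.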