Moral Preferences of LLMs Under Directed Contextual Influence
P. Blandfort, T. Karayil, U. Pawar, R. Graham, A. McKenzie, D. Krasheninnikov
arXiv preprint · Feb 2026
Moral benchmarks for LLMs typically use context-free prompts, implicitly assuming stable preferences. We study how directed contextual influences reshape decisions in trolley-problem-style moral triage settings. We find that contextual influences often shift decisions significantly, that baseline preferences are a poor predictor of directional steerability, that influences can backfire, and that reasoning reduces average sensitivity but amplifies the effect of biased few-shot examples.