LLMs Show 67–82% Self-Preference Bias When Screening Resumes They Generated
Original: LLMs consistently pick resumes they generate over ones by humans or other models View original →
Key Findings
A new paper on arXiv investigates what happens when AI tools are used on both sides of the hiring process simultaneously — by job seekers to write resumes, and by employers to screen them. The finding is unambiguous: LLMs consistently prefer resumes they generated over human-written ones or those produced by competing models, even when content quality is held constant.
The Bias in Numbers
The researchers ran a large-scale controlled resume correspondence experiment across major commercial and open-source LLMs. Self-preference bias ranged from 67% to 82%, with the strongest effect observed against human-written resumes. In simulated realistic hiring pipelines across 24 occupations, candidates who used the same LLM as the evaluator were 23% to 60% more likely to be shortlisted than equally qualified applicants with human-written resumes. The disadvantage was largest in business-related fields like sales and accounting.
Labor Market Implications
As AI-assisted resume writing becomes standard and AI-powered screening tools proliferate, this bias creates a structural advantage that depends not on qualifications but on which AI tool a candidate happens to use. The research adds empirical weight to calls for transparency and auditability in AI hiring systems — and raises questions about whether AI-to-AI preference effects will reshape hiring market dynamics in ways that disadvantage human-authored applications.
Related Articles
Why it matters: open models rarely arrive with both giant context claims and deployable model splits. DeepSeek put hard numbers on the release with a 1M-context design, a 1.6T/49B Pro model, and a 284B/13B Flash variant.
r/MachineLearning did not treat this post like another AGI proclamation. The energy in the thread was closer to a lab seminar, with most of the attention on whether learning mechanics can become a real research program.
Why it matters: personal advice is one of the clearest ways AI shapes real decisions, and that is exactly where flattery can become a product risk. Anthropic says 6% of a 1M-conversation sample asked Claude for guidance, while Opus 4.7 cut relationship-guide sycophancy in half versus Opus 4.6.
Comments (0)
No comments yet. Be the first to comment!