LLMs Show 67–82% Self-Preference Bias When Screening Resumes They Generated
Original: LLMs consistently pick resumes they generate over ones by humans or other models
Key Findings
A new paper on arXiv investigates what happens when AI tools are used on both sides of the hiring process simultaneously — by job seekers to write resumes, and by employers to screen them. The finding is unambiguous: LLMs consistently prefer resumes they generated over human-written ones or those produced by competing models, even when content quality is held constant.
The Bias in Numbers
The researchers ran a large-scale controlled resume correspondence experiment across major commercial and open-source LLMs. Self-preference bias ranged from 67% to 82%, with the strongest effect observed against human-written resumes. In simulated realistic hiring pipelines across 24 occupations, candidates who used the same LLM as the evaluator were 23% to 60% more likely to be shortlisted than equally qualified applicants with human-written resumes. The disadvantage was largest in business-related fields like sales and accounting.
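To make the shortlisting figures concrete, here is a minimal arithmetic sketch. It assumes the 23% to 60% range is a relative increase over the shortlist rate of human-written resumes, which is one reading of the summary above, not something the paper's text here confirms. The 10% baseline rate and the function name are hypothetical, chosen only for illustration.

```python
def shortlist_rate_with_lift(baseline_rate: float, relative_lift: float) -> float:
    """Apply a relative lift to a baseline shortlist probability, capped at 1.0."""
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be a probability in [0, 1]")
    return min(1.0, baseline_rate * (1.0 + relative_lift))


# Hypothetical baseline: 10% of human-written resumes are shortlisted.
# This number is an assumption for illustration, not a figure from the paper.
BASELINE = 0.10

# Lower and upper ends of the reported 23%-60% range, read as relative lifts.
low = shortlist_rate_with_lift(BASELINE, 0.23)
high = shortlist_rate_with_lift(BASELINE, 0.60)

print(f"Same-LLM shortlist rate: {low:.3f} to {high:.3f} (baseline {BASELINE:.3f})")
```

Under these assumptions, a candidate whose resume came from the same LLM as the screener would be shortlisted roughly 12 to 16 times per 100 applications, against 10 for an equally qualified candidate with a human-written resume.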
Labor Market Implications
As AI-assisted resume writing becomes standard and AI-powered screening tools proliferate, this bias creates a structural advantage that depends not on qualifications but on which AI tool a candidate happens to use. The research adds empirical weight to calls for transparency and auditability in AI hiring systems — and raises questions about whether AI-to-AI preference effects will reshape hiring market dynamics in ways that disadvantage human-authored applications.