Large language models perform significantly worse when users ask privacy-related questions in non-standard English dialects, disproportionately harming communities that rely on these systems to understand how their data is used.
The same information request, two different dialects. The standard dialect works. The marginalized dialect fails.
A sequential multi-agent pipeline normalizes the dialect without losing intent, then answers from the actual policy text. No retraining. No fine-tuning.
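The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The two-stage split, the function names, and both prompt templates are hypothetical, and the real framework may use more agents and different prompts. `LLM` stands for any function that takes a prompt string and returns a completion string, so no particular vendor API is assumed.

```python
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

# Hypothetical prompt templates, written for illustration only.
NORMALIZE_PROMPT = (
    "Rewrite the following question in Standard American English. "
    "Preserve the user's intent exactly and do not answer it.\n\n"
    "Question: {question}"
)
ANSWER_PROMPT = (
    "Answer the question using only the privacy policy below.\n\n"
    "Policy:\n{policy}\n\n"
    "Question: {question}"
)


def normalize_dialect(llm: LLM, question: str) -> str:
    """Stage 1: rewrite the question into the standard dialect, keeping intent."""
    return llm(NORMALIZE_PROMPT.format(question=question)).strip()


def answer_from_policy(llm: LLM, policy_text: str, question: str) -> str:
    """Stage 2: answer the normalized question from the policy text."""
    return llm(ANSWER_PROMPT.format(policy=policy_text, question=question)).strip()


def run_pipeline(llm: LLM, policy_text: str, question: str) -> str:
    """Run the stages in sequence; the underlying model is never retrained."""
    normalized = normalize_dialect(llm, question)
    return answer_from_policy(llm, policy_text, normalized)
```

Because the model is passed in as a plain callable, the same pipeline can wrap any existing LLM client. The second stage only ever sees the normalized question, which is what makes the design sequential.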
The multi-agent framework boosts F1 across all dialects simultaneously, compressing the gap between the highest- and lowest-performing dialects by about 80% on PrivacyQA.
† GPT-4o-mini results. Max Diff = difference between highest and lowest F1 across all evaluated dialects. PrivacyQA reduction: 0.093 → 0.019 (82%). PolicyQA reduction: 0.029 → 0.024 (17%).
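The Max Diff metric and the relative reductions can be reproduced with a short calculation. The sketch below uses hypothetical function names and only the rounded figures quoted above. With those rounded inputs the PrivacyQA reduction comes out near 80%, slightly below the stated 82%. The stated figure may have been computed from unrounded scores, which is an assumption here.

```python
from typing import Mapping


def max_diff(f1_by_dialect: Mapping[str, float]) -> float:
    """Gap between the highest and lowest F1 across all evaluated dialects."""
    scores = list(f1_by_dialect.values())
    return max(scores) - min(scores)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which the gap shrinks: (before - after) / before."""
    if before <= 0:
        raise ValueError("'before' must be positive")
    return (before - after) / before


# Rounded Max Diff values quoted in the note above.
privacyqa = relative_reduction(0.093, 0.019)  # roughly 0.80 from rounded inputs
policyqa = relative_reduction(0.029, 0.024)   # roughly 0.17

if __name__ == "__main__":
    print(f"PrivacyQA reduction: {privacyqa:.1%}")
    print(f"PolicyQA reduction: {policyqa:.1%}")
```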
The framework was evaluated on 50 varieties of English, from Kenyan to Appalachian and from Sri Lankan to Scottish.
Privacy policies govern how billions of people's data is handled. If AI-powered interfaces to these policies systematically fail non-standard dialect speakers, the right to understand and contest data practices becomes a privilege of the linguistically mainstream.
Performance disparities across dialects encode and amplify existing inequalities. A system that works better for speakers of Standard American English (SAE) than for speakers of African American Vernacular English (AAVE) or Aboriginal English reproduces, at scale, the very marginalization it should be neutral toward.
Privacy policies are legal documents. Understanding them is a right under GDPR, CCPA, and similar frameworks. Dialect-biased QA systems create an invisible barrier between marginalized communities and their own legal protections.
The framework operates as a plug-in pipeline over any existing LLM. This means organizations can reduce dialect bias in deployed systems without costly retraining or access to proprietary model internals.
Evaluated across 50 English dialects spanning Africa, Asia, the Americas, and Oceania, the framework closes performance gaps across the board, not just for the dialects it was tuned on.
"Every dialect deserves equal access to the policies that govern their data."
Klisura, Bernaga Torres et al. · ACL 2025 Main Conference · UTSA & Tecnológico de Monterrey