value-nlp is a suite of resources for building fair and equitable NLP systems that are dialect invariant: their behavior stays constant under dialect shifts, which helps avoid allocative harms.
This package contains tools for systematically perturbing text with attested linguistic patterns from 50 varieties of English.
Researchers can use this to accomplish the following:
📐 Benchmarking: NLP researchers can more comprehensively evaluate task performance across domain shifts.
⚖️ Bias and Fairness: Fairness, accountability and transparency researchers can more directly examine the ways NLP systems systematically harm disadvantaged or protected groups.
🌏 Linguistic Typology: Computational linguists can systematically understand the internal representations of large language models according to the literature on theoretical and field linguistics.
🌱 Low-resource NLP: Practitioners can adapt models to dialects that have limited labeled data by building on rich knowledge from dialectology.
Below is the unfolding value-nlp research story.
This user-centric NLP study shows qualitatively that language technologies are not dialect invariant. They often fail for speakers of non-standard English varieties, misunderstanding both individual words and the syntax of phrases. Compared with speakers of Standard American English (SAE), speakers of non-standard varieties report more failures with written technologies.
These value-nlp stress test experiments show quantitatively that many text-based NLP systems are not dialect invariant, with notable performance drops for widely spoken varieties like Indian English and Colloquial Singapore English. This work also shows how to address these disparities with data augmentation.
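As a rough illustration of what a stress test and a data augmentation step involve, the sketch below measures the accuracy drop of a model when its inputs are perturbed, then builds an augmented dataset from the perturbed copies. Everything in it is a hypothetical stand-in: `toy_perturb`, `toy_model`, `invariance_gap`, and `augment` are illustrative names, not part of the value-nlp package, and the metric is a plain accuracy difference rather than the measure used in the experiments.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, label)


def accuracy(model: Callable[[str], str], data: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(model(text) == label for text, label in data)
    return correct / len(data)


def invariance_gap(model, perturb, data: List[Example]) -> float:
    """Accuracy on the original text minus accuracy on the perturbed text."""
    perturbed = [(perturb(text), label) for text, label in data]
    return accuracy(model, data) - accuracy(model, perturbed)


def augment(data: List[Example], perturb) -> List[Example]:
    """Original examples plus perturbed copies that differ from the original."""
    extra = [(perturb(t), y) for t, y in data if perturb(t) != t]
    return data + extra


# Hypothetical stand-in for a dialect transformation (one hard-coded rewrite).
def toy_perturb(text: str) -> str:
    return text.replace("I talked", "I done talked")


# Hypothetical brittle classifier that keys on one exact phrase.
def toy_model(text: str) -> str:
    return "past" if "I talked" in text else "other"


data = [
    ("I talked with them yesterday", "past"),
    ("We meet them tomorrow", "other"),
]

gap = invariance_gap(toy_model, toy_perturb, data)
augmented = augment(data, toy_perturb)
print(f"accuracy gap: {gap}")
print(f"augmented size: {len(augmented)}")
```

A gap of zero would mean the model's accuracy is unchanged by the perturbation; a positive gap means it does worse on the perturbed text. The augmented list is what a model would be retrained on in an augmentation setup.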
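Install the package to start using Multi-VALUE. The sample output below is for the Southeast American enclave dialects: a description of the dialect, a transformed sentence, and a record of the rule that was applied.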
pip install value-nlp
[VECTOR DIALECT] Southeast American enclave dialects (abbr: SEAmE)
Region: United States of America
Latitude: 34.2
Longitude: -80.9
'I done talked with them yesterday'
{(2, 7): {'value': 'done talked','type': 'completive_done'}}
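The rule record shown above is an ordinary Python dictionary, so it can be inspected with standard tools. The sketch below copies that record verbatim and lists each applied rule. It assumes only what the output shows: tuple keys and `'value'` and `'type'` fields. The helper name `summarize_rules` is hypothetical and not part of the package.

```python
# Rule record copied from the sample output above.
executed_rules = {(2, 7): {"value": "done talked", "type": "completive_done"}}


def summarize_rules(rules):
    """Return one (key, rule type, value) tuple per applied rule, sorted by key."""
    return [(key, info["type"], info["value"]) for key, info in sorted(rules.items())]


for key, rule_type, value in summarize_rules(executed_rules):
    print(f"{key}: {rule_type} -> {value!r}")
```

A listing like this makes it easier to see which linguistic features fired on a given sentence when comparing model behavior before and after a transformation.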
This is a collaborative effort across Stanford University, Georgia Tech, Harvard University, and Amazon AI.