value-nlp: A Toolkit for Cross-Dialectal NLP

What is value-nlp?

value-nlp is a suite of resources for building fair and equitable NLP systems that are dialect invariant: systems that perform consistently across dialect shifts and therefore avoid allocative harms.

This package contains tools for systematically perturbing text with attested linguistic patterns from 50 varieties of English.


How can I use value-nlp?

Researchers can use value-nlp to accomplish the following:

📐   Benchmarking: NLP researchers can more comprehensively evaluate task performance across dialect shifts.

⚖️   Bias and Fairness: Fairness, accountability and transparency researchers can more directly examine the ways NLP systems systematically harm disadvantaged or protected groups.

🌏   Linguistic Typology: Computational linguists can systematically probe the internal representations of large language models, grounded in the literature on theoretical and field linguistics.

🌱   Low-resource NLP: Practitioners can adapt models to dialects which have limited labeled data by building on rich knowledge from dialectology.

Below is the unfolding value-nlp research story.

Perceptions of Language Technology Failures from South Asian English Speakers

This user-centric NLP study shows qualitatively that language technologies are not dialect invariant. They often fail for speakers of non-standard English varieties, misunderstanding both individual words and the syntax of phrases. Compared with speakers of Standard American English (SAE), speakers of non-standard varieties report more failures with written technologies.

[Figure: survey comparison chart]

Multi-VALUE: A Framework for Cross-Dialectal English NLP

These value-nlp stress-test experiments show quantitatively that many text-based NLP systems are not dialect invariant, with notable performance drops for widely spoken varieties like Indian English and Colloquial Singapore English. This work also shows how to address these disparities with data augmentation.

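For intuition, here is a minimal data-augmentation sketch built on the transform API shown in the demo below. The augment helper and the choice of dialect are illustrative assumptions, not part of the package.

from multivalue import Dialects

def augment(sentences, dialect=None):
    """Pair each training sentence with a pseudo-dialectal copy."""
    # Any dialect object exposing transform() works; Southeast American enclave
    # dialects are used here only because they appear in the demo below.
    dialect = dialect or Dialects.SoutheastAmericanEnclaveDialect()
    return sentences + [dialect.transform(s) for s in sentences]

train = ["I talked with them yesterday", "She has finished the report"]
augmented_train = augment(train)  # original sentences plus dialect-transformed copies

Training on the augmented set exposes a model to both standard and dialectal surface forms, which is the kind of augmentation the paragraph above refers to.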

DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules

DADA is an efficient fine-tuning method for achieving dialect invariance. It works by composing modular adapters, each of which handles a specific linguistic feature.

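For intuition only, here is a minimal, framework-level sketch of that idea rather than the authors' implementation: one small bottleneck adapter per linguistic feature, with per-token weights deciding how much each adapter contributes. Module names, dimensions, and the softmax weighting are illustrative assumptions.

import torch
import torch.nn as nn

class FeatureAdapter(nn.Module):
    """Bottleneck adapter for a single linguistic feature (e.g., completive 'done')."""
    def __init__(self, hidden_dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_dim)

    def forward(self, h):
        return self.up(torch.relu(self.down(h)))

class DynamicAggregator(nn.Module):
    """Mixes per-feature adapter outputs with per-token weights, added residually."""
    def __init__(self, num_features, hidden_dim=768):
        super().__init__()
        self.adapters = nn.ModuleList(FeatureAdapter(hidden_dim) for _ in range(num_features))
        self.scorer = nn.Linear(hidden_dim, num_features)  # one weight per adapter

    def forward(self, h):  # h: (batch, seq, hidden)
        weights = torch.softmax(self.scorer(h), dim=-1)               # (batch, seq, features)
        outputs = torch.stack([a(h) for a in self.adapters], dim=-1)  # (batch, seq, hidden, features)
        return h + (outputs * weights.unsqueeze(-2)).sum(dim=-1)

hidden_states = torch.randn(2, 10, 768)  # stand-in for transformer hidden states
print(DynamicAggregator(num_features=4)(hidden_states).shape)  # torch.Size([2, 10, 768])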

Task-Agnostic Low-Rank Adapters for Unseen English Dialects

HyperLoRA is a parameter-efficient method for building dialect-invariant NLP systems in a task-agnostic manner. It relies on hypernetworks to disentangle dialect-specific and cross-dialectal information.

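As a rough illustration of the hypernetwork idea (not the paper's full design, which additionally separates dialect-specific from cross-dialectal information), the sketch below maps a hypothetical vector of attested dialect features to LoRA-style low-rank updates for a frozen linear layer.

import torch
import torch.nn as nn

class LoRAHyperNetwork(nn.Module):
    """Generates rank-r update matrices (A, B) for a d_in -> d_out layer from dialect features."""
    def __init__(self, feat_dim, d_in, d_out, rank=8):
        super().__init__()
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.to_A = nn.Linear(feat_dim, d_in * rank)
        self.to_B = nn.Linear(feat_dim, rank * d_out)

    def forward(self, dialect_feats):  # dialect_feats: (feat_dim,)
        A = self.to_A(dialect_feats).view(self.d_in, self.rank)
        B = self.to_B(dialect_feats).view(self.rank, self.d_out)
        return A, B

def adapted_forward(x, frozen_weight, A, B):
    """Frozen layer output plus the dialect-specific low-rank update x @ A @ B."""
    return x @ frozen_weight.T + (x @ A) @ B

dialect_feats = torch.tensor([1.0, 0.0, 1.0, 1.0])    # hypothetical binary feature vector
hyper = LoRAHyperNetwork(feat_dim=4, d_in=768, d_out=768)
A, B = hyper(dialect_feats)
x = torch.randn(2, 768)
frozen_weight = torch.randn(768, 768)                 # stands in for a pretrained weight matrix
print(adapted_forward(x, frozen_weight, A, B).shape)  # torch.Size([2, 768])

In this sketch, only the small hypernetwork would be trained, so low-rank updates for an unseen dialect can be generated from its feature vector alone.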

How can I set up value-nlp?

Follow the demo below to get started with Multi-VALUE.

Installation

pip install value-nlp

Demo

>>> from multivalue import Dialects
>>> southern_usa = Dialects.SoutheastAmericanEnclaveDialect()
>>> print(southern_usa)
[VECTOR DIALECT] Southeast American enclave dialects (abbr: SEAmE)
      Region: United States of America
      Latitude: 34.2
      Longitude: -80.9
>>> southern_usa.transform("I talked with them yesterday")
'I done talked with them yesterday'
>>> southern_usa.executed_rules
{(2, 7): {'value': 'done talked', 'type': 'completive_done'}}
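Building on this API, the hypothetical helper below supports the benchmarking use case described above: it discovers the dialect classes the package exposes via introspection (so no class names are hard-coded) and produces a transformed copy of an input under each one, skipping anything that does not follow the transform() interface.

import inspect
from multivalue import Dialects

def all_dialect_versions(sentence):
    """Map each available dialect class name to its transformation of the sentence."""
    versions = {}
    for name, cls in inspect.getmembers(Dialects, inspect.isclass):
        try:
            versions[name] = cls().transform(sentence)
        except Exception:
            continue  # skip classes that are not zero-argument dialect transforms
    return versions

print(all_dialect_versions("I talked with them yesterday"))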


Acknowledgements

This is a collaborative effort across Stanford University, Georgia Tech, Harvard University, and Amazon AI.