University of Notre Dame

Natural Language Processing Group

Natural language processing (NLP) aims to enable computers to use human
languages, so that people can, for example, interact with computers naturally,
communicate with others with whom they share no common language, or access
speech or text data at scales not otherwise possible. The NLP group at Notre
Dame is interested in all aspects of NLP, with a focus on machine translation
and connections with formal language theory.

The NLP group co-sponsors NL+, the Natural Language Processing Lunch Seminar.


CURRENT MEMBERS

 * David Chiang
   Associate professor
   translation, syntax, formal language theory, programming languages
 * Darcey Riley
   PhD student
   generation from language models, probabilistic models
 * Stephen Bothwell
   PhD student
   NLP for classical languages, computational historical linguistics
 * Ken Sible
   PhD student
   retrieval-augmented translation
 * Aarohi Srivastava
   PhD student
   dialects, noisy text, zero-shot transfer
 * Chihiro Taguchi
   PhD student
   computational linguistics, language documentation, syntax
 * Andy Yang
   PhD student
   neural network expressivity, formal logic


FORMER MEMBERS

 * Brian DuSell (PhD 2023 → ETH Zürich)
 * Colin McDonald (BA 2023 → CMU)
 * Patrick Soga (BS 2022 → UVA)
 * Xing Jie Zhong (MS 2021 → Google)
 * Toan Q. Nguyen (PhD 2021 → Amazon)
 * Justin DeBenedetto (PhD 2021 → asst. prof. Villanova)
 * Chan Hee (Luke) Song (BS 2020 → OSU)
 * Kenton Murray (PhD 2020 → JHU)
 * Antonios Anastasopoulos (PhD 2019 → postdoc CMU → asst. prof. GMU)
 * Arturo Argueta (PhD 2019 → Apple)
 * Tomer Levinboim (PhD 2017 → Google)
 * Xiang Zhou (summer intern 2017 → UNC)
 * Cindy Xinyi Wang (BS 2017 → PhD CMU → Google)
 * Ashish Vaswani (PhD 2014 at USC → USC ISI → Google Brain → Adept)


PROJECTS

 * Expressivity of neural sequence models
   Relating neural sequence models to automata, grammars, circuits, and logics.
   Collaboration with Peter Cholak and Anand Pillay.
 * Retrieval-augmented neural machine translation
   Augmenting neural machine translation systems by retrieving and using data
   beyond parallel text. Collaboration with Meng Jiang. Sponsored by NSF.
 * Natural language (variety) processing
   Collaboration with Antonis Anastasopoulos (GMU) and Yulia Tsvetkov (UW).
   Sponsored by NSF.
 * Language documentation with an AI helper
   Collaboration with Antonis Anastasopoulos and Geraldine Walther (GMU).
   Sponsored by NSF.
 * Differentiable, probabilistic programming with recursive structured models
   Collaboration with Chung-chieh Shan (IU). Sponsored by NSF.
 * NLP on medieval texts
   Analysis of Latin texts and language modeling for OCR of Latin manuscripts.
   Collaborations with Walter Scheirer and Hildegund Müller. Sponsored by Notre
   Dame FRSP.


RECENT PUBLICATIONS

Dana Angluin, David Chiang, and Andy Yang. Masked hard-attention transformers
and Boolean RASP recognize exactly the star-free languages. 2023.
arXiv:2310.13897.

@misc{angluin+:2023,
    author = "Angluin, Dana and Chiang, David and Yang, Andy",
    title = "Masked Hard-Attention Transformers and {B}oolean {RASP} Recognize Exactly the Star-Free Languages",
    year = "2023",
    note = "{arXiv}:2310.13897"
}

Stephen Bothwell, Justin DeBenedetto, Theresa Crnkovich, Hildegund Müller, and
David Chiang. Introducing rhetorical parallelism detection: a new task with
datasets, metrics, and baselines. In Proc. EMNLP. 2023. To appear.

@inproceedings{bothwell+:2023,
    author = {Bothwell, Stephen and DeBenedetto, Justin and Crnkovich, Theresa and M{\"u}ller, Hildegund and Chiang, David},
    title = "Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines",
    booktitle = "Proc. EMNLP",
    year = "2023",
    note = "To appear"
}

Alexandra Butoi, Tim Vieira, Ryan Cotterell, and David Chiang. Efficient
algorithms for recognizing weighted tree-adjoining languages. In Proc. EMNLP.
2023. To appear.

@inproceedings{butoi+:2023efficient,
    author = "Butoi, Alexandra and Vieira, Tim and Cotterell, Ryan and Chiang, David",
    title = "Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages",
    booktitle = "Proc. EMNLP",
    year = "2023",
    note = "To appear"
}

Aarohi Srivastava and David Chiang. BERTwich: extending BERT's capabilities to
model dialectal and noisy text. In Findings of ACL: EMNLP. 2023. To appear.

@inproceedings{srivastava+chiang:2023,
    author = "Srivastava, Aarohi and Chiang, David",
    title = "{BERTwich}: Extending {BERT}'s Capabilities to Model Dialectal and Noisy Text",
    booktitle = "Findings of ACL: EMNLP",
    year = "2023",
    note = "To appear"
}

Brian DuSell and David Chiang. Stack attention: improving the ability of
transformers to model hierarchical patterns. 2023. arXiv:2310.01749.

@misc{dusell+chiang:2023attention,
    author = "DuSell, Brian and Chiang, David",
    title = "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns",
    year = "2023",
    note = "{arXiv}:2310.01749"
}

Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, and David Chiang. Universal
automatic phonetic transcription into the International Phonetic Alphabet. In
Proc. INTERSPEECH. 2023. doi:10.21437/Interspeech.2023-2584.

@inproceedings{taguchi+:2023,
    author = "Taguchi, Chihiro and Sakai, Yusuke and Haghani, Parisa and Chiang, David",
    title = "Universal Automatic Phonetic Transcription into the {I}nternational {P}honetic {A}lphabet",
    booktitle = "Proc. INTERSPEECH",
    year = "2023",
    doi = "10.21437/Interspeech.2023-2584"
}

Alexandra Butoi, Ryan Cotterell, and David Chiang. Convergence and diversity in
the control hierarchy. In Proc. ACL. 2023.

@inproceedings{butoi+:2023convergence,
    author = "Butoi, Alexandra and Cotterell, Ryan and Chiang, David",
    title = "Convergence and Diversity in the Control Hierarchy",
    booktitle = "Proc. ACL",
    year = "2023"
}

David Chiang, Peter Cholak, and Anand Pillay. Tighter bounds on the expressivity
of transformer encoders. In Proc. ICML, 5544–5562. 2023.

@inproceedings{chiang+cholak+pillay:2023,
    author = "Chiang, David and Cholak, Peter and Pillay, Anand",
    title = "Tighter Bounds on the Expressivity of Transformer Encoders",
    booktitle = "Proc. ICML",
    year = "2023",
    pages = "5544--5562"
}

Aarohi Srivastava and David Chiang. Fine-tuning BERT with character-level noise
for zero-shot transfer to dialects and closely-related languages. In Proc.
Workshop on NLP for Similar Languages, Varieties and Dialects. 2023.

@inproceedings{srivastava+chiang:2023fine,
    author = "Srivastava, Aarohi and Chiang, David",
    title = "Fine-Tuning {BERT} with Character-Level Noise for Zero-Shot Transfer to Dialects and Closely-Related Languages",
    year = "2023",
    booktitle = "Proc. Workshop on NLP for Similar Languages, Varieties and Dialects"
}

Patrick Soga and David Chiang. Bridging graph position encodings for
transformers with weighted graph-walking automata. Transactions on Machine
Learning Research, 2023.

@article{soga+chiang:2023,
    author = "Soga, Patrick and Chiang, David",
    title = "Bridging Graph Position Encodings for Transformers with Weighted Graph-Walking Automata",
    year = "2023",
    journal = "Transactions on Machine Learning Research"
}

Brian DuSell and David Chiang. The surprising computational power of
nondeterministic stack RNNs. In Proc. ICLR. 2023.

@inproceedings{dusell+chiang:2023surprising,
    author = "DuSell, Brian and Chiang, David",
    title = "The Surprising Computational Power of Nondeterministic Stack {RNN}s",
    booktitle = "Proc. ICLR",
    year = "2023"
}

David Chiang, Colin McDonald, and Chung-chieh Shan. Exact recursive
probabilistic programming. PACMPL, 2023. doi:10.1145/3586050.

@article{chiang+mcdonald+shan:2023,
    author = "Chiang, David and McDonald, Colin and Shan, Chung-chieh",
    title = "Exact Recursive Probabilistic Programming",
    journal = "PACMPL",
    volume = "7",
    number = "OOPSLA1",
    article = "98",
    xmonth = "April",
    year = "2023",
    doi = "10.1145/3586050"
}

Chihiro Taguchi and David Chiang. Introducing morphology in Universal
Dependencies Japanese. In Proc. Workshop on Universal Dependencies, 65–72. 2023.

@inproceedings{taguchi+chiang:2023,
    author = "Taguchi, Chihiro and Chiang, David",
    title = "Introducing Morphology in {U}niversal {D}ependencies {J}apanese",
    year = "2023",
    booktitle = "Proc. Workshop on Universal Dependencies",
    pages = "65--72"
}

David Chiang, Alexander M. Rush, and Boaz Barak. Named tensor notation.
Transactions on Machine Learning Research, 2023.

@article{chiang+rush+barak:2023,
    author = "Chiang, David and Rush, Alexander M. and Barak, Boaz",
    title = "Named Tensor Notation",
    year = "2023",
    xmonth = "January",
    journal = "Transactions on Machine Learning Research"
}

Darcey Riley and David Chiang. A continuum of generation tasks for investigating
length bias and degenerate repetition. In Proc. BlackboxNLP. 2022.

@inproceedings{riley+chiang:2022,
    author = "Riley, Darcey and Chiang, David",
    title = "A Continuum of Generation Tasks for Investigating Length Bias and Degenerate Repetition",
    booktitle = "Proc. BlackboxNLP",
    year = "2022"
}

Alexandra Butoi, Brian DuSell, Tim Vieira, Ryan Cotterell, and David Chiang.
Algorithms for weighted pushdown automata. In Proc. EMNLP. 2022.

@inproceedings{butoi+:2022,
    author = "Butoi, Alexandra and DuSell, Brian and Vieira, Tim and Cotterell, Ryan and Chiang, David",
    title = "Algorithms for Weighted Pushdown Automata",
    year = "2022",
    booktitle = "Proc. EMNLP"
}

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar
Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià
Garriga-Alonso, and others. Beyond the Imitation Game: quantifying and
extrapolating the capabilities of language models. 2022. arXiv:2206.04615.

@misc{srivastava+:2022,
    author = "Srivastava, Aarohi and Rastogi, Abhinav and Rao, Abhishek and Shoeb, Abu Awal Md and Abid, Abubakar and Fisch, Adam and Brown, Adam R. and Santoro, Adam and Gupta, Aditya and Garriga-Alonso, Adri{\`a} and others",
    title = "Beyond the {I}mitation {G}ame: Quantifying and extrapolating the capabilities of language models",
    year = "2022",
    note = "arXiv:2206.04615"
}

David Chiang and Peter Cholak. Overcoming a theoretical limitation of
self-attention. In Proc. ACL. 2022.

@inproceedings{chiang+cholak:2022,
    author = "Chiang, David and Cholak, Peter",
    title = "Overcoming a Theoretical Limitation of Self-Attention",
    booktitle = "Proc. ACL",
    year = "2022"
}

Brian DuSell and David Chiang. Learning hierarchical structures with
differentiable nondeterministic stacks. In Proc. ICLR. 2022.

@inproceedings{dusell+chiang:iclr2022,
    author = "DuSell, Brian and Chiang, David",
    title = "Learning Hierarchical Structures with Differentiable Nondeterministic Stacks",
    booktitle = "Proc. ICLR",
    year = "2022"
}

All papers →


LANGUAGE AND COMPUTATION AT NOTRE DAME


RESEARCH

 * Center for the Study of Languages and Cultures
 * Center for Digital Scholarship


PEOPLE

 * Meng Jiang: summarization and generation
 * Toby Li: human-computer interaction
 * Walter Scheirer: digital humanities and handwriting recognition
 * John Lalor (ITAO): NLP and biomedical informatics


COURSES

 * CSE 40657/60657, Natural Language Processing, Prof. David Chiang
 * CSE 40982, Interactive Dialogue Systems, Prof. Collin McMillan
 * ITAO 40250, Unstructured Data Analytics, Prof. John Lalor
 * AL 20301, Introduction to Linguistics, Prof. Hana Kang
