Yanda Chen

I am a Member of Technical Staff (Research Scientist) on the Alignment Science team at Anthropic. I work on natural language processing, AI safety, and machine learning.

Previously, I completed my PhD in Computer Science at Columbia University, where I was very fortunate to be co-advised by Prof. Kathy McKeown, Prof. He He, and Prof. Zhou Yu. I received my bachelor's degree in Computer Science from Columbia University in April 2021.

Email  /  CV  /  Semantic Scholar  /  Twitter  /  Github

profile photo
Research

My current research interests lie in two directions: i) Explainability: building explainable deep learning systems and understanding how LLMs behave, and ii) Reliability: improving the calibration and reducing the sensitivity of LLMs. Below are my publications.

Parallel Structures in Pre-training Data Yield In-Context Learning
Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He
ACL, 2024
paper  /  code

We find that the in-context learning (ICL) ability of language models emerges from parallel structures in the pre-training data: pairs of phrases that follow similar templates in the same context window. Specifically, we show that removing parallel structures from the pre-training data reduces LMs' ICL accuracy by 51% (vs. 2% from random ablation). This drop persists even when excluding common patterns such as n-gram repetitions and long-range dependencies.
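
To make the idea concrete, here is a toy sketch of a parallel structure: two sentences in one context window that follow the same template. The token-overlap heuristic and threshold below are illustrative assumptions, not the paper's actual detection or ablation pipeline.

```python
# Two phrases in the same context window following a similar template.
window = ("The capital of France is Paris. "
          "The capital of Japan is Tokyo.")

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two phrases."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

phrases = [p.strip() for p in window.split(".") if p.strip()]
for i in range(len(phrases)):
    for j in range(i + 1, len(phrases)):
        score = token_overlap(phrases[i], phrases[j])
        if score >= 0.5:  # high overlap suggests a shared template
            print(f"parallel pair ({score:.2f}): {phrases[i]!r} / {phrases[j]!r}")
```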

Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
Yanda Chen, Chandan Singh, Xiaodong Liu, Simiao Zuo, Bin Yu, He He, Jianfeng Gao
arXiv preprint, 2024
paper  /  code

We propose explanation-consistency finetuning (EC-finetuning), which adapts LLMs to generate more consistent natural-language explanations on related examples by finetuning them on synthetic data carefully constructed to contain consistent explanations. EC-finetuning improves explanation consistency by 10.0% on four finetuning datasets and by 4.5% on seven out-of-distribution datasets.
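
As a rough illustration of what such synthetic data could look like, the sketch below lays out hypothetical examples in which related inputs share one consistent explanation rule; the fields, rule, and serialization format are assumptions for illustration, not the paper's actual dataset.

```python
# Hypothetical synthetic examples: related inputs, one consistent rule.
rule = "Sentences mentioning rain are labeled 'weather'."

synthetic_batch = [
    {"input": "It rained all morning in Seattle.",
     "label": "weather",
     "explanation": rule},
    {"input": "Heavy rain is expected tomorrow.",  # related example
     "label": "weather",
     "explanation": rule},                         # same rule -> consistent
]

def to_training_text(ex: dict) -> str:
    """Serialize one example for standard LM finetuning."""
    return (f"Input: {ex['input']}\n"
            f"Label: {ex['label']}\n"
            f"Explanation: {ex['explanation']}")

for ex in synthetic_batch:
    print(to_training_text(ex), end="\n\n")
```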

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown
ICML (Spotlight), 2024
paper  /  code

We propose to evaluate the counterfactual simulatability of natural language explanations: whether an explanation enables humans to precisely infer the model's outputs on diverse counterfactuals of the explained input. We implement two metrics, precision and generality, and find that i) LLMs' explanations have low precision and ii) precision does not correlate with plausibility.
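
A simplified reading of the precision metric, sketched below: among counterfactuals where a simulator (human or LM) is willing to guess the model's output, count how often the guess matches the model's actual output. The abstention handling is an assumption; see the paper for the exact definitions of precision and generality.

```python
def simulation_precision(simulated: list, actual: list) -> float:
    """simulated[i] is the simulator's guess on counterfactual i
    (None means the simulator abstains); actual[i] is the model's output."""
    answered = [(s, a) for s, a in zip(simulated, actual) if s is not None]
    if not answered:
        return 0.0
    return sum(s == a for s, a in answered) / len(answered)

print(simulation_precision(["yes", None, "no"], ["yes", "no", "yes"]))  # 0.5
```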

On the Relation between Sensitivity and Accuracy in In-context Learning
Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He
EMNLP Findings, 2023
paper  /  code  /  poster

We find that label bias obscures true ICL sensitivity and that ICL sensitivity is strongly and negatively correlated with accuracy. Motivated by these findings, we propose SenSel, a few-shot selective prediction method based on ICL sensitivity.
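
A minimal sketch in the spirit of SenSel, under the assumption that sensitivity is measured as disagreement across perturbed prompts and that highly sensitive predictions are abstained on; `predict`, the perturbations, and the threshold are hypothetical stand-ins.

```python
from collections import Counter

def sensel_decide(predict, prompts: list, threshold: float = 0.25):
    """Return (prediction, abstain) given perturbed prompt variants."""
    preds = [predict(p) for p in prompts]
    majority, count = Counter(preds).most_common(1)[0]
    sensitivity = 1 - count / len(preds)  # fraction disagreeing with majority
    return majority, sensitivity > threshold

# Toy usage with a fake predictor:
fake = lambda p: "positive" if "!" in p else "negative"
print(sensel_decide(fake, ["Great!", "Great.", "Great !?"]))  # ('positive', True)
```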

In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown
arXiv preprint, 2022
paper

We propose in-context learning distillation, which transfers the in-context learning (ICL) ability of large language models to small language models by augmenting in-context tuning with teacher-student distillation. Experiments on LAMA and CrossFit show that in-context learning distillation improves the ICL ability of small language models.
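
The sketch below shows one standard way such an objective could combine the hard-label in-context tuning loss with a soft-label distillation term. The weighting, temperature, and exact loss form follow common distillation practice, not necessarily the paper's recipe.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels,
                 alpha: float = 0.5, T: float = 2.0):
    ce = F.cross_entropy(student_logits, labels)   # hard-label ICL loss
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T   # soft-label distillation
    return alpha * ce + (1 - alpha) * kl

s, t = torch.randn(4, 10), torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(distill_loss(s, t, y))
```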

Meta-learning via Language Model In-context Tuning
Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis, He He
ACL, 2022
paper  /  code  /  slides

We propose a novel few-shot meta-learning method called in-context tuning, where training examples are used as prefix in-context demonstrations for task adaptation. We show that in-context tuning outperforms MAML in accuracy and eliminates several well-known oversensitivity artifacts of few-shot language model prompting.
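
A minimal sketch of the input format: a few demonstrations concatenated as a prefix before the query, with the finetuning loss computed only on the target tokens that follow. The template strings are assumptions, not the paper's exact format.

```python
def build_prompt(demos: list, query: str) -> str:
    """Concatenate (input, output) demos as a prefix before the query."""
    prefix = "".join(f"Input: {x}\nOutput: {y}\n\n" for x, y in demos)
    return prefix + f"Input: {query}\nOutput:"

demos = [("I loved this movie.", "positive"),
         ("Terrible acting.", "negative")]
print(build_prompt(demos, "A delightful surprise."))
# During in-context tuning, the LM loss is computed only on the target
# tokens that follow the final "Output:".
```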

Cross-language Sentence Selection via Data Augmentation and Rationale Training
Yanda Chen, Chris Kedzie, Suraj Nair, Petra Galuscakova, Rui Zhang, Douglas Oard, Kathleen McKeown
ACL, 2021
paper  /  code  /  talk  /  slides

We propose a data augmentation strategy and a rationale training strategy for cross-lingual sentence selection in low-resource settings where no labeled relevance judgments are available for training. Our methods achieve state-of-the-art results on three language pairs.

Improved Synthetic Training for Reading Comprehension
Yanda Chen, Md Arafat Sultan, Vittorio Castelli
arXiv preprint, 2020
paper

We propose two novel synthetic training strategies: targeted synthetic pre-training (a method to select useful synthetic examples that target weaknesses of existing models) and synthetic knowledge distillation. The two techniques, when combined, yield QA models that are simultaneously smaller, faster, and more accurate.
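
As a hedged sketch of the targeted-selection idea: score each synthetic example by how confident the current model is on it and keep the hardest ones for further training. `model_confidence`, the examples, and the selection rule are illustrative assumptions.

```python
def select_targeted(examples: list, model_confidence, k: int) -> list:
    """Keep the k synthetic examples the current model is least sure about."""
    return sorted(examples, key=model_confidence)[:k]  # low confidence first

# Toy usage with a fake confidence function:
examples = ["q1", "q2", "q3", "q4"]
fake_conf = {"q1": 0.9, "q2": 0.2, "q3": 0.6, "q4": 0.1}
print(select_targeted(examples, fake_conf.get, k=2))  # ['q4', 'q2']
```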

Detecting and Reducing Bias in a High Stakes Domain
Ruiqi Zhong, Yanda Chen, Desmond Patton, Charlotte Selous, Kathy McKeown
EMNLP, 2019
paper  /  code  /  poster

We propose a framework to systematically detect and reduce the language bias of deep learning models in the high-stakes context of gang intervention.

Internships

Microsoft Research, Summer 2023, Mentors: Chandan Singh, Xiaodong Liu

AWS AI, Summer 2021, Mentor: He He

IBM Research, Summer 2020, Mentors: Arafat Sultan, Vittorio Castelli
Honors

Avanessians Doctoral Fellowship for Engineering Thought Leaders and Innovators in Data Science. 2023.

Mudd Doctoral Fellowship, Columbia SEAS. 2021.

Honorable Mention, CRA Outstanding Undergraduate Researcher Award. 2021.

Theodore R. Bashkow Research Award, Columbia Computer Science Dept. 2021.
Teaching Assistant

Natural Language Processing, Spring 2022 & Spring 2021

Analysis of Algorithms, Spring 2021 & Spring 2020


Website design from Jon Barron