Yiqiao Zhong (UWM): “How do LLMs generalize on out-of-distribution tasks? insights from model’s internal representations”
Abstract:
A mystery of large language models (LLMs) is their ability to solve novel tasks, notably through a few demonstrations in the prompt (in-context learning). Such tasks often require the model to generalize far beyond its training distribution, raising the question: how do LLMs achieve this form of out-of-distribution (OOD) generalization? For example, in symbolized language reasoning, names and labels are replaced by arbitrary symbols, yet the model can infer the correct name-label mapping without any finetuning.
In this talk, I will open the black box of LLMs and reveal how three facets of LLM behavior are interconnected: emergent phenomena during training, OOD generalization, and a model’s representation of compositions. Focusing on induction heads, I will show that learning the right compositional structure is key to OOD generalization, and that this learning process exhibits sharp transitions in training dynamics. Further, I propose the “common bridge representation hypothesis”: a latent subspace in the embedding space acts as a bridge to align multiple attention heads across early and later layers, and this may be the key geometric structure underlying the success of transformers.
Biography:
Yiqiao Zhong is currently an assistant professor at the University of Wisconsin-Madison, Department of Statistics. Yiqiao obtained his PhD from Princeton University, advised by Prof. Jianqing Fan, and was a postdoc at Stanford University, advised by Prof. Andrea Montanari and Prof. David Donoho. His research interests are the scientific foundations of large language models, including interpretability, visualization, and statistical theory.

