Speaker
Hongyang Zhang, Ph.D., Northeastern University
Abstract
Machine learning (ML) systems such as chatbots are now integrated into everyday workflows, mediating communication, writing, and problem-solving. Key technical ingredients behind these systems include language model fine-tuning (such as low-rank adaptation and chain-of-thought fine-tuning) and reinforcement learning from human or synthetic preferences. Despite their widespread adoption, the underlying mechanisms that make these techniques effective remain poorly understood, limiting our ability to interpret and reliably use large language models.
In this talk, I will provide a Hessian-based perspective on these questions in the context of modern ML. The use of second-order information in neural networks dates back to early work on optimization; recent advances show that Hessian structures provide valuable insights into generalization. I will introduce a Hessian-based measure of generalization (ICML’22) and show how the trace of the loss Hessian leads to non-vacuous bounds on the generalization gap (TMLR’24). I will then discuss how this Hessian view helps illustrate phenomena such as grokking, uncovers a novel connection between influence functions and surrogate models, motivates the design of kernel surrogates for task attribution, and guides algorithm design for in-context learning (EMNLP’25) and multi-objective reinforcement learning (AAAI’26).
Bio
Hongyang R. Zhang is an assistant professor of computer science at Northeastern University in Boston. His research lies at the intersection of machine learning, optimization algorithms, and statistical learning. He received his PhD in Computer Science from Stanford University and his BEng from Shanghai Jiao Tong University. He later spent a year as a postdoctoral researcher in the Department of Statistics and Data Science at the University of Pennsylvania. His work has received paper awards from COLT and TMLR, and he has served as an area chair for ICML, AISTATS, ALT, and AAAI, as well as an action editor for the Journal of Data-centric Machine Learning Research and Transactions on Machine Learning Research.