Jul 18 (Thu) @ 10:30am: “Generalization and Optimization in the Interpolation Regime: From Linear Models to Neural Networks,” Hossein Taheri, ECE PhD Defense

Date and Time
Thursday, July 18, 10:30 AM

Location
Engineering Science Building (ESB), Room 1001

Abstract

Learning with large models has driven unprecedented advances across diverse fields of machine learning. As a model's size grows, so does its capacity to memorize, or interpolate, the training dataset. Learning under interpolation presents new challenges and opportunities that are not addressed by classical statistical learning theory. This thesis explores the performance of learning methods in the interpolation regime across various models, including linear models and neural networks. Our primary goal is to understand how data and model characteristics influence the convergence behavior of gradient-based methods, such as gradient descent, and to quantify how well these models generalize to new data.

In the first section, we explore linear models, the simplest setting in which learning under interpolation can be studied. In particular, we consider empirical risk minimization applied to high-dimensional generalized linear models. Our goal is to characterize the optimal test error for such models in an asymptotic setup where the data dimension is comparable to the number of training samples. By deriving a system of equations that precisely characterizes the test error, we obtain a tight lower bound on the test error that holds for any convex loss function and any ridge-regularization parameter. We then show the bound is tight by constructing a loss function and regularization parameter that achieve it.

Continuing with linear models, we consider adversarial learning with high-dimensional Gaussian-mixture models. Adversarial training, based on empirical risk minimization, is currently one of the main defenses against adversarial attacks: small but targeted modifications to test data that cause misclassification. We derive precise asymptotic expressions for both the standard and adversarial test errors. Our results demonstrate the relationship between the adversarial and standard errors and the influence of factors such as the over-parameterization ratio, the data model, and the attack budget.

Finally, we discuss the training and generalization behavior of interpolating neural networks. Neural networks are known for their ability to memorize even complex datasets, often achieving near-zero training loss via gradient descent. Despite this capability, they also generalize remarkably well to new data. We investigate the generalization error of neural networks trained with the logistic loss. Our main finding is that, under a specific data-separability condition, optimal test-loss bounds are achievable when the network width is only poly-logarithmically large in the number of training samples. Moreover, our analysis framework, based on algorithmic stability, yields improved generalization bounds and width lower bounds compared to prior works employing alternative methods.
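The interpolation phenomenon central to the abstract can be seen in a toy experiment (this sketch is illustrative only and is not taken from the thesis): gradient descent with the logistic loss on an over-parameterized linear model with separable synthetic data drives the training loss toward zero.

```python
# Toy illustration (not from the thesis): gradient descent with logistic
# loss on linearly separable synthetic data in the over-parameterized
# regime (dimension > number of samples), where the model can interpolate.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 100                        # over-parameterized: d > n
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = np.sign(X @ w_star)               # labels from a planted linear model

def logistic_loss(w):
    # mean of log(1 + exp(-y_i * <x_i, w>))
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

def gradient(w):
    margins = y * (X @ w)
    # numerically stable sigmoid(-margins) = exp(-log(1 + exp(margins)))
    s = np.exp(-np.logaddexp(0.0, margins))
    # grad of mean log(1 + exp(-m_i)) is -mean(s_i * y_i * x_i)
    return -(X * (s * y)[:, None]).mean(axis=0)

w = np.zeros(d)
eta = 1.0
for _ in range(5000):
    w -= eta * gradient(w)

# Training loss approaches zero: the model interpolates the data.
assert logistic_loss(w) < 0.05
```

Because the data are separable, the iterates' margins keep growing and the training loss decays toward zero rather than converging to a positive minimum; this near-zero-training-loss regime is the setting the thesis analyzes.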

Bio

Hossein Taheri received the B.Sc. degree in Electrical Engineering and Mathematics from Sharif University of Technology in 2018. He is currently pursuing the Ph.D. degree in Electrical and Computer Engineering at the University of California, Santa Barbara. His main area of research is statistical learning and optimization.

Hosted by: Professor Christos Thrampoulidis

Submitted by: Hossein Taheri <hossein@ucsb.edu>