Publications

Preprints

  • Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions, arXiv preprint arXiv:2201.08082. [arXiv]
    M. Sahraee-Ardakan, M. Emami, P. Pandit, S. Rangan, AK. Fletcher
    Empirical observations of high-dimensional phenomena, such as double descent, have generated considerable interest in understanding classical techniques such as kernel methods and their implications for explaining the generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of samples and the number of covariates grow at a fixed ratio (i.e., proportional asymptotics). In this work we show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can only perform as well as linear models in this regime. More surprisingly, when the data is generated by a Gaussian process model in which the relationship between the input and the response can be highly nonlinear, we show that linear models are in fact optimal, i.e., linear models achieve the minimum risk among all models, linear or nonlinear. These results suggest that more complex models for the data than independent features are needed for high-dimensional analysis.
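
    For intuition only (this is not the paper's experimental setup), the sketch below compares kernel ridge regression with a generic RBF kernel against linear ridge regression on i.i.d. Gaussian features with a nonlinear teacher in the proportional regime; the kernel choice, teacher function, sample sizes, and regularization strength are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 400, 300, 400          # proportional regime: n and d of the same order
lam = 1e-2                            # ridge regularization (assumed value)

# Synthetic data with a nonlinear teacher (illustrative choice)
X = rng.standard_normal((n, d)) / np.sqrt(d)
Xt = rng.standard_normal((n_test, d)) / np.sqrt(d)
w = rng.standard_normal(d)
f = lambda Z: np.tanh(Z @ w)          # nonlinear target function
y, yt = f(X), f(Xt)

def rbf(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

# Kernel ridge regression
alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(n), y)
yhat_kernel = rbf(Xt, X) @ alpha

# Plain linear ridge regression
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
yhat_linear = Xt @ w_ridge

# In this proportional regime the two test errors are typically close
print("kernel ridge test MSE:", np.mean((yhat_kernel - yt) ** 2))
print("linear ridge test MSE:", np.mean((yhat_linear - yt) ** 2))
```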

  • Augmented Contrastive Self-Supervised Learning for Audio Invariant Representations, arXiv preprint arXiv:2112.10950. [arXiv]
    M. Emami, D. Tran, K. Koishida
    Improving generalization is a major challenge in audio classification due to the scarcity of labeled data. Self-supervised learning (SSL) methods tackle this by leveraging unlabeled data to learn useful features for downstream classification tasks. In this work, we propose an augmented contrastive SSL framework to learn invariant representations from unlabeled data. Our method applies various perturbations to the unlabeled input data and uses contrastive learning to learn representations that are robust to such perturbations. Experimental results on the AudioSet and DESED datasets show that our framework significantly outperforms state-of-the-art SSL and supervised learning methods on sound/event classification tasks.
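
    As a rough illustration of the contrastive ingredient (not the paper's implementation), the sketch below computes an NT-Xent/InfoNCE-style loss between embeddings of two perturbed views of a batch of clips; the augmentations, the stand-in linear encoder, and the temperature are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def nt_xent(z1, z2, tau=0.1):
    """InfoNCE/NT-Xent-style contrastive loss: embeddings of the two views of the
    same clip (matching rows of z1 and z2) are pulled together, while all other
    pairs in the batch are pushed apart."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.vstack([z1, z2])                                    # (2B, dim)
    sim = z @ z.T / tau                                        # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                             # exclude self-similarity
    B = len(z1)
    pos = np.concatenate([np.arange(B, 2 * B), np.arange(B)])  # index of each positive pair
    return np.mean(np.log(np.exp(sim).sum(axis=1)) - sim[np.arange(2 * B), pos])

# Toy usage with a stand-in linear "encoder" and two simple perturbations
batch = rng.standard_normal((8, 16000))                    # 8 one-second clips (placeholder)
view1 = batch + 0.01 * rng.standard_normal(batch.shape)    # perturbation 1: additive noise
view2 = np.roll(batch, 100, axis=1)                        # perturbation 2: time shift
W_enc = rng.standard_normal((16000, 128)) / np.sqrt(16000) # shared stand-in encoder weights
print("contrastive loss:", nt_xent(view1 @ W_enc, view2 @ W_enc))
```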

Conference Articles

  • Implicit Bias of Linear RNNs. In International Conference on Machine Learning (ICML), 2021. [paper] [video]
    M. Emami, M. Sahraee-Ardakan, P. Pandit, S. Rangan, AK. Fletcher
    Contemporary wisdom based on empirical studies suggests that standard recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory. However, why RNNs capture long-term dependencies so poorly has not been fully understood. This paper provides a rigorous explanation of this property in the special case of linear RNNs. Although this work is limited to linear RNNs, even these systems have traditionally been difficult to analyze due to their non-linear parameterization. Using recently developed kernel regime analysis, our main result shows that as the number of hidden units goes to infinity, linear RNNs learned from random initializations are functionally equivalent to a certain weighted 1D-convolutional network. Importantly, the weightings in the equivalent model cause an implicit bias toward elements with smaller time lags in the convolution, and hence shorter memory. The degree of this bias depends on the variance of the transition matrix at initialization and is related to the classic exploding and vanishing gradients problem. The theory is validated with both synthetic and real data experiments.
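
    The convolutional view referenced in the abstract can be checked directly for a single finite-width linear RNN: with state update h_t = A h_{t-1} + B x_t and readout y_t = C h_t, the output is a causal 1D convolution of the input with the impulse response C A^k B, which decays with the lag k when A is contractive. The sketch below verifies this numerically; the dimensions and initialization scale are arbitrary choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 32, 50                                         # hidden units, sequence length
A = 0.9 * rng.standard_normal((n, n)) / np.sqrt(n)    # transition matrix (contractive scaling)
B = rng.standard_normal(n)                            # input weights
C = rng.standard_normal(n)                            # output weights
x = rng.standard_normal(T)                            # scalar input sequence

# Run the linear RNN: h_t = A h_{t-1} + B x_t,  y_t = C h_t
h = np.zeros(n)
y_rnn = np.empty(T)
for t in range(T):
    h = A @ h + B * x[t]
    y_rnn[t] = C @ h

# Equivalent causal 1D convolution with kernel k[j] = C A^j B (decays with the lag j)
kernel = np.array([C @ np.linalg.matrix_power(A, j) @ B for j in range(T)])
y_conv = np.array([sum(kernel[j] * x[t - j] for j in range(t + 1)) for t in range(T)])

print("max |RNN output - conv output|:", np.max(np.abs(y_rnn - y_conv)))
```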

  • Generalization Error of Generalized Linear Models in High Dimensions. In International Conference on Machine Learning (ICML), 2020. [paper] [video]
    M. Emami, M. Sahraee-Ardakan, P. Pandit, S. Rangan, AK. Fletcher
    At the heart of machine learning lies the question of the generalizability of learned rules to previously unseen data. The increasing use of machine learning models in critical applications such as healthcare, autonomous driving, and policy making calls for a detailed understanding of the underlying models for accountability and robustness. Methods that quantify the generalization error of these models are therefore crucial for assessing their fitness for use. With this in mind, this work considers Generalized Linear Models (GLMs) (i.e., single-layer neural networks) in the over-parameterized regime and provides a unified framework to exactly characterize the generalization error. The results are more general than prior analyses in this area and hold for a large class of generalization metrics, loss functions, and regularization schemes. The framework also captures a larger class of statistical models for the features, as well as distributional mismatch between the training and test datasets.
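
    As a purely numerical illustration of generalization error in the over-parameterized regime (not the paper's analytical framework), the sketch below Monte Carlo-estimates the test error of the minimum-norm least-squares fit, a simple GLM, as the ratio d/n varies; the teacher model, noise level, and sample sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_test, sigma = 100, 2000, 0.5       # training samples, test samples, noise level (assumed)

def test_mse(d, trials=20):
    """Monte Carlo estimate of the generalization error of the minimum-norm
    least-squares fit at over-parameterization ratio d/n."""
    errs = []
    for _ in range(trials):
        w = rng.standard_normal(d) / np.sqrt(d)         # teacher weights
        X = rng.standard_normal((n, d))
        y = X @ w + sigma * rng.standard_normal(n)
        Xt = rng.standard_normal((n_test, d))
        yt = Xt @ w + sigma * rng.standard_normal(n_test)
        w_hat = np.linalg.lstsq(X, y, rcond=None)[0]    # minimum-norm solution when d > n
        errs.append(np.mean((Xt @ w_hat - yt) ** 2))
    return np.mean(errs)

for d in (25, 50, 90, 100, 110, 200, 400):
    print(f"d/n = {d / n:4.2f}   test MSE ~ {test_mse(d):.3f}")
```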

  • Input-Output Equivalence of Unitary and Contractive RNNs. In Advances in Neural Information Processing Systems (NeurIPS), 2019. [paper] [arXiv]
    M. Emami, M. Sahraee-Ardakan, S. Rangan, AK. Fletcher
    Unitary recurrent neural networks (URNNs) have been proposed as a way to overcome the vanishing and exploding gradient problem when modeling data with long-term dependencies. A basic question is how restrictive the unitary constraint is on the possible input-output mappings of such a network. This work shows that for any contractive RNN with ReLU activations, there is a URNN with at most twice the number of hidden states and an identical input-output mapping. Hence, with ReLU activations, URNNs are as expressive as general RNNs. In contrast, for certain smooth activations, it is shown that the input-output mapping of an RNN cannot be matched by a URNN, even with an arbitrary number of states. The theoretical results are supported by experiments on modeling slowly varying dynamical systems.
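
    The paper's construction (a URNN with twice the hidden states matching a contractive ReLU RNN) is not reproduced here; the sketch below only illustrates why the unitary constraint matters, by comparing repeated application of an orthogonal transition matrix with a contractive one. The dimensions and spectral norm are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Orthogonal (real unitary) recurrent matrix via QR of a random Gaussian matrix
U, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Contractive recurrent matrix: rescaled to spectral norm 0.9 (< 1)
W = rng.standard_normal((n, n))
W *= 0.9 / np.linalg.norm(W, 2)

# Repeatedly apply the linear part of the recurrence (nonlinearity omitted here)
# to see how the unitary constraint preserves state/gradient norms over long horizons.
v_uni = v_con = rng.standard_normal(n)
for _ in range(100):
    v_uni = U @ v_uni
    v_con = W @ v_con

print("norm after 100 unitary steps:    ", np.linalg.norm(v_uni))   # unchanged
print("norm after 100 contractive steps:", np.linalg.norm(v_con))   # roughly 0.9**100 smaller
```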

  • Low-Rank Nonlinear Decoding of Micro-ECoG from the Primary Auditory Cortex. In Conference on Cognitive Computational Neuroscience (CCN), 2018. [paper] [arXiv]
    M. Emami, M. Sahraee-Ardakan, P. Pandit, AK. Fletcher, S. Rangan, M. Trumpis, B. Bent, C. Chiang, J. Viventi
    This paper considers the problem of neural decoding from parallel neural measurement systems such as micro-electrocorticography (micro-ECoG). In systems with large numbers of array elements at very high sampling rates, the dimension of the raw measurement data may be large. Learning neural decoders for this high-dimensional data can be challenging, particularly when the number of training samples is limited. To address this challenge, this work presents a novel neural network decoder with a low-rank structure in the first hidden layer. The low-rank constraints dramatically reduce the number of parameters in the decoder while still enabling a rich class of nonlinear decoder maps. The low-rank decoder is illustrated on micro-ECoG data from the primary auditory cortex (A1) of awake rats. This decoding problem is particularly challenging due to the complexity of neural responses in the auditory cortex and the presence of confounding signals in awake animals. It is shown that the proposed low-rank decoder significantly outperforms models using standard dimensionality reduction techniques such as principal component analysis (PCA).
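
    As a generic sketch of the idea (the paper's exact low-rank structure and dimensions may differ), the snippet below factorizes the first-layer weights as a rank-r product and compares the parameter count with a dense layer; the channel count, window length, rank, and hidden width are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ch, n_time, r, n_hidden = 61, 200, 8, 128    # channels, time samples, rank, hidden units (assumed)

d_in = n_ch * n_time                           # flattened measurement window

# Dense first layer: a single weight matrix of size (n_hidden, d_in)
dense_params = n_hidden * d_in

# Low-rank first layer: W is approximated by U @ V with U (n_hidden, r) and V (r, d_in)
U = 0.01 * rng.standard_normal((n_hidden, r))
V = 0.01 * rng.standard_normal((r, d_in))
lowrank_params = U.size + V.size

x = rng.standard_normal(d_in)                  # one flattened input window
h = np.tanh(U @ (V @ x))                       # nonlinear hidden activations of the low-rank layer

print("dense first-layer params:   ", dense_params)     # 1,561,600 with these sizes
print("low-rank first-layer params:", lowrank_params)   # 98,624 with these sizes (~16x fewer)
```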