About Me
I am a fourth-year Ph.D. student at the University of Toronto advised by Dr. Colin Raffel. Most of my research focuses on understanding the relationship between an ML model's behavior and the data it was trained on. I am also interested in the privacy and security of ML models, particularly language models.
Experience
I graduated in 2018 from the University of Maryland, College Park with a degree in Computer Engineering. During my undergrad, I did research at the intersection of computer security and deep learning, focusing on malware detection.
After graduating, I worked for a couple of years as a software engineer at a proprietary trading firm.
During my Ph.D. I’ve worked at:
- Adobe Research with Oriol Nieto and Zeyu Jin on enhancing amateur music recordings
- Google Brain with Nicholas Carlini on backdoor attacks against language models
- Google Research with Peter Kairouz and Alina Oprea on user-level privacy attacks
Publications and Preprints
- User Inference Attacks Against Large Language Models
- Nikhil Kandpal, Krishna Pillutla, Alina Oprea, Peter Kairouz, Christopher A. Choquette-Choo, and Zheng Xu
- Backdoor Attacks for In-Context Learning with Language Models at AdvML Frontiers 2023
- Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, and Nicholas Carlini
- Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models at ICML 2023
- Nikhil Kandpal*, Brian Lester*, and 7 others
- Check out the Git-Theta project and the Collaborative + Communal + Continual ML group
- Large Language Models Struggle to Learn Long-Tail Knowledge at ICML 2023
- Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel
- Deduplicating Training Data Mitigates Privacy Risks in Language Models at ICML 2022
- Nikhil Kandpal, Eric Wallace, and Colin Raffel
- Music Enhancement via Image Translation and Vocoding at IEEE ICASSP 2022
- Nikhil Kandpal, Oriol Nieto, and Zeyu Jin
- Check out our project page
- Universal Adversarial Triggers for Attacking and Analyzing NLP at EMNLP 2019
- Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh
Talks
If any of these presentations seems interesting and you'd like to know more, feel free to send me an email.
- Large Language Models Struggle to Learn Long-Tail Information at IBM Research Zurich, February 2023
- A presentation on two of our papers that analyze LLM behavior through the lens of their pre-training data
- Deduplicating Training Data Mitigates Privacy Risks in Language Models at ICML, June 2022
- A video presentation of our paper on the privacy of language models trained on web-scraped datasets
- Music Enhancement via Image Translation and Vocoding at IEEE ICASSP, May 2022
- A video presentation of our paper on enhancing amateur music recordings
- Duplication, Memorization, and Privacy at UNC Image, Vision, and Language Seminar, January 2022
- A presentation on the type of training data leaked by language models
- How to Train Your Energy-Based Model at UNC Image, Vision, and Language Seminar, November 2020
- A survey of EBM training methods and ideas for incorporating EBM-like properties into classifiers