About Me
I am a fourth-year Ph.D. student at the University of Toronto advised by Dr. Colin Raffel. Most of my research focuses on understanding the relationship between an ML model's behavior and the data it was trained on. I am also interested in the privacy and security of ML models, particularly language models.
Experience
I graduated in 2018 from the University of Maryland, College Park with a degree in Computer Engineering. During my undergrad, I did research at the intersection of computer security and deep learning, focusing on malware detection.
After graduating, I worked for a couple of years as a software engineer at a proprietary trading firm.
During my Ph.D. I’ve worked at:
- Adobe Research with Oriol Nieto and Zeyu Jin on enhancing amateur music recordings
- Google Brain with Nicholas Carlini on backdoor attacks against language models
- Google Research with Peter Kairouz and Alina Oprea on user-level privacy attacks
Publications and Preprints
- User Inference Attacks Against Large Language Models
- Nikhil Kandpal, Krishna Pillutla, Alina Oprea, Peter Kairouz, Christopher A. Choquette-Choo, and Zheng Xu
- Backdoor Attacks for In-Context Learning with Language Models at AdvML Frontiers 2023
- Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, and Nicholas Carlini
- Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models at ICML 2023
- Nikhil Kandpal*, Brian Lester*, and 7 others
- Check out the Git-Theta project and the Collaborative + Communal + Continual ML group
- Large Language Models Struggle to Learn Long-Tail Knowledge at ICML 2023
- Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel
- Deduplicating Training Data Mitigates Privacy Risks in Language Models at ICML 2022
- Nikhil Kandpal, Eric Wallace, and Colin Raffel
- Music Enhancement via Image Translation and Vocoding at IEEE ICASSP 2022
- Nikhil Kandpal, Oriol Nieto, and Zeyu Jin
- Check out our project page
- Universal Adversarial Triggers for Attacking and Analyzing NLP at EMNLP 2019
- Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh
Talks
If any of these presentations seems interesting and you'd like to know more, feel free to send me an email.
- Large Language Models Struggle to Learn Long-Tail Information at IBM Research Zurich, February 2023
- A presentation on two of our papers that analyze LLM behavior through the lens of their pre-training data
- Deduplicating Training Data Mitigates Privacy Risks in Language Models at ICML, June 2022
- A video presentation of our paper on the privacy of language models trained on web-scraped datasets
- Music Enhancement via Image Translation and Vocoding at IEEE ICASSP, May 2022
- A video presentation of our paper on enhancing amateur music recordings
- Duplication, Memorization, and Privacy at UNC Image, Vision, and Language Seminar, January 2022
- A presentation on the type of training data leaked by language models
- How to Train Your Energy-Based Model at UNC Image, Vision, and Language Seminar, November 2020
- A survey of EBM training methods and ideas for incorporating EBM-like properties into classifiers