About the Event
SPAR is a research mentorship program helping early researchers contribute meaningfully to frontier AI safety work. This year's virtual Demo Day (hosted in Gather Town) showcases 40+ technical and governance projects developed over 3 months by this round's stellar mentors and mentees. Projects from SPAR Spring 2025 have already won hackathons and been accepted to leading conferences such as NeurIPS and ICML.
SPAR is part of the Kairos Project, an organization dedicated to accelerating talent into AI safety work. Kairos supports programs like SPAR and FSP that equip emerging researchers to address the technical and strategic challenges posed by advanced AI systems. Express interest in future iterations of SPAR here.
RSVP Here
Event Schedule
Career Fair
A focused career fair featuring organizations working on the future of AI alignment research, policy, and security. This event is open to the public, providing an opportunity to learn about ongoing projects, explore research roles, and connect with teams hiring for thoughtful, mission-driven work.
Projects
Alignment and Economics
Mentor: Mantas Mazeika
Mentees: Gongbo Sun, Zhiguang Han, Jasmine Li
Attempt to fully sparsify a toy LLM
Mentor: Stefan Heimersheim
Mentees: Sean Fillingham, David Quarel, Andrew Gordon, Xavier Poncini
Benchmarking Language Agent Collusion in Bargaining Tasks
Mentor: Andy Liu
Mentees: Verona Teo, Kushal Agrawal, Sudarshanagopal Kunnavakkam, Juan Vazquez, Vishak Srikanth
Can you perform trusted computations on untrusted compute hardware?
Mentor: Jonathan Happel
Mentees: Jackson Dean, Mohammad Ali Jauhar
Compact Proofs of Model Performance via Mechanistic Interpretability
Mentor: Jason Gross
Mentee: Oliver Chen
Concrete Demos of AI Risks
Mentor: Lucas Hansen
Mentees: Sabrina Yen-Ko, Changbai Li
Dangers of language model agents
Mentor: Simon Lermen
Mentees: Abhinav Pola, David Bai
Deconfusing commitment races
Mentor: James Faville
Mentees: Maksim Vymenets, Timothy Parker
Developing and Applying Supervised and Unsupervised Probing Methods for Concept-Based Interpretability
Mentor: Walter Laurito
Mentees: Aarush Sheth, Chang-Han Chen, Nikita Menon, Seiji Armstrong
Do LLMs need LayerNorm?
Mentor: Stefan Heimersheim
Mentees: Galvin Khara, Luca Baroni
Empirical Investigations to Assist LLM Evaluators
Mentor: Ole Jorgensen
Mentees: Shiv Munagala, Edward Crookenden, Scott Wofford, James Sullivan
Evaluating causes and mitigations of hidden serial reasoning in foundation model agents
Mentor: Rohan Subramani
Mentees: Pradyumna Shyama Prasad, Yau-Meng Wong, Nicholas Chen, Daria Ivanova
Extending upon Control Evaluations
Mentor: Aryan Bhatt
Mentees: Bryan Sukidi, Meeri Kuoppala, Alex Serrano Terre
Finding model differences after applying post-training interventions
Mentor: Shashwat Goel
Mentees: Swaraj Singh, Aritra Bandyopadhyay, Thao Pham
Funding priorities for AI safety & governance
Mentor: Joe O'Brien
Mentees: Jeba Sania, Jeremy Dolan, Jay Kim, Jonah Dykhuizen
Geometry of spatial world models
Mentor: Matthieu Moullec
Mentees: Tenghai Long, Milton Lin, Vikram Natarajan, Jonathan Michala, Christian Moya
GPU side-channels for weight theft and covert communication
Mentor: Gabriel Kulp
Mentees: Krystal Maughan, Amir Nuriyev, Luc Chartier, George Tourtellot, Natalia Kokoromyti
Gradient routing: theory and practice
Mentor: Alex Cloud
Mentees: Cailley Factor, Jorio Cocola, Ariana Azarbal, Matthew Clarke
Hybrid VAE-SAE Architectures Exploration: Disentangling Structured Features for LMs
Mentor: Yuxiao Li
Mentees: Maxim Finenko, Ionel-Emilian Chiosa, Henry Zheng, Zach Baker, Maxim Panteleev, Eslam Zaher
Improving Interpretability with AIs
Mentor: Jacques Thibodeau
Mentees: Oliver Chen, Aarush Sheth, Yeonwoo Jang, Steven Cao, Matthew Shinkle
Information Safety
Mentor: Kellin Pelrine
Mentees: Ardy Haroen, Luda Cohen, Kushal Dev, Sukanya Krishna, Hikaru Tsujimura, Jay Chooi, Toshali Goel
Interpreting In-Context Learning
Mentor: Usman Anwar
Mentees: Luc Chartier, Dhruv Gautam, Joey Turnbull
Mech Interp for Robots
Mentor: Rick Goldstein
Mentees: Nicholas Chen, Neel Sortur, Zephaniah Roe
Mechanistic Anomaly Detection
Mentor: Jordan Taylor
Mentees: Gabor Berend, Hugo Lyons Keenan, Julian Bitterwolf
More Interpretable Models by Architecture Design
Mentor: Ronak Mehta
Mentees: Ian Li, Andrew Gordon, Shraya Pal, Coby Kassner
Near Zero-knowledge Detection of Undesired Behavior
Mentor: Satvik Golechha
Mentees: Venkata Hasith Vattikuti, Greta Kintzley, Ishwar Balappanawar, Ronan Azimi-Mancel
Nonstrategic downstream proxies
Mentor: James Faville
Mentee: Daniel Hustert
Overview of Systemic Economic Risks from TAI Systems
Mentors: Deric Cheng, Justin Bullock
Mentees: Alexis Eskenazi, Liam Epstein, Mishaal Lakhani, Matthew Hodak, Suchet Mittal, Joel Christoph, Kaushik Reddy, Andrew Chang, Rupal Jain, Noah Frank, Mohammad Ghasemi, Michał Kubiak, Roman Coussement, Iman Mouloudi, Angesom Teklu, Natalia Matuszczyk, Ky-Cuong Huynh
Reviewing the State of Chinese Frontier AI Development
Mentor: Aaron Scher
Mentees: Zac Richardson, Naci Cankaya, Lily Li
Robust Evaluation Framework for LLM Unlearning
Mentor: Diogo Cruz
Mentees: Ashwin Sreevatsa, Jan Batzner, Yeonwoo Jang, Shariqah Hossain
Science of Frameworks: Procedurally-generated eval framework to probe LLM agent tendencies and failure modes
Mentor: Samuel Brown
Mentees: Daniil Anisimov, Tetiana Bas
SKATE - a peer-challenge approach to LLM ELO-ranking evaluation
Mentors: Samuel Brown, Bruno Mlodozeniec
Mentee: Dewi Gould
Soft Nationalization - US Policy Analysis on Gov. Control of AI Labs
Mentors: Deric Cheng, Justin Bullock
Mentees: Marjia Siddik, Liam Patell, Archit Kalra, Joseph Fraley, Joseph Kehir
Systemic Alignment
Mentor: Christian Schroeder de Witt
Mentees: Apurv Verma, Aashiq Muhamed, Austin Ho, Tianhao Shen, Dipika Khullar
Toy model of computation in superposition
Mentor: Stefan Heimersheim
Mentee: Sara Molas Medina
Toy Models of Deep Data Structure
Mentor: Ari Brill
Mentee: Nathaniel Mitrani Hadida
Understanding and verifying the autoregressive conditioning hypothesis
Mentor: Jacek Karwowski
Mentees: David Steinberg, Marie Victoria Zhussupova, Aritra Bandyopadhyay, Milton Lin, Denis Moiseenko, Cole Blondin
Understanding Neural Networks with Sparse Autoencoders
Mentors: Liv Gorton, Tom McGrath
Mentees: Connor Watts, Connie Robinson, Supantho Rakshit
Using RL to train shutdownable agents
Mentor: Elliott Thornley
Mentee: Harry Garland
Value Alignment in Collective Intelligence Systems
Mentors: Jonas Hallgren, Aaron Halpern
Mentees: Mariia Koroliuk, Adebayo Mubarak, Fabio Marinello, Marlon Fu, Abayomi Adekanmbi, Tyler Bernard, Harshit Singhal, Ananya Ananya, Dmytro Saiankin, Ijya Paudel, Sukanya Krishna, Chetan Kandpal, Aritra Das, Aryan Suri, Luiza Corpaci, Maximilian Holschneider, Ilya Nachevsky, Daniel Swift, Angus Wylie, Max Ramsahoye, Jonathan Michala, Amin Memarian, Abhijeet Ghawade, Nimrod Lener
Various mechanistic interpretability projects
Mentor: Curt Tigges
Mentees: Evan Lloyd, Jenny Vega, Angus Wylie, Dipika Khullar
Various projects on mechanistic interpretability and adversarial robustness
Mentor: Andy Arditi
Mentees: Kureha Yamaguchi, Egg Syntax, Steven Durr
What does it cost to hack an AI chip?
Mentor: Jonathan Happel
Mentees: Houlton McGuinn, Dave Banerjee, Luc Chartier
Awards
Best Poster: $2,000, plus $1,000 for 2nd & 3rd place
Judged by: