███████╗██████╗  █████╗ ██████╗
██╔════╝██╔══██╗██╔══██╗██╔══██╗
███████╗██████╔╝███████║██████╔╝
╚════██║██╔═══╝ ██╔══██║██╔══██╗
███████║██║     ██║  ██║██║  ██║
╚══════╝╚═╝     ╚═╝  ╚═╝╚═╝  ╚═╝
██████╗ ███████╗███╗   ███╗ ██████╗
██╔══██╗██╔════╝████╗ ████║██╔═══██╗
██║  ██║█████╗  ██╔████╔██║██║   ██║
██║  ██║██╔══╝  ██║╚██╔╝██║██║   ██║
██████╔╝███████╗██║ ╚═╝ ██║╚██████╔╝
╚═════╝ ╚══════╝╚═╝     ╚═╝ ╚═════╝
██████╗  █████╗ ██╗   ██╗
██╔══██╗██╔══██╗╚██╗ ██╔╝
██║  ██║███████║ ╚████╔╝ 
██║  ██║██╔══██║  ╚██╔╝  
██████╔╝██║  ██║   ██║   
╚═════╝ ╚═╝  ╚═╝   ╚═╝   
Sunday, May 11th, 2025
10:00 AM - 2:30 PM PT

About the Event

SPAR is a research mentorship program that helps early-career researchers contribute meaningfully to frontier AI safety work. This year's virtual Demo Day (hosted in Gather Town) showcases 40+ technical and governance projects developed over 3 months by this round's stellar mentors and mentees. Projects from SPAR Spring 2025 have already won hackathons and been accepted to leading conferences such as NeurIPS and ICML.

SPAR is part of the Kairos Project, an organization dedicated to accelerating talent into AI safety work. Kairos supports programs like SPAR and FSP that equip emerging researchers to address the technical and strategic challenges posed by advanced AI systems. Express interest in future iterations of SPAR here.

RSVP Here

Event Schedule

17:00 - 18:00   Project Showcase
18:00 - 19:00   Lightning Talks
19:00 - 20:30   Career Fair
21:00           Prize Winners Announced
All day         1-1s & Networking

Career Fair

A focused career fair featuring organizations working on the future of AI alignment research, policy, and security. This event is open to the public, providing an opportunity to learn about ongoing projects, explore research roles, and connect with teams hiring for thoughtful, mission-driven work.

METR
Safe AI Forum
Gray Swan AI
Constellation
Catalyze
ML Alignment & Theory Scholars
Goodfire
Arcadia Impact

Projects

Alignment and Economics

Mentor: Mantas Mazeika

Mentees: Gongbo Sun, Zhiguang Han, Jasmine Li

Attempt to fully sparsify a toy LLM

Mentor: Stefan Heimersheim

Mentees: Sean Fillingham, David Quarel, Andrew Gordon, Xavier Poncini

Benchmarking Language Agent Collusion in Bargaining Tasks

Mentor: Andy Liu

Mentees: Verona Teo, Kushal Agrawal, Sudarshanagopal Kunnavakkam, Juan Vazquez, Vishak Srikanth

Can you perform trusted computations on untrusted compute hardware?

Mentor: Jonathan Happel

Mentees: Jackson Dean, Mohammad Ali Jauhar

Compact Proofs of Model Performance via Mechanistic Interpretability

Mentor: Jason Gross

Mentee: Oliver Chen

Concrete Demos of AI Risks

Mentor: Lucas Hansen

Mentees: Sabrina Yen-Ko, Changbai Li

Dangers of language model agents

Mentor: Simon Lermen

Mentees: Abhinav Pola, David Bai

Deconfusing commitment races

Mentor: James Faville

Mentees: Maksim Vymenets, Timothy Parker

Developing and Applying Supervised and Unsupervised Probing Methods for Concept-Based Interpretability

Mentor: Walter Laurito

Mentees: Aarush Sheth, Chang-Han Chen, Nikita Menon, Seiji Armstrong

Do LLMs need LayerNorm?

Mentor: Stefan Heimersheim

Mentees: Galvin Khara, Luca Baroni

Empirical Investigations to Assist LLM Evaluators

Mentor: Ole Jorgensen

Mentees: Shiv Munagala, Edward Crookenden, Scott Wofford, James Sullivan

Evaluating causes and mitigations of hidden serial reasoning in foundation model agents

Mentor: Rohan Subramani

Mentees: Pradyumna Shyama Prasad, Yau-Meng Wong, Nicholas Chen, Daria Ivanova

Extending upon Control Evaluations

Mentor: Aryan Bhatt

Mentees: Bryan Sukidi, Meeri Kuoppala, Alex Serrano Terre

Finding model differences after applying post-training interventions

Mentor: Shashwat Goel

Mentees: Swaraj Singh, Aritra Bandyopadhyay, Thao Pham

Funding priorities for AI safety & governance

Mentor: Joe O'Brien

Mentees: Jeba Sania, Jeremy Dolan, Jay Kim, Jonah Dykhuizen

Geometry of spatial world models

Mentor: Matthieu Moullec

Mentees: Tenghai Long, Milton Lin, Vikram Natarajan, Jonathan Michala, Christian Moya

GPU side-channels for weight theft and covert communication

Mentor: Gabriel Kulp

Mentees: Krystal Maughan, Amir Nuriyev, Luc Chartier, George Tourtellot, Natalia Kokoromyti

Gradient routing: theory and practice

Mentor: Alex Cloud

Mentees: Cailley Factor, Jorio Cocola, Ariana Azarbal, Matthew Clarke

Hybrid VAE-SAE Architectures Exploration: Disentangling Structured Features for LMs

Mentor: Yuxiao Li

Mentees: Maxim Finenko, Ionel-Emilian Chiosa, Henry Zheng, Zach Baker, Maxim Panteleev, Eslam Zaher

Improving Interpretability with AIs

Mentor: Jacques Thibodeau

Mentees: Oliver Chen, Aarush Sheth, Yeonwoo Jang, Steven Cao, Matthew Shinkle

Information Safety

Mentor: Kellin Pelrine

Mentees: Ardy Haroen, Luda Cohen, Kushal Dev, Sukanya Krishna, Hikaru Tsujimura, Jay Chooi, Toshali Goel

Interpreting In-Context Learning

Mentor: Usman Anwar

Mentees: Luc Chartier, Dhruv Gautam, Joey Turnbull

Mech Interp for Robots

Mentor: Rick Goldstein

Mentees: Nicholas Chen, Neel Sortur, Zephaniah Roe

Mechanistic Anomaly Detection

Mentor: Jordan Taylor

Mentees: Gabor Berend, Hugo Lyons Keenan, Julian Bitterwolf

More Interpretable Models by Architecture Design

Mentor: Ronak Mehta

Mentees: Ian Li, Andrew Gordon, Shraya Pal, Coby Kassner

Near Zero-knowledge Detection of Undesired Behavior

Mentor: Satvik Golechha

Mentees: Venkata Hasith Vattikuti, Greta Kintzley, Ishwar Balappanawar, Ronan Azimi-Mancel

Nonstrategic downstream proxies

Mentor: James Faville

Mentee: Daniel Hustert

Overview of Systemic Economic Risks from TAI Systems

Mentors: Deric Cheng, Justin Bullock

Mentees: Alexis Eskenazi, Liam Epstein, Mishaal Lakhani, Matthew Hodak, Suchet Mittal, Joel Christoph, Kaushik Reddy, Andrew Chang, Rupal Jain, Noah Frank, Mohammad Ghasemi, Michał Kubiak, Roman Coussement, Iman Mouloudi, Angesom Teklu, Natalia Matuszczyk, Ky-Cuong Huynh

Reviewing the State of Chinese Frontier AI Development

Mentor: Aaron Scher

Mentees: Zac Richardson, Naci Cankaya, Lily Li

Robust Evaluation Framework for LLM Unlearning

Mentor: Diogo Cruz

Mentees: Ashwin Sreevatsa, Jan Batzner, Yeonwoo Jang, Shariqah Hossain

Science of Frameworks: Procedurally-generated eval framework to probe LLM agent tendencies and failure modes

Mentor: Samuel Brown

Mentees: Daniil Anisimov, Tetiana Bas

SKATE - a peer-challenge approach to LLM ELO-ranking evaluation

Mentors: Samuel Brown, Bruno Mlodozeniec

Mentee: Dewi Gould

Soft Nationalization - US Policy Analysis on Gov. Control of AI Labs

Mentors: Deric Cheng, Justin Bullock

Mentees: Marjia Siddik, Liam Patell, Archit Kalra, Joseph Fraley, Joseph Kehir

Systemic Alignment

Mentor: Christian Schroeder de Witt

Mentees: Apurv Verma, Aashiq Muhamed, Austin Ho, Tianhao Shen, Dipika Khullar

Toy model of computation in superposition

Mentor: Stefan Heimersheim

Mentee: Sara Molas Medina

Toy Models of Deep Data Structure

Mentor: Ari Brill

Mentee: Nathaniel Mitrani Hadida

Understanding and verifying the autoregressive conditioning hypothesis

Mentor: Jacek Karwowski

Mentees: David Steinberg, Marie Victoria Zhussupova, Aritra Bandyopadhyay, Milton Lin, Denis Moiseenko, Cole Blondin

Understanding Neural Networks with Sparse Autoencoders

Mentors: Liv Gorton, Tom McGrath

Mentees: Connor Watts, Connie Robinson, Supantho Rakshit

Using RL to train shutdownable agents

Mentor: Elliott Thornley

Mentee: Harry Garland

Value Alignment in Collective Intelligence Systems

Mentors: Jonas Hallgren, Aaron Halpern

Mentees: Mariia Koroliuk, Adebayo Mubarak, Fabio Marinello, Marlon Fu, Abayomi Adekanmbi, Tyler Bernard, Harshit Singhal, Ananya Ananya, Dmytro Saiankin, Ijya Paudel, Sukanya Krishna, Chetan Kandpal, Aritra Das, Aryan Suri, Luiza Corpaci, Maximilian Holschneider, Ilya Nachevsky, Daniel Swift, Angus Wylie, Max Ramsahoye, Jonathan Michala, Amin Memarian, Abhijeet Ghawade, Nimrod Lener

Various mechanistic interpretability projects

Mentor: Curt Tigges

Mentees: Evan Lloyd, Jenny Vega, Angus Wylie, Dipika Khullar

Various projects on mechanistic interpretability and adversarial robustness

Mentor: Andy Arditi

Mentees: Kureha Yamaguchi, Egg Syntax, Steven Durr

What does it cost to hack an AI chip?

Mentor: Jonathan Happel

Mentees: Houlton McGuinn, Dave Banerjee, Luc Chartier

Awards

Best Poster

$2,000, plus $1,000 for 2nd & 3rd

Judged by:

Best Lightning Talk

$2,000, plus $1,000 for the runner-up

Judged by:

Joshua Landes (BlueDot)