Hello! I am a third-year PhD student, previously at Oxford and now at Stanford. I am broadly interested in research that addresses the gaps preventing models from being applied to real-world tasks that are currently out of reach. This includes improved long-context understanding and data efficiency, methods that allow models to continually learn from new experiences, and architectures whose capability scales better with test-time compute. Currently, I am researching how we can train natively parallel reasoning models with RL.
Recently, I've also worked on methods for improving LLMs by scaling test-time compute (Large Language Monkeys, CodeMonkeys), as well as corresponding systems research to make these models more efficient (Hydragen).
Before this, my research focused on multimodal deep learning, including building one of the first generative models of open-world 3D scenes (NF-LDM). I have also worked on more theoretical projects exploring how the geometric structure of image data influences model performance (Union of Manifolds, Geometry of Activations).
I have been fortunate to work with many amazing collaborators. At Stanford, I am working in the Scaling Intelligence Lab with Professor Azalia Mirhoseini. At the University of Oxford, I was supervised by Professor Ronald Clark in the PiXL group. Before this, I obtained my undergraduate degree at the University of Waterloo, where I studied Software Engineering with a joint major in Combinatorics and Optimization. Through Waterloo's co-op program, I completed six internships primarily focused on AI research across a variety of domains. These included 3D generative model research at Nvidia's Toronto AI Lab, advised by Professor Sanja Fidler, theoretical and recommender system research at Layer 6 AI, and computer vision research at Akasha Imaging (acquired by Intrinsic).
Demonstrating that increasing the amount of inference compute through repeated sampling leads to large improvements in coverage - the fraction of problems solved by any attempt - across a variety of tasks, models, and sample budgets. This makes it possible, and sometimes cost-effective, to amplify weaker models with many samples and outperform single attempts from more capable models.
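To make the coverage metric concrete, below is a minimal sketch of the standard unbiased pass@k-style estimator that can be used to compute it from a pool of repeated samples; the function names and toy numbers are illustrative, and the paper's own evaluation code may differ.

```python
# A minimal sketch (illustrative names and numbers, not the paper's
# evaluation code) of the standard unbiased pass@k-style estimator,
# used here to measure coverage from repeated samples.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given c correct completions observed out of n total samples."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: every k-subset succeeds
    return 1.0 - comb(n - c, k) / comb(n, k)

def coverage(per_problem: list[tuple[int, int]], k: int) -> float:
    """Coverage at budget k: average over problems of the chance that at
    least one of k attempts solves it. Entries are (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)

# Toy example: three problems with 100 samples each and 5, 0, 40 correct.
print(coverage([(100, 5), (100, 0), (100, 40)], k=10))
```

Sweeping k over a fixed pool of samples then traces out coverage as a function of the sample budget.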
Introducing an exact, simple (no custom CUDA) implementation of attention that can accelerate LLM throughput by over 30x on workloads with shared prefixes and large batch sizes.
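As a rough illustration of why shared prefixes help, here is a simplified PyTorch sketch (single attention head, no masking, variable names of my own choosing) of the exact identity that lets attention over a shared prefix be computed separately from attention over each suffix and then recombined using the softmax denominators; my understanding is that Hydragen exploits a decomposition of this kind, but this is not the paper's implementation.

```python
# A simplified sketch (single head, no masking, my own variable names) of the
# exact decomposition behind shared-prefix attention; it illustrates the
# identity only and is not Hydragen's actual implementation.
import torch

def attn(q, k, v):
    """Attention of queries q over keys/values (k, v); also returns the
    log-sum-exp of the scores, which is needed to recombine partial results."""
    scores = q @ k.T / k.shape[-1] ** 0.5           # (n_q, n_k)
    lse = torch.logsumexp(scores, dim=-1)           # (n_q,)
    out = torch.softmax(scores, dim=-1) @ v         # (n_q, d)
    return out, lse

d, n_prefix, n_suffix, n_q = 64, 128, 16, 4
q = torch.randn(n_q, d)
k_pre, v_pre = torch.randn(n_prefix, d), torch.randn(n_prefix, d)
k_suf, v_suf = torch.randn(n_suffix, d), torch.randn(n_suffix, d)

# Attend to the shared prefix and to the per-sequence suffix separately...
out_pre, lse_pre = attn(q, k_pre, v_pre)
out_suf, lse_suf = attn(q, k_suf, v_suf)

# ...then recombine exactly by reweighting with the softmax denominators.
w = torch.softmax(torch.stack([lse_pre, lse_suf], dim=-1), dim=-1)  # (n_q, 2)
combined = w[:, :1] * out_pre + w[:, 1:] * out_suf

# Matches ordinary attention over the concatenated prefix + suffix.
full, _ = attn(q, torch.cat([k_pre, k_suf]), torch.cat([v_pre, v_suf]))
print(torch.allclose(combined, full, atol=1e-5))
```

Because the prefix term is identical for every sequence that shares the prefix, it only needs to be computed once per batch, which is where the potential throughput gains come from.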
Extending the manifold hypothesis to posit that natural image data lies on a union of manifolds with varying intrinsic dimension. Showing improved performance on generative modelling and image classification tasks by designing models with an inductive bias for this structure.
Demonstrating that large language models (LLMs) can be misled by providing them with factually correct but unrepresentative/biased examples, in the context of integer-to-integer piecewise functions.
Investigating how the intrinsic dimension of activations in deep neural networks is affected by regularization, how it correlates with improved validation performance, and how it is coupled with the effects of sudden generalization (grokking).
Proposing a mathematically sound rotation augmentation scheme and loss modification for object detection models that lead to better rotation invariance/equivariance.