Advanced AI is coming.
Let’s make sure it goes well.
Monash AI Alignment is a research group at Monash University working on the technical problems of making advanced AI systems safe, honest, and controllable.
About
We are a group of Masters students, PhD candidates, and researchers based at Monash, led by Dr. Trung Le, with contributions from faculty including Prof. Mehrtash Harandi. Our work focuses on the technical foundations of AI alignment: understanding what frontier models are doing internally and developing the tools to keep them aligned with human intent as their capabilities grow.
We meet regularly to discuss papers, share work-in-progress, and collaborate on research projects. New members are welcome.
Research areas
- Feature decomposition
  Sparse autoencoders and related methods for decomposing model internals into interpretable features, the foundation for understanding what neural networks have learned.
- Safety steering
  Activation-level interventions and steering vectors that shape model behaviour without retraining, building on advances in interpretability.
- Watermarking & fingerprinting
  Provenance techniques for large language models that remain robust under fine-tuning, quantisation, and adversarial removal.
- Hallucinations
  Understanding and mitigating the conditions under which language models generate confident but ungrounded outputs.
People
The group is led by Dr. Trung Le, with active involvement from Prof. Mehrtash Harandi and a growing cohort of Masters and PhD researchers across the Faculty of Information Technology and the Department of Electrical and Computer Systems Engineering.
If you’re a Monash student or researcher interested in joining, please get in touch by email.
Get involved
Discussion, reading-group sessions, and project coordination happen on our internal forum. The forum is publicly readable, but participation is limited to invited members.