
Hello! I am an A.B. Computer Science student at Princeton University, graduating in 2028. I am currently a researcher in Prof. Zhuang Liu's group and the Princeton Computer Vision Lab (advised by Prof. Jia Deng). I have had the pleasure of working with Sachin Konan, Supriyo Chakraborty, and Abhishek Joshi.

My research focuses on Machine Learning, Natural Language Processing, and Reinforcement Learning. I am particularly interested in LLM fine-tuning, mechanistic interpretability, and procedural generation for simulation.

I expect to graduate from Princeton in May 2028. Outside of research, I play classical cello in the Princeton University Orchestra, enjoy running (I completed the Jersey City Marathon in 2025!), and like experimenting with RL environments (PufferLib, the NetHack environment).

Thanks for visiting!

jonathanliu [at] princeton [dot] edu

Updates

[06/2026] Will be joining Abridge AI as an Incoming Research Scientist (PhD Role).
[08/2025] Started as a Research Assistant in Prof. Zhuang Liu's group.
[06/2025] Began a Machine Learning Research Internship at BBN Technologies.
[05/2025] First-author paper presented at the 2025 NeurIPS workshop GenAI4Health.
[08/2024] Co-authored Infinigen-Sim, accepted to the CoRL LSRL workshop.

Papers

(* indicates equal contribution)

Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements
Jonathan Liu*, Kia Ghods.
Generative AI in Genomics (Gen²) Workshop at ICLR 2026.
We present a parameter-efficient Diffusion Transformer (DiT) for generating 200 bp cell-type-specific regulatory DNA sequences. By replacing the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder, our model matches the U-Net's best validation loss in 13 epochs (60× fewer) and converges to a 39% lower loss, while reducing memorization (the fraction of generated sequences aligning to training data via BLAT) from 5.3% to 1.7%. Ablations show the CNN encoder is essential: without it, validation loss increases by 70% regardless of positional embedding choice. We further apply DDPO fine-tuning using Enformer as a reward model, achieving a 38× improvement in predicted regulatory activity. Cross-validation against DRAKES on an independent prediction task confirms that the improvements reflect genuine regulatory signal rather than reward-model overfitting.
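As a rough illustration of the continuous diffusion setup the abstract describes, the sketch below noises a one-hot DNA sequence with the standard DDPM forward process that a denoiser such as a DiT would be trained to invert. The sequence length (200 bp) comes from the abstract; the linear beta schedule, timestep count, and all numeric values are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

SEQ_LEN, ALPHABET = 200, 4  # 200 bp sequences over {A, C, G, T}

def make_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    """Linear beta schedule and cumulative alpha products (standard DDPM)."""
    betas = np.linspace(beta_min, beta_max, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bars, rng):
    """Forward noising: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a = alpha_bars[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

rng = np.random.default_rng(0)
# One-hot encode a random 200 bp sequence: shape (200, 4).
x0 = np.eye(ALPHABET)[rng.integers(0, ALPHABET, SEQ_LEN)]
alpha_bars = make_schedule()
x_t, eps = q_sample(x0, t=500, alpha_bars=alpha_bars, rng=rng)
```

The denoiser's training target is to predict `eps` from `(x_t, t)`; the paper's contribution concerns what that denoiser is (a transformer with a 2D CNN input encoder rather than a U-Net), which this sketch does not attempt to reproduce.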
Demo: Statistically Significant Results on Biases and Errors of LLMs Do Not Guarantee Generalizable Results
Jonathan Liu*, Haoling Qiu, Jonathan Lasko, Damianos Karakos, Mahsa Yarmohammadi, Mark Dredze.
GenAI4Health Workshop at NeurIPS 2025 (Poster).
Recent research has shown that hallucinations, omissions, and biases are prevalent in everyday use cases of LLMs. However, chatbots used in medical contexts must provide consistent advice even when non-medical factors, such as demographic information, are present. To understand the conditions under which medical chatbots fail to perform as expected, we develop an infrastructure that 1) automatically generates queries to probe LLMs and 2) evaluates answers to these queries using multiple LLM-as-a-judge setups and prompts. For 1), our prompt-creation pipeline samples the space of patient demographics, histories, disorders, and writing styles to create realistic questions that we subsequently use to prompt LLMs. In 2), our evaluation pipeline provides hallucination and omission detection using LLM-as-a-judge as well as agentic workflows, in addition to LLM-as-a-judge treatment-category detectors. As a baseline study, we perform two case studies on inter-LLM agreement and the impact of varying the answering and evaluation LLMs. We find that LLM annotators exhibit low agreement scores (average Cohen's kappa = 0.118), and only specific (answering, evaluation) LLM pairs yield statistically significant differences across writing styles, genders, and races. We recommend that studies using LLM evaluation employ multiple LLMs as evaluators to avoid arriving at statistically significant but non-generalizable results, particularly in the absence of ground-truth data. We also suggest publishing inter-LLM agreement metrics for transparency. Our code and dataset are available here: https://github.com/BBN-E/medic-neurips-2025-demo.
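The agreement statistic reported above (average Cohen's kappa = 0.118 across LLM judges) can be computed directly from its definition. The re-implementation below is a minimal sketch; the two "judges" and their labels are invented for illustration and are not the paper's data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e): observed agreement
    corrected for the agreement expected by chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items the two raters label identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: dot product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical LLM judges labelling six answers:
judge_1 = ["ok", "ok", "halluc", "halluc", "ok", "halluc"]
judge_2 = ["ok", "halluc", "halluc", "halluc", "ok", "ok"]
kappa = cohens_kappa(judge_1, judge_2)  # 4/6 observed vs. 0.5 by chance
```

A kappa near 0.118 means the judges agree barely more often than random labelling with the same marginals would, which is why the paper cautions against conclusions drawn from a single evaluator.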
Infinigen-Sim: Procedural Generation of Articulated Simulation Assets
Abhishek Joshi*, Beining Han, Jack Nugent, Yiming Zuo, Jonathan Liu, Hongyu Wen, Stamatis Alexandropoulos, Tao Sun, Alexander Raistrick, Gaowen Liu, Yi Shao, Jia Deng.
CoRL LSRL Workshop (Poster).
We introduce Infinigen-Sim, a toolkit for procedurally generating realistic articulated assets for robotics simulation. We include procedural generators for 12 common articulated object categories along with high-level utilities for creating custom articulated assets in Blender. We also provide an export pipeline to integrate the resulting assets, along with their physical properties, into common robotics simulators. Experiments show that assets sampled from these generators are useful for movable-object segmentation, training generalizable reinforcement learning policies, and sim-to-real transfer of imitation learning policies.
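The core idea above, sample an object's parameters, then export an articulated description a simulator can load, can be sketched in a few lines. Everything here is a toy assumption for illustration: the single-door "cabinet", the parameter ranges, and the bare-bones URDF output. Infinigen-Sim's actual generators run in Blender, cover 12 categories, and export full meshes with physical properties.

```python
import random
import xml.etree.ElementTree as ET

def sample_cabinet(rng):
    """Sample hypothetical cabinet parameters (metres / radians)."""
    return {
        "width": rng.uniform(0.4, 1.0),
        "height": rng.uniform(0.6, 1.2),
        "depth": rng.uniform(0.3, 0.6),
        "max_open": rng.uniform(1.2, 2.4),  # door joint upper limit
    }

def to_urdf(p):
    """Emit a minimal URDF: two box links joined by a revolute hinge."""
    robot = ET.Element("robot", name="cabinet")
    for name in ("body", "door"):
        link = ET.SubElement(robot, "link", name=name)
        geom = ET.SubElement(ET.SubElement(link, "visual"), "geometry")
        ET.SubElement(geom, "box",
                      size=f'{p["depth"]:.3f} {p["width"]:.3f} {p["height"]:.3f}')
    joint = ET.SubElement(robot, "joint", name="door_hinge", type="revolute")
    ET.SubElement(joint, "parent", link="body")
    ET.SubElement(joint, "child", link="door")
    ET.SubElement(joint, "limit", lower="0", upper=f'{p["max_open"]:.3f}',
                  effort="10", velocity="1")
    return ET.tostring(robot, encoding="unicode")

rng = random.Random(42)
urdf = to_urdf(sample_cabinet(rng))  # each seed yields a distinct asset
```

Because every sampled parameter set yields a structurally valid asset, a generator like this can produce unlimited articulation variations for training, which is what makes the procedural approach useful for the segmentation and policy-learning experiments described above.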

Experience

Abridge AI: Incoming Research Scientist (PhD Role), Summer 2026
Prof. Zhuang Liu Group: Research Assistant, Aug 2025 - Present
BBN Technologies: Machine Learning Research Intern, Jun 2025 - Aug 2025
Princeton Computer Vision Lab: Research Assistant, Aug 2024 - Present

Projects

Sight Support (assistive tech app): Spring 2020 - Winter 2025
RL for the Andrews-Curtis Conjecture: Spring 2025 - Present
Discovering Transformer Circuits with Edge Pruning: Spring 2025

Relevant Coursework

Princeton University
COS 484: Natural Language Processing (Spring 2025)
COS 485: Neural Networks: Theory and Application (Spring 2025)
COS 597R: Advanced Topics in Computer Science (Probabilistic Topics in RL) (Fall 2025)
COS 585: Information Theory and Applications (Fall 2025)
COS 568: Systems and Machine Learning (Spring 2026)
COS 417: Operating Systems (Spring 2026)
COS 598B: Advanced Topics in Computer Science (Formal Methods) (Spring 2026)
ECE 476: Parallel Computing: Principles, Systems (Spring 2026)