2023 Winter, Yiming at Millennium Park, Chicago

Yiming Su

B.Sc. in Computer Science, The University of Chicago.

whoami

I'm a second-year Computer Science PhD student at the University of Illinois Urbana-Champaign, advised by Prof. Tianyin Xu.

I work on the reliability of agentic systems: I design systems-side frameworks and tooling that reduce or eliminate the impact of unreliable agent behavior.

Research

LLM-powered agents are inherently nondeterministic, which prevents us from deploying them in safety-critical systems (e.g., cluster management systems), precisely where we humans could benefit most from their decision-making and data-processing capabilities. My research focuses on reducing or eliminating the impact of this unpredictable behavior, either through systems-side frameworks that prevent harmful behaviors or through runtime tooling that helps agents mitigate them. This work has culminated in Stratus, a multi-agent system that enables autonomous SRE incident management through transaction-like semantics, and, before that, HotGPT, an early attempt to understand the edges and limits of LLMs.

Before working on agents, I focused on distributed systems reliability, specifically the semantic challenges of managing traditional distributed systems (e.g., Apache Cassandra) on cloud-native platforms (e.g., Kubernetes). Large-scale distributed software has complex management semantics that are hard to capture in management programs (termed "operators"). We conducted a study to understand and detect such semantic bugs in operator programs, which was accepted to NSDI '26. We found 86 bugs (53 confirmed and 28 fixed) in popular operators for distributed systems.

A short bio can be found here.

News

  • NEW!!! Check out SREGym, a high-fidelity benchmark and training ground for SRE agents!!
  • AWARD!!! Best Presentation at CSL Student Conference 2026 for SREGym demonstration!
  • NEW!!! ICLR '26: SysMoBench paper accepted! Evaluating AI on formally modeling complex real-world systems.
  • NEW!!! NeurIPS '25: STRATUS multi-agent system for autonomous reliability engineering accepted!
  • NEW!!! NSDI '26: Research on cloud application management reliability accepted!

Publications

Awards

Talks

Services

News

Random things

Misc. images I collected during my PhD
Misc. quotes
Number of times this page got falsely marked as a deceptive website
dotfiles: git@github.com:yimingsu01/dotfiles.git
Linkin Park - Rock am Ring 2001 Opening

Unless specifically noted, I do not own any of the images presented on this site. All rights belong to their respective owners.

My past life... here