About

I'm Grzegorz Chlebus, a manager at NVIDIA working on Frontier AI Evaluation.

My work sits at the intersection of LLM evaluation, agent systems, developer tooling, and practical AI engineering. I'm especially interested in the gap between benchmark performance and real-world usefulness: what actually makes models reliable, debuggable, cost-effective, and worth deploying.

Before that, I worked on applied machine learning in industry and medical image analysis research, with a focus on segmentation, uncertainty, explainability, and practical deep learning systems.

This blog is a mix of technical notes, experiments, and opinions on machine learning, AI systems, and software.

Disclaimer: The views expressed on this blog are my own and do not represent the views of my employer or any affiliated organization.