About

I build LLM systems for messy workflows where the hard part is not only capability, but judgment: what evidence to trust, when to ask a clarifying question, how far to relax a constraint, and how to know whether the system got better instead of just more confident.

I originally came through physics because I liked problems where mathematical models have to survive contact with measurement. Software, data, and ML gave me a faster version of the same loop: build an instrument, watch where it fails, improve the model or system, and measure again. LLM agents make that loop interesting in a slightly annoying way, because their failures often expose weak assumptions in how we define tasks, evidence, memory, uncertainty, and success.

Things I keep coming back to:

evaluating what agents actually do, not only what they say at the end: trajectories, tool calls, evidence coverage, constraint handling, perturbation tests, and regression suites
turning noisy enterprise data into something agents can reason over: document pipelines, knowledge graphs, retrieval systems, and taxonomies
looking for right-answer, wrong-reason behavior before it becomes a product feature
keeping business logic explicit when the model should not be trusted with it
practical infrastructure: PostgreSQL, search, queues, containers, Linux, and the usual glue code needed to make prototypes survive contact with users

Before LLMs became the center of gravity, I worked as a software engineer, data engineer, data scientist, and product-ish ML person across insurance, marketing optimization, procurement, and enterprise data systems. Before that, I spent several years in doctoral physics research, which mostly left me with a useful allergy to unmeasured claims and a love for scale effects.

This site is mostly a place for my CV and occasional blog notes when I run into something worth writing down.