In this episode, Collin and I continue building our list of questions to help increase the chances of your AI investments delivering a return. We focus in particular on the role non-technical C-level leaders should play in this effort.
We use Chip Huyen’s AI Engineering as a practical framework, exploring how it gives digital leaders and non-technical senior managers a structured way to engage more deeply with AI initiatives. The book offers a way into the ecosystem and lifecycle of AI application development, helping leaders ask better questions about data, prompt design, fine-tuning, and evaluation. That lets them have more meaningful discussions with technical teams about how these applications are built, where the risks sit, and what criteria should be used to define and measure success.
We walk through some of the key differences between traditional software and AI systems—particularly the shift from deterministic to probabilistic behaviour, and the central role of data in shaping outcomes.
From there, we build on the questions we believe leaders should be asking: What problem are we solving? How are we evaluating outputs? How are we managing risks around data quality, safety, and factual accuracy? What trade-offs are we making between quality, cost, and latency?
We spend some time on Chip’s section on evaluation criteria, using it as a springboard for non-technical senior leaders to dig deeper into the thinking behind AI applications and the outcomes expected of them. We also introduce the concept of “evals” (ongoing evaluation frameworks that extend beyond traditional testing) and explain why they require continuous iteration, collaboration, and oversight, even after deployment.
This episode continues our exploration of how leaders can better understand what they are funding, engage more effectively with product and delivery teams, and create the conditions for AI investments to deliver real value.
Links to Chip’s book and the interview on evals referred to in the episode:
Chip Huyen’s AI Engineering: https://www.oreilly.com/library/view/ai-engineering/9781098166298/
Lenny’s Podcast - Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar: https://www.youtube.com/watch?v=BsWxPI9UM4c