When FHIR Meets Generative AI: My Wandering Journey from Tools to Tangible Healthcare Insights


Ponvannan P

Aug 6, 2025 · 13 minute read

Somewhere between my morning coffee and yet another developer call, I blinked and realized: half my life now revolves around making complex healthcare data actually useful. Want a (slightly embarrassing) story? My first attempt at building a FHIR search tool using AI left me staring at error logs and wondering if computers secretly hate me. Still, curiosity—and necessity—got the better of me. This blog dives into how I fumbled, iterated, and finally made something real: a more human approach to AI-powered FHIR search in healthcare.

1. How I Fell Into the Rabbit Hole: Making Sense of FHIR, AI, and Way Too Many Tools

My journey into the world of healthcare interoperability and FHIR search queries has been anything but straightforward. If I had to sum it up, I’d call it a LEGO build: constantly piecing together new tech stacks, frameworks, and standards, hoping each new block would finally make the whole structure click. As a Principal Software Engineer at Microsoft, my focus since 2019 has been on FHIR (Fast Healthcare Interoperability Resources), but my roots are firmly planted in the DICOM world. That early DICOM-centric experience taught me a tough lesson: not all healthcare standards play nicely together, and simply adding more tools doesn’t always solve the problem.

When I first started at MSR Health Futures, I was drawn to the challenge of making FHIR-compliant servers truly interoperable. My work with FHIR-I, FMG, TSC, and the Argonaut Project gave me a front-row seat to the tangled mess that is healthcare interoperability. It’s easy to assume that if you just throw enough tools, code generation scripts, and adapters at the problem, you’ll eventually get seamless data exchange. Reality, of course, is far more complicated.

"Interoperability isn’t just an acronym-filled word—it's the lifeblood of turning health data into real insight." – Gino Canessa

My first real brush with the complexity of healthcare data came from DICOM. I quickly learned that even the best standards can become a maze when you try to connect them with real-world systems. Moving to FHIR, I hoped things would be simpler. Instead, I found myself knee-deep in code generation, specification development, and endless iterations of tool-building. Each attempt to streamline FHIR search queries or build a better FHIR Adapter led to new challenges—and more late nights fueled by curiosity and stubbornness.

The FHIR-generative AI experiment was born out of this frustration. I started by theorizing about a generic AI tool for FHIR search, planning a RESTful API backend and a thin support layer for actual tool integration. In practice, though, those tools were too complicated for most users. So I pivoted to a chat-style app, leveraging C# Blazor and data ingestion for FHIR specs and Markdown files. This approach brought its own set of headaches: configuration overload, too many models (Azure OpenAI, GitHubModels, Ollama), and unpredictable behaviors.

Eventually, I added the Model Context Protocol (MCP) to fhir-candle, aiming to answer practical questions about FHIR-compliant servers: What resources are supported? What search parameters are available? Can we validate a FHIR search query before it’s sent? Each iteration brought me closer to tangible healthcare insights, but also highlighted just how much work remains.

Looking back, every step—whether it was building infrastructure, developing tooling, or experimenting with generative AI—was driven by the same motivation: making healthcare data more accessible, intuitive, and actionable. And while the path has been anything but straight, it’s been fueled by endless curiosity and a deep belief in the power of interoperability.


2. Building the (Imperfect) Machine: From Over-Engineered Tools to Honest Chat Apps

When I first set out to build a FHIR search tool powered by AI, my vision was clear: a robust RESTful API backend that could handle complex FHIR queries. In reality, it felt more like assembling flat-pack furniture without instructions. Every piece seemed to fit somewhere, but the end result was confusing and, honestly, not very usable. As I quickly learned, sometimes the best way to learn what NOT to build is to actually build it and watch the frustration unfold.

From RESTful API Dreams to Real-World Frustrations

My initial approach was to theorize a generic AI tool for FHIR search—something that could sit behind a RESTful API and offer a thin layer for actual tool support. I started building, but the complexity of using these tools made the experience far from intuitive. Even for someone deeply familiar with FHIR-compliant servers and healthcare data, the process was too convoluted to be practical for real users.
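
To make that concrete, here is roughly the shape of that first design, as a minimal sketch: an ASP.NET Core minimal API endpoint that forwards a generated search string to a FHIR server. The route, parameter names, and server URL are illustrative, not the actual project code.

```csharp
// Minimal sketch of the original RESTful-backend idea (illustrative only).
// An AI layer produces a FHIR search string; this endpoint just forwards it
// to the configured FHIR server and relays whatever comes back.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHttpClient("fhir", client =>
    client.BaseAddress = new Uri("https://example.org/fhir/")); // placeholder server
var app = builder.Build();

app.MapGet("/search/{resourceType}", async (
    string resourceType, string query, IHttpClientFactory factory) =>
{
    // e.g. resourceType = "Patient", query = "name=smith&birthdate=ge1970"
    var fhir = factory.CreateClient("fhir");
    var response = await fhir.GetAsync($"{resourceType}?{query}");
    var body = await response.Content.ReadAsStringAsync();
    return Results.Content(body, "application/fhir+json");
});

app.Run();
```

Even this skeleton hints at the problem: the caller still has to know FHIR resource types and search syntax, which is exactly the expertise the tool was supposed to remove.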

Switching Gears: The Chat App Epiphany

This realization forced me to rethink everything. Instead of building for engineers, I needed to build for people who just wanted answers. That’s when I pivoted to a chat-style app using C# Blazor. The goal was simple: create an AI search page that felt approachable and conversational. This shift made me step into the shoes of an actual user, not just a developer. Suddenly, the focus was on clarity, speed, and usefulness—not just technical prowess.
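
If you squint, the whole page reduced to something like the hypothetical Razor component below: a prompt box, a message list, and a service that does the heavy lifting. IChatService and AskAsync are stand-ins for whatever backend is wired up, not real project types.

```razor
@* AiSearch.razor: minimal sketch of a chat-style FHIR search page. *@
@* IChatService is a hypothetical wrapper over the configured model backend. *@
@page "/ai-search"
@inject IChatService Chat

<h3>FHIR AI Search</h3>

@foreach (var msg in _messages)
{
    <p><b>@msg.Role:</b> @msg.Text</p>
}

<input @bind="_prompt" placeholder="e.g. find patients named Smith born after 1970" />
<button @onclick="SendAsync">Ask</button>

@code {
    private readonly List<(string Role, string Text)> _messages = new();
    private string _prompt = string.Empty;

    private async Task SendAsync()
    {
        _messages.Add(("You", _prompt));
        // The service turns the prompt into a FHIR search query, runs it,
        // and summarizes the result for the user.
        var answer = await Chat.AskAsync(_prompt);
        _messages.Add(("AI", answer));
        _prompt = string.Empty;
    }
}
```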

Wrestling with Data Ingestion: The Messy Middle

Of course, powering a chat app with real FHIR insights meant tackling the messy world of data ingestion. I had to pull in FHIR specification ZIP files, traverse endless HTML pages, and parse Markdown documents. Each format brought its own set of headaches: HTML parsing was brittle, Markdown files required extra hinting and examples, and storing everything in a vector store (for fast AI retrieval) was a challenge I’ll happily confess to underestimating. There’s no single “right way” to do streaming import for healthcare data, and I learned that the hard way.
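
For the curious, the ingestion loop was conceptually as simple (and as naive) as this sketch. EmbedAsync is a hypothetical wrapper over whichever embedding model is configured, and real spec packages need far more care than fixed-size chunking.

```csharp
using System.IO.Compression;
using System.Numerics.Tensors; // TensorPrimitives, from the System.Numerics.Tensors package

// Minimal ingestion sketch: pull Markdown entries out of a FHIR spec ZIP,
// chunk them naively, embed each chunk, and keep (chunk, vector) pairs in
// memory. EmbedAsync is a hypothetical wrapper over the configured model.
var store = new List<(string Chunk, float[] Vector)>();

using var zip = ZipFile.OpenRead("fhir-spec.zip"); // placeholder path
foreach (var entry in zip.Entries.Where(e => e.Name.EndsWith(".md")))
{
    using var reader = new StreamReader(entry.Open());
    var text = await reader.ReadToEndAsync();

    // Fixed-size chunking; real ingestion wants heading-aware splits.
    for (var i = 0; i < text.Length; i += 1000)
    {
        var chunk = text.Substring(i, Math.Min(1000, text.Length - i));
        store.Add((chunk, await EmbedAsync(chunk)));
    }
}

// Retrieval: rank stored chunks by cosine similarity to the question vector.
async Task<string> NearestChunkAsync(string question)
{
    var queryVector = await EmbedAsync(question);
    return store
        .OrderByDescending(item =>
            TensorPrimitives.CosineSimilarity(item.Vector, queryVector))
        .First().Chunk;
}

// Hypothetical: calls out to whichever embedding model is configured.
Task<float[]> EmbedAsync(string text) => throw new NotImplementedException();
```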

Model Overload: Azure OpenAI Integration and Beyond

Integrating AI models added another layer of complexity. I experimented with Azure OpenAI, OpenAI, GitHubModels, and Ollama. Each platform brought different behaviors—chat models, embedding models, and everything in between. The variance was overwhelming. Too many choices led to confusion, both for me and for anyone trying to use the tool. It became clear that while multiple models and data sources add power, they also introduce a lot of friction.
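
The one design move that helped was putting a single narrow seam between the app and the providers, so swapping models never rippled through the codebase. Here is a sketch of that idea with hand-rolled types; none of these names come from any particular SDK.

```csharp
// One narrow seam between the app and the model providers, so nothing else
// cares whether Azure OpenAI, OpenAI, GitHubModels, or Ollama is behind it.
// All names here are illustrative, not any particular SDK's API.
public interface IChatBackend
{
    Task<string> CompleteAsync(string prompt, CancellationToken ct = default);
}

public sealed record ModelConfig(string Provider, string Endpoint, string Model, string? ApiKey);

public static class ChatBackendFactory
{
    // Each case would build a provider-specific adapter; bodies are elided,
    // since the point is that the rest of the app only sees IChatBackend.
    public static IChatBackend Create(ModelConfig config) => config.Provider switch
    {
        "azure-openai" or "openai" or "github" or "ollama" => new StubBackend(config),
        _ => throw new ArgumentException($"Unknown provider '{config.Provider}'"),
    };

    private sealed class StubBackend(ModelConfig config) : IChatBackend
    {
        public Task<string> CompleteAsync(string prompt, CancellationToken ct = default)
            => Task.FromResult($"[{config.Provider}:{config.Model}] would answer: {prompt}");
    }
}
```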

"Sometimes, the best way to learn what NOT to build is to actually build it and watch the frustration unfold." – Gino Canessa

In the end, building a practical FHIR search tool meant embracing imperfection, simplifying where possible, and always keeping the end user in mind. The journey from over-engineered tools to an honest, usable chat app was anything but straightforward—but it was necessary.


3. Breaking Down the Acronyms: Turning Model Context Protocol (MCP) Into Something Actually Useful

Let’s face it: healthcare tech is drowning in acronyms. But every so often, one comes along that actually makes life easier. For me, that’s the Model Context Protocol (MCP). When I first added MCP to the fhir-candle project, I was just hoping to answer some basic questions about FHIR resources and search parameters. What I didn’t expect was how much it would tighten my feedback loop—and, honestly, save my sanity.

From Frustration to Clarity: How MCP Simplifies FHIR Resource Discovery

Before MCP, trying to figure out what a FHIR store could do felt like wandering in the dark. Which resources are supported? What search parameters are available? What does each parameter mean? MCP changed that by offering a clear, structured way to:

  • List FHIR stores and their available resources
  • Display resource name, title, description, and comments
  • Enumerate search parameters for each resource, including code, type, and description
  • Show search type definitions, formats, and real-world examples
  • Validate search input with a search request validator

Suddenly, I could answer real questions about clinical data queries without digging through endless documentation or guessing at what might work. It was like flipping on the lights in a cluttered room.
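
The search request validator was the piece that paid off most. Below is a toy version of the idea: check each parameter in a query string against the parameters a store claims to support. The parameter table is a tiny hypothetical subset, not fhir-candle's actual implementation.

```csharp
using System.Web; // HttpUtility, available in modern .NET via System.Web.HttpUtility

// Toy FHIR search validator: check each parameter in a query string against
// the search parameters the store claims to support for that resource type.
// This table is a tiny hypothetical subset, not fhir-candle's real data.
var supported = new Dictionary<string, HashSet<string>>
{
    ["Patient"]     = new() { "name", "birthdate", "gender", "identifier" },
    ["Observation"] = new() { "code", "date", "patient", "value-quantity" },
};

IEnumerable<string> Validate(string resourceType, string query)
{
    if (!supported.TryGetValue(resourceType, out var parameters))
    {
        yield return $"Unsupported resource type: {resourceType}";
        yield break;
    }

    foreach (string? key in HttpUtility.ParseQueryString(query).AllKeys)
    {
        // Strip modifiers like name:exact before checking the base code.
        var code = key?.Split(':')[0] ?? "";
        if (!parameters.Contains(code))
            yield return $"{resourceType} does not support search parameter '{code}'";
    }
}

// "value-quantity" is misspelled below, so this prints one problem.
foreach (var issue in Validate("Observation", "code=1234-5&value-qty=gt5"))
    Console.WriteLine(issue);
```

Catching that kind of mistake before the request leaves the client is exactly the feedback-loop tightening I was after.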

The Joy (and Exasperation) of User-Friendly Tools

Building tools that are both powerful and user-friendly is a constant tug-of-war. With MCP, I could finally make things like search validators and configurable tools that didn’t require a PhD to operate. The protocol’s focus on clarity—surfacing just the right amount of detail—meant users could actually find what they needed, fast. As I like to say:

"The best tech is invisible; it just gets out of your way and lets you find what you need." – Gino Canessa
My ‘Aha!’ Moment: The Power of BYO Tools

One of my biggest realizations came when I embraced a bring-your-own (BYO) approach. Instead of building a bloated, one-size-fits-all solution, MCP lets you plug in your own models, apps, and tooling. Need a custom clinical data query validator? Want to integrate with your favorite AI model? No problem. This flexibility made everything feel lighter and more maintainable.

Star Trek Dreams vs. Developer Realities

Sometimes I imagine MCP as a kind of “universal translator” for healthcare data—a tool that bridges the gap between human questions and complex FHIR systems. While we’re not quite at Star Trek levels (yet), MCP gets us a lot closer to that dream by making resource discovery and query validation accessible, reliable, and—dare I say—enjoyable.

In short, MCP in fhir-candle helps users explore resources, discover supported search types, and validate FHIR queries, all while supporting BYO models and tools for a tailored setup. It’s not magic, but it’s the next best thing for developers who just want answers.


4. Demo Surprises and Sudden Lessons: When Reality Drove My Design (And My Ego)

There’s nothing quite like a live demo to remind you that software is never as simple as it seems in your own development environment. Running the MCP Model Inspector, Claude Desktop, and GitHub Copilot with real FHIR search queries was a humbling experience—one that taught me more about analyzing execution errors and user prompts than any amount of quiet coding ever could.

The Good, the Bad, and the Very Confusing

Each tool brought its own flavor to the table. The MCP Model Inspector was great for exploring FHIR store capabilities and search parameters, but it also exposed how easily a simple typo or missing argument could trigger a cascade of errors. Claude Desktop impressed with its conversational AI, but sometimes misunderstood the intent behind my FHIR queries, leading to unexpected results. GitHub Copilot shined when suggesting FHIR Adapter code snippets, but occasionally hallucinated functions that didn’t exist. The result? A mix of smooth moments, head-scratching confusion, and a few outright failures.

Live Demo Woes: “It Worked on My Machine”

There’s a special kind of dread that comes with the phrase, “It worked on my machine.” In health tech, where data standards like FHIR are unforgiving, this is the riskiest phrase you can utter. During my demos, I ran into classic issues—network hiccups, authentication problems, and, most commonly, the infamous HTTP 400 'bad request' error. These errors were more than just obstacles; they were signals pointing to deeper issues in my design and assumptions.

Learning from Execution Errors: The Hidden Gifts in HTTP 400s

Analyzing execution errors became my unexpected teacher. Each HTTP 400 was a clue—a prompt to revisit how user input was handled, how FHIR search queries were constructed, and where documentation or validation was lacking. I started to see patterns: certain user prompts consistently led to malformed requests, and some APIs were especially “grumpy” about missing or extra parameters. As I refined the tools, these lessons drove crucial improvements in both the user experience and the underlying logic.
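
Most of that analysis came down to one habit: when a FHIR server returns a 400, read the OperationOutcome resource it sends back instead of just logging the status code. A minimal sketch, with a placeholder server URL and a deliberately malformed query:

```csharp
using System.Text.Json;

// On a 400, a FHIR server typically returns an OperationOutcome resource
// explaining what it disliked. Surfacing its issues beats logging "Bad Request".
using var http = new HttpClient { BaseAddress = new Uri("https://example.org/fhir/") };

var response = await http.GetAsync("Patient?birthdate=not-a-date"); // placeholder query
if (response.StatusCode == System.Net.HttpStatusCode.BadRequest)
{
    using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
    if (doc.RootElement.GetProperty("resourceType").GetString() == "OperationOutcome")
    {
        foreach (var issue in doc.RootElement.GetProperty("issue").EnumerateArray())
        {
            var severity = issue.GetProperty("severity").GetString();
            // "diagnostics" is optional in the FHIR spec, so probe for it.
            var details = issue.TryGetProperty("diagnostics", out var d)
                ? d.GetString() : "(no diagnostics)";
            Console.WriteLine($"{severity}: {details}");
        }
    }
}
```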

"Every failed demo doubles as a masterclass in humility—and bug fixing." – Gino Canessa

Tangent: If AI Had a Favorite Error Code…

If AI could pick an error code mascot, I’m convinced it would be 418 I'm a teapot—the ultimate joke in HTTP status codes. Thankfully, I haven’t seen that one pop up in a FHIR Adapter yet, but it’s a reminder that not all errors are created equal. Some are just there to keep us humble (and maybe make us laugh).


In the end, every demo—successful or not—became a source of insight. Execution errors, especially those pesky HTTP 400s, weren’t just bugs to squash; they were the hidden gifts that pushed my FHIR Adapter and AI integration to become more robust and user-friendly.


5. Why Simplicity (and a Little Bit of Grit) Wins Every Time: What I’d Tell Anyone Building FHIR + AI Today

If you’re feeling overwhelmed while trying to configure AI search or build a healthcare search app that leverages FHIR and generative AI, you’re probably on the right track. But here’s the real secret: don’t be afraid to throw things out and start simple. My journey through three major iterations—each more complex than the last—taught me that simplicity, paired with relentless grit, is what actually delivers value in healthcare AI.

When I first set out to create a tool for FHIR search queries using generative AI, I imagined a generic, all-purpose backend with a RESTful API and a thin layer of support for every possible feature. It sounded great on paper, but in practice, it was just too complicated to be useful. The more features I added, the harder it became for real users to configure AI search and get meaningful results. That’s when I realized: clarity beats complexity every time.

With each iteration, I learned to choose tools, models, and platforms not for their shiny features or the latest acronyms, but for their clarity and reliability. Whether you’re building an AI search page or a full-fledged healthcare search app, focus on what your users actually need. Overengineering is a universal temptation, especially in health tech. But as I discovered, pragmatism pays off. As I like to say,

"You can always add more features, but removing them is what takes real nerve—and delivers what people truly want." – Gino Canessa

Testing is another area where simplicity and grit matter. Obsessively test your hardest use cases, and then re-test when everything changes—which it inevitably will. Each time I rebuilt my platform, I found new edge cases and unexpected behaviors. Iterative cycles of testing and rebuilding weren’t just a chore; they were the only way to ensure my FHIR search query tools actually worked in the real world. The best healthcare search apps are born from this cycle of learning, adapting, and never stopping the iteration process.
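
In practice, that meant pinning the nastiest queries down as regression tests so every rebuild had to clear the same bar. Here is a sketch using xUnit; SearchValidator is a hypothetical wrapper around the kind of validation logic sketched earlier.

```csharp
using Xunit;

// Pin the hardest real-world queries as regression tests, so every rebuild
// has to clear the same bar. SearchValidator is a hypothetical wrapper
// around the kind of validation logic sketched earlier.
public class HardQueryTests
{
    [Theory]
    [InlineData("Patient", "name:exact=Smith&birthdate=ge1970-01-01")] // modifier + prefix
    [InlineData("Observation", "code=http://loinc.org|1234-5")]        // token with system
    public void Tricky_but_valid_queries_pass(string resource, string query)
        => Assert.Empty(SearchValidator.Validate(resource, query));

    [Theory]
    [InlineData("Patient", "birthdat=1970")]     // typo in parameter name
    [InlineData("Observation", "value-qty=gt5")] // wrong parameter code
    public void Malformed_queries_are_reported(string resource, string query)
        => Assert.NotEmpty(SearchValidator.Validate(resource, query));
}
```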

Sometimes, I daydream about a ‘magic button’—an AI that could instantly understand clinical context and answer any FHIR query perfectly. But the reality for developers is different. There’s no shortcut to building tools that are both powerful and usable. The path from idea to insight is paved with discarded features, simplified interfaces, and countless rounds of testing. That’s where the grit comes in: sticking with it, even when progress feels slow or messy.

If you’re building with FHIR and generative AI today, my advice is simple: start with the basics, focus on clarity, and don’t be afraid to iterate (and re-iterate) until your tool truly serves its purpose. Simplicity and resilience aren’t just ideals—they’re the foundation of every effective healthcare AI solution.

TL;DR: Building a truly useful AI-driven FHIR search tool means embracing mistakes, learning through real-world trials, and always pushing for clarity and authenticity. Sometimes, less really is more.

