Read the transcript below:
We’re excited for this conversation and to chat more about what you’re doing, your background, Mimi Labs, and the big initiatives you have going forward. We’d love to start with a quick intro—high level—who you are and what you’re working on.
Yeah, no, it’s my pleasure to be here. Thanks again for inviting me. My name is Ubin Park, and I’m an engineer by training—I still consider myself an engineer. I happen to be in healthcare now, though I’m not entirely sure why. I started working in healthcare about ten years ago and have been here ever since. I’ve done two startups in the value-based care space, mostly focused on healthcare data analytics. That’s kind of where I come from, and these days, I’m building a new company called Mimi Labs.
Mimi Labs is an open data house built on Databricks. I’ve been compiling a lot of publicly available datasets into a single data lakehouse. Originally, I did it for myself. I write a lot of LinkedIn posts about public data, and I’ve done that for many years—downloading files, finding patterns, pointing out strange numbers, and asking questions. I realized a lot of people enjoyed that, so I kept doing it. It’s been a hobby for a long time.
Earlier this year, I left my full-time job to pursue this hobby more seriously. I wanted to dig into the data deeper, so I started organizing it all in one place. Eventually, I needed more compute power, so I moved everything to Databricks. Since March, I’ve been accumulating data, and by June I had around 18 terabytes. At that point, it felt a bit wasteful to keep it just for myself, so I started giving others access to explore it like I do.
The vision is to make Mimi Labs an open and collaborative data platform where people can find the datasets they need—mostly public ones that are already available online but hard to work with. Usually, you have to download them, write scripts, explore statistics. Personally, I don’t like downloading files anymore—there’s too much uncertainty. I wanted a place where everything is pre-loaded and ready to query, with complex SQL or Python. I also thought it would be great if people could share scripts and discuss patterns or problems together.
That’s the goal for Mimi Labs: to grow as a community where people explore and find useful insights. There have been so many times when I’ve tried downloading CMS data just to get frustrated. It either didn’t answer my question or required too much effort to clean up. If it were all just a query away—amazing. I want access right now!
It’s not just the size of the data—it often requires undocumented knowledge to navigate it. Data without context doesn’t have much meaning. It’s hard to find that context on the web because the people using this data typically don’t share their scripts on GitHub. That makes the learning curve steep.
You said something really insightful—“data without context doesn’t have much meaning.” That line really stuck with me.
Thanks! I’m sure someone else has said it, but feel free to quote me.
That brings us to another big topic—large language models (LLMs) and the shift in how we access and interpret data. You’ve talked about the limitations of LLMs when it comes to public data, but you’re also very optimistic about the future. Could you share more of your perspective?
Sure. Before LLMs, there was word2vec. In a way, word2vec laid the groundwork—it showed that meaning comes from context. It vectorized English words based on nearby words, and I think that same logic applies to public datasets. We have metadata, dictionaries, labels—but the real meaning comes from how datasets are used. Unfortunately, we don’t have enough examples of that usage. If we could crowdsource examples, LLMs could learn to navigate and apply public datasets more effectively.
That’s why I think it’s time to write code collectively—share how people are filtering, exploring, and interpreting data. Right now, we’re often working in silos, and even with the same data dictionaries, we might use datasets very differently. LLMs could help extract the essence of how datasets should be used—if we feed them enough examples.
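As a side illustration of the "meaning comes from context" idea mentioned above, here is a minimal sketch in Python. It is not word2vec itself, which trains a shallow neural network; it swaps in a much simpler technique, raw co-occurrence counts compared with cosine similarity, to show the same distributional intuition. The function names, the window size, and the toy corpus are all assumptions made up for this example and do not come from the interview.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented purely for illustration.
corpus = [
    "the clinic treats patients daily",
    "the hospital treats patients daily",
    "the dataset lists provider addresses",
]

vectors = cooccurrence_vectors(corpus)
sim_close = cosine(vectors["clinic"], vectors["hospital"])  # words with shared neighbors
sim_far = cosine(vectors["clinic"], vectors["dataset"])     # words with few shared neighbors
print(sim_close > sim_far)
```

In this toy corpus, "clinic" and "hospital" appear next to the same words, so they score as more similar than "clinic" and "dataset". The analogy to public datasets would be that shared usage examples play the role of the neighboring words.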
It’s a recurring theme here—context. LLMs provide context to language, and now you’re saying they could do the same for public data if we give them the right examples. Can you share a concrete example of a public dataset used with strong contextual insight?
There are many, but one that gets talked about a lot is social determinants of health (SDOH). Our health outcomes aren’t just driven by medical care; many studies say that 70–80% of those outcomes come from non-medical factors. To address that, you need data, and ideally, you’d get it directly from patients. But that’s hard, so people use geographically grounded data (census data, surveys, neighborhood-level trends) to infer context. It’s not perfect, but it helps guide better care and build stronger patient-provider relationships.
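To make the "geographically grounded" approach above concrete, here is a minimal sketch in Python of attaching area-level indicators to patient records by ZIP code. Everything in it is hypothetical: the field names, the ZIP codes, and the indicator values are placeholders rather than real data, and a real workflow would more likely join against census tables in SQL or a dataframe library.

```python
# Hypothetical area-level indicators keyed by ZIP code.
# Placeholder values for illustration only; not real census data.
area_indicators = {
    "00001": {"median_income": 40000, "pct_no_vehicle": 0.20},
    "00002": {"median_income": 90000, "pct_no_vehicle": 0.03},
}

# Hypothetical patient records; only the ZIP code links them to an area.
patients = [
    {"patient_id": "p1", "zip": "00001"},
    {"patient_id": "p2", "zip": "00002"},
    {"patient_id": "p3", "zip": "99999"},  # no area data available for this ZIP
]


def attach_area_context(patients, area_indicators):
    """Return copies of the patient records with area-level context attached.

    Patients whose ZIP code has no matching indicators get `None`, so the
    gap stays visible instead of being silently filled in.
    """
    enriched = []
    for patient in patients:
        context = area_indicators.get(patient["zip"])
        enriched.append({**patient, "area_context": context})
    return enriched


enriched = attach_area_context(patients, area_indicators)
for record in enriched:
    print(record["patient_id"], record["area_context"])
```

The attached context describes the neighborhood, not the individual, which is the "not perfect" caveat from the answer above.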
That leads to our next question—something we’ve seen in your content: that understanding SDOH isn’t just about social justice, it’s about understanding your market. Could you expand on that?
Yes. Understanding SDOH isn’t only about equity—it’s a market opportunity. But to realize it, we need public-private collaboration. Governments need to create the right incentives for organizations to address these needs. I believe many agencies are working on that, but we’re still early. Right now, incentives (not penalties) may be more effective.
Once those incentives are clearly defined, the private sector will follow. Companies will act to maximize profits based on the new structures. That’s why we need to design incentives that encourage long-term thinking and collaboration between sectors.
Some organizations are already acting—those that think long term, that see the value in addressing SDOH even without current incentives. But others operate on short-term horizons—quarterly results, annual goals. Without direct incentives, they won’t invest in this area. So the real challenge is figuring out how to support those who can’t plan for the long term—startups, new market entrants, smaller players.
If you could wave a magic wand and change one thing for risk-bearing organizations, what would it be?
I’d transform mindsets—to help people plan for long-term outcomes. That’s the biggest shift we need.
And if you could ask our audience—mostly leaders at risk-bearing entities—one question, what would it be?
I’d ask: instead of just optimizing your operations within the current policy framework, if you had the chance to change that framework, what would you change? What would you ask CMS to revise—so that your organization and your patients could benefit? Too often, leaders just follow what’s set in policy without questioning if it’s the right direction. I’d love to hear what they would change if they had the opportunity to influence policy.