How Genomenon finds order in 11 million scientific papers

Biology’s data problem isn’t scarcity, it’s structure. Genomenon is addressing it by converting millions of scientific papers into usable genomic intelligence. “If Google had gone to school to get a PhD in molecular biology, that's what Mastermind is,” says CEO Mike Klein.

Mike Klein, CEO, Genomenon

Part of our CEO feature series for The Onyx Life Sciences Report, publishing in Fortune in 2026.

Could you give our readers a bit of your background, and how you came into this space?

I grew up as an engineer. My background and training are in computer and electrical engineering. I spent about 10 years in corporate America before starting my first company roughly 35 years ago. Since then, I've been the CEO of four different companies, all in the software, data, and IT space. Genomenon is the first company I’ve led in healthcare - doing work specifically around genomics and, more recently, real-world evidence for our pharma customers.

That path really started when I met the founder about nine years ago. The company started in 2014, and in 2016 Dr. Mark Kiel was looking for a CEO to lead the company through commercial development and capital raising. At that point, Genomenon was still pre-product - we hadn’t shipped anything yet. I remember thinking, this is going to be really hard. Going from zero to one million dollars in sales is far harder than going from 10 to 50 million dollars.

But I fell in love with the mission and what it implied - the chance to leverage my background in software and data to have a real impact on patients’ lives. That means saving and improving the lives of babies born with rare diseases, helping cancer patients get properly diagnosed, and making sure they’re receiving the right therapies. Doing that by providing actionable data - data clinicians and researchers can trust - was incredibly compelling.

When I think about it from a mission perspective, Genomenon is the most impactful company I’ve led in my career.

Could you give our readers an overview of Genomenon’s products, and what you're focusing on?

Genomenon started in the genomics space, with the launch of our Mastermind Genomic Intelligence Platform 8 years ago. We built a bespoke AI approach we call GLP, or “genomic language processing.” GLP finds and disambiguates the hundreds of different ways authors describe genes, variants, and genomic findings across the full text of scientific publications. That matters because the same variant can be written many different ways in the literature - and if you can’t reliably recognize all of them, you miss critical evidence. 
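To make the disambiguation problem concrete, here is a minimal sketch in Python. It is not Genomenon's GLP, whose internals are not described in this interview. It only illustrates the idea that several spellings of one protein-level variant can be collapsed to a single canonical key so that a search finds all of them. The function names (`normalize_variant`, `group_mentions`) and the choice of a one-letter `p.` form as the canonical key are assumptions made for illustration, and the pattern covers only simple amino acid substitutions.

```python
import re
from collections import defaultdict

# Standard three-letter to one-letter amino acid codes.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Three-letter codes are listed first so "Val" is tried before "V".
_AA = "|".join(list(AA3_TO_1) + list(AA3_TO_1.values()))
_PATTERN = re.compile(
    rf"^(?:p\.)?\(?({_AA})(\d+)({_AA})\)?$", re.IGNORECASE
)


def _one_letter(code):
    """Map a one- or three-letter amino acid code to its one-letter form."""
    code = code.capitalize()  # "VAL" -> "Val", "v" -> "V"
    return AA3_TO_1.get(code, code)


def normalize_variant(mention):
    """Return a canonical key such as 'p.V600E', or None if unrecognized.

    Hypothetical helper: handles simple substitutions only.
    """
    match = _PATTERN.match(mention.strip())
    if match is None:
        return None
    ref, pos, alt = match.groups()
    return f"p.{_one_letter(ref)}{int(pos)}{_one_letter(alt)}"


def group_mentions(mentions):
    """Group raw mentions under their canonical key, skipping unknowns."""
    groups = defaultdict(list)
    for mention in mentions:
        key = normalize_variant(mention)
        if key is not None:
            groups[key].append(mention)
    return dict(groups)


if __name__ == "__main__":
    found = group_mentions(["V600E", "Val600Glu", "p.(V600E)", "c.1799T>A"])
    print(found)
```

In this toy version, the three protein-level spellings land under one key, while the DNA-level mention is skipped because the pattern does not cover it. A real system would also need to handle DNA-level notation, deletions, insertions, legacy numbering, and gene context, which is where most of the difficulty lies.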

Today, we’ve indexed more than 11 million full-text articles and over three and a half million supplemental datasets to build the world’s most comprehensive knowledgebase of published genomic data.

That knowledgebase is delivered through Mastermind as a genomic search engine. You can search for any genetic variant, and the results will surface every publication that mentions it - regardless of how the author described the variant. The simplest way I can describe it is that Mastermind is what Google would be if it went to school and earned a PhD in molecular biology. It directs users to the evidence they need to make genetic diagnoses and treatment decisions.

Over time, we evolved from simply delivering the underlying literature to delivering that data in a curated, interpretation-ready format. So instead of just showing clinicians the articles, we curate the evidence within them and present the information they need to determine whether a variant is pathogenic or benign. We also organize everything according to the clinical guidelines, so they can get to the point of interpretation very quickly. The benefit for our customers - typically clinical labs and healthcare providers - is that we cut down interpretation time, which directly affects their testing costs and their turnaround time.

As the business has grown, we’ve added a couple more products to extend that core capability. Last year, we acquired the Cancer Knowledgebase (CKB) from Jackson Labs. It’s very much in line with what we’re doing with Mastermind, but focused on somatic oncology. It helps customers understand the molecular profiles of cancer - not just for diagnosis, but also for determining what treatments may apply and which clinical trials a patient might be eligible for.

Between Mastermind and CKB, we're now serving more than 250 leading diagnostic labs and healthcare providers around the world. And about 40 percent of our business is outside the US - Australia, Asia, Western Europe, Central America, really across the globe.

Over the last 5 years, we extended our offerings to include literature-derived real-world evidence for our pharma and biopharma customers. We expanded the AI foundation we built for genomics to support pharma customers who want to understand diseases at the patient level. The published literature is incredibly rich in patient data, but it’s extremely difficult to extract - and that’s really been our forte: leveraging AI to extract biomedical information from full-text literature and then curating that data with a team of expert scientists.

Think about it like this: in the U.S. alone, about $2 trillion has been spent on medical research over the last two decades. A lot of what’s been learned is documented in the scientific literature - but extracting it and turning it into usable data is incredibly difficult. We’ve been able to pull out patient data, including genotypic, phenotypic, and demographic data as well as treatments and outcomes, and find relationships across the data so pharma teams can better understand diseases, design smarter clinical trials, and identify label expansion opportunities in new subpopulations.

You've indexed over 15 million publications. How do you keep the database accurate and ensure you can recall that data at scale?

Yes - just to clarify - it's over 11 million full-text articles, plus another 3.7 million supplemental datasets. One of the key things is that we ingest this information as it's published. We add between 15,000 and 20,000 new articles every week that have genomic or disease-specific data, so Mastermind and CKB are always being kept up to date.

And we've got a team of curators who go through that data to make sure that everything is accurate. So, the databases are always current and up to standard.

Could you talk about expanded labels, reduced labels, or precision prescriptions?

Yes, of course. Because we pull out really rich patient populations from the literature, we can identify cohorts and uncover insights that you wouldn't otherwise see. For example, in colorectal cancer, we identified patients who responded to therapies that were being used off-label - signals that weren’t captured by the current criteria. Once you can identify that subpopulation, with evidence in the peer-reviewed literature, it becomes much easier to go to the FDA and say: there’s published evidence supporting another population we can include.

In another project, we quadrupled the number of variants that qualified patients for a trial - and removed variants that were known to be resistant to the drug. That both expands the eligible patient pool and helps protect the trial from enrolling likely non-responding patients. And that kind of insight can make a real difference when you’re trying to hit that final p-value.

What does your primary revenue mix look like, in terms of diagnostics versus partnerships and licensing?

We started about 10 years ago in the clinical diagnostic market, working with healthcare providers, diagnostic labs, and genetic testing labs - and that’s still the core of our business today. Roughly 70 percent of our revenue comes from diagnostics. The other 30 percent comes from pharma and biopharma, where we support them with literature-derived real-world evidence for disease understanding and clinical trial design.

In terms of customer demographics, what's the spread between North America and elsewhere? What are your international expansion plans?

About 40 percent of our business is outside the US, and we plan to keep supporting and growing that international footprint. One of the advantages in healthcare is that a lot of the work, globally, is done in English. I've run other software companies where you needed multilingual support. For Genomenon’s work, most of the scientific literature is written in English and the majority of users can read and write English. That’s made it easier to support international markets.

If you're pitching the ROI of Genomenon's platform, how do you quantify that? What metrics matter?

On the clinical diagnostic side, we've worked with healthcare providers who, by using the knowledge available in CKB or Mastermind, have increased diagnostic yield by 12 percent. And if you’re diagnosing rare diseases, diagnosing 12 more out of 100 kids is pretty impactful.

Other labs have shown they’ve reduced interpretation time - and therefore cost - by around 15 percent when using our platform. They have the evidence at their fingertips instead of scouring the internet for variant information, and that directly impacts cost of goods sold and turnaround time.

On the pharma side, it’s about expanding and clarifying patient populations and the real-world signals that drive development and commercialization decisions. For example, in familial hypercholesterolemia, we extracted and analyzed data on more than 42,000 patients from the medical literature and showed that the clinical criteria used to diagnose FH were overly restrictive. That helped support a broader view of the eligible population - meaning more patients could be identified for therapy - which can have a meaningful impact on commercial opportunity. More broadly, we’re delivering insights that are very difficult to get from EHR or claims data alone.

Given that rare disease populations are small, how do you validate accuracy to ensure the platform is working as intended?

Every piece of data we provide is curated by scientists. We're not just running AI and dumping raw data - we use AI to extract the signal, and then our experts review and validate it before it ever reaches a customer. When we deliver real-world datasets, especially for rare diseases, we are often able to find ten to twenty times more patients buried in the literature than are available in patient registries. And often, we can capture full disease trajectories, because they're documented in the literature.

That combination of high recall and scientific curation is part of the ‘secret sauce’.

Can you talk about partnerships at Genomenon, and how they fit into your platform?

If you look at trends in next-generation sequencing, the platforms and reagents are increasingly becoming commoditized. What's becoming the key is informatics, and the ability to make sense of all that data. 

That’s where our partnerships fit in. We partner with about a dozen genomic analysis companies where our data serve as their leading annotation source. Companies like Illumina and SOPHiA GENETICS directly integrate our data into their workflows. It’s a win-win: customers get a single pane of glass for their genomic analysis and they get access to a world-class genomic annotation set without needing to stitch together multiple sources.

On the real-world evidence (RWE) side, we're looking to partner with traditional RWE providers, who deliver EHR or claims data. We sit upstream. We can pull early signals from the literature and help define clinical trials before a program ever touches EHR data. So there’s a great opportunity to complement what those providers already do.

How do you prevent yourselves from becoming commoditized?

I think right now there’s still so much being learned. Biology is messy, and new publications are coming out constantly. So staying current - and being able to interpret what’s new - matters.

And the AI we’ve built isn’t off-the-shelf. It’s not just plugging articles into GPT and seeing what comes out. There’s a decade of experience behind these systems, and we’ve trained them on curated datasets. There’s a lot that goes into the process - the extraction, the normalization, the quality control, and the scientific review. That combination of high recall, purpose-built AI, and expert curation is hard to replicate, and it would be very difficult for someone to come in and knock us off.

The core challenge is, of course, unstructured distribution of highly structured data.

Exactly. Most scientists aren’t publishing everything in neat, standardized fields - it’s usually buried in a massive PDF. The challenge is taking that unstructured content and turning it into structured, actionable data, and that’s what we do really well.

Any lessons that you'd like to pass on to the next generation of executives in genomic data and medicine?

My advice is to never lose sight of the patient on the other end of the data. What makes Genomenon special is the mission: helping to save and improve lives by making genomic information actionable. Our employees join because of that mission. It's what gets me up every day.

We’ve had patient stories where we surfaced information that simply wasn’t available anywhere else - and it changed a child’s life, or a cancer patient’s trajectory. We’re not the clinicians; we’re the data source that helps medical personnel do their job better. And being able to connect those dots - between the science, the data, and real patient impact - is a powerful motivator for anyone building in life sciences.