The Invisible Infrastructure: Bio-IT World Panel Warns of a Slow-Motion Crisis in Biomedical Data
By Allison Proffitt
June 17, 2026 | A group of researchers is sounding the alarm: the publicly funded data infrastructure underpinning virtually all of modern biomedical research is fragile, underfunded, and increasingly at risk. It is up to us to safeguard it.
At a session at last month’s Bio-IT World Conference & Expo, Douaa Mugahid, PhD, Data Officer at the Hi-IMPAcTB Consortium at the Harvard School of Public Health, moderated a panel of academics, investors, and consultants on the future of public data and software infrastructure. Panelists included Andrew Marshall, Haystack Science; Anne Deslattes Mays, Science and Technology Consulting; Eleanor A. Howe, Diamond Age Data Science; Joseph Flanagan, RA Ventures; Lev Tsidilkovski, Deep Origin; and Sorin Draghici, Professor of Computer Science at Wayne State University and Chief Science Officer at AdvaitaBio.
Setting the Stage: A Year of Self-Inflicted Damage
Mugahid opened with a sweeping overview of what she described as a year of “self-inflicted harm” to the United States’ position as the global leader in biomedical science. She traced the logic of the American innovation engine—from NIH and NSF tax-funded grants flowing into academic research, to patents, to venture investment, to drug approvals, to sales by companies who pay taxes back into the public coffers—and then methodically catalogued where that engine had been disrupted over the past eighteen months.
In early 2025, she noted, federal funding to research institutions was abruptly cut, leaving researchers idle. Grant issuance slowed dramatically outside of NIH. Roughly 30% of public health datasets hosted on the CDC website were taken down, some never restored. Talent began to flee: a survey by Nature found that 75% of U.S.-based researchers were actively seeking opportunities abroad, and by 2026, international student enrollment in the U.S. had dropped by 36%. Federal agency leadership in charge of administering grants was hollowed out through firings and resignations. And for the first time in recent memory, the U.S. filed approximately 3% fewer biomedical patents.
Yet the nation holds decades of rich public datasets, and AI-driven computer science is now enabling better use of data than ever before. “We are so well positioned to leverage these datasets right at the moment when it would be most impactful,” Mugahid said. “Compromising their stability and longevity right now seems like a really unwise move.”
Large language models like ChatGPT and Claude, she pointed out, are examples of what’s possible. These models are trained substantially on publicly generated scientific data and code, much of it deposited into GitHub or HuggingFace repositories by publicly funded researchers. AlphaFold, arguably the most celebrated AI breakthrough in biology, was trained on protein structure data funded by taxpayers and hosted in publicly supported repositories. “It’s only because of [public data] that we’re able to be at this moment in time,” she said.
The Databases Nobody Thinks About Until They’re Gone
When panelists were asked to name the public resources their work depended on, the list came quickly: dbGaP (the Database of Genotypes and Phenotypes), the NIH’s All of Us research program, The Cancer Genome Atlas (TCGA), GTEx, PubMed, ClinicalTrials.gov, PubChem, the Protein Data Bank (PDB), BindingDB, ChEMBL, and several more.
Flanagan, who leads data strategy at RA Ventures’ company-creation arm, put the dependency in stark operational terms. “When NCBI blinks, my pipelines blink,” he said. The firm uses automated pipelines built on top of public data sources to identify promising scientific areas and inform company formation. Any disruption to the underlying infrastructure cascades immediately into business decisions.
Tsidilkovski, working in molecular dynamics and drug discovery at Deep Origin, noted that BindingDB and similar databases make computational modeling possible for small molecule research. “Without those resources, we wouldn’t be able to make any predictions or have a base to stand on when doing any of the LLM work or smaller models specific for binding affinities,” he said.
Draghici described integrating 12 to 15 public data sources into his company’s knowledge base. “Without this we couldn’t function, either as a company nor as a research group,” he said. He also noted with concern that NIH’s SBIR/STTR program for early-stage company formation, though recently reauthorized by Congress, currently has no active application process at NIH — a gap he called “personally very concerning.”
What Happens When the Oxygen Leaves the Room
A recurring theme was the difficulty of making people care about a crisis that hasn’t fully arrived yet. An audience member from the biotech space recounted asking colleagues at major pharmaceutical companies what would happen if PubMed ceased to exist. They were nonchalant. They just didn’t believe it could happen.
“You’re not going to solve a problem unless you think it’s real,” an audience member agreed. He provided an analogy: “Right now we’re breathing and we’re happy. If you remove air from the room, all of a sudden this common public good becomes very valuable. Until there’s a perception that something’s going to happen, nobody’s going to act.”
Howe, who has worked in bioinformatics for over two decades and now consults across pharma and biotech, sketched out what she saw as the most likely scenario. “I have some theories about what would happen if that data disappeared — or more likely, if it just sat around and never got updated or added to. That’s the more likely outcome,” she said. “Small biotechs rely entirely on that data. If that data is not there, the small biotechs can’t do anything. The small biotechs can’t develop new drugs. The pharmas can’t buy those small biotechs. So then there’s no new drugs. We all get sicker — but it’ll take a while.” She estimated a five-year lag before the effects become undeniable. “And then we’ll realize that there are no biotechs in America anymore. And the pharmas will just go to China.”
Marshall, whose background includes two decades as Editor-in-Chief of Nature Biotechnology followed by early-stage venture investment, expressed alarm at the loss of funding for CASP — the decades-old competition through which AlphaFold’s breakthrough was first recognized. “CASP is the reason why … everyone realized AlphaFold … was groundbreaking,” he said. Google DeepMind has since stepped in to fund it, but that raised its own concerns. “Is it the right thing for Google DeepMind to be funding CASP? I’m not so sure.”
From Fragile to Anti-Fragile: What Should Be Done
Marshall articulated the deeper issue. “These essential resources that the global scientific community relies on have this fragility,” he said. “This is the question we need to think about: how do we build resilience into these key resources? … And if these go away or are limited in some way, then everything stops.”
From the audience, Giovanni Nisato of Innovation-horizons pushed back against the word “resilience,” proposing instead the concept of anti-fragility, borrowed from Nassim Nicholas Taleb, as a more appropriate aspiration. A truly anti-fragile system, he argued, grows stronger when attacked, the way an immune system does.
In response, panelists and audience members cycled through a range of proposals, from incremental to structural, to ensure anti-fragility of the system.
Utility-style funding. Flanagan proposed treating critical data infrastructure the way society treats GPS or the National Weather Service: not as grant recipients competing for annual awards, but as public utilities funded through stable, long-term appropriations. “They don’t have to go through a grant writing process and be vulnerable to these types of cuts,” he said.
Acts of Congress. Mays pointed to precedents like the INCLUDE Project — a congressionally mandated and funded initiative on Down syndrome research — as a model for insulating critical resources from shifting executive branch priorities.
International data agreements. Several panelists called for formal intergovernmental data-sharing treaties modeled on existing agreements for the Sequence Read Archive, which is jointly maintained by the U.S., Europe, and Japan. Mugahid noted that such agreements would provide a legal and logistical mechanism for mirroring datasets offshore, reducing dependence on any single government’s goodwill. “It takes such a long time to build relationships, but very little time to destroy them,” she said.
Distributed, redundant hosting. Mugahid also drew parallels to the software world, where repositories like Bioconductor are mirrored across multiple continents. The Global Alliance of Open Science (GaLOS), which was launched at the Bio-IT World Conference in 2025 by Howe and conference director Cindy Crowninshield, has been working to coordinate exactly this kind of effort. Howe noted that during the height of the funding panic in 2025, scientists had descended on databases en masse to download backup copies, creating what she described as an accidental DDoS attack. “They weren’t coordinated methods,” she acknowledged.
The Voices in the Room
A thread running through the entire session was the failure of the scientific community to communicate its own importance to non-scientists, particularly policymakers.
“If you’re a scientist, … engage with the public,” Mugahid urged. “Break down those barriers. Scientists should not be living in ivory towers; they are funded by the public. They should be able to speak directly to the public.” She noted that new effective scientific communicators emerged during the COVID-19 pandemic, and that more were needed.
From the floor, Nisato challenged U.S. citizens to understand their own constitutional system well enough to engage with it. Congress controls the budget; citizens elect Congress. “It’s your job as U.S. citizens to figure out how your system works,” he said. “If Congress is not able to pass a budget, that’s an issue. You might want to call the people you vote for because it’s their job to do that.”
Mugahid agreed and called on the entire Bio-IT World community to act. “We want to hear from the experts in many domains … everyone should have an opportunity to comment not just people on this panel or in this audience for that matter!” she said. “Public comment as a mechanism in democratic societies is actually incredibly important, and unilateral decision making by people who are not experts in the field should not be possible.”






