
SQUID pries open AI black box

Cold Spring Harbor Laboratory scientists have engineered a new computational tool named SQUID to plumb the depths of AI’s mysterious inner workings. AI-generated image.

Artificial intelligence continues to squirm its way into many aspects of our lives. But what about biology, the study of life itself? AI can sift through hundreds of thousands of genome data points to identify potential new therapeutic targets. While these genomic insights may appear helpful, scientists aren’t sure how today’s AI models come to their conclusions in the first place. Now, a new system named SQUID arrives on the scene armed to pry open AI’s black box of murky internal logic.

SQUID, short for Surrogate Quantitative Interpretability for Deepnets, is a computational tool created by Cold Spring Harbor Laboratory (CSHL) scientists. It’s designed to help interpret how AI models analyze the genome. Compared with other analysis tools, SQUID is more consistent, reduces background noise, and can lead to more accurate predictions about the effects of genetic mutations.

An illustration outlining the SQUID computational pipeline.

How does it work so much better? The key, CSHL Assistant Professor Peter Koo says, lies in SQUID’s specialized training. Koo explains:

“The tools that people use to try to understand these models have been largely coming from other fields like computer vision or natural language processing. While they can be useful, they’re not optimal for genomics. What we did with SQUID was leverage decades of quantitative genetics knowledge to help us understand what these deep neural networks are learning.”

SQUID works by first generating a library of over 100,000 variant DNA sequences. It then analyzes the library of mutations and their effects using a program called MAVE-NN (Multiplex Assays of Variant Effects Neural Network). This tool allows scientists to perform thousands of virtual experiments simultaneously. In effect, they can “fish out” the algorithms behind a given AI’s most accurate predictions. Their computational “catch” could set the stage for experiments that are more grounded in reality. CSHL Associate Professor Justin Kinney, a co-principal investigator of the study, explains:

“In silico [virtual] experiments are no replacement for actual laboratory experiments. Nevertheless, they can be very informative. They can help scientists form hypotheses for how a particular region of the genome works or how a mutation might have a clinically relevant effect.”
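The library-then-surrogate workflow described above can be illustrated with a small sketch. This is not SQUID's or MAVE-NN's actual code or API. It is a minimal toy version built on several assumptions: the "AI model" is a hypothetical scoring function (`toy_black_box`), the library holds 5,000 variants rather than 100,000, and an ordinary least-squares additive model stands in for the surrogate that MAVE-NN would fit. All function names and parameters are invented for illustration.

```python
import numpy as np

ALPHABET = "ACGT"

def make_library(reference, n_variants, mut_rate, rng):
    """Randomly mutate each position of `reference` with probability `mut_rate`."""
    ref = np.array([ALPHABET.index(b) for b in reference])
    library = np.tile(ref, (n_variants, 1))
    mutate = rng.random(library.shape) < mut_rate
    # Shift mutated positions by 1-3 so the new base always differs from the reference.
    shifts = rng.integers(1, 4, size=library.shape)
    library[mutate] = (library[mutate] + shifts[mutate]) % 4
    return library

def one_hot(library):
    """Encode integer sequences of shape (N, L) as flat one-hot features (N, L*4)."""
    return np.eye(4)[library].reshape(len(library), -1)

def toy_black_box(library):
    """Hypothetical stand-in for a trained genomic deep network (NOT a real model).

    Rewards matches to the motif "TGA" at positions 3-5, plus one interaction term.
    """
    motif = np.array([ALPHABET.index(b) for b in "TGA"])
    matches = (library[:, 3:6] == motif).astype(float)
    return matches.sum(axis=1) + 0.5 * matches[:, 0] * matches[:, 2]

def fit_additive_surrogate(library, scores):
    """Least-squares additive model: score ~ bias + sum of per-position base effects."""
    X = np.hstack([np.ones((len(library), 1)), one_hot(library)])
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coef[0], coef[1:].reshape(library.shape[1], 4)

rng = np.random.default_rng(0)
reference = "ACGTGACCTA"  # arbitrary example sequence (assumption)
library = make_library(reference, n_variants=5000, mut_rate=0.1, rng=rng)
scores = toy_black_box(library)  # "virtual experiments": query the model on every variant
bias, effects = fit_additive_surrogate(library, scores)

# Center each position so effects are measured relative to that position's mean.
effects = effects - effects.mean(axis=1, keepdims=True)
# Spread of base effects per position: large values flag positions the model relies on.
importance = effects.max(axis=1) - effects.min(axis=1)
print(np.round(importance, 2))
```

In this toy setting, the positions with the largest spread should be the three motif positions the stand-in model was built to reward, which is the kind of hypothesis a surrogate model can hand to a laboratory experiment.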

Evan E. Seitz, the lead author of this study, is a postdoc in the Kinney and Koo labs.

There are tons of AI models in the sea. More enter the waters each day. Koo, Kinney, and colleagues hope that SQUID will help scientists grab hold of those that best meet their specialized needs.

Though mapped, the human genome remains an incredibly challenging terrain. SQUID could help biologists navigate the field more effectively, bringing them closer to their findings’ true medical implications.

Written by: Luis Sandoval, Communications Specialist | sandova@cshl.edu | 516-367-6826


Funding

Simons Foundation, National Institutes of Health, Alfred P. Sloan Foundation

Citation

Seitz, E.E., et al., “Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models”, Nature Machine Intelligence, June 21, 2024. DOI:

Core Facilities

“The Sequencing Technologies and Analysis Shared Resource provides access to an array of high throughput Next Generation Sequencing (NGS) technologies. We offer cutting-edge technology alongside convenient in-house services for a broad range of genetic analysis.” - Project Manager Sara Goodwin, Ph.D.
