A researcher has just finished writing a scientific paper. She knows her work could benefit from another perspective. Did she overlook something? Or perhaps there’s an application of her research she hasn’t thought of. A second set of eyes would be great, but even the friendliest of collaborators might not be able to spare the time to read all the required background publications to catch up.
Rapid advances in AI and ML have given rise to programs that can generate creative text and useful software code. These general-purpose chatbots have recently captured the public imagination. However, existing chatbots, which are based on large, diverse language models, lack detailed knowledge of scientific sub-domains.
Kevin Yager of the Center for Functional Nanomaterials (CFN), a DOE Office of Science User Facility at Brookhaven National Laboratory, has built a chatbot to close this gap. By leveraging a document-retrieval method, Yager’s bot is knowledgeable in areas of nanomaterial science that other bots are not. The details of this project and how other scientists can leverage this AI colleague for their own work have recently been published in Digital Discovery.
Rise of the robots
“CFN has been looking into new ways to leverage AI/ML to accelerate nanomaterial discovery for a long time. Currently, it’s helping us quickly identify, catalog, and choose samples, automate experiments, control equipment, and discover new materials. Esther Tsai, a scientist in the electronic nanomaterials group at CFN, is developing an AI companion to help speed up materials research experiments at the National Synchrotron Light Source II (NSLS-II).” NSLS-II is another DOE Office of Science User Facility at Brookhaven Lab.
At CFN, there has been a lot of work on AI/ML that helps drive experiments through automation, controls, robotics, and analysis, but researchers had not explored programs adept at handling scientific text as deeply. Being able to quickly document, understand, and convey information about an experiment can help in a number of ways, from breaking down language barriers to saving time by summarizing larger pieces of work.
Watching your language
To build a specialized chatbot, the program required domain-specific text—language taken from areas the bot is intended to focus on. In this case, the text is scientific publications. Domain-specific text helps the AI model understand new terminology and definitions and introduces it to frontier scientific concepts. Most importantly, this curated set of documents enables the AI model to ground its reasoning using trusted facts.
To emulate natural human language, AI models are trained on existing text, enabling them to learn the structure of language, memorize various facts, and develop a primitive sort of reasoning. Rather than laboriously retraining the AI model on nanoscience text, Yager gave it the ability to look up relevant information in a curated set of publications. Providing it with a library of relevant data was only half of the battle. To use this text accurately and effectively, the bot would also need a way to identify the right context for each question.
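Before a library of publications can be searched, it is commonly split into short, overlapping passages that can be looked up individually. A minimal sketch of that preparation step, assuming word-based chunks: the `chunk_text` name and the default 200-word chunks with a 50-word overlap are hypothetical choices for illustration, not details of CFN’s tool.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of up to chunk_size words.

    The defaults are hypothetical; the article does not say how documents are split.
    Consecutive chunks share `overlap` words so a sentence cut at a boundary
    still appears whole in at least one chunk more often.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this chunk already reaches the end
            break
    return chunks
```

For example, `chunk_text(paper_text, chunk_size=4, overlap=1)` on a ten-word text yields three chunks, each starting on the last word of the one before.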
“A challenge that’s common with language models is that sometimes they ‘hallucinate’ plausible sounding but untrue things,” explained Yager. “This has been a core issue to resolve for a chatbot used in research as opposed to one doing something like writing poetry. We don’t want it to fabricate facts or citations. This needed to be addressed. The solution for this was something we call ‘embedding,’ a way of categorizing and linking information quickly behind the scenes.”
Embedding is a process that transforms words and phrases into numerical values. The resulting “embedding vector” captures the meaning of the text as a list of numbers, so texts with similar meanings end up with similar vectors. When a user asks the chatbot a question, the question is also sent to the ML embedding model to calculate its vector. This vector is used to search a pre-computed database of text chunks from scientific papers that were embedded the same way; chunks whose vectors lie closest to the question’s vector are the most semantically related to it. The bot then uses these snippets to build a more complete picture of the question’s context.
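To make the retrieval step concrete, here is a minimal sketch in Python. It swaps the ML embedding model for a toy hashed bag-of-words vector (an assumption made only so the example runs with the standard library alone), pre-computes one vector per chunk, and ranks chunks by cosine similarity to the question. The names `embed`, `build_index`, `search`, and `DIM` are hypothetical.

```python
import math
import re
import zlib

DIM = 256  # hypothetical vector length for this toy example


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for an ML embedding model: hashed word counts, scaled to unit length."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # embed() returns unit-length (or all-zero) vectors, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def build_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Pre-compute one embedding vector per text chunk."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(question: str, index: list[tuple[str, list[float]]], k: int = 3) -> list[tuple[float, str]]:
    """Embed the question and return the k most similar (score, chunk) pairs, best first."""
    q = embed(question)
    scored = [(cosine(q, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A learned embedding model places paraphrases near each other even when they share no words; these toy vectors only reward shared vocabulary, so the sketch shows the mechanics of the search rather than its quality.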
The user’s query and the retrieved text snippets are combined into a “prompt,” which is sent to a large language model (an expansive program that creates text modeled on natural human language) that generates the final response. The embedding search ensures that the text being pulled is relevant to the user’s question, and because the snippets come from a body of trusted documents, the chatbot’s answers are grounded in, and can cite, those sources.
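The prompt-assembly step can be sketched as follows. The instruction wording, the numbered-excerpt layout, and the `build_prompt` name are hypothetical; the article does not give the chatbot’s actual prompt, and the call to the language model itself is left out.

```python
def build_prompt(question: str, snippets: list[tuple[str, str]]) -> str:
    """Combine the user's question with retrieved (source, text) snippets into one prompt.

    Numbering the excerpts lets the model cite them, e.g. "[2]", in its answer.
    """
    lines = [
        "Answer the question using only the numbered excerpts below.",
        "Cite excerpts by number, and say so if they do not contain the answer.",
        "",
    ]
    for i, (source, text) in enumerate(snippets, start=1):
        lines.append(f"[{i}] ({source}) {text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)
```

Keeping the source label next to each excerpt is what allows an answer to point back to a specific paper rather than to the model’s general training data.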
“The program needs to be like a reference librarian,” said Yager. “It needs to heavily rely on the documents to provide sourced answers. It needs to be able to accurately interpret what people are asking and be able to effectively piece together the context of those questions to retrieve the most relevant information. While the responses may not be perfect yet, it’s already able to answer challenging questions and trigger some interesting thoughts while planning new projects and research.”
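One way to make a retrieval-based bot lean on its documents, in the spirit of the “reference librarian” description, is to pass along only snippets whose similarity clears a threshold and to decline when none do. This is an assumed design choice for illustration, not something the article attributes to CFN’s chatbot; `select_sources` and the 0.3 cutoff are hypothetical.

```python
def select_sources(
    scored: list[tuple[float, str]], min_score: float = 0.3, k: int = 3
) -> list[str]:
    """Return up to k snippet texts whose similarity is at least min_score, best first.

    min_score=0.3 is a hypothetical cutoff; it would need tuning for a real embedding model.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [text for score, text in ranked if score >= min_score][:k]


# Hypothetical (score, snippet) pairs, e.g. from an embedding search.
hits = [(0.12, "cafeteria menu"), (0.81, "block copolymer annealing"), (0.45, "GISAXS analysis")]
sources = select_sources(hits)
if not sources:
    print("No relevant passages found in the library.")
else:
    print(sources)
```

An empty list is the signal to say “I couldn’t find that” instead of letting the model answer from memory, which is where fabricated facts and citations tend to come from.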
Bots empowering humans
CFN is developing AI/ML systems as tools that can liberate human researchers to work on more challenging and interesting problems and to get more out of their limited time while computers automate repetitive tasks in the background. There are still many unknowns about this new way of working, but these questions are the start of important discussions scientists are having right now to ensure AI/ML use is safe and ethical.
“There are a number of tasks that a domain-specific chatbot like this could clear from a scientist’s workload. Classifying and organizing documents, summarizing publications, pointing out relevant info, and getting up to speed in a new topical area are just a few potential applications,” remarked Yager. “I’m excited to see where all of this will go, though. We never could have imagined where we are now three years ago, and I’m looking forward to where we’ll be three years from now.”
For researchers interested in trying this software out for themselves, the source code for CFN’s chatbot and associated tools is available in a public GitHub repository.
More information: Kevin G. Yager, Domain-specific chatbots for science using embeddings, Digital Discovery (2023). DOI: 10.1039/D3DD00112A