A researcher has just finished writing a scientific paper. She knows her work could benefit from another perspective. Did she overlook something? Or perhaps there’s an application of her research she hadn’t thought of. A second set of eyes would be great, but even the friendliest of collaborators might not be able to spare the time to read all the required background publications to catch up.
Rapid advances in artificial intelligence (AI) and machine learning (ML) have given rise to programs that can generate creative text and useful software code. These general-purpose chatbots have recently captured the public imagination. However, existing chatbots, which are based on large, diverse language models, lack detailed knowledge of scientific sub-domains.
Kevin Yager, a scientist at the Center for Functional Nanomaterials (CFN), a U.S. Department of Energy (DOE) Office of Science User Facility at Brookhaven National Laboratory, has built a chatbot to help fill that gap. Because it uses a document-retrieval method, Yager's bot is knowledgeable in areas of nanomaterial science that other bots are not. The details of this project, and how other scientists can use this AI colleague for their own work, were recently published in Digital Discovery.
Rise of the robots
“CFN has been looking into new ways to leverage AI/ML to accelerate nanomaterial discovery for a long time. Currently, it’s helping us quickly identify, catalog, and choose samples, automate experiments, control equipment, and discover new materials. Esther Tsai, a scientist in the electronic nanomaterials group at CFN, is developing an AI companion to help speed up materials research experiments at the National Synchrotron Light Source II (NSLS-II).” NSLS-II is another DOE Office of Science User Facility at Brookhaven Lab.
At CFN, much of the AI/ML work has focused on driving experiments through automation, controls, robotics, and analysis. A program adept at handling scientific text, however, was something researchers had explored less deeply. Being able to quickly document, understand, and convey information about an experiment can help in a number of ways, from breaking down language barriers to saving time by summarizing larger pieces of work.
Watching your language
To build a specialized chatbot, the program required domain-specific text—language taken from areas the bot is intended to focus on. In this case, the text is scientific publications. Domain-specific text helps the AI model understand new terminology and definitions and introduces it to frontier scientific concepts. Most importantly, this curated set of documents enables the AI model to ground its reasoning using trusted facts.
To emulate natural human language, AI models are trained on existing text, enabling them to learn the structure of language, memorize various facts, and develop a primitive sort of reasoning. Rather than laboriously retraining the AI model on nanoscience text, Yager gave it the ability to look up relevant information in a curated set of publications. Providing it with a library of relevant data was only half the battle. To use this text accurately and effectively, the bot would need a way to decipher the correct context.
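Before a curated set of publications can be searched, its text has to be broken into passages small enough to hand to a language model. The Python sketch below shows one common way to do that: fixed-size word windows with some overlap, so that text near a boundary appears in both neighboring chunks. The function names `chunk_text` and `build_library` and the `chunk_words` and `overlap` values are hypothetical illustrations, not the settings used in Yager's chatbot.

```python
def chunk_text(text: str, chunk_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of up to chunk_words words.

    chunk_words and overlap are illustrative defaults, not values from the paper.
    """
    if chunk_words <= 0 or not 0 <= overlap < chunk_words:
        raise ValueError("need chunk_words > 0 and 0 <= overlap < chunk_words")
    words = text.split()
    step = chunk_words - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def build_library(papers: dict[str, str], **kwargs) -> list[tuple[str, str]]:
    """Hypothetical helper: chunk every paper, keeping its source label for citations."""
    library = []
    for source, text in papers.items():
        for chunk in chunk_text(text, **kwargs):
            library.append((source, chunk))
    return library


if __name__ == "__main__":
    demo = " ".join(f"w{i}" for i in range(120))
    for chunk in chunk_text(demo):
        print(chunk.split()[0], "...", chunk.split()[-1])
```

With the default settings, a 120-word document yields three chunks starting at words 0, 40, and 80. Keeping the source label next to each chunk is what later lets an answer point back to the paper it came from.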
“A challenge that’s common with language models is that sometimes they ‘hallucinate’ plausible-sounding but untrue things,” explained Yager. “This has been a core issue to resolve for a chatbot used in research as opposed to one doing something like writing poetry. We don’t want it to fabricate facts or citations. This needed to be addressed. The solution for this was something we call ‘embedding,’ a way of categorizing and linking information quickly behind the scenes.”
Embedding is a process that transforms words and phrases into lists of numbers. The resulting “embedding vector” quantifies the meaning of the text, so passages with similar meanings end up with similar vectors. When a user asks the chatbot a question, the question is also sent to the ML embedding model to calculate its vector. This vector is used to search through a pre-computed database of text chunks from scientific papers that were embedded in the same way. The bot then uses the snippets that are semantically closest to the question to build a more complete picture of the context.
The user’s query and the retrieved snippets are then combined into a “prompt” and sent to a large language model (an expansive program that creates text modeled on natural human language), which generates the final response. The embedding step ensures that the text being pulled is relevant to the user’s question. By supplying text chunks from a body of trusted documents, the chatbot produces answers that are grounded in, and sourced from, those documents.
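The embed-and-search step can be sketched in a few lines of Python. A real system calls a trained ML embedding model; here a toy hashed bag-of-words vector stands in for it so the example runs on its own, which means it only matches shared words rather than meaning. The names `embed`, `build_index`, and `search`, and the vector size `DIM`, are hypothetical choices for illustration. Relevance is scored with cosine similarity, a common way to compare embedding vectors.

```python
import math
import re
import zlib

DIM = 1024  # hypothetical vector length for the toy embedding


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for an ML embedding model: hashed word counts, scaled to unit length."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length (or all zeros), so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def build_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Pre-compute one vector per chunk, once, before any question arrives."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index: list[tuple[str, list[float]]], question: str, k: int = 2) -> list[str]:
    """Embed the question and return the k chunks whose vectors are most similar."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


if __name__ == "__main__":
    chunks = [
        "Block copolymers self-assemble into ordered nanoscale patterns.",
        "Synchrotron X-ray scattering probes nanomaterial structure.",
        "The cafeteria serves lunch at noon.",
    ]
    index = build_index(chunks)
    print(search(index, "How do block copolymers self-assemble?", k=1))
```

The expensive part, embedding every chunk, happens once in `build_index`; each question then costs one embedding plus a comparison against the stored vectors. Large collections typically swap the linear scan for a vector database or approximate nearest-neighbor index.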
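The prompt-assembly step can be sketched the same way. The template wording below is a hypothetical example, not the prompt used by CFN's chatbot; numbering each excerpt and naming its source is one simple way to let the model cite where an answer came from. The call that actually sends the prompt to a large language model is left out, since it depends on which model and API are used.

```python
def build_prompt(question: str, snippets: list[tuple[str, str]]) -> str:
    """Combine retrieved (source, text) snippets and the user's question into one prompt.

    The instruction wording is an illustrative assumption, not CFN's actual template.
    """
    lines = [
        "Answer the question using only the numbered excerpts below.",
        "Cite excerpts by number, and say so if they do not contain the answer.",
        "",
    ]
    # Number each excerpt and keep its source so the answer can point back to it.
    for i, (source, text) in enumerate(snippets, start=1):
        lines.append(f"[{i}] ({source}) {text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "How do block copolymers form patterns?",
        [("Paper A", "Block copolymers self-assemble into ordered nanoscale patterns.")],
    ))
```

Telling the model to rely only on the supplied excerpts, and to admit when they fall short, is a common tactic for reducing the fabricated facts and citations Yager describes, though it does not eliminate them.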
“The program needs to be like a reference librarian,” said Yager. “It needs to heavily rely on the documents to provide sourced answers. It needs to be able to accurately interpret what people are asking and be able to effectively piece together the context of those questions to retrieve the most relevant information. While the responses may not be perfect yet, it’s already able to answer challenging questions and trigger some interesting thoughts while planning new projects and research.”
Bots empowering humans
CFN is developing AI/ML systems as tools that can free human researchers to work on more challenging and interesting problems, and to get more out of their limited time, while computers automate repetitive tasks in the background. There are still many unknowns about this new way of working, but working through them is part of the important discussions scientists are having right now to ensure AI/ML is used safely and ethically.
“There are a number of tasks that a domain-specific chatbot like this could clear from a scientist’s workload. Classifying and organizing documents, summarizing publications, pointing out relevant info, and getting up to speed in a new topical area are just a few potential applications,” remarked Yager. “I’m excited to see where all of this will go. Three years ago, we never could have imagined where we are now, and I’m looking forward to where we’ll be three years from now.”
For researchers interested in trying this software out for themselves, the source code for CFN’s chatbot and associated tools can be found in this GitHub repository.
More information: Kevin G. Yager, Domain-specific chatbots for science using embeddings, Digital Discovery (2023). DOI: 10.1039/D3DD00112A