- Establishment of NAMs for risk assessment using both existing data and data developed in PARC.
- Introduction of a novel in-silico approach to defining associations between human health outcomes and exposure.
- Direct application of natural language processing, cheminformatics and bioinformatics techniques to mining the integrated data set. This will advance in-silico methodologies for OECD integrated approaches to testing and assessment (IATA) and support the 3R strategy of chemical testing.
Key messages
- Managing heterogeneous data from interdisciplinary fields such as chemistry, biology, medicine and environmental science, with varying formats, quality and levels of uncertainty, is a major roadblock in chemical risk assessment (CRA).
- Computational methods such as ontology-based text mining, natural language processing (NLP) models, and data mining approaches can play a pivotal role in CRA by improving data interoperability and reducing dataset fragmentation, enabling a comprehensive analysis.
- A harmonised computational framework integrating text mining, transcriptomics-based analysis, emerging New Approach Methodologies (NAMs), and AI-driven bioactivity prediction can advance chemical risk assessment through mechanistic insights and data FAIRification, and can reduce the dependence on animal testing.
Overview
This project aims to improve how we gather, organise, and analyse information about chemical hazards and risk assessment. It will use ontology-based text mining and data integration, a method that extracts relevant information from scientific texts and structures it in a harmonised and interoperable format. These structured data in turn assist in the development of Adverse Outcome Pathways (AOPs). Several regulatory agencies, such as OECD, ECHA and EFSA, have adopted evidence-based methodologies for risk assessment using AOPs.
However, these methods require a large, well-organised body of curated event evidence. A structured database makes it easier to access relevant information and fill gaps in experimental data.
Ontology-based data integration leverages structured, standardised vocabularies and semantic frameworks to align data across different sources. This facilitates automated reasoning and querying, and enables more efficient data discovery and reuse. It will make the data more widely applicable in advanced data science, including network biology and machine learning, to predict the adverse effects of chemicals on human health.
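The idea of aligning sources through a shared vocabulary can be illustrated with a minimal Python sketch. It is not the project's extraction system: the concept IDs (`EX:0001`, `EX:0002`), the synonym lists, and the function names `annotate` and `align` are hypothetical placeholders, and real ontology-based text mining would use a maintained ontology and more robust matching than a regular expression.

```python
import re

# Hypothetical mini-vocabulary: IDs and synonyms are illustrative placeholders,
# not entries from any real ontology.
VOCABULARY = {
    "EX:0001": ["liver fibrosis", "hepatic fibrosis"],
    "EX:0002": ["oxidative stress"],
}


def build_index(vocabulary):
    """Map each lower-cased synonym to its concept ID."""
    return {syn.lower(): cid for cid, syns in vocabulary.items() for syn in syns}


def annotate(text, vocabulary=VOCABULARY):
    """Return (concept_id, matched_text, start_offset) tuples in text order."""
    index = build_index(vocabulary)
    # Try longer synonyms first so a long phrase wins over a shorter one.
    ordered = sorted(index, key=len, reverse=True)
    regex = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [
        (index[m.group(0).lower()], m.group(0), m.start())
        for m in regex.finditer(text)
    ]


def align(records, vocabulary=VOCABULARY):
    """Group record identifiers from different sources by shared concept ID."""
    grouped = {}
    for record_id, text in records:
        for concept_id, _, _ in annotate(text, vocabulary):
            ids = grouped.setdefault(concept_id, [])
            if record_id not in ids:
                ids.append(record_id)
    return grouped
```

Because both "hepatic fibrosis" and "liver fibrosis" resolve to the same placeholder ID, a literature record and a database record that use different wording end up grouped under one concept, which is the alignment step the paragraph above describes.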
The tools and methodologies will be validated through real case studies selected in collaboration within PARC. During a scoping study, the project will explore different text-mining tools to understand their usability and set the agenda for future research needs. Aside from developing AOPs and supporting network biology approaches to identify and describe hazards in relation to disease development, other applications of AI, network approaches, text mining and ontology-assisted hazard identification are foreseen as well.
These methods can also be applied to:
- Detect emerging risks from imported goods and chemicals in the EU more quickly.
- Identify data gaps in hazard information to help prioritise testing.
By improving how we organise and analyse chemical risk data, this project aims to enhance decision-making and safety measures.
Achievements & Results
- Developed a FAIR-aligned information extraction system to support AOP construction. It is currently available for testing and can be adapted to various toxicological domains, enabling users to enhance AOPs with targeted evidence from scientific literature. A manuscript detailing the development of the information extraction system has been completed and is ready for submission.
- Developed a PBPK ontology to support the standardisation and harmonisation of kinetic models used in chemical risk assessment. This ontology is intended to be further extended and maintained by the community, facilitating its adoption in standardised reporting of kinetic models.
- Built a classification model for predicting fraction unbound (fu) using machine learning techniques. This ML model will be freely available in the public domain and will be extended to other pharmacokinetic parameters as well.
- Successfully integrated conformal prediction into deepFPlearn+, allowing the model to produce prediction intervals that quantify the uncertainty of each toxicity estimate.
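For readers unfamiliar with conformal prediction, the following is a minimal sketch of the split conformal procedure for a numeric prediction, assuming a held-out calibration set with known outcomes. It is a generic illustration, not the deepFPlearn+ implementation; the function names and the example numbers are hypothetical.

```python
import math


def absolute_residuals(y_true, y_pred):
    """Nonconformity scores: absolute error on the held-out calibration set."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def conformal_quantile(calibration_residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration residuals.

    Returns math.inf when the calibration set is too small to support the
    requested coverage level.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_residuals)[rank - 1]


def prediction_interval(point_prediction, q):
    """Symmetric interval around a new point prediction."""
    return (point_prediction - q, point_prediction + q)


# Hypothetical calibration residuals and a hypothetical new prediction.
q = conformal_quantile([0.3, 0.1, 0.2], alpha=0.5)
low, high = prediction_interval(1.0, q)
```

Under the usual exchangeability assumption, an interval built this way covers the true value with probability of at least 1 - alpha, and a wider interval signals a less certain estimate.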
Policy relevance
This PARC project addresses the need to incorporate advanced computational methods into chemical risk assessment. It does so by identifying key methodological and practical barriers to integrating heterogeneous data and modern analytical tools, such as AI-driven uncertainty quantification, text mining, and ontologies, into regulatory science. Through targeted case studies and the development of FAIR-aligned tools and workflows, the findings support more transparent, mechanistic, and scalable analysis. These insights have been shared with regulatory stakeholders across Europe to guide future readiness for incorporating multiple statistical and computational methods and for evaluating whether their outputs align with regulatory requirements, so that the results support risk assessment and decision-making.