2. Features of ReproductiveCDB
2.1 A comprehensive and integrative platform for reproductive chemicals
The database provides diverse compound-centric annotations, including basic information, functional details, and associated diseases, with an emphasis on transgenerational and intergenerational effects. It also offers screening information for candidate target proteins and other relevant knowledge.
The compound sources are both specific and diverse. They are specific because we focus exclusively on chemicals linked to important reproductive processes, which supports data significance and integrity while capturing the varying degrees of relevance between compounds and those processes. They are diverse because the data are collected extensively from experimental and medical sources, including metabolomics, and cover multiple chemical categories, which enables comprehensive analysis and exploration. To the best of our knowledge, ReproductiveCDB is the first database to focus on collecting chemicals, including metabolites, associated with significant reproductive processes and with the gamete and embryo-fetal origins of adult diseases.
2.2 Utilizing molecular docking to efficiently narrow the screening scope of target proteins
Small molecules influence essential life processes mainly by regulating the function of proteins. ReproductiveCDB uses reverse docking to screen protein targets for small molecules. Candidate proteins are taken from gene sets selected according to each chemical's functional biological processes in the GO database, and a series of methods is used to optimize the protein structure models. In total, the database provides docking-based interactions between 255 chemicals and 614 proteins. Molecular docking can substantially reduce the number of candidates that experimentalists need to test when searching for the true targets of functional compounds in important reproductive processes. It can also serve as a supplementary reference to other target protein prediction methods.
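As an illustration of how docking output can narrow a candidate list, the following minimal Python sketch ranks candidate proteins for one chemical and keeps the best few. It assumes that a lower (more negative) docking score indicates stronger predicted binding, as in many common scoring functions; the source does not state which docking program or scoring scheme is used. The function name, protein labels, and score values are hypothetical placeholders, not data from ReproductiveCDB.

```python
from typing import Dict, List, Tuple


def shortlist_targets(scores: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k candidate proteins with the lowest docking scores.

    Assumption: lower (more negative) score = stronger predicted binding.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Sort candidates from the most to the least favourable score.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:top_k]


# Hypothetical docking scores for one chemical (placeholder values).
example_scores = {
    "PROTEIN_A": -9.1,
    "PROTEIN_B": -5.4,
    "PROTEIN_C": -7.8,
    "PROTEIN_D": -6.2,
}

# Keep only the two most favourable candidates for experimental follow-up.
shortlist = shortlist_targets(example_scores, top_k=2)
```

A shortlist of this kind is a starting point for experiments rather than a confirmation of binding.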
2.3 Providing an online tool for identifying chemicals involved in reproductive biological processes
ReproductiveCDB provides an online tool for identifying chemicals involved in reproductive biological processes. Additionally, we offer a reference list with relevance information linking over 10,000 chemicals to biological processes.
While machine learning methods have long been used in reproductive biology, most previous work has focused on predicting the reproductive toxicity of compounds or other endpoints. In contrast, ReproductiveCDB is the first database dedicated to interpreting and predicting underlying mechanisms such as GO biological processes. Researchers interested in exploring chemical functions, compound or drug discovery, and developing computational methods for important reproductive processes can benefit from ReproductiveCDB. The platform provides comprehensive ML-ready sub-datasets that assist researchers in predicting biological processes or other endpoints for small molecules.
3. How to use ReproductiveCDB
3.1 Data retrieval and interpretation of results
3.1.1 Quick search
On the homepage, enter a keyword into the search box to initiate your search. The user-friendly interface provides various search options, including chemicals, proteins, biological processes, and diseases, allowing you to customize your search based on specific parameters of interest.
3.1.2 Search
We have developed user-friendly web-based modules within ReproductiveCDB, enabling users to efficiently retrieve target information and access comprehensive annotations.
(1) Keyword search for chemicals
In the chemical search section on the search page, simply enter a keyword in the search box to initiate your search. The search box accepts complete keywords such as chemical names, synonyms, or PubChem CID. Additionally, three example keywords are provided in the search box, allowing users to click to view the corresponding search results. The chemical page is organized into four sections: chemical information, reproductive functions, biochemical interactions, and disease annotation. Using the example of 'Glucose,' we will illustrate how to interpret the search results effectively.
(a) Basic information
Basic information includes a 2D structure image, as well as the chemical name, molecular weight, and canonical SMILES. Additionally, links to other databases, such as HMDB and KEGG, have been implemented on the small molecule page, enabling users to access associated target pages or vice versa.
(b) Reproductive functions
The reproductive function annotation in our database encompasses chemical-induced phenotypes, action qualifiers, function periods (including important reproductive processes such as female and male gamete generation, fertilization, and embryo development), as well as specific phenotype details with GO ID, a short description, and corresponding PubMed reference. 1) The description section can be expanded for further reading by clicking on the text.
(c) Biochemical interactions
For protein target prediction, a table of predicted proteins with further details is provided according to the functions of each chemical. 1) Clicking toggles between different docking results. 2) The visualization area shows the docking results, with proteins in blue and small molecules in orange.
(d) Disease annotation
For disease annotation, the database lists the diseases related to each chemical, together with the functional biological processes, interactions, endpoints, species, generation information, and descriptions.
(2) Keyword search for protein target
Enter a keyword in the protein search box on the Search page. The search box accepts either a protein name or a UniProt ID. Example keywords are provided in the search box; users can click them to view the search results. The protein page consists of two sections: 1) basic protein information (entry name, protein name, gene name, organism, corresponding GO process, and structure information for molecular docking) and 2) a list of related chemicals with detailed information.
(3) Keyword search for reproductive biological process
Enter a keyword in the GO search box on the Search page. The search box accepts complete or partial keywords of a GO biological process, or a GO ID. All available GO biological processes are listed below the search box for reference; users can click any of them to view the search results. The GO process page consists of two sections: 1) GO biological process information (GO ID, synonyms, definition) and 2) a list of related chemicals with detailed information.
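The two accepted query types can be told apart by format, since a GO ID is written as "GO:" followed by seven digits. The sketch below shows one way such a search could behave; it is not the database's actual implementation. The function names are hypothetical, and the three GO terms are used only as illustrative entries, without implying that they are the terms indexed by ReproductiveCDB.

```python
import re
from typing import Dict, List

# A GO identifier has the form "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")

# Illustrative GO terms only (hypothetical index contents).
PROCESSES: Dict[str, str] = {
    "GO:0007283": "spermatogenesis",
    "GO:0048477": "oogenesis",
    "GO:0009566": "fertilization",
}


def search_processes(query: str, processes: Dict[str, str]) -> List[str]:
    """Return matching GO IDs for either a GO ID or a (partial) keyword."""
    text = query.strip()
    if GO_ID_PATTERN.match(text.upper()):
        # Exact lookup when the query looks like a GO ID.
        go_id = text.upper()
        return [go_id] if go_id in processes else []
    # Otherwise treat the query as a case-insensitive partial keyword.
    needle = text.lower()
    return sorted(go_id for go_id, name in processes.items() if needle in name.lower())


by_id = search_processes("go:0009566", PROCESSES)
by_keyword = search_processes("genesis", PROCESSES)
```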
(4) Keyword search for disease
Enter a keyword in the disease search box on the Search page. The search box accepts complete or partial disease keywords, or a MeSH ID. All diseases are also listed for reference in case you are unsure of the exact term; users can click any of them to view the search results. The disease page consists of two sections: 1) basic disease information (MeSH ID, synonyms, definition) and 2) a list of related chemicals with detailed information.
3.2 Data browsing
3.2.1 Browsing data on the home page
(1) Browse statistical information of database
ReproductiveCDB provides three types of information about chemicals: functional biological processes, diseases, and predicted proteins. The Home page displays overall statistics for the database, with the human and mouse data shown separately in the tables below.
(2) Browse schematic diagrams of important reproductive processes in human and mouse
3.2.2 Browsing on the browse page
On the browse page of ReproductiveCDB, there are five options: (1) Data summary, (2) Chemicals, (3) Proteins, (4) GO biological processes, and (5) Diseases.
(1) Browse information of data summary
By clicking on the ‘Data summary’ part in the browsing module, the analysis results of the crucial data in the database will be displayed.
1) Classification of chemicals
A pie chart presents the classification of all chemicals. Mouse hover reveals the respective category count.
2) Proportion of different functional chemicals
A pie chart presents the classification of all chemicals based on function annotations in the reproductive biological processes. Mouse hover reveals the respective category count.
3) Amount of chemicals in important reproductive processes
The bar plot displays the quantity of chemicals involved in significant reproductive processes. Mouse hover reveals the corresponding category count.
4) Amount of sub-GO process in important reproductive processes
The bar plot shows the number of sub-GO processes for significant reproductive processes. Mouse hover displays the corresponding category count.
(2) Browse information of chemicals
Clicking on the 'Chemicals' section in the browsing module instantly displays all chemicals in the database, along with their relevant information. Page navigation for switching between different page numbers is available in the bottom right corner of the webpage.
(3) Browse information of proteins
Clicking on the 'Proteins' section in the browsing module instantly displays all proteins in the database, along with their relevant information. Page navigation for switching between different page numbers is available in the bottom right corner of the webpage.
(4) Browse information of GO biological processes
Clicking on the 'GO biological processes' section in the browsing module instantly displays all GO biological processes in the database, along with their relevant information. Page navigation for switching between different page numbers is available in the bottom right corner of the webpage.
(5) Browse information of diseases
Clicking on the 'Diseases' section in the browsing module instantly displays all diseases in the database, along with their relevant information. Page navigation for switching between different page numbers is available in the bottom right corner of the webpage.
3.3 Predicting chemicals involved in reproductive biological processes
3.3.1 Introduction of online prediction tool
The database offers an online tool for predicting the involvement of compounds in important reproductive biological processes in mouse. Given the significance of key reproductive processes and the profound impact chemicals can have on them, we selected the relationships between functional chemicals and reproductive processes from ReproductiveCDB and extracted their respective features. We then used a deep neural network (DNN) to build predictive models, which achieved reliable results. This tool enhances the practicality of the database, expands the dataset, and provides a streamlined approach that reduces the time and effort associated with biological experiments.
3.3.2 Method of utilizing the online prediction tool
The database provides an example compound (CID: 90311989) for prediction: 1) click 'Example', and 2) the results page will appear. The left panel shows the 3D structure of the target small molecule, which can be explored further by dragging with the mouse. The right panel displays the relevance of the target molecule to crucial reproductive processes in a table that includes the compound's CID, the biological process's GO ID, the correlation probability between them, and a final relevance assessment (Yes/No): ‘Yes’ for a probability greater than 0.5, denoting relevance, and ‘No’ for a probability less than 0.5, denoting irrelevance.
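The Yes/No rule described above can be written as a short Python function. This is a minimal sketch of the stated rule, not the tool's own code; the function name is hypothetical. The text does not say how a probability of exactly 0.5 is handled, so the sketch assumes it is labeled 'No'.

```python
def relevance_label(probability: float, threshold: float = 0.5) -> str:
    """Map a predicted probability to the Yes/No relevance label.

    'Yes' for a probability strictly greater than the threshold, 'No' otherwise.
    Assumption: a probability exactly equal to the threshold is labeled 'No',
    because the source text leaves that case unspecified.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return "Yes" if probability > threshold else "No"


# Placeholder probabilities for illustration, not real predictions.
labels = [relevance_label(p) for p in (0.73, 0.12, 0.5)]
```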
To run a prediction for their own compound, users 1) upload a 2D SDF file containing the structure of the target compound on the webpage, 2) click ‘submit’, and, after a brief wait of approximately 10 seconds, 3) receive the predicted results on the compound's relevance to significant reproductive processes. In this demonstration, we use Octyl Octanoate (CID: 61294) as an example; its structure was retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) and uploaded to our database.
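A 2D SDF file for upload can also be fetched from PubChem programmatically rather than through the website. The sketch below uses the PubChem PUG REST service with the `record_type=2d` option; the helper names and the output file name are hypothetical, and this is only one convenient way to obtain the file, not a step required by ReproductiveCDB.

```python
from urllib.request import urlopen

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_2d_sdf_url(cid: int) -> str:
    """Build the PUG REST URL that returns the 2D SDF record for a PubChem CID."""
    if cid <= 0:
        raise ValueError("a PubChem CID must be a positive integer")
    return f"{PUG_REST_BASE}/compound/cid/{cid}/SDF?record_type=2d"


def download_2d_sdf(cid: int, path: str) -> None:
    """Download the 2D SDF record and save it to a local file (needs network access)."""
    with urlopen(pubchem_2d_sdf_url(cid), timeout=30) as response:
        data = response.read()
    with open(path, "wb") as handle:
        handle.write(data)


# URL for the demonstration compound, Octyl Octanoate (CID: 61294).
example_url = pubchem_2d_sdf_url(61294)

# To save the file for upload, call for example:
# download_2d_sdf(61294, "CID_61294_2d.sdf")
```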
3.3.3 Browse and search of prediction results of 10,396 chemicals
The 'search and browse of prediction results' section of the database shows the predicted relationships between 10,396 tested compounds and important reproductive processes. The information is presented in a table that includes the compound's CID, the biological process's GO ID, the correlation probability between them, and a final relevance assessment (Yes/No): ‘Yes’ for a probability greater than 0.5, denoting relevance, and ‘No’ for a probability less than 0.5, denoting irrelevance. Searching is also supported. 1) Users can browse or filter by biological process of interest by clicking on it. 2) Alternatively, users can enter a compound CID to find the biological processes predicted for that specific compound. Users can also download the table containing all prediction results from the 'Download' page.
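Once the full results table has been downloaded, the same browse and search operations can be reproduced offline. The following sketch assumes a CSV layout with the columns `CID`, `GO_ID`, `Probability`, and `Relevance`; these column names and the file format are assumptions, since the source does not describe the download format. The rows shown are made-up placeholders, not real predictions.

```python
import csv
import io
from typing import Dict, List, Optional

# Placeholder table in the assumed CSV layout (values are not real predictions).
SAMPLE_TABLE = """CID,GO_ID,Probability,Relevance
111,GO:0007283,0.91,Yes
111,GO:0009566,0.20,No
222,GO:0007283,0.35,No
222,GO:0009566,0.66,Yes
"""


def filter_predictions(
    rows: List[Dict[str, str]],
    cid: Optional[int] = None,
    go_id: Optional[str] = None,
    relevant_only: bool = False,
) -> List[Dict[str, str]]:
    """Filter prediction rows by compound CID, GO ID, and/or predicted relevance."""
    selected = []
    for row in rows:
        if cid is not None and row["CID"] != str(cid):
            continue
        if go_id is not None and row["GO_ID"] != go_id:
            continue
        # Relevance follows the stated rule: probability greater than 0.5.
        if relevant_only and float(row["Probability"]) <= 0.5:
            continue
        selected.append(row)
    return selected


rows = list(csv.DictReader(io.StringIO(SAMPLE_TABLE)))

# Compounds predicted as relevant to one process of interest.
hits = filter_predictions(rows, go_id="GO:0007283", relevant_only=True)

# All predictions for one compound, searched by CID.
compound_rows = filter_predictions(rows, cid=222)
```

For a real file, replace `io.StringIO(SAMPLE_TABLE)` with an opened file handle and adjust the column names to match the downloaded table.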