Information about chemicals
This tutorial explains how to use a few of the VHP4Safety services to aggregate information about a chemical of interest.
Let’s start with the compound with the name “aflatoxin B1”.
Name to Structure
The first task is to establish the chemical identity of what we mean by “aflatoxin B1”, that is, its chemical structure. This is the starting point of most cheminformatics workflows: resolving a chemical structure from a chemical name, also called name-to-structure conversion.
There are many solutions available, including major chemical compound databases like PubChem and ChemSpider. Because we want to use a common VHP4Safety language (a controlled vocabulary or glossary), we can also use a VHP4Safety solution for this task.
For this, we have set up a service to link specific chemical structures to names and external databases, the VHP4Safety Wikibase.
Step 1
Visit the compound wiki and use the search box to find “aflatoxin B1”. The resulting page should look something like this:
Step 2
On this page we can find chemical information and links to other databases. The information we can find includes:
the SMILES: a line notation to describe the chemical structure (using a chemical graph approach)
the mass
the InChI and InChIKey: the global, unique identifier of this compound
Write down the SMILES, which we are going to use in the next section.
Second, we find external identifiers and links to resources with more information about this compound. For example, for this compound we find a link to the ToxBank Wiki (doi:10.1002/minf.201200114), where the SEURAT-1 cluster projects collected information about compounds during the discussion that led to their Gold Compound collection.
Other information we can find:
the Wikidata Q identifier: a link to Wikidata
the PubChem CID: a link to PubChem
xenobiotic metabolism pathway: a link to a WikiPathways pathway describing experimental knowledge about the metabolism of the compound
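The search from Step 1 can also be scripted. The sketch below assumes the compound wiki exposes the standard Wikibase `wbsearchentities` API action, which Wikibase installations normally do. The API address is a hypothetical placeholder, not the real address of the VHP4Safety Wikibase, and the function names are made up for this illustration.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the API address of the compound wiki.
WIKIBASE_API = "https://example.org/w/api.php"


def build_search_url(name, api_url=WIKIBASE_API, language="en"):
    """Build a wbsearchentities request URL for a compound name."""
    params = {
        "action": "wbsearchentities",  # standard Wikibase search action
        "search": name,
        "language": language,
        "type": "item",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def matches_from_response(payload):
    """Extract (entity id, label) pairs from a decoded wbsearchentities reply."""
    return [(hit["id"], hit.get("label", "")) for hit in payload.get("search", [])]


if __name__ == "__main__":
    # Only prints the URL; fetching it requires the real API address.
    print(build_search_url("aflatoxin B1"))
```

Fetching the printed URL and decoding the JSON reply yields the input for `matches_from_response`. The SMILES and other statements still have to be read from the matching item page.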
Visualize a Structure
With the SMILES you got from the compound wiki, you can now visualize the structure with the CDK Depict service.
Step 3
Copy/paste the SMILES into the text box and wait for CDK Depict to make a 2D depiction:
Note that you can change the depiction style/properties. For example, you can choose to not abbreviate long chains:
Or to show the CIP R/S labels:
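The same depiction options can be set through the URL of a CDK Depict instance. The sketch below builds such a URL. It assumes the query parameters `smi`, `abbr`, and `annotate` as used by the public CDK Depict web application, and it uses that public instance as the base address. The instance used in this tutorial may live at a different address, so `base_url` is a parameter. The helper name `depict_url` is made up for this illustration.

```python
from urllib.parse import urlencode

# Assumption: a public CDK Depict instance; pass base_url to use another one.
CDK_DEPICT = "https://www.simolecule.com/cdkdepict"


def depict_url(smiles, abbreviate=True, cip_labels=False,
               style="bow", fmt="svg", base_url=CDK_DEPICT):
    """Build a CDK Depict URL for a SMILES string.

    abbreviate=False asks the service not to abbreviate groups;
    cip_labels=True asks for CIP R/S annotations.
    """
    params = {
        "smi": smiles,
        "abbr": "on" if abbreviate else "off",
        "annotate": "cip" if cip_labels else "none",
    }
    # urlencode escapes SMILES characters such as '[', '@', '=' and '#'.
    return f"{base_url}/depict/{style}/{fmt}?{urlencode(params)}"


if __name__ == "__main__":
    # "CCO" (ethanol) is a stand-in; use the SMILES copied from the compound wiki.
    print(depict_url("CCO", abbreviate=False, cip_labels=True))
```

Opening the printed URL in a browser should show the depiction as an SVG image.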
External databases
Back in the compound wiki, we can find links to other databases. For each of the following databases, we can also list all chemical compounds that link to that resource:
ToxBank: general toxicology info (all ToxBank compounds)
WikiPathways: compound metabolism (all WikiPathways compounds)
Step 4
Visit WikiPathways and check the human metabolism of “aflatoxin B1”. The resulting page should look like this:
These resources can provide important information, but for new compounds you may also need computationally predicted properties. The platform supports this. The Metabolite prediction section uses the SOMBIE tool, which predicts site-of-metabolism properties, starting from the SMILES we got from the compound wiki.
Identifier Mapping
The compound wiki also lists a PubChem Compound Identifier (“CID”). The BridgeDb webservice can convert this to identifiers from other databases.
Step 5
The BridgeDb webservice has an API call that returns other identifiers (“xrefs”) for a PubChem CID, using the following URL pattern: https://bridgedb.cloud.vhp4safety.nl/Human/xrefs/Cpc/186907
Click the link and check which other databases provide information about “aflatoxin B1”. The output should look something like this:
C00000546 KNApSAcK
CHEBI:2504 ChEBI
C06800 KEGG Compound
DTXSID00873175 EPA CompTox
186907 PubChem-compound
OQIQSTLJSLGHID-WNWIJWBNSA-N InChIKey
CHEMBL1697694 ChEMBL compound
DTXSID9020035 EPA CompTox
1162-65-8 CAS
162470 Chemspider
Q4689278 Wikidata
HMDB0006552 HMDB
2504 ChEBI
HMDB06552 HMDB
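The call from Step 5 can be scripted as well. The sketch below builds the URL from the pattern shown above and groups the returned identifiers by database. It assumes each line of the reply holds an identifier and a data source name separated by a tab, with a fallback to the first run of whitespace for copy-pasted text. The function names are made up for this illustration.

```python
from urllib.parse import quote

BRIDGEDB = "https://bridgedb.cloud.vhp4safety.nl"


def xrefs_url(identifier, system_code="Cpc", organism="Human", base_url=BRIDGEDB):
    """Build the xrefs URL; 'Cpc' is the system code used above for PubChem CIDs."""
    return f"{base_url}/{quote(organism)}/xrefs/{quote(system_code)}/{quote(identifier)}"


def parse_xrefs(text):
    """Group identifiers by data source name.

    Assumes one 'identifier<TAB>data source' pair per line.
    """
    by_source = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split("\t", 1) if "\t" in line else line.split(None, 1)
        if len(parts) != 2:
            continue  # skip lines that do not hold a pair
        identifier, source = parts[0].strip(), parts[1].strip()
        by_source.setdefault(source, []).append(identifier)
    return by_source


if __name__ == "__main__":
    import urllib.request

    # Network access is needed for this part.
    with urllib.request.urlopen(xrefs_url("186907")) as response:
        xrefs = parse_xrefs(response.read().decode("utf-8"))
    for source, identifiers in sorted(xrefs.items()):
        print(source, ", ".join(identifiers))
```

Grouping by data source makes it easy to see that some databases, such as ChEBI and HMDB in the output above, return the same entry in more than one identifier format.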
Metabolite prediction
… SOMBIE todo