
Commit d844206

Update metadata service page with retriever API

Author: Renato Juaçaba Neto
1 parent: d95bc41

3 files changed: +123 -46 lines


Gemfile

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
-ruby '3.2.0'
+ruby '>3.2.0'
 source "https://rubygems.org"
 
 gem 'bundle'
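The Gemfile change above replaces an exact pin, `ruby '3.2.0'`, with the requirement `ruby '>3.2.0'`. Assume Bundler reads this string with standard RubyGems requirement semantics (this commit does not say so). Under that assumption, a strict `>` accepts 3.2.1 and later but rejects 3.2.0 itself, whereas `>= 3.2.0` would accept it. The sketch below checks both constraints with `Gem::Requirement`. The helper `ruby_constraint_ok?` is hypothetical and exists only for illustration.

```ruby
# Sketch: compare '>3.2.0' and '>= 3.2.0' under RubyGems requirement semantics.
# Assumption: Bundler applies these semantics to the Gemfile `ruby` directive.
require "rubygems"

# Hypothetical helper: true when `version` satisfies the requirement `constraint`.
def ruby_constraint_ok?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[3.2.0 3.2.1 3.3.0].each do |v|
  puts "#{v}: '>3.2.0' -> #{ruby_constraint_ok?('>3.2.0', v)}, '>= 3.2.0' -> #{ruby_constraint_ok?('>= 3.2.0', v)}"
end
# 3.2.0: '>3.2.0' -> false, '>= 3.2.0' -> true
# 3.2.1: '>3.2.0' -> true, '>= 3.2.0' -> true
# 3.3.0: '>3.2.0' -> true, '>= 3.2.0' -> true
```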

README.md

Lines changed: 14 additions & 3 deletions
@@ -1,4 +1,15 @@
-# Identifiers.org documentation
-Cloned from [EBI Web Design Jekyll Boilerplate](http://ebiwd.github.io/EBI-Boilerplate-Jekyll).
+# Identifiers.org documentation pages
 
-Under construction.
+Live version at: https://docs.identifiers.org/
+
+This site is built with [Jekyll](https://jekyllrb.com/).
+
+Use `bundle install` to install the required dependencies.
+
+You might need to update your Ruby version if you are on macOS.
+
+To run this locally, use `jekyll serve` from this folder.
+
+This is deployed as a GitHub Pages site. Deployment is performed by the
+default Jekyll GitHub Actions workflow, so merging any change to the master
+branch will put it live.

pages/metadata_service.md

Lines changed: 108 additions & 42 deletions
@@ -5,55 +5,121 @@ title: metadata service
 
 # <i class="icon icon-common icon-mapping"></i> Metadata Service
 
+
+## Acquisition of provider page annotations
+
 Identifiers.org metadata service enables users to extract Schema.org from landing pages of the original providers by passing in Compact Identifiers.
 
 ``
 http://metadata.api.identifiers.org/{Compact Identifier}
 ``
 
 For example:
+<http://metadata.api.identifiers.org/reactome:R-HSA-446203>
+
+
+### How it works
+1. Our backend resolves the compact identifier to find the URLs to query.
+2. For each URL, it loads the page content and searches for JSON-LD script tags.
+    - XPath query used on the loaded HTML: `//script[@type='application/ld+json']`
+3. If multiple providers have this content available, the recommendation index from the resolver API is used to pick one.
+
+The source code can be found [here](https://github.com/identifiers-org/cloud-ws-metadata/blob/aa70412bcded9d8888c633ba2ae672bb98d049f8/src/main/java/org/identifiers/cloud/ws/metadata/models/MetadataFetcherChromeEngineBased.java).
+
+### Resources providing metadata
+The following is a list of resources in the Identifiers.org registry providing metadata (last updated 2018-12-05).
+
+[ec-code](http://identifiers.org/ec-code), [reactome](http://identifiers.org/reactome),
+[prosite](http://identifiers.org/prosite), [cath.domain](http://identifiers.org/cath.domain),
+[hamap](http://identifiers.org/hamap), [biosample](http://identifiers.org/biosample),
+[fairsharing](http://identifiers.org/fairsharing), [cellosaurus](http://identifiers.org/cellosaurus),
+[cosmic](http://identifiers.org/cosmic), [mobidb](http://identifiers.org/mobidb),
+[hpscreg](http://identifiers.org/hpscreg), [lei](http://identifiers.org/lei),
+[biomodels.db](http://identifiers.org/biomodels.db), [pdb](http://identifiers.org/pdb),
+[sgd](http://identifiers.org/sgd), [wb](http://identifiers.org/wb), [fb](http://identifiers.org/fb),
+[arrayexpress](http://identifiers.org/arrayexpress), [mgi](http://identifiers.org/mgi),
+[rgd](http://identifiers.org/rgd), [zfin](http://identifiers.org/zfin), [narcis](http://identifiers.org/narcis),
+[gxa.expt](http://identifiers.org/gxa.expt), [metabolights](http://identifiers.org/metabolights),
+[rgd.qtl](http://identifiers.org/rgd.qtl), [rgd.strain](http://identifiers.org/rgd.strain),
+[ega.study](http://identifiers.org/ega.study), [ega.dataset](http://identifiers.org/ega.dataset),
+[pride.project](http://identifiers.org/pride.project), [lincs.data](http://identifiers.org/lincs.data),
+[mw.study](http://identifiers.org/mw.study), [mex](http://identifiers.org/mex),
+[gpmdb](http://identifiers.org/gpmdb), and [kaggle](http://identifiers.org/kaggle)
+
+
+## Acquisition of metadata from other providers
+
+Following [our recent participation in the 3rd German BioHackathon](https://www.denbi.de/de-nbi-events/1762-3rd-biohackathon-germany-identifiers-bridgedb-togoid),
+we have expanded our metadata service to collect information from other metadata-providing services.
+This is implemented by retriever components that use the APIs of these services to acquire information
+on compact identifiers.
+The retrievers enabled and the data collected differ based on the namespace of the compact identifier.
+
+This is used in our resolution page to display metadata on resolved compact identifiers.
+
+<div class="infobox mb-1">
+<i class="icon icon-common icon-beta text-warning size-300 mr-2"></i>
+<p class="mb-0">
+This feature is a work in progress.
+It may be modified or removed as necessary without prior warning.
+If you are interested in it or already using it, <a href="/pages/contact">please let us know</a>.
+</p>
+</div>
+
+### Retriever endpoints
+
+The main endpoint for the retriever API follows the pattern
+
+``
+https://metadata.api.identifiers.org/retrievers/{Compact Identifier}
+``
+
+This endpoint lists the available retriever endpoints for that compact identifier. It is expected to be queried first
+to discover which retrievers can contain information on that compact identifier. The response will look similar to:
+
+```json
+{
+  "apiVersion": "1.0",
+  "errorMessage": null,
+  "payload": {
+    "parsedCompactIdentifier": {
+      // Same values from resolver API
+    },
+    "ableRetrievers": [
+      "https://metadata.api.identifiers.org/retrievers/{Retriever 1}/{Compact Identifier}",
+      "https://metadata.api.identifiers.org/retrievers/{Retriever 2}/{Compact Identifier}",
+      //...
+    ]
+  }
+}
+```
+
+Then, each URL under `.payload.ableRetrievers` queries a different metadata provider for information and answers with
+a set of `label -> list of values` pairs representing the parsed metadata from that provider.
+The response of each will look similar to:
+
+```json
+{
+  "label1": [
+    "value1",
+    "value2",
+    "value3"
+  ],
+  "label2": [
+    "value4"
+  ]
+}
+```
+
+To acquire the raw data from providers, the user may use a URL in the format:
+
 ``
-http://metadata.api.identifiers.org/reactome:R-HSA-446203
+https://metadata.api.identifiers.org/retrievers/{Retriever 2}/raw/{Compact Identifier}
 ``
 
-## Resources providing metadata
-
-Following is a list of resources in the Identifiers.org registry providing metadata (last updated 2018-12-05)
-
-| Prefix | Resource Information | Example Dataset URL | Dataset Metadata | Home URL | DataCatalog Metadata |
-|---------------|-----------------------------------------------------------------------|---------------------------------------------------------------------|------------------|------------------------------------|----------------------|
-| ec-code | Enzyme nomenclature database, ExPASy (Expert Protein Analysis System) | https://enzyme.expasy.org/EC/1.1.1.1 | Yes | https://enzyme.expasy.org/ | Yes |
-| reactome | Reactome, a curated knowledgebase of biological pathways | https://reactome.org/content/detail/R-HSA-201451 | Yes | https://www.reactome.org/ | Yes |
-| prosite | ExPASy PROSITE | https://prosite.expasy.org/PS00001 | Yes | https://www.expasy.org/prosite/ | Yes |
-| cath.domain | CATH domain at UCL | http://www.cathdb.info/domain/1cukA01 | Yes | http://www.cathdb.info/ | Yes |
-| hamap | HAPMAP at Swiss Institute of Bioinformatics | https://hamap.expasy.org/unirule/MF_01400 | Yes | https://hamap.expasy.org/ | Yes |
-| biosample | BioSamples Database at EBI | https://www.ebi.ac.uk/biosamples/sample/SAMEA2397676 | Yes | https://www.ebi.ac.uk/biosamples/ | Yes |
-| fairsharing | FAIRSharing at University of Oxford | https://fairsharing.org/bsg-000052 | Yes | https://fairsharing.org/ | Yes |
-| cellosaurus | Cellosaurus through SIB | http://web.expasy.org/cellosaurus/CVCL_0030 | Yes | http://web.expasy.org/cellosaurus/ | Yes |
-| cosmic | COSMIC Gene at Sanger | http://cancer.sanger.ac.uk/cosmic/gene/overview?ln=BRAF | Yes | http://cancer.sanger.ac.uk/cosmic/ | Yes |
-| mobidb | MobiDB | http://mobidb.bio.unipd.it/P10636 | Yes | http://mobidb.bio.unipd.it | Yes |
-| hpscreg | Human Pluripotent Stem Cell Registry | https://hpscreg.eu/cell-line/BCRTi001-A | Yes | https://hpscreg.eu/ | Yes |
-| lei | Global LEI Index | https://www.gleif.org/lei/HWUPKR0MPOU8FGXBT394 | Yes | https://www.gleif.org/ | Yes |
-| biomodels.db | BioModels through OmicsDI | https://www.omicsdi.org/dataset/biomodels/BIOMD0000000048 | Yes | https://www.omicsdi.org/ | No |
-| pdb | Protein Databank in Europe (PDBe) | http://www.pdbe.org/2gc4 | Yes | http://www.pdbe.org/ | No |
-| sgd | SGD through the Alliance of Genome Resources | https://www.alliancegenome.org/gene/SGD:S000003909 | Yes | https://www.alliancegenome.org | No |
-| wb | WormBase through the Alliance of Genome Resources | https://www.alliancegenome.org/gene/WB:WBGene00000001 | Yes | https://www.alliancegenome.org | No |
-| fb | FlyBase through the Alliance of Genome Resources | https://www.alliancegenome.org/gene/FB:FBgn0011293 | Yes | https://www.alliancegenome.org | No |
-| arrayexpress | ArrayExpress through OmicsDI | https://www.omicsdi.org/dataset/arrayexpress-repository/E-MEXP-1712 | Yes | https://www.omicsdi.org/ | No |
-| mgi | MGI through the Alliance of Genome Resources | https://www.alliancegenome.org/gene/MGI:2442292 | Yes | https://www.alliancegenome.org | No |
-| rgd | RGD through the Alliance of Genome Resources | https://www.alliancegenome.org/gene/RGD:2018 | Yes | https://www.alliancegenome.org | No |
-| zfin | ZFIN through the Alliance of Genome Resources | https://test.alliancegenome.org/gene/ZFIN:ZDB-GENE-041118-11 | Yes | https://www.alliancegenome.org | No |
-| narcis | NARCIS at The Hague | http://www.narcis.nl/publication/RecordID/oai:cwi.nl:4725 | Yes | http://www.narcis.nl/?Language=en | No |
-| gxa.expt | GXA Expt through OmicsDI | https://www.omicsdi.org/dataset/atlas-experiments/E-MTAB-2037 | Yes | https://www.omicsdi.org/ | No |
-| metabolights | MataboLights through OmicsDI | https://www.omicsdi.org/dataset/metabolights_dataset/MTBLS1 | Yes | https://www.omicsdi.org/ | No |
-| rgd.qtl | Rat Genome Database qTL at Medical College of Wisconsin | http://rgd.mcw.edu/rgdweb/report/qtl/main.html?id=1354581 | Yes | http://rgd.mcw.edu/ | No |
-| rgd.strain | Rat Genome Database strain at Medical College of Wisconsin | http://rgd.mcw.edu/rgdweb/report/strain/main.html?id=5688061 | Yes | http://rgd.mcw.edu/ | No |
-| ega.study | EGA Study through OmicsDI | https://www.omicsdi.org/dataset/ega/EGAS00000000001 | Yes | https://www.omicsdi.org/ | No |
-| ega.dataset | EGA Dataset through OmicsDI | https://www.omicsdi.org/dataset/ega/EGAD00000000001 | Yes | https://www.omicsdi.org/ | No |
-| pride.project | PRIDE Project through OmicsDI | https://www.omicsdi.org/dataset/pride/PXD000440 | Yes | https://www.omicsdi.org/ | No |
-| lincs.data | Lincs through OmicsDI | https://www.omicsdi.org/dataset/lincs/LDS-1110 | Yes | https://www.omicsdi.org/ | No |
-| mw.study | Metabolomics Workbench Study through OmicsDI | https://www.omicsdi.org/dataset/metabolomics_workbench/ST000900 | Yes | https://www.omicsdi.org/ | No |
-| mex | Metabolome Express through OmicsDI | https://www.omicsdi.org/dataset/metabolome_express/MEX36 | Yes | https://www.omicsdi.org/ | No |
-| gpmdb | GPMDB through OmicsDI | https://www.omicsdi.org/dataset/gpmdb/GPM32310002988 | Yes | https://www.omicsdi.org/ | No |
-| kaggle | Kaggle | https://www.kaggle.com/nasa/kepler-exoplanet-search-results | Yes | https://kaggle.com | No |
-{: .hover }
+### Retriever implementation
+At this time (Jan 23rd 2025), only two data retrievers are implemented:
+- [EBI Search](https://www.ebi.ac.uk), the search engine that incorporates EBI resources in addition to collaborator resources.
+- [TogoID](https://togoid.dbcls.jp), an ID conversion service implementing unique features with an intuitive web interface and an API for programmatic access.
+
+If you are interested in contributing to this list, [please reach out to us](/pages/contact).
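The "How it works" steps added to `pages/metadata_service.md` can be approximated offline in Ruby. This is a minimal sketch, not the service's implementation, which is the Java source linked in the diff. It assumes each landing page is well-formed XHTML so that REXML can run the documented XPath query. Step 1 (resolution) is not performed here. Instead, each resolved provider is modelled as a Hash with hypothetical `:html` and `:recommendation_index` keys in place of real resolver API data. The helper names `metadata_url`, `extract_json_ld` and `pick_metadata` are likewise hypothetical.

```ruby
# Offline sketch of the documented "How it works" steps (not the service's code).
# Assumptions: pages are well-formed XHTML (REXML is an XML parser), and each
# provider is a Hash with hypothetical :html and :recommendation_index keys.
require "json"
require "rexml/document"

METADATA_BASE = "http://metadata.api.identifiers.org"

# Documented endpoint pattern: http://metadata.api.identifiers.org/{Compact Identifier}
def metadata_url(compact_identifier)
  "#{METADATA_BASE}/#{compact_identifier}"
end

# Step 2: run the documented XPath query and parse each JSON-LD script body.
def extract_json_ld(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, "//script[@type='application/ld+json']").map do |node|
    JSON.parse(node.text.to_s)
  end
end

# Step 3: among providers whose page carries JSON-LD, keep the one with the
# highest recommendation index; returns nil when no provider has any.
def pick_metadata(providers)
  candidates = providers
    .map { |p| p.merge(json_ld: extract_json_ld(p[:html])) }
    .reject { |p| p[:json_ld].empty? }
  best = candidates.max_by { |p| p[:recommendation_index] }
  best && best[:json_ld]
end
```

Note that a provider with a high recommendation index but no JSON-LD on its page is skipped, matching step 3's "if multiple providers have this content available". Real landing pages are often not well-formed XML, so a production client would need a tolerant HTML parser (for example Nokogiri). The XPath expression itself would stay the same.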

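The retriever workflow documented in this commit can be sketched as a small Ruby client using only the standard library. The workflow has three parts: query the discovery endpoint, follow each URL under `.payload.ableRetrievers`, and optionally switch to the `raw` form. The response shapes are taken from the examples on the page. The following details are assumptions for illustration, not a documented contract: the helper names, the duplicate-dropping merge of `label -> list of values` responses, and treating the retriever name as the first path segment after `/retrievers/` (so compact identifiers containing `/` survive).

```ruby
# Minimal client sketch for the retriever API described on the page.
# Assumptions: responses follow the example shapes on the page; all helper
# names are hypothetical, not part of any Identifiers.org library.
require "json"
require "net/http"
require "uri"

RETRIEVERS_BASE = "https://metadata.api.identifiers.org/retrievers"

# Discovery endpoint: https://metadata.api.identifiers.org/retrievers/{Compact Identifier}
def discovery_url(compact_identifier)
  "#{RETRIEVERS_BASE}/#{compact_identifier}"
end

# Returns the URLs listed under .payload.ableRetrievers (empty if none).
def able_retrievers(discovery_json)
  body = JSON.parse(discovery_json)
  raise "API error: #{body['errorMessage']}" if body["errorMessage"]
  body.dig("payload", "ableRetrievers") || []
end

# Turns .../retrievers/{Retriever}/{Compact Identifier} into
# .../retrievers/{Retriever}/raw/{Compact Identifier}.
def raw_url(retriever_url)
  prefix = "#{RETRIEVERS_BASE}/"
  raise ArgumentError, "unexpected URL: #{retriever_url}" unless retriever_url.start_with?(prefix)
  name, compact_identifier = retriever_url.delete_prefix(prefix).split("/", 2)
  "#{prefix}#{name}/raw/#{compact_identifier}"
end

# Merges several `label -> list of values` responses, dropping duplicate values.
def merge_labels(responses_json)
  responses_json.each_with_object({}) do |json, merged|
    JSON.parse(json).each do |label, values|
      merged[label] = merged.fetch(label, []) | values
    end
  end
end

# Network call (not exercised offline): GET a URL and return the response body.
def http_get(url)
  Net::HTTP.get(URI(url))
end
```

Put together, `able_retrievers(http_get(discovery_url("reactome:R-HSA-446203")))` would yield the retriever URLs. Fetching each one with `http_get` and passing the bodies to `merge_labels` combines their labels. Which labels appear depends on the namespace and on the retrievers enabled for it, and the page marks the feature as a work in progress.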