dipper.sources.EBIGene2Phen module
-
class dipper.sources.EBIGene2Phen.EBIGene2Phen(graph_type, are_bnodes_skolemized, data_release_version=None)
Bases: dipper.sources.Source.Source
From EBI: The gene2phenotype dataset (G2P) integrates data on genes, variants and phenotypes, for example relating to developmental disorders. It is constructed entirely from published literature, and is primarily an inclusion list to allow targeted filtering of genome-wide data for diagnostic purposes. The dataset was compiled with respect to published genes, and annotated with types of disease-causing gene variants. Each row of the dataset associates a gene with a disease phenotype via an evidence level, inheritance mechanism and mutation consequence. Some genes therefore appear in the database more than once, where different genetic mechanisms result in different phenotypes.
Disclaimer: https://www.ebi.ac.uk/gene2phenotype/disclaimer
Terms of Use: https://www.ebi.ac.uk/about/terms-of-use#general
Documentation: https://www.ebi.ac.uk/gene2phenotype/documentation
https://www.clinicalgenome.org/site/assets/files/2757/fitzpatrick_ddg2p.pdf
This script operates on the Developmental Disorders (DDG2P.csv) file. In the future we may update it to include the cancer gene-disease pairs in the CancerG2P.csv file.
-
EBI_BASE = 'https://www.ebi.ac.uk/gene2phenotype/downloads/'
-
fetch(is_dl_forced: bool = False)
Fetch DDG2P.csv.gz and check headers to see if it has been updated.
Parameters: is_dl_forced – {bool}
Returns: None
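The "check headers" step could look roughly like the sketch below. It assumes the check compares an HTTP Last-Modified header against the local file's modification time; the source does not say which headers Dipper inspects, so this is only one plausible reading. The helper names is_remote_newer and should_download are hypothetical and are not part of Dipper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_remote_newer(last_modified, local_mtime):
    """Return True if the remote copy is newer than the local one.

    last_modified: value of an HTTP Last-Modified header (assumed to
                   carry a time zone such as "GMT").
    local_mtime:   local modification time as a POSIX timestamp.
    """
    remote = parsedate_to_datetime(last_modified)
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local


def should_download(is_dl_forced, last_modified, local_mtime):
    """Decide whether to fetch the file again (hypothetical policy)."""
    if is_dl_forced or local_mtime is None:
        # A forced download or a missing local copy always triggers a fetch.
        return True
    return is_remote_newer(last_modified, local_mtime)
```

In practice the header value would come from a HEAD request against the download URL and the timestamp from os.path.getmtime on the cached file; both are left out here so the sketch runs without network access.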
-
files = {'developmental_disorders': {'columns': ['gene_symbol', 'gene_omim_id', 'disease_label', 'disease_omim_id', 'g2p_relation_label', 'allelic_requirement', 'mutation_consequence', 'phenotypes', 'organ_specificity_list', 'pmids', 'panel', 'prev_symbols', 'hgnc_id', 'entry_date'], 'file': 'DDG2P.csv.gz', 'url': 'https://www.ebi.ac.uk/gene2phenotype/downloads/DDG2P.csv.gz'}}
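As an illustration of how the column list might be used, the sketch below reads a gzipped CSV and keys each row by those column names. It assumes the file starts with a header row and is UTF-8 encoded. The helpers read_rows and make_sample are hypothetical, and the sample rows are placeholders rather than real G2P content.

```python
import csv
import gzip
import io

# Column names copied from the `files` attribute.
COLUMNS = [
    'gene_symbol', 'gene_omim_id', 'disease_label', 'disease_omim_id',
    'g2p_relation_label', 'allelic_requirement', 'mutation_consequence',
    'phenotypes', 'organ_specificity_list', 'pmids', 'panel',
    'prev_symbols', 'hgnc_id', 'entry_date',
]


def read_rows(gz_bytes, limit=None):
    """Yield one dict per data row, keyed by COLUMNS (hypothetical helper)."""
    with gzip.open(io.BytesIO(gz_bytes), mode='rt',
                   encoding='utf-8', newline='') as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        for index, row in enumerate(reader):
            if limit is not None and index >= limit:
                break
            yield dict(zip(COLUMNS, row))


def make_sample():
    """Build a tiny gzipped CSV in memory using placeholder values."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerow(['row1_' + name for name in COLUMNS])
    writer.writerow(['row2_' + name for name in COLUMNS])
    return gzip.compress(buffer.getvalue().encode('utf-8'))


if __name__ == '__main__':
    for record in read_rows(make_sample(), limit=1):
        print(record['gene_symbol'], record['hgnc_id'])
```

The limit argument mirrors the limit parameter of parse, stopping after the given number of data rows.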
-
map_files = {'mondo_map': 'https://data.monarchinitiative.org/dipper/cache/unmapped_ebi_diseases.tsv'}
-
parse(limit: Optional[int] = None)
Parse each row of the gene-to-phenotype file.
We create anonymous variants along with their attributes (allelic requirement, functional consequence) and connect these to genes and diseases.
Genes are connected to variants via global_terms['has_affected_locus'].
Variants are connected to attributes via global_terms['has_allelic_requirement'] and global_terms['has_functional_consequence'].
Variants are connected to diseases based on mappings to the DDD category column; see the translation table specific to this source for the mappings.
Where a row has no disease OMIM ID, we use a manually curated disease cache file with mappings to MONDO.
Parameters: limit – {int} number of rows to parse
Returns: None
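The associations described for parse can be sketched as plain (subject, predicate, object) tuples. This is a simplified illustration, not Dipper's implementation. It assumes that the variant is the subject of each statement, that g2p_relation_label holds the DDD category, that hgnc_id is a bare number, and that the MONDO cache is keyed by disease label. The functions make_variant_id, resolve_disease and row_to_triples are hypothetical, as are the relation_map and mondo_map arguments.

```python
import hashlib


def make_variant_id(row):
    """Derive a stable blank-node ID for the anonymous variant
    (hypothetical scheme: hash of the fields that define it)."""
    key = '|'.join([row['gene_symbol'], row['disease_label'],
                    row['allelic_requirement'], row['mutation_consequence']])
    return '_:' + hashlib.sha1(key.encode('utf-8')).hexdigest()[:12]


def resolve_disease(row, mondo_map):
    """Prefer the OMIM ID; otherwise fall back to the curated MONDO map."""
    if row['disease_omim_id']:
        return 'OMIM:' + row['disease_omim_id']
    # Assumption: the cache maps disease labels to MONDO identifiers.
    return mondo_map.get(row['disease_label'])


def row_to_triples(row, relation_map, mondo_map):
    """Turn one row into a list of (subject, predicate, object) tuples."""
    variant = make_variant_id(row)
    triples = [
        # Assumption: hgnc_id is a bare number that needs a CURIE prefix.
        (variant, 'has_affected_locus', 'HGNC:' + row['hgnc_id']),
        (variant, 'has_allelic_requirement', row['allelic_requirement']),
        (variant, 'has_functional_consequence', row['mutation_consequence']),
    ]
    disease = resolve_disease(row, mondo_map)
    # relation_map stands in for the source-specific translation table.
    predicate = relation_map.get(row['g2p_relation_label'])
    if disease and predicate:
        triples.append((variant, predicate, disease))
    return triples
```

A row whose disease cannot be resolved, or whose category has no mapping, yields only the three gene and attribute statements in this sketch; whether Dipper skips or logs such rows is not stated in the documentation.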
-