dipper.sources.EBIGene2Phen module

class dipper.sources.EBIGene2Phen.EBIGene2Phen(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

From EBI: The gene2phenotype dataset (G2P) integrates data on genes, variants and phenotypes for example relating to developmental disorders. It is constructed entirely from published literature, and is primarily an inclusion list to allow targeted filtering of genome-wide data for diagnostic purposes. The dataset was compiled with respect to published genes, and annotated with types of disease-causing gene variants. Each row of the dataset associates a gene with a disease phenotype via an evidence level, inheritance mechanism and mutation consequence. Some genes therefore appear in the database more than once, where different genetic mechanisms result in different phenotypes.

Disclaimer: https://www.ebi.ac.uk/gene2phenotype/disclaimer
Terms of Use: https://www.ebi.ac.uk/about/terms-of-use#general
Documentation: https://www.ebi.ac.uk/gene2phenotype/documentation

This script operates on the Developmental Disorders file (DDG2P.csv). In the future we may extend it to include the cancer gene-disease pairs in the CancerG2P.csv file.

EBI_BASE = 'https://www.ebi.ac.uk/gene2phenotype/downloads/'
fetch(is_dl_forced: bool = False)

Fetch DDG2P.csv.gz and check headers to see if it has been updated

Parameters: is_dl_forced – {bool}
Returns: None
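The update check described for fetch() can be sketched roughly as follows. This is only an illustration under the assumption that "headers" means the HTTP response headers of the download URL and that the comparison is made against the modification time of the local copy; the function name remote_is_newer and its arguments are hypothetical and are not part of dipper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def remote_is_newer(headers: Mapping[str, str], local_mtime: Optional[float]) -> bool:
    """Decide whether the remote file should be downloaded again.

    headers     -- HTTP response headers of the remote file (assumption)
    local_mtime -- POSIX timestamp of the local copy, or None if absent
    """
    if local_mtime is None:
        return True  # no local copy yet, so a download is needed
    last_modified = headers.get("Last-Modified")
    if last_modified is None:
        return True  # nothing to compare against, download to be safe
    remote_time = parsedate_to_datetime(last_modified)
    local_time = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote_time > local_time
```

A caller would skip this check entirely when is_dl_forced is True and download unconditionally.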
files = {'developmental_disorders': {'columns': ['gene_symbol', 'gene_omim_id', 'disease_label', 'disease_omim_id', 'g2p_relation_label', 'allelic_requirement', 'mutation_consequence', 'phenotypes', 'organ_specificity_list', 'pmids', 'panel', 'prev_symbols', 'hgnc_id', 'entry_date'], 'file': 'DDG2P.csv.gz', 'url': 'https://www.ebi.ac.uk/gene2phenotype/downloads/DDG2P.csv.gz'}}
map_files = {'mondo_map': 'https://data.monarchinitiative.org/dipper/cache/unmapped_ebi_diseases.tsv'}
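As an illustration of how the 'columns' list in files can be used, the sketch below reads a gzipped CSV and keys each row by those column names. It is not the dipper implementation: the helper read_rows is hypothetical, the sample rows are invented placeholders rather than real G2P records, and the sketch assumes the first line of the file is a header row that can be skipped.

```python
import csv
import gzip
import io

# Column names copied from files['developmental_disorders']['columns'].
COLUMNS = [
    "gene_symbol", "gene_omim_id", "disease_label", "disease_omim_id",
    "g2p_relation_label", "allelic_requirement", "mutation_consequence",
    "phenotypes", "organ_specificity_list", "pmids", "panel",
    "prev_symbols", "hgnc_id", "entry_date",
]


def read_rows(gz_bytes, limit=None):
    """Yield one dict per data row, keyed by COLUMNS (hypothetical helper)."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # assumption: the first line is a header row
        for count, row in enumerate(reader):
            if limit is not None and count >= limit:
                break
            if len(row) != len(COLUMNS):
                raise ValueError("unexpected number of fields: %d" % len(row))
            yield dict(zip(COLUMNS, row))


# Invented placeholder data, used only to exercise the helper.
_buffer = io.StringIO()
_writer = csv.writer(_buffer)
_writer.writerow(COLUMNS)
_writer.writerow(["GENE1", "100000", "EXAMPLE SYNDROME", "200000", "confirmed",
                  "biallelic", "loss of function", "HP:0000001", "Brain",
                  "12345678", "DD", "", "1", "2020-01-01"])
_writer.writerow(["GENE2", "100001", "OTHER SYNDROME", "", "probable",
                  "monoallelic", "activating", "", "", "", "DD", "", "2",
                  "2020-01-02"])
SAMPLE_GZ = gzip.compress(_buffer.getvalue().encode("utf-8"))

for record in read_rows(SAMPLE_GZ, limit=1):
    print(record["gene_symbol"], record["disease_label"])
```

The limit argument mirrors the limit parameter of parse(): it caps the number of data rows that are read.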
parse(limit: Optional[int] = None)

Here we parse each row of the gene to phenotype file

We create anonymous variants along with their attributes (allelic requirement, functional consequence) and connect these to genes and diseases

Genes are connected to variants via global_terms['has_affected_locus'].

Variants are connected to attributes via global_terms['has_allelic_requirement'] and global_terms['has_functional_consequence'].

Variants are connected to diseases based on mappings from the DDD category column; see the translation table specific to this source for the mappings.

For rows that have no disease OMIM id, we use a manually curated disease cache file with mappings to MONDO.

Parameters: limit – {int} number of rows to parse
Returns: None
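The modelling described for parse() can be pictured with a small sketch that turns one row into plain (subject, predicate, object) tuples. It is a simplified illustration, not the dipper code: the function row_to_triples, the "HGNC:" and "OMIM:" prefixes, the choice of the variant as the subject of each statement, the lookup of the cache by disease label, and the decision to skip rows that cannot be mapped are all assumptions, and every identifier in the example is a placeholder.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(
    row: Dict[str, str],
    variant_id: str,
    relation_map: Dict[str, str],
    disease_cache: Dict[str, str],
) -> List[Triple]:
    """Build the statements for one row (hypothetical helper).

    relation_map  -- stand-in for the source translation table, mapping
                     the DDD category to a predicate (assumption)
    disease_cache -- stand-in for the curated MONDO cache, keyed here by
                     disease label (assumption)
    """
    if row["disease_omim_id"]:
        disease_id = "OMIM:" + row["disease_omim_id"]
    else:
        # No OMIM id: fall back on the curated cache.
        disease_id = disease_cache.get(row["disease_label"], "")
    relation = relation_map.get(row["g2p_relation_label"], "")
    if not disease_id or not relation:
        return []  # assumption: rows that cannot be mapped are skipped
    gene_id = "HGNC:" + row["hgnc_id"]
    return [
        (variant_id, "has_affected_locus", gene_id),
        (variant_id, "has_allelic_requirement", row["allelic_requirement"]),
        (variant_id, "has_functional_consequence", row["mutation_consequence"]),
        (variant_id, relation, disease_id),
    ]


# Placeholder values only; none of these are real G2P records.
example_row = {
    "hgnc_id": "1",
    "disease_label": "EXAMPLE SYNDROME",
    "disease_omim_id": "",
    "g2p_relation_label": "confirmed",
    "allelic_requirement": "biallelic",
    "mutation_consequence": "loss of function",
}
triples = row_to_triples(
    example_row,
    "_:variant1",  # anonymous (blank node) variant
    {"confirmed": "example:causes"},
    {"EXAMPLE SYNDROME": "MONDO:9999999"},
)
for triple in triples:
    print(triple)
```

In the real source the attribute values and predicates would be resolved to ontology terms through global_terms and the translation table rather than used as raw strings.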