dipper.sources.HPOAnnotations module

class dipper.sources.HPOAnnotations.HPOAnnotations(graph_type, are_bnodes_skolemized)

Bases: dipper.sources.Source.Source

The [Human Phenotype Ontology](http://human-phenotype-ontology.org) group curates and assembles over 115,000 annotations to hereditary diseases using the HPO. Here we create OBAN-style associations between diseases and phenotypic features, together with their evidence and, where known, their age of onset and frequency. The parser currently processes only the “abnormal” annotations. Associations to “remarkable normality” will be added in the near future.

We create additional associations from text mining. See info at http://pubmed-browser.human-phenotype-ontology.org/.

Also, you can read about these annotations in [PMID:26119816](http://www.ncbi.nlm.nih.gov/pubmed/26119816).

To test this class properly, you should have a conf.json file configured with some test IDs, structured as shown below. The IDs shown are examples; put your favorite IDs in the config.

    test_ids: {"disease" : ["OMIM:119600", "OMIM:120160"]}
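
A minimal sketch of reading the test IDs from a configuration with this structure. The helper name `load_test_disease_ids` is hypothetical and not part of dipper, the configuration is inlined as a string so the example runs on its own, and the `test_ids` key is quoted here because JSON requires it.

```python
import json

# Inline stand-in for the contents of conf.json (assumption: the file is
# plain JSON with a top-level "test_ids" object, as described above).
conf_text = '{"test_ids": {"disease": ["OMIM:119600", "OMIM:120160"]}}'


def load_test_disease_ids(text):
    """Return the configured disease test IDs, or an empty list if none exist.

    Hypothetical helper for illustration only; dipper may read its
    configuration differently.
    """
    conf = json.loads(text)
    return conf.get("test_ids", {}).get("disease", [])


disease_ids = load_test_disease_ids(conf_text)
print(disease_ids)
```

Falling back to an empty list keeps the sketch usable when no test IDs are configured.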

add_common_files_to_file_list()
eco_dict = {'ICE': 'ECO:0000305', 'IEA': 'ECO:0000501', 'ITM': 'ECO:0000246', 'PCS': 'ECO:0000269', 'TAS': 'ECO:0000304'}
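
As an illustration of how a mapping like eco_dict can be used, the sketch below looks up the ECO identifier for an evidence code. The dictionary is copied from the attribute above. The function `evidence_to_eco` and its treatment of unknown codes (returning a default) are assumptions, not documented dipper behavior.

```python
# Copied from the eco_dict class attribute documented above.
eco_dict = {
    'ICE': 'ECO:0000305',
    'IEA': 'ECO:0000501',
    'ITM': 'ECO:0000246',
    'PCS': 'ECO:0000269',
    'TAS': 'ECO:0000304',
}


def evidence_to_eco(code, default=None):
    """Map an evidence code to its ECO identifier.

    Hypothetical helper: the code is trimmed and upper-cased before lookup,
    and unknown codes yield `default` (an assumption for this sketch).
    """
    return eco_dict.get(code.strip().upper(), default)


print(evidence_to_eco('TAS'))
print(evidence_to_eco('unknown'))
```
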
fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns: None

files = {'annot': {'url': 'http://compbio.charite.de/hudson/job/hpo.annotations/lastStableBuild/artifact/misc/phenotype_annotation.tab', 'file': 'phenotype_annotation.tab'}, 'doid': {'url': 'http://purl.obolibrary.org/obo/doid.owl', 'file': 'doid.owl'}, 'version': {'url': 'http://compbio.charite.de/hudson/job/hpo.annotations/lastStableBuild/artifact/misc/data_version.txt', 'file': 'data_version.txt'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

get_common_files()

Fetch the raw hpo-annotation-data by cloning or pulling the [repository](https://github.com/monarch-initiative/hpo-annotation-data.git). These files are added to the files object and iterated over separately.

get_doid_ids_for_unpadding()

Here, we fetch the DOID OWL file and collect all of the DOIDs. We determine which ones are not zero-padded, so that we can map each DOID to the correct identifier when processing the common annotation files.

This may become obsolete when https://github.com/monarch-initiative/hpo-annotation-data/issues/84 is addressed.

Returns:
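
A minimal sketch of the "which are not zero-padded" step, assuming the DOIDs have already been extracted from the OWL file as CURIE strings such as `DOID:9352`. The function name `find_unpadded_doids` and the sample identifiers are illustrative assumptions. The actual method works from the fetched doid.owl file, which is not shown here.

```python
import re


def find_unpadded_doids(doids):
    """Return the DOID CURIEs whose numeric part has no leading zero.

    Hypothetical helper for illustration; parsing of doid.owl is omitted.
    """
    unpadded = set()
    for curie in doids:
        match = re.fullmatch(r'DOID:(\d+)', curie)
        # Keep only well-formed DOIDs whose digits do not start with "0".
        if match and not match.group(1).startswith('0'):
            unpadded.add(curie)
    return unpadded


# Illustrative input, not taken from the real ontology file.
sample = ['DOID:9352', 'DOID:0050117', 'DOID:1234', 'OMIM:119600']
print(sorted(find_unpadded_doids(sample)))
```
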
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses.

Returns: None

process_all_common_disease_files(limit=None)

Loop through all of the files that we previously fetched from git, creating the disease-phenotype associations.

Parameters:
  • limit

process_common_disease_file(raw, unpadded_doids, limit=None)

Make disease-phenotype associations. Some identifiers need cleanup:
  • DOIDs listed with the prefix DOID-DOID: are rewritten to use DOID:.
  • DOIDs may be unnecessarily zero-padded. These are remapped to their non-padded equivalents.

Parameters:
  • raw
  • unpadded_doids
  • limit
Returns:
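
The identifier cleanup described for this method can be sketched as follows. This is not the dipper implementation: the function name `clean_doid` is hypothetical, and `unpadded_doids` is assumed to be a set of CURIE strings (for example `{'DOID:9352'}`), which may differ from the structure the real parameter holds.

```python
import re


def clean_doid(raw_id, unpadded_doids):
    """Normalize a disease identifier as described above (illustrative only)."""
    # Step 1: collapse the doubled prefix, "DOID-DOID:123" -> "DOID:123".
    curie = re.sub(r'^DOID-DOID:', 'DOID:', raw_id.strip())

    # Step 2: if stripping leading zeros yields a known unpadded DOID,
    # remap to that unpadded form; otherwise leave the identifier alone.
    match = re.fullmatch(r'DOID:(\d+)', curie)
    if match:
        stripped = 'DOID:' + (match.group(1).lstrip('0') or '0')
        if stripped in unpadded_doids:
            return stripped
    return curie


known_unpadded = {'DOID:9352'}  # illustrative value
print(clean_doid('DOID-DOID:0009352', known_unpadded))
print(clean_doid('DOID:0050117', known_unpadded))
```

Checking against the set of known unpadded DOIDs, rather than stripping zeros unconditionally, leaves identifiers that are legitimately zero-padded in the ontology unchanged.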

scrub()

Perform various data-scrubbing steps on the raw data files prior to parsing. For this resource, this currently includes:
  • revising errors in identifiers for some OMIM IDs and PMIDs

Returns: None