dipper.sources.HPOAnnotations module

class dipper.sources.HPOAnnotations.HPOAnnotations(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

The [Human Phenotype Ontology](http://human-phenotype-ontology.org) group curates and assembles over 115,000 annotations to hereditary diseases using the HPO ontology. Here we create OBAN-style associations between diseases and phenotypic features, together with their evidence, and age of onset and frequency (if known). The parser currently only processes the “abnormal” annotations. Association to “remarkable normality” will be added in the near future.

We create additional associations from text mining. See info at http://pubmed-browser.human-phenotype-ontology.org/.

Also, you can read about these annotations in [PMID:26119816](http://www.ncbi.nlm.nih.gov/pubmed/26119816).

To test this class properly, you should have a resources/test_ids.yaml file configured with some test IDs. The IDs below are examples; put your favorite IDs in the config:

    test_ids: {"disease": ["OMIM:119600", "OMIM:120160"]}

add_common_files_to_file_list()

The several thousand common-disease files from the repo tarball are added to the files object. Open question: should the ‘common-disease-mondo’ files be added as well?
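
As a rough illustration of this step, the sketch below lists the regular files under one directory of a gzipped tarball and records them in a dict keyed by file name. The helper names, the `common-diseases` directory name, the `.tab` file names, and the shape of the resulting entries are all assumptions made for the example; they are not taken from the dipper source.

```python
import io
import tarfile


def list_common_disease_members(tar_bytes, subdir="common-diseases"):
    """Return the names of regular files that sit under `subdir` in a .tar.gz.

    `subdir` is a hypothetical directory name used only for this sketch.
    """
    names = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            parent_dirs = member.name.split("/")[:-1]
            if member.isfile() and subdir in parent_dirs:
                names.append(member.name)
    return sorted(names)


def build_demo_tarball():
    """Build a tiny in-memory tarball so the sketch runs without network access."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in (
            "repo/common-diseases/a.tab",
            "repo/common-diseases/b.tab",
            "repo/README.md",
        ):
            data = b"placeholder\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Add one entry per common-disease file, keyed by its base name.
files = {}
for member_name in list_common_disease_members(build_demo_tarball()):
    key = member_name.rsplit("/", 1)[-1]
    files[key] = {"file": member_name}
```

Keying by base name keeps the entries easy to look up, but it would collide if two directories held files with the same name; a real implementation may well key them differently.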

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns: None

files = {'doid': {'file': 'doid.owl', 'url': 'http://purl.obolibrary.org/obo/doid.owl'}, 'hpoa': {'columns': ['#DatabaseID', 'DiseaseName', 'Qualifier', 'HPO_ID', 'Reference', 'Evidence', 'Onset', 'Frequency', 'Sex', 'Modifier', 'Aspect', 'Biocuration'], 'file': 'phenotype.hpoa', 'url': 'http://compbio.charite.de/jenkins/job/hpo.annotations.current/lastSuccessfulBuild/artifact/current/phenotype.hpoa'}}
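
The `columns` list for `hpoa` can be used to turn each row of `phenotype.hpoa` into a dict. The sketch below assumes the file is tab-separated and that comment and header lines start with `#`; the function name `read_hpoa_rows` and the sample row (with placeholder identifiers) are made up for the example and are not part of the class.

```python
import csv
import io

# Column names copied from the `files['hpoa']['columns']` entry above.
HPOA_COLUMNS = [
    '#DatabaseID', 'DiseaseName', 'Qualifier', 'HPO_ID', 'Reference',
    'Evidence', 'Onset', 'Frequency', 'Sex', 'Modifier', 'Aspect',
    'Biocuration',
]


def read_hpoa_rows(handle, columns=HPOA_COLUMNS):
    """Yield one dict per data row, skipping blank and '#'-prefixed lines."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # blank line, comment, or the header row itself
        if len(row) != len(columns):
            raise ValueError(
                "expected %d columns, got %d" % (len(columns), len(row))
            )
        yield dict(zip(columns, row))


# A header line plus one illustrative row; every value is a placeholder.
sample_row = [
    "OMIM:000000", "Example disease", "", "HP:0000000", "PMID:0",
    "TAS", "", "", "", "", "P", "curator",
]
sample = "\t".join(HPOA_COLUMNS) + "\n" + "\t".join(sample_row) + "\n"

rows = list(read_hpoa_rows(io.StringIO(sample)))
```

Checking the column count up front makes a change in the upstream file layout fail loudly instead of silently shifting values into the wrong fields.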

getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

get_common_files()

Fetch the hpo-annotation-data [repository](https://github.com/monarch-initiative/hpo-annotation-data.git) as a tarball

parse(limit=None)

Abstract method to parse all data from an external resource that was previously fetched by fetch(). This should be overridden by subclasses.

Returns: None

process_all_common_disease_files(limit=None)

Loop through all of the files that we previously fetched from git, creating the disease-phenotype associations.

Parameters:
  • limit

process_common_disease_file(raw, unpadded_doids, limit=None)

Make disease-phenotype associations. Some identifiers need cleanup:
  • DOIDs are listed as DOID-DOID: and are rewritten to DOID:
  • DOIDs may be unnecessarily zero-padded; these are remapped to their non-padded equivalent.
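
The two cleanup rules can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: the function name `clean_doid` is hypothetical, and `unpadded_doids` is assumed here to be a collection of known non-padded DOIDs, so that padding is only stripped when the shorter form is a recognized identifier.

```python
import re


def clean_doid(raw_id, unpadded_doids):
    """Normalize a disease identifier using the two cleanup rules.

    `unpadded_doids` is assumed to be a set of known non-padded DOIDs.
    """
    cleaned = raw_id.strip()

    # Rule 1: collapse the doubled prefix, DOID-DOID: --> DOID:
    doubled = "DOID-DOID:"
    if cleaned.startswith(doubled):
        cleaned = "DOID:" + cleaned[len(doubled):]

    # Rule 2: strip leading zeros, but only when the non-padded form is a
    # known DOID; some DOIDs are legitimately zero-padded.
    match = re.fullmatch(r"DOID:(0+)(\d+)", cleaned)
    if match:
        candidate = "DOID:" + match.group(2)
        if candidate in unpadded_doids:
            cleaned = candidate

    return cleaned


known_unpadded = {"DOID:1234"}
examples = {
    raw: clean_doid(raw, known_unpadded)
    for raw in ("DOID-DOID:0001234", "DOID:0050117", "DOID:1234", "OMIM:119600")
}
```

Consulting a lookup before stripping zeros avoids rewriting an identifier into one that does not exist; identifiers from other namespaces pass through unchanged.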

Parameters:
  • raw
  • unpadded_doids
  • limit

small_files = {'columns': ['Disease ID', 'Disease Name', 'Gene ID', 'Gene Name', 'Genotype', 'Gene Symbol(s)', 'Phenotype ID', 'Phenotype Name', 'Age of Onset ID', 'Age of Onset Name', 'Evidence ID', 'Evidence Name', 'Frequency', 'Sex ID', 'Sex Name', 'Negation ID', 'Negation Name', 'Description', 'Pub', 'Assigned by', 'Date Created']}