dipper.sources.Panther module

class dipper.sources.Panther.Panther(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None)

Bases: dipper.sources.Source.Source

The pairwise orthology calls from Panther DB (http://pantherdb.org/) encompass 22 species, from the RefGenome and HCOP projects. Here, we map the orthology classes to RO homology relationships. This resource may be extended in the future with additional species.

This currently makes a graph of orthologous relationships between genes, with the assumption that gene metadata (labels, equivalent ids) are provided from other sources.

Gene families are nominally created from the orthology files, though these are incomplete and carry no hierarchical (subfamily) information. This will be updated from the HMM files in the future.

Note that there is a fair amount of identifier cleanup performed to align with our standard CURIE prefixes.

The test graph of data is output based on configured “protein” identifiers in resources/test_id.yaml.

By default, this will produce a file with ALL orthologous relationships. If you want only a subset, you need to provide a filter of taxon IDs (the tax_ids argument) when calling this.
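The effect of the taxon filter can be pictured with a small standalone sketch. This is not Dipper's implementation: the `keep_pair` helper, the `SPECIES_TO_TAXON` lookup and the sample pairs are hypothetical names introduced for illustration, and requiring both genes to match the filter is an assumption (the text above does not say whether one or both must match).

```python
# Hypothetical lookup from species codes to NCBI taxon IDs (illustration only).
SPECIES_TO_TAXON = {"HUMAN": 9606, "MOUSE": 10090, "DANRE": 7955}


def keep_pair(species_a, species_b, tax_ids=None):
    """Return True when an ortholog pair passes the taxon filter.

    With tax_ids=None every pair is kept, mirroring the documented default
    of emitting ALL orthologous relationships.
    """
    if tax_ids is None:
        return True
    wanted = set(tax_ids)
    # Assumption: both members of the pair must belong to a wanted taxon.
    return (SPECIES_TO_TAXON.get(species_a) in wanted
            and SPECIES_TO_TAXON.get(species_b) in wanted)


# Made-up species pairs standing in for rows of the orthology files.
pairs = [("HUMAN", "MOUSE"), ("HUMAN", "DANRE"), ("MOUSE", "DANRE")]
human_mouse = [p for p in pairs if keep_pair(*p, tax_ids=[9606, 10090])]
```

With the real class, the equivalent choice would be made by passing `tax_ids` to the constructor shown in the signature above.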

PNTHDL = 'ftp://ftp.pantherdb.org/ortholog/current_release'
fetch(is_dl_forced=False)
Returns: None
files = {
    'Orthologs_HCOP': {
        'columns': ['Gene', 'Ortholog', 'Type of ortholog', 'Common ancestor for the orthologs', 'Panther Ortholog ID'],
        'file': 'Orthologs_HCOP.tar.gz',
        'url': 'ftp://ftp.pantherdb.org/ortholog/current_release/Orthologs_HCOP.tar.gz'},
    'RefGenomeOrthologs': {
        'columns': ['Gene', 'Ortholog', 'Type of ortholog', 'Common ancestor for the orthologs', 'Panther Ortholog ID'],
        'file': 'RefGenomeOrthologs.tar.gz',
        'url': 'ftp://ftp.pantherdb.org/ortholog/current_release/RefGenomeOrthologs.tar.gz'},
    'current_release': {
        'columns': ['version'],
        'file': 'current_release.ver',
        'url': 'ftp://ftp.pantherdb.org/ortholog/'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

panther_format = ['Gene', 'Ortholog', 'Type of ortholog', 'Common ancestor for the orthologs', 'Panther Ortholog ID']
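As a rough illustration of how a column list like this can be applied, the sketch below maps one line of an orthology file onto the column names. It assumes the files are tab-delimited, which is not stated here; `row_to_record` and the sample line are hypothetical and are not part of Dipper.

```python
# Column names copied from the panther_format attribute documented above.
PANTHER_FORMAT = [
    'Gene', 'Ortholog', 'Type of ortholog',
    'Common ancestor for the orthologs', 'Panther Ortholog ID']


def row_to_record(line, columns=PANTHER_FORMAT):
    """Split one line into fields and label them with the column names.

    Assumption: fields are separated by tabs. A line with the wrong number
    of fields raises ValueError instead of being silently truncated.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(columns):
        raise ValueError(
            "expected %d fields, got %d" % (len(columns), len(values)))
    return dict(zip(columns, values))


# Made-up placeholder line; the values are not real Panther data.
sample = "geneA\tgeneB\tLDO\tsomeAncestor\tPTHR00000\n"
record = row_to_record(sample)
```

Checking the field count up front is a design choice for the sketch: a malformed line is easier to diagnose at parse time than after it has produced a partial record.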
parse(limit=None)
Returns: None