dipper.sources.GeneOntology module

class dipper.sources.GeneOntology.GeneOntology(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None)

Bases: dipper.sources.Source.Source

This is the parser for the [Gene Ontology Annotations](http://www.geneontology.org), from which we process gene-process/function/subcellular location associations.

We generate the GO graph to include the following information:

* genes
* gene-process
* gene-function
* gene-location

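We process only a subset of organisms. The NCBI taxon ID keys of the files attribute identify the organisms that have a configured annotation file.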

Status: IN PROGRESS / INCOMPLETE

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

:return: None

files = {'10090': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'mgi.gaf.gz', 'url': 'http://current.geneontology.org/annotations/mgi.gaf.gz'}, '10116': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'rgd.gaf.gz', 'url': 'http://current.geneontology.org/annotations/rgd.gaf.gz'}, '4896': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'pombase.gaf.gz', 'url': 'http://current.geneontology.org/annotations/pombase.gaf.gz'}, '5052': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'aspgd.gaf.gz', 'url': 'http://current.geneontology.org/annotations/aspgd.gaf.gz'}, '559292': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'sgd.gaf.gz', 'url': 'http://current.geneontology.org/annotations/sgd.gaf.gz'}, 
'5782': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'dictybase.gaf.gz', 'url': 'http://current.geneontology.org/annotations/dictybase.gaf.gz'}, '6239': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'wb.gaf.gz', 'url': 'http://current.geneontology.org/annotations/wb.gaf.gz'}, '7227': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'fb.gaf.gz', 'url': 'http://current.geneontology.org/annotations/fb.gaf.gz'}, '7955': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'zfin.gaf.gz', 'url': 'http://current.geneontology.org/annotations/zfin.gaf.gz'}, '9031': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_chicken.gaf.gz', 'url': 
'http://current.geneontology.org/annotations/goa_chicken.gaf.gz'}, '9606': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_human.gaf.gz', 'url': 'http://current.geneontology.org/annotations/goa_human.gaf.gz'}, '9615': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_dog.gaf.gz', 'url': 'http://current.geneontology.org/annotations/goa_dog.gaf.gz'}, '9823': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_pig.gaf.gz', 'url': 'http://current.geneontology.org/annotations/goa_pig.gaf.gz'}, '9913': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_cow.gaf.gz', 'url': 'http://current.geneontology.org/annotations/goa_cow.gaf.gz'}, 'gaf-eco-mapping': {'file': 'gaf-eco-mapping.yaml', 'url': 'https://archive.monarchinitiative.org/DipperCache/go/gaf-eco-mapping.yaml'}, 'idmapping_selected': {'columns': ['UniProtKB-AC', 'UniProtKB-ID', 'GeneID (EntrezGene)', 'RefSeq', 'GI', 'PDB', 'GO', 'UniRef100', 'UniRef90', 'UniRef50', 'UniParc', 'PIR', 
'NCBI-taxon', 'MIM', 'UniGene', 'PubMed', 'EMBL', 'EMBL-CDS', 'Ensembl', 'Ensembl_TRS', 'Ensembl_PRO', 'Additional PubMed'], 'file': 'idmapping_selected.tab.gz', 'url': 'ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz'}}
gaf_columns = ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID']
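
The names in gaf_columns can be used to label the tab-separated fields of a GAF annotation line. The following is a minimal sketch of that idea, not the module's own process_gaf implementation: the helper name read_gaf_rows and the sample record are hypothetical, and the sketch assumes that GAF header and comment lines start with "!".

```python
import csv
import io

# Column names copied from the gaf_columns attribute documented above.
GAF_COLUMNS = [
    'DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID',
    'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect',
    'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type',
    'Taxon and Interacting taxon', 'Date', 'Assigned_By',
    'Annotation_Extension', 'Gene_Product_Form_ID',
]


def read_gaf_rows(handle):
    """Yield one dict per annotation line (hypothetical helper).

    Lines starting with '!' are treated as header/comment lines and skipped.
    Rows that do not have exactly len(GAF_COLUMNS) fields are skipped too.
    """
    data_lines = (line for line in handle if not line.startswith('!'))
    reader = csv.reader(data_lines, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(GAF_COLUMNS):
            continue
        yield dict(zip(GAF_COLUMNS, row))


# Illustrative record only: the identifiers below are made up for the example.
SAMPLE_FIELDS = [
    'MGI', 'MGI:0000000', 'ExampleGene', '', 'GO:0008150',
    'PMID:0000000', 'IEA', '', 'P', 'example gene name', '',
    'gene', 'taxon:10090', '20200101', 'MGI', '', '',
]
SAMPLE = '!gaf-version: 2.1\n' + '\t'.join(SAMPLE_FIELDS) + '\n'

rows = list(read_gaf_rows(io.StringIO(SAMPLE)))
print(rows[0]['DB_Object_ID'], rows[0]['GO_ID'], rows[0]['Aspect'])
```

To read one of the downloaded .gaf.gz files the same way, the handle could be opened with gzip.open(path, 'rt') from the standard library.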
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

:return:

get_uniprot_entrez_id_map()
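
This method is not documented here. Judging only from its name and from the idmapping_selected entry in files, it presumably builds a lookup from UniProtKB accessions to Entrez gene IDs. The sketch below shows one way such a map could be built from rows with those columns. It is an assumption-laden illustration, not the method's actual code: the function name build_uniprot_entrez_map, the "; " separator for multiple gene IDs, the choice to skip ambiguous rows, and the sample data are all hypothetical.

```python
import csv
import io

# Column names copied from files['idmapping_selected']['columns'] above.
IDMAPPING_COLUMNS = [
    'UniProtKB-AC', 'UniProtKB-ID', 'GeneID (EntrezGene)', 'RefSeq', 'GI',
    'PDB', 'GO', 'UniRef100', 'UniRef90', 'UniRef50', 'UniParc', 'PIR',
    'NCBI-taxon', 'MIM', 'UniGene', 'PubMed', 'EMBL', 'EMBL-CDS',
    'Ensembl', 'Ensembl_TRS', 'Ensembl_PRO', 'Additional PubMed',
]


def build_uniprot_entrez_map(handle, tax_ids=None):
    """Return {UniProtKB accession: Entrez gene ID} (hypothetical helper).

    tax_ids, if given, is a collection of NCBI taxon ID strings to keep.
    Rows with zero or several gene IDs are skipped; this is a design choice
    of the sketch, made so that every mapping is unambiguous.
    """
    col = {name: i for i, name in enumerate(IDMAPPING_COLUMNS)}
    id_map = {}
    reader = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(IDMAPPING_COLUMNS):
            continue  # malformed or blank line
        if tax_ids is not None and row[col['NCBI-taxon']] not in tax_ids:
            continue
        raw = row[col['GeneID (EntrezGene)']]
        gene_ids = [g.strip() for g in raw.split(';') if g.strip()]
        if len(gene_ids) == 1:
            id_map[row[col['UniProtKB-AC']]] = gene_ids[0]
    return id_map


def make_row(values):
    """Build one tab-separated line from a partial {column: value} dict."""
    return '\t'.join(values.get(name, '') for name in IDMAPPING_COLUMNS)


# Made-up rows for illustration; these are not real UniProt records.
SAMPLE = '\n'.join([
    make_row({'UniProtKB-AC': 'X00001', 'GeneID (EntrezGene)': '111',
              'NCBI-taxon': '9606'}),
    make_row({'UniProtKB-AC': 'X00002', 'GeneID (EntrezGene)': '222; 333',
              'NCBI-taxon': '9606'}),
    make_row({'UniProtKB-AC': 'X00003', 'GeneID (EntrezGene)': '444',
              'NCBI-taxon': '10090'}),
]) + '\n'

human_map = build_uniprot_entrez_map(io.StringIO(SAMPLE), tax_ids={'9606'})
full_map = build_uniprot_entrez_map(io.StringIO(SAMPLE))
print(human_map)
```

A map of this shape could plausibly be what process_gaf accepts through its id_map argument, but that link is not confirmed by the documentation here.
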
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses.

:return: None

process_gaf(gaffile, limit, id_map=None)
wont_prefix = ['zgc', 'wu', 'si', 'im', 'BcDNA', 'sb', 'anon-EST', 'EG', 'id', 'zmp', 'BEST', 'BG', 'hm', 'tRNA', 'NEST', 'xx']