dipper.sources.MGI module¶
-
class
dipper.sources.MGI.
MGI
(graph_type, are_bnodes_skolemized, data_release_version=None)¶ Bases:
dipper.sources.PostgreSQLSource.PostgreSQLSource
This is the [Mouse Genome Informatics](http://www.informatics.jax.org/) resource, from which we process genotype and phenotype data about laboratory mice. Genotypes leverage the GENO genotype model.
Here, we connect to their public database, and download a subset of tables/views to get specifically at the geno-pheno data, then iterate over the tables. We end up effectively performing joins when adding nodes to the graph.
In order to use this parser, you will need to have user/password connection details in your conf.yaml file, like:
dbauth : {'mgi' : {'user' : '<username>', 'password' : '<password>'}}
You can request access by contacting mgi-help@jax.org.
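As a rough illustration of the credential layout described above, the sketch below reads the MGI user and password from an already-parsed configuration mapping. The helper name `get_mgi_credentials` is hypothetical and not part of dipper, and how dipper itself loads conf.yaml is not shown here.

```python
def get_mgi_credentials(conf):
    """Return (user, password) for MGI from a parsed conf.yaml mapping.

    Hypothetical helper for illustration only; not part of dipper.
    """
    try:
        auth = conf["dbauth"]["mgi"]
        return auth["user"], auth["password"]
    except (KeyError, TypeError) as err:
        # Missing or malformed section: report which keys are expected.
        raise ValueError("conf.yaml needs dbauth -> mgi -> user/password") from err


# The same structure as the conf.yaml entry, written as a Python dict.
conf = {"dbauth": {"mgi": {"user": "<username>", "password": "<password>"}}}
user, password = get_mgi_credentials(conf)
```

Failing early with a clear message is one way to surface a missing `dbauth` entry before any database connection is attempted.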
-
fetch
(is_dl_forced=False)¶ For the MGI resource, we connect to the remote database and pull the tables into local files. We check the local table versions against the remote version.
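The version check mentioned above could look roughly like the following. This is a minimal sketch under assumptions: the function name `needs_refresh` and the version strings are hypothetical, and the way dipper actually stores or compares versions is not documented here.

```python
def needs_refresh(local_version, remote_version, is_dl_forced=False):
    """Decide whether the local table dumps should be downloaded again.

    local_version is None when nothing has been fetched yet.
    Hypothetical helper for illustration only; not part of dipper.
    """
    if is_dl_forced:
        return True  # the caller asked for a fresh download regardless
    if local_version is None:
        return True  # no local copy to compare against
    return local_version != remote_version


# Made-up version strings, only to show the three cases.
same = needs_refresh("v1", "v1")
stale = needs_refresh("v1", "v2")
forced = needs_refresh("v1", "v1", is_dl_forced=True)
```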
-
fetch_transgene_genes_from_db
(cxn)¶ This is a custom query to fetch the non-mouse genes that are part of transgene alleles.
Parameters: cxn – Returns:
-
parse
(limit=None)¶ We process each of the postgres tables in turn. The order of processing is important here, as we build up a hashmap of internal vs external identifiers (unique keys by type to MGI id). These include allele, marker (gene), publication, strain, genotype, annotation (association), and descriptive notes.
Parameters: limit – Only parse this many rows in each table
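To show why processing order matters, here is a small sketch of the lookup idea: a first table fills a map from internal keys to MGI ids, and a later table resolves its keys through that map. The row values, the `idhash` name, and the use of `"1"` as the preferred flag are assumptions for illustration, not taken from dipper's implementation.

```python
# Hypothetical rows, shaped like a few columns of the views listed in `tables`.
all_summary_rows = [
    {"_object_key": "101", "mgiid": "MGI:0000001", "preferred": "1"},
    {"_object_key": "102", "mgiid": "MGI:0000002", "preferred": "1"},
]
all_allele_rows = [
    {"_allele_key": "101", "_marker_key": "7", "symbol": "a1"},
    {"_allele_key": "999", "_marker_key": "8", "symbol": "a2"},  # no known MGI id
]

# One lookup table per object type: internal key -> MGI id.
idhash = {"allele": {}, "marker": {}}

# Step 1: the summary rows must be read first to fill the lookup.
for row in all_summary_rows:
    if row["preferred"] == "1":
        idhash["allele"][row["_object_key"]] = row["mgiid"]

# Step 2: later tables resolve their internal keys through the lookup,
# which is effectively a join done in Python.
resolved = []
for row in all_allele_rows:
    allele_id = idhash["allele"].get(row["_allele_key"])
    if allele_id is None:
        continue  # key never seen in the summary rows; skip it
    resolved.append((allele_id, row["symbol"]))
```

If step 2 ran before step 1, every lookup would miss and no rows would be resolved, which is the ordering constraint described above.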
-
process_mgi_note_allele_view
(limit=None)¶ These are the descriptive notes about the alleles. Note that these notes have embedded HTML - should we do anything about that?
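One possible answer to the embedded-HTML question is to strip tags before storing a note. The sketch below uses only the standard library's `html.parser`; the names `_TextOnly` and `strip_html` are hypothetical, the sample note is made up, and dipper may handle notes differently or leave them untouched.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text content of a fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(note):
    """Return the note text with tags removed and entities decoded."""
    parser = _TextOnly()
    parser.feed(note)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# Made-up note text, only to show the call.
plain = strip_html("Expressed in <i>Mus</i> &amp; others")
```

Stripping tags loses markup such as superscripts in allele symbols, so keeping the raw note alongside the plain text may be the safer choice.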
-
process_mgi_relationship_transgene_genes
(limit=None)¶ Here, we have the relationship between MGI transgene alleles and the non-mouse gene ids that are part of them. We augment the allele with the transgene parts.
Parameters: limit – Returns:
-
resources
= {'query_map': [{'query': '../../resources/sql/mgi/mgi_dbinfo.sql', 'outfile': 'mgi_dbinfo', 'Force': True}, {'query': '../../resources/sql/mgi/gxd_genotype_view.sql', 'outfile': 'gxd_genotype_view'}, {'query': '../../resources/sql/mgi/gxd_genotype_summary_view.sql', 'outfile': 'gxd_genotype_summary_view'}, {'query': '../../resources/sql/mgi/gxd_allelepair_view.sql', 'outfile': 'gxd_allelepair_view'}, {'query': '../../resources/sql/mgi/all_summary_view.sql', 'outfile': 'all_summary_view'}, {'query': '../../resources/sql/mgi/all_allele_view.sql', 'outfile': 'all_allele_view'}, {'query': '../../resources/sql/mgi/all_allele_mutation_view.sql', 'outfile': 'all_allele_mutation_view'}, {'query': '../../resources/sql/mgi/mrk_marker_view.sql', 'outfile': 'mrk_marker_view'}, {'query': '../../resources/sql/mgi/voc_annot_view.sql', 'outfile': 'voc_annot_view'}, {'query': '../../resources/sql/mgi/evidence.sql', 'outfile': 'evidence_view'}, {'query': '../../resources/sql/mgi/bib_acc_view.sql', 'outfile': 'bib_acc_view'}, {'query': '../../resources/sql/mgi/prb_strain_view.sql', 'outfile': 'prb_strain_view'}, {'query': '../../resources/sql/mgi/mrk_summary_view.sql', 'outfile': 'mrk_summary_view'}, {'query': '../../resources/sql/mgi/mrk_acc_view.sql', 'outfile': 'mrk_acc_view'}, {'query': '../../resources/sql/mgi/prb_strain_acc_view.sql', 'outfile': 'prb_strain_acc_view'}, {'query': '../../resources/sql/mgi/prb_strain_genotype_view.sql', 'outfile': 'prb_strain_genotype_view'}, {'query': '../../resources/sql/mgi/mgi_note_vocevidence_view.sql', 'outfile': 'mgi_note_vocevidence_view'}, {'query': '../../resources/sql/mgi/mgi_note_allele_view.sql', 'outfile': 'mgi_note_allele_view'}, {'query': '../../resources/sql/mgi/mrk_location_cache.sql', 'outfile': 'mrk_location_cache'}], 'test_keys': '../../resources/mgi_test_keys.yaml'}¶
-
tables
= {'all_allele_mutation_view': {'columns': ['_allele_key', 'mutation']}, 'all_allele_view': {'columns': ['_allele_key', '_marker_key', '_strain_key', 'symbol', 'name', 'iswildtype']}, 'all_summary_view': {'columns': ['_object_key', 'preferred', 'mgiid', 'description', 'short_description']}, 'bib_acc_view': {'columns': ['accid', 'prefixpart', 'numericpart', '_object_key', 'logicaldb', '_logicaldb_key']}, 'evidence_view': {'columns': ['_annotevidence_key', '_annot_key', 'evidencecode', 'jnumid', 'term', 'value', 'annottype']}, 'gxd_allelepair_view': {'columns': ['_allelepair_key', '_genotype_key', '_allele_key_1', '_allele_key_2', 'allele1', 'allele2', 'allelestate']}, 'gxd_genotype_summary_view': {'columns': ['_object_key', 'preferred', 'mgiid', 'subtype', 'short_description']}, 'gxd_genotype_view': {'columns': ['_genotype_key', '_strain_key', 'strain', 'mgiid']}, 'mgi_note_allele_view': {'columns': ['_object_key', 'notetype', 'note', 'sequencenum']}, 'mgi_note_vocevidence_view': {'columns': ['_object_key', 'note']}, 'mgi_relationship_transgene_genes': {'columns': ['rel_key', 'object_1', 'allele_id', 'allele_label', 'category_key', 'category_name', 'property_key', 'property_name', 'property_value']}, 'mrk_acc_view': {'columns': ['accid', 'prefixpart', '_logicaldb_key', '_object_key', 'preferred', '_organism_key']}, 'mrk_location_cache': {'columns': ['_marker_key', '_organism_key', 'chromosome', 'startcoordinate', 'endcoordinate', 'strand', 'version']}, 'mrk_marker_view': {'columns': ['_marker_key', '_organism_key', '_marker_status_key', 'symbol', 'name', 'latinname', 'markertype']}, 'mrk_summary_view': {'columns': ['accid', '_logicaldb_key', '_object_key', 'preferred', 'mgiid', 'subtype', 'short_description']}, 'prb_strain_acc_view': {'columns': ['accid', 'prefixpart', '_logicaldb_key', '_object_key', 'preferred']}, 'prb_strain_genotype_view': {'columns': ['_strain_key', '_genotype_key']}, 'prb_strain_view': {'columns': ['_strain_key', 'strain', 'species']}, 
'voc_annot_view': {'columns': ['_annot_key', 'annottype', '_object_key', '_term_key', '_qualifier_key', 'qualifier', 'term', 'accid']}}¶
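The column lists in `tables` can serve as a header check when reading a downloaded file. The sketch below assumes the local files are tab-delimited with a header row, which is not stated in this page; `read_table` and `EXPECTED` are hypothetical names and the sample values are made up.

```python
import csv
import io

# Column list copied from the `tables` attribute for one view.
EXPECTED = {"prb_strain_view": ["_strain_key", "strain", "species"]}


def read_table(handle, table_name, limit=None):
    """Yield rows as dicts after checking the header against EXPECTED.

    Assumes tab-delimited input with a header row (an assumption).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header != EXPECTED[table_name]:
        raise ValueError(f"unexpected columns in {table_name}: {header}")
    for count, row in enumerate(reader, start=1):
        if limit is not None and count > limit:
            break  # mirror the `limit` argument used by the parse methods
        yield dict(zip(header, row))


# In-memory stand-in for a downloaded file.
sample = io.StringIO(
    "_strain_key\tstrain\tspecies\n"
    "1\tExample strain\tlaboratory mouse\n"
    "2\tOther strain\tNot Specified\n"
)
rows = list(read_table(sample, "prb_strain_view", limit=1))
```

Checking the header up front makes a schema change in the remote views fail loudly instead of silently shifting columns.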
-
unknown_taxa
= ['Not Applicable', 'Not Specified']¶
-