dipper.sources.MGI module

class dipper.sources.MGI.MGI(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.PostgreSQLSource.PostgreSQLSource

This is the [Mouse Genome Informatics](http://www.informatics.jax.org/) resource, from which we process genotype and phenotype data about laboratory mice. Genotypes leverage the GENO genotype model.

Here, we connect to their public database and download a subset of tables/views to get specifically at the geno-pheno data, then iterate over the tables. We end up effectively performing joins when adding nodes to the graph.

In order to use this parser, you will need to have user/password connection details in your conf.yaml file, like:

    dbauth : {'mgi' : {'user' : '<username>', 'password' : '<password>'}}

You can request access by contacting mgi-help@jax.org.
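As a minimal sketch of how the credentials could be read once conf.yaml has been parsed into a Python mapping: the helper name `get_mgi_credentials` and the `example_conf` dictionary are hypothetical and not part of dipper. Only the `dbauth` / `mgi` / `user` / `password` layout comes from the description above.

```python
def get_mgi_credentials(conf):
    """Return (user, password) from a parsed conf.yaml mapping.

    Hypothetical helper for illustration; dipper's own loading code may differ.
    """
    try:
        auth = conf["dbauth"]["mgi"]
        return auth["user"], auth["password"]
    except KeyError as missing:
        # Fail early with a message that names the missing key
        raise KeyError(f"conf.yaml is missing a dbauth entry: {missing}") from None


# Stand-in for the mapping a YAML loader would produce from conf.yaml
example_conf = {
    "dbauth": {"mgi": {"user": "<username>", "password": "<password>"}}
}

user, password = get_mgi_credentials(example_conf)
```

Failing early on a missing key keeps a misconfigured conf.yaml from surfacing later as an opaque database connection error.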

fetch(is_dl_forced=False)

For the MGI resource, we connect to the remote database and pull the tables into local files. We check the local table versions against the remote version.
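The refresh decision described here could look roughly like the sketch below. This is an assumption about the logic, not dipper's actual implementation: the function `needs_refresh`, its `force_entry` argument (standing in for the `'Force': True` flag seen in the `resources` query map), and the placeholder version strings are all hypothetical.

```python
def needs_refresh(local_version, remote_version,
                  is_dl_forced=False, force_entry=False):
    """Decide whether a table should be pulled again from the remote database.

    Illustrative only: the real comparison in dipper may use other criteria.
    """
    if is_dl_forced or force_entry:
        return True   # caller or query-map entry demands a fresh download
    if local_version is None:
        return True   # nothing has been fetched yet
    return local_version != remote_version


# Placeholder version labels, not real MGI release identifiers
print(needs_refresh("release-A", "release-A"))        # versions match
print(needs_refresh("release-A", "release-B"))        # remote has moved on
print(needs_refresh("release-A", "release-A", True))  # forced download
```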

fetch_transgene_genes_from_db(cxn)

This is a custom query to fetch the non-mouse genes that are part of transgene alleles.

Parameters: cxn
Returns:

parse(limit=None)

We process each of the postgres tables in turn. The order of processing is important here, as we build up a hashmap of internal vs external identifiers (unique keys by type to MGI id). These include allele, marker (gene), publication, strain, genotype, annotation (association), and descriptive notes. The limit parameter caps the number of rows parsed from each table.
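To illustrate why processing order matters, here is a small sketch of a per-type identifier map. The names `idhash`, `register`, and `resolve`, as well as the key and MGI id values, are placeholders chosen for the example and are not taken from the dipper source.

```python
from collections import defaultdict

# One inner dict per identifier type: internal key -> MGI id
idhash = defaultdict(dict)


def register(id_type, internal_key, mgi_id):
    """Record the external MGI id for an internal database key."""
    idhash[id_type][internal_key] = mgi_id


def resolve(id_type, internal_key):
    """Look up an MGI id; returns None if that table was not processed yet."""
    return idhash[id_type].get(internal_key)


# An annotation row referring to a genotype cannot be resolved
# until the genotype table has been processed.
before = resolve("genotype", "1")

register("genotype", "1", "MGI:0000000")  # placeholder id, not a real record
after = resolve("genotype", "1")
```

In this sketch `before` is `None` because the lookup happens ahead of registration, which is the situation the fixed table order avoids.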

process_mgi_note_allele_view(limit=None)

These are the descriptive notes about the alleles. Note that these notes have embedded HTML - should we do anything about that?

Parameters: limit
Returns:

process_mgi_relationship_transgene_genes(limit=None)

Here, we have the relationship between MGI transgene alleles, and the non-mouse gene ids that are part of them. We augment the allele with the transgene parts.

Parameters: limit
Returns:

resources = {'query_map': [{'query': '../../resources/sql/mgi/mgi_dbinfo.sql', 'outfile': 'mgi_dbinfo', 'Force': True}, {'query': '../../resources/sql/mgi/gxd_genotype_view.sql', 'outfile': 'gxd_genotype_view'}, {'query': '../../resources/sql/mgi/gxd_genotype_summary_view.sql', 'outfile': 'gxd_genotype_summary_view'}, {'query': '../../resources/sql/mgi/gxd_allelepair_view.sql', 'outfile': 'gxd_allelepair_view'}, {'query': '../../resources/sql/mgi/all_summary_view.sql', 'outfile': 'all_summary_view'}, {'query': '../../resources/sql/mgi/all_allele_view.sql', 'outfile': 'all_allele_view'}, {'query': '../../resources/sql/mgi/all_allele_mutation_view.sql', 'outfile': 'all_allele_mutation_view'}, {'query': '../../resources/sql/mgi/mrk_marker_view.sql', 'outfile': 'mrk_marker_view'}, {'query': '../../resources/sql/mgi/voc_annot_view.sql', 'outfile': 'voc_annot_view'}, {'query': '../../resources/sql/mgi/evidence.sql', 'outfile': 'evidence_view'}, {'query': '../../resources/sql/mgi/bib_acc_view.sql', 'outfile': 'bib_acc_view'}, {'query': '../../resources/sql/mgi/prb_strain_view.sql', 'outfile': 'prb_strain_view'}, {'query': '../../resources/sql/mgi/mrk_summary_view.sql', 'outfile': 'mrk_summary_view'}, {'query': '../../resources/sql/mgi/mrk_acc_view.sql', 'outfile': 'mrk_acc_view'}, {'query': '../../resources/sql/mgi/prb_strain_acc_view.sql', 'outfile': 'prb_strain_acc_view'}, {'query': '../../resources/sql/mgi/prb_strain_genotype_view.sql', 'outfile': 'prb_strain_genotype_view'}, {'query': '../../resources/sql/mgi/mgi_note_vocevidence_view.sql', 'outfile': 'mgi_note_vocevidence_view'}, {'query': '../../resources/sql/mgi/mgi_note_allele_view.sql', 'outfile': 'mgi_note_allele_view'}, {'query': '../../resources/sql/mgi/mrk_location_cache.sql', 'outfile': 'mrk_location_cache'}], 'test_keys': '../../resources/mgi_test_keys.yaml'}
tables = {'all_allele_mutation_view': {'columns': ['_allele_key', 'mutation']}, 'all_allele_view': {'columns': ['_allele_key', '_marker_key', '_strain_key', 'symbol', 'name', 'iswildtype']}, 'all_summary_view': {'columns': ['_object_key', 'preferred', 'mgiid', 'description', 'short_description']}, 'bib_acc_view': {'columns': ['accid', 'prefixpart', 'numericpart', '_object_key', 'logicaldb', '_logicaldb_key']}, 'evidence_view': {'columns': ['_annotevidence_key', '_annot_key', 'evidencecode', 'jnumid', 'term', 'value', 'annottype']}, 'gxd_allelepair_view': {'columns': ['_allelepair_key', '_genotype_key', '_allele_key_1', '_allele_key_2', 'allele1', 'allele2', 'allelestate']}, 'gxd_genotype_summary_view': {'columns': ['_object_key', 'preferred', 'mgiid', 'subtype', 'short_description']}, 'gxd_genotype_view': {'columns': ['_genotype_key', '_strain_key', 'strain', 'mgiid']}, 'mgi_note_allele_view': {'columns': ['_object_key', 'notetype', 'note', 'sequencenum']}, 'mgi_note_vocevidence_view': {'columns': ['_object_key', 'note']}, 'mgi_relationship_transgene_genes': {'columns': ['rel_key', 'object_1', 'allele_id', 'allele_label', 'category_key', 'category_name', 'property_key', 'property_name', 'property_value']}, 'mrk_acc_view': {'columns': ['accid', 'prefixpart', '_logicaldb_key', '_object_key', 'preferred', '_organism_key']}, 'mrk_location_cache': {'columns': ['_marker_key', '_organism_key', 'chromosome', 'startcoordinate', 'endcoordinate', 'strand', 'version']}, 'mrk_marker_view': {'columns': ['_marker_key', '_organism_key', '_marker_status_key', 'symbol', 'name', 'latinname', 'markertype']}, 'mrk_summary_view': {'columns': ['accid', '_logicaldb_key', '_object_key', 'preferred', 'mgiid', 'subtype', 'short_description']}, 'prb_strain_acc_view': {'columns': ['accid', 'prefixpart', '_logicaldb_key', '_object_key', 'preferred']}, 'prb_strain_genotype_view': {'columns': ['_strain_key', '_genotype_key']}, 'prb_strain_view': {'columns': ['_strain_key', 'strain', 'species']}, 
'voc_annot_view': {'columns': ['_annot_key', 'annottype', '_object_key', '_term_key', '_qualifier_key', 'qualifier', 'term', 'accid']}}
unknown_taxa = ['Not Applicable', 'Not Specified']
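
The column lists in `tables` and the `unknown_taxa` values can be combined when reading a fetched table, as in the sketch below. It assumes the local files are tab-delimited without a header row, which the text above does not confirm. The function `iter_rows` and the sample rows are invented for illustration; only the column names and the two `unknown_taxa` strings come from the attributes listed above.

```python
import csv
import io
from itertools import islice

# Column order copied from tables['prb_strain_view']['columns']
PRB_STRAIN_VIEW_COLUMNS = ['_strain_key', 'strain', 'species']
UNKNOWN_TAXA = ['Not Applicable', 'Not Specified']


def iter_rows(handle, columns, limit=None):
    """Yield each row as a dict keyed by column name, stopping after `limit` rows."""
    reader = csv.reader(handle, delimiter="\t")
    for row in islice(reader, limit):  # limit=None reads every row
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
        yield dict(zip(columns, row))


# Made-up sample data standing in for a fetched prb_strain_view file
sample = "1\tstrain-one\tlaboratory mouse\n2\tstrain-two\tNot Specified\n"

rows = list(iter_rows(io.StringIO(sample), PRB_STRAIN_VIEW_COLUMNS))
known = [r for r in rows if r['species'] not in UNKNOWN_TAXA]
limited = list(iter_rows(io.StringIO(sample), PRB_STRAIN_VIEW_COLUMNS, limit=1))
```

Checking the field count against the declared columns catches a mismatch between the SQL query and the expected layout before any nodes are added to the graph.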