dipper.sources.IMPC module

class dipper.sources.IMPC.IMPC(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

From the [IMPC](https://mousephenotype.org) website: The IMPC is generating a knockout mouse strain for every protein coding gene by using the embryonic stem cell resource generated by the International Knockout Mouse Consortium (IKMC). Systematic broad-based phenotyping is performed by each IMPC center using standardized procedures found within the International Mouse Phenotyping Resource of Standardised Screens (IMPReSS) resource. Gene-to-phenotype associations are made by a versioned statistical analysis with all data freely available by this web portal and by several data download features.

Here, we pull the data and model the genotypes using GENO and the genotype-to-phenotype associations using the OBAN schema.

We use all identifiers given by the IMPC with a few exceptions:

  • For identifiers that IMPC provides but does not resolve, we instantiate them as blank nodes. Examples include identifiers matching the patterns UROALL, EUROCURATE, and NULL-*.

  • We mint three identifiers:
  1. Intrinsic genotypes, not including sex, based on:
      • colony_id (ES cell line + phenotyping center)
      • strain
      • zygosity
  2. Effective genotypes that are attached to the phenotypes, based on:
      • colony_id (ES cell line + phenotyping center)
      • strain
      • zygosity
      • sex
  3. Associations, based on: effective_genotype_id + phenotype_id + phenotyping_center + pipeline_stable_id + procedure_stable_id + parameter_stable_id
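The way the three minted identifiers build on one another can be sketched as follows. This is a minimal illustration, not the module's actual code: the `mint_id` helper, the `EX:` prefix, the SHA-1 hashing scheme, and all field values are hypothetical placeholders.

```python
import hashlib


def mint_id(parts, prefix="EX:"):
    """Hash the joined parts into a short, deterministic identifier.

    Hypothetical scheme: the real module may join and hash differently.
    """
    key = "+".join(str(part) for part in parts)
    return prefix + hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


# Placeholder values, not real IMPC records.
colony_id, strain, zygosity, sex = "COLONY1", "STRAIN1", "homozygote", "female"

# 1. Intrinsic genotype: sex is deliberately left out.
intrinsic_id = mint_id([colony_id, strain, zygosity])

# 2. Effective genotype: the same fields plus sex.
effective_id = mint_id([colony_id, strain, zygosity, sex])

# 3. Association: effective genotype plus phenotype and provenance fields.
association_id = mint_id([
    effective_id, "PHENOTYPE1", "CENTER1",
    "PIPELINE1", "PROCEDURE1", "PARAMETER1",
])
```

Because each identifier is derived only from its input fields, re-running the parse on the same release would yield the same identifiers under this scheme.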

We do not yet add the assays as evidence for the G2P associations here; this is to be added in the future.

compare_checksums()

Test whether the fetched file matches the checksum from EBI.

Returns: True or False
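A checksum comparison of this kind can be sketched with the standard library. The function names `md5_of_file` and `checksums_match` are hypothetical and are not part of the dipper API; the sketch assumes the published checksum is an MD5 hex digest, as the `.md5` file extension suggests.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def checksums_match(path, expected_md5):
    """Compare a local file's digest with a published one (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks matters here because the assertions archive may be too large to load into memory at once.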

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns: None

files = {
    'checksum': {
        'file': 'genotype-phenotype-assertions-ALL.csv.tgz.md5',
        'url': 'ftp://ftp.ebi.ac.uk/pub/databases/impc/all-data-releases/latest/results/genotype-phenotype-assertions-ALL.csv.tgz.md5'},
    'evidence': {
        'columns': ['evidence', 'stable', 'key'],
        'file': 'impc_evidence_stable_key.tsv',
        'url': 'https://archive.monarchinitiative.org/DipperCache/impc/impc_evidence_stable_key.tsv'},
    'g2p_assertions': {
        'columns': [
            'marker_accession_id', 'marker_symbol', 'phenotyping_center',
            'colony_id', 'sex', 'zygosity', 'allele_accession_id',
            'allele_symbol', 'allele_name', 'strain_accession_id',
            'strain_name', 'project_name', 'project_fullname',
            'pipeline_name', 'pipeline_stable_id', 'procedure_stable_id',
            'procedure_name', 'parameter_stable_id', 'parameter_name',
            'top_level_mp_term_id', 'top_level_mp_term_name', 'mp_term_id',
            'mp_term_name', 'p_value', 'percentage_change', 'effect_size',
            'statistical_method', 'resource_name'],
        'file': 'genotype-phenotype-assertions-ALL.csv.gz',
        'url': 'ftp://ftp.ebi.ac.uk/pub/databases/impc/all-data-releases/latest/results/genotype-phenotype-assertions-ALL.csv.tgz'},
}

getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

parse(limit=None)

IMPC data is delivered either as three separate CSV files or as one integrated file; all of them share the same file format.

Parameters: limit
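Reading such a CSV with an optional row cap can be sketched as below. This is an illustration only: `iter_assertions` is a hypothetical helper, and the sketch assumes that the file has a header row and that `limit` caps the number of rows read, neither of which is stated above.

```python
import csv
import itertools


def iter_assertions(handle, limit=None):
    """Yield one dict per CSV row from an open text handle.

    Assumes the first row is a header. With limit=None every row is yielded.
    """
    reader = csv.DictReader(handle)
    yield from itertools.islice(reader, limit)
```

For a gzip-compressed file, the handle could come from `gzip.open(path, "rt", newline="")`; each yielded row can then be accessed by column name, for example `row["mp_term_id"]`.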
parse_checksum_file(file)

Parameters: file
Returns: dict
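Parsing a checksum file into a dict might look like the following sketch. It assumes the common `md5sum` layout of one `<digest>  <file name>` pair per line; the helper name `parse_md5_lines` is hypothetical, and the choice to key the dict by file name is a guess, since the orientation of the returned dict is not documented above.

```python
def parse_md5_lines(lines):
    """Map file name to MD5 digest for lines in md5sum layout.

    The key/value orientation is an assumption made for this sketch.
    """
    checksums = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        digest, _, name = line.partition(" ")
        # md5sum marks binary-mode entries with a leading '*' on the name.
        checksums[name.strip().lstrip("*")] = digest.lower()
    return checksums
```

An open text file can be passed directly, since iterating over a file handle yields its lines.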