dipper.sources.ZFIN module

class dipper.sources.ZFIN.ZFIN(graph_type, are_bnodes_skolemized)

Bases: dipper.sources.Source.Source

This is the parser for the [Zebrafish Model Organism Database (ZFIN)](http://www.zfin.org), from which we process genotype and phenotype data for laboratory zebrafish.

We generate the zfin graph to include the following information:

* genes
* sequence alterations (includes SNPs/del/ins/indel and large chromosomal rearrangements)
* transgenic constructs
* morpholinos, talens, crisprs as expression-affecting reagents
* genotypes, and their components
* fish (as comprised of intrinsic and extrinsic genotypes)
* publications (and their mapping to PMIDs, if available)
* genotype-to-phenotype associations (including environments and stages at which they are assayed)
* environmental components
* orthology to human genes
* genetic positional information for genes and sequence alterations
* fish-to-disease model associations

Genotypes leverage the GENO genotype model and include both intrinsic and extrinsic genotypes. Where necessary, we create anonymous nodes of the genotype partonomy (such as for variant single locus complements, genomic variation complements, variant loci, extrinsic genotypes, and extrinsic genotype parts).

Furthermore, we process the genotype components to build Monarch-style labels. This leads to genotype labels that include:

* all genes targeted by reagents (morphants, crisprs, etc.), in addition to the ones that the reagent was designed against
* all affected genes within deficiencies
* complex hets listed as `gene<mutation1>/gene<mutation2>` rather than `gene<mutation1>/+; gene<mutation2>/+`
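The complex-het labelling rule can be illustrated with a small sketch. This is not dipper's implementation: the function name `build_genotype_label`, its input shape (pairs of gene and mutation symbols), and the simplification that a gene with a single listed mutation is heterozygous against wild type are all assumptions made for illustration.

```python
from collections import defaultdict


def build_genotype_label(variants):
    """Build a label from (gene, mutation) pairs (hypothetical helper).

    Assumption: a gene listed with one mutation is rendered as
    gene<mutation>/+, and a gene listed with several different mutations
    is rendered as gene<m1>/gene<m2> (the "complex het" form).
    """
    mutations_by_gene = defaultdict(list)
    for gene, mutation in variants:
        mutations_by_gene[gene].append(mutation)

    parts = []
    for gene in sorted(mutations_by_gene):
        mutations = sorted(mutations_by_gene[gene])
        if len(mutations) == 1:
            # Single variant allele: pair it with the wild-type allele.
            parts.append(f"{gene}<{mutations[0]}>/+")
        else:
            # Several variant alleles of one gene: list them against each other.
            parts.append("/".join(f"{gene}<{m}>" for m in mutations))
    return "; ".join(parts)


label = build_genotype_label(
    [("geneA", "mutation1"), ("geneA", "mutation2"), ("geneB", "mutation3")]
)
print(label)
```

Grouping by gene before formatting is what keeps the two mutations of `geneA` in one locus-level term instead of two separate heterozygous terms.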

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses. :return: None

files = {'backgrounds': {'url': 'http://zfin.org/downloads/genotype_backgrounds.txt', 'file': 'genotype_backgrounds.txt'}, 'crispr': {'url': 'http://zfin.org/downloads/CRISPR.txt', 'file': 'CRISPR.txt'}, 'enviro': {'url': 'http://zfin.org/downloads/pheno_environment_fish.txt', 'file': 'pheno_environment_fish.txt'}, 'feature_affected_gene': {'url': 'http://zfin.org/downloads/features-affected-genes.txt', 'file': 'features-affected-genes.txt'}, 'features': {'url': 'http://zfin.org/downloads/features.txt', 'file': 'features.txt'}, 'fish_components': {'url': 'http://zfin.org/downloads/fish_components_fish.txt', 'file': 'fish_components_fish.txt'}, 'fish_disease_models': {'url': 'http://zfin.org/downloads/fish_model_disease.txt', 'file': 'fish_model_disease.txt'}, 'genbank': {'url': 'http://zfin.org/downloads/genbank.txt', 'file': 'genbank.txt'}, 'gene': {'url': 'http://zfin.org/downloads/gene.txt', 'file': 'gene.txt'}, 'gene_coordinates': {'url': 'http://zfin.org/downloads/E_zfin_gene_alias.gff3', 'file': 'E_zfin_gene_alias.gff3'}, 'gene_marker_rel': {'url': 'http://zfin.org/downloads/gene_marker_relationship.txt', 'file': 'gene_marker_relationship.txt'}, 'geno': {'url': 'http://zfin.org/downloads/genotype_features.txt', 'file': 'genotype_features.txt'}, 'human_orthos': {'url': 'http://zfin.org/downloads/human_orthos.txt', 'file': 'human_orthos.txt'}, 'mappings': {'url': 'http://zfin.org/downloads/mappings.txt', 'file': 'mappings.txt'}, 'morph': {'url': 'http://zfin.org/downloads/Morpholinos.txt', 'file': 'Morpholinos.txt'}, 'pheno': {'url': 'http://zfin.org/downloads/phenotype_fish.txt', 'file': 'phenotype_fish.txt'}, 'pub2pubmed': {'url': 'http://zfin.org/downloads/pub_to_pubmed_id_translation.txt', 'file': 'pub_to_pubmed_id_translation.txt'}, 'pubs': {'url': 'http://zfin.org/downloads/zfinpubs.txt', 'file': 'zfinpubs.txt'}, 'stage': {'url': 'http://zfin.org/Downloads/stage_ontology.txt', 'file': 'stage_ontology.txt'}, 'talen': {'url': 
'http://zfin.org/downloads/TALEN.txt', 'file': 'TALEN.txt'}, 'uniprot': {'url': 'http://zfin.org/downloads/uniprot.txt', 'file': 'uniprot.txt'}, 'wild': {'url': 'http://zfin.org/downloads/wildtypes_fish.txt', 'file': 'wildtypes.txt'}, 'zpmap': {'url': 'http://compbio.charite.de/hudson/job/zp-owl-new/lastSuccessfulBuild/artifact/zp.annot_sourceinfo', 'file': 'zp-mapping.txt'}}
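
Each entry in `files` pairs a remote `url` with a local `file` name, and the two need not match (the `wild` entry downloads `wildtypes_fish.txt` but stores it as `wildtypes.txt`). A minimal sketch of how such a mapping could be turned into a download plan follows. The helper `plan_downloads` and the `rawdir` argument are hypothetical and are not part of the dipper API; only two of the entries above are copied in.

```python
import os

# Two entries copied from the mapping above; the rest are omitted for brevity.
files = {
    'gene': {'url': 'http://zfin.org/downloads/gene.txt',
             'file': 'gene.txt'},
    'wild': {'url': 'http://zfin.org/downloads/wildtypes_fish.txt',
             'file': 'wildtypes.txt'},
}


def plan_downloads(files, rawdir):
    """Return (key, url, local_path) triples, sorted by key (hypothetical helper)."""
    plan = []
    for key in sorted(files):
        entry = files[key]
        # The local name comes from 'file', not from the last URL segment.
        plan.append((key, entry['url'], os.path.join(rawdir, entry['file'])))
    return plan


for key, url, path in plan_downloads(files, 'raw'):
    print(key, url, '->', path)
```
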
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source. :return:

get_orthology_evidence_code(abbrev)
get_orthology_sources_from_zebrafishmine()

Fetch the ZFIN gene-to-other-species orthology annotations, together with the evidence for each assertion. Write the file locally to be read in a separate function. :return:

static make_targeted_gene_id(geneid, reagentid)
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses. :return: None
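
The fetch-then-parse contract can be sketched with stand-in classes. Both `Source` and `ToySource` below are hypothetical stubs written for this example; they only mirror the documented signatures `fetch(is_dl_forced=False)` and `parse(limit=None)` and do not reproduce the behaviour of `dipper.sources.Source.Source` or of the ZFIN parser.

```python
class Source:
    """Stand-in base class (assumption: not the real dipper Source)."""

    def fetch(self, is_dl_forced=False):
        raise NotImplementedError

    def parse(self, limit=None):
        raise NotImplementedError


class ToySource(Source):
    """Hypothetical subclass overriding both abstract methods."""

    def __init__(self):
        self.raw = None
        self.rows = []

    def fetch(self, is_dl_forced=False):
        # "Download" only when nothing is cached or a refresh is forced.
        if self.raw is None or is_dl_forced:
            self.raw = ["a\t1", "b\t2", "c\t3"]

    def parse(self, limit=None):
        # limit=None parses every line; an integer caps the number of lines.
        self.rows = [line.split("\t") for line in self.raw[:limit]]


source = ToySource()
source.fetch()
source.parse(limit=2)
print(source.rows)
```
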

process_fish(limit=None)

Fish give identifiers to the “effective genotypes” that we create. We can match these by: Fish = (intrinsic) genotype + set of morpholinos

We assume here that the intrinsic genotypes and their parts will be processed separately, prior to calling this function.
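
The matching rule "Fish = (intrinsic) genotype + set of morpholinos" amounts to keying each fish on an order-independent combination of its parts. The sketch below shows one way to express that. The function `effective_genotype_key` and the placeholder identifiers are assumptions made for illustration and are not taken from dipper or from ZFIN data.

```python
def effective_genotype_key(genotype_id, reagent_ids):
    """Build a hashable key for an effective genotype (hypothetical helper).

    The reagents are de-duplicated and sorted so that the same set of
    morpholinos always yields the same key, whatever the input order.
    """
    return (genotype_id, tuple(sorted(set(reagent_ids))))


# Placeholder identifiers, not real ZFIN records.
fish_by_key = {
    effective_genotype_key("GENO-1", ["MO-2", "MO-1"]): "FISH-1",
    effective_genotype_key("GENO-1", []): "FISH-2",
}

# The same parts listed in a different order resolve to the same fish.
match = fish_by_key[effective_genotype_key("GENO-1", ["MO-1", "MO-2"])]
print(match)
```
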

Parameters: limit
Returns:
process_fish_disease_models(limit=None)
process_orthology_evidence(limit)
scrub()

Perform various data-scrubbing on the raw data files prior to parsing. For this resource, this currently includes:

* remove oddities where there are “” instead of empty strings

:return: None

test_ids = {'allele': ['ZDB-ALT-010426-4', 'ZDB-ALT-010427-8', 'ZDB-ALT-011017-8', 'ZDB-ALT-051005-2', 'ZDB-ALT-051227-8', 'ZDB-ALT-060221-2', 'ZDB-ALT-070314-1', 'ZDB-ALT-070409-1', 'ZDB-ALT-070420-6', 'ZDB-ALT-080528-1', 'ZDB-ALT-080528-6', 'ZDB-ALT-080827-15', 'ZDB-ALT-080908-7', 'ZDB-ALT-090316-1', 'ZDB-ALT-100519-1', 'ZDB-ALT-111024-1', 'ZDB-ALT-980203-1374', 'ZDB-ALT-980203-412', 'ZDB-ALT-980203-465', 'ZDB-ALT-980203-470', 'ZDB-ALT-980203-605', 'ZDB-ALT-980413-636', 'ZDB-ALT-021021-2', 'ZDB-ALT-080728-1', 'ZDB-ALT-100729-1', 'ZDB-ALT-980203-1560', 'ZDB-ALT-001127-6', 'ZDB-ALT-001129-2', 'ZDB-ALT-980203-1091', 'ZDB-ALT-070118-2', 'ZDB-ALT-991005-33', 'ZDB-ALT-020918-2', 'ZDB-ALT-040913-6', 'ZDB-ALT-980203-1827', 'ZDB-ALT-090504-6', 'ZDB-ALT-121218-1'], 'environment': ['ZDB-EXP-050202-1', 'ZDB-EXP-071005-3', 'ZDB-EXP-071227-14', 'ZDB-EXP-080428-1', 'ZDB-EXP-080428-2', 'ZDB-EXP-080501-1', 'ZDB-EXP-080805-7', 'ZDB-EXP-080806-5', 'ZDB-EXP-080806-8', 'ZDB-EXP-080806-9', 'ZDB-EXP-081110-3', 'ZDB-EXP-090505-2', 'ZDB-EXP-100330-7', 'ZDB-EXP-100402-1', 'ZDB-EXP-100402-2', 'ZDB-EXP-100422-3', 'ZDB-EXP-100511-5', 'ZDB-EXP-101025-12', 'ZDB-EXP-101025-13', 'ZDB-EXP-110926-4', 'ZDB-EXP-110927-1', 'ZDB-EXP-120809-5', 'ZDB-EXP-120809-7', 'ZDB-EXP-120809-9', 'ZDB-EXP-120913-5', 'ZDB-EXP-130222-13', 'ZDB-EXP-130222-7', 'ZDB-EXP-130904-2', 'ZDB-EXP-041102-1', 'ZDB-EXP-140822-13', 'ZDB-EXP-041102-1', 'ZDB-EXP-070129-3', 'ZDB-EXP-110929-7', 'ZDB-EXP-100520-2', 'ZDB-EXP-100920-3', 'ZDB-EXP-100920-5', 'ZDB-EXP-090601-2', 'ZDB-EXP-151116-3'], 'fish': ['ZDB-FISH-150901-17912', 'ZDB-FISH-150901-18649', 'ZDB-FISH-150901-26314', 'ZDB-FISH-150901-9418', 'ZDB-FISH-150901-14591', 'ZDB-FISH-150901-9997', 'ZDB-FISH-150901-23877', 'ZDB-FISH-150901-22128', 'ZDB-FISH-150901-14869', 'ZDB-FISH-150901-6695', 'ZDB-FISH-150901-24158', 'ZDB-FISH-150901-3631', 'ZDB-FISH-150901-20836', 'ZDB-FISH-150901-1060', 'ZDB-FISH-150901-8451', 'ZDB-FISH-150901-2423', 'ZDB-FISH-150901-20257', 
'ZDB-FISH-150901-10002', 'ZDB-FISH-150901-12520', 'ZDB-FISH-150901-14833', 'ZDB-FISH-150901-2104', 'ZDB-FISH-150901-6607', 'ZDB-FISH-150901-1409'], 'gene': ['ZDB-GENE-000616-6', 'ZDB-GENE-000710-4', 'ZDB-GENE-030131-2773', 'ZDB-GENE-030131-8769', 'ZDB-GENE-030219-146', 'ZDB-GENE-030404-2', 'ZDB-GENE-030826-1', 'ZDB-GENE-030826-2', 'ZDB-GENE-040123-1', 'ZDB-GENE-040426-1309', 'ZDB-GENE-050522-534', 'ZDB-GENE-060503-719', 'ZDB-GENE-080405-1', 'ZDB-GENE-081211-2', 'ZDB-GENE-091118-129', 'ZDB-GENE-980526-135', 'ZDB-GENE-980526-166', 'ZDB-GENE-980526-196', 'ZDB-GENE-980526-265', 'ZDB-GENE-980526-299', 'ZDB-GENE-980526-41', 'ZDB-GENE-980526-437', 'ZDB-GENE-980526-44', 'ZDB-GENE-980526-481', 'ZDB-GENE-980526-561', 'ZDB-GENE-980526-89', 'ZDB-GENE-990415-181', 'ZDB-GENE-990415-72', 'ZDB-GENE-990415-75', 'ZDB-GENE-980526-44', 'ZDB-GENE-030421-3', 'ZDB-GENE-980526-196', 'ZDB-GENE-050320-62', 'ZDB-GENE-061013-403', 'ZDB-GENE-041114-104', 'ZDB-GENE-030131-9700', 'ZDB-GENE-031114-1', 'ZDB-GENE-990415-72', 'ZDB-GENE-030131-2211', 'ZDB-GENE-030131-3063', 'ZDB-GENE-030131-9460', 'ZDB-GENE-980526-26', 'ZDB-GENE-980526-27', 'ZDB-GENE-980526-29', 'ZDB-GENE-071218-6', 'ZDB-GENE-070912-423', 'ZDB-GENE-011207-1', 'ZDB-GENE-980526-284', 'ZDB-GENE-980526-72', 'ZDB-GENE-991129-7', 'ZDB-GENE-000607-83', 'ZDB-GENE-090504-2'], 'genotype': ['ZDB-GENO-010426-2', 'ZDB-GENO-010427-3', 'ZDB-GENO-010427-4', 'ZDB-GENO-050209-30', 'ZDB-GENO-051018-1', 'ZDB-GENO-070209-80', 'ZDB-GENO-070215-11', 'ZDB-GENO-070215-12', 'ZDB-GENO-070228-3', 'ZDB-GENO-070406-1', 'ZDB-GENO-070712-5', 'ZDB-GENO-070917-2', 'ZDB-GENO-080328-1', 'ZDB-GENO-080418-2', 'ZDB-GENO-080516-8', 'ZDB-GENO-080606-609', 'ZDB-GENO-080701-2', 'ZDB-GENO-080713-1', 'ZDB-GENO-080729-2', 'ZDB-GENO-080804-4', 'ZDB-GENO-080825-3', 'ZDB-GENO-091027-1', 'ZDB-GENO-091027-2', 'ZDB-GENO-091109-1', 'ZDB-GENO-100325-3', 'ZDB-GENO-100325-4', 'ZDB-GENO-100325-5', 'ZDB-GENO-100325-6', 'ZDB-GENO-100524-2', 'ZDB-GENO-100601-2', 'ZDB-GENO-100910-1', 
'ZDB-GENO-111025-3', 'ZDB-GENO-120522-18', 'ZDB-GENO-121210-1', 'ZDB-GENO-130402-5', 'ZDB-GENO-980410-268', 'ZDB-GENO-080307-1', 'ZDB-GENO-960809-7', 'ZDB-GENO-990623-3', 'ZDB-GENO-130603-1', 'ZDB-GENO-001127-3', 'ZDB-GENO-001129-1', 'ZDB-GENO-090203-8', 'ZDB-GENO-070209-1', 'ZDB-GENO-070118-1', 'ZDB-GENO-140529-1', 'ZDB-GENO-070820-1', 'ZDB-GENO-071127-3', 'ZDB-GENO-000209-20', 'ZDB-GENO-980202-1565', 'ZDB-GENO-010924-10', 'ZDB-GENO-010531-2', 'ZDB-GENO-090504-5', 'ZDB-GENO-070215-11', 'ZDB-GENO-121221-1'], 'morpholino': ['ZDB-MRPHLNO-041129-1', 'ZDB-MRPHLNO-041129-2', 'ZDB-MRPHLNO-041129-3', 'ZDB-MRPHLNO-050308-1', 'ZDB-MRPHLNO-050308-3', 'ZDB-MRPHLNO-060508-2', 'ZDB-MRPHLNO-070118-1', 'ZDB-MRPHLNO-070522-3', 'ZDB-MRPHLNO-070706-1', 'ZDB-MRPHLNO-070725-1', 'ZDB-MRPHLNO-070725-2', 'ZDB-MRPHLNO-071005-1', 'ZDB-MRPHLNO-071227-1', 'ZDB-MRPHLNO-080307-1', 'ZDB-MRPHLNO-080428-1', 'ZDB-MRPHLNO-080430-1', 'ZDB-MRPHLNO-080919-4', 'ZDB-MRPHLNO-081110-3', 'ZDB-MRPHLNO-090106-5', 'ZDB-MRPHLNO-090114-1', 'ZDB-MRPHLNO-090505-1', 'ZDB-MRPHLNO-090630-11', 'ZDB-MRPHLNO-090804-1', 'ZDB-MRPHLNO-100728-1', 'ZDB-MRPHLNO-100823-6', 'ZDB-MRPHLNO-101105-3', 'ZDB-MRPHLNO-110323-3', 'ZDB-MRPHLNO-111104-5', 'ZDB-MRPHLNO-130222-4', 'ZDB-MRPHLNO-080430', 'ZDB-MRPHLNO-100823-6', 'ZDB-MRPHLNO-140822-1', 'ZDB-MRPHLNO-100520-4', 'ZDB-MRPHLNO-100520-5', 'ZDB-MRPHLNO-100920-3', 'ZDB-MRPHLNO-050604-1', 'ZDB-CRISPR-131113-1', 'ZDB-MRPHLNO-140430-12', 'ZDB-MRPHLNO-140430-13'], 'pub': ['PMID:11566854', 'PMID:12588855', 'PMID:12867027', 'PMID:14667409', 'PMID:15456722', 'PMID:16914492', 'PMID:17374715', 'PMID:17545503', 'PMID:17618647', 'PMID:17785424', 'PMID:18201692', 'PMID:18358464', 'PMID:18388326', 'PMID:18638469', 'PMID:18846223', 'PMID:19151781', 'PMID:19759004', 'PMID:19855021', 'PMID:20040115', 'PMID:20138861', 'PMID:20306498', 'PMID:20442775', 'PMID:20603019', 'PMID:21147088', 'PMID:21893049', 'PMID:21925157', 'PMID:22718903', 'PMID:22814753', 'PMID:22960038', 'PMID:22996643', 
'PMID:23086717', 'PMID:23203810', 'PMID:23760954', 'ZFIN:ZDB-PUB-140303-33', 'ZFIN:ZDB-PUB-140404-9', 'ZFIN:ZDB-PUB-080902-16', 'ZFIN:ZDB-PUB-101222-7', 'ZFIN:ZDB-PUB-140614-2', 'ZFIN:ZDB-PUB-120927-26', 'ZFIN:ZDB-PUB-100504-5', 'ZFIN:ZDB-PUB-140513-341']}