dipper.sources.ZFIN module

class dipper.sources.ZFIN.ZFIN(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

This is the parser for the [Zebrafish Model Organism Database (ZFIN)](http://www.zfin.org), from which we process genotype and phenotype data for laboratory zebrafish.

We generate the zfin graph to include the following information:

* genes
* sequence alterations (includes SNPs/del/ins/indel and large chromosomal rearrangements)
* transgenic constructs
* morpholinos, talens, crisprs as expression-affecting reagents
* genotypes, and their components
* fish (as comprised of intrinsic and extrinsic genotypes)
* publications (and their mapping to PMIDs, if available)
* genotype-to-phenotype associations (including environments and stages at which they are assayed)
* environmental components
* orthology to human genes
* genetic positional information for genes and sequence alterations
* fish-to-disease model associations

Genotypes leverage the GENO genotype model and include both intrinsic and extrinsic genotypes. Where necessary, we create anonymous nodes of the genotype partonomy (such as for variant single locus complements, genomic variation complements, variant loci, extrinsic genotypes, and extrinsic genotype parts).

Genotype labels are output as ZFIN genotype name + “[background]”. We also process the genotype components to build monarch-style labels, which are added as synonyms. The monarch-style genotype label includes:

* all genes targeted by reagents (morphants, crisprs, etc.), in addition to the ones that the reagent was designed against
* all affected genes within deficiencies
* complex hets listed as gene<mutation1>/gene<mutation2> rather than gene<mutation1>/+; gene<mutation2>/+
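
The complex-het rule can be illustrated with a minimal sketch. It is not the parser's own code: `monarch_style_label` is a hypothetical helper, the gene and mutation names are placeholders, and zygosity is ignored, so a gene with a single allele is assumed to be heterozygous and is paired with `+`.

```python
from collections import defaultdict


def monarch_style_label(alleles):
    """Build a label from (gene_symbol, mutation_name) pairs.

    Hypothetical helper: alleles of the same gene are joined with '/',
    a gene with one allele is paired with '+', and genes are joined with '; '.
    """
    by_gene = defaultdict(list)
    for gene, mutation in alleles:
        by_gene[gene].append(f"{gene}<{mutation}>")

    parts = []
    for gene in sorted(by_gene):
        variants = sorted(by_gene[gene])
        if len(variants) == 1:
            # Assumption: a lone allele is heterozygous with wild type.
            variants.append("+")
        parts.append("/".join(variants))
    return "; ".join(parts)


label = monarch_style_label(
    [("geneA", "mut1"), ("geneA", "mut2"), ("geneB", "mut3")]
)
print(label)  # geneA<mut1>/geneA<mut2>; geneB<mut3>/+
```

Grouping by gene first is what makes the two alleles of `geneA` collapse into one `gene<mutation1>/gene<mutation2>` term instead of two separate `/+` terms.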

see: resources/zfin/README for column extraction from downloads page

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns: None

fhandle = <_io.TextIOWrapper name='/home/docs/checkouts/readthedocs.org/user_builds/dipper/checkouts/latest/dipper/sources/../../tests/resources/zfin/zfin_test_ids.yaml' mode='r' encoding='UTF-8'>
files = {'backgrounds': {'columns': ['Genotype ID', 'Genotype Name', 'Background', 'Background Name'], 'file': 'genotype_backgrounds.txt', 'url': 'http://zfin.org/downloads/genotype_backgrounds.txt'}, 'crispr': {'columns': ['Gene ID', 'Gene SO ID', 'Gene Symbol', 'CRISPR ID', 'CRISPR SO ID', 'CRISPR Symbol', 'CRISPR Target Sequence', 'Publication(s)', 'Note'], 'file': 'CRISPR.txt', 'url': 'http://zfin.org/downloads/CRISPR.txt'}, 'enviro': {'columns': ['Environment ID', 'ZECO Term Name', 'ZECO Term ID (ZECO:ID)', 'Chebi Term Name', 'Chebi Term ID (Chebi:ID)', 'ZFA Term Name', 'ZFA Term ID (ZFA:ID)', 'Affected Structure Subterm Name', 'Affected Structure Subterm ID (GO-CC:ID)', 'NCBI Taxon Name', 'NCBI Taxon ID (NCBI Taxon:ID)'], 'file': 'pheno_environment_fish.txt', 'url': 'http://zfin.org/downloads/pheno_environment_fish.txt'}, 'feature_affected_gene': {'columns': ['Genomic Feature ID', 'Feature SO ID', 'Genomic Feature Abbreviation', 'Gene Symbol', 'Gene ID', 'Gene SO ID', 'Genomic Feature - Marker Relationship', 'Feature Type', 'DNA/cDNA Change SO ID', 'Reference Nucleotide', 'Mutant Nucleotide', 'Base Pairs Added', 'Base Pairs Removed', 'DNA/cDNA Change Position Start', 'DNA/cDNA Change Position End', 'DNA/cDNA Reference Sequence', 'DNA/cDNA Change Localization', 'DNA/cDNA Change Localization SO ID', 'DNA/cDNA Change Localization Exon', 'DNA/cDNA Change Localization Intron', 'Transcript Consequence', 'Transcript Consequence SO ID', 'Transcript Consequence Exon', 'Transcript Consequence Intron', 'Protein Consequence', 'Protein Consequence SO ID', 'Reference Amino Acid', 'Mutant Amino Acid', 'Amino Acids Added', 'Amino Acids Removed', 'Protein Consequence Position Start', 'Protein Consequence Position End', 'Protein Reference Sequence'], 'file': 'features-affected-genes.txt', 'url': 'http://zfin.org/downloads/features-affected-genes.txt'}, 'features': {'columns': ['Genomic Feature ID', 'Feature SO ID', 'Genomic Feature Abbreviation', 'Genomic Feature Name', 
'Genomic Feature Type', 'Mutagen', 'Mutagee', 'Construct ID', 'Construct name', 'Construct SO ID', 'TALEN/CRISPR ID', 'TALEN/CRISPR Name'], 'file': 'features.txt', 'url': 'http://zfin.org/downloads/features.txt'}, 'fish_components': {'columns': ['Fish ID', 'Fish Name', 'Gene ID', 'Gene Symbol', 'Affector ID', 'Affector Symbol', 'Construct ID', 'Construct Symbol', 'Background ID', 'Background Name', 'Genotype ID', 'Genotype Name'], 'file': 'fish_components_fish.txt', 'url': 'http://zfin.org/downloads/fish_components_fish.txt'}, 'fish_disease_models': {'columns': ['Fish ZDB ID', 'Environment ZDB ID', 'is_a_model', 'DO Term ID', 'DO Term Name', 'Publication ZDB ID', 'PubMed ID', 'Evidence Code'], 'file': 'fish_model_disease.txt', 'url': 'http://zfin.org/downloads/fish_model_disease.txt'}, 'genbank': {'columns': ['ZFIN ID', 'SO ID', 'Name', 'GenBank ID'], 'file': 'genbank.txt', 'url': 'http://zfin.org/downloads/genbank.txt'}, 'gene': {'columns': ['ZFIN ID', 'SO ID', 'Symbol', 'NCBI Gene ID'], 'file': 'gene.txt', 'url': 'http://zfin.org/downloads/gene.txt'}, 'gene_coordinates': {'columns': ['Chromosome', 'Source', 'Type', 'Start', 'End', 'Score', 'Strand', 'Phase', 'Attributes'], 'file': 'E_zfin_gene_alias.gff3', 'url': 'http://zfin.org/downloads/E_zfin_gene_alias.gff3'}, 'gene_marker_rel': {'columns': ['Gene ID', 'Gene SO ID', 'Gene Symbol', 'Marker ID', 'Marker SO ID', 'Marker Symbol', 'Relationship'], 'file': 'gene_marker_relationship.txt', 'url': 'http://zfin.org/downloads/gene_marker_relationship.txt'}, 'geno': {'columns': ['Genotype ID', 'Genotype Name', 'Genotye Unique Name', 'Allele ID', 'Allele Name', 'Allele Abbreviation', 'Allele Type', 'Allele Display Type', 'Gene or Construct Symbol', 'Corresponding ZFIN Gene ID/Construct ID', 'Allele Zygosity', 'Construct Name', 'Construct ZdbId'], 'file': 'genotype_features.txt', 'url': 'http://zfin.org/downloads/genotype_features.txt'}, 'human_orthos': {'columns': ['ZFIN ID', 'ZFIN Symbol', 'ZFIN Name', 'Human Symbol', 
'Human Name', 'OMIM ID', 'Gene ID', 'HGNC ID', 'Evidence', 'Pub ID'], 'file': 'human_orthos.txt', 'url': 'http://zfin.org/downloads/human_orthos.txt'}, 'mappings': {'columns': ['ZFIN ID', 'Symbol', 'SO_id', 'Panel Symbol', 'Chromosome', 'Location', 'Metric'], 'file': 'mappings.txt', 'url': 'http://zfin.org/downloads/mappings.txt'}, 'morph': {'columns': ['Gene ID', 'Gene SO ID', 'Gene Symbol', 'Morpholino ID', 'Morpholino SO ID', 'Morpholino Symbol', 'Morpholino Sequence', 'Publication(s)', 'Note'], 'file': 'Morpholinos.txt', 'url': 'http://zfin.org/downloads/Morpholinos.txt'}, 'pheno': {'columns': ['Fish ID', 'Fish Name', 'Start Stage ID', 'Start Stage Name', 'End Stage ID', 'End Stage Name', 'Affected Structure or Process 1 subterm ID', 'Affected Structure or Process 1 subterm Name', 'Post-composed Relationship ID', 'Post-composed Relationship Name', 'Affected Structure or Process 1 superterm ID', 'Affected Structure or Process 1 superterm Name', 'Phenotype Keyword ID', 'Phenotype Keyword Name', 'Phenotype Tag', 'Affected Structure or Process 2 subterm ID', 'Affected Structure or Process 2 subterm name', 'Post-composed Relationship (rel) ID', 'Post-composed Relationship (rel) Name', 'Affected Structure or Process 2 superterm ID', 'Affected Structure or Process 2 superterm name', 'Publication ID', 'Environment ID'], 'file': 'phenotype_fish.txt', 'url': 'http://zfin.org/downloads/phenotype_fish.txt'}, 'pub2pubmed': {'columns': ['Publication ZFIN ID', 'PubMed ID (none or blank when not available)'], 'file': 'pub_to_pubmed_id_translation.txt', 'url': 'http://zfin.org/downloads/pub_to_pubmed_id_translation.txt'}, 'pubs': {'columns': ['Publication ID', 'pubMed ID (none or blank when not available)', 'Authors', 'Title', 'Journal', 'Year', 'Volume', 'Pages'], 'file': 'zfinpubs.txt', 'url': 'http://zfin.org/downloads/zfinpubs.txt'}, 'stage': {'columns': ['Stage ID', 'Stage OBO ID', 'Stage Name', 'Begin Hours', 'End Hours'], 'file': 'stage_ontology.txt', 'url': 
'http://zfin.org/Downloads/stage_ontology.txt'}, 'talen': {'columns': ['Gene ID', 'Gene SO ID', 'Gene Symbol', 'TALEN ID', 'TALEN SO ID', 'TALEN Symbol', 'TALEN Target Sequence 1', 'TALEN Target Sequence 2', 'Publication(s)', 'Note'], 'file': 'TALEN.txt', 'url': 'http://zfin.org/downloads/TALEN.txt'}, 'uniprot': {'columns': ['ZFIN ID', 'SO ID', 'Symbol', 'UniProt ID'], 'file': 'uniprot.txt', 'url': 'http://zfin.org/downloads/uniprot.txt'}, 'wild': {'columns': ['Fish ID', 'Fish Name', 'Fish Abbreviation', 'Genotype ID'], 'file': 'wildtypes.txt', 'url': 'http://zfin.org/downloads/wildtypes_fish.txt'}, 'zmine_ortho_evidence': {'columns': ['zfin_gene_num', 'zfin_gene_symbol', 'ortholog_gene_symbol', 'ortholog_ncbigene_num', 'evidence_code', 'zfin_pub_num', 'pubmed_num'], 'file': 'zmine_ortho_evidence.txt', 'url': 'http://0.0.0.0'}, 'zpmap': {'columns': ['iri', 'id'], 'file': 'id_map_zfin.tsv', 'url': 'http://purl.obolibrary.org/obo/zp//id_map_zfin.tsv'}}
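
Each entry in `files` pairs a download with its expected column names. As a rough sketch of how such a column list can be used, assuming the downloads are tab-delimited with no header row, the snippet below turns each line into a dict keyed by column name. `read_rows` is a hypothetical helper, not part of dipper, and every value in the sample line except the ZFIN ID is a placeholder.

```python
import csv
import io

# Column list copied from the 'gene' entry of ZFIN.files.
GENE_COLUMNS = ['ZFIN ID', 'SO ID', 'Symbol', 'NCBI Gene ID']


def read_rows(handle, columns):
    """Yield one dict per tab-delimited line, keyed by column name.

    Hypothetical helper; zip() silently ignores any extra trailing fields.
    """
    reader = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield dict(zip(columns, row))


# Placeholder line: only the ZFIN ID is taken from the test_ids listing.
sample = io.StringIO("ZDB-GENE-000616-6\tSO:0000704\tplaceholder\t0\n")
rows = list(read_rows(sample, GENE_COLUMNS))
print(rows[0]['ZFIN ID'])  # ZDB-GENE-000616-6
```

Keying rows by name rather than by position keeps the processing code readable when a file has dozens of columns, as `features-affected-genes.txt` does.
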
static get_orthology_evidence_code(abbrev)

TODO: move to localtt & globaltt.

get_orthology_sources_from_zebrafishmine()

Fetch the ZFIN gene to other species orthology annotations, together with the evidence for each assertion. The file is written locally so that a separate function can read it.

static make_targeted_gene_id(geneid, reagentid)

parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses.

Returns: None

process_fish(limit=None)

Fish give identifiers to the “effective genotypes” that we create. We can match these by: Fish = (intrinsic) genotype + set of morpholinos

We assume here that the intrinsic genotypes and their parts will be processed separately, prior to calling this function.
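
The matching rule "Fish = (intrinsic) genotype + set of morpholinos" can be sketched as an order-independent lookup key. This is only an illustration under stated assumptions: `effective_genotype_key` and the `fish_by_key` index are hypothetical, and the pairing of IDs below is arbitrary (the IDs come from `test_ids`, but the document does not say they belong together).

```python
def effective_genotype_key(genotype_id, reagent_ids):
    """Combine an intrinsic genotype ID with a set of reagent IDs.

    Hypothetical helper: a frozenset makes the key independent of the
    order in which the reagents are listed.
    """
    return (genotype_id, frozenset(reagent_ids))


# Hypothetical index from effective genotype to fish ID.
fish_by_key = {}

key = effective_genotype_key(
    'ZDB-GENO-010426-2',
    ['ZDB-MRPHLNO-041129-1', 'ZDB-MRPHLNO-041129-2'],
)
fish_by_key[key] = 'ZDB-FISH-150901-17912'  # arbitrary pairing for illustration

# The same genotype with the reagents in a different order finds the same fish.
same_key = effective_genotype_key(
    'ZDB-GENO-010426-2',
    ['ZDB-MRPHLNO-041129-2', 'ZDB-MRPHLNO-041129-1'],
)
print(fish_by_key[same_key])  # ZDB-FISH-150901-17912
```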

Parameters: limit
Returns:

process_fish_disease_models(limit=None)

process_orthology_evidence(limit)

test_ids = {'allele': ['ZDB-ALT-010426-4', 'ZDB-ALT-010427-8', 'ZDB-ALT-011017-8', 'ZDB-ALT-051005-2', 'ZDB-ALT-051227-8', 'ZDB-ALT-060221-2', 'ZDB-ALT-070314-1', 'ZDB-ALT-070409-1', 'ZDB-ALT-070420-6', 'ZDB-ALT-080528-1', 'ZDB-ALT-080528-6', 'ZDB-ALT-080827-15', 'ZDB-ALT-080908-7', 'ZDB-ALT-090316-1', 'ZDB-ALT-100519-1', 'ZDB-ALT-111024-1', 'ZDB-ALT-980203-1374', 'ZDB-ALT-980203-412', 'ZDB-ALT-980203-465', 'ZDB-ALT-980203-470', 'ZDB-ALT-980203-605', 'ZDB-ALT-980413-636', 'ZDB-ALT-021021-2', 'ZDB-ALT-080728-1', 'ZDB-ALT-100729-1', 'ZDB-ALT-980203-1560', 'ZDB-ALT-001127-6', 'ZDB-ALT-001129-2', 'ZDB-ALT-980203-1091', 'ZDB-ALT-070118-2', 'ZDB-ALT-991005-33', 'ZDB-ALT-020918-2', 'ZDB-ALT-040913-6', 'ZDB-ALT-980203-1827', 'ZDB-ALT-090504-6', 'ZDB-ALT-121218-1'], 'environment': ['ZDB-EXP-050202-1', 'ZDB-EXP-071005-3', 'ZDB-EXP-071227-14', 'ZDB-EXP-080428-1', 'ZDB-EXP-080428-2', 'ZDB-EXP-080501-1', 'ZDB-EXP-080805-7', 'ZDB-EXP-080806-5', 'ZDB-EXP-080806-8', 'ZDB-EXP-080806-9', 'ZDB-EXP-081110-3', 'ZDB-EXP-090505-2', 'ZDB-EXP-100330-7', 'ZDB-EXP-100402-1', 'ZDB-EXP-100402-2', 'ZDB-EXP-100422-3', 'ZDB-EXP-100511-5', 'ZDB-EXP-101025-12', 'ZDB-EXP-101025-13', 'ZDB-EXP-110926-4', 'ZDB-EXP-110927-1', 'ZDB-EXP-120809-5', 'ZDB-EXP-120809-7', 'ZDB-EXP-120809-9', 'ZDB-EXP-120913-5', 'ZDB-EXP-130222-13', 'ZDB-EXP-130222-7', 'ZDB-EXP-130904-2', 'ZDB-EXP-041102-1', 'ZDB-EXP-140822-13', 'ZDB-EXP-041102-1', 'ZDB-EXP-070129-3', 'ZDB-EXP-110929-7', 'ZDB-EXP-100520-2', 'ZDB-EXP-100920-3', 'ZDB-EXP-100920-5', 'ZDB-EXP-090601-2', 'ZDB-EXP-151116-3'], 'fish': ['ZDB-FISH-150901-17912', 'ZDB-FISH-150901-18649', 'ZDB-FISH-150901-26314', 'ZDB-FISH-150901-9418', 'ZDB-FISH-150901-14591', 'ZDB-FISH-150901-9997', 'ZDB-FISH-150901-23877', 'ZDB-FISH-150901-22128', 'ZDB-FISH-150901-14869', 'ZDB-FISH-150901-6695', 'ZDB-FISH-150901-24158', 'ZDB-FISH-150901-3631', 'ZDB-FISH-150901-20836', 'ZDB-FISH-150901-1060', 'ZDB-FISH-150901-8451', 'ZDB-FISH-150901-2423', 'ZDB-FISH-150901-20257', 
'ZDB-FISH-150901-10002', 'ZDB-FISH-150901-12520', 'ZDB-FISH-150901-14833', 'ZDB-FISH-150901-2104', 'ZDB-FISH-150901-6607', 'ZDB-FISH-150901-1409'], 'gene': ['ZDB-GENE-000616-6', 'ZDB-GENE-000710-4', 'ZDB-GENE-030131-2773', 'ZDB-GENE-030131-8769', 'ZDB-GENE-030219-146', 'ZDB-GENE-030404-2', 'ZDB-GENE-030826-1', 'ZDB-GENE-030826-2', 'ZDB-GENE-040123-1', 'ZDB-GENE-040426-1309', 'ZDB-GENE-050522-534', 'ZDB-GENE-060503-719', 'ZDB-GENE-080405-1', 'ZDB-GENE-081211-2', 'ZDB-GENE-091118-129', 'ZDB-GENE-980526-135', 'ZDB-GENE-980526-166', 'ZDB-GENE-980526-196', 'ZDB-GENE-980526-265', 'ZDB-GENE-980526-299', 'ZDB-GENE-980526-41', 'ZDB-GENE-980526-437', 'ZDB-GENE-980526-44', 'ZDB-GENE-980526-481', 'ZDB-GENE-980526-561', 'ZDB-GENE-980526-89', 'ZDB-GENE-990415-181', 'ZDB-GENE-990415-72', 'ZDB-GENE-990415-75', 'ZDB-GENE-980526-44', 'ZDB-GENE-030421-3', 'ZDB-GENE-980526-196', 'ZDB-GENE-050320-62', 'ZDB-GENE-061013-403', 'ZDB-GENE-041114-104', 'ZDB-GENE-030131-9700', 'ZDB-GENE-031114-1', 'ZDB-GENE-990415-72', 'ZDB-GENE-030131-2211', 'ZDB-GENE-030131-3063', 'ZDB-GENE-030131-9460', 'ZDB-GENE-980526-26', 'ZDB-GENE-980526-27', 'ZDB-GENE-980526-29', 'ZDB-GENE-071218-6', 'ZDB-GENE-070912-423', 'ZDB-GENE-011207-1', 'ZDB-GENE-980526-284', 'ZDB-GENE-980526-72', 'ZDB-GENE-991129-7', 'ZDB-GENE-000607-83', 'ZDB-GENE-090504-2'], 'genotype': ['ZDB-GENO-010426-2', 'ZDB-GENO-010427-3', 'ZDB-GENO-010427-4', 'ZDB-GENO-050209-30', 'ZDB-GENO-051018-1', 'ZDB-GENO-070209-80', 'ZDB-GENO-070215-11', 'ZDB-GENO-070215-12', 'ZDB-GENO-070228-3', 'ZDB-GENO-070406-1', 'ZDB-GENO-070712-5', 'ZDB-GENO-070917-2', 'ZDB-GENO-080328-1', 'ZDB-GENO-080418-2', 'ZDB-GENO-080516-8', 'ZDB-GENO-080606-609', 'ZDB-GENO-080701-2', 'ZDB-GENO-080713-1', 'ZDB-GENO-080729-2', 'ZDB-GENO-080804-4', 'ZDB-GENO-080825-3', 'ZDB-GENO-091027-1', 'ZDB-GENO-091027-2', 'ZDB-GENO-091109-1', 'ZDB-GENO-100325-3', 'ZDB-GENO-100325-4', 'ZDB-GENO-100325-5', 'ZDB-GENO-100325-6', 'ZDB-GENO-100524-2', 'ZDB-GENO-100601-2', 'ZDB-GENO-100910-1', 
'ZDB-GENO-111025-3', 'ZDB-GENO-120522-18', 'ZDB-GENO-121210-1', 'ZDB-GENO-130402-5', 'ZDB-GENO-980410-268', 'ZDB-GENO-080307-1', 'ZDB-GENO-960809-7', 'ZDB-GENO-990623-3', 'ZDB-GENO-130603-1', 'ZDB-GENO-001127-3', 'ZDB-GENO-001129-1', 'ZDB-GENO-090203-8', 'ZDB-GENO-070209-1', 'ZDB-GENO-070118-1', 'ZDB-GENO-140529-1', 'ZDB-GENO-070820-1', 'ZDB-GENO-071127-3', 'ZDB-GENO-000209-20', 'ZDB-GENO-980202-1565', 'ZDB-GENO-010924-10', 'ZDB-GENO-010531-2', 'ZDB-GENO-090504-5', 'ZDB-GENO-070215-11', 'ZDB-GENO-121221-1'], 'morpholino': ['ZDB-MRPHLNO-041129-1', 'ZDB-MRPHLNO-041129-2', 'ZDB-MRPHLNO-041129-3', 'ZDB-MRPHLNO-050308-1', 'ZDB-MRPHLNO-050308-3', 'ZDB-MRPHLNO-060508-2', 'ZDB-MRPHLNO-070118-1', 'ZDB-MRPHLNO-070522-3', 'ZDB-MRPHLNO-070706-1', 'ZDB-MRPHLNO-070725-1', 'ZDB-MRPHLNO-070725-2', 'ZDB-MRPHLNO-071005-1', 'ZDB-MRPHLNO-071227-1', 'ZDB-MRPHLNO-080307-1', 'ZDB-MRPHLNO-080428-1', 'ZDB-MRPHLNO-080430-1', 'ZDB-MRPHLNO-080919-4', 'ZDB-MRPHLNO-081110-3', 'ZDB-MRPHLNO-090106-5', 'ZDB-MRPHLNO-090114-1', 'ZDB-MRPHLNO-090505-1', 'ZDB-MRPHLNO-090630-11', 'ZDB-MRPHLNO-090804-1', 'ZDB-MRPHLNO-100728-1', 'ZDB-MRPHLNO-100823-6', 'ZDB-MRPHLNO-101105-3', 'ZDB-MRPHLNO-110323-3', 'ZDB-MRPHLNO-111104-5', 'ZDB-MRPHLNO-130222-4', 'ZDB-MRPHLNO-080430', 'ZDB-MRPHLNO-100823-6', 'ZDB-MRPHLNO-140822-1', 'ZDB-MRPHLNO-100520-4', 'ZDB-MRPHLNO-100520-5', 'ZDB-MRPHLNO-100920-3', 'ZDB-MRPHLNO-050604-1', 'ZDB-CRISPR-131113-1', 'ZDB-MRPHLNO-140430-12', 'ZDB-MRPHLNO-140430-13'], 'pub': ['PMID:11566854', 'PMID:12588855', 'PMID:12867027', 'PMID:14667409', 'PMID:15456722', 'PMID:16914492', 'PMID:17374715', 'PMID:17545503', 'PMID:17618647', 'PMID:17785424', 'PMID:18201692', 'PMID:18358464', 'PMID:18388326', 'PMID:18638469', 'PMID:18846223', 'PMID:19151781', 'PMID:19759004', 'PMID:19855021', 'PMID:20040115', 'PMID:20138861', 'PMID:20306498', 'PMID:20442775', 'PMID:20603019', 'PMID:21147088', 'PMID:21893049', 'PMID:21925157', 'PMID:22718903', 'PMID:22814753', 'PMID:22960038', 'PMID:22996643', 
'PMID:23086717', 'PMID:23203810', 'PMID:23760954', 'ZFIN:ZDB-PUB-140303-33', 'ZFIN:ZDB-PUB-140404-9', 'ZFIN:ZDB-PUB-080902-16', 'ZFIN:ZDB-PUB-101222-7', 'ZFIN:ZDB-PUB-140614-2', 'ZFIN:ZDB-PUB-120927-26', 'ZFIN:ZDB-PUB-100504-5', 'ZFIN:ZDB-PUB-140513-341']}
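
Several of the `test_ids` lists contain repeated entries (for example, `ZDB-EXP-041102-1` appears twice under `environment`). The listing does not say how these lists are consumed, so the following is only a sketch of one way to prepare them for membership checks; `as_lookup` is a hypothetical helper.

```python
# Excerpt of test_ids['environment'] copied from the listing above;
# 'ZDB-EXP-041102-1' appears twice in the original list.
environment_excerpt = [
    'ZDB-EXP-041102-1',
    'ZDB-EXP-140822-13',
    'ZDB-EXP-041102-1',
]


def as_lookup(test_ids):
    """Return a copy of a test_ids-style dict with each list as a frozenset."""
    return {category: frozenset(ids) for category, ids in test_ids.items()}


lookup = as_lookup({'environment': environment_excerpt})
print(len(lookup['environment']))                   # 2
print('ZDB-EXP-041102-1' in lookup['environment'])  # True
```

Converting to sets removes the duplicates and makes each membership check a constant-time operation instead of a list scan.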