dipper.sources.WormBase module

class dipper.sources.WormBase.WormBase(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

This is the parser for the [C. elegans Model Organism Database (WormBase)](http://www.wormbase.org), from which we process genotype and phenotype data for laboratory worms (C. elegans and other nematodes).

We generate the wormbase graph to include the following information:

  • genes
  • sequence alterations (includes SNPs/del/ins/indel and large chromosomal rearrangements)
  • RNAi as expression-affecting reagents
  • genotypes, and their components
  • strains
  • publications (and their mapping to PMIDs, if available)
  • allele-to-phenotype associations (including variants by RNAi)
  • genetic positional information for genes and sequence alterations

Genotypes leverage the GENO genotype model and include both intrinsic and extrinsic genotypes. Where necessary, we create anonymous nodes of the genotype partonomy (i.e. for variant single locus complements, genomic variation complements, variant loci, extrinsic genotypes, and extrinsic genotype parts).

TODO: get people and gene expression

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses. :return: None

files = {'allele_pheno': {'columns': ['DB', 'DB Object ID', 'DB Object Symbol', 'Qualifier', 'GO ID', 'DB:Reference (|DB:Reference)', 'Evidence Code', 'With (or) From', 'Aspect', 'DB Object NameDB Object Synonym (|Synonym)', 'DB Object Type', 'Taxon(|taxon)', 'Date', 'Assigned By', 'Annotation Extension', 'Gene Product Form ID'], 'file': 'phenotype_association.wb', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/phenotype_association.WSNUMBER.wb'}, 'checksums': {'file': 'CHECKSUMS', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/CHECKSUMS'}, 'disease_assoc': {'columns': ['DB', 'DB Object ID', 'DB Object Symbol', 'Qualifier', 'GO ID', 'DB:Reference (|DB:Reference)', 'Evidence Code', 'With (or) From', 'Aspect', 'DB Object NameDB Object Synonym (|Synonym)', 'DB Object Type', 'Taxon(|taxon)', 'Date', 'Assigned By', 'Annotation Extension', 'Gene Product Form ID'], 'file': 'disease_association.wb', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/disease_association.WSNUMBER.wb'}, 'feature_loc': {'columns': ['seqid', 'source', 'type', 'start', 'end', 'score', 'strand', 'phase', 'attributes'], 'file': 'c_elegans.PRJNA13758.annotations.gff3.gz', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WSNUMBER.annotations.gff3.gz'}, 'gaf-eco-mapping': {'file': 'gaf-eco-mapping.yaml', 'url': 'https://archive.monarchinitiative.org/DipperCache/wormbase/gaf-eco-mapping.yaml'}, 'gene_ids': {'columns': ['taxon_num', 'gene_num', 'gene_symbol', 'gene_synonym', 'live', 'gene_type'], 'file': 'c_elegans.PRJNA13758.geneIDs.txt.gz', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJNA13758/annotation/c_elegans.PRJNA13758.WSNUMBER.geneIDs.txt.gz'}, 'letter': {'file': 'letter', 'url': 
'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/letter.WSNUMBER'}, 'pub_xrefs': {'columns': ['wb_ref', 'xref'], 'file': 'pub_xrefs.txt', 'url': 'http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=WpaXref'}, 'rnai_pheno': {'columns': ['gene_num', 'gene_alt_symbol', 'phenotype_label', 'phenotype_id', 'rnai_and_refs'], 'file': 'rnai_phenotypes.wb', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/rnai_phenotypes.WSNUMBER.wb'}, 'xrefs': {'file': 'c_elegans.PRJNA13758.xrefs.txt.gz', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJNA13758/annotation/c_elegans.PRJNA13758.WSNUMBER.xrefs.txt.gz'}}
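
The `columns` lists in `files` can be used to label the fields of a row from the corresponding file. The following is a minimal sketch, assuming tab-delimited input and using the `feature_loc` column names. The helper `row_to_record` and the sample line are hypothetical and are not part of dipper.

```python
# Column names copied from files['feature_loc']['columns'].
FEATURE_LOC_COLUMNS = [
    'seqid', 'source', 'type', 'start', 'end',
    'score', 'strand', 'phase', 'attributes',
]


def row_to_record(line, columns):
    """Split one tab-delimited line and pair each field with its column name."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) != len(columns):
        raise ValueError(
            'expected %d fields, got %d' % (len(columns), len(fields)))
    return dict(zip(columns, fields))


# Made-up GFF3-style line, for illustration only.
sample = 'I\tWormBase\tgene\t100\t200\t.\t+\t.\tID=Gene:WBGene00000000\n'
record = row_to_record(sample, FEATURE_LOC_COLUMNS)
```

All values stay as strings here; converting `start` and `end` to integers is left to the caller.
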
static make_reagent_targeted_gene_id(gene_id, reagent_id)
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses. :return: None

process_allele_phenotype(limit=None)

This file compactly lists variant-to-phenotype associations: a single row may list more than one variant per phenotype and paper. This indicates that each variant is individually associated with the given phenotype, as reported in one or more papers. It does not mean that the combination of variants produces the phenotype. :param limit: :return:
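
The row expansion described above can be sketched as follows. This is an illustration only, not dipper's implementation: the function name, the tuple layout, the identifiers, and the assumption that variants and papers arrive as pipe-separated strings are all hypothetical.

```python
def expand_associations(variants_field, phenotype_id, refs_field, sep='|'):
    """Yield one (variant, phenotype, refs) association per listed variant.

    Each variant is associated with the phenotype on its own; the
    variants are NOT treated as a combined genotype.
    """
    refs = tuple(r for r in refs_field.split(sep) if r)
    for variant in variants_field.split(sep):
        variant = variant.strip()
        if variant:
            yield (variant, phenotype_id, refs)


# Placeholder identifiers, for illustration only.
associations = list(expand_associations(
    'WB:WBVar00000001|WB:WBVar00000002',
    'WBPhenotype:0000001',
    'WB_REF:WBPaper00000001|PMID:0000000',
))
```

Every variant in the row receives the same phenotype and the same set of references.
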

process_disease_association(limit)
process_feature_loc(limit)
process_gene_desc(limit)
process_gene_ids(limit)
process_gene_interaction(limit)

The gene interaction file includes identified interactions between two or more genes (or gene products). In the case of interactions with >2 genes, this requires creating groups of genes that are involved in the interaction. From the WormBase help list: In the example WBInteraction000007779 it would likely be misleading to suggest that lin-12 interacts with (suppresses in this case) smo-1 ALONE or that lin-12 suppresses let-60 ALONE; the observation in the paper (see Table V in paper PMID:15990876) was that a lin-12 allele (heterozygous lin-12(n941/+)) could suppress the “multivulva” phenotype induced synthetically by simultaneous perturbation of BOTH smo-1 (by RNAi) AND let-60 (by the n2021 allele). So this is necessarily a three-gene interaction.

Therefore, we can create groups of genes based on their “status” of Effector | Effected.

Status: IN PROGRESS

Parameters:limit
Returns:
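
Grouping genes by their Effector | Effected status, as described above, could look roughly like this. It is a sketch under stated assumptions: the function `group_by_status`, the (gene, status) pair layout, and the role assigned to each gene in the example are hypothetical and not taken from dipper, whose own implementation is marked as in progress.

```python
from collections import defaultdict


def group_by_status(interactors):
    """Group gene identifiers by their interaction status.

    `interactors` is an iterable of (gene_id, status) pairs.
    """
    groups = defaultdict(list)
    for gene_id, status in interactors:
        groups[status].append(gene_id)
    return dict(groups)


# The three-gene example from the text: lin-12 suppresses the phenotype
# produced jointly by smo-1 and let-60. The role assignment is an assumption.
example = [
    ('lin-12', 'Effector'),
    ('smo-1', 'Effected'),
    ('let-60', 'Effected'),
]
groups = group_by_status(example)
```

Keeping smo-1 and let-60 in one group avoids asserting that lin-12 interacts with either gene alone.
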
process_pub_xrefs(limit=None)
process_rnai_phenotypes(limit=None)
species = '/species/c_elegans/PRJNA13758'
update_wsnum_in_files(vernum)

With the given version number `vernum`, update the source’s version number and replace the WSNUMBER placeholder in the files hashmap. The version number is in the CHECKSUMS file. :param vernum: :return:
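
The URLs in `files` carry a literal `WSNUMBER` token, so the substitution can be sketched as below. This is not the dipper implementation: `substitute_wsnumber` is a hypothetical helper, the example value `'WS273'` is arbitrary, and whether `vernum` carries the `WS` prefix is an assumption.

```python
import copy


def substitute_wsnumber(files, vernum, placeholder='WSNUMBER'):
    """Return a copy of `files` with the placeholder replaced in every URL."""
    updated = copy.deepcopy(files)
    for entry in updated.values():
        if 'url' in entry:
            entry['url'] = entry['url'].replace(placeholder, str(vernum))
    return updated


# Trimmed-down copy of one entry from the `files` attribute.
files = {
    'letter': {
        'file': 'letter',
        'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/'
               'current-production-release/letter.WSNUMBER',
    },
}
# 'WS273' is an arbitrary example value (assumption).
updated = substitute_wsnumber(files, 'WS273')
```

The sketch returns a new mapping and leaves its input unchanged. The documented method instead updates the source's own state, per its description.
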

wbdev = 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-development-release'
wbprod = 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release'
wbrel = 'ftp://ftp.wormbase.org/pub/wormbase/releases'