dipper.sources.UCSCBands module

class dipper.sources.UCSCBands.UCSCBands(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None)

Bases: dipper.sources.Source.Source

This will take the UCSC definitions of cytogenetic bands and create the nested structures to enable overlap and containment queries. We use `Monochrom.py` to create the OWL classes of the chromosomal parts. Here, we are concerned only with the instance-level values for particular genome builds.

Given a chr band definition, the nested containment structures look like: 13q21.31 ==> 13q21.31, 13q21.3, 13q21, 13q2, 13q, 13

We determine the containing regions of the band by parsing the band string; since each alphanumeric character is a significant “place”, we can split the string so that the shorter strings are parents of the longer string.

Here we create build-specific chromosomes, which are instances of the classes produced by `Monochrom.py`. You can instantiate any number of builds for a genome.
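The containment parsing described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation; the function name `containing_regions` is hypothetical, and it assumes the chromosome and band name arrive as separate values (as in the `chrom` and `name` columns of the UCSC file).

```python
def containing_regions(chrom, band):
    """Return the band followed by each containing region, most specific first.

    Hypothetical helper: `chrom` is the chromosome label without any 'chr'
    prefix (e.g. '13'), and `band` is the band name (e.g. 'q21.31').
    """
    regions = []
    # Drop one character at a time from the end of the band name.
    for end in range(len(band), 0, -1):
        prefix = band[:end]
        # A prefix ending in '.' is not a region of its own, so skip it.
        if prefix.endswith('.'):
            continue
        regions.append(chrom + prefix)
    # The whole chromosome contains every band on it.
    regions.append(chrom)
    return regions


print(containing_regions('13', 'q21.31'))
# ['13q21.31', '13q21.3', '13q21', '13q2', '13q', '13']
```

Keeping the chromosome label separate from the band name avoids splitting a multi-digit chromosome such as 13 into a spurious parent "1".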

We leverage the Faldo model here for region definitions, and map each of the chromosomal parts to SO.

We differentiate the build by adding the build id to the identifier prior to the chromosome number. These then are instances of the species-specific chromosomal class.

The build-specific chromosomes are created like:

<pre>
<build number>chr<num><band>

with triples for a given band like:

_:hg19chr1p36.33
    rdf:type SO:chromosome_band, faldo:Region, CHR:9606chr1p36.33,
    subsequence_of _:hg19chr1p36.3,
    faldo:location [ a faldo:BothStrandPosition
        faldo:begin 0,
        faldo:end 2300000,
        faldo:reference 'hg19'
    ] .
</pre>

where any band in the file is an instance of a chr_band (or a more specific type), is a subsequence of its containing region, and is located in the specified coordinates.
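To make the identifier pattern above concrete, here is a small sketch that assembles the pieces for one band as plain strings. It does not use dipper's graph API; the function names and the dictionary layout are assumptions made for illustration only.

```python
def build_band_id(build, chrom, band=''):
    """Build-specific instance identifier, e.g. 'hg19chr1p36.33'."""
    return f"{build}chr{chrom}{band}"


def band_class_id(tax_id, chrom, band=''):
    """Species-level class identifier, e.g. 'CHR:9606chr1p36.33'."""
    return f"CHR:{tax_id}chr{chrom}{band}"


def parent_band(band):
    """Drop the last significant character of a band name ('' if none remains)."""
    return band[:-1].rstrip('.')


def describe_band(build, tax_id, chrom, band, start, end):
    """Collect the values that the example triples above would carry."""
    return {
        'id': '_:' + build_band_id(build, chrom, band),
        'class': band_class_id(tax_id, chrom, band),
        'subsequence_of': '_:' + build_band_id(build, chrom, parent_band(band)),
        'begin': start,
        'end': end,
        'reference': build,
    }


example = describe_band('hg19', '9606', '1', 'p36.33', 0, 2300000)
```

Here `example['id']` is the build-specific blank node label and `example['subsequence_of']` points at the next containing region within the same build.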

We do not have a separate graph for testing.

TODO: any species by commandline argument

HGGP = 'http://hgdownload.cse.ucsc.edu/goldenPath'
cytobandideo_columns = ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain']
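As an illustration of how these column names line up with a row of a cytoband file, the sketch below reads tab-separated text with the standard `csv` module. The helper name `read_cytobands` and the sample row are assumptions for demonstration; the real files are gzip-compressed and would need to be opened in text mode first (for example with `gzip.open(path, 'rt')`).

```python
import csv
import io

# Same column order as the class attribute shown above.
CYTOBAND_COLUMNS = ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain']


def read_cytobands(handle):
    """Yield one dict per band from an open tab-separated text handle."""
    reader = csv.DictReader(handle, fieldnames=CYTOBAND_COLUMNS, delimiter='\t')
    for row in reader:
        # Coordinates arrive as strings; convert them to integers.
        row['chromStart'] = int(row['chromStart'])
        row['chromEnd'] = int(row['chromEnd'])
        yield row


# Illustrative row only, not read from a downloaded file.
sample = io.StringIO("chr1\t0\t2300000\tp36.33\tgneg\n")
rows = list(read_cytobands(sample))
```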
fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses. :return: None

files = {
    '10090': {'assembly': ('UCSC:mm10', 'UCSC:mm9'), 'build_num': 'mm10', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'mm10_cytoBandIdeo.txt.gz', 'genome_label': 'Mouse', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/cytoBandIdeo.txt.gz'},
    '7955': {'assembly': ('UCSC:danRer11', 'UCSC:danRer10', 'UCSC:danRer7', 'UCSC:danRer6'), 'build_num': 'danRer11', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'danRer11_cytoBandIdeo.txt.gz', 'genome_label': 'Zebrafish', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/danRer11/database/cytoBandIdeo.txt.gz'},
    '9031': {'assembly': ('UCSC:galGal4', 'UCSC:galGal6'), 'build_num': 'galGal6', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'galGal6_cytoBandIdeo.txt.gz', 'genome_label': 'chicken', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/galGal6/database/cytoBandIdeo.txt.gz'},
    '9606': {'assembly': ('UCSC:hg38', 'UCSC:hg19', 'UCSC:hg18', 'UCSC:hg17', 'UCSC:hg16', 'UCSC:hg15'), 'build_num': 'hg19', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'hg19_cytoBand.txt.gz', 'genome_label': 'Human', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz'},
    '9615': {'assembly': ('UCSC:canFam3',), 'build_num': 'canFam3', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'canFam3_cytoBandIdeo.txt.gz', 'genome_label': 'dog', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/canFam3/database/cytoBandIdeo.txt.gz'},
    '9685': {'assembly': ('UCSC:felCat9',), 'build_num': 'felCat9', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'felCat9_cytoBandIdeo.txt.gz', 'genome_label': 'cat', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/felCat9/database/cytoBandIdeo.txt.gz'},
    '9796': {'assembly': ('UCSC:equCab2', 'UCSC:equCab3'), 'build_num': 'equCab2', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'equCab2_cytoBandIdeo.txt.gz', 'genome_label': 'horse', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/equCab2/database/cytoBandIdeo.txt.gz'},
    '9823': {'assembly': ('UCSC:susScr3', 'UCSC:susScr11'), 'build_num': 'susScr11', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'susScr11_cytoBandIdeo.txt.gz', 'genome_label': 'pig', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/susScr11/database/cytoBandIdeo.txt.gz'},
    '9913': {'assembly': ('UCSC:bosTau7',), 'build_num': 'bosTau7', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'bosTau7_cytoBandIdeo.txt.gz', 'genome_label': 'cow', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/bosTau7/database/cytoBandIdeo.txt.gz'},
    '9940': {'assembly': ('UCSC:oviAri3', 'UCSC:oviAri4'), 'build_num': 'oviAri4', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'oviAri4_cytoBandIdeo.txt.gz', 'genome_label': 'sheep', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/oviAri4/database/cytoBandIdeo.txt.gz'},
}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source. :return:

parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses. :return: None