dipper.sources.Monochrom module

class dipper.sources.Monochrom.Monochrom(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None)

Bases: dipper.sources.Source.Source

This class leverages the GENO ontology and its modeling patterns to build an ontology of chromosomes for any species. The classes represent major structural pieces of chromosomes that are often universally referenced, using physical properties and observations that remain constant across genome builds (such as banding patterns and arms). The idea is to create a scaffold upon which we can hang build-specific chromosomal coordinates, and reason across them.

In general, this takes the cytogenetic band files from UCSC and creates the missing grouping classes, building the partonomy from a very specific chromosomal band up through the chromosome itself, which enables overlap and containment queries. We use RO:subsequence_of as the relationship between nested chromosomal parts. For example, 13q21.31 ==> 13q21.31, 13q21.3, 13q21, 13q2, 13q, 13
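As an illustration of the input side, the sketch below reads a UCSC cytoBand-style table. It assumes the usual five tab-separated columns (chromosome, start, end, band name, Giemsa stain label). `BandRow`, `read_cytoband`, and `read_cytoband_gz` are hypothetical names for this example and are not part of Dipper.

```python
import csv
import gzip
from typing import Iterable, Iterator, List, NamedTuple


class BandRow(NamedTuple):
    chrom: str   # chromosome label as written in the file, e.g. "chr13"
    start: int   # start coordinate of the band
    end: int     # end coordinate of the band
    name: str    # band name without the chromosome, e.g. "q21.31"
    stain: str   # stain label, e.g. "gpos50", "acen", "stalk"


def read_cytoband(lines: Iterable[str]) -> Iterator[BandRow]:
    """Yield one BandRow per well-formed tab-separated line."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        chrom, start, end, name, stain = row
        yield BandRow(chrom, int(start), int(end), name, stain)


def read_cytoband_gz(path: str) -> List[BandRow]:
    """Read a gzip-compressed cytoBand file from a local path."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return list(read_cytoband(handle))
```

Because `read_cytoband` accepts any iterable of lines, it can be exercised on an in-memory string without downloading anything.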

At the moment, this computes bands only for human, mouse, zebrafish, and rat; more species will be added as needed.

Because this is a universal framework to represent the chromosomal structure of any species, we must mint identifiers for each chromosome and part. (note: in truth we create blank nodes and then pretend they are something else. TEC)

We differentiate species by first creating a species-specific genome. Then, for each species-specific chromosome, we combine the NCBI taxon number with the chromosome number, like: `<species number>chr<num><band>`. For 13q21.31, this would be 9606chr13q21.31. We then create triples for a given band like:

    CHR:9606chr1p36.33 rdf[type] SO:chromosome_band
    CHR:9606chr1p36.33 subsequence_of CHR:9606chr1p36.3

where any band in the file is an instance of a chr_band (or a more specific type) and is a subsequence of its containing region.
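The identifier scheme can be sketched as follows. This is a minimal illustration, not Dipper's own code: `mint_chr_id` and `band_triples` are hypothetical helpers, and the triples are plain tuples of CURIE-like strings rather than entries in an RDF graph.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def mint_chr_id(taxon: str, chrom: str, band: str = "") -> str:
    """Build an identifier of the form CHR:<taxon>chr<num><band>."""
    return f"CHR:{taxon}chr{chrom}{band}"


def band_triples(taxon: str, chrom: str, band: str, parent_band: str) -> List[Triple]:
    """Return a type triple and a containment triple for one band.

    The band is the subject of both triples; its containing region
    is the object of the subsequence_of triple.
    """
    subject = mint_chr_id(taxon, chrom, band)
    parent = mint_chr_id(taxon, chrom, parent_band)
    return [
        (subject, "rdf:type", "SO:chromosome_band"),
        (subject, "RO:subsequence_of", parent),
    ]
```

For example, `mint_chr_id("9606", "13", "q21.31")` gives `CHR:9606chr13q21.31`.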

We determine the containing regions of a band by parsing the band string: since each alphanumeric character is a significant “place”, we can shorten the string one character at a time, and each shorter string is the parent of the longer one.
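A minimal sketch of that truncation, assuming the chromosome number and the band string are already separated. `containing_regions` is a hypothetical name used only here; the class's own method is `make_parent_bands`, whose internals may differ.

```python
from typing import List


def containing_regions(chrom: str, band: str) -> List[str]:
    """Return the band followed by each enclosing region, most specific first."""
    chain = []
    while band:
        chain.append(chrom + band)
        # Drop the last character; a trailing "." is not a place of its own.
        band = band[:-1].rstrip(".")
    chain.append(chrom)  # the whole chromosome is the outermost region
    return chain
```

Calling `containing_regions("13", "q21.31")` returns `13q21.31, 13q21.3, 13q21, 13q2, 13q, 13`, so each element is the parent of the one before it.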

Since this is small, and we have limited other items in our test set to a small region, we simply use the whole graph (genome) for testing purposes, and copy the main graph to the test graph.

Since this Dipper class is building an ONTOLOGY, rather than instance-level data, we must also include domain and range constraints, and other owl-isms.

TODO: any species by commandline argument

We are currently mapping these to the CHR idspace, but this is NOT YET APPROVED and is subject to change.

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns: None

files = {
    '10090': {
        'build_num': 'mm10',
        'file': '10090cytoBand.txt.gz',
        'genome_label': 'Mouse',
        'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/cytoBandIdeo.txt.gz'},
    '10116': {
        'build_num': 'rn6',
        'file': '10116cytoBand.txt.gz',
        'genome_label': 'Rat',
        'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/rn6/database/cytoBandIdeo.txt.gz'},
    '7955': {
        'build_num': 'danRer10',
        'file': '7955cytoBand.txt.gz',
        'genome_label': 'Zebrafish',
        'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/danRer10/database/cytoBandIdeo.txt.gz'},
    '9031': {
        'build_num': 'galGal4',
        'file': 'galGal4cytoBand.txt.gz',
        'genome_label': 'chicken',
        'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/galGal4/database/cytoBandIdeo.txt.gz'},
    '9606': {
        'build_num': 'hg19',
        'file': '9606cytoBand.txt.gz',
        'genome_label': 'Human',
        'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz'},
    '9796': {
        'build_num': 'equCab2',
        'file': 'equCab2cytoBand.txt.gz',
        'genome_label': 'horse',
        'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/equCab2/database/cytoBandIdeo.txt.gz'},
    '9823': {
        'build_num': 'susScr3',
        'file': 'susScr3cytoBand.txt.gz',
        'genome_label': 'pig',
        'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/susScr3/database/cytoBandIdeo.txt.gz'},
    '9913': {
        'build_num': 'bosTau7',
        'file': 'bosTau7cytoBand.txt.gz',
        'genome_label': 'cow',
        'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/bosTau7/database/cytoBandIdeo.txt.gz'},
    '9940': {
        'build_num': 'oviAri3',
        'file': 'oviAri3cytoBand.txt.gz',
        'genome_label': 'sheep',
        'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/oviAri3/database/cytoBandIdeo.txt.gz'},
}

getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

Returns:

make_parent_bands(band, child_bands)

Recursively determines the grouping bands that a band belongs to. For example, 13q21.31 ==> 13, 13q, 13q2, 13q21, 13q21.3, 13q21.31

Parameters:
  • band
  • child_bands
Returns:

map_type_of_region(regiontype)

Note that “stalk” refers to the short arm of the acrocentric chromosomes, which for human are chr13, 14, 15, 21, and 22.

Parameters:
  • regiontype
Returns:

parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses.

Returns: None

dipper.sources.Monochrom.getChrPartTypeByNotation(notation, graph)

This method works out what kind of feature a given band is by pattern matching against standard karyotype notation (e.g. 13q22.2 ==> chromosome sub-band).

This has been validated against human, mouse, fish, and rat nomenclature.

Parameters:
  • notation: the band (without the chromosome prefix)
Returns:
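The pattern-matching idea can be sketched as below. This is a hedged illustration, not the function's actual source: the regular expressions, the `A-H` letter range (an assumption meant to cover letter-named regions such as mouse `qA1`), and the returned label strings are all choices made for this example, and the real function takes a `graph` argument that is omitted here.

```python
import re

# Ordered (pattern, label) pairs; labels are illustrative strings only.
_PART_PATTERNS = [
    (re.compile(r"p"), "short_chromosome_arm"),
    (re.compile(r"q"), "long_chromosome_arm"),
    (re.compile(r"[pq][A-H\d]"), "chromosome_region"),
    (re.compile(r"[pq][A-H\d]\d"), "chromosome_band"),
    (re.compile(r"[pq][A-H\d]\d\.\d+"), "chromosome_subband"),
]


def classify_band(notation: str) -> str:
    """Classify a band string (no chromosome prefix) by its shape."""
    for pattern, label in _PART_PATTERNS:
        # fullmatch requires the whole string to fit the pattern,
        # so "q2" cannot be mistaken for the bare arm "q".
        if pattern.fullmatch(notation):
            return label
    return "chromosome_part"  # generic fallback for anything unrecognized
```

With these patterns, `classify_band("q22.2")` returns `"chromosome_subband"`, matching the 13q22.2 example once the chromosome prefix is removed.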