dipper.models.GenomicFeature module

class dipper.models.GenomicFeature.Feature(graph, feature_id=None, label=None, feature_type=None, description=None)

Bases: object

This class models genomic features. By default, every feature is a faldo:Region, and features are typed with Sequence Ontology (SO) terms. At the moment, RO:has_subsequence is the default relationship between regions, but this should be tested/verified.

TODO: the graph additions are in the addXToFeature functions, but should be separated.

TODO: this will need to be extended to properly deal with fuzzy positions in faldo.

addFeatureEndLocation(coordinate, reference_id, strand=None, position_types=None)

Adds the coordinate details for the end of this feature.

Parameters:
  • coordinate
  • reference_id
  • strand
  • position_types

Returns:
addFeatureProperty(property_type, property)
addFeatureStartLocation(coordinate, reference_id, strand=None, position_types=None)

Adds the coordinate details for the start of this feature.

Parameters:
  • coordinate
  • reference_id
  • strand
  • position_types

Returns:
addFeatureToGraph(add_region=True, region_id=None, feature_as_class=False)

We make the assumption here that all features are instances. The features are located on a region, which begins and ends with a faldo:Position. The feature locations leverage the Faldo model, which has a general structure like:

Triples:

    feature_id a feature_type (individual)
        faldo:location region_id
    region_id a faldo:Region
        faldo:begin start_position
        faldo:end end_position
    start_position a (any of: faldo:(((Both|Plus|Minus)Strand)|Exact)Position)
        faldo:position Integer(numeric position)
        faldo:reference reference_id
    end_position a (any of: faldo:(((Both|Plus|Minus)Strand)|Exact)Position)
        faldo:position Integer(numeric position)
        faldo:reference reference_id

Parameters:
  • graph
Returns:
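The triple layout described for addFeatureToGraph can be illustrated without any RDF library. The following is only a minimal sketch using plain Python tuples: the function name `faldo_feature_triples`, the `-begin`/`-end` identifier suffixes, and the default position type are assumptions made for illustration and are not part of the dipper API.

```python
def faldo_feature_triples(feature_id, feature_type, region_id,
                          start, end, reference_id,
                          position_type="faldo:ExactPosition"):
    """Return (subject, predicate, object) tuples for one feature on one region."""
    # Hypothetical identifiers for the two position nodes.
    start_id = f"{region_id}-begin"
    end_id = f"{region_id}-end"

    triples = [
        # The feature is an individual of its type and is located on a region.
        (feature_id, "rdf:type", feature_type),
        (feature_id, "faldo:location", region_id),
        # The region begins and ends with a position node.
        (region_id, "rdf:type", "faldo:Region"),
        (region_id, "faldo:begin", start_id),
        (region_id, "faldo:end", end_id),
    ]
    # Each position carries a type, a numeric coordinate, and a reference.
    for pos_id, coord in ((start_id, start), (end_id, end)):
        triples += [
            (pos_id, "rdf:type", position_type),
            (pos_id, "faldo:position", int(coord)),
            (pos_id, "faldo:reference", reference_id),
        ]
    return triples


# Example: a chromosome band (SO:0000341) on human chromosome 1.
triples = faldo_feature_triples(
    ":feature1", "SO:0000341", ":region1", 100, 200, ":9606chr1")
for triple in triples:
    print(triple)
```

Keeping the triples as plain tuples makes the shape of the model easy to inspect before it is written to a real graph object.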
addPositionToGraph(reference_id, position, position_types=None, strand=None)

Add the positional information to the graph, following the faldo model. We assume that if the strand is None, we give it a generic “Position” only.

Triples:

    my_position a (any of: faldo:(((Both|Plus|Minus)Strand)|Exact)Position)
        faldo:position Integer(numeric position)
        faldo:reference reference_id

Parameters:
  • graph
  • reference_id
  • position
  • position_types
  • strand
Returns:

Identifier of the position created
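The fallback to a generic position when no strand is given can be sketched as follows. This is an illustration under stated assumptions, not the dipper implementation: the function name `position_triples`, the explicit `position_id` argument, and the strand keys `"+"`, `"-"`, and `"both"` are hypothetical. Only the faldo type names come from the types table of this class.

```python
# Hypothetical strand encoding; the real method may accept other values.
STRAND_TYPES = {
    "+": "faldo:PlusStrandPosition",
    "-": "faldo:MinusStrandPosition",
    "both": "faldo:BothStrandPosition",
}


def position_triples(position_id, reference_id, position,
                     position_types=None, strand=None):
    """Return the triples describing a single faldo position node."""
    types = list(position_types or [])
    if strand is not None:
        types.append(STRAND_TYPES[strand])
    if not types:
        # No strand and no explicit types: use the generic Position only.
        types.append("faldo:Position")

    triples = [(position_id, "rdf:type", t) for t in types]
    triples.append((position_id, "faldo:position", int(position)))
    triples.append((position_id, "faldo:reference", reference_id))
    return triples


generic = position_triples(":pos1", ":9606chr1", 100)
stranded = position_triples(":pos1", ":9606chr1", 100, strand="+")
```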

addRegionPositionToGraph(region_id, begin_position_id, end_position_id)
addSubsequenceOfFeature(parentid)

This will add reciprocal triples like:

    feature is_subsequence_of parent
    parent has_subsequence feature

Parameters:
  • graph
  • parentid

Returns:
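The reciprocal relationship added by addSubsequenceOfFeature can be pictured as a pair of tuples. In this minimal sketch the function name `subsequence_triples` is hypothetical; the two CURIEs are taken from the object_properties table of this class.

```python
# CURIEs as listed in Feature.object_properties.
IS_SUBSEQUENCE_OF = "RO:0002525"
HAS_SUBSEQUENCE = "RO:0002524"


def subsequence_triples(feature_id, parent_id):
    """Return both directions of the feature/parent subsequence relationship."""
    return [
        (feature_id, IS_SUBSEQUENCE_OF, parent_id),
        (parent_id, HAS_SUBSEQUENCE, feature_id),
    ]


pair = subsequence_triples(":9606chr1p36", ":9606chr1")
```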
addTaxonToFeature(taxonid)

Given the taxon id, this will add the following triple:

    feature in_taxon taxonid

Parameters:
  • graph
  • taxonid

Returns:

annotation_properties = {}
data_properties = {'position': 'faldo:position'}
object_properties = {'begin': 'faldo:begin', 'downstream_of_sequence_of': 'RO:0002529', 'end': 'faldo:end', 'gene_product_of': 'RO:0002204', 'has_gene_product': 'RO:0002205', 'has_staining_intensity': 'GENO:0000207', 'has_subsequence': 'RO:0002524', 'is_about': 'IAO:0000136', 'is_subsequence_of': 'RO:0002525', 'location': 'faldo:location', 'reference': 'faldo:reference', 'upstream_of_sequence_of': 'RO:0002528'}
properties = {'begin': 'faldo:begin', 'downstream_of_sequence_of': 'RO:0002529', 'end': 'faldo:end', 'gene_product_of': 'RO:0002204', 'has_gene_product': 'RO:0002205', 'has_staining_intensity': 'GENO:0000207', 'has_subsequence': 'RO:0002524', 'is_about': 'IAO:0000136', 'is_subsequence_of': 'RO:0002525', 'location': 'faldo:location', 'position': 'faldo:position', 'reference': 'faldo:reference', 'upstream_of_sequence_of': 'RO:0002528'}
types = {'FuzzyPosition': 'faldo:FuzzyPosition', 'Position': 'faldo:Position', 'SNP': 'SO:0000694', 'assembly_component': 'SO:0000143', 'band_intensity': 'GENO:0000618', 'both_strand': 'faldo:BothStrandPosition', 'centromere': 'SO:0000577', 'chromosome': 'SO:0000340', 'chromosome_arm': 'SO:0000105', 'chromosome_band': 'SO:0000341', 'chromosome_part': 'SO:0000830', 'chromosome_region': 'GENO:0000614', 'chromosome_subband': 'GENO:0000616', 'genome': 'SO:0001026', 'gneg': 'GENO:0000620', 'gpos': 'GENO:0000619', 'gpos100': 'GENO:0000622', 'gpos25': 'GENO:0000625', 'gpos33': 'GENO:0000633', 'gpos50': 'GENO:0000624', 'gpos66': 'GENO:0000632', 'gpos75': 'GENO:0000623', 'gvar': 'GENO:0000621', 'haplotype': 'GENO:0000871', 'long_chromosome_arm': 'GENO:0000629', 'minus_strand': 'faldo:MinusStrandPosition', 'plus_strand': 'faldo:PlusStrandPosition', 'reference_genome': 'SO:0001505', 'region': 'faldo:Region', 'score': 'SO:0001685', 'short_chromosome_arm': 'GENO:0000628'}
dipper.models.GenomicFeature.makeChromID(chrom, reference=None, prefix=None)

This will take a chromosome number and an NCBI taxon number, and create a unique identifier for the chromosome. These identifiers are made in the @base space like:

    Homo sapiens (9606) chr1 ==> :9606chr1
    Mus musculus (10090) chrX ==> :10090chrX

Parameters:
  • chrom – the chromosome (preferably without any chr prefix)
  • reference – the numeric portion of the taxon id
Returns:
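The identifier pattern shown in the examples can be reproduced with a few lines of Python. This is only a sketch of the documented pattern, not the body of makeChromID: the function name `sketch_chrom_id` is hypothetical, and the handling of the prefix argument is deliberately left out because it is not described here.

```python
import re


def sketch_chrom_id(chrom, taxon_number):
    """Build an identifier such as ':9606chr1' from a chromosome and taxon number."""
    # Drop a leading "chr" so that "chr1" and "1" yield the same identifier.
    bare = re.sub(r"^chr", "", str(chrom), flags=re.IGNORECASE)
    return f":{taxon_number}chr{bare}"


human_chr1 = sketch_chrom_id("chr1", 9606)
mouse_chrx = sketch_chrom_id("X", 10090)
```

Stripping the prefix before rebuilding the identifier is why the chrom parameter is preferably passed without any chr prefix.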

dipper.models.GenomicFeature.makeChromLabel(chrom, reference=None)