dipper.utils.DipperUtil module

class dipper.utils.DipperUtil.DipperUtil

Bases: object

Various utilities and quick methods used in this application

(A little too quick.) Per https://www.ncbi.nlm.nih.gov/books/NBK25497/, NCBI recommends that users post no more than three URL requests per second and limit large jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time during weekdays.

Restructuring to make bulk queries is less likely to result in another ban for peppering them with one-offs.
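
The request-rate guidance above can be illustrated with a small throttle plus a batching helper. This is a minimal sketch, not code from this module: `RateLimiter` and `chunked` are hypothetical names, and the even spacing of calls (one every third of a second) is just one simple way to stay at or below three requests per second.

```python
import time


class RateLimiter:
    """Hypothetical helper: space calls at least period / max_calls seconds apart."""

    def __init__(self, max_calls=3, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self._last = None     # time of the previous permitted call

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def chunked(items, size):
    """Hypothetical helper: split a list into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

A caller would invoke `limiter.wait()` before each request and send one request per batch from `chunked(ids, n)` instead of one request per identifier. The batch size is left to the caller, since it depends on the service being queried.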

static get_homologene_by_gene_num(gene_num)
static get_ncbi_id_from_symbol(gene_symbol)

Get the NCBI gene ID from a gene symbol using the Monarch and MyGene services.

Parameters:
  • gene_symbol
Returns:

static get_ncbi_taxon_num_by_label(label)

Here we want to look up the NCBI Taxon id using some kind of label. It will only return a result if there is a unique hit.

Returns:
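
The "unique hit" rule described for this lookup can be sketched as follows. This is an illustration under stated assumptions, not the method's actual implementation: `unique_hit` and `lookup_taxon` are hypothetical names, and the in-memory table stands in for whatever service the real method queries. The entry for "ambiguous label" uses placeholder identifiers, not real taxon numbers.

```python
def unique_hit(hits):
    """Return the only element of `hits`, or None if there are zero or several."""
    return hits[0] if len(hits) == 1 else None


def lookup_taxon(label, table):
    """Hypothetical lookup: `table` maps a label to a list of candidate taxon ids."""
    return unique_hit(table.get(label, []))


# Toy data for illustration only; the second entry uses placeholder ids.
TOY_TABLE = {
    "Homo sapiens": ["9606"],
    "ambiguous label": ["PLACEHOLDER_1", "PLACEHOLDER_2"],
}

print(lookup_taxon("Homo sapiens", TOY_TABLE))     # the single candidate
print(lookup_taxon("ambiguous label", TOY_TABLE))  # None: more than one hit
print(lookup_taxon("unknown label", TOY_TABLE))    # None: no hit
```

Returning `None` for both the empty and the ambiguous case keeps the caller's check simple, at the cost of not distinguishing "not found" from "found too many".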
static is_omim_disease(gene_id)

Process OMIM equivalencies by examining the Monarch ontology in SciGraph. As an alternative we could examine mondo.owl, since the ontology in SciGraph imports the output of this script, which creates an odd circular dependency (even though we’re querying mondo.owl through SciGraph).

Parameters:
  • graph – RDFLib graph object
  • gene_id – NCBI gene ID as a CURIE
  • omim_id – OMIM ID as a CURIE
Returns:

None

remove_control_characters(s)
Filters out characters in any of these Unicode categories:
  • [Cc] Other, Control (65 characters)
  • …
  • [Cf] Other, Format (151 characters)
  • [Cn] Other, Not Assigned (0 characters; none have this property)
  • [Co] Other, Private Use (6 characters)
  • [Cs] Other, Surrogate (6 characters)
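
The filtering described above can be sketched with the standard library's `unicodedata.category`, which returns a two-letter general category such as `Cc` or `Cf`. This is a minimal sketch that assumes "any of these categories" means every category starting with `C`; the function name `strip_category_c` is hypothetical and is not the method documented here.

```python
import unicodedata


def strip_category_c(s):
    """Drop every character whose Unicode general category starts with 'C'.

    That covers Cc (control), Cf (format), Cn (unassigned),
    Co (private use) and Cs (surrogate).
    """
    return "".join(ch for ch in s if not unicodedata.category(ch).startswith("C"))


# NUL and newline are Cc, U+200B (zero width space) is Cf; letters are kept.
print(strip_category_c("caf\u00e9\x00 \u200bbar\n"))
```

Note that tab, carriage return and newline are themselves in category `Cc`, so a filter of this kind also flattens multi-line text unless those characters are handled separately.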