dipper.sources.ClinVar module

class dipper.sources.ClinVar.ClinVar(graph_type, are_bnodes_skolemized, tax_ids=None, gene_ids=None)

Bases: dipper.sources.Source.Source

ClinVar hosts clinically relevant variants, both directly submitted and curated from the literature. This source processes the variant_summary file, which is a digested version of ClinVar's full XML. All variants (with their coordinates and genome build) from that file are added.

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. Subclasses should override it.

Returns: None
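As a rough illustration of what a fetch step could look like, the sketch below walks a mapping shaped like this class's `files` attribute and downloads each entry into a local directory. It is not dipper's actual implementation: the names `fetch_files`, `raw_dir`, and `downloader` are hypothetical, and the rule "skip a file that already exists unless `is_dl_forced` is set" is an assumption about what the flag means.

```python
import os
from urllib.request import urlretrieve

# Copied from the `files` attribute documented for this class.
FILES = {
    'variant_citations': {
        'url': 'http://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/var_citations.txt',
        'file': 'variant_citations.txt',
    },
    'variant_summary': {
        'url': 'http://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz',
        'file': 'variant_summary.txt.gz',
    },
}


def fetch_files(files, raw_dir, is_dl_forced=False, downloader=urlretrieve):
    """Download every entry of `files` into `raw_dir` and return the local paths.

    Assumption: an existing local copy is reused unless `is_dl_forced` is True.
    `downloader` is any callable taking (url, target_path); it is injectable
    so the logic can be exercised without network access.
    """
    os.makedirs(raw_dir, exist_ok=True)
    local_paths = []
    for entry in files.values():
        target = os.path.join(raw_dir, entry['file'])
        if is_dl_forced or not os.path.exists(target):
            downloader(entry['url'], target)
        local_paths.append(target)
    return local_paths
```

Calling `fetch_files(FILES, 'raw/clinvar')` would download both files on the first run and reuse them afterwards; passing `is_dl_forced=True` downloads them again.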

files = {
    'variant_citations': {
        'url': 'http://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/var_citations.txt',
        'file': 'variant_citations.txt'
    },
    'variant_summary': {
        'url': 'http://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz',
        'file': 'variant_summary.txt.gz'
    }
}

getTestSuite()

An abstract method that subclasses should override with tests appropriate for the specific source.

Returns:

parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). Subclasses should override it.

Returns: None
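The `limit` parameter suggests that parsing can stop after a fixed number of rows. A minimal sketch of that idea for a gzipped, tab-delimited file such as variant_summary.txt.gz follows. The helper name `iter_rows` is hypothetical, and skipping lines that start with `#` (as headers or comments) is an assumption rather than documented behaviour of this class.

```python
import csv
import gzip


def iter_rows(path, limit=None):
    """Yield rows of a gzipped tab-delimited file as lists of strings.

    Assumptions: empty lines and lines whose first field starts with '#'
    are skipped, and `limit` counts only the rows actually yielded.
    """
    with gzip.open(path, 'rt', encoding='utf-8', newline='') as handle:
        reader = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
        yielded = 0
        for row in reader:
            if not row or row[0].startswith('#'):
                continue
            if limit is not None and yielded >= limit:
                break
            yielded += 1
            yield row
```

With `limit=None` the generator runs through the whole file; a small limit is convenient for a quick test run on a large download.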

scrub()

The var_citations file has at least one bad row with more than 6 columns. This method comments out such rows.
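The scrubbing rule described above can be sketched as a small function over the file's lines. This is an illustration under stated assumptions, not the method's actual code: the name `comment_out_bad_rows` is hypothetical, the file is assumed to be tab-delimited, and `#` is assumed to be the comment marker.

```python
def comment_out_bad_rows(lines, max_cols=6, delimiter='\t'):
    """Return `lines` with every row wider than `max_cols` prefixed by '#'.

    Assumptions: columns are separated by `delimiter`, and a leading '#'
    marks a line as a comment. Lines that are already comments are kept as-is.
    """
    scrubbed = []
    for line in lines:
        n_cols = line.rstrip('\n').count(delimiter) + 1
        if not line.startswith('#') and n_cols > max_cols:
            scrubbed.append('#' + line)
        else:
            scrubbed.append(line)
    return scrubbed
```

Working on a list of lines keeps the rule separate from file handling, so the same function could be applied to lines read from the downloaded file before they are written back.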

Returns:

variant_ids = [4288, 4289, 4290, 4291, 4297, 5240, 5241, 5242, 5243, 5244, 5245, 5246, 7105, 8877, 9295, 9296, 9297, 9298, 9449, 10072, 10361, 10382, 12528, 12529, 12530, 12531, 12532, 14353, 14823, 15872, 17232, 17233, 17234, 17235, 17236, 17237, 17238, 17239, 17284, 17285, 17286, 17287, 18179, 18180, 18181, 18343, 18363, 31951, 37123, 38562, 94060, 98004, 98005, 98006, 98008, 98009, 98194, 98195, 98196, 98197, 98198, 100055, 112885, 114372, 119244, 128714, 130558, 130559, 130560, 130561, 132146, 132147, 132148, 144375, 146588, 147536, 147814, 147936, 152976, 156327, 161457, 162000, 167132]