dipper.sources.Coriell module

class dipper.sources.Coriell.Coriell(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

The Coriell Catalog provided to Monarch includes metadata and descriptions of NIGMS, NINDS, NHGRI, and NIA cell lines. These lines are made available for research purposes. Here, we create annotations for the cell lines as models of the diseases from which they originate.

We create a handle for the patient from whom a given cell line is derived, since multiple cell lines may be created from the same patient. A genotype is assembled for the patient, comprising a karyotype (if specified) and/or a collection of variants. Both the genotype and the disease are linked to the patient (via has_genotype and has_phenotype, respectively), and the cell line is recorded as derived from the patient. The cell line is classified by its [CLO cell type](http://www.ontobee.org/browser/index.php?o=clo), which is itself linked to a tissue of origin.
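The modeling described above can be sketched as a function that turns one catalog row into subject-predicate-object triples. This is a minimal sketch, not the dipper implementation: the CatalogRow type, the predicate strings, the blank-node and "Coriell:" identifier patterns, the placeholder CLO class, and keying the patient handle on fam plus fammember are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class CatalogRow:
    # Hypothetical subset of the catalog columns listed in column_labels.
    catalog_id: str
    fam: str
    fammember: str
    karyotype: Optional[str] = None
    variants: List[str] = field(default_factory=list)
    diseases: List[str] = field(default_factory=list)
    cell_type: str = "CLO:0000000"  # placeholder CLO class, not a real mapping


def row_to_triples(row: CatalogRow) -> List[Triple]:
    # Assumed patient key: one handle per family member, so several cell
    # lines derived from the same person share a single patient node.
    patient = f"_:patient-{row.fam}-{row.fammember}"
    genotype = f"_:genotype-{row.fam}-{row.fammember}"
    cell_line = f"Coriell:{row.catalog_id}"  # hypothetical identifier prefix
    triples: List[Triple] = [
        (cell_line, "rdf:type", row.cell_type),
        (cell_line, "derives_from", patient),
        (patient, "has_genotype", genotype),
    ]
    # The genotype holds the karyotype (if any) and/or the variants.
    if row.karyotype:
        triples.append((genotype, "has_karyotype", row.karyotype))
    for variant in row.variants:
        triples.append((genotype, "has_variant_part", variant))
    # Diseases attach to the patient, not to the cell line.
    for disease in row.diseases:
        triples.append((patient, "has_phenotype", disease))
    return triples
```

Because the patient handle depends only on the family fields, two cell lines from the same donor resolve to the same patient node.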

Unfortunately, the OMIM numbers listed in this file refer to both genes and diseases, and we have no way of knowing a priori whether a given OMIM number designates a gene or a disease. We therefore currently link the patient to every OMIM ID via the has_phenotype relationship.
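A minimal sketch of that uniform treatment follows. It assumes the omim_num column may hold several numbers separated by semicolons or commas; that delimiter rule and the "OMIM:" prefix formatting are assumptions, not taken from the source.

```python
import re
from typing import List, Tuple


def omim_phenotype_links(patient: str, omim_num: str) -> List[Tuple[str, str, str]]:
    # Assumption: several OMIM numbers may share one cell, separated by ';' or ','.
    numbers = [n.strip() for n in re.split(r"[;,]", omim_num) if n.strip()]
    # Without a gene/disease lookup, every OMIM number is treated the same way
    # and attached to the patient with has_phenotype.
    return [(patient, "has_phenotype", f"OMIM:{n}") for n in numbers]
```

Usage: omim_phenotype_links("_:p1", "219700; 602421") yields two has_phenotype triples, one per OMIM number, regardless of whether either number denotes a gene.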

Notice: The Coriell catalog is delivered to Monarch in a specific format and requires SSH RSA fingerprint identification. Other groups wishing to obtain this data in its raw form will need to contact Coriell for credentials, which must then be placed in your configuration file for the fetch to work.
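A small check that the credentials are present can catch configuration mistakes before any connection is attempted. This sketch assumes conf.yaml has already been parsed into a Python dict and that the credentials live under dbauth, coriell with the keys named in the fetch() documentation (user, password, host, private_key); the coriell_auth helper itself is hypothetical.

```python
from typing import Any, Dict

REQUIRED_KEYS = ("user", "password", "host", "private_key")


def coriell_auth(config: Dict[str, Any]) -> Dict[str, str]:
    # Expects config["dbauth"]["coriell"] to hold the SFTP user, password,
    # host, and the path to the RSA private key.
    try:
        auth = config["dbauth"]["coriell"]
    except KeyError as err:
        raise KeyError("conf.yaml is missing dbauth.coriell") from err
    # Report every missing or empty key at once rather than failing one by one.
    missing = [key for key in REQUIRED_KEYS if not auth.get(key)]
    if missing:
        raise ValueError(f"dbauth.coriell is missing: {', '.join(missing)}")
    return {key: str(auth[key]) for key in REQUIRED_KEYS}
```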

column_labels = ['catalog_id', 'description', 'omim_num', 'sample_type', 'cell_line_available', 'dna_instock', 'dna_ref', 'gender', 'age', 'race', 'ethnicity', 'affected', 'karyotype', 'relprob', 'mutation', 'gene', 'fam', 'collection', 'url', 'cat_remark', 'pubmed_ids', 'fammember', 'variant_id', 'dbsnp_id', 'species']
fetch(is_dl_forced=False)

Here we connect to the Coriell SFTP server using private connection details. Coriell deposits biweekly files with a timestamp in the filename. For each catalog, we query the remote site and pull the most recently updated file, renaming it to our local latest.csv.

Be sure to have the connection details (user, password, host, and path to the private key) in your conf.yaml file, for example: dbauth: {"coriell": {"user": "<username>", "password": "<password>", "host": "<host>", "private_key": "path/to/rsa_key"}}

Parameters: is_dl_forced – force the download even if the local files are up to date
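The selection step of fetch() can be sketched without any network code. This sketch works on (filename, modification time) pairs such as an SFTP directory listing provides (paramiko's SFTPClient.listdir_attr, for instance, reports filename and st_mtime); the rule that a catalog's files start with its ID and end in ".csv", and both helper names, are assumptions for illustration.

```python
from typing import Iterable, Optional, Tuple


def newest_remote_file(
    listing: Iterable[Tuple[str, float]], catalog: str
) -> Optional[Tuple[str, float]]:
    # Assumed naming rule: files for a catalog start with its ID, e.g. "NIGMS".
    candidates = [
        (name, mtime)
        for name, mtime in listing
        if name.startswith(catalog) and name.endswith(".csv")
    ]
    if not candidates:
        return None
    # "Most recently updated" = largest modification time.
    return max(candidates, key=lambda item: item[1])


def should_download(
    remote_mtime: float, local_mtime: Optional[float], is_dl_forced: bool = False
) -> bool:
    # Download when forced, when no local copy exists yet, or when the
    # remote file is newer than the local one.
    return is_dl_forced or local_mtime is None or remote_mtime > local_mtime
```

Selecting by modification time rather than parsing the timestamp out of the filename avoids depending on an exact filename format.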
files = {
    'NHGRI': {'columns': column_labels, 'file': 'NHGRI.csv', 'id': 'NHGRI', 'label': 'NHGRI Sample Repository for Human Genetic Research', 'page': 'https://catalog.coriell.org/1/NHGRI'},
    'NIA': {'columns': column_labels, 'file': 'NIA.csv', 'id': 'NIA', 'label': 'NIA Aging Cell Repository', 'page': 'https://catalog.coriell.org/1/NIA'},
    'NIGMS': {'columns': column_labels, 'file': 'NIGMS.csv', 'id': 'NIGMS', 'label': 'NIGMS Human Genetic Cell Repository', 'page': 'https://catalog.coriell.org/1/NIGMS'},
    'NINDS': {'columns': column_labels, 'file': 'NINDS.csv', 'id': 'NINDS', 'label': 'NINDS Human Genetics DNA and Cell line Repository', 'page': 'https://catalog.coriell.org/1/NINDS'}
}
(Every 'columns' value is the same 25-item list as column_labels.)

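All four catalogs share the same column list, so one reader can serve every file. This is a minimal sketch that assumes each CSV starts with a header row whose names match column_labels exactly; the header check and the read_catalog helper are illustrative, not the dipper parser.

```python
import csv
from typing import Dict, Iterable, Iterator, List

COLUMN_LABELS: List[str] = [
    'catalog_id', 'description', 'omim_num', 'sample_type',
    'cell_line_available', 'dna_instock', 'dna_ref', 'gender', 'age',
    'race', 'ethnicity', 'affected', 'karyotype', 'relprob', 'mutation',
    'gene', 'fam', 'collection', 'url', 'cat_remark', 'pubmed_ids',
    'fammember', 'variant_id', 'dbsnp_id', 'species']


def read_catalog(lines: Iterable[str], columns: List[str] = COLUMN_LABELS) -> Iterator[Dict[str, str]]:
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty file: nothing to yield
    # Assumption: the header names match the expected columns; a mismatch
    # suggests the catalog layout has changed.
    if [name.strip() for name in header] != list(columns):
        raise ValueError("unexpected header; the catalog layout may have changed")
    for row in reader:
        if not row:
            continue  # skip blank lines
        yield dict(zip(columns, row))
```

Usage: open a catalog file with newline="" and pass the file object as lines; each yielded dict maps a column label such as 'catalog_id' or 'omim_num' to that row's value.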
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.
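A source-specific override would typically collect its test cases into a unittest.TestSuite. The sketch below is hypothetical: the CoriellSmokeTest case and the get_test_suite function are illustrative stand-ins, and the real test suite for this source may be organized differently.

```python
import unittest

# Catalog IDs taken from the files attribute of this class.
CATALOG_IDS = ("NHGRI", "NIA", "NIGMS", "NINDS")


class CoriellSmokeTest(unittest.TestCase):
    # Hypothetical check: each catalog maps to "<id>.csv".
    def test_catalog_files(self):
        files = {cid: f"{cid}.csv" for cid in CATALOG_IDS}
        self.assertEqual(files["NIGMS"], "NIGMS.csv")


def get_test_suite() -> unittest.TestSuite:
    # Gather the source's TestCase classes into one suite.
    return unittest.TestLoader().loadTestsFromTestCase(CoriellSmokeTest)
```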

parse(limit=None)

Abstract method that parses all data from an external resource that was fetched by fetch(). This should be overridden by subclasses. Returns: None

test_lines = ['ND02380', 'ND02381', 'ND02383', 'ND02384', 'GM17897', 'GM17898', 'GM17896', 'GM17944', 'GM17945', 'ND00055', 'ND00094', 'ND00136', 'GM17940', 'GM17939', 'GM20567', 'AG02506', 'AG04407', 'AG07602AG07601', 'GM19700', 'GM19701', 'GM19702', 'GM00324', 'GM00325', 'GM00142', 'NA17944', 'AG02505', 'GM01602', 'GM02455', 'AG00364', 'GM13707', 'AG00780']
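The limit argument of parse() and the test_lines list suggest a simple row-selection step. This is a minimal sketch under stated assumptions: the select_rows helper is hypothetical, rows are dicts keyed by column label, and applying the test-id filter before the limit is a choice made here, not something the documentation specifies.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, Optional, Sequence


def select_rows(
    rows: Iterable[Dict[str, str]],
    limit: Optional[int] = None,
    test_ids: Optional[Sequence[str]] = None,
) -> Iterator[Dict[str, str]]:
    # In test mode, keep only rows whose catalog_id is one of the test ids
    # (for example, the values in test_lines).
    if test_ids is not None:
        wanted = set(test_ids)
        rows = (row for row in rows if row["catalog_id"] in wanted)
    # limit=None means no cap; otherwise stop after `limit` rows.
    return islice(rows, limit)
```

Usage: select_rows(rows, limit=10, test_ids=test_lines) yields at most ten rows, all drawn from the test cell lines.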