Welcome to Dipper’s documentation!

Dipper is a Python package to generate RDF triples from common scientific resources. Dipper includes subpackages and modules to create graphical models of this data, including:

  • Models package for generating common sets of triples, including common OWL axioms, complex genotypes, associations, evidence and provenance models.
  • Graph package for building graphs with RDFLib or for streaming N-Triples
  • Source package containing fetchers and parsers that interface with remote databases and web services

Getting started

Installing, running, and the basics

Installation

Dipper requires Python version 3.5 or higher.

Install with pip:

pip install dipper

Development version

The development version can be pulled from GitHub.

pip3 install git+https://github.com/monarch-initiative/dipper.git

Building locally

git clone https://github.com/monarch-initiative/dipper.git
cd dipper
pip install .

Alternatively, a subset of source-specific requirements may be installed. To install the core requirements:

pip install -r requirements.txt

To install source-specific requirements, use the requirements/ directory, for example:

pip install -r requirements/mgi.txt

To download requirements for all sources:

pip install -r requirements/all-sources.txt

Getting started with Dipper

This guide assumes you have already installed dipper. If not, then follow the steps in the Installation section.

Command line

You can run the code by supplying a list of one or more sources on the command line. For example:

dipper-etl.py --sources impc,hpoa

You can also try things out by supplying a limit, which processes only the first N rows or data elements:

dipper-etl.py --sources hpoa --limit 100

Other command line parameters are explained if you request help:

dipper-etl.py --help
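Conceptually, --limit caps how many rows a parser consumes from each input. The sketch below illustrates that idea with the standard library only; read_rows is a hypothetical helper (not part of Dipper) and the sample rows are made up.

```python
import csv
import io
from itertools import islice


def read_rows(fh, limit=None):
    """Yield tab-separated rows, stopping after `limit` rows if one is given.

    Hypothetical helper illustrating the idea behind --limit.
    """
    reader = csv.reader(fh, delimiter='\t')
    # islice(reader, None) yields every row; islice(reader, n) stops after n
    return islice(reader, limit)


sample = io.StringIO("row1\tvalueA\nrow2\tvalueB\nrow3\tvalueC\n")
for row in read_rows(sample, limit=2):
    print(row)
```

Using a lazy iterator means rows beyond the limit are never read, which is what makes a small limit a cheap way to smoke-test a source.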

Notebooks

We provide Jupyter Notebooks to illustrate the functionality of the Python library. These can also be used interactively.

See the Notebooks section for more details.

Building models

This code example shows some of the basics of building RDF graphs using the model API:

import pandas as pd
from dipper.graph.RDFGraph import RDFGraph
from dipper.models.Model import Model

columns = ['variant', 'variant_label', 'variant_type',
           'phenotype', 'relation', 'source', 'evidence', 'dbxref']

data = [
    ['ClinVarVariant:254143', 'C326F', 'SO:0000694',
     'HP:0000504', 'RO:0002200', 'PMID:12503095', 'ECO:0000220',
     'dbSNP:886037891']
]

# Initialize graph and model
graph = RDFGraph()
model = Model(graph)

# Load the example rows into a DataFrame
dataframe = pd.DataFrame(data=data, columns=columns)

for _, row in dataframe.iterrows():
    # Add the triple ClinVarVariant:254143 RO:0002200 HP:0000504
    # RO:0002200 is the has_phenotype relation
    # HP:0000504 is the phenotype term for this variant
    model.addTriple(row['variant'], row['relation'], row['phenotype'])

    # The addLabel method adds a label using the rdfs:label relation
    model.addLabel(row['variant'], row['variant_label'])

    # addType makes the variant an individual of a class,
    # in this case SO:0000694 'SNP'
    model.addType(row['variant'], row['variant_type'])

    # addXref uses the relation OIO:hasDbXref
    model.addXref(row['variant'], row['dbxref'])

# Serialize the graph as turtle once, after all rows have been added.
# With rdflib < 6 serialize() returns bytes; with rdflib >= 6 it returns
# a str, so drop the .decode() call.
print(graph.serialize(format='turtle').decode("utf-8"))

For more information see the Working with the model API section.

Notebooks

Jupyter notebook examples

We use Jupyter Notebooks for our tutorials.

Available tutorials include:

Running jupyter locally

Follow the instructions for installing from GitHub in the Installation section. Then start a notebook browser with:

pip install jupyter
PYTHONPATH=. jupyter notebook ./docs/notebooks

Downloads

RDF

The dipper output is quality checked and released on a regular basis. The latest release can be found here:

The output from our development branch is made available here (it may contain errors):

TSV

TSV downloads for common queries can be found here:

Neo4J

A dump of our Neo4J database that includes the output from dipper plus various ontologies:

A public version can be accessed via the SciGraph REST API:

Ingest status

We use Jenkins to periodically build each source. A dashboard containing the current status of each ingest can be found here:

Applications

Monarch Initiative

The Monarch application is powered in part by Dipper:

Owlsim

Annotations loaded into Owlsim are from the Dipper/SciGraph pipeline:

Deeper into Dipper

A look into the structure of the codebase and how to write ingests

Working with graphs

The Dipper graph package provides two graph implementations: an RDFGraph, which is an extension of the RDFLib [1] Graph [2], and a StreamedGraph, which prints triples to standard out in the N-Triples format.

RDFGraphs

The RDFGraph class reads the curie_map.yaml file and converts strings formatted as CURIEs to RDFLib URIRefs. Triples are added via the addTriple method, for example:

from dipper.graph.RDFGraph import RDFGraph

graph = RDFGraph()
graph.addTriple('foaf:John', 'foaf:knows', 'foaf:Joseph')

The graph can then be serialized in a variety of formats using the serialize method inherited from the parent RDFLib graph class [3]:

from dipper.graph.RDFGraph import RDFGraph

graph = RDFGraph()
graph.addTriple('foaf:John', 'foaf:knows', 'foaf:Joseph')
# With rdflib < 6 serialize() returns bytes; with rdflib >= 6, drop .decode()
print(graph.serialize(format='turtle').decode("utf-8"))

# Or write to file
graph.serialize(destination="/path/to/output.ttl", format='turtle')

Prints:

@prefix OBO: <http://purl.obolibrary.org/obo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

foaf:John foaf:knows foaf:Joseph .
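The conversion from CURIE to full IRI is what turns foaf:John into <http://xmlns.com/foaf/0.1/John>. The sketch below shows the idea with the standard library, assuming a small hand-copied subset of the curie_map and the curie_regexp pattern listed in the API docs; expand_curie is a hypothetical helper, not Dipper's own conversion code (which lives in RDFGraph and its CurieUtil).

```python
import re

# Pattern copied from the curie_regexp attribute of dipper.graph.Graph.Graph
CURIE_REGEXP = re.compile(
    r'^[a-zA-Z_]?[a-zA-Z_0-9-]*:[A-Za-z0-9_][A-Za-z0-9_.-]*[A-Za-z0-9_]*$')

# Small subset of the curie_map shown in the API docs
CURIE_MAP = {
    'foaf': 'http://xmlns.com/foaf/0.1/',
    'HP': 'http://purl.obolibrary.org/obo/HP_',
    'RO': 'http://purl.obolibrary.org/obo/RO_',
}


def expand_curie(curie, curie_map=CURIE_MAP):
    """Expand a CURIE such as 'foaf:John' into a full IRI string."""
    if not CURIE_REGEXP.match(curie):
        raise ValueError('not a CURIE: {!r}'.format(curie))
    prefix, local_id = curie.split(':', 1)
    return curie_map[prefix] + local_id


print(expand_curie('foaf:John'))   # http://xmlns.com/foaf/0.1/John
print(expand_curie('HP:0000504'))  # http://purl.obolibrary.org/obo/HP_0000504
```

Note that a full IRI such as http://example.org/x does not match the pattern, because the character after the colon must be a letter, digit or underscore.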

When an object is a literal, set the object_is_literal parameter to True:

from dipper.graph.RDFGraph import RDFGraph

graph = RDFGraph()
graph.addTriple('foaf:John', 'rdfs:label', 'John', object_is_literal=True)

Literal types can also be passed into the method:

from dipper.graph.RDFGraph import RDFGraph

graph = RDFGraph()
graph.addTriple(
    'foaf:John', 'foaf:age', 12,
    object_is_literal=True, literal_type="xsd:integer"
)
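At the RDF level, object_is_literal and literal_type decide whether the object is written as a plain or a datatyped literal. The sketch below renders both forms in N-Triples syntax; ntriples_literal is a hypothetical helper for illustration, handles only the xsd: prefix, and is not Dipper's serializer.

```python
XSD = 'http://www.w3.org/2001/XMLSchema#'


def ntriples_literal(value, datatype_curie=None):
    """Render a value as an N-Triples literal, optionally typed with xsd:."""
    # Escape backslashes first, then quotes and line breaks
    lexical = (str(value).replace('\\', '\\\\').replace('"', '\\"')
               .replace('\n', '\\n').replace('\r', '\\r'))
    if datatype_curie is None:
        return '"{}"'.format(lexical)
    prefix, local = datatype_curie.split(':', 1)
    if prefix != 'xsd':
        raise ValueError('this sketch only handles the xsd prefix')
    return '"{}"^^<{}{}>'.format(lexical, XSD, local)


print(ntriples_literal('John'))             # "John"
print(ntriples_literal(12, 'xsd:integer'))  # "12"^^<http://www.w3.org/2001/XMLSchema#integer>
```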

StreamedGraphs

StreamedGraphs print triples as they are processed by the addTriple method, which is useful for large sources. There is no checking for duplicate triples, so the output should be sorted and deduplicated afterwards. For example:

from dipper.graph.StreamedGraph import StreamedGraph

graph = StreamedGraph()
graph.addTriple('foaf:John', 'foaf:knows', 'foaf:Joseph')

Prints:

<http://xmlns.com/foaf/0.1/John> <http://xmlns.com/foaf/0.1/knows> <http://xmlns.com/foaf/0.1/Joseph> .
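Because a StreamedGraph does not check for duplicates, its output needs a sort-and-deduplicate pass. A minimal in-memory sketch is below; sort_unique is a hypothetical helper, and the example lines are made up in the same format as the output above.

```python
def sort_unique(lines):
    """Return the distinct, non-blank N-Triples lines in sorted order."""
    return sorted({line.rstrip('\n') for line in lines if line.strip()})


# The same triple streamed twice, as can happen across input records
streamed = [
    '<http://xmlns.com/foaf/0.1/John> <http://xmlns.com/foaf/0.1/knows> '
    '<http://xmlns.com/foaf/0.1/Joseph> .\n',
    '<http://xmlns.com/foaf/0.1/Anne> <http://xmlns.com/foaf/0.1/knows> '
    '<http://xmlns.com/foaf/0.1/John> .\n',
    '<http://xmlns.com/foaf/0.1/John> <http://xmlns.com/foaf/0.1/knows> '
    '<http://xmlns.com/foaf/0.1/Joseph> .\n',
]
for triple in sort_unique(streamed):
    print(triple)
```

This holds every line in memory, which can defeat the purpose of streaming for very large outputs; in that case the standard `sort -u output.nt > output.sorted.nt` performs the same step with an on-disk sort.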

Working with the model API

The model package provides classes for building common sets of triples based on our modeling patterns.

For an example see the notebook on this topic: Building graphs with the model API

Basics

The model class provides methods for building common RDF and OWL statements.

For a list of methods, see the API docs.

Building associations

We use the RDF Reification [1] pattern to create ternary statements, for example, adding frequency data to phenotype-to-disease associations. We use the Open Biomedical Association ontology [2] to reify statements, and the SEPIO ontology to add evidence and provenance.

For a list of classes and methods, see the API docs.
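To make the reification idea concrete, the sketch below builds the triples for one association as plain (subject, predicate, object) tuples. It assumes OBAN-style terms (OBAN:association and the OBAN:association_has_subject/predicate/object properties), keeps the direct triple alongside the association node, and uses the :frequencyOfPhenotype term that appears in Dipper's global term table. It is an illustration of the pattern, not the exact output of Dipper's association classes, and the :disease1 and :association1 identifiers are made up.

```python
def reified_association(assoc_id, subject, predicate, obj, extra=()):
    """Return (subject, predicate, object) tuples for one reified association.

    The direct triple is kept, and a separate association node repeats its
    subject, predicate and object so that further statements (frequency,
    evidence, provenance) can be attached to the association itself.
    """
    triples = [
        (subject, predicate, obj),
        (assoc_id, 'rdf:type', 'OBAN:association'),
        (assoc_id, 'OBAN:association_has_subject', subject),
        (assoc_id, 'OBAN:association_has_predicate', predicate),
        (assoc_id, 'OBAN:association_has_object', obj),
    ]
    # Extra (predicate, object) pairs are attached to the association node
    triples.extend((assoc_id, p, o) for p, o in extra)
    return triples


# RO:0002200 is has_phenotype; HP:0040283 is an HPO frequency term
for triple in reified_association(
        ':association1', ':disease1', 'RO:0002200', 'HP:0000504',
        extra=[(':frequencyOfPhenotype', 'HP:0040283')]):
    print(triple)
```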

Building genotypes

We use the GENO ontology [4] to build complex genotypes and their parts.

For a list of methods, see the API docs.

GENO docs: The Genotype Ontology (GENO)
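As a simplified illustration of genotype parts, the sketch below describes one variant single locus complement (VSLC) as tuples, using GENO identifiers taken from Dipper's global term table in the API docs (GENO:0000030, GENO:0000382, GENO:0000385, GENO:0000608, GENO:0000135). It is an assumption-laden sketch of the general shape, not necessarily the exact triples Dipper's genotype model emits; vslc_triples and the :vslc1, :allele1 and :wildtype1 identifiers are hypothetical.

```python
# GENO identifiers as listed in Dipper's global term table (API docs)
VSLC = 'GENO:0000030'             # variant single locus complement
HAS_VARIANT_PART = 'GENO:0000382'
HAS_REFERENCE_PART = 'GENO:0000385'
HAS_ZYGOSITY = 'GENO:0000608'
HETEROZYGOUS = 'GENO:0000135'


def vslc_triples(vslc_id, allele1, allele2, zygosity, allele2_is_reference=False):
    """Sketch the parts of one variant single locus complement (VSLC).

    allele1 is treated as a variant allele; allele2 is linked as a
    reference part when allele2_is_reference is True.
    """
    allele2_rel = HAS_REFERENCE_PART if allele2_is_reference else HAS_VARIANT_PART
    return [
        (vslc_id, 'rdf:type', VSLC),
        (vslc_id, HAS_VARIANT_PART, allele1),
        (vslc_id, allele2_rel, allele2),
        (vslc_id, HAS_ZYGOSITY, zygosity),
    ]


# One variant allele paired with a wild-type allele
for triple in vslc_triples(':vslc1', ':allele1', ':wildtype1',
                           HETEROZYGOUS, allele2_is_reference=True):
    print(triple)
```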

Building complex evidence and provenance graphs

We use the SEPIO ontology to build complex evidence and provenance. For an example see the IMPC source ingest.

For a list of methods, see the API docs for evidence and provenance.

SEPIO docs: The Scientific Evidence and Provenance Information Ontology (SEPIO)

Writing ingests with the source API

Overview

Although it is not required for writing an ingest, we provide a Source parent class that can be extended to reuse common functionality in each ingest.

To create a new ingest using this method, first extend the Source class.

If the source contains flat files, include a files dictionary with this structure:

files = {
    'somekey': {
        'file': 'filename.tsv',
        'url': 'http://example.org/filename.tsv'
    },
    ...
}

For example:

from dipper.sources.Source import Source


class TPO(Source):
    """
    The ToxicoPhenomicOmicsDB contains data on ...
    """

    files = {
        'genes': {
            'file': 'genes.tsv',
            'url': 'http://example.org/genes.tsv'
        }
    }
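Each entry in files pairs a local file name with its remote URL. The sketch below shows how such a dictionary can be walked to produce (URL, local path) download targets; download_plan and the raw/tpo directory are hypothetical, and the Source parent class performs the actual downloads.

```python
import os

# Same shape as the files dictionary above (example values)
files = {
    'genes': {
        'file': 'genes.tsv',
        'url': 'http://example.org/genes.tsv'
    }
}


def download_plan(files, rawdir):
    """Map each file key to a (remote URL, local path under rawdir) pair."""
    return {
        key: (entry['url'], os.path.join(rawdir, entry['file']))
        for key, entry in files.items()
    }


print(download_plan(files, 'raw/tpo'))
```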

Initializing the class

Each source class takes graph_type (string) and are_bnodes_skolemized (boolean) parameters, which are used to initialize a graph object in the Source constructor.

Note: In the future this may be adjusted so that a graph object is passed into each source.

For example:

def __init__(self, graph_type, are_bnodes_skolemized):
    super().__init__(graph_type, are_bnodes_skolemized, 'TPO')
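The are_bnodes_skolemized flag controls whether blank nodes are replaced by stable IRIs (skolemized). The sketch below shows the general idea, assuming the BNODE prefix from the curie_map in the API docs; the /.well-known/genid/ path follows the RDF 1.1 skolemization convention. skolemize is a hypothetical helper, not the code of Dipper's skolemizeBlankNode method.

```python
# BNODE prefix copied from the curie_map shown in the API docs
BNODE_BASE = 'https://monarchinitiative.org/.well-known/genid/'


def skolemize(blank_node_label):
    """Turn a blank node label such as '_:b0' into a stable IRI string."""
    if not blank_node_label.startswith('_:'):
        raise ValueError('expected a blank node label such as _:b0')
    return BNODE_BASE + blank_node_label[2:]


print(skolemize('_:b0'))  # https://monarchinitiative.org/.well-known/genid/b0
```

Skolemized IRIs keep the same identifier across separate serializations, which makes merging output from different sources more predictable than with anonymous blank nodes.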

Writing the fetcher

This method is intended to fetch data from the remote locations (if it is newer than the local copy).

Extend the parent fetch method. If the remote file has already been downloaded, the fetch method checks the remote headers to see whether it has been updated. For sources not served over HTTP, this method may need to be overridden, as in the Bgee source.

For example:

def fetch(self, is_dl_forced=False):
    """
    Fetches files from TPO

    :param is_dl_forced (bool): Force download
    :return None
    """
    self.get_files(is_dl_forced)
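The header check described above can be sketched with the standard library. This assumes the server sends a Last-Modified header; remote_is_newer and fetch_last_modified are hypothetical helpers for illustration, not Dipper's implementation. The demo uses a local temporary file, so it needs no network access.

```python
import os
import tempfile
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def remote_is_newer(last_modified_header, local_path):
    """True if there is no local copy or the remote Last-Modified is newer."""
    if not os.path.exists(local_path):
        return True
    remote_time = parsedate_to_datetime(last_modified_header)
    if remote_time.tzinfo is None:  # treat zone-less dates as UTC
        remote_time = remote_time.replace(tzinfo=timezone.utc)
    local_time = datetime.fromtimestamp(os.path.getmtime(local_path),
                                        tz=timezone.utc)
    return remote_time > local_time


def fetch_last_modified(url):
    """Send a HEAD request and return the Last-Modified header, or None."""
    request = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(request) as response:
        return response.headers.get('Last-Modified')


# Demo: a local file stamped with a fixed modification time
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    local_copy = tmp.name
os.utime(local_copy, (1000000000, 1000000000))  # 2001-09-09 UTC
print(remote_is_newer('Wed, 21 Oct 2015 07:28:00 GMT', local_copy))  # True
print(remote_is_newer('Mon, 01 Jan 1990 00:00:00 GMT', local_copy))  # False
```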

Writing the parser

Typically, a parser loops through the files obtained by the fetch method. The goal is to process each file minimally, adding classes and individuals as necessary, and adding triples to the source’s graph.

For example:

def parse(self, limit=None):
    """
    Parses genes from TPO

    :param limit (int, optional) limit the number of rows processed
    :return None
    """
    # logger is assumed to be defined at module level,
    # e.g. logger = logging.getLogger(__name__)
    if limit is not None:
        logger.info("Only parsing first %d rows", limit)

    # Open the file downloaded by fetch(); the with block closes it
    with open('/'.join((self.rawdir, self.files['genes']['file'])), 'r') as fh:
        # Parse file
        self._add_gene_toxicology(fh, limit)

Considerations when writing a parser

There are certain conventions that we follow when parsing data:

1. Genes are a special case of genomic feature: they are added as OWL classes, whereas all other genomic features are added as individuals of an OWL class.

2. If a source references an external identifier, assume that it has been processed in another source script, and add only the identifier (but not its label) within this source’s file. This helps prevent label collisions caused by slightly different versions of the source data when integrating downstream.

3. You can instantiate a class or individual as many times as you want; they will get merged in the graph and will only show up once in the resulting output.
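Conventions 1 and 3 can be illustrated with plain triples. In the sketch below a Python set stands in for the graph, so adding the same triple twice stores it once. add_gene and add_feature are hypothetical helpers, the identifiers are example values, and Dipper's Model methods may emit additional triples.

```python
def add_gene(graph, gene_id):
    """Convention 1: a gene is added as an OWL class."""
    graph.add((gene_id, 'rdf:type', 'owl:Class'))


def add_feature(graph, feature_id, feature_class):
    """Other genomic features are individuals typed by an OWL class."""
    graph.add((feature_id, 'rdf:type', 'owl:NamedIndividual'))
    graph.add((feature_id, 'rdf:type', feature_class))


# A set stands in for the graph: identical triples are stored only once
graph = set()
add_gene(graph, 'NCBIGene:1')
add_gene(graph, 'NCBIGene:1')  # Convention 3: adding it again is harmless
add_feature(graph, 'ClinVarVariant:254143', 'SO:0000694')
print(len(graph))  # 3
```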

Testing ingests

Unit tests

Unit-style tests can be written by mocking source classes (or specific functions) and testing single functions. The test_graph_equality function tests graph equality given a string formatted as headless (no prefixes) Turtle and a graph object. Most dipper methods are not pure functions and rely on side effects on a graph object, so it is best to reset the graph object before any testing logic, e.g.:

from dipper.graph.RDFGraph import RDFGraph
from dipper.utils.TestUtils import TestUtils

# Inside a unittest.TestCase method; `source` is the (mocked) source under test
source.graph = RDFGraph(True)  # Reset graph
test_util = TestUtils()
source.run_some_function()
expected_triples = """
    foaf:person1 foaf:knows foaf:person2 .
"""
self.assertTrue(test_util.test_graph_equality(
    expected_triples, source.graph))
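The mocking approach can be shown without Dipper installed. In the sketch below ToySource is a hypothetical stand-in for a source whose graph is a set of triples; unittest.mock.patch.object replaces a method that would otherwise call a remote service, and the graph is reset in setUp before each test.

```python
import unittest
from unittest import mock


class ToySource:
    """Hypothetical stand-in for a Dipper source; graph is a set of triples."""

    def __init__(self):
        self.graph = set()

    def lookup_friend(self, person):
        raise RuntimeError('would call a remote service')

    def add_friendship(self, person):
        # Side effect on self.graph, like most Dipper parsing methods
        self.graph.add((person, 'foaf:knows', self.lookup_friend(person)))


class ToySourceTest(unittest.TestCase):
    def setUp(self):
        self.source = ToySource()
        self.source.graph = set()  # reset the graph before each test

    def test_add_friendship(self):
        with mock.patch.object(self.source, 'lookup_friend',
                               return_value='foaf:person2'):
            self.source.add_friendship('foaf:person1')
        self.assertEqual(self.source.graph,
                         {('foaf:person1', 'foaf:knows', 'foaf:person2')})
```

Saved as a module, this runs with `python -m unittest <module_name>`.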

Integration tests

Integration tests can be run by generating a file that contains a subset of a source’s data in the same format, running it through the source.parse() method, serializing the graph, and then testing the serialized output with other code or in a database.

You may see testing code within source classes, but these tests will be deleted or refactored and moved to the test directory.
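A subset file for integration testing can be cut from a full download while keeping the original format. The sketch below copies a header line plus the first N data rows; write_subset and the generated genes.tsv sample are hypothetical, for illustration only.

```python
import itertools
import os
import tempfile


def write_subset(src_path, dest_path, n_rows, has_header=True):
    """Copy an optional header line plus the first n_rows lines to dest_path."""
    with open(src_path) as src, open(dest_path, 'w') as dest:
        if has_header:
            dest.write(src.readline())
        dest.writelines(itertools.islice(src, n_rows))


# Demo with a generated tab-separated file
workdir = tempfile.mkdtemp()
full = os.path.join(workdir, 'genes.tsv')
with open(full, 'w') as fh:
    fh.write('gene\tphenotype\n')
    for i in range(1000):
        fh.write('gene{}\tpheno{}\n'.format(i, i))

subset = os.path.join(workdir, 'genes_subset.tsv')
write_subset(full, subset, n_rows=10)
with open(subset) as fh:
    subset_lines = fh.readlines()
print(len(subset_lines))  # 11: header plus 10 rows
```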

Configuring dipper with keys and passwords

Add private configuration parameters to your private conf.yaml file. Examples of items to put into the config include:

  • database connection parameters (in the “dbauth” object)
  • FTP login credentials
  • API keys (in the “keys” object)

These are organized such that within any object (dbauth, keys, etc), they are keyed again by the source’s name.

Here is an example:

{
    "keys": {
        "omim": "myomimkey1234",
        "omimftp": "myomimftpkey5678",
        "ncbi": "myncbikey"
    },
    "dbauth": {
        "mgi": {
            "user": "mymgiuser",
            "password": "mymgipw",
            "host": "mgi-adhoc.jax.org",
            "database": "mgd",
            "port": 5432
        }
    }
}

This file must be placed in the dipper package directory and named conf.yaml. If building locally, this is the dipper/dipper/ directory. If installed with pip, this is the path/to/env/lib/python3.x/site-packages/dipper/ directory.
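Settings are looked up first by object (keys, dbauth) and then by source name. The sketch below shows that lookup on an abridged copy of the example config. It parses with json only to stay within the standard library, which works because the example is JSON-compatible (JSON is valid YAML); a real conf.yaml should be read with a YAML parser such as PyYAML's yaml.safe_load. get_source_setting is a hypothetical helper.

```python
import json

# Abridged copy of the example config above
CONF_TEXT = '''
{
    "keys": {"omim": "myomimkey1234", "ncbi": "myncbikey"},
    "dbauth": {"mgi": {"user": "mymgiuser", "host": "mgi-adhoc.jax.org",
                       "port": 5432}}
}
'''


def get_source_setting(conf, section, source):
    """Return conf[section][source], or None if it is not configured."""
    return conf.get(section, {}).get(source)


conf = json.loads(CONF_TEXT)
print(get_source_setting(conf, 'keys', 'omim'))           # myomimkey1234
print(get_source_setting(conf, 'dbauth', 'mgi')['host'])  # mgi-adhoc.jax.org
```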

Schemas

Although RDF is inherently schemaless, we aim to construct consistent models across sources. This allows us to build source agnostic queries and bridge data across sources.

The dipper schemas are documented as directed graphs. Examples can be found in the ingest artifacts repo.

Some ontologies contain documentation on how to describe data using the classes and properties defined in the ontology:

While not yet implemented, in the future we plan on defining our schemas and constraints using the BioLink model specification.

The Cypher queries that we use to cache inferred and direct relationships between entities are stored in GitHub.

For developers

API Docs

dipper package

Subpackages
dipper.graph package
Submodules
dipper.graph.Graph module
class dipper.graph.Graph.Graph

Bases: object

Graph class, used by RDFGraph and StreamedGraph

addTriple(subject_id, predicate_id, obj, object_is_literal=False, literal_type=None, subject_category=None, object_category=None)
curie_regexp = re.compile('^[a-zA-Z_]?[a-zA-Z_0-9-]*:[A-Za-z0-9_][A-Za-z0-9_.-]*[A-Za-z0-9_]*$')
serialize(**kwargs)
skolemizeBlankNode(curie)
dipper.graph.RDFGraph module
class dipper.graph.RDFGraph.RDFGraph(are_bnodes_skized=True, identifier=None)

Bases: dipper.graph.Graph.Graph, rdflib.graph.ConjunctiveGraph

Extends RDFLib's ConjunctiveGraph. The goal of this class is to wrap the creation of triples and manage the creation of URIRefs, BNodes, and literals from an input CURIE.

addTriple(subject_id, predicate_id, obj, object_is_literal=None, literal_type=None, subject_category=None, object_category=None)
bind_all_namespaces()

Results in the RDF @prefix directives for every ingest being added to this ingest.
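The @prefix directives in question have the form shown in the Turtle output in Working with graphs. The sketch below renders them from a CURIE map subset; prefix_directives is a hypothetical helper showing what the directives look like, not the code of bind_all_namespaces, which binds the namespaces so the RDFLib serializer emits them.

```python
def prefix_directives(curie_map):
    """Render Turtle @prefix lines, sorted by prefix, from a CURIE map."""
    return [
        '@prefix {}: <{}> .'.format(prefix, base)
        for prefix, base in sorted(curie_map.items())
    ]


for line in prefix_directives({
        'foaf': 'http://xmlns.com/foaf/0.1/',
        'OBO': 'http://purl.obolibrary.org/obo/'}):
    print(line)
# @prefix OBO: <http://purl.obolibrary.org/obo/> .
# @prefix foaf: <http://xmlns.com/foaf/0.1/> .
```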

curie_map = {'': 'https://monarchinitiative.org/', 'APB': 'http://pb.apf.edu.au/phenbank/strain.html?id=', 'APO': 'http://purl.obolibrary.org/obo/APO_', 'AQTLPub': 'https://www.animalgenome.org/cgi-bin/QTLdb/BT/qabstract?PUBMED_ID=', 'AQTLTrait': 'http://identifiers.org/animalqtltrait/', 'AspGD': 'http://www.aspergillusgenome.org/cgi-bin/locus.pl?dbid=', 'AspGD_REF': 'http://www.aspergillusgenome.org/cgi-bin/reference/reference.pl?dbid=', 'BFO': 'http://purl.obolibrary.org/obo/BFO_', 'BGD': 'http://bovinegenome.org/genepages/btau40/genes/', 'BIOGRID': 'http://thebiogrid.org/', 'BNODE': 'https://monarchinitiative.org/.well-known/genid/', 'BT': 'http://c.biothings.io/#', 'CCDS': 'http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=', 'CGNC': 'http://birdgenenames.org/cgnc/GeneReport?id=', 'CHEBI': 'http://purl.obolibrary.org/obo/CHEBI_', 'CHR': 'http://purl.obolibrary.org/obo/CHR_', 'CID': 'http://pubchem.ncbi.nlm.nih.gov/compound/', 'CL': 'http://purl.obolibrary.org/obo/CL_', 'CLO': 'http://purl.obolibrary.org/obo/CLO_', 'CMMR': 'http://www.cmmr.ca/order.php?t=m&id=', 'CMO': 'http://purl.obolibrary.org/obo/CMO_', 'COHD': 'http://purl.obolibrary.org/obo/COHD_', 'COSMIC': 'http://cancer.sanger.ac.uk/cosmic/mutation/overview?id=', 'ClinVar': 'http://www.ncbi.nlm.nih.gov/clinvar/', 'ClinVarSubmitters': 'http://www.ncbi.nlm.nih.gov/clinvar/submitters/', 'ClinVarVariant': 'http://www.ncbi.nlm.nih.gov/clinvar/variation/', 'ComplexPortal': 'https://www.ebi.ac.uk/complexportal/complex/', 'Coriell': 'https://catalog.coriell.org/0/Sections/Search/Sample_Detail.aspx?Ref=', 'CoriellCollection': 'https://catalog.coriell.org/1/', 'CoriellFamily': 'https://catalog.coriell.org/0/Sections/BrowseCatalog/FamilyTypeSubDetail.aspx?fam=', 'CoriellIndividual': 'https://catalog.coriell.org/Search?q=', 'DC_CL': 'http://purl.obolibrary.org/obo/DC_CL', 'DECIPHER': 'https://decipher.sanger.ac.uk/syndrome/', 'DOI': 'http://dx.doi.org/', 'DOID': 
'http://purl.obolibrary.org/obo/DOID_', 'DrugBank': 'http://www.drugbank.ca/drugs/', 'EC': 'https://www.enzyme-database.org/query.php?ec=', 'ECO': 'http://purl.obolibrary.org/obo/ECO_', 'EDAM-DATA': 'http://edamontology.org/data_', 'EFO': 'http://www.ebi.ac.uk/efo/EFO_', 'EMAPA': 'http://purl.obolibrary.org/obo/EMAPA_', 'EMMA': 'https://www.infrafrontier.eu/search?keyword=EM:', 'ENSEMBL': 'http://ensembl.org/id/', 'ENVO': 'http://purl.obolibrary.org/obo/ENVO_', 'EOM': 'https://elementsofmorphology.nih.gov/index.cgi?tid=', 'EOM_IMG': 'https://elementsofmorphology.nih.gov/images/terms/', 'ERO': 'http://purl.obolibrary.org/obo/ERO_', 'EcoGene': 'http://ecogene.org/gene/', 'EnsemblGenome': 'http://www.ensemblgenomes.org/id/', 'FBbt': 'http://purl.obolibrary.org/obo/FBbt_', 'FBcv': 'http://purl.obolibrary.org/obo/FBcv_', 'FBdv': 'http://purl.obolibrary.org/obo/FBdv_', 'FDADrug': 'http://www.fda.gov/Drugs/InformationOnDrugs/', 'FlyBase': 'http://flybase.org/reports/', 'GARD': 'http://purl.obolibrary.org/obo/GARD_', 'GENO': 'http://purl.obolibrary.org/obo/GENO_', 'GINAS': 'http://tripod.nih.gov/ginas/app/substance#', 'GO': 'http://purl.obolibrary.org/obo/GO_', 'GO_REF': 'http://www.geneontology.org/cgi-bin/references.cgi#GO_REF:', 'GWAS': 'https://www.ebi.ac.uk/gwas/variants/', 'GenBank': 'http://www.ncbi.nlm.nih.gov/nuccore/', 'Genatlas': 'http://genatlas.medecine.univ-paris5.fr/fiche.php?symbol=', 'GeneReviews': 'http://www.ncbi.nlm.nih.gov/books/', 'HGMD': 'http://www.hgmd.cf.ac.uk/ac/gene.php?gene=', 'HGNC': 'https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:', 'HMDB': 'http://www.hmdb.ca/metabolites/', 'HOMOLOGENE': 'http://www.ncbi.nlm.nih.gov/homologene/', 'HP': 'http://purl.obolibrary.org/obo/HP_', 'HPO': 'http://human-phenotype-ontology.org/', 'HPRD': 'http://www.hprd.org/protein/', 'IAO': 'http://purl.obolibrary.org/obo/IAO_', 'ICD9': 'http://purl.obolibrary.org/obo/ICD9_', 'IMPC': 'https://www.mousephenotype.org/data/genes/', 'IMPC-param': 
'https://www.mousephenotype.org/impress/OntologyInfo?action=list&procID=', 'IMPC-pipe': 'https://www.mousephenotype.org/impress/PipelineInfo?id=', 'IMPC-proc': 'https://www.mousephenotype.org/impress/ProcedureInfo?action=list&procID=', 'ISBN': 'https://monarchinitiative.org/ISBN_', 'ISBN-10': 'https://monarchinitiative.org/ISBN10_', 'ISBN-13': 'https://monarchinitiative.org/ISBN13_', 'IUPHAR': 'http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId=', 'InterPro': 'https://www.ebi.ac.uk/interpro/entry/InterPro/', 'J': 'http://www.informatics.jax.org/reference/J:', 'JAX': 'http://jaxmice.jax.org/strain/', 'KEGG-ds': 'http://purl.obolibrary.org/KEGG-ds_', 'KEGG-hsa': 'http://www.kegg.jp/dbget-bin/www_bget?hsa:', 'KEGG-img': 'http://www.genome.jp/kegg/pathway/map/', 'KEGG-ko': 'http://www.kegg.jp/dbget-bin/www_bget?ko:', 'KEGG-path': 'http://www.kegg.jp/dbget-bin/www_bget?path:', 'LIDA': 'http://sydney.edu.au/vetscience/lida/dogs/search/disorder/', 'LPT': 'http://purl.obolibrary.org/obo/LPT_', 'MA': 'http://purl.obolibrary.org/obo/MA_', 'MEDDRA': 'http://purl.bioontology.org/ontology/MEDDRA/', 'MESH': 'http://id.nlm.nih.gov/mesh/', 'MGI': 'http://www.informatics.jax.org/accession/MGI:', 'MMRRC': 'https://www.mmrrc.org/catalog/sds.php?mmrrc_id=', 'MONARCH': 'https://monarchinitiative.org/MONARCH_', 'MONDO': 'http://purl.obolibrary.org/obo/MONDO_', 'MP': 'http://purl.obolibrary.org/obo/MP_', 'MPATH': 'http://purl.obolibrary.org/obo/MPATH_', 'MPD': 'https://phenome.jax.org/', 'MPD-assay': 'https://phenome.jax.org/db/qp?rtn=views/catlines&keymeas=', 'MPD-strain': 'http://phenome.jax.org/db/q?rtn=strains/details&strainid=', 'MUGEN': 'http://bioit.fleming.gr/mugen/Controller?workflow=ViewModel&expand_all=true&name_begins=model.block&eid=', 'MedGen': 'http://www.ncbi.nlm.nih.gov/medgen/', 'MonarchArchive': 'https://archive.monarchinitiative.org/', 'MonarchData': 'https://data.monarchinitiative.org/ttl/', 'MonarchLogoRepo': 
'https://github.com/monarch-initiative/monarch-ui/blob/master/public/img/sources/', 'NBO': 'http://purl.obolibrary.org/obo/NBO_', 'NCBIAssembly': 'https://www.ncbi.nlm.nih.gov/assembly?term=', 'NCBIBSgene': 'http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=gene&part=', 'NCBIGene': 'https://www.ncbi.nlm.nih.gov/gene/', 'NCBIGenome': 'https://www.ncbi.nlm.nih.gov/genome/', 'NCBIProtein': 'http://www.ncbi.nlm.nih.gov/protein/', 'NCBITaxon': 'http://purl.obolibrary.org/obo/NCBITaxon_', 'NCIMR': 'https://mouse.ncifcrf.gov/available_details.asp?ID=', 'NCIT': 'http://purl.obolibrary.org/obo/NCIT_', 'OAE': 'http://purl.obolibrary.org/obo/OAE_', 'OBA': 'http://purl.obolibrary.org/obo/OBA_', 'OBAN': 'http://purl.org/oban/', 'OBI': 'http://purl.obolibrary.org/obo/OBI_', 'OBO': 'http://purl.obolibrary.org/obo/', 'OMIA': 'https://omia.org/OMIA', 'OMIA-breed': 'https://monarchinitiative.org/model/OMIA-breed:', 'OMIM': 'http://omim.org/entry/', 'OMIMPS': 'http://www.omim.org/phenotypicSeries/', 'ORPHA': 'http://www.orpha.net/ORDO/Orphanet_', 'PAINT_REF': 'http://www.geneontology.org/gene-associations/submission/paint/', 'PANTHER': 'http://www.pantherdb.org/panther/family.do?clsAccession=', 'PATO': 'http://purl.obolibrary.org/obo/PATO_', 'PCO': 'http://purl.obolibrary.org/obo/PCO_', 'PDB': 'http://www.ebi.ac.uk/pdbsum/', 'PMCID': 'http://www.ncbi.nlm.nih.gov/pmc/', 'PMID': 'http://www.ncbi.nlm.nih.gov/pubmed/', 'PR': 'http://purl.obolibrary.org/obo/PR_', 'PW': 'http://purl.obolibrary.org/obo/PW_', 'PomBase': 'https://www.pombase.org/spombe/result/', 'RBRC': 'http://www2.brc.riken.jp/lab/animal/detail.php?brc_no=', 'REACT': 'http://www.reactome.org/PathwayBrowser/#/', 'RGD': 'http://rgd.mcw.edu/rgdweb/report/gene/main.html?id=', 'RGDRef': 'http://rgd.mcw.edu/rgdweb/report/reference/main.html?id=', 'RO': 'http://purl.obolibrary.org/obo/RO_', 'RXCUI': 'http://purl.bioontology.org/ontology/RXNORM/', 'RefSeq': 'http://www.ncbi.nlm.nih.gov/refseq/?term=', 'SCTID': 
'http://purl.obolibrary.org/obo/SCTID_', 'SEPIO': 'http://purl.obolibrary.org/obo/SEPIO_', 'SGD': 'https://www.yeastgenome.org/locus/', 'SGD_REF': 'https://www.yeastgenome.org/reference/', 'SIO': 'http://semanticscience.org/resource/SIO_', 'SMPDB': 'http://smpdb.ca/view/', 'SNOMED': 'http://purl.obolibrary.org/obo/SNOMED_', 'SO': 'http://purl.obolibrary.org/obo/SO_', 'STATO': 'http://purl.obolibrary.org/obo/STATO_', 'SwissProt': 'http://identifiers.org/SwissProt:', 'TAIR': 'https://www.arabidopsis.org/servlets/TairObject?type=locus&id=', 'TrEMBL': 'http://purl.uniprot.org/uniprot/', 'UBERON': 'http://purl.obolibrary.org/obo/UBERON_', 'UCSC': 'ftp://hgdownload.cse.ucsc.edu/goldenPath/', 'UCSCBuild': 'http://genome.ucsc.edu/cgi-bin/hgGateway?db=', 'UMLS': 'http://linkedlifedata.com/resource/umls/id/', 'UNII': 'http://fdasis.nlm.nih.gov/srs/unii/', 'UO': 'http://purl.obolibrary.org/obo/UO_', 'UPHENO': 'http://purl.obolibrary.org/obo/UPHENO_', 'UniProtKB': 'http://identifiers.org/uniprot/', 'VGNC': 'https://vertebrate.genenames.org/data/gene-symbol-report/#!/vgnc_id/', 'VIVO': 'http://vivoweb.org/ontology/core#', 'VT': 'http://purl.obolibrary.org/obo/VT_', 'WBPhenotype': 'http://purl.obolibrary.org/obo/WBPhenotype_', 'WBbt': 'http://purl.obolibrary.org/obo/WBbt_', 'WD_Entity': 'https://www.wikidata.org/wiki/', 'WD_Prop': 'https://www.wikidata.org/wiki/Property:', 'WormBase': 'https://www.wormbase.org/get?name=', 'XAO': 'http://purl.obolibrary.org/obo/XAO_', 'XCO': 'http://purl.obolibrary.org/obo/XCO_', 'XPO': 'http://purl.obolibrary.org/obo/XPO_', 'Xenbase': 'http://identifiers.org/xenbase/', 'ZFA': 'http://purl.obolibrary.org/obo/ZFA_', 'ZFIN': 'http://zfin.org/', 'ZFS': 'http://purl.obolibrary.org/obo/ZFS_', 'ZP': 'http://purl.obolibrary.org/obo/ZP_', 'biolink': 'https://w3id.org/biolink/vocab/', 'catfishQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/IP/qdetails?QTL_ID=', 'cattleQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/BT/qdetails?QTL_ID=', 'chickenQTL': 
'https://www.animalgenome.org/cgi-bin/QTLdb/GG/qdetails?QTL_ID=', 'cito': 'http://purl.org/spar/cito/', 'dbSNP': 'http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=', 'dbSNPIndividual': 'http://www.ncbi.nlm.nih.gov/SNP/snp_ind.cgi?ind_id=', 'dbVar': 'http://www.ncbi.nlm.nih.gov/dbvar/', 'dc': 'http://purl.org/dc/terms/', 'dcat': 'http://www.w3.org/ns/dcat#', 'dctypes': 'http://purl.org/dc/dcmitype/', 'dictyBase': 'http://dictybase.org/gene/', 'faldo': 'http://biohackathon.org/resource/faldo#', 'foaf': 'http://xmlns.com/foaf/0.1/', 'horseQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/EC/qdetails?QTL_ID=', 'miRBase': 'http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=', 'oboInOwl': 'http://www.geneontology.org/formats/oboInOwl#', 'owl': 'http://www.w3.org/2002/07/owl#', 'pav': 'http://purl.org/pav/', 'pigQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/SS/qdetails?QTL_ID=', 'rainbow_troutQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/OM/qdetails?QTL_ID=', 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'rdfs': 'http://www.w3.org/2000/01/rdf-schema#', 'schema': 'http://schema.org/', 'sheepQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/OA/qdetails?QTL_ID=', 'skos': 'http://www.w3.org/2004/02/skos/core#', 'vfb': 'http://virtualflybrain.org/reports/', 'void': 'http://rdfs.org/ns/void#', 'xml': 'http://www.w3.org/XML/1998/namespace', 'xsd': 'http://www.w3.org/2001/XMLSchema#'}
curie_util = <dipper.utils.CurieUtil.CurieUtil object>
fhandle = <_io.TextIOWrapper name='/home/docs/checkouts/readthedocs.org/user_builds/dipper/checkouts/master/dipper/graph/../../translationtable/GLOBAL_TERMS.yaml' mode='r' encoding='UTF-8'>
globaltcid = {':activating': 'activating_mutation', ':all_missense_or_inframe': 'all_missense_or_inframe', ':biallelic': 'biallelic', ':does_not_have_phenotype': 'does_not_have_phenotype', ':frequencyOfPhenotype': 'frequency', ':has_allelic_requirement': 'has_allelic_requirement', ':has_cell_origin': 'has_cell_origin', ':has_drug_response': 'has_drug_response', ':has_functional_consequence': 'has_functional_consequence', ':has_molecular_consequence': 'has_molecular_consequence', ':has_sex_specificity': 'has_sex_specificty', ':increased_gene_dosage': 'increased_gene_dosage', ':monoallelic': 'monoallelic', ':mosaic_genotype': 'mosaic', ':onset': 'onset', ':part_of_contiguous_gene_duplication': 'part_of_contiguous_gene_duplication', 'BFO:0000050': 'part_of', 'BFO:0000051': 'has_part', 'BFO:0000066': 'occurs_in', 'CHEBI:23367': 'molecular entity', 'CHEBI:33695': 'gene_product', 'CHEBI:InChIKey': 'inchi_key', 'CL:0000000': 'cell', 'CL:0000034': 'stem cell', 'CL:0000056': 'myoblast', 'CL:0000057': 'fibroblast', 'CL:0000066': 'epithelial cell', 'CL:0000077': 'mesothelial', 'CL:0000084': 'T cell', 'CL:0000115': 'endothelial cell', 'CL:0000148': 'melanocyte', 'CL:0000192': 'smooth muscle cell', 'CL:0000236': 'B cell', 'CL:0000312': 'keratinocyte', 'CL:0002198': 'oncocyte', 'CL:0002323': 'amniocyte', 'CL:0002570': 'mesenchymal stem cell of adipose', 'CL:0011115': 'precursor cell', 'CLO:0000008': 'cell line repository', 'CLO:0000031': 'cell line', 'CLO:0000220': 'immortal kidney-derived cell line cell', 'CLO:0036934': 'Adipose stromal cell', 'CLO:0036935': 'Amniotic fluid-derived cell line', 'CLO:0036938': 'Tumor-derived cell line', 'CLO:0036939': 'Microcell hybrid', 'CLO:0036940': 'Chorionic villus-derived cell line', 'ECO:0000000': 'evidence', 'ECO:0000001': 'inference from background scientific knowledge', 'ECO:0000005': 'enzyme assay evidence', 'ECO:0000006': 'experimental evidence', 'ECO:0000008': 'expression pattern evidence', 'ECO:0000011': 'genetic interaction 
evidence', 'ECO:0000012': 'functional complementation evidence', 'ECO:0000015': 'mutant phenotype evidence', 'ECO:0000033': 'traceable author statement', 'ECO:0000035': 'no biological data found', 'ECO:0000059': 'experimental phenotypic evidence', 'ECO:0000061': 'quantitative trait analysis evidence', 'ECO:0000068': 'yeast 2-hybrid evidence', 'ECO:0000076': 'far-Western blotting evidence', 'ECO:0000079': 'affinity chromatography evidence', 'ECO:0000080': 'phylogenetic evidence', 'ECO:0000085': 'immunoprecipitation evidence', 'ECO:0000172': 'biochemical trait analysis evidence', 'ECO:0000177': 'genomic context evidence', 'ECO:0000180': 'clinical study evidence', 'ECO:0000200': 'sequence alignment evidence', 'ECO:0000201': 'sequence orthology evidence', 'ECO:0000202': 'match to sequence model evidence', 'ECO:0000213': 'combinatorial evidence used in automatic assertion', 'ECO:0000214': 'biological aspect of descendant evidence', 'ECO:0000220': 'sequencing assay evidence', 'ECO:0000245': 'computational combinatorial evidence used in manual assertion', 'ECO:0000250': 'sequence similarity evidence used in manual assertion', 'ECO:0000269': 'experimental evidence used in manual assertion', 'ECO:0000270': 'expression evidence used in manual assertion', 'ECO:0000303': 'author statement without traceable support used in manual assertion', 'ECO:0000304': 'author statement supported by traceable reference used in manual assertion', 'ECO:0000305': 'curator inference used in manual assertion', 'ECO:0000306': 'inference from background scientific knowledge used in manual assertion', 'ECO:0000311': 'imported information', 'ECO:0000314': 'direct assay evidence used in manual assertion', 'ECO:0000315': 'mutant phenotype evidence used in manual assertion', 'ECO:0000316': 'genetic interaction evidence used in manual assertion', 'ECO:0000318': 'biological aspect of ancestor evidence used in manual assertion', 'ECO:0000320': 'phylogenetic determination of loss of key residues evidence 
used in manual assertion', 'ECO:0000322': 'imported manually asserted information used in automatic assertion', 'ECO:0000323': 'imported automatically asserted information used in automatic assertion', 'ECO:0000324': 'imaging assay evidence', 'ECO:0000353': 'physical interaction evidence used in manual assertion', 'ECO:0000501': 'evidence used in automatic assertion', 'ECO:0001016': 'blood test evidence', 'ECO:0001048': 'FRET', 'ECO:0001823': 'x-ray crystallography evidence', 'ECO:0005611': 'inference from experimental data evidence', 'ECO:0005612': 'inference from phenotype manipulation evidence', 'ECO:0005613': 'inference by association of genotype from phenotype', 'EDAM-DATA:3148': 'gene_family', 'EFO:0000246': 'age', 'EFO:0000689': 'sampling_time', 'EFO:0001799': 'ethnic_group', 'EFO:0003150': 'African American', 'EFO:0003152': 'Asian', 'EFO:0003153': 'Asian Indian', 'EFO:0003154': 'Asian/Pacific Islander', 'EFO:0003156': 'Caucasian', 'EFO:0003157': 'Chinese', 'EFO:0003158': 'Eastern Indian', 'EFO:0003160': 'Filipino', 'EFO:0003164': 'Japanese', 'EFO:0003165': 'Korean', 'EFO:0003169': 'Hispanic', 'EFO:0004561': 'African', 'EFO:0004905': 'induced pluripotent stem cell', 'EFO:0005135': 'strain', 'ENVO:01000254': 'environmental_system', 'ERO:0000006': 'reagent', 'ERO:0000232': 'has_author', 'ERO:0000480': 'has_url', 'ERO:0001984': 'Native American', 'ERO:0002002': 'embryonic stem cell line', 'ERO:0002071': 'Asian, Vietnamese', 'ERO:0002190': 'collection', 'GENO:0000002': 'variant_locus', 'GENO:0000009': 'genomic_variation_complement', 'GENO:0000030': 'variant single locus complement', 'GENO:0000036': 'reference_locus', 'GENO:0000134': 'hemizygous', 'GENO:0000135': 'heterozygous', 'GENO:0000136': 'homozygous', 'GENO:0000137': 'indeterminate', 'GENO:0000141': 'condition inheritance', 'GENO:0000143': 'co-dominant inheritance', 'GENO:0000144': 'complete dominant inheritance', 'GENO:0000145': 'semi-dominant inheritance', 'GENO:0000146': 'allosomal dominant 
inheritance', 'GENO:0000147': 'autosomal dominant iniheritance', 'GENO:0000148': 'recessive inheritance', 'GENO:0000149': 'allosomal recessive inheritance', 'GENO:0000150': 'autosomal recessive inheritance', 'GENO:0000206': 'is_allelotype_of', 'GENO:0000207': 'has_sequence_attribute', 'GENO:0000222': 'has_genotype', 'GENO:0000225': 'has_member_with_allelotype', 'GENO:0000382': 'has_variant_part', 'GENO:0000385': 'has_reference_part', 'GENO:0000402': 'compound heterozygous', 'GENO:0000408': 'is_allele_of', 'GENO:0000414': 'targets_gene', 'GENO:0000418': 'has_affected_feature', 'GENO:0000440': 'is_mutant_of', 'GENO:0000443': 'is_expression_variant_of', 'GENO:0000444': 'is_transgene_variant_of', 'GENO:0000458': 'simple heterozygous', 'GENO:0000504': 'reagent_targeted_gene', 'GENO:0000511': 'wildtype', 'GENO:0000512': 'allele', 'GENO:0000524': 'extrinsic_genotype', 'GENO:0000525': 'effective_genotype', 'GENO:0000527': 'targeted_gene_complement', 'GENO:0000534': 'targeted_gene_subregion', 'GENO:0000580': 'has_qualifier', 'GENO:0000602': 'homoplasmic', 'GENO:0000603': 'heteroplasmic', 'GENO:0000604': 'hemizygous-y', 'GENO:0000605': 'hemizygous-x', 'GENO:0000606': 'hemizygous insertion-linked', 'GENO:0000608': 'has_zygosity', 'GENO:0000610': 'is_reference_allele_of', 'GENO:0000611': 'genomic_background', 'GENO:0000614': 'chromosome_region', 'GENO:0000616': 'chromosome_subband', 'GENO:0000618': 'band_intensity', 'GENO:0000619': 'gpos', 'GENO:0000620': 'gneg', 'GENO:0000621': 'gvar', 'GENO:0000622': 'gpos100', 'GENO:0000623': 'gpos75', 'GENO:0000624': 'gpos50', 'GENO:0000625': 'gpos25', 'GENO:0000628': 'stalk', 'GENO:0000629': 'long_chromosome_arm', 'GENO:0000630': 'has_begin_stage_qualifier', 'GENO:0000631': 'has_end_stage_qualifier', 'GENO:0000632': 'gpos66', 'GENO:0000633': 'gpos33', 'GENO:0000634': 'is_targeted_by', 'GENO:0000637': 'regulatory_transgene_feature', 'GENO:0000638': 'coding_transgene_feature', 'GENO:0000639': 'sequence_derives_from', 'GENO:0000643': 
'has_origin', 'GENO:0000644': 'karyotype_variation_complement', 'GENO:0000645': 'sex_qualified_genotype', 'GENO:0000646': 'male intrinsic genotype', 'GENO:0000647': 'female intrinsic genotype', 'GENO:0000649': 'unspecified_genomic_background', 'GENO:0000650': 'has_sex_agnostic_part', 'GENO:0000678': 'has_extent', 'GENO:0000719': 'intrinsic genotype', 'GENO:0000772': 'unspecified', 'GENO:0000840': 'pathogenic_for_condition', 'GENO:0000841': 'likely_pathogenic_for_condition', 'GENO:0000843': 'benign_for_condition', 'GENO:0000844': 'likely_benign_for_condition', 'GENO:0000845': 'has_uncertain_significance_for_condition', 'GENO:0000846': 'short repeat', 'GENO:0000866': 'has_quantifier', 'GENO:0000867': 'probabalistic_quantifier', 'GENO:0000882': 'somatic', 'GENO:0000900': 'germline', 'GO:0007165': 'signal_transduction', 'GO:0009987': 'cellular_process', 'GO:0032502': 'developmental_process', 'HP:0001423': 'x_linked_dominant', 'HP:0001427': 'mitochondrial_inheritance', 'HP:0010984': 'digenic_inheritance', 'HP:0031859': 'obsolete', 'IAO:0000004': 'has measurement value', 'IAO:0000013': 'journal article', 'IAO:0000100': 'data set', 'IAO:0000109': 'measurement datum', 'IAO:0000115': 'definition', 'IAO:0000136': 'is_about', 'IAO:0000142': 'mentions', 'IAO:0000185': 'photograph', 'IAO:0000310': 'document', 'IAO:0000311': 'publication', 'IAO:0000589': 'OBO foundry unique label', 'IAO:0100001': 'term replaced by', 'MESH:D004392': 'Dwarfism', 'MONARCH:anonymous': 'is_anonymous', 'MONARCH:cliqueLeader': 'clique_leader', 'MONDO:0000001': 'disease or disorder', 'MONDO:0000009': 'inherited bleeding disorder, platelet-type', 'MONDO:0002051': 'integumentary system disease', 'MONDO:0002561': 'lysosomal storage disease', 'MONDO:0004589': 'hereditary retinal dystrophy', 'MONDO:0004992': 'cancer', 'MONDO:0005066': 'metabolic disease', 'MONDO:0005283': 'retinal disease', 'MONDO:0005453': 'congenital heart disease', 'MONDO:0005570': 'hematological system disease', 'MONDO:0008254': 
'platelet disorder, undefined', 'MONDO:0015993': 'Cone–rod dystrophy', 'MONDO:0019052': 'inborn errors of metabolism', 'MONDO:0019592': 'disorder of sex development', 'MONDO:0020145': 'developmental defect of the eye', 'MONDO:0043878': 'hereditary optic atrophy', 'MP:0008762': 'embryonic lethality', 'NCBITaxon:10029': 'Cricetulus griseus', 'NCBITaxon:10036': 'Mesocricetus auratus', 'NCBITaxon:10041': 'Peromyscus leucopus', 'NCBITaxon:10042': 'Peromyscus maniculatus', 'NCBITaxon:10088': 'Mus', 'NCBITaxon:10089': 'Mus caroli', 'NCBITaxon:10090': 'Mus musculus', 'NCBITaxon:10091': 'Mus musculus castaneus', 'NCBITaxon:10092': 'Mus musculus domesticus', 'NCBITaxon:10093': 'Mus pahari', 'NCBITaxon:10094': 'Mus saxicola', 'NCBITaxon:10096': 'Mus spretus', 'NCBITaxon:10097': 'Mus cervicolor', 'NCBITaxon:10098': 'Mus cookii', 'NCBITaxon:10101': 'Mus platythrix', 'NCBITaxon:10102': 'Mus setulosus', 'NCBITaxon:10103': 'Mus spicilegus', 'NCBITaxon:10105': 'Mus minutoides', 'NCBITaxon:10108': 'Mus abbotti', 'NCBITaxon:10114': 'Rattus', 'NCBITaxon:10116': 'Rattus norvegicus', 'NCBITaxon:10141': 'Cavia porcellus', 'NCBITaxon:116058': 'Mus musculus brevirostris', 'NCBITaxon:1266728': 'Mus musculus domesticus x M. m. 
molossinus', 'NCBITaxon:135827': 'Mus cervicolor cervicolor', 'NCBITaxon:135828': 'Mus cervicolor popaeus', 'NCBITaxon:13616': 'Monodelphis domestica', 'NCBITaxon:1385377': 'Mus musculus gansuensis', 'NCBITaxon:186193': 'Mus fragilicauda', 'NCBITaxon:186842': 'Mus musculus x Mus spretus', 'NCBITaxon:229288': 'Mus gratus', 'NCBITaxon:254704': 'Mus terricolor', 'NCBITaxon:270352': 'Mus macedonicus spretoides', 'NCBITaxon:270353': 'Mus macedonicus macedonicus', 'NCBITaxon:273921': 'Mus indutus', 'NCBITaxon:273922': 'Mus haussa', 'NCBITaxon:27681': 'Mus booduga', 'NCBITaxon:28377': 'Anolis carolinensis', 'NCBITaxon:30608': 'Microcebus murinus', 'NCBITaxon:31033': 'Takifugu rubripe', 'NCBITaxon:35531': 'Mus musculus bactrianus', 'NCBITaxon:3702': 'Arabidopsis thaliana', 'NCBITaxon:37965': 'hybrid', 'NCBITaxon:390847': 'Mus lepidoides', 'NCBITaxon:390848': 'Mus nitidulus', 'NCBITaxon:39442': 'Mus musculus musculus', 'NCBITaxon:397330': 'Mus tenellus', 'NCBITaxon:41269': 'Mus crociduroides', 'NCBITaxon:41270': 'Mus mattheyi', 'NCBITaxon:42413': 'Peromyscus polionotus', 'NCBITaxon:42520': 'Peromyscus californicus', 'NCBITaxon:44689': 'Dictyostelium discoideum', 'NCBITaxon:468371': 'Mus cypriacus', 'NCBITaxon:473865': 'Mus triton', 'NCBITaxon:477815': 'Mus musculus musculus x M. m. domesticus', 'NCBITaxon:477816': 'Mus musculus musculus x M. m. 
castaneus', 'NCBITaxon:4896': 'Schizosaccharomyces pombe', 'NCBITaxon:5052': 'Aspergillus', 'NCBITaxon:544437': 'Mus baoulei', 'NCBITaxon:54600': 'Macaca nigra', 'NCBITaxon:559292': 'Saccharomyces cerevisiae S288C', 'NCBITaxon:562': 'Escherichia coli', 'NCBITaxon:57486': 'Mus musculus molossinus', 'NCBITaxon:5782': 'Dictyostelium', 'NCBITaxon:6239': 'Caenorhabditis elegans', 'NCBITaxon:66189': 'Chelonoidis niger', 'NCBITaxon:7227': 'Drosophila melanogaster', 'NCBITaxon:78454': 'Saguinus labiatus', 'NCBITaxon:7955': 'Danio rerio', 'NCBITaxon:8022': 'Oncorhynchus mykiss', 'NCBITaxon:80274': 'Mus musculus gentilulus', 'NCBITaxon:8364': 'Xenopus (Silurana) tropicalis', 'NCBITaxon:83773': 'Mus famulus', 'NCBITaxon:862510': 'Nannomys', 'NCBITaxon:887131': 'Mus emesi', 'NCBITaxon:9031': 'Gallus gallus', 'NCBITaxon:9258': 'Ornithorhynchus anatinus', 'NCBITaxon:9315': 'Macropus eugenii', 'NCBITaxon:9365': 'Erinaceus europaeus', 'NCBITaxon:9487': 'Saguinus fuscicollis', 'NCBITaxon:9519': 'Lagothrix lagotricha', 'NCBITaxon:9521': 'Saimiri sciureus', 'NCBITaxon:9523': 'Callicebus moloch', 'NCBITaxon:9538': 'Erythrocebus patas', 'NCBITaxon:9541': 'Macaca fascicularis', 'NCBITaxon:9544': 'Macaca mulatta', 'NCBITaxon:9545': 'Macaca nemestrina', 'NCBITaxon:9555': 'Papio anubis', 'NCBITaxon:9593': 'Gorilla gorilla', 'NCBITaxon:9597': 'Pan paniscus', 'NCBITaxon:9598': 'Pan troglodytes', 'NCBITaxon:9600': 'Pongo pygmaeus', 'NCBITaxon:9606': 'Homo sapiens', 'NCBITaxon:9615': 'Canis lupus familiaris', 'NCBITaxon:9649': 'Ailurus fulgens', 'NCBITaxon:9685': 'Felis catus', 'NCBITaxon:9796': 'Equus caballus', 'NCBITaxon:9823': 'Sus scrofa', 'NCBITaxon:9825': 'Sus scrofa domestica', 'NCBITaxon:9888': 'Muntiacus muntjak', 'NCBITaxon:9913': 'Bos taurus', 'NCBITaxon:9925': 'Capra hircus', 'NCBITaxon:9940': 'Ovis aries', 'NCBITaxon:9986': 'Oryctolagus cuniculus', 'NCIT:C61040': 'Statistical Significance', 'NCIT:C63513': 'Manual', 'NCIT:C71458': 'Suspected', 'NCIT:C96621': 'Percent Change From 
Baseline', 'OAE:0001563': 'proportional_reporting_ratio', 'OBAN:association': 'association', 'OBAN:association_has_object': 'association has object', 'OBAN:association_has_predicate': 'association has predicate', 'OBAN:association_has_subject': 'association has subject', 'OBI:0000070': 'assay', 'OBI:0000175': 'p-value', 'OBI:0000471': 'study', 'OBI:0000673': 'statistical_hypothesis_test', 'OBI:0001937': 'has specified numeric value', 'PATO:0000383': 'female', 'PATO:0000384': 'male', 'PATO:0000460': 'abnormal', 'PATO:0000461': 'normal', 'PCO:0000001': 'population', 'PCO:0000020': 'family', 'PW:0000001': 'pathway', 'RO:0000057': 'has_participant', 'RO:0000086': 'has_quality', 'RO:0000091': 'has disposition', 'RO:0001000': 'derives_from', 'RO:0002091': 'starts during', 'RO:0002093': 'ends during', 'RO:0002162': 'in taxon', 'RO:0002200': 'has phenotype', 'RO:0002204': 'gene product of', 'RO:0002205': 'has gene product', 'RO:0002206': 'expressed in', 'RO:0002224': 'starts_with', 'RO:0002230': 'ends_with', 'RO:0002233': 'has_input', 'RO:0002325': 'colocalizes with', 'RO:0002326': 'contributes to', 'RO:0002327': 'enables', 'RO:0002331': 'involved in', 'RO:0002350': 'member of', 'RO:0002351': 'has member', 'RO:0002353': 'output_of', 'RO:0002418': 'causally upstream of or within', 'RO:0002434': 'interacts with', 'RO:0002435': 'genetically interacts with', 'RO:0002436': 'molecularly_interacts_with', 'RO:0002448': 'regulates', 'RO:0002480': 'ubiquitinates', 'RO:0002488': 'existence_starts_during', 'RO:0002492': 'existence_ends_during', 'RO:0002503': 'towards', 'RO:0002513': 'translates_to', 'RO:0002524': 'has subsequence', 'RO:0002525': 'is subsequence of', 'RO:0002528': 'is upstream of sequence of', 'RO:0002529': 'is downstream of sequence of', 'RO:0002558': 'has evidence', 'RO:0002566': 'causally_influences', 'RO:0002583': 'existence starts at point', 'RO:0002593': 'existence ends at point', 'RO:0002606': 'is substance that treats', 'RO:0002607': 'is marker for', 
'RO:0002610': 'correlates_with', 'RO:0002614': 'is_evidence_supported_by', 'RO:0003002': 'negatively_regulates', 'RO:0003003': 'positively_regulates', 'RO:0003301': 'is model of', 'RO:0003302': 'causes_or_contributes', 'RO:0003303': 'causes condition', 'RO:0003304': 'contributes to condition', 'RO:0003307': 'protective_for_condition', 'RO:0004011': 'is causal gain of function germline mutation of in', 'RO:0004012': 'is causal loss of function germline mutation of in', 'RO:0004013': 'is causal germline mutation in', 'RO:0004014': 'is causal somatic mutation in', 'RO:0004015': 'is causal susceptibility factor for', 'RO:0004016': 'is causal germline mutation partially giving rise to', 'RO:HOM0000011': 'in paralogy relationship with', 'RO:HOM0000017': 'in orthology relationship with', 'RO:HOM0000018': 'in xenology relationship with', 'RO:HOM0000019': 'in 1 to 1 homology relationship with', 'RO:HOM0000020': 'in 1 to 1 orthology relationship with', 'RO:HOM0000022': 'in ohnology relationship with', 'RO:HOM0000023': 'in in-paralogy relationship with', 'SEPIO:0000001': 'assertion', 'SEPIO:0000003': 'assertion process', 'SEPIO:0000006': 'has_evidence_line', 'SEPIO:0000007': 'has_supporting_evidence_line', 'SEPIO:0000011': 'has_provenance', 'SEPIO:0000015': 'is_asserted_in', 'SEPIO:0000017': 'has_agent', 'SEPIO:0000018': 'created_by', 'SEPIO:0000021': 'date_created', 'SEPIO:0000022': 'created_with_resource', 'SEPIO:0000031': 'is_evidence_for', 'SEPIO:0000032': 'is_supporting_evidence_for', 'SEPIO:0000033': 'is_refuting_evidence_for', 'SEPIO:0000037': 'assertion method', 'SEPIO:0000041': 'is_specified_by', 'SEPIO:0000059': 'is_evidence_with_support_from', 'SEPIO:0000066': 'research', 'SEPIO:0000067': 'clinical testing', 'SEPIO:0000071': 'case-control', 'SEPIO:0000073': 'in vitro', 'SEPIO:0000074': 'in vivo', 'SEPIO:0000080': 'literature only', 'SEPIO:0000081': 'curation', 'SEPIO:0000084': 'has_evidence_item', 'SEPIO:0000085': 'has_supporting_activity', 'SEPIO:0000098': 
'is_equilavent_to', 'SEPIO:0000099': 'is_consistent_with', 'SEPIO:0000100': 'strongly_contradicts', 'SEPIO:0000101': 'contradicts', 'SEPIO:0000102': 'reference population', 'SEPIO:0000111': 'is_assertion_supported_by_evidence', 'SEPIO:0000114': 'measures_parameter', 'SEPIO:0000124': 'has_supporting_reference', 'SEPIO:0000126': 'is_inconsistent_with', 'SEPIO:0000130': 'asserted_by', 'SEPIO:0000167': 'assertion_confidence_level', 'SEPIO:0000168': 'assertion_confidence_score', 'SEPIO:0000186': 'phenotyping only', 'SIO:000302': 'web page', 'SIO:000794': 'count', 'SIO:001015': 'race', 'SO:0000001': 'region', 'SO:0000013': 'small cytoplasmic RNA', 'SO:0000043': 'processed_pseudogene', 'SO:0000077': 'antisense', 'SO:0000100': 'endogenous_retroviral_gene', 'SO:0000101': 'transposable_element', 'SO:0000104': 'polypeptide', 'SO:0000105': 'chromosome_arm', 'SO:0000110': 'sequence_feature', 'SO:0000111': 'transposable_element_gene', 'SO:0000134': 'genomically_imprinted', 'SO:0000143': 'assembly_component', 'SO:0000150': 'read', 'SO:0000159': 'deletion', 'SO:0000165': 'enhancer', 'SO:0000167': 'promoter', 'SO:0000180': 'retrotransposon', 'SO:0000199': 'translocation', 'SO:0000233': 'mature_transcript', 'SO:0000289': 'microsatellite', 'SO:0000307': 'CpG_island', 'SO:0000336': 'pseudogene', 'SO:0000337': 'RNAi_reagent', 'SO:0000340': 'chromosome', 'SO:0000341': 'chromosome_band', 'SO:0000374': 'ribozyme', 'SO:0000404': 'vault_RNA', 'SO:0000405': 'Y RNA', 'SO:0000409': 'binding_site', 'SO:0000453': 'chromosomal_transposition', 'SO:0000460': 'vertebrate_immunoglobulin_T_cell_receptor_segment', 'SO:0000462': 'pseudogenic_region', 'SO:0000577': 'centromere', 'SO:0000624': 'telomere', 'SO:0000643': 'minisatellite', 'SO:0000651': 'large_subunit_rRNA', 'SO:0000655': 'ncRNA', 'SO:0000667': 'insertion', 'SO:0000694': 'SNP', 'SO:0000703': 'experimental_result_region', 'SO:0000704': 'gene', 'SO:0000756': 'cDNA', 'SO:0000771': 'QTL', 'SO:0000781': 'transgenic', 'SO:0000796': 
'transgenic_transposable_element', 'SO:0000806': 'fusion', 'SO:0000817': 'wild_type', 'SO:0000830': 'chromosome_part', 'SO:0000883': 'stop codon readthrough', 'SO:0000902': 'transgene', 'SO:0000903': 'endogenous_retroviral_sequence', 'SO:0000946': 'integration_excision_site', 'SO:0001024': 'haplotype', 'SO:0001026': 'genome', 'SO:0001028': 'diplotype', 'SO:0001055': 'transcriptional_cis_regulatory_region', 'SO:0001059': 'sequence_alteration', 'SO:0001060': 'sequence_variant', 'SO:0001217': 'protein_coding_gene', 'SO:0001218': 'transgenic_insertion', 'SO:0001240': 'TSS_region', 'SO:0001263': 'ncRNA_gene', 'SO:0001265': 'miRNA_gene', 'SO:0001266': 'scRNA_gene', 'SO:0001267': 'snoRNA_gene', 'SO:0001268': 'snRNA_gene', 'SO:0001269': 'SRP_RNA_gene', 'SO:0001272': 'tRNA_gene', 'SO:0001411': 'biological_region', 'SO:0001483': 'SNV', 'SO:0001500': 'heritable_phenotypic_marker', 'SO:0001503': 'processed_transcript', 'SO:0001505': 'reference_genome', 'SO:0001564': 'gene_variant', 'SO:0001566': 'regulatory_region_variant', 'SO:0001574': 'splice_acceptor_variant', 'SO:0001575': 'splice_donor_variant', 'SO:0001578': 'stop_lost', 'SO:0001580': 'coding_sequence_variant', 'SO:0001583': 'missense_variant', 'SO:0001587': 'stop_gained', 'SO:0001589': 'frameshift_variant', 'SO:0001619': 'non_coding_transcript_exon_variant', 'SO:0001620': 'mature_miRNA_variant', 'SO:0001622': 'UTR_variant', 'SO:0001623': '5_prime_UTR_variant', 'SO:0001624': '3_prime_UTR_variant', 'SO:0001627': 'intron_variant', 'SO:0001628': 'intergenic_variant', 'SO:0001630': 'splice_region_variant', 'SO:0001634': 'downstream_gene_variant', 'SO:0001636': 'upstream_gene_variant', 'SO:0001637': 'rRNA_gene', 'SO:0001638': 'piRNA_gene', 'SO:0001639': 'RNase_P_RNA_gene', 'SO:0001640': 'RNase_MRP_RNA_gene', 'SO:0001641': 'lincRNA_gene', 'SO:0001643': 'telomerase_RNA_gene', 'SO:0001645': 'genetic_marker', 'SO:0001650': 'inframe_variant', 'SO:0001685': 'score', 'SO:0001741': 'pseudogenic_gene_segment', 'SO:0001742': 
'copy_number_gain', 'SO:0001743': 'copy_number_loss', 'SO:0001759': 'unitary_pseudogene', 'SO:0001760': 'unprocessed_pseudogene', 'SO:0001782': 'TF_binding_site_variant', 'SO:0001784': 'complex_structural_alteration', 'SO:0001785': 'structural_alteration', 'SO:0001792': 'non_coding_exon_variant', 'SO:0001818': 'protein_altering_variant', 'SO:0001819': 'synonymous_variant', 'SO:0001821': 'inframe_insertion', 'SO:0001822': 'inframe_deletion', 'SO:0001837': 'mobile_element_insertion', 'SO:0001838': 'novel_sequence_insertion', 'SO:0001841': 'polymorphic_pseudogene', 'SO:0001877': 'lnc_RNA', 'SO:0001882': 'feature_fusion', 'SO:0001897': 'transposable_element_pseudogene', 'SO:0001904': 'antisense_lncRNA', 'SO:0002007': 'MNV', 'SO:0002012': 'start_lost', 'SO:0002040': 'vaultRNA_primary_transcript', 'SO:0002052': 'dominant_negative_variant', 'SO:0002053': 'gain_of_function_variant', 'SO:0002054': 'loss_of_function_variant', 'SO:0002095': 'scaRNA', 'SO:0002098': 'immunoglobulin_pseudogene', 'SO:0002099': 'T_cell_receptor_pseudogene', 'SO:0002100': 'IG_C_pseudogene', 'SO:0002101': 'IG_J_pseudogene', 'SO:0002102': 'IG_V_pseudogene', 'SO:0002103': 'TR_V_pseudogene', 'SO:0002104': 'TR_J_pseudogene', 'SO:0002106': 'translated_unprocessed_pseudogene', 'SO:0002107': 'transcribed_unprocessed_pseudogene', 'SO:0002108': 'transcribed_unitary_pseudogene', 'SO:0002109': 'transcribed_processed_pseudogene', 'SO:0002120': '3prime_overlapping_ncRNA', 'SO:0002122': 'immunoglobulin_gene', 'SO:0002123': 'IG_C_gene', 'SO:0002124': 'IG_D_gene', 'SO:0002125': 'IG_J_gene', 'SO:0002126': 'IG_V_gene', 'SO:0002127': 'lncRNA_gene', 'SO:0002128': 'mt_rRNA', 'SO:0002129': 'mt_tRNA', 'SO:0002131': 'sense_intronic', 'SO:0002132': 'sense_overlapping', 'SO:0002134': 'TR_C_gene', 'SO:0002135': 'TR_D_gene', 'SO:0002136': 'TR_J_gene', 'SO:0002137': 'TR_V_gene', 'SO:0002139': 'TEC', 'SO:0002181': 'ribozyme_gene', 'SO:0002183': 'sense_overlap_ncRNA_gene', 'SO:0002184': 'sense_intronic_ncRNA_gene', 'SO:0002185': 
'bidirectional_promoter_lncRNA', 'SO:1000002': 'substitution', 'SO:1000005': 'complex_substitution', 'SO:1000008': 'point_mutation', 'SO:1000029': 'chromosomal_deletion', 'SO:1000030': 'chromosomal_inversion', 'SO:1000032': 'indel', 'SO:1000035': 'duplication', 'SO:1000036': 'inversion', 'SO:1000037': 'chromosomal_duplication', 'SO:1000039': 'direct_tandem_duplication', 'SO:1000043': 'Robertsonian_fusion', 'SO:1000044': 'chromosomal_translocation', 'SO:1000048': 'reciprocal_chromosomal_translocation', 'SO:1000117': 'sequence_variant_affecting_polypeptide_function', 'SO:1000118': 'sequence_variant_causing_loss_of_function_of_polypeptide', 'SO:1000120': 'sequence_variant_causing_inactive_catalytic_site', 'SO:1000125': 'sequence_variant_causing_gain_of_function_of_polypeptide', 'SO:1000173': 'tandem_duplication', 'SO:1000183': 'chromosome_structure_variation', 'SO:3000000': 'gene_segment', 'STATO:0000073': "Fisher's exact test", 'STATO:0000076': 'Mann-Whitney U-test', 'STATO:0000085': 'effect size estimate', 'STATO:0000104': 'zscore', 'STATO:0000107': 'statistical model', 'STATO:0000129': 'has_value', 'STATO:0000169': 'fold change', 'STATO:0000182': 'odds_ratio', 'STATO:0000189': 'mixed effect model', 'STATO:0000372': 'generalized least squares estimation', 'STATO:0000464': 'linear mixed model', 'SWO:0000425': 'Similarity score', 'UPHENO:0001001': 'phenotype', 'VIVO:Project': 'project', 'XCO:0000000': 'environmental_condition', 'cito:citesAsAuthority': 'citesAsAuthority', 'dc:Publisher': 'Publisher', 'dc:created': 'Date Created', 'dc:creator': 'creator', 'dc:description': 'description', 'dc:format': 'format', 'dc:identifier': 'identifier', 'dc:isVersionOf': 'isVersionOf', 'dc:license': 'license', 'dc:rights': 'rights', 'dc:source': 'Source', 'dc:title': 'title', 'dcat:Distribution': 'Distribution', 'dcat:distribution': 'distribution', 'dcat:downloadURL': 'downloadURL', 'dctypes:Dataset': 'Dataset', 'faldo:BothStrandPosition': 'both_strand', 'faldo:FuzzyPosition': 
'FuzzyPosition', 'faldo:MinusStrandPosition': 'minus_strand', 'faldo:PlusStrandPosition': 'plus_strand', 'faldo:Position': 'Position', 'faldo:Region': 'Region', 'faldo:begin': 'begin', 'faldo:end': 'end', 'faldo:location': 'location', 'faldo:position': 'position', 'faldo:reference': 'reference', 'foaf:Person': 'person', 'foaf:depiction': 'depiction', 'foaf:organization': 'organization', 'foaf:page': 'page', 'oboInOwl:consider': 'consider', 'oboInOwl:hasDbXref': 'database_cross_reference', 'oboInOwl:hasExactSynonym': 'has_exact_synonym', 'oboInOwl:hasRelatedSynonym': 'has_related_synonym', 'owl:AnnotationProperty': 'annotation_property', 'owl:Class': 'class', 'owl:DatatypeProperty': 'datatype_property', 'owl:NamedIndividual': 'named_individual', 'owl:ObjectProperty': 'object_property', 'owl:Ontology': 'ontology', 'owl:Restriction': 'restriction', 'owl:deprecated': 'deprecated', 'owl:equivalentClass': 'equivalent_class', 'owl:onProperty': 'on_property', 'owl:sameAs': 'same_as', 'owl:someValuesFrom': 'some_values_from', 'owl:versionIRI': 'version_iri', 'owl:versionInfo': 'version_info', 'pav:createdOn': 'created_on', 'pav:createdWith': 'created_with', 'pav:retrievedOn': 'retrieved_on', 'pav:version': 'version', 'rdf:type': 'type', 'rdfs:comment': 'comment', 'rdfs:domain': 'domain', 'rdfs:label': 'label', 'rdfs:subClassOf': 'subclass_of', 'rdfs:subPropertyOf': 'subPropertyOf', 'void:class': 'class (void)', 'void:classPartition': 'classPartition', 'void:distinctObjects': 'distinctObjects', 'void:distinctSubjects': 'distinctSubjects', 'void:entities': 'entities', 'void:properties': 'properties', 'void:triples': 'triples'}
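The two listings appear to be inverses of each other. `globaltt` maps a term label to a CURIE, and `globaltcid` maps that CURIE back to its label. Labels that would otherwise collide are kept distinct (for example `'class'` for `owl:Class` versus `'class (void)'` for `void:class`), so the inversion stays one-to-one. The sketch below checks that property on a handful of entries copied verbatim from the listings. It does not import Dipper, and the `*_SAMPLE` names and the `invert` helper are hypothetical, introduced only for illustration.

```python
from typing import Dict

# Label -> CURIE entries copied verbatim from the globaltt listing (hypothetical sample name).
GLOBALTT_SAMPLE: Dict[str, str] = {
    'gene': 'SO:0000704',
    'Homo sapiens': 'NCBITaxon:9606',
    'FRET': 'ECO:0001048',
    'class': 'owl:Class',
    'class (void)': 'void:class',
}

# CURIE -> label entries copied verbatim from the globaltcid listing (hypothetical sample name).
GLOBALTCID_SAMPLE: Dict[str, str] = {
    'SO:0000704': 'gene',
    'NCBITaxon:9606': 'Homo sapiens',
    'ECO:0001048': 'FRET',
    'owl:Class': 'class',
    'void:class': 'class (void)',
}


def invert(mapping: Dict[str, str]) -> Dict[str, str]:
    """Swap keys and values, raising instead of silently dropping duplicate values."""
    inverted: Dict[str, str] = {}
    for key, value in mapping.items():
        if value in inverted:
            raise ValueError(f'{value!r} is mapped from both {inverted[value]!r} and {key!r}')
        inverted[value] = key
    return inverted


# On this sample, inverting the label -> CURIE table reproduces the CURIE -> label table.
assert invert(GLOBALTT_SAMPLE) == GLOBALTCID_SAMPLE

# Typical round trip: look up a CURIE by label, then recover the label from the CURIE.
curie = GLOBALTT_SAMPLE['Homo sapiens']
print(curie)                     # NCBITaxon:9606
print(GLOBALTCID_SAMPLE[curie])  # Homo sapiens
```

Raising on duplicate values, rather than letting a later key overwrite an earlier one, is a deliberate choice. It makes any label or CURIE collision visible instead of quietly losing a term.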
globaltt = {'3_prime_UTR_variant': 'SO:0001624', '3prime_overlapping_ncRNA': 'SO:0002120', '5_prime_UTR_variant': 'SO:0001623', 'Adipose stromal cell': 'CLO:0036934', 'African': 'EFO:0004561', 'African American': 'EFO:0003150', 'Ailurus fulgens': 'NCBITaxon:9649', 'Amniotic fluid-derived cell line': 'CLO:0036935', 'Anolis carolinensis': 'NCBITaxon:28377', 'Arabidopsis thaliana': 'NCBITaxon:3702', 'Asian': 'EFO:0003152', 'Asian Indian': 'EFO:0003153', 'Asian, Vietnamese': 'ERO:0002071', 'Asian/Pacific Islander': 'EFO:0003154', 'Aspergillus': 'NCBITaxon:5052', 'B cell': 'CL:0000236', 'Bos taurus': 'NCBITaxon:9913', 'Caenorhabditis elegans': 'NCBITaxon:6239', 'Callicebus moloch': 'NCBITaxon:9523', 'Canis lupus familiaris': 'NCBITaxon:9615', 'Capra hircus': 'NCBITaxon:9925', 'Caucasian': 'EFO:0003156', 'Cavia porcellus': 'NCBITaxon:10141', 'Chelonoidis niger': 'NCBITaxon:66189', 'Chinese': 'EFO:0003157', 'Chorionic villus-derived cell line': 'CLO:0036940', 'Cone–rod dystrophy': 'MONDO:0015993', 'CpG_island': 'SO:0000307', 'Cricetulus griseus': 'NCBITaxon:10029', 'Danio rerio': 'NCBITaxon:7955', 'Dataset': 'dctypes:Dataset', 'Date Created': 'dc:created', 'Dictyostelium': 'NCBITaxon:5782', 'Dictyostelium discoideum': 'NCBITaxon:44689', 'Distribution': 'dcat:Distribution', 'Drosophila melanogaster': 'NCBITaxon:7227', 'Dwarfism': 'MESH:D004392', 'Eastern Indian': 'EFO:0003158', 'Equus caballus': 'NCBITaxon:9796', 'Erinaceus europaeus': 'NCBITaxon:9365', 'Erythrocebus patas': 'NCBITaxon:9538', 'Escherichia coli': 'NCBITaxon:562', 'FRET': 'ECO:0001048', 'Felis catus': 'NCBITaxon:9685', 'Filipino': 'EFO:0003160', "Fisher's exact test": 'STATO:0000073', 'FuzzyPosition': 'faldo:FuzzyPosition', 'Gallus gallus': 'NCBITaxon:9031', 'Gorilla gorilla': 'NCBITaxon:9593', 'Hispanic': 'EFO:0003169', 'Homo sapiens': 'NCBITaxon:9606', 'IG_C_gene': 'SO:0002123', 'IG_C_pseudogene': 'SO:0002100', 'IG_D_gene': 'SO:0002124', 'IG_J_gene': 'SO:0002125', 'IG_J_pseudogene': 'SO:0002101', 
'IG_V_gene': 'SO:0002126', 'IG_V_pseudogene': 'SO:0002102', 'Japanese': 'EFO:0003164', 'Korean': 'EFO:0003165', 'Lagothrix lagotricha': 'NCBITaxon:9519', 'MNV': 'SO:0002007', 'Macaca fascicularis': 'NCBITaxon:9541', 'Macaca mulatta': 'NCBITaxon:9544', 'Macaca nemestrina': 'NCBITaxon:9545', 'Macaca nigra': 'NCBITaxon:54600', 'Macropus eugenii': 'NCBITaxon:9315', 'Mann-Whitney U-test': 'STATO:0000076', 'Manual': 'NCIT:C63513', 'Mesocricetus auratus': 'NCBITaxon:10036', 'Microcebus murinus': 'NCBITaxon:30608', 'Microcell hybrid': 'CLO:0036939', 'Monodelphis domestica': 'NCBITaxon:13616', 'Muntiacus muntjak': 'NCBITaxon:9888', 'Mus': 'NCBITaxon:10088', 'Mus abbotti': 'NCBITaxon:10108', 'Mus baoulei': 'NCBITaxon:544437', 'Mus booduga': 'NCBITaxon:27681', 'Mus caroli': 'NCBITaxon:10089', 'Mus cervicolor': 'NCBITaxon:10097', 'Mus cervicolor cervicolor': 'NCBITaxon:135827', 'Mus cervicolor popaeus': 'NCBITaxon:135828', 'Mus cookii': 'NCBITaxon:10098', 'Mus crociduroides': 'NCBITaxon:41269', 'Mus cypriacus': 'NCBITaxon:468371', 'Mus emesi': 'NCBITaxon:887131', 'Mus famulus': 'NCBITaxon:83773', 'Mus fragilicauda': 'NCBITaxon:186193', 'Mus gratus': 'NCBITaxon:229288', 'Mus haussa': 'NCBITaxon:273922', 'Mus indutus': 'NCBITaxon:273921', 'Mus lepidoides': 'NCBITaxon:390847', 'Mus macedonicus macedonicus': 'NCBITaxon:270353', 'Mus macedonicus spretoides': 'NCBITaxon:270352', 'Mus mattheyi': 'NCBITaxon:41270', 'Mus minutoides': 'NCBITaxon:10105', 'Mus musculus': 'NCBITaxon:10090', 'Mus musculus bactrianus': 'NCBITaxon:35531', 'Mus musculus brevirostris': 'NCBITaxon:116058', 'Mus musculus castaneus': 'NCBITaxon:10091', 'Mus musculus domesticus': 'NCBITaxon:10092', 'Mus musculus domesticus x M. m. molossinus': 'NCBITaxon:1266728', 'Mus musculus gansuensis': 'NCBITaxon:1385377', 'Mus musculus gentilulus': 'NCBITaxon:80274', 'Mus musculus molossinus': 'NCBITaxon:57486', 'Mus musculus musculus': 'NCBITaxon:39442', 'Mus musculus musculus x M. m. 
castaneus': 'NCBITaxon:477816', 'Mus musculus musculus x M. m. domesticus': 'NCBITaxon:477815', 'Mus musculus x Mus spretus': 'NCBITaxon:186842', 'Mus nitidulus': 'NCBITaxon:390848', 'Mus pahari': 'NCBITaxon:10093', 'Mus platythrix': 'NCBITaxon:10101', 'Mus saxicola': 'NCBITaxon:10094', 'Mus setulosus': 'NCBITaxon:10102', 'Mus spicilegus': 'NCBITaxon:10103', 'Mus spretus': 'NCBITaxon:10096', 'Mus tenellus': 'NCBITaxon:397330', 'Mus terricolor': 'NCBITaxon:254704', 'Mus triton': 'NCBITaxon:473865', 'Nannomys': 'NCBITaxon:862510', 'Native American': 'ERO:0001984', 'OBO foundry unique label': 'IAO:0000589', 'Oncorhynchus mykiss': 'NCBITaxon:8022', 'Ornithorhynchus anatinus': 'NCBITaxon:9258', 'Oryctolagus cuniculus': 'NCBITaxon:9986', 'Ovis aries': 'NCBITaxon:9940', 'Pan paniscus': 'NCBITaxon:9597', 'Pan troglodytes': 'NCBITaxon:9598', 'Papio anubis': 'NCBITaxon:9555', 'Percent Change From Baseline': 'NCIT:C96621', 'Peromyscus californicus': 'NCBITaxon:42520', 'Peromyscus leucopus': 'NCBITaxon:10041', 'Peromyscus maniculatus': 'NCBITaxon:10042', 'Peromyscus polionotus': 'NCBITaxon:42413', 'Pongo pygmaeus': 'NCBITaxon:9600', 'Position': 'faldo:Position', 'Publisher': 'dc:Publisher', 'QTL': 'SO:0000771', 'RNAi_reagent': 'SO:0000337', 'RNase_MRP_RNA_gene': 'SO:0001640', 'RNase_P_RNA_gene': 'SO:0001639', 'Rattus': 'NCBITaxon:10114', 'Rattus norvegicus': 'NCBITaxon:10116', 'Region': 'faldo:Region', 'Robertsonian_fusion': 'SO:1000043', 'SNP': 'SO:0000694', 'SNV': 'SO:0001483', 'SRP_RNA_gene': 'SO:0001269', 'Saccharomyces cerevisiae S288C': 'NCBITaxon:559292', 'Saguinus fuscicollis': 'NCBITaxon:9487', 'Saguinus labiatus': 'NCBITaxon:78454', 'Saimiri sciureus': 'NCBITaxon:9521', 'Schizosaccharomyces pombe': 'NCBITaxon:4896', 'Similarity score': 'SWO:0000425', 'Source': 'dc:source', 'Statistical Significance': 'NCIT:C61040', 'Sus scrofa': 'NCBITaxon:9823', 'Sus scrofa domestica': 'NCBITaxon:9825', 'Suspected': 'NCIT:C71458', 'T cell': 'CL:0000084', 'TEC': 'SO:0002139', 
'TF_binding_site_variant': 'SO:0001782', 'TR_C_gene': 'SO:0002134', 'TR_D_gene': 'SO:0002135', 'TR_J_gene': 'SO:0002136', 'TR_J_pseudogene': 'SO:0002104', 'TR_V_gene': 'SO:0002137', 'TR_V_pseudogene': 'SO:0002103', 'TSS_region': 'SO:0001240', 'T_cell_receptor_pseudogene': 'SO:0002099', 'Takifugu rubripe': 'NCBITaxon:31033', 'Tumor-derived cell line': 'CLO:0036938', 'UTR_variant': 'SO:0001622', 'Xenopus (Silurana) tropicalis': 'NCBITaxon:8364', 'Y RNA': 'SO:0000405', 'abnormal': 'PATO:0000460', 'activating_mutation': ':activating', 'affinity chromatography evidence': 'ECO:0000079', 'age': 'EFO:0000246', 'all_missense_or_inframe': ':all_missense_or_inframe', 'allele': 'GENO:0000512', 'allosomal dominant inheritance': 'GENO:0000146', 'allosomal recessive inheritance': 'GENO:0000149', 'amniocyte': 'CL:0002323', 'annotation_property': 'owl:AnnotationProperty', 'antisense': 'SO:0000077', 'antisense_lncRNA': 'SO:0001904', 'assay': 'OBI:0000070', 'assembly_component': 'SO:0000143', 'asserted_by': 'SEPIO:0000130', 'assertion': 'SEPIO:0000001', 'assertion method': 'SEPIO:0000037', 'assertion process': 'SEPIO:0000003', 'assertion_confidence_level': 'SEPIO:0000167', 'assertion_confidence_score': 'SEPIO:0000168', 'association': 'OBAN:association', 'association has object': 'OBAN:association_has_object', 'association has predicate': 'OBAN:association_has_predicate', 'association has subject': 'OBAN:association_has_subject', 'author statement supported by traceable reference used in manual assertion': 'ECO:0000304', 'author statement without traceable support used in manual assertion': 'ECO:0000303', 'autosomal dominant iniheritance': 'GENO:0000147', 'autosomal recessive inheritance': 'GENO:0000150', 'band_intensity': 'GENO:0000618', 'begin': 'faldo:begin', 'benign_for_condition': 'GENO:0000843', 'biallelic': ':biallelic', 'bidirectional_promoter_lncRNA': 'SO:0002185', 'binding_site': 'SO:0000409', 'biochemical trait analysis evidence': 'ECO:0000172', 'biological aspect of 
ancestor evidence used in manual assertion': 'ECO:0000318', 'biological aspect of descendant evidence': 'ECO:0000214', 'biological_region': 'SO:0001411', 'blood test evidence': 'ECO:0001016', 'both_strand': 'faldo:BothStrandPosition', 'cDNA': 'SO:0000756', 'cancer': 'MONDO:0004992', 'case-control': 'SEPIO:0000071', 'causally upstream of or within': 'RO:0002418', 'causally_influences': 'RO:0002566', 'causes condition': 'RO:0003303', 'causes_or_contributes': 'RO:0003302', 'cell': 'CL:0000000', 'cell line': 'CLO:0000031', 'cell line repository': 'CLO:0000008', 'cellular_process': 'GO:0009987', 'centromere': 'SO:0000577', 'chromosomal_deletion': 'SO:1000029', 'chromosomal_duplication': 'SO:1000037', 'chromosomal_inversion': 'SO:1000030', 'chromosomal_translocation': 'SO:1000044', 'chromosomal_transposition': 'SO:0000453', 'chromosome': 'SO:0000340', 'chromosome_arm': 'SO:0000105', 'chromosome_band': 'SO:0000341', 'chromosome_part': 'SO:0000830', 'chromosome_region': 'GENO:0000614', 'chromosome_structure_variation': 'SO:1000183', 'chromosome_subband': 'GENO:0000616', 'citesAsAuthority': 'cito:citesAsAuthority', 'class': 'owl:Class', 'class (void)': 'void:class', 'classPartition': 'void:classPartition', 'clinical study evidence': 'ECO:0000180', 'clinical testing': 'SEPIO:0000067', 'clique_leader': 'MONARCH:cliqueLeader', 'co-dominant inheritance': 'GENO:0000143', 'coding_sequence_variant': 'SO:0001580', 'coding_transgene_feature': 'GENO:0000638', 'collection': 'ERO:0002190', 'colocalizes with': 'RO:0002325', 'combinatorial evidence used in automatic assertion': 'ECO:0000213', 'comment': 'rdfs:comment', 'complete dominant inheritance': 'GENO:0000144', 'complex_structural_alteration': 'SO:0001784', 'complex_substitution': 'SO:1000005', 'compound heterozygous': 'GENO:0000402', 'computational combinatorial evidence used in manual assertion': 'ECO:0000245', 'condition inheritance': 'GENO:0000141', 'congenital heart disease': 'MONDO:0005453', 'consider': 'oboInOwl:consider', 
'contradicts': 'SEPIO:0000101', 'contributes to': 'RO:0002326', 'contributes to condition': 'RO:0003304', 'copy_number_gain': 'SO:0001742', 'copy_number_loss': 'SO:0001743', 'correlates_with': 'RO:0002610', 'count': 'SIO:000794', 'created_by': 'SEPIO:0000018', 'created_on': 'pav:createdOn', 'created_with': 'pav:createdWith', 'created_with_resource': 'SEPIO:0000022', 'creator': 'dc:creator', 'curation': 'SEPIO:0000081', 'curator inference used in manual assertion': 'ECO:0000305', 'data set': 'IAO:0000100', 'database_cross_reference': 'oboInOwl:hasDbXref', 'datatype_property': 'owl:DatatypeProperty', 'date_created': 'SEPIO:0000021', 'definition': 'IAO:0000115', 'deletion': 'SO:0000159', 'depiction': 'foaf:depiction', 'deprecated': 'owl:deprecated', 'derives_from': 'RO:0001000', 'description': 'dc:description', 'developmental defect of the eye': 'MONDO:0020145', 'developmental_process': 'GO:0032502', 'digenic_inheritance': 'HP:0010984', 'diplotype': 'SO:0001028', 'direct assay evidence used in manual assertion': 'ECO:0000314', 'direct_tandem_duplication': 'SO:1000039', 'disease or disorder': 'MONDO:0000001', 'disorder of sex development': 'MONDO:0019592', 'distinctObjects': 'void:distinctObjects', 'distinctSubjects': 'void:distinctSubjects', 'distribution': 'dcat:distribution', 'document': 'IAO:0000310', 'does_not_have_phenotype': ':does_not_have_phenotype', 'domain': 'rdfs:domain', 'dominant_negative_variant': 'SO:0002052', 'downloadURL': 'dcat:downloadURL', 'downstream_gene_variant': 'SO:0001634', 'duplication': 'SO:1000035', 'effect size estimate': 'STATO:0000085', 'effective_genotype': 'GENO:0000525', 'embryonic lethality': 'MP:0008762', 'embryonic stem cell line': 'ERO:0002002', 'enables': 'RO:0002327', 'end': 'faldo:end', 'endogenous_retroviral_gene': 'SO:0000100', 'endogenous_retroviral_sequence': 'SO:0000903', 'endothelial cell': 'CL:0000115', 'ends during': 'RO:0002093', 'ends_with': 'RO:0002230', 'enhancer': 'SO:0000165', 'entities': 'void:entities', 
'environmental_condition': 'XCO:0000000', 'environmental_system': 'ENVO:01000254', 'enzyme assay evidence': 'ECO:0000005', 'epithelial cell': 'CL:0000066', 'equivalent_class': 'owl:equivalentClass', 'ethnic_group': 'EFO:0001799', 'evidence': 'ECO:0000000', 'evidence used in automatic assertion': 'ECO:0000501', 'existence ends at point': 'RO:0002593', 'existence starts at point': 'RO:0002583', 'existence_ends_during': 'RO:0002492', 'existence_starts_during': 'RO:0002488', 'experimental evidence': 'ECO:0000006', 'experimental evidence used in manual assertion': 'ECO:0000269', 'experimental phenotypic evidence': 'ECO:0000059', 'experimental_result_region': 'SO:0000703', 'expressed in': 'RO:0002206', 'expression evidence used in manual assertion': 'ECO:0000270', 'expression pattern evidence': 'ECO:0000008', 'extrinsic_genotype': 'GENO:0000524', 'family': 'PCO:0000020', 'far-Western blotting evidence': 'ECO:0000076', 'feature_fusion': 'SO:0001882', 'female': 'PATO:0000383', 'female intrinsic genotype': 'GENO:0000647', 'fibroblast': 'CL:0000057', 'fold change': 'STATO:0000169', 'format': 'dc:format', 'frameshift_variant': 'SO:0001589', 'frequency': ':frequencyOfPhenotype', 'functional complementation evidence': 'ECO:0000012', 'fusion': 'SO:0000806', 'gain_of_function_variant': 'SO:0002053', 'gene': 'SO:0000704', 'gene product of': 'RO:0002204', 'gene_family': 'EDAM-DATA:3148', 'gene_product': 'CHEBI:33695', 'gene_segment': 'SO:3000000', 'gene_variant': 'SO:0001564', 'generalized least squares estimation': 'STATO:0000372', 'genetic interaction evidence': 'ECO:0000011', 'genetic interaction evidence used in manual assertion': 'ECO:0000316', 'genetic_marker': 'SO:0001645', 'genetically interacts with': 'RO:0002435', 'genome': 'SO:0001026', 'genomic context evidence': 'ECO:0000177', 'genomic_background': 'GENO:0000611', 'genomic_variation_complement': 'GENO:0000009', 'genomically_imprinted': 'SO:0000134', 'germline': 'GENO:0000900', 'gneg': 'GENO:0000620', 'gpos': 
'GENO:0000619', 'gpos100': 'GENO:0000622', 'gpos25': 'GENO:0000625', 'gpos33': 'GENO:0000633', 'gpos50': 'GENO:0000624', 'gpos66': 'GENO:0000632', 'gpos75': 'GENO:0000623', 'gvar': 'GENO:0000621', 'haplotype': 'SO:0001024', 'has disposition': 'RO:0000091', 'has evidence': 'RO:0002558', 'has gene product': 'RO:0002205', 'has measurement value': 'IAO:0000004', 'has member': 'RO:0002351', 'has phenotype': 'RO:0002200', 'has specified numeric value': 'OBI:0001937', 'has subsequence': 'RO:0002524', 'has_affected_feature': 'GENO:0000418', 'has_agent': 'SEPIO:0000017', 'has_allelic_requirement': ':has_allelic_requirement', 'has_author': 'ERO:0000232', 'has_begin_stage_qualifier': 'GENO:0000630', 'has_cell_origin': ':has_cell_origin', 'has_drug_response': ':has_drug_response', 'has_end_stage_qualifier': 'GENO:0000631', 'has_evidence_item': 'SEPIO:0000084', 'has_evidence_line': 'SEPIO:0000006', 'has_exact_synonym': 'oboInOwl:hasExactSynonym', 'has_extent': 'GENO:0000678', 'has_functional_consequence': ':has_functional_consequence', 'has_genotype': 'GENO:0000222', 'has_input': 'RO:0002233', 'has_member_with_allelotype': 'GENO:0000225', 'has_molecular_consequence': ':has_molecular_consequence', 'has_origin': 'GENO:0000643', 'has_part': 'BFO:0000051', 'has_participant': 'RO:0000057', 'has_provenance': 'SEPIO:0000011', 'has_qualifier': 'GENO:0000580', 'has_quality': 'RO:0000086', 'has_quantifier': 'GENO:0000866', 'has_reference_part': 'GENO:0000385', 'has_related_synonym': 'oboInOwl:hasRelatedSynonym', 'has_sequence_attribute': 'GENO:0000207', 'has_sex_agnostic_part': 'GENO:0000650', 'has_sex_specificty': ':has_sex_specificity', 'has_supporting_activity': 'SEPIO:0000085', 'has_supporting_evidence_line': 'SEPIO:0000007', 'has_supporting_reference': 'SEPIO:0000124', 'has_uncertain_significance_for_condition': 'GENO:0000845', 'has_url': 'ERO:0000480', 'has_value': 'STATO:0000129', 'has_variant_part': 'GENO:0000382', 'has_zygosity': 'GENO:0000608', 'hematological system disease': 
'MONDO:0005570', 'hemizygous': 'GENO:0000134', 'hemizygous insertion-linked': 'GENO:0000606', 'hemizygous-x': 'GENO:0000605', 'hemizygous-y': 'GENO:0000604', 'hereditary optic atrophy': 'MONDO:0043878', 'hereditary retinal dystrophy': 'MONDO:0004589', 'heritable_phenotypic_marker': 'SO:0001500', 'heteroplasmic': 'GENO:0000603', 'heterozygous': 'GENO:0000135', 'homoplasmic': 'GENO:0000602', 'homozygous': 'GENO:0000136', 'hybrid': 'NCBITaxon:37965', 'identifier': 'dc:identifier', 'imaging assay evidence': 'ECO:0000324', 'immortal kidney-derived cell line cell': 'CLO:0000220', 'immunoglobulin_gene': 'SO:0002122', 'immunoglobulin_pseudogene': 'SO:0002098', 'immunoprecipitation evidence': 'ECO:0000085', 'imported automatically asserted information used in automatic assertion': 'ECO:0000323', 'imported information': 'ECO:0000311', 'imported manually asserted information used in automatic assertion': 'ECO:0000322', 'in 1 to 1 homology relationship with': 'RO:HOM0000019', 'in 1 to 1 orthology relationship with': 'RO:HOM0000020', 'in in-paralogy relationship with': 'RO:HOM0000023', 'in ohnology relationship with': 'RO:HOM0000022', 'in orthology relationship with': 'RO:HOM0000017', 'in paralogy relationship with': 'RO:HOM0000011', 'in taxon': 'RO:0002162', 'in vitro': 'SEPIO:0000073', 'in vivo': 'SEPIO:0000074', 'in xenology relationship with': 'RO:HOM0000018', 'inborn errors of metabolism': 'MONDO:0019052', 'inchi_key': 'CHEBI:InChIKey', 'increased_gene_dosage': ':increased_gene_dosage', 'indel': 'SO:1000032', 'indeterminate': 'GENO:0000137', 'induced pluripotent stem cell': 'EFO:0004905', 'inference by association of genotype from phenotype': 'ECO:0005613', 'inference from background scientific knowledge': 'ECO:0000001', 'inference from background scientific knowledge used in manual assertion': 'ECO:0000306', 'inference from experimental data evidence': 'ECO:0005611', 'inference from phenotype manipulation evidence': 'ECO:0005612', 'inframe_deletion': 'SO:0001822', 
'inframe_insertion': 'SO:0001821', 'inframe_variant': 'SO:0001650', 'inherited bleeding disorder, platelet-type': 'MONDO:0000009', 'insertion': 'SO:0000667', 'integration_excision_site': 'SO:0000946', 'integumentary system disease': 'MONDO:0002051', 'interacts with': 'RO:0002434', 'intergenic_variant': 'SO:0001628', 'intrinsic genotype': 'GENO:0000719', 'intron_variant': 'SO:0001627', 'inversion': 'SO:1000036', 'involved in': 'RO:0002331', 'is causal gain of function germline mutation of in': 'RO:0004011', 'is causal germline mutation in': 'RO:0004013', 'is causal germline mutation partially giving rise to': 'RO:0004016', 'is causal loss of function germline mutation of in': 'RO:0004012', 'is causal somatic mutation in': 'RO:0004014', 'is causal susceptibility factor for': 'RO:0004015', 'is downstream of sequence of': 'RO:0002529', 'is marker for': 'RO:0002607', 'is model of': 'RO:0003301', 'is subsequence of': 'RO:0002525', 'is substance that treats': 'RO:0002606', 'is upstream of sequence of': 'RO:0002528', 'isVersionOf': 'dc:isVersionOf', 'is_about': 'IAO:0000136', 'is_allele_of': 'GENO:0000408', 'is_allelotype_of': 'GENO:0000206', 'is_anonymous': 'MONARCH:anonymous', 'is_asserted_in': 'SEPIO:0000015', 'is_assertion_supported_by_evidence': 'SEPIO:0000111', 'is_consistent_with': 'SEPIO:0000099', 'is_equilavent_to': 'SEPIO:0000098', 'is_evidence_for': 'SEPIO:0000031', 'is_evidence_supported_by': 'RO:0002614', 'is_evidence_with_support_from': 'SEPIO:0000059', 'is_expression_variant_of': 'GENO:0000443', 'is_inconsistent_with': 'SEPIO:0000126', 'is_mutant_of': 'GENO:0000440', 'is_reference_allele_of': 'GENO:0000610', 'is_refuting_evidence_for': 'SEPIO:0000033', 'is_specified_by': 'SEPIO:0000041', 'is_supporting_evidence_for': 'SEPIO:0000032', 'is_targeted_by': 'GENO:0000634', 'is_transgene_variant_of': 'GENO:0000444', 'journal article': 'IAO:0000013', 'karyotype_variation_complement': 'GENO:0000644', 'keratinocyte': 'CL:0000312', 'label': 'rdfs:label', 
'large_subunit_rRNA': 'SO:0000651', 'license': 'dc:license', 'likely_benign_for_condition': 'GENO:0000844', 'likely_pathogenic_for_condition': 'GENO:0000841', 'lincRNA_gene': 'SO:0001641', 'linear mixed model': 'STATO:0000464', 'literature only': 'SEPIO:0000080', 'lncRNA_gene': 'SO:0002127', 'lnc_RNA': 'SO:0001877', 'location': 'faldo:location', 'long_chromosome_arm': 'GENO:0000629', 'loss_of_function_variant': 'SO:0002054', 'lysosomal storage disease': 'MONDO:0002561', 'male': 'PATO:0000384', 'male intrinsic genotype': 'GENO:0000646', 'match to sequence model evidence': 'ECO:0000202', 'mature_miRNA_variant': 'SO:0001620', 'mature_transcript': 'SO:0000233', 'measurement datum': 'IAO:0000109', 'measures_parameter': 'SEPIO:0000114', 'melanocyte': 'CL:0000148', 'member of': 'RO:0002350', 'mentions': 'IAO:0000142', 'mesenchymal stem cell of adipose': 'CL:0002570', 'mesothelial': 'CL:0000077', 'metabolic disease': 'MONDO:0005066', 'miRNA_gene': 'SO:0001265', 'microsatellite': 'SO:0000289', 'minisatellite': 'SO:0000643', 'minus_strand': 'faldo:MinusStrandPosition', 'missense_variant': 'SO:0001583', 'mitochondrial_inheritance': 'HP:0001427', 'mixed effect model': 'STATO:0000189', 'mobile_element_insertion': 'SO:0001837', 'molecular entity': 'CHEBI:23367', 'molecularly_interacts_with': 'RO:0002436', 'monoallelic': ':monoallelic', 'mosaic': ':mosaic_genotype', 'mt_rRNA': 'SO:0002128', 'mt_tRNA': 'SO:0002129', 'mutant phenotype evidence': 'ECO:0000015', 'mutant phenotype evidence used in manual assertion': 'ECO:0000315', 'myoblast': 'CL:0000056', 'named_individual': 'owl:NamedIndividual', 'ncRNA': 'SO:0000655', 'ncRNA_gene': 'SO:0001263', 'negatively_regulates': 'RO:0003002', 'no biological data found': 'ECO:0000035', 'non_coding_exon_variant': 'SO:0001792', 'non_coding_transcript_exon_variant': 'SO:0001619', 'normal': 'PATO:0000461', 'novel_sequence_insertion': 'SO:0001838', 'object_property': 'owl:ObjectProperty', 'obsolete': 'HP:0031859', 'occurs_in': 'BFO:0000066', 
'odds_ratio': 'STATO:0000182', 'on_property': 'owl:onProperty', 'oncocyte': 'CL:0002198', 'onset': ':onset', 'ontology': 'owl:Ontology', 'organization': 'foaf:organization', 'output_of': 'RO:0002353', 'p-value': 'OBI:0000175', 'page': 'foaf:page', 'part_of': 'BFO:0000050', 'part_of_contiguous_gene_duplication': ':part_of_contiguous_gene_duplication', 'pathogenic_for_condition': 'GENO:0000840', 'pathway': 'PW:0000001', 'person': 'foaf:Person', 'phenotype': 'UPHENO:0001001', 'phenotyping only': 'SEPIO:0000186', 'photograph': 'IAO:0000185', 'phylogenetic determination of loss of key residues evidence used in manual assertion': 'ECO:0000320', 'phylogenetic evidence': 'ECO:0000080', 'physical interaction evidence used in manual assertion': 'ECO:0000353', 'piRNA_gene': 'SO:0001638', 'platelet disorder, undefined': 'MONDO:0008254', 'plus_strand': 'faldo:PlusStrandPosition', 'point_mutation': 'SO:1000008', 'polymorphic_pseudogene': 'SO:0001841', 'polypeptide': 'SO:0000104', 'population': 'PCO:0000001', 'position': 'faldo:position', 'positively_regulates': 'RO:0003003', 'precursor cell': 'CL:0011115', 'probabalistic_quantifier': 'GENO:0000867', 'processed_pseudogene': 'SO:0000043', 'processed_transcript': 'SO:0001503', 'project': 'VIVO:Project', 'promoter': 'SO:0000167', 'properties': 'void:properties', 'proportional_reporting_ratio': 'OAE:0001563', 'protective_for_condition': 'RO:0003307', 'protein_altering_variant': 'SO:0001818', 'protein_coding_gene': 'SO:0001217', 'pseudogene': 'SO:0000336', 'pseudogenic_gene_segment': 'SO:0001741', 'pseudogenic_region': 'SO:0000462', 'publication': 'IAO:0000311', 'quantitative trait analysis evidence': 'ECO:0000061', 'rRNA_gene': 'SO:0001637', 'race': 'SIO:001015', 'read': 'SO:0000150', 'reagent': 'ERO:0000006', 'reagent_targeted_gene': 'GENO:0000504', 'recessive inheritance': 'GENO:0000148', 'reciprocal_chromosomal_translocation': 'SO:1000048', 'reference': 'faldo:reference', 'reference population': 'SEPIO:0000102', 
'reference_genome': 'SO:0001505', 'reference_locus': 'GENO:0000036', 'region': 'SO:0000001', 'regulates': 'RO:0002448', 'regulatory_region_variant': 'SO:0001566', 'regulatory_transgene_feature': 'GENO:0000637', 'research': 'SEPIO:0000066', 'restriction': 'owl:Restriction', 'retinal disease': 'MONDO:0005283', 'retrieved_on': 'pav:retrievedOn', 'retrotransposon': 'SO:0000180', 'ribozyme': 'SO:0000374', 'ribozyme_gene': 'SO:0002181', 'rights': 'dc:rights', 'same_as': 'owl:sameAs', 'sampling_time': 'EFO:0000689', 'scRNA_gene': 'SO:0001266', 'scaRNA': 'SO:0002095', 'score': 'SO:0001685', 'semi-dominant inheritance': 'GENO:0000145', 'sense_intronic': 'SO:0002131', 'sense_intronic_ncRNA_gene': 'SO:0002184', 'sense_overlap_ncRNA_gene': 'SO:0002183', 'sense_overlapping': 'SO:0002132', 'sequence alignment evidence': 'ECO:0000200', 'sequence orthology evidence': 'ECO:0000201', 'sequence similarity evidence used in manual assertion': 'ECO:0000250', 'sequence_alteration': 'SO:0001059', 'sequence_derives_from': 'GENO:0000639', 'sequence_feature': 'SO:0000110', 'sequence_variant': 'SO:0001060', 'sequence_variant_affecting_polypeptide_function': 'SO:1000117', 'sequence_variant_causing_gain_of_function_of_polypeptide': 'SO:1000125', 'sequence_variant_causing_inactive_catalytic_site': 'SO:1000120', 'sequence_variant_causing_loss_of_function_of_polypeptide': 'SO:1000118', 'sequencing assay evidence': 'ECO:0000220', 'sex_qualified_genotype': 'GENO:0000645', 'short repeat': 'GENO:0000846', 'signal_transduction': 'GO:0007165', 'simple heterozygous': 'GENO:0000458', 'small cytoplasmic RNA': 'SO:0000013', 'smooth muscle cell': 'CL:0000192', 'snRNA_gene': 'SO:0001268', 'snoRNA_gene': 'SO:0001267', 'somatic': 'GENO:0000882', 'some_values_from': 'owl:someValuesFrom', 'splice_acceptor_variant': 'SO:0001574', 'splice_donor_variant': 'SO:0001575', 'splice_region_variant': 'SO:0001630', 'stalk': 'GENO:0000628', 'start_lost': 'SO:0002012', 'starts during': 'RO:0002091', 'starts_with': 
'RO:0002224', 'statistical model': 'STATO:0000107', 'statistical_hypothesis_test': 'OBI:0000673', 'stem cell': 'CL:0000034', 'stop codon readthrough': 'SO:0000883', 'stop_gained': 'SO:0001587', 'stop_lost': 'SO:0001578', 'strain': 'EFO:0005135', 'strongly_contradicts': 'SEPIO:0000100', 'structural_alteration': 'SO:0001785', 'study': 'OBI:0000471', 'subPropertyOf': 'rdfs:subPropertyOf', 'subclass_of': 'rdfs:subClassOf', 'substitution': 'SO:1000002', 'synonymous_variant': 'SO:0001819', 'tRNA_gene': 'SO:0001272', 'tandem_duplication': 'SO:1000173', 'targeted_gene_complement': 'GENO:0000527', 'targeted_gene_subregion': 'GENO:0000534', 'targets_gene': 'GENO:0000414', 'telomerase_RNA_gene': 'SO:0001643', 'telomere': 'SO:0000624', 'term replaced by': 'IAO:0100001', 'title': 'dc:title', 'towards': 'RO:0002503', 'traceable author statement': 'ECO:0000033', 'transcribed_processed_pseudogene': 'SO:0002109', 'transcribed_unitary_pseudogene': 'SO:0002108', 'transcribed_unprocessed_pseudogene': 'SO:0002107', 'transcriptional_cis_regulatory_region': 'SO:0001055', 'transgene': 'SO:0000902', 'transgenic': 'SO:0000781', 'transgenic_insertion': 'SO:0001218', 'transgenic_transposable_element': 'SO:0000796', 'translated_unprocessed_pseudogene': 'SO:0002106', 'translates_to': 'RO:0002513', 'translocation': 'SO:0000199', 'transposable_element': 'SO:0000101', 'transposable_element_gene': 'SO:0000111', 'transposable_element_pseudogene': 'SO:0001897', 'triples': 'void:triples', 'type': 'rdf:type', 'ubiquitinates': 'RO:0002480', 'unitary_pseudogene': 'SO:0001759', 'unprocessed_pseudogene': 'SO:0001760', 'unspecified': 'GENO:0000772', 'unspecified_genomic_background': 'GENO:0000649', 'upstream_gene_variant': 'SO:0001636', 'variant single locus complement': 'GENO:0000030', 'variant_locus': 'GENO:0000002', 'vaultRNA_primary_transcript': 'SO:0002040', 'vault_RNA': 'SO:0000404', 'version': 'pav:version', 'version_info': 'owl:versionInfo', 'version_iri': 'owl:versionIRI', 
'vertebrate_immunoglobulin_T_cell_receptor_segment': 'SO:0000460', 'web page': 'SIO:000302', 'wild_type': 'SO:0000817', 'wildtype': 'GENO:0000511', 'x-ray crystallography evidence': 'ECO:0001823', 'x_linked_dominant': 'HP:0001423', 'yeast 2-hybrid evidence': 'ECO:0000068', 'zscore': 'STATO:0000104'}
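The dictionary above maps human-readable term labels to CURIEs. The globaltcid attribute documented further down appears to be its inverse, mapping CURIEs back to labels. Below is a minimal stdlib sketch of building and using such an inverse lookup. It uses a handful of entries copied verbatim from the listing, and the helper name invert_terms is hypothetical, not part of Dipper's API.

```python
# A few label -> CURIE entries copied from the mapping above.
GLOBAL_TT = {
    'gene': 'SO:0000704',
    'has phenotype': 'RO:0002200',
    'heterozygous': 'GENO:0000135',
    'activating_mutation': ':activating',
}


def invert_terms(label_to_curie):
    """Build a CURIE -> label map, refusing CURIEs claimed by two labels."""
    curie_to_label = {}
    for label, curie in label_to_curie.items():
        if curie in curie_to_label:
            raise ValueError(f'{curie} is mapped from both '
                             f'{curie_to_label[curie]!r} and {label!r}')
        curie_to_label[curie] = label
    return curie_to_label


GLOBAL_TCID = invert_terms(GLOBAL_TT)
print(GLOBAL_TT['gene'])            # SO:0000704
print(GLOBAL_TCID['RO:0002200'])    # has phenotype
```

Raising on duplicates is a design choice of this sketch: a silent overwrite would make the reverse lookup depend on dictionary order.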
serialize(destination=None, format='turtle', base=None, encoding=None)

Serialize the graph to destination.

If destination is None, the serialize method returns the serialization as bytes or a string, rather than writing it out.

If encoding is None and destination is None, it returns a string. If encoding is set and destination is None, it returns bytes.

Format defaults to turtle.

Format support can be extended with plugins, but “xml”, “n3”, “turtle”, “nt”, “pretty-xml”, “trix”, “trig” and “nquads” are built in.
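The return-type rule above can be illustrated with a small stdlib-only sketch. The function serialize_nt is hypothetical and only emits N-Triples from pre-formatted terms. It is not the real serialize method, and what that method returns when a destination is given is not stated above. In this sketch, a file-like destination is simply written to and None is returned.

```python
import io


def serialize_nt(triples, destination=None, encoding=None):
    """Hypothetical sketch of the destination/encoding return rule.

    Each triple is a tuple of already-formatted N-Triples terms.
    """
    text = ''.join(f'{s} {p} {o} .\n' for s, p, o in triples)
    if destination is None:
        # No destination: hand the serialization back to the caller,
        # as str when no encoding is given, as bytes otherwise.
        return text if encoding is None else text.encode(encoding)
    # Sketch-only behaviour: write to the file-like object, return nothing.
    destination.write(text)
    return None


triples = [('<http://example.org/a>',
            '<http://www.w3.org/2000/01/rdf-schema#label>',
            '"a"')]
print(type(serialize_nt(triples)))                     # <class 'str'>
print(type(serialize_nt(triples, encoding='utf-8')))   # <class 'bytes'>

out = io.StringIO()
serialize_nt(triples, destination=out)
print(out.getvalue(), end='')
```

With RDFLib itself, the comparable call on a graph object is graph.serialize(format='nt').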

skolemizeBlankNode(curie)
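skolemizeBlankNode has no docstring here. In RDF 1.1, skolemization replaces a blank node with a freshly minted IRI, conventionally placed under a /.well-known/genid/ path. The BNODE entry in curie_map points at such a base (https://monarchinitiative.org/.well-known/genid/). The following is a minimal sketch of the idea. It assumes blank nodes are written as _:id, and it is an illustration, not Dipper's implementation. The function name skolemize is hypothetical.

```python
# Base IRI taken from the 'BNODE' entry of curie_map.
BNODE_PREFIX = 'https://monarchinitiative.org/.well-known/genid/'


def skolemize(curie):
    """Map a blank-node id such as '_:b0' to a Skolem IRI; leave others alone."""
    if curie.startswith('_:'):
        return BNODE_PREFIX + curie[2:]
    return curie


print(skolemize('_:b0'))      # https://monarchinitiative.org/.well-known/genid/b0
print(skolemize('SO:0000704'))  # SO:0000704
```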
dipper.graph.StreamedGraph module
class dipper.graph.StreamedGraph.StreamedGraph(are_bnodes_skized=True, identifier=None, file_handle=None, fmt='nt')

Bases: dipper.graph.Graph.Graph

Streams RDF triples to a file or stdout. Assumes a downstream process will sort and then deduplicate the triples.

Could in theory support both N-Triples and RDF/XML formats; for now, only nt (N-Triples) is supported.
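The streaming approach described above can be sketched with the standard library. Triples are written as N-Triples lines the moment they arrive, with no in-memory deduplication. The "sort then uniquify" step runs afterwards. On the command line, this step is often done with sort -u. The helpers write_nt and sort_unique are hypothetical and are not StreamedGraph's actual code. They also assume subjects, predicates and non-literal objects are already full IRIs.

```python
import io

RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
RDFS_LABEL = 'http://www.w3.org/2000/01/rdf-schema#label'
SO_GENE = 'http://purl.obolibrary.org/obo/SO_0000704'


def write_nt(handle, subject, predicate, obj, object_is_literal=False):
    """Write one N-Triples line; IRIs go in <...>, literals in quotes."""
    if object_is_literal:
        escaped = (obj.replace('\\', '\\\\')
                      .replace('"', '\\"')
                      .replace('\n', '\\n'))
        obj_term = f'"{escaped}"'
    else:
        obj_term = f'<{obj}>'
    handle.write(f'<{subject}> <{predicate}> {obj_term} .\n')


def sort_unique(lines):
    """The downstream step: sort and drop duplicate triples (like sort -u)."""
    return sorted(set(lines))


buf = io.StringIO()
subj = 'https://www.ncbi.nlm.nih.gov/gene/1'
write_nt(buf, subj, RDF_TYPE, SO_GENE)
write_nt(buf, subj, RDFS_LABEL, 'A1BG', object_is_literal=True)
write_nt(buf, subj, RDF_TYPE, SO_GENE)  # duplicate: streaming does not dedupe

lines = sort_unique(buf.getvalue().splitlines())
print(len(lines))  # 2
```

Deferring deduplication keeps memory use flat while streaming, at the cost of an extra pass over the output.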

addTriple(subject_id, predicate_id, obj, object_is_literal=None, literal_type=None, subject_category=None, object_category=None)
curie_map = {'': 'https://monarchinitiative.org/', 'APB': 'http://pb.apf.edu.au/phenbank/strain.html?id=', 'APO': 'http://purl.obolibrary.org/obo/APO_', 'AQTLPub': 'https://www.animalgenome.org/cgi-bin/QTLdb/BT/qabstract?PUBMED_ID=', 'AQTLTrait': 'http://identifiers.org/animalqtltrait/', 'AspGD': 'http://www.aspergillusgenome.org/cgi-bin/locus.pl?dbid=', 'AspGD_REF': 'http://www.aspergillusgenome.org/cgi-bin/reference/reference.pl?dbid=', 'BFO': 'http://purl.obolibrary.org/obo/BFO_', 'BGD': 'http://bovinegenome.org/genepages/btau40/genes/', 'BIOGRID': 'http://thebiogrid.org/', 'BNODE': 'https://monarchinitiative.org/.well-known/genid/', 'BT': 'http://c.biothings.io/#', 'CCDS': 'http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=', 'CGNC': 'http://birdgenenames.org/cgnc/GeneReport?id=', 'CHEBI': 'http://purl.obolibrary.org/obo/CHEBI_', 'CHR': 'http://purl.obolibrary.org/obo/CHR_', 'CID': 'http://pubchem.ncbi.nlm.nih.gov/compound/', 'CL': 'http://purl.obolibrary.org/obo/CL_', 'CLO': 'http://purl.obolibrary.org/obo/CLO_', 'CMMR': 'http://www.cmmr.ca/order.php?t=m&id=', 'CMO': 'http://purl.obolibrary.org/obo/CMO_', 'COHD': 'http://purl.obolibrary.org/obo/COHD_', 'COSMIC': 'http://cancer.sanger.ac.uk/cosmic/mutation/overview?id=', 'ClinVar': 'http://www.ncbi.nlm.nih.gov/clinvar/', 'ClinVarSubmitters': 'http://www.ncbi.nlm.nih.gov/clinvar/submitters/', 'ClinVarVariant': 'http://www.ncbi.nlm.nih.gov/clinvar/variation/', 'ComplexPortal': 'https://www.ebi.ac.uk/complexportal/complex/', 'Coriell': 'https://catalog.coriell.org/0/Sections/Search/Sample_Detail.aspx?Ref=', 'CoriellCollection': 'https://catalog.coriell.org/1/', 'CoriellFamily': 'https://catalog.coriell.org/0/Sections/BrowseCatalog/FamilyTypeSubDetail.aspx?fam=', 'CoriellIndividual': 'https://catalog.coriell.org/Search?q=', 'DC_CL': 'http://purl.obolibrary.org/obo/DC_CL', 'DECIPHER': 'https://decipher.sanger.ac.uk/syndrome/', 'DOI': 'http://dx.doi.org/', 'DOID': 
'http://purl.obolibrary.org/obo/DOID_', 'DrugBank': 'http://www.drugbank.ca/drugs/', 'EC': 'https://www.enzyme-database.org/query.php?ec=', 'ECO': 'http://purl.obolibrary.org/obo/ECO_', 'EDAM-DATA': 'http://edamontology.org/data_', 'EFO': 'http://www.ebi.ac.uk/efo/EFO_', 'EMAPA': 'http://purl.obolibrary.org/obo/EMAPA_', 'EMMA': 'https://www.infrafrontier.eu/search?keyword=EM:', 'ENSEMBL': 'http://ensembl.org/id/', 'ENVO': 'http://purl.obolibrary.org/obo/ENVO_', 'EOM': 'https://elementsofmorphology.nih.gov/index.cgi?tid=', 'EOM_IMG': 'https://elementsofmorphology.nih.gov/images/terms/', 'ERO': 'http://purl.obolibrary.org/obo/ERO_', 'EcoGene': 'http://ecogene.org/gene/', 'EnsemblGenome': 'http://www.ensemblgenomes.org/id/', 'FBbt': 'http://purl.obolibrary.org/obo/FBbt_', 'FBcv': 'http://purl.obolibrary.org/obo/FBcv_', 'FBdv': 'http://purl.obolibrary.org/obo/FBdv_', 'FDADrug': 'http://www.fda.gov/Drugs/InformationOnDrugs/', 'FlyBase': 'http://flybase.org/reports/', 'GARD': 'http://purl.obolibrary.org/obo/GARD_', 'GENO': 'http://purl.obolibrary.org/obo/GENO_', 'GINAS': 'http://tripod.nih.gov/ginas/app/substance#', 'GO': 'http://purl.obolibrary.org/obo/GO_', 'GO_REF': 'http://www.geneontology.org/cgi-bin/references.cgi#GO_REF:', 'GWAS': 'https://www.ebi.ac.uk/gwas/variants/', 'GenBank': 'http://www.ncbi.nlm.nih.gov/nuccore/', 'Genatlas': 'http://genatlas.medecine.univ-paris5.fr/fiche.php?symbol=', 'GeneReviews': 'http://www.ncbi.nlm.nih.gov/books/', 'HGMD': 'http://www.hgmd.cf.ac.uk/ac/gene.php?gene=', 'HGNC': 'https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:', 'HMDB': 'http://www.hmdb.ca/metabolites/', 'HOMOLOGENE': 'http://www.ncbi.nlm.nih.gov/homologene/', 'HP': 'http://purl.obolibrary.org/obo/HP_', 'HPO': 'http://human-phenotype-ontology.org/', 'HPRD': 'http://www.hprd.org/protein/', 'IAO': 'http://purl.obolibrary.org/obo/IAO_', 'ICD9': 'http://purl.obolibrary.org/obo/ICD9_', 'IMPC': 'https://www.mousephenotype.org/data/genes/', 'IMPC-param': 
'https://www.mousephenotype.org/impress/OntologyInfo?action=list&procID=', 'IMPC-pipe': 'https://www.mousephenotype.org/impress/PipelineInfo?id=', 'IMPC-proc': 'https://www.mousephenotype.org/impress/ProcedureInfo?action=list&procID=', 'ISBN': 'https://monarchinitiative.org/ISBN_', 'ISBN-10': 'https://monarchinitiative.org/ISBN10_', 'ISBN-13': 'https://monarchinitiative.org/ISBN13_', 'IUPHAR': 'http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId=', 'InterPro': 'https://www.ebi.ac.uk/interpro/entry/InterPro/', 'J': 'http://www.informatics.jax.org/reference/J:', 'JAX': 'http://jaxmice.jax.org/strain/', 'KEGG-ds': 'http://purl.obolibrary.org/KEGG-ds_', 'KEGG-hsa': 'http://www.kegg.jp/dbget-bin/www_bget?hsa:', 'KEGG-img': 'http://www.genome.jp/kegg/pathway/map/', 'KEGG-ko': 'http://www.kegg.jp/dbget-bin/www_bget?ko:', 'KEGG-path': 'http://www.kegg.jp/dbget-bin/www_bget?path:', 'LIDA': 'http://sydney.edu.au/vetscience/lida/dogs/search/disorder/', 'LPT': 'http://purl.obolibrary.org/obo/LPT_', 'MA': 'http://purl.obolibrary.org/obo/MA_', 'MEDDRA': 'http://purl.bioontology.org/ontology/MEDDRA/', 'MESH': 'http://id.nlm.nih.gov/mesh/', 'MGI': 'http://www.informatics.jax.org/accession/MGI:', 'MMRRC': 'https://www.mmrrc.org/catalog/sds.php?mmrrc_id=', 'MONARCH': 'https://monarchinitiative.org/MONARCH_', 'MONDO': 'http://purl.obolibrary.org/obo/MONDO_', 'MP': 'http://purl.obolibrary.org/obo/MP_', 'MPATH': 'http://purl.obolibrary.org/obo/MPATH_', 'MPD': 'https://phenome.jax.org/', 'MPD-assay': 'https://phenome.jax.org/db/qp?rtn=views/catlines&keymeas=', 'MPD-strain': 'http://phenome.jax.org/db/q?rtn=strains/details&strainid=', 'MUGEN': 'http://bioit.fleming.gr/mugen/Controller?workflow=ViewModel&expand_all=true&name_begins=model.block&eid=', 'MedGen': 'http://www.ncbi.nlm.nih.gov/medgen/', 'MonarchArchive': 'https://archive.monarchinitiative.org/', 'MonarchData': 'https://data.monarchinitiative.org/ttl/', 'MonarchLogoRepo': 
'https://github.com/monarch-initiative/monarch-ui/blob/master/public/img/sources/', 'NBO': 'http://purl.obolibrary.org/obo/NBO_', 'NCBIAssembly': 'https://www.ncbi.nlm.nih.gov/assembly?term=', 'NCBIBSgene': 'http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=gene&part=', 'NCBIGene': 'https://www.ncbi.nlm.nih.gov/gene/', 'NCBIGenome': 'https://www.ncbi.nlm.nih.gov/genome/', 'NCBIProtein': 'http://www.ncbi.nlm.nih.gov/protein/', 'NCBITaxon': 'http://purl.obolibrary.org/obo/NCBITaxon_', 'NCIMR': 'https://mouse.ncifcrf.gov/available_details.asp?ID=', 'NCIT': 'http://purl.obolibrary.org/obo/NCIT_', 'OAE': 'http://purl.obolibrary.org/obo/OAE_', 'OBA': 'http://purl.obolibrary.org/obo/OBA_', 'OBAN': 'http://purl.org/oban/', 'OBI': 'http://purl.obolibrary.org/obo/OBI_', 'OBO': 'http://purl.obolibrary.org/obo/', 'OMIA': 'https://omia.org/OMIA', 'OMIA-breed': 'https://monarchinitiative.org/model/OMIA-breed:', 'OMIM': 'http://omim.org/entry/', 'OMIMPS': 'http://www.omim.org/phenotypicSeries/', 'ORPHA': 'http://www.orpha.net/ORDO/Orphanet_', 'PAINT_REF': 'http://www.geneontology.org/gene-associations/submission/paint/', 'PANTHER': 'http://www.pantherdb.org/panther/family.do?clsAccession=', 'PATO': 'http://purl.obolibrary.org/obo/PATO_', 'PCO': 'http://purl.obolibrary.org/obo/PCO_', 'PDB': 'http://www.ebi.ac.uk/pdbsum/', 'PMCID': 'http://www.ncbi.nlm.nih.gov/pmc/', 'PMID': 'http://www.ncbi.nlm.nih.gov/pubmed/', 'PR': 'http://purl.obolibrary.org/obo/PR_', 'PW': 'http://purl.obolibrary.org/obo/PW_', 'PomBase': 'https://www.pombase.org/spombe/result/', 'RBRC': 'http://www2.brc.riken.jp/lab/animal/detail.php?brc_no=', 'REACT': 'http://www.reactome.org/PathwayBrowser/#/', 'RGD': 'http://rgd.mcw.edu/rgdweb/report/gene/main.html?id=', 'RGDRef': 'http://rgd.mcw.edu/rgdweb/report/reference/main.html?id=', 'RO': 'http://purl.obolibrary.org/obo/RO_', 'RXCUI': 'http://purl.bioontology.org/ontology/RXNORM/', 'RefSeq': 'http://www.ncbi.nlm.nih.gov/refseq/?term=', 'SCTID': 
'http://purl.obolibrary.org/obo/SCTID_', 'SEPIO': 'http://purl.obolibrary.org/obo/SEPIO_', 'SGD': 'https://www.yeastgenome.org/locus/', 'SGD_REF': 'https://www.yeastgenome.org/reference/', 'SIO': 'http://semanticscience.org/resource/SIO_', 'SMPDB': 'http://smpdb.ca/view/', 'SNOMED': 'http://purl.obolibrary.org/obo/SNOMED_', 'SO': 'http://purl.obolibrary.org/obo/SO_', 'STATO': 'http://purl.obolibrary.org/obo/STATO_', 'SwissProt': 'http://identifiers.org/SwissProt:', 'TAIR': 'https://www.arabidopsis.org/servlets/TairObject?type=locus&id=', 'TrEMBL': 'http://purl.uniprot.org/uniprot/', 'UBERON': 'http://purl.obolibrary.org/obo/UBERON_', 'UCSC': 'ftp://hgdownload.cse.ucsc.edu/goldenPath/', 'UCSCBuild': 'http://genome.ucsc.edu/cgi-bin/hgGateway?db=', 'UMLS': 'http://linkedlifedata.com/resource/umls/id/', 'UNII': 'http://fdasis.nlm.nih.gov/srs/unii/', 'UO': 'http://purl.obolibrary.org/obo/UO_', 'UPHENO': 'http://purl.obolibrary.org/obo/UPHENO_', 'UniProtKB': 'http://identifiers.org/uniprot/', 'VGNC': 'https://vertebrate.genenames.org/data/gene-symbol-report/#!/vgnc_id/', 'VIVO': 'http://vivoweb.org/ontology/core#', 'VT': 'http://purl.obolibrary.org/obo/VT_', 'WBPhenotype': 'http://purl.obolibrary.org/obo/WBPhenotype_', 'WBbt': 'http://purl.obolibrary.org/obo/WBbt_', 'WD_Entity': 'https://www.wikidata.org/wiki/', 'WD_Prop': 'https://www.wikidata.org/wiki/Property:', 'WormBase': 'https://www.wormbase.org/get?name=', 'XAO': 'http://purl.obolibrary.org/obo/XAO_', 'XCO': 'http://purl.obolibrary.org/obo/XCO_', 'XPO': 'http://purl.obolibrary.org/obo/XPO_', 'Xenbase': 'http://identifiers.org/xenbase/', 'ZFA': 'http://purl.obolibrary.org/obo/ZFA_', 'ZFIN': 'http://zfin.org/', 'ZFS': 'http://purl.obolibrary.org/obo/ZFS_', 'ZP': 'http://purl.obolibrary.org/obo/ZP_', 'biolink': 'https://w3id.org/biolink/vocab/', 'catfishQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/IP/qdetails?QTL_ID=', 'cattleQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/BT/qdetails?QTL_ID=', 'chickenQTL': 
'https://www.animalgenome.org/cgi-bin/QTLdb/GG/qdetails?QTL_ID=', 'cito': 'http://purl.org/spar/cito/', 'dbSNP': 'http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=', 'dbSNPIndividual': 'http://www.ncbi.nlm.nih.gov/SNP/snp_ind.cgi?ind_id=', 'dbVar': 'http://www.ncbi.nlm.nih.gov/dbvar/', 'dc': 'http://purl.org/dc/terms/', 'dcat': 'http://www.w3.org/ns/dcat#', 'dctypes': 'http://purl.org/dc/dcmitype/', 'dictyBase': 'http://dictybase.org/gene/', 'faldo': 'http://biohackathon.org/resource/faldo#', 'foaf': 'http://xmlns.com/foaf/0.1/', 'horseQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/EC/qdetails?QTL_ID=', 'miRBase': 'http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=', 'oboInOwl': 'http://www.geneontology.org/formats/oboInOwl#', 'owl': 'http://www.w3.org/2002/07/owl#', 'pav': 'http://purl.org/pav/', 'pigQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/SS/qdetails?QTL_ID=', 'rainbow_troutQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/OM/qdetails?QTL_ID=', 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'rdfs': 'http://www.w3.org/2000/01/rdf-schema#', 'schema': 'http://schema.org/', 'sheepQTL': 'https://www.animalgenome.org/cgi-bin/QTLdb/OA/qdetails?QTL_ID=', 'skos': 'http://www.w3.org/2004/02/skos/core#', 'vfb': 'http://virtualflybrain.org/reports/', 'void': 'http://rdfs.org/ns/void#', 'xml': 'http://www.w3.org/XML/1998/namespace', 'xsd': 'http://www.w3.org/2001/XMLSchema#'}
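Each curie_map entry pairs a prefix with a base IRI. Expanding a CURIE is therefore a prefix lookup followed by concatenation. Note the empty prefix used for Monarch's own terms such as :activating, and note that some bases, such as MGI, already end in the prefix itself. Below is a minimal sketch using three entries copied from the map. The helpers expand_curie and contract_iri are hypothetical. They are not the CurieUtil API, which the curie_util attribute appears to provide for this purpose.

```python
# Three entries copied from curie_map above.
CURIE_MAP = {
    '': 'https://monarchinitiative.org/',
    'MGI': 'http://www.informatics.jax.org/accession/MGI:',
    'SO': 'http://purl.obolibrary.org/obo/SO_',
}


def expand_curie(curie, prefixes=CURIE_MAP):
    """Expand 'PREFIX:local' to a full IRI; KeyError for unknown prefixes."""
    prefix, sep, local = curie.partition(':')
    if not sep:
        raise ValueError(f'not a CURIE: {curie!r}')
    return prefixes[prefix] + local


def contract_iri(iri, prefixes=CURIE_MAP):
    """Compact an IRI using the longest matching base; None if none matches."""
    best = None
    for prefix, base in prefixes.items():
        if iri.startswith(base) and (
                best is None or len(base) > len(prefixes[best])):
            best = prefix
    if best is None:
        return None
    return f'{best}:{iri[len(prefixes[best]):]}'


print(expand_curie('SO:0000704'))   # http://purl.obolibrary.org/obo/SO_0000704
print(expand_curie(':activating'))  # https://monarchinitiative.org/activating
print(contract_iri('http://purl.obolibrary.org/obo/SO_0000704'))  # SO:0000704
```

Choosing the longest matching base matters once overlapping bases are present, for example the OBO base http://purl.obolibrary.org/obo/ alongside the more specific SO base.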
curie_util = <dipper.utils.CurieUtil.CurieUtil object>
fhandle = <_io.TextIOWrapper name='/home/docs/checkouts/readthedocs.org/user_builds/dipper/checkouts/master/dipper/graph/../../translationtable/GLOBAL_TERMS.yaml' mode='r' encoding='UTF-8'>
globaltcid = {':activating': 'activating_mutation', ':all_missense_or_inframe': 'all_missense_or_inframe', ':biallelic': 'biallelic', ':does_not_have_phenotype': 'does_not_have_phenotype', ':frequencyOfPhenotype': 'frequency', ':has_allelic_requirement': 'has_allelic_requirement', ':has_cell_origin': 'has_cell_origin', ':has_drug_response': 'has_drug_response', ':has_functional_consequence': 'has_functional_consequence', ':has_molecular_consequence': 'has_molecular_consequence', ':has_sex_specificity': 'has_sex_specificty', ':increased_gene_dosage': 'increased_gene_dosage', ':monoallelic': 'monoallelic', ':mosaic_genotype': 'mosaic', ':onset': 'onset', ':part_of_contiguous_gene_duplication': 'part_of_contiguous_gene_duplication', 'BFO:0000050': 'part_of', 'BFO:0000051': 'has_part', 'BFO:0000066': 'occurs_in', 'CHEBI:23367': 'molecular entity', 'CHEBI:33695': 'gene_product', 'CHEBI:InChIKey': 'inchi_key', 'CL:0000000': 'cell', 'CL:0000034': 'stem cell', 'CL:0000056': 'myoblast', 'CL:0000057': 'fibroblast', 'CL:0000066': 'epithelial cell', 'CL:0000077': 'mesothelial', 'CL:0000084': 'T cell', 'CL:0000115': 'endothelial cell', 'CL:0000148': 'melanocyte', 'CL:0000192': 'smooth muscle cell', 'CL:0000236': 'B cell', 'CL:0000312': 'keratinocyte', 'CL:0002198': 'oncocyte', 'CL:0002323': 'amniocyte', 'CL:0002570': 'mesenchymal stem cell of adipose', 'CL:0011115': 'precursor cell', 'CLO:0000008': 'cell line repository', 'CLO:0000031': 'cell line', 'CLO:0000220': 'immortal kidney-derived cell line cell', 'CLO:0036934': 'Adipose stromal cell', 'CLO:0036935': 'Amniotic fluid-derived cell line', 'CLO:0036938': 'Tumor-derived cell line', 'CLO:0036939': 'Microcell hybrid', 'CLO:0036940': 'Chorionic villus-derived cell line', 'ECO:0000000': 'evidence', 'ECO:0000001': 'inference from background scientific knowledge', 'ECO:0000005': 'enzyme assay evidence', 'ECO:0000006': 'experimental evidence', 'ECO:0000008': 'expression pattern evidence', 'ECO:0000011': 'genetic interaction 
evidence', 'ECO:0000012': 'functional complementation evidence', 'ECO:0000015': 'mutant phenotype evidence', 'ECO:0000033': 'traceable author statement', 'ECO:0000035': 'no biological data found', 'ECO:0000059': 'experimental phenotypic evidence', 'ECO:0000061': 'quantitative trait analysis evidence', 'ECO:0000068': 'yeast 2-hybrid evidence', 'ECO:0000076': 'far-Western blotting evidence', 'ECO:0000079': 'affinity chromatography evidence', 'ECO:0000080': 'phylogenetic evidence', 'ECO:0000085': 'immunoprecipitation evidence', 'ECO:0000172': 'biochemical trait analysis evidence', 'ECO:0000177': 'genomic context evidence', 'ECO:0000180': 'clinical study evidence', 'ECO:0000200': 'sequence alignment evidence', 'ECO:0000201': 'sequence orthology evidence', 'ECO:0000202': 'match to sequence model evidence', 'ECO:0000213': 'combinatorial evidence used in automatic assertion', 'ECO:0000214': 'biological aspect of descendant evidence', 'ECO:0000220': 'sequencing assay evidence', 'ECO:0000245': 'computational combinatorial evidence used in manual assertion', 'ECO:0000250': 'sequence similarity evidence used in manual assertion', 'ECO:0000269': 'experimental evidence used in manual assertion', 'ECO:0000270': 'expression evidence used in manual assertion', 'ECO:0000303': 'author statement without traceable support used in manual assertion', 'ECO:0000304': 'author statement supported by traceable reference used in manual assertion', 'ECO:0000305': 'curator inference used in manual assertion', 'ECO:0000306': 'inference from background scientific knowledge used in manual assertion', 'ECO:0000311': 'imported information', 'ECO:0000314': 'direct assay evidence used in manual assertion', 'ECO:0000315': 'mutant phenotype evidence used in manual assertion', 'ECO:0000316': 'genetic interaction evidence used in manual assertion', 'ECO:0000318': 'biological aspect of ancestor evidence used in manual assertion', 'ECO:0000320': 'phylogenetic determination of loss of key residues evidence 
used in manual assertion', 'ECO:0000322': 'imported manually asserted information used in automatic assertion', 'ECO:0000323': 'imported automatically asserted information used in automatic assertion', 'ECO:0000324': 'imaging assay evidence', 'ECO:0000353': 'physical interaction evidence used in manual assertion', 'ECO:0000501': 'evidence used in automatic assertion', 'ECO:0001016': 'blood test evidence', 'ECO:0001048': 'FRET', 'ECO:0001823': 'x-ray crystallography evidence', 'ECO:0005611': 'inference from experimental data evidence', 'ECO:0005612': 'inference from phenotype manipulation evidence', 'ECO:0005613': 'inference by association of genotype from phenotype', 'EDAM-DATA:3148': 'gene_family', 'EFO:0000246': 'age', 'EFO:0000689': 'sampling_time', 'EFO:0001799': 'ethnic_group', 'EFO:0003150': 'African American', 'EFO:0003152': 'Asian', 'EFO:0003153': 'Asian Indian', 'EFO:0003154': 'Asian/Pacific Islander', 'EFO:0003156': 'Caucasian', 'EFO:0003157': 'Chinese', 'EFO:0003158': 'Eastern Indian', 'EFO:0003160': 'Filipino', 'EFO:0003164': 'Japanese', 'EFO:0003165': 'Korean', 'EFO:0003169': 'Hispanic', 'EFO:0004561': 'African', 'EFO:0004905': 'induced pluripotent stem cell', 'EFO:0005135': 'strain', 'ENVO:01000254': 'environmental_system', 'ERO:0000006': 'reagent', 'ERO:0000232': 'has_author', 'ERO:0000480': 'has_url', 'ERO:0001984': 'Native American', 'ERO:0002002': 'embryonic stem cell line', 'ERO:0002071': 'Asian, Vietnamese', 'ERO:0002190': 'collection', 'GENO:0000002': 'variant_locus', 'GENO:0000009': 'genomic_variation_complement', 'GENO:0000030': 'variant single locus complement', 'GENO:0000036': 'reference_locus', 'GENO:0000134': 'hemizygous', 'GENO:0000135': 'heterozygous', 'GENO:0000136': 'homozygous', 'GENO:0000137': 'indeterminate', 'GENO:0000141': 'condition inheritance', 'GENO:0000143': 'co-dominant inheritance', 'GENO:0000144': 'complete dominant inheritance', 'GENO:0000145': 'semi-dominant inheritance', 'GENO:0000146': 'allosomal dominant 
inheritance', 'GENO:0000147': 'autosomal dominant iniheritance', 'GENO:0000148': 'recessive inheritance', 'GENO:0000149': 'allosomal recessive inheritance', 'GENO:0000150': 'autosomal recessive inheritance', 'GENO:0000206': 'is_allelotype_of', 'GENO:0000207': 'has_sequence_attribute', 'GENO:0000222': 'has_genotype', 'GENO:0000225': 'has_member_with_allelotype', 'GENO:0000382': 'has_variant_part', 'GENO:0000385': 'has_reference_part', 'GENO:0000402': 'compound heterozygous', 'GENO:0000408': 'is_allele_of', 'GENO:0000414': 'targets_gene', 'GENO:0000418': 'has_affected_feature', 'GENO:0000440': 'is_mutant_of', 'GENO:0000443': 'is_expression_variant_of', 'GENO:0000444': 'is_transgene_variant_of', 'GENO:0000458': 'simple heterozygous', 'GENO:0000504': 'reagent_targeted_gene', 'GENO:0000511': 'wildtype', 'GENO:0000512': 'allele', 'GENO:0000524': 'extrinsic_genotype', 'GENO:0000525': 'effective_genotype', 'GENO:0000527': 'targeted_gene_complement', 'GENO:0000534': 'targeted_gene_subregion', 'GENO:0000580': 'has_qualifier', 'GENO:0000602': 'homoplasmic', 'GENO:0000603': 'heteroplasmic', 'GENO:0000604': 'hemizygous-y', 'GENO:0000605': 'hemizygous-x', 'GENO:0000606': 'hemizygous insertion-linked', 'GENO:0000608': 'has_zygosity', 'GENO:0000610': 'is_reference_allele_of', 'GENO:0000611': 'genomic_background', 'GENO:0000614': 'chromosome_region', 'GENO:0000616': 'chromosome_subband', 'GENO:0000618': 'band_intensity', 'GENO:0000619': 'gpos', 'GENO:0000620': 'gneg', 'GENO:0000621': 'gvar', 'GENO:0000622': 'gpos100', 'GENO:0000623': 'gpos75', 'GENO:0000624': 'gpos50', 'GENO:0000625': 'gpos25', 'GENO:0000628': 'stalk', 'GENO:0000629': 'long_chromosome_arm', 'GENO:0000630': 'has_begin_stage_qualifier', 'GENO:0000631': 'has_end_stage_qualifier', 'GENO:0000632': 'gpos66', 'GENO:0000633': 'gpos33', 'GENO:0000634': 'is_targeted_by', 'GENO:0000637': 'regulatory_transgene_feature', 'GENO:0000638': 'coding_transgene_feature', 'GENO:0000639': 'sequence_derives_from', 'GENO:0000643': 
'has_origin', 'GENO:0000644': 'karyotype_variation_complement', 'GENO:0000645': 'sex_qualified_genotype', 'GENO:0000646': 'male intrinsic genotype', 'GENO:0000647': 'female intrinsic genotype', 'GENO:0000649': 'unspecified_genomic_background', 'GENO:0000650': 'has_sex_agnostic_part', 'GENO:0000678': 'has_extent', 'GENO:0000719': 'intrinsic genotype', 'GENO:0000772': 'unspecified', 'GENO:0000840': 'pathogenic_for_condition', 'GENO:0000841': 'likely_pathogenic_for_condition', 'GENO:0000843': 'benign_for_condition', 'GENO:0000844': 'likely_benign_for_condition', 'GENO:0000845': 'has_uncertain_significance_for_condition', 'GENO:0000846': 'short repeat', 'GENO:0000866': 'has_quantifier', 'GENO:0000867': 'probabalistic_quantifier', 'GENO:0000882': 'somatic', 'GENO:0000900': 'germline', 'GO:0007165': 'signal_transduction', 'GO:0009987': 'cellular_process', 'GO:0032502': 'developmental_process', 'HP:0001423': 'x_linked_dominant', 'HP:0001427': 'mitochondrial_inheritance', 'HP:0010984': 'digenic_inheritance', 'HP:0031859': 'obsolete', 'IAO:0000004': 'has measurement value', 'IAO:0000013': 'journal article', 'IAO:0000100': 'data set', 'IAO:0000109': 'measurement datum', 'IAO:0000115': 'definition', 'IAO:0000136': 'is_about', 'IAO:0000142': 'mentions', 'IAO:0000185': 'photograph', 'IAO:0000310': 'document', 'IAO:0000311': 'publication', 'IAO:0000589': 'OBO foundry unique label', 'IAO:0100001': 'term replaced by', 'MESH:D004392': 'Dwarfism', 'MONARCH:anonymous': 'is_anonymous', 'MONARCH:cliqueLeader': 'clique_leader', 'MONDO:0000001': 'disease or disorder', 'MONDO:0000009': 'inherited bleeding disorder, platelet-type', 'MONDO:0002051': 'integumentary system disease', 'MONDO:0002561': 'lysosomal storage disease', 'MONDO:0004589': 'hereditary retinal dystrophy', 'MONDO:0004992': 'cancer', 'MONDO:0005066': 'metabolic disease', 'MONDO:0005283': 'retinal disease', 'MONDO:0005453': 'congenital heart disease', 'MONDO:0005570': 'hematological system disease', 'MONDO:0008254': 
'platelet disorder, undefined', 'MONDO:0015993': 'Cone–rod dystrophy', 'MONDO:0019052': 'inborn errors of metabolism', 'MONDO:0019592': 'disorder of sex development', 'MONDO:0020145': 'developmental defect of the eye', 'MONDO:0043878': 'hereditary optic atrophy', 'MP:0008762': 'embryonic lethality', 'NCBITaxon:10029': 'Cricetulus griseus', 'NCBITaxon:10036': 'Mesocricetus auratus', 'NCBITaxon:10041': 'Peromyscus leucopus', 'NCBITaxon:10042': 'Peromyscus maniculatus', 'NCBITaxon:10088': 'Mus', 'NCBITaxon:10089': 'Mus caroli', 'NCBITaxon:10090': 'Mus musculus', 'NCBITaxon:10091': 'Mus musculus castaneus', 'NCBITaxon:10092': 'Mus musculus domesticus', 'NCBITaxon:10093': 'Mus pahari', 'NCBITaxon:10094': 'Mus saxicola', 'NCBITaxon:10096': 'Mus spretus', 'NCBITaxon:10097': 'Mus cervicolor', 'NCBITaxon:10098': 'Mus cookii', 'NCBITaxon:10101': 'Mus platythrix', 'NCBITaxon:10102': 'Mus setulosus', 'NCBITaxon:10103': 'Mus spicilegus', 'NCBITaxon:10105': 'Mus minutoides', 'NCBITaxon:10108': 'Mus abbotti', 'NCBITaxon:10114': 'Rattus', 'NCBITaxon:10116': 'Rattus norvegicus', 'NCBITaxon:10141': 'Cavia porcellus', 'NCBITaxon:116058': 'Mus musculus brevirostris', 'NCBITaxon:1266728': 'Mus musculus domesticus x M. m. 
molossinus', 'NCBITaxon:135827': 'Mus cervicolor cervicolor', 'NCBITaxon:135828': 'Mus cervicolor popaeus', 'NCBITaxon:13616': 'Monodelphis domestica', 'NCBITaxon:1385377': 'Mus musculus gansuensis', 'NCBITaxon:186193': 'Mus fragilicauda', 'NCBITaxon:186842': 'Mus musculus x Mus spretus', 'NCBITaxon:229288': 'Mus gratus', 'NCBITaxon:254704': 'Mus terricolor', 'NCBITaxon:270352': 'Mus macedonicus spretoides', 'NCBITaxon:270353': 'Mus macedonicus macedonicus', 'NCBITaxon:273921': 'Mus indutus', 'NCBITaxon:273922': 'Mus haussa', 'NCBITaxon:27681': 'Mus booduga', 'NCBITaxon:28377': 'Anolis carolinensis', 'NCBITaxon:30608': 'Microcebus murinus', 'NCBITaxon:31033': 'Takifugu rubripe', 'NCBITaxon:35531': 'Mus musculus bactrianus', 'NCBITaxon:3702': 'Arabidopsis thaliana', 'NCBITaxon:37965': 'hybrid', 'NCBITaxon:390847': 'Mus lepidoides', 'NCBITaxon:390848': 'Mus nitidulus', 'NCBITaxon:39442': 'Mus musculus musculus', 'NCBITaxon:397330': 'Mus tenellus', 'NCBITaxon:41269': 'Mus crociduroides', 'NCBITaxon:41270': 'Mus mattheyi', 'NCBITaxon:42413': 'Peromyscus polionotus', 'NCBITaxon:42520': 'Peromyscus californicus', 'NCBITaxon:44689': 'Dictyostelium discoideum', 'NCBITaxon:468371': 'Mus cypriacus', 'NCBITaxon:473865': 'Mus triton', 'NCBITaxon:477815': 'Mus musculus musculus x M. m. domesticus', 'NCBITaxon:477816': 'Mus musculus musculus x M. m. 
castaneus', 'NCBITaxon:4896': 'Schizosaccharomyces pombe', 'NCBITaxon:5052': 'Aspergillus', 'NCBITaxon:544437': 'Mus baoulei', 'NCBITaxon:54600': 'Macaca nigra', 'NCBITaxon:559292': 'Saccharomyces cerevisiae S288C', 'NCBITaxon:562': 'Escherichia coli', 'NCBITaxon:57486': 'Mus musculus molossinus', 'NCBITaxon:5782': 'Dictyostelium', 'NCBITaxon:6239': 'Caenorhabditis elegans', 'NCBITaxon:66189': 'Chelonoidis niger', 'NCBITaxon:7227': 'Drosophila melanogaster', 'NCBITaxon:78454': 'Saguinus labiatus', 'NCBITaxon:7955': 'Danio rerio', 'NCBITaxon:8022': 'Oncorhynchus mykiss', 'NCBITaxon:80274': 'Mus musculus gentilulus', 'NCBITaxon:8364': 'Xenopus (Silurana) tropicalis', 'NCBITaxon:83773': 'Mus famulus', 'NCBITaxon:862510': 'Nannomys', 'NCBITaxon:887131': 'Mus emesi', 'NCBITaxon:9031': 'Gallus gallus', 'NCBITaxon:9258': 'Ornithorhynchus anatinus', 'NCBITaxon:9315': 'Macropus eugenii', 'NCBITaxon:9365': 'Erinaceus europaeus', 'NCBITaxon:9487': 'Saguinus fuscicollis', 'NCBITaxon:9519': 'Lagothrix lagotricha', 'NCBITaxon:9521': 'Saimiri sciureus', 'NCBITaxon:9523': 'Callicebus moloch', 'NCBITaxon:9538': 'Erythrocebus patas', 'NCBITaxon:9541': 'Macaca fascicularis', 'NCBITaxon:9544': 'Macaca mulatta', 'NCBITaxon:9545': 'Macaca nemestrina', 'NCBITaxon:9555': 'Papio anubis', 'NCBITaxon:9593': 'Gorilla gorilla', 'NCBITaxon:9597': 'Pan paniscus', 'NCBITaxon:9598': 'Pan troglodytes', 'NCBITaxon:9600': 'Pongo pygmaeus', 'NCBITaxon:9606': 'Homo sapiens', 'NCBITaxon:9615': 'Canis lupus familiaris', 'NCBITaxon:9649': 'Ailurus fulgens', 'NCBITaxon:9685': 'Felis catus', 'NCBITaxon:9796': 'Equus caballus', 'NCBITaxon:9823': 'Sus scrofa', 'NCBITaxon:9825': 'Sus scrofa domestica', 'NCBITaxon:9888': 'Muntiacus muntjak', 'NCBITaxon:9913': 'Bos taurus', 'NCBITaxon:9925': 'Capra hircus', 'NCBITaxon:9940': 'Ovis aries', 'NCBITaxon:9986': 'Oryctolagus cuniculus', 'NCIT:C61040': 'Statistical Significance', 'NCIT:C63513': 'Manual', 'NCIT:C71458': 'Suspected', 'NCIT:C96621': 'Percent Change From 
Baseline', 'OAE:0001563': 'proportional_reporting_ratio', 'OBAN:association': 'association', 'OBAN:association_has_object': 'association has object', 'OBAN:association_has_predicate': 'association has predicate', 'OBAN:association_has_subject': 'association has subject', 'OBI:0000070': 'assay', 'OBI:0000175': 'p-value', 'OBI:0000471': 'study', 'OBI:0000673': 'statistical_hypothesis_test', 'OBI:0001937': 'has specified numeric value', 'PATO:0000383': 'female', 'PATO:0000384': 'male', 'PATO:0000460': 'abnormal', 'PATO:0000461': 'normal', 'PCO:0000001': 'population', 'PCO:0000020': 'family', 'PW:0000001': 'pathway', 'RO:0000057': 'has_participant', 'RO:0000086': 'has_quality', 'RO:0000091': 'has disposition', 'RO:0001000': 'derives_from', 'RO:0002091': 'starts during', 'RO:0002093': 'ends during', 'RO:0002162': 'in taxon', 'RO:0002200': 'has phenotype', 'RO:0002204': 'gene product of', 'RO:0002205': 'has gene product', 'RO:0002206': 'expressed in', 'RO:0002224': 'starts_with', 'RO:0002230': 'ends_with', 'RO:0002233': 'has_input', 'RO:0002325': 'colocalizes with', 'RO:0002326': 'contributes to', 'RO:0002327': 'enables', 'RO:0002331': 'involved in', 'RO:0002350': 'member of', 'RO:0002351': 'has member', 'RO:0002353': 'output_of', 'RO:0002418': 'causally upstream of or within', 'RO:0002434': 'interacts with', 'RO:0002435': 'genetically interacts with', 'RO:0002436': 'molecularly_interacts_with', 'RO:0002448': 'regulates', 'RO:0002480': 'ubiquitinates', 'RO:0002488': 'existence_starts_during', 'RO:0002492': 'existence_ends_during', 'RO:0002503': 'towards', 'RO:0002513': 'translates_to', 'RO:0002524': 'has subsequence', 'RO:0002525': 'is subsequence of', 'RO:0002528': 'is upstream of sequence of', 'RO:0002529': 'is downstream of sequence of', 'RO:0002558': 'has evidence', 'RO:0002566': 'causally_influences', 'RO:0002583': 'existence starts at point', 'RO:0002593': 'existence ends at point', 'RO:0002606': 'is substance that treats', 'RO:0002607': 'is marker for', 
'RO:0002610': 'correlates_with', 'RO:0002614': 'is_evidence_supported_by', 'RO:0003002': 'negatively_regulates', 'RO:0003003': 'positively_regulates', 'RO:0003301': 'is model of', 'RO:0003302': 'causes_or_contributes', 'RO:0003303': 'causes condition', 'RO:0003304': 'contributes to condition', 'RO:0003307': 'protective_for_condition', 'RO:0004011': 'is causal gain of function germline mutation of in', 'RO:0004012': 'is causal loss of function germline mutation of in', 'RO:0004013': 'is causal germline mutation in', 'RO:0004014': 'is causal somatic mutation in', 'RO:0004015': 'is causal susceptibility factor for', 'RO:0004016': 'is causal germline mutation partially giving rise to', 'RO:HOM0000011': 'in paralogy relationship with', 'RO:HOM0000017': 'in orthology relationship with', 'RO:HOM0000018': 'in xenology relationship with', 'RO:HOM0000019': 'in 1 to 1 homology relationship with', 'RO:HOM0000020': 'in 1 to 1 orthology relationship with', 'RO:HOM0000022': 'in ohnology relationship with', 'RO:HOM0000023': 'in in-paralogy relationship with', 'SEPIO:0000001': 'assertion', 'SEPIO:0000003': 'assertion process', 'SEPIO:0000006': 'has_evidence_line', 'SEPIO:0000007': 'has_supporting_evidence_line', 'SEPIO:0000011': 'has_provenance', 'SEPIO:0000015': 'is_asserted_in', 'SEPIO:0000017': 'has_agent', 'SEPIO:0000018': 'created_by', 'SEPIO:0000021': 'date_created', 'SEPIO:0000022': 'created_with_resource', 'SEPIO:0000031': 'is_evidence_for', 'SEPIO:0000032': 'is_supporting_evidence_for', 'SEPIO:0000033': 'is_refuting_evidence_for', 'SEPIO:0000037': 'assertion method', 'SEPIO:0000041': 'is_specified_by', 'SEPIO:0000059': 'is_evidence_with_support_from', 'SEPIO:0000066': 'research', 'SEPIO:0000067': 'clinical testing', 'SEPIO:0000071': 'case-control', 'SEPIO:0000073': 'in vitro', 'SEPIO:0000074': 'in vivo', 'SEPIO:0000080': 'literature only', 'SEPIO:0000081': 'curation', 'SEPIO:0000084': 'has_evidence_item', 'SEPIO:0000085': 'has_supporting_activity', 'SEPIO:0000098': 
'is_equilavent_to', 'SEPIO:0000099': 'is_consistent_with', 'SEPIO:0000100': 'strongly_contradicts', 'SEPIO:0000101': 'contradicts', 'SEPIO:0000102': 'reference population', 'SEPIO:0000111': 'is_assertion_supported_by_evidence', 'SEPIO:0000114': 'measures_parameter', 'SEPIO:0000124': 'has_supporting_reference', 'SEPIO:0000126': 'is_inconsistent_with', 'SEPIO:0000130': 'asserted_by', 'SEPIO:0000167': 'assertion_confidence_level', 'SEPIO:0000168': 'assertion_confidence_score', 'SEPIO:0000186': 'phenotyping only', 'SIO:000302': 'web page', 'SIO:000794': 'count', 'SIO:001015': 'race', 'SO:0000001': 'region', 'SO:0000013': 'small cytoplasmic RNA', 'SO:0000043': 'processed_pseudogene', 'SO:0000077': 'antisense', 'SO:0000100': 'endogenous_retroviral_gene', 'SO:0000101': 'transposable_element', 'SO:0000104': 'polypeptide', 'SO:0000105': 'chromosome_arm', 'SO:0000110': 'sequence_feature', 'SO:0000111': 'transposable_element_gene', 'SO:0000134': 'genomically_imprinted', 'SO:0000143': 'assembly_component', 'SO:0000150': 'read', 'SO:0000159': 'deletion', 'SO:0000165': 'enhancer', 'SO:0000167': 'promoter', 'SO:0000180': 'retrotransposon', 'SO:0000199': 'translocation', 'SO:0000233': 'mature_transcript', 'SO:0000289': 'microsatellite', 'SO:0000307': 'CpG_island', 'SO:0000336': 'pseudogene', 'SO:0000337': 'RNAi_reagent', 'SO:0000340': 'chromosome', 'SO:0000341': 'chromosome_band', 'SO:0000374': 'ribozyme', 'SO:0000404': 'vault_RNA', 'SO:0000405': 'Y RNA', 'SO:0000409': 'binding_site', 'SO:0000453': 'chromosomal_transposition', 'SO:0000460': 'vertebrate_immunoglobulin_T_cell_receptor_segment', 'SO:0000462': 'pseudogenic_region', 'SO:0000577': 'centromere', 'SO:0000624': 'telomere', 'SO:0000643': 'minisatellite', 'SO:0000651': 'large_subunit_rRNA', 'SO:0000655': 'ncRNA', 'SO:0000667': 'insertion', 'SO:0000694': 'SNP', 'SO:0000703': 'experimental_result_region', 'SO:0000704': 'gene', 'SO:0000756': 'cDNA', 'SO:0000771': 'QTL', 'SO:0000781': 'transgenic', 'SO:0000796': 
'transgenic_transposable_element', 'SO:0000806': 'fusion', 'SO:0000817': 'wild_type', 'SO:0000830': 'chromosome_part', 'SO:0000883': 'stop codon readthrough', 'SO:0000902': 'transgene', 'SO:0000903': 'endogenous_retroviral_sequence', 'SO:0000946': 'integration_excision_site', 'SO:0001024': 'haplotype', 'SO:0001026': 'genome', 'SO:0001028': 'diplotype', 'SO:0001055': 'transcriptional_cis_regulatory_region', 'SO:0001059': 'sequence_alteration', 'SO:0001060': 'sequence_variant', 'SO:0001217': 'protein_coding_gene', 'SO:0001218': 'transgenic_insertion', 'SO:0001240': 'TSS_region', 'SO:0001263': 'ncRNA_gene', 'SO:0001265': 'miRNA_gene', 'SO:0001266': 'scRNA_gene', 'SO:0001267': 'snoRNA_gene', 'SO:0001268': 'snRNA_gene', 'SO:0001269': 'SRP_RNA_gene', 'SO:0001272': 'tRNA_gene', 'SO:0001411': 'biological_region', 'SO:0001483': 'SNV', 'SO:0001500': 'heritable_phenotypic_marker', 'SO:0001503': 'processed_transcript', 'SO:0001505': 'reference_genome', 'SO:0001564': 'gene_variant', 'SO:0001566': 'regulatory_region_variant', 'SO:0001574': 'splice_acceptor_variant', 'SO:0001575': 'splice_donor_variant', 'SO:0001578': 'stop_lost', 'SO:0001580': 'coding_sequence_variant', 'SO:0001583': 'missense_variant', 'SO:0001587': 'stop_gained', 'SO:0001589': 'frameshift_variant', 'SO:0001619': 'non_coding_transcript_exon_variant', 'SO:0001620': 'mature_miRNA_variant', 'SO:0001622': 'UTR_variant', 'SO:0001623': '5_prime_UTR_variant', 'SO:0001624': '3_prime_UTR_variant', 'SO:0001627': 'intron_variant', 'SO:0001628': 'intergenic_variant', 'SO:0001630': 'splice_region_variant', 'SO:0001634': 'downstream_gene_variant', 'SO:0001636': 'upstream_gene_variant', 'SO:0001637': 'rRNA_gene', 'SO:0001638': 'piRNA_gene', 'SO:0001639': 'RNase_P_RNA_gene', 'SO:0001640': 'RNase_MRP_RNA_gene', 'SO:0001641': 'lincRNA_gene', 'SO:0001643': 'telomerase_RNA_gene', 'SO:0001645': 'genetic_marker', 'SO:0001650': 'inframe_variant', 'SO:0001685': 'score', 'SO:0001741': 'pseudogenic_gene_segment', 'SO:0001742': 
'copy_number_gain', 'SO:0001743': 'copy_number_loss', 'SO:0001759': 'unitary_pseudogene', 'SO:0001760': 'unprocessed_pseudogene', 'SO:0001782': 'TF_binding_site_variant', 'SO:0001784': 'complex_structural_alteration', 'SO:0001785': 'structural_alteration', 'SO:0001792': 'non_coding_exon_variant', 'SO:0001818': 'protein_altering_variant', 'SO:0001819': 'synonymous_variant', 'SO:0001821': 'inframe_insertion', 'SO:0001822': 'inframe_deletion', 'SO:0001837': 'mobile_element_insertion', 'SO:0001838': 'novel_sequence_insertion', 'SO:0001841': 'polymorphic_pseudogene', 'SO:0001877': 'lnc_RNA', 'SO:0001882': 'feature_fusion', 'SO:0001897': 'transposable_element_pseudogene', 'SO:0001904': 'antisense_lncRNA', 'SO:0002007': 'MNV', 'SO:0002012': 'start_lost', 'SO:0002040': 'vaultRNA_primary_transcript', 'SO:0002052': 'dominant_negative_variant', 'SO:0002053': 'gain_of_function_variant', 'SO:0002054': 'loss_of_function_variant', 'SO:0002095': 'scaRNA', 'SO:0002098': 'immunoglobulin_pseudogene', 'SO:0002099': 'T_cell_receptor_pseudogene', 'SO:0002100': 'IG_C_pseudogene', 'SO:0002101': 'IG_J_pseudogene', 'SO:0002102': 'IG_V_pseudogene', 'SO:0002103': 'TR_V_pseudogene', 'SO:0002104': 'TR_J_pseudogene', 'SO:0002106': 'translated_unprocessed_pseudogene', 'SO:0002107': 'transcribed_unprocessed_pseudogene', 'SO:0002108': 'transcribed_unitary_pseudogene', 'SO:0002109': 'transcribed_processed_pseudogene', 'SO:0002120': '3prime_overlapping_ncRNA', 'SO:0002122': 'immunoglobulin_gene', 'SO:0002123': 'IG_C_gene', 'SO:0002124': 'IG_D_gene', 'SO:0002125': 'IG_J_gene', 'SO:0002126': 'IG_V_gene', 'SO:0002127': 'lncRNA_gene', 'SO:0002128': 'mt_rRNA', 'SO:0002129': 'mt_tRNA', 'SO:0002131': 'sense_intronic', 'SO:0002132': 'sense_overlapping', 'SO:0002134': 'TR_C_gene', 'SO:0002135': 'TR_D_gene', 'SO:0002136': 'TR_J_gene', 'SO:0002137': 'TR_V_gene', 'SO:0002139': 'TEC', 'SO:0002181': 'ribozyme_gene', 'SO:0002183': 'sense_overlap_ncRNA_gene', 'SO:0002184': 'sense_intronic_ncRNA_gene', 'SO:0002185': 
'bidirectional_promoter_lncRNA', 'SO:1000002': 'substitution', 'SO:1000005': 'complex_substitution', 'SO:1000008': 'point_mutation', 'SO:1000029': 'chromosomal_deletion', 'SO:1000030': 'chromosomal_inversion', 'SO:1000032': 'indel', 'SO:1000035': 'duplication', 'SO:1000036': 'inversion', 'SO:1000037': 'chromosomal_duplication', 'SO:1000039': 'direct_tandem_duplication', 'SO:1000043': 'Robertsonian_fusion', 'SO:1000044': 'chromosomal_translocation', 'SO:1000048': 'reciprocal_chromosomal_translocation', 'SO:1000117': 'sequence_variant_affecting_polypeptide_function', 'SO:1000118': 'sequence_variant_causing_loss_of_function_of_polypeptide', 'SO:1000120': 'sequence_variant_causing_inactive_catalytic_site', 'SO:1000125': 'sequence_variant_causing_gain_of_function_of_polypeptide', 'SO:1000173': 'tandem_duplication', 'SO:1000183': 'chromosome_structure_variation', 'SO:3000000': 'gene_segment', 'STATO:0000073': "Fisher's exact test", 'STATO:0000076': 'Mann-Whitney U-test', 'STATO:0000085': 'effect size estimate', 'STATO:0000104': 'zscore', 'STATO:0000107': 'statistical model', 'STATO:0000129': 'has_value', 'STATO:0000169': 'fold change', 'STATO:0000182': 'odds_ratio', 'STATO:0000189': 'mixed effect model', 'STATO:0000372': 'generalized least squares estimation', 'STATO:0000464': 'linear mixed model', 'SWO:0000425': 'Similarity score', 'UPHENO:0001001': 'phenotype', 'VIVO:Project': 'project', 'XCO:0000000': 'environmental_condition', 'cito:citesAsAuthority': 'citesAsAuthority', 'dc:Publisher': 'Publisher', 'dc:created': 'Date Created', 'dc:creator': 'creator', 'dc:description': 'description', 'dc:format': 'format', 'dc:identifier': 'identifier', 'dc:isVersionOf': 'isVersionOf', 'dc:license': 'license', 'dc:rights': 'rights', 'dc:source': 'Source', 'dc:title': 'title', 'dcat:Distribution': 'Distribution', 'dcat:distribution': 'distribution', 'dcat:downloadURL': 'downloadURL', 'dctypes:Dataset': 'Dataset', 'faldo:BothStrandPosition': 'both_strand', 'faldo:FuzzyPosition': 
'FuzzyPosition', 'faldo:MinusStrandPosition': 'minus_strand', 'faldo:PlusStrandPosition': 'plus_strand', 'faldo:Position': 'Position', 'faldo:Region': 'Region', 'faldo:begin': 'begin', 'faldo:end': 'end', 'faldo:location': 'location', 'faldo:position': 'position', 'faldo:reference': 'reference', 'foaf:Person': 'person', 'foaf:depiction': 'depiction', 'foaf:organization': 'organization', 'foaf:page': 'page', 'oboInOwl:consider': 'consider', 'oboInOwl:hasDbXref': 'database_cross_reference', 'oboInOwl:hasExactSynonym': 'has_exact_synonym', 'oboInOwl:hasRelatedSynonym': 'has_related_synonym', 'owl:AnnotationProperty': 'annotation_property', 'owl:Class': 'class', 'owl:DatatypeProperty': 'datatype_property', 'owl:NamedIndividual': 'named_individual', 'owl:ObjectProperty': 'object_property', 'owl:Ontology': 'ontology', 'owl:Restriction': 'restriction', 'owl:deprecated': 'deprecated', 'owl:equivalentClass': 'equivalent_class', 'owl:onProperty': 'on_property', 'owl:sameAs': 'same_as', 'owl:someValuesFrom': 'some_values_from', 'owl:versionIRI': 'version_iri', 'owl:versionInfo': 'version_info', 'pav:createdOn': 'created_on', 'pav:createdWith': 'created_with', 'pav:retrievedOn': 'retrieved_on', 'pav:version': 'version', 'rdf:type': 'type', 'rdfs:comment': 'comment', 'rdfs:domain': 'domain', 'rdfs:label': 'label', 'rdfs:subClassOf': 'subclass_of', 'rdfs:subPropertyOf': 'subPropertyOf', 'void:class': 'class (void)', 'void:classPartition': 'classPartition', 'void:distinctObjects': 'distinctObjects', 'void:distinctSubjects': 'distinctSubjects', 'void:entities': 'entities', 'void:properties': 'properties', 'void:triples': 'triples'}
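Keys in `globaltcid` follow the CURIE form `prefix:local_id`, for example `NCBITaxon:9606` or `RO:HOM0000017`. A few keys, such as `:activating`, have an empty prefix. The sketch below splits keys and tallies them by prefix, which is one way to see how many terms each vocabulary contributes to the table. `split_curie` and the excerpt are hypothetical illustrations, not Dipper functions.

```python
from collections import Counter

# Hypothetical excerpt of globaltcid entries (CURIE -> label).
curie_to_label = {
    ":activating": "activating_mutation",
    "BFO:0000050": "part_of",
    "NCBITaxon:9606": "Homo sapiens",
    "NCBITaxon:10090": "Mus musculus",
    "RO:HOM0000017": "in orthology relationship with",
}


def split_curie(curie):
    """Split a CURIE into (prefix, local_id) at the first colon."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


# Count how many entries each prefix contributes.
prefix_counts = Counter(split_curie(c)[0] for c in curie_to_label)
print(prefix_counts["NCBITaxon"])  # 2
print(split_curie(":activating"))  # ('', 'activating')
```

Splitting on the first colon only keeps local identifiers intact even if they contain further colons.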
globaltt = {'3_prime_UTR_variant': 'SO:0001624', '3prime_overlapping_ncRNA': 'SO:0002120', '5_prime_UTR_variant': 'SO:0001623', 'Adipose stromal cell': 'CLO:0036934', 'African': 'EFO:0004561', 'African American': 'EFO:0003150', 'Ailurus fulgens': 'NCBITaxon:9649', 'Amniotic fluid-derived cell line': 'CLO:0036935', 'Anolis carolinensis': 'NCBITaxon:28377', 'Arabidopsis thaliana': 'NCBITaxon:3702', 'Asian': 'EFO:0003152', 'Asian Indian': 'EFO:0003153', 'Asian, Vietnamese': 'ERO:0002071', 'Asian/Pacific Islander': 'EFO:0003154', 'Aspergillus': 'NCBITaxon:5052', 'B cell': 'CL:0000236', 'Bos taurus': 'NCBITaxon:9913', 'Caenorhabditis elegans': 'NCBITaxon:6239', 'Callicebus moloch': 'NCBITaxon:9523', 'Canis lupus familiaris': 'NCBITaxon:9615', 'Capra hircus': 'NCBITaxon:9925', 'Caucasian': 'EFO:0003156', 'Cavia porcellus': 'NCBITaxon:10141', 'Chelonoidis niger': 'NCBITaxon:66189', 'Chinese': 'EFO:0003157', 'Chorionic villus-derived cell line': 'CLO:0036940', 'Cone–rod dystrophy': 'MONDO:0015993', 'CpG_island': 'SO:0000307', 'Cricetulus griseus': 'NCBITaxon:10029', 'Danio rerio': 'NCBITaxon:7955', 'Dataset': 'dctypes:Dataset', 'Date Created': 'dc:created', 'Dictyostelium': 'NCBITaxon:5782', 'Dictyostelium discoideum': 'NCBITaxon:44689', 'Distribution': 'dcat:Distribution', 'Drosophila melanogaster': 'NCBITaxon:7227', 'Dwarfism': 'MESH:D004392', 'Eastern Indian': 'EFO:0003158', 'Equus caballus': 'NCBITaxon:9796', 'Erinaceus europaeus': 'NCBITaxon:9365', 'Erythrocebus patas': 'NCBITaxon:9538', 'Escherichia coli': 'NCBITaxon:562', 'FRET': 'ECO:0001048', 'Felis catus': 'NCBITaxon:9685', 'Filipino': 'EFO:0003160', "Fisher's exact test": 'STATO:0000073', 'FuzzyPosition': 'faldo:FuzzyPosition', 'Gallus gallus': 'NCBITaxon:9031', 'Gorilla gorilla': 'NCBITaxon:9593', 'Hispanic': 'EFO:0003169', 'Homo sapiens': 'NCBITaxon:9606', 'IG_C_gene': 'SO:0002123', 'IG_C_pseudogene': 'SO:0002100', 'IG_D_gene': 'SO:0002124', 'IG_J_gene': 'SO:0002125', 'IG_J_pseudogene': 'SO:0002101', 
'IG_V_gene': 'SO:0002126', 'IG_V_pseudogene': 'SO:0002102', 'Japanese': 'EFO:0003164', 'Korean': 'EFO:0003165', 'Lagothrix lagotricha': 'NCBITaxon:9519', 'MNV': 'SO:0002007', 'Macaca fascicularis': 'NCBITaxon:9541', 'Macaca mulatta': 'NCBITaxon:9544', 'Macaca nemestrina': 'NCBITaxon:9545', 'Macaca nigra': 'NCBITaxon:54600', 'Macropus eugenii': 'NCBITaxon:9315', 'Mann-Whitney U-test': 'STATO:0000076', 'Manual': 'NCIT:C63513', 'Mesocricetus auratus': 'NCBITaxon:10036', 'Microcebus murinus': 'NCBITaxon:30608', 'Microcell hybrid': 'CLO:0036939', 'Monodelphis domestica': 'NCBITaxon:13616', 'Muntiacus muntjak': 'NCBITaxon:9888', 'Mus': 'NCBITaxon:10088', 'Mus abbotti': 'NCBITaxon:10108', 'Mus baoulei': 'NCBITaxon:544437', 'Mus booduga': 'NCBITaxon:27681', 'Mus caroli': 'NCBITaxon:10089', 'Mus cervicolor': 'NCBITaxon:10097', 'Mus cervicolor cervicolor': 'NCBITaxon:135827', 'Mus cervicolor popaeus': 'NCBITaxon:135828', 'Mus cookii': 'NCBITaxon:10098', 'Mus crociduroides': 'NCBITaxon:41269', 'Mus cypriacus': 'NCBITaxon:468371', 'Mus emesi': 'NCBITaxon:887131', 'Mus famulus': 'NCBITaxon:83773', 'Mus fragilicauda': 'NCBITaxon:186193', 'Mus gratus': 'NCBITaxon:229288', 'Mus haussa': 'NCBITaxon:273922', 'Mus indutus': 'NCBITaxon:273921', 'Mus lepidoides': 'NCBITaxon:390847', 'Mus macedonicus macedonicus': 'NCBITaxon:270353', 'Mus macedonicus spretoides': 'NCBITaxon:270352', 'Mus mattheyi': 'NCBITaxon:41270', 'Mus minutoides': 'NCBITaxon:10105', 'Mus musculus': 'NCBITaxon:10090', 'Mus musculus bactrianus': 'NCBITaxon:35531', 'Mus musculus brevirostris': 'NCBITaxon:116058', 'Mus musculus castaneus': 'NCBITaxon:10091', 'Mus musculus domesticus': 'NCBITaxon:10092', 'Mus musculus domesticus x M. m. molossinus': 'NCBITaxon:1266728', 'Mus musculus gansuensis': 'NCBITaxon:1385377', 'Mus musculus gentilulus': 'NCBITaxon:80274', 'Mus musculus molossinus': 'NCBITaxon:57486', 'Mus musculus musculus': 'NCBITaxon:39442', 'Mus musculus musculus x M. m. 
castaneus': 'NCBITaxon:477816', 'Mus musculus musculus x M. m. domesticus': 'NCBITaxon:477815', 'Mus musculus x Mus spretus': 'NCBITaxon:186842', 'Mus nitidulus': 'NCBITaxon:390848', 'Mus pahari': 'NCBITaxon:10093', 'Mus platythrix': 'NCBITaxon:10101', 'Mus saxicola': 'NCBITaxon:10094', 'Mus setulosus': 'NCBITaxon:10102', 'Mus spicilegus': 'NCBITaxon:10103', 'Mus spretus': 'NCBITaxon:10096', 'Mus tenellus': 'NCBITaxon:397330', 'Mus terricolor': 'NCBITaxon:254704', 'Mus triton': 'NCBITaxon:473865', 'Nannomys': 'NCBITaxon:862510', 'Native American': 'ERO:0001984', 'OBO foundry unique label': 'IAO:0000589', 'Oncorhynchus mykiss': 'NCBITaxon:8022', 'Ornithorhynchus anatinus': 'NCBITaxon:9258', 'Oryctolagus cuniculus': 'NCBITaxon:9986', 'Ovis aries': 'NCBITaxon:9940', 'Pan paniscus': 'NCBITaxon:9597', 'Pan troglodytes': 'NCBITaxon:9598', 'Papio anubis': 'NCBITaxon:9555', 'Percent Change From Baseline': 'NCIT:C96621', 'Peromyscus californicus': 'NCBITaxon:42520', 'Peromyscus leucopus': 'NCBITaxon:10041', 'Peromyscus maniculatus': 'NCBITaxon:10042', 'Peromyscus polionotus': 'NCBITaxon:42413', 'Pongo pygmaeus': 'NCBITaxon:9600', 'Position': 'faldo:Position', 'Publisher': 'dc:Publisher', 'QTL': 'SO:0000771', 'RNAi_reagent': 'SO:0000337', 'RNase_MRP_RNA_gene': 'SO:0001640', 'RNase_P_RNA_gene': 'SO:0001639', 'Rattus': 'NCBITaxon:10114', 'Rattus norvegicus': 'NCBITaxon:10116', 'Region': 'faldo:Region', 'Robertsonian_fusion': 'SO:1000043', 'SNP': 'SO:0000694', 'SNV': 'SO:0001483', 'SRP_RNA_gene': 'SO:0001269', 'Saccharomyces cerevisiae S288C': 'NCBITaxon:559292', 'Saguinus fuscicollis': 'NCBITaxon:9487', 'Saguinus labiatus': 'NCBITaxon:78454', 'Saimiri sciureus': 'NCBITaxon:9521', 'Schizosaccharomyces pombe': 'NCBITaxon:4896', 'Similarity score': 'SWO:0000425', 'Source': 'dc:source', 'Statistical Significance': 'NCIT:C61040', 'Sus scrofa': 'NCBITaxon:9823', 'Sus scrofa domestica': 'NCBITaxon:9825', 'Suspected': 'NCIT:C71458', 'T cell': 'CL:0000084', 'TEC': 'SO:0002139', 
'TF_binding_site_variant': 'SO:0001782', 'TR_C_gene': 'SO:0002134', 'TR_D_gene': 'SO:0002135', 'TR_J_gene': 'SO:0002136', 'TR_J_pseudogene': 'SO:0002104', 'TR_V_gene': 'SO:0002137', 'TR_V_pseudogene': 'SO:0002103', 'TSS_region': 'SO:0001240', 'T_cell_receptor_pseudogene': 'SO:0002099', 'Takifugu rubripe': 'NCBITaxon:31033', 'Tumor-derived cell line': 'CLO:0036938', 'UTR_variant': 'SO:0001622', 'Xenopus (Silurana) tropicalis': 'NCBITaxon:8364', 'Y RNA': 'SO:0000405', 'abnormal': 'PATO:0000460', 'activating_mutation': ':activating', 'affinity chromatography evidence': 'ECO:0000079', 'age': 'EFO:0000246', 'all_missense_or_inframe': ':all_missense_or_inframe', 'allele': 'GENO:0000512', 'allosomal dominant inheritance': 'GENO:0000146', 'allosomal recessive inheritance': 'GENO:0000149', 'amniocyte': 'CL:0002323', 'annotation_property': 'owl:AnnotationProperty', 'antisense': 'SO:0000077', 'antisense_lncRNA': 'SO:0001904', 'assay': 'OBI:0000070', 'assembly_component': 'SO:0000143', 'asserted_by': 'SEPIO:0000130', 'assertion': 'SEPIO:0000001', 'assertion method': 'SEPIO:0000037', 'assertion process': 'SEPIO:0000003', 'assertion_confidence_level': 'SEPIO:0000167', 'assertion_confidence_score': 'SEPIO:0000168', 'association': 'OBAN:association', 'association has object': 'OBAN:association_has_object', 'association has predicate': 'OBAN:association_has_predicate', 'association has subject': 'OBAN:association_has_subject', 'author statement supported by traceable reference used in manual assertion': 'ECO:0000304', 'author statement without traceable support used in manual assertion': 'ECO:0000303', 'autosomal dominant iniheritance': 'GENO:0000147', 'autosomal recessive inheritance': 'GENO:0000150', 'band_intensity': 'GENO:0000618', 'begin': 'faldo:begin', 'benign_for_condition': 'GENO:0000843', 'biallelic': ':biallelic', 'bidirectional_promoter_lncRNA': 'SO:0002185', 'binding_site': 'SO:0000409', 'biochemical trait analysis evidence': 'ECO:0000172', 'biological aspect of 
ancestor evidence used in manual assertion': 'ECO:0000318', 'biological aspect of descendant evidence': 'ECO:0000214', 'biological_region': 'SO:0001411', 'blood test evidence': 'ECO:0001016', 'both_strand': 'faldo:BothStrandPosition', 'cDNA': 'SO:0000756', 'cancer': 'MONDO:0004992', 'case-control': 'SEPIO:0000071', 'causally upstream of or within': 'RO:0002418', 'causally_influences': 'RO:0002566', 'causes condition': 'RO:0003303', 'causes_or_contributes': 'RO:0003302', 'cell': 'CL:0000000', 'cell line': 'CLO:0000031', 'cell line repository': 'CLO:0000008', 'cellular_process': 'GO:0009987', 'centromere': 'SO:0000577', 'chromosomal_deletion': 'SO:1000029', 'chromosomal_duplication': 'SO:1000037', 'chromosomal_inversion': 'SO:1000030', 'chromosomal_translocation': 'SO:1000044', 'chromosomal_transposition': 'SO:0000453', 'chromosome': 'SO:0000340', 'chromosome_arm': 'SO:0000105', 'chromosome_band': 'SO:0000341', 'chromosome_part': 'SO:0000830', 'chromosome_region': 'GENO:0000614', 'chromosome_structure_variation': 'SO:1000183', 'chromosome_subband': 'GENO:0000616', 'citesAsAuthority': 'cito:citesAsAuthority', 'class': 'owl:Class', 'class (void)': 'void:class', 'classPartition': 'void:classPartition', 'clinical study evidence': 'ECO:0000180', 'clinical testing': 'SEPIO:0000067', 'clique_leader': 'MONARCH:cliqueLeader', 'co-dominant inheritance': 'GENO:0000143', 'coding_sequence_variant': 'SO:0001580', 'coding_transgene_feature': 'GENO:0000638', 'collection': 'ERO:0002190', 'colocalizes with': 'RO:0002325', 'combinatorial evidence used in automatic assertion': 'ECO:0000213', 'comment': 'rdfs:comment', 'complete dominant inheritance': 'GENO:0000144', 'complex_structural_alteration': 'SO:0001784', 'complex_substitution': 'SO:1000005', 'compound heterozygous': 'GENO:0000402', 'computational combinatorial evidence used in manual assertion': 'ECO:0000245', 'condition inheritance': 'GENO:0000141', 'congenital heart disease': 'MONDO:0005453', 'consider': 'oboInOwl:consider', 
'contradicts': 'SEPIO:0000101', 'contributes to': 'RO:0002326', 'contributes to condition': 'RO:0003304', 'copy_number_gain': 'SO:0001742', 'copy_number_loss': 'SO:0001743', 'correlates_with': 'RO:0002610', 'count': 'SIO:000794', 'created_by': 'SEPIO:0000018', 'created_on': 'pav:createdOn', 'created_with': 'pav:createdWith', 'created_with_resource': 'SEPIO:0000022', 'creator': 'dc:creator', 'curation': 'SEPIO:0000081', 'curator inference used in manual assertion': 'ECO:0000305', 'data set': 'IAO:0000100', 'database_cross_reference': 'oboInOwl:hasDbXref', 'datatype_property': 'owl:DatatypeProperty', 'date_created': 'SEPIO:0000021', 'definition': 'IAO:0000115', 'deletion': 'SO:0000159', 'depiction': 'foaf:depiction', 'deprecated': 'owl:deprecated', 'derives_from': 'RO:0001000', 'description': 'dc:description', 'developmental defect of the eye': 'MONDO:0020145', 'developmental_process': 'GO:0032502', 'digenic_inheritance': 'HP:0010984', 'diplotype': 'SO:0001028', 'direct assay evidence used in manual assertion': 'ECO:0000314', 'direct_tandem_duplication': 'SO:1000039', 'disease or disorder': 'MONDO:0000001', 'disorder of sex development': 'MONDO:0019592', 'distinctObjects': 'void:distinctObjects', 'distinctSubjects': 'void:distinctSubjects', 'distribution': 'dcat:distribution', 'document': 'IAO:0000310', 'does_not_have_phenotype': ':does_not_have_phenotype', 'domain': 'rdfs:domain', 'dominant_negative_variant': 'SO:0002052', 'downloadURL': 'dcat:downloadURL', 'downstream_gene_variant': 'SO:0001634', 'duplication': 'SO:1000035', 'effect size estimate': 'STATO:0000085', 'effective_genotype': 'GENO:0000525', 'embryonic lethality': 'MP:0008762', 'embryonic stem cell line': 'ERO:0002002', 'enables': 'RO:0002327', 'end': 'faldo:end', 'endogenous_retroviral_gene': 'SO:0000100', 'endogenous_retroviral_sequence': 'SO:0000903', 'endothelial cell': 'CL:0000115', 'ends during': 'RO:0002093', 'ends_with': 'RO:0002230', 'enhancer': 'SO:0000165', 'entities': 'void:entities', 
'environmental_condition': 'XCO:0000000', 'environmental_system': 'ENVO:01000254', 'enzyme assay evidence': 'ECO:0000005', 'epithelial cell': 'CL:0000066', 'equivalent_class': 'owl:equivalentClass', 'ethnic_group': 'EFO:0001799', 'evidence': 'ECO:0000000', 'evidence used in automatic assertion': 'ECO:0000501', 'existence ends at point': 'RO:0002593', 'existence starts at point': 'RO:0002583', 'existence_ends_during': 'RO:0002492', 'existence_starts_during': 'RO:0002488', 'experimental evidence': 'ECO:0000006', 'experimental evidence used in manual assertion': 'ECO:0000269', 'experimental phenotypic evidence': 'ECO:0000059', 'experimental_result_region': 'SO:0000703', 'expressed in': 'RO:0002206', 'expression evidence used in manual assertion': 'ECO:0000270', 'expression pattern evidence': 'ECO:0000008', 'extrinsic_genotype': 'GENO:0000524', 'family': 'PCO:0000020', 'far-Western blotting evidence': 'ECO:0000076', 'feature_fusion': 'SO:0001882', 'female': 'PATO:0000383', 'female intrinsic genotype': 'GENO:0000647', 'fibroblast': 'CL:0000057', 'fold change': 'STATO:0000169', 'format': 'dc:format', 'frameshift_variant': 'SO:0001589', 'frequency': ':frequencyOfPhenotype', 'functional complementation evidence': 'ECO:0000012', 'fusion': 'SO:0000806', 'gain_of_function_variant': 'SO:0002053', 'gene': 'SO:0000704', 'gene product of': 'RO:0002204', 'gene_family': 'EDAM-DATA:3148', 'gene_product': 'CHEBI:33695', 'gene_segment': 'SO:3000000', 'gene_variant': 'SO:0001564', 'generalized least squares estimation': 'STATO:0000372', 'genetic interaction evidence': 'ECO:0000011', 'genetic interaction evidence used in manual assertion': 'ECO:0000316', 'genetic_marker': 'SO:0001645', 'genetically interacts with': 'RO:0002435', 'genome': 'SO:0001026', 'genomic context evidence': 'ECO:0000177', 'genomic_background': 'GENO:0000611', 'genomic_variation_complement': 'GENO:0000009', 'genomically_imprinted': 'SO:0000134', 'germline': 'GENO:0000900', 'gneg': 'GENO:0000620', 'gpos': 
'GENO:0000619', 'gpos100': 'GENO:0000622', 'gpos25': 'GENO:0000625', 'gpos33': 'GENO:0000633', 'gpos50': 'GENO:0000624', 'gpos66': 'GENO:0000632', 'gpos75': 'GENO:0000623', 'gvar': 'GENO:0000621', 'haplotype': 'SO:0001024', 'has disposition': 'RO:0000091', 'has evidence': 'RO:0002558', 'has gene product': 'RO:0002205', 'has measurement value': 'IAO:0000004', 'has member': 'RO:0002351', 'has phenotype': 'RO:0002200', 'has specified numeric value': 'OBI:0001937', 'has subsequence': 'RO:0002524', 'has_affected_feature': 'GENO:0000418', 'has_agent': 'SEPIO:0000017', 'has_allelic_requirement': ':has_allelic_requirement', 'has_author': 'ERO:0000232', 'has_begin_stage_qualifier': 'GENO:0000630', 'has_cell_origin': ':has_cell_origin', 'has_drug_response': ':has_drug_response', 'has_end_stage_qualifier': 'GENO:0000631', 'has_evidence_item': 'SEPIO:0000084', 'has_evidence_line': 'SEPIO:0000006', 'has_exact_synonym': 'oboInOwl:hasExactSynonym', 'has_extent': 'GENO:0000678', 'has_functional_consequence': ':has_functional_consequence', 'has_genotype': 'GENO:0000222', 'has_input': 'RO:0002233', 'has_member_with_allelotype': 'GENO:0000225', 'has_molecular_consequence': ':has_molecular_consequence', 'has_origin': 'GENO:0000643', 'has_part': 'BFO:0000051', 'has_participant': 'RO:0000057', 'has_provenance': 'SEPIO:0000011', 'has_qualifier': 'GENO:0000580', 'has_quality': 'RO:0000086', 'has_quantifier': 'GENO:0000866', 'has_reference_part': 'GENO:0000385', 'has_related_synonym': 'oboInOwl:hasRelatedSynonym', 'has_sequence_attribute': 'GENO:0000207', 'has_sex_agnostic_part': 'GENO:0000650', 'has_sex_specificty': ':has_sex_specificity', 'has_supporting_activity': 'SEPIO:0000085', 'has_supporting_evidence_line': 'SEPIO:0000007', 'has_supporting_reference': 'SEPIO:0000124', 'has_uncertain_significance_for_condition': 'GENO:0000845', 'has_url': 'ERO:0000480', 'has_value': 'STATO:0000129', 'has_variant_part': 'GENO:0000382', 'has_zygosity': 'GENO:0000608', 'hematological system disease': 
'MONDO:0005570', 'hemizygous': 'GENO:0000134', 'hemizygous insertion-linked': 'GENO:0000606', 'hemizygous-x': 'GENO:0000605', 'hemizygous-y': 'GENO:0000604', 'hereditary optic atrophy': 'MONDO:0043878', 'hereditary retinal dystrophy': 'MONDO:0004589', 'heritable_phenotypic_marker': 'SO:0001500', 'heteroplasmic': 'GENO:0000603', 'heterozygous': 'GENO:0000135', 'homoplasmic': 'GENO:0000602', 'homozygous': 'GENO:0000136', 'hybrid': 'NCBITaxon:37965', 'identifier': 'dc:identifier', 'imaging assay evidence': 'ECO:0000324', 'immortal kidney-derived cell line cell': 'CLO:0000220', 'immunoglobulin_gene': 'SO:0002122', 'immunoglobulin_pseudogene': 'SO:0002098', 'immunoprecipitation evidence': 'ECO:0000085', 'imported automatically asserted information used in automatic assertion': 'ECO:0000323', 'imported information': 'ECO:0000311', 'imported manually asserted information used in automatic assertion': 'ECO:0000322', 'in 1 to 1 homology relationship with': 'RO:HOM0000019', 'in 1 to 1 orthology relationship with': 'RO:HOM0000020', 'in in-paralogy relationship with': 'RO:HOM0000023', 'in ohnology relationship with': 'RO:HOM0000022', 'in orthology relationship with': 'RO:HOM0000017', 'in paralogy relationship with': 'RO:HOM0000011', 'in taxon': 'RO:0002162', 'in vitro': 'SEPIO:0000073', 'in vivo': 'SEPIO:0000074', 'in xenology relationship with': 'RO:HOM0000018', 'inborn errors of metabolism': 'MONDO:0019052', 'inchi_key': 'CHEBI:InChIKey', 'increased_gene_dosage': ':increased_gene_dosage', 'indel': 'SO:1000032', 'indeterminate': 'GENO:0000137', 'induced pluripotent stem cell': 'EFO:0004905', 'inference by association of genotype from phenotype': 'ECO:0005613', 'inference from background scientific knowledge': 'ECO:0000001', 'inference from background scientific knowledge used in manual assertion': 'ECO:0000306', 'inference from experimental data evidence': 'ECO:0005611', 'inference from phenotype manipulation evidence': 'ECO:0005612', 'inframe_deletion': 'SO:0001822', 
'inframe_insertion': 'SO:0001821', 'inframe_variant': 'SO:0001650', 'inherited bleeding disorder, platelet-type': 'MONDO:0000009', 'insertion': 'SO:0000667', 'integration_excision_site': 'SO:0000946', 'integumentary system disease': 'MONDO:0002051', 'interacts with': 'RO:0002434', 'intergenic_variant': 'SO:0001628', 'intrinsic genotype': 'GENO:0000719', 'intron_variant': 'SO:0001627', 'inversion': 'SO:1000036', 'involved in': 'RO:0002331', 'is causal gain of function germline mutation of in': 'RO:0004011', 'is causal germline mutation in': 'RO:0004013', 'is causal germline mutation partially giving rise to': 'RO:0004016', 'is causal loss of function germline mutation of in': 'RO:0004012', 'is causal somatic mutation in': 'RO:0004014', 'is causal susceptibility factor for': 'RO:0004015', 'is downstream of sequence of': 'RO:0002529', 'is marker for': 'RO:0002607', 'is model of': 'RO:0003301', 'is subsequence of': 'RO:0002525', 'is substance that treats': 'RO:0002606', 'is upstream of sequence of': 'RO:0002528', 'isVersionOf': 'dc:isVersionOf', 'is_about': 'IAO:0000136', 'is_allele_of': 'GENO:0000408', 'is_allelotype_of': 'GENO:0000206', 'is_anonymous': 'MONARCH:anonymous', 'is_asserted_in': 'SEPIO:0000015', 'is_assertion_supported_by_evidence': 'SEPIO:0000111', 'is_consistent_with': 'SEPIO:0000099', 'is_equilavent_to': 'SEPIO:0000098', 'is_evidence_for': 'SEPIO:0000031', 'is_evidence_supported_by': 'RO:0002614', 'is_evidence_with_support_from': 'SEPIO:0000059', 'is_expression_variant_of': 'GENO:0000443', 'is_inconsistent_with': 'SEPIO:0000126', 'is_mutant_of': 'GENO:0000440', 'is_reference_allele_of': 'GENO:0000610', 'is_refuting_evidence_for': 'SEPIO:0000033', 'is_specified_by': 'SEPIO:0000041', 'is_supporting_evidence_for': 'SEPIO:0000032', 'is_targeted_by': 'GENO:0000634', 'is_transgene_variant_of': 'GENO:0000444', 'journal article': 'IAO:0000013', 'karyotype_variation_complement': 'GENO:0000644', 'keratinocyte': 'CL:0000312', 'label': 'rdfs:label', 
'large_subunit_rRNA': 'SO:0000651', 'license': 'dc:license', 'likely_benign_for_condition': 'GENO:0000844', 'likely_pathogenic_for_condition': 'GENO:0000841', 'lincRNA_gene': 'SO:0001641', 'linear mixed model': 'STATO:0000464', 'literature only': 'SEPIO:0000080', 'lncRNA_gene': 'SO:0002127', 'lnc_RNA': 'SO:0001877', 'location': 'faldo:location', 'long_chromosome_arm': 'GENO:0000629', 'loss_of_function_variant': 'SO:0002054', 'lysosomal storage disease': 'MONDO:0002561', 'male': 'PATO:0000384', 'male intrinsic genotype': 'GENO:0000646', 'match to sequence model evidence': 'ECO:0000202', 'mature_miRNA_variant': 'SO:0001620', 'mature_transcript': 'SO:0000233', 'measurement datum': 'IAO:0000109', 'measures_parameter': 'SEPIO:0000114', 'melanocyte': 'CL:0000148', 'member of': 'RO:0002350', 'mentions': 'IAO:0000142', 'mesenchymal stem cell of adipose': 'CL:0002570', 'mesothelial': 'CL:0000077', 'metabolic disease': 'MONDO:0005066', 'miRNA_gene': 'SO:0001265', 'microsatellite': 'SO:0000289', 'minisatellite': 'SO:0000643', 'minus_strand': 'faldo:MinusStrandPosition', 'missense_variant': 'SO:0001583', 'mitochondrial_inheritance': 'HP:0001427', 'mixed effect model': 'STATO:0000189', 'mobile_element_insertion': 'SO:0001837', 'molecular entity': 'CHEBI:23367', 'molecularly_interacts_with': 'RO:0002436', 'monoallelic': ':monoallelic', 'mosaic': ':mosaic_genotype', 'mt_rRNA': 'SO:0002128', 'mt_tRNA': 'SO:0002129', 'mutant phenotype evidence': 'ECO:0000015', 'mutant phenotype evidence used in manual assertion': 'ECO:0000315', 'myoblast': 'CL:0000056', 'named_individual': 'owl:NamedIndividual', 'ncRNA': 'SO:0000655', 'ncRNA_gene': 'SO:0001263', 'negatively_regulates': 'RO:0003002', 'no biological data found': 'ECO:0000035', 'non_coding_exon_variant': 'SO:0001792', 'non_coding_transcript_exon_variant': 'SO:0001619', 'normal': 'PATO:0000461', 'novel_sequence_insertion': 'SO:0001838', 'object_property': 'owl:ObjectProperty', 'obsolete': 'HP:0031859', 'occurs_in': 'BFO:0000066', 
'odds_ratio': 'STATO:0000182', 'on_property': 'owl:onProperty', 'oncocyte': 'CL:0002198', 'onset': ':onset', 'ontology': 'owl:Ontology', 'organization': 'foaf:organization', 'output_of': 'RO:0002353', 'p-value': 'OBI:0000175', 'page': 'foaf:page', 'part_of': 'BFO:0000050', 'part_of_contiguous_gene_duplication': ':part_of_contiguous_gene_duplication', 'pathogenic_for_condition': 'GENO:0000840', 'pathway': 'PW:0000001', 'person': 'foaf:Person', 'phenotype': 'UPHENO:0001001', 'phenotyping only': 'SEPIO:0000186', 'photograph': 'IAO:0000185', 'phylogenetic determination of loss of key residues evidence used in manual assertion': 'ECO:0000320', 'phylogenetic evidence': 'ECO:0000080', 'physical interaction evidence used in manual assertion': 'ECO:0000353', 'piRNA_gene': 'SO:0001638', 'platelet disorder, undefined': 'MONDO:0008254', 'plus_strand': 'faldo:PlusStrandPosition', 'point_mutation': 'SO:1000008', 'polymorphic_pseudogene': 'SO:0001841', 'polypeptide': 'SO:0000104', 'population': 'PCO:0000001', 'position': 'faldo:position', 'positively_regulates': 'RO:0003003', 'precursor cell': 'CL:0011115', 'probabalistic_quantifier': 'GENO:0000867', 'processed_pseudogene': 'SO:0000043', 'processed_transcript': 'SO:0001503', 'project': 'VIVO:Project', 'promoter': 'SO:0000167', 'properties': 'void:properties', 'proportional_reporting_ratio': 'OAE:0001563', 'protective_for_condition': 'RO:0003307', 'protein_altering_variant': 'SO:0001818', 'protein_coding_gene': 'SO:0001217', 'pseudogene': 'SO:0000336', 'pseudogenic_gene_segment': 'SO:0001741', 'pseudogenic_region': 'SO:0000462', 'publication': 'IAO:0000311', 'quantitative trait analysis evidence': 'ECO:0000061', 'rRNA_gene': 'SO:0001637', 'race': 'SIO:001015', 'read': 'SO:0000150', 'reagent': 'ERO:0000006', 'reagent_targeted_gene': 'GENO:0000504', 'recessive inheritance': 'GENO:0000148', 'reciprocal_chromosomal_translocation': 'SO:1000048', 'reference': 'faldo:reference', 'reference population': 'SEPIO:0000102', 
'reference_genome': 'SO:0001505', 'reference_locus': 'GENO:0000036', 'region': 'SO:0000001', 'regulates': 'RO:0002448', 'regulatory_region_variant': 'SO:0001566', 'regulatory_transgene_feature': 'GENO:0000637', 'research': 'SEPIO:0000066', 'restriction': 'owl:Restriction', 'retinal disease': 'MONDO:0005283', 'retrieved_on': 'pav:retrievedOn', 'retrotransposon': 'SO:0000180', 'ribozyme': 'SO:0000374', 'ribozyme_gene': 'SO:0002181', 'rights': 'dc:rights', 'same_as': 'owl:sameAs', 'sampling_time': 'EFO:0000689', 'scRNA_gene': 'SO:0001266', 'scaRNA': 'SO:0002095', 'score': 'SO:0001685', 'semi-dominant inheritance': 'GENO:0000145', 'sense_intronic': 'SO:0002131', 'sense_intronic_ncRNA_gene': 'SO:0002184', 'sense_overlap_ncRNA_gene': 'SO:0002183', 'sense_overlapping': 'SO:0002132', 'sequence alignment evidence': 'ECO:0000200', 'sequence orthology evidence': 'ECO:0000201', 'sequence similarity evidence used in manual assertion': 'ECO:0000250', 'sequence_alteration': 'SO:0001059', 'sequence_derives_from': 'GENO:0000639', 'sequence_feature': 'SO:0000110', 'sequence_variant': 'SO:0001060', 'sequence_variant_affecting_polypeptide_function': 'SO:1000117', 'sequence_variant_causing_gain_of_function_of_polypeptide': 'SO:1000125', 'sequence_variant_causing_inactive_catalytic_site': 'SO:1000120', 'sequence_variant_causing_loss_of_function_of_polypeptide': 'SO:1000118', 'sequencing assay evidence': 'ECO:0000220', 'sex_qualified_genotype': 'GENO:0000645', 'short repeat': 'GENO:0000846', 'signal_transduction': 'GO:0007165', 'simple heterozygous': 'GENO:0000458', 'small cytoplasmic RNA': 'SO:0000013', 'smooth muscle cell': 'CL:0000192', 'snRNA_gene': 'SO:0001268', 'snoRNA_gene': 'SO:0001267', 'somatic': 'GENO:0000882', 'some_values_from': 'owl:someValuesFrom', 'splice_acceptor_variant': 'SO:0001574', 'splice_donor_variant': 'SO:0001575', 'splice_region_variant': 'SO:0001630', 'stalk': 'GENO:0000628', 'start_lost': 'SO:0002012', 'starts during': 'RO:0002091', 'starts_with': 
'RO:0002224', 'statistical model': 'STATO:0000107', 'statistical_hypothesis_test': 'OBI:0000673', 'stem cell': 'CL:0000034', 'stop codon readthrough': 'SO:0000883', 'stop_gained': 'SO:0001587', 'stop_lost': 'SO:0001578', 'strain': 'EFO:0005135', 'strongly_contradicts': 'SEPIO:0000100', 'structural_alteration': 'SO:0001785', 'study': 'OBI:0000471', 'subPropertyOf': 'rdfs:subPropertyOf', 'subclass_of': 'rdfs:subClassOf', 'substitution': 'SO:1000002', 'synonymous_variant': 'SO:0001819', 'tRNA_gene': 'SO:0001272', 'tandem_duplication': 'SO:1000173', 'targeted_gene_complement': 'GENO:0000527', 'targeted_gene_subregion': 'GENO:0000534', 'targets_gene': 'GENO:0000414', 'telomerase_RNA_gene': 'SO:0001643', 'telomere': 'SO:0000624', 'term replaced by': 'IAO:0100001', 'title': 'dc:title', 'towards': 'RO:0002503', 'traceable author statement': 'ECO:0000033', 'transcribed_processed_pseudogene': 'SO:0002109', 'transcribed_unitary_pseudogene': 'SO:0002108', 'transcribed_unprocessed_pseudogene': 'SO:0002107', 'transcriptional_cis_regulatory_region': 'SO:0001055', 'transgene': 'SO:0000902', 'transgenic': 'SO:0000781', 'transgenic_insertion': 'SO:0001218', 'transgenic_transposable_element': 'SO:0000796', 'translated_unprocessed_pseudogene': 'SO:0002106', 'translates_to': 'RO:0002513', 'translocation': 'SO:0000199', 'transposable_element': 'SO:0000101', 'transposable_element_gene': 'SO:0000111', 'transposable_element_pseudogene': 'SO:0001897', 'triples': 'void:triples', 'type': 'rdf:type', 'ubiquitinates': 'RO:0002480', 'unitary_pseudogene': 'SO:0001759', 'unprocessed_pseudogene': 'SO:0001760', 'unspecified': 'GENO:0000772', 'unspecified_genomic_background': 'GENO:0000649', 'upstream_gene_variant': 'SO:0001636', 'variant single locus complement': 'GENO:0000030', 'variant_locus': 'GENO:0000002', 'vaultRNA_primary_transcript': 'SO:0002040', 'vault_RNA': 'SO:0000404', 'version': 'pav:version', 'version_info': 'owl:versionInfo', 'version_iri': 'owl:versionIRI', 
'vertebrate_immunoglobulin_T_cell_receptor_segment': 'SO:0000460', 'web page': 'SIO:000302', 'wild_type': 'SO:0000817', 'wildtype': 'GENO:0000511', 'x-ray crystallography evidence': 'ECO:0001823', 'x_linked_dominant': 'HP:0001423', 'yeast 2-hybrid evidence': 'ECO:0000068', 'zscore': 'STATO:0000104'}
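The mapping above pairs human-readable term labels with CURIEs. The following is a minimal sketch, not Dipper's own code, of how a label could be resolved to a CURIE and then expanded to a full IRI. The three-entry PREFIX_MAP is a hypothetical stand-in for a real prefix map, and it uses the OBO PURL pattern (http://purl.obolibrary.org/obo/<PREFIX>_<ID>).

```python
# A few label -> CURIE pairs copied from the mapping above.
LABEL_TO_CURIE = {
    'gene': 'SO:0000704',
    'has phenotype': 'RO:0002200',
    'Mus musculus': 'NCBITaxon:10090',
}

# Hypothetical prefix map following the OBO PURL convention.
PREFIX_MAP = {
    'SO': 'http://purl.obolibrary.org/obo/SO_',
    'RO': 'http://purl.obolibrary.org/obo/RO_',
    'NCBITaxon': 'http://purl.obolibrary.org/obo/NCBITaxon_',
}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Expand 'PREFIX:LOCAL' to a full IRI; raises KeyError for unknown prefixes."""
    prefix, local_id = curie.split(':', 1)
    return prefix_map[prefix] + local_id


def label_to_iri(label):
    """Resolve a term label to its CURIE, then expand the CURIE to an IRI."""
    return expand_curie(LABEL_TO_CURIE[label])


print(label_to_iri('gene'))  # http://purl.obolibrary.org/obo/SO_0000704
```

CURIEs with an empty prefix (such as ':activating' in the mapping) would need a default base IRI, which this sketch does not attempt to guess.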
serialize(subject_iri, predicate_iri, obj, object_is_literal=False, literal_type=None, subject_category_iri=None, predicate_category_iri='biolink:category', object_category_iri=None)
skolemizeBlankNode(curie)
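skolemizeBlankNode(curie) has no description here. RDF 1.1 defines skolemization as replacing a blank node with a fresh IRI, conventionally placed under a /.well-known/genid/ path. The sketch below illustrates that general technique only: the '_:' blank-node CURIE form, the function name, and the base IRI are assumptions, and it is not a copy of Dipper's implementation.

```python
BASE_IRI = 'https://example.org/'  # hypothetical base IRI


def skolemize_blank_node(curie, base_iri=BASE_IRI):
    """Turn a blank-node CURIE such as '_:b1' into a skolem IRI."""
    prefix, local_id = curie.split(':', 1)
    if prefix != '_':
        raise ValueError('not a blank-node CURIE: ' + curie)
    # RDF 1.1 recommends the /.well-known/genid/ path for skolem IRIs.
    return base_iri + '.well-known/genid/' + local_id


print(skolemize_blank_node('_:b1'))  # https://example.org/.well-known/genid/b1
```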
dipper.models package
Subpackages
dipper.models.assoc package
Submodules
dipper.models.assoc.Association module
class dipper.models.assoc.Association.Assoc(graph, definedby, sub=None, obj=None, pred=None, subject_category=None, object_category=None)

Bases: object

A base class for OBAN (Monarch)-style associations, to enable attribution of source and evidence on statements.

add_association_to_graph(association_category=None)
add_date(date)
add_evidence(identifier)

Add an evidence code to the association object (maintained as a list).

Parameters:identifier
Returns:
add_predicate_object(predicate, object_node, object_type=None, datatype=None)
add_provenance(identifier)
add_source(identifier)

Add a source identifier (such as a publication ID) to the association object (maintained as a list). TODO: this function needs to be greatly expanded.

Parameters:identifier
Returns:
get_association_id()
static make_association_id(definedby, sub, pred, obj, attributes=None)

A method to create unique identifiers for OBAN-style associations, based on all the parts of the association. If any of the items is empty or None, it is converted to a blank. It effectively digests the string of concatenated values. Subclasses of Assoc can submit an additional array of attributes that will be appended to the ID.

Note that this is equivalent to an RDF blank node.

Parameters:
  • definedby – The (data) resource that provided the annotation
  • sub – the subject
  • pred – the predicate
  • obj – the object
  • attributes – an optional array of additional attributes
Returns:
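The ID recipe described above (blank out missing parts, concatenate, digest, optionally include extra attributes) can be sketched as follows. The function name is hypothetical, and the '+' separator, the choice of MD5, and the '_:' blank-node prefix are assumptions made for illustration; Dipper's exact concatenation and digest may differ.

```python
import hashlib


def sketch_association_id(definedby, sub, pred, obj, attributes=None):
    """Digest the parts of an association into a stable, blank-node style ID."""
    parts = [definedby, sub, pred, obj]
    if attributes:
        parts.extend(attributes)
    # Empty or None parts become blank strings before concatenation.
    text = '+'.join('' if part is None else str(part) for part in parts)
    digest = hashlib.md5(text.encode('utf-8')).hexdigest()
    return '_:' + digest
```

Because the ID is a pure function of its inputs, re-running an ingest yields the same identifier for the same association, while any extra attribute (for example an onset term) produces a different one.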

set_association_id(assoc_id=None)

Set the association ID from the internal parts of the association, or to the supplied assoc_id in cases where an external association identifier should be used.

Parameters:assoc_id
Returns:
set_description(description)
set_object(identifier)
set_relationship(identifier)
set_score(score, unit=None, score_type=None)
set_subject(identifier)
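To make the OBAN pattern concrete, the minimal sketch below reifies one statement as an association node, using predicate CURIEs listed in the label-to-CURIE mapping at the top of this section (OBAN:association, OBAN:association_has_subject, dc:source, RO:0002558 'has evidence', and so on). The function name and every identifier passed to it are hypothetical; Assoc builds its graph through its own methods, not through this function.

```python
def reify_association(assoc_id, sub, pred, obj, source=None, evidence=None):
    """Return (subject, predicate, object) triples for an OBAN-style association."""
    triples = [
        (assoc_id, 'rdf:type', 'OBAN:association'),
        (assoc_id, 'OBAN:association_has_subject', sub),
        (assoc_id, 'OBAN:association_has_predicate', pred),
        (assoc_id, 'OBAN:association_has_object', obj),
        # This sketch also keeps the plain, non-reified statement.
        (sub, pred, obj),
    ]
    # Attribution hangs off the association node, not off the statement itself.
    if source is not None:
        triples.append((assoc_id, 'dc:source', source))
    if evidence is not None:
        triples.append((assoc_id, 'RO:0002558', evidence))  # 'has evidence'
    return triples


# Hypothetical identifiers for illustration only.
triples = reify_association('_:assoc1', 'MONDO:0000001', 'RO:0002200', 'HP:0000118',
                            source='PMID:0000000', evidence='ECO:0000304')
print(len(triples))  # 7
```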
dipper.models.assoc.Chem2DiseaseAssoc module
class dipper.models.assoc.Chem2DiseaseAssoc.Chem2DiseaseAssoc(graph, definedby, chem_id, phenotype_id, rel_id=None)

Bases: dipper.models.assoc.Association.Assoc

Attributes:
  • assoc_id (str) – Association CURIE (Prefix:ID)
  • chem_id (str) – Chemical CURIE
  • phenotype_id (str) – Phenotype CURIE
  • pub_list (str, list) – One or more publication CURIEs
  • rel (str) – Property relating assoc_id and chem_id
  • evidence (str) – Evidence CURIE

make_c2p_assoc_id()
set_association_id(assoc_id=None)

Set the association ID from the internal parts of the association, or to the supplied assoc_id in cases where an external association identifier should be used.

Parameters:assoc_id
Returns:
dipper.models.assoc.D2PAssoc module
class dipper.models.assoc.D2PAssoc.D2PAssoc(graph, definedby, disease_id, phenotype_id, onset=None, frequency=None, rel=None, disease_category=None, phenotype_category=None)

Bases: dipper.models.assoc.Association.Assoc

A specific association class for defining Disease-to-Phenotype relationships. This assumes that a graph is created outside of this class and that nodes get added to it. By default, an association assumes the “has_phenotype” relationship unless otherwise specified.

add_association_to_graph(association_category=None)

The reified relationship between a disease and a phenotype is decorated with some provenance information. This makes the assumption that both the disease and phenotype are classes.

Parameters:
  • g
  • disease_category – a biolink category CURIE for disease_id (defaults to biolink:Disease via the constructor)
  • phenotype_category – a biolink category CURIE for phenotype_id (defaults to biolink:PhenotypicFeature via the constructor)
Returns:

make_d2p_id()

Make an association id for phenotypic associations with disease that is defined by: source of association + disease + relationship + phenotype + onset + frequency

Returns:
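The component list that make_d2p_id describes can be sketched as a self-contained function. The function name is hypothetical, and joining with '+' and hashing with MD5 are assumptions for illustration, not Dipper's confirmed digest. The point is that onset and frequency take part in the ID, so two otherwise identical disease-phenotype associations with different onsets get different IDs.

```python
import hashlib


def sketch_d2p_id(definedby, disease_id, rel, phenotype_id, onset=None, frequency=None):
    """Hash the documented D2P components, in order, into an association ID."""
    components = [definedby, disease_id, rel, phenotype_id, onset, frequency]
    # Missing components become blanks so the order of the others is preserved.
    text = '+'.join('' if c is None else str(c) for c in components)
    return '_:' + hashlib.md5(text.encode('utf-8')).hexdigest()


# HP:0003577 (congenital onset) and HP:0003581 (adult onset) as example onsets.
congenital = sketch_d2p_id('hpoa', 'MONDO:0000001', 'RO:0002200', 'HP:0000118', onset='HP:0003577')
adult = sketch_d2p_id('hpoa', 'MONDO:0000001', 'RO:0002200', 'HP:0000118', onset='HP:0003581')
print(congenital != adult)  # True
```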
set_association_id(assoc_id=None)

Set the association ID from the internal parts of the association, or to the supplied assoc_id in cases where an external association identifier should be used.

Parameters:assoc_id
Returns:
dipper.models.assoc.G2PAssoc module
class dipper.models.assoc.G2PAssoc.G2PAssoc(graph, definedby, entity_id, phenotype_id, rel=None, entity_category=None, phenotype_category=None)

Bases: dipper.models.assoc.Association.Assoc

A specific association class for defining Genotype-to-Phenotype relationships. This assumes that a graph is created outside of this class, and nodes get added. By default, an association will assume the “has_phenotype” relationship, unless otherwise specified. Note that genotypes are expected to be created and defined outside of this association, most likely by calling methods in the Genotype() class.

add_association_to_graph(entity_category=None, phenotype_category=None)

Overrides Association by including bnode support

The reified relationship between a genotype (or any genotype part) and a phenotype is decorated with some provenance information. This makes the assumption that both the genotype and phenotype are classes.

Currently hardcoded to map the annotation to the monarch namespace.

Parameters:
  • g
  • entity_category – a biolink category CURIE for self.sub
  • phenotype_category – a biolink category CURIE for self.obj
Returns:

make_g2p_id()

Make an association id for phenotypic associations that is defined by: source of association + (Annot subject) + relationship + phenotype/disease + environment + start stage + end stage

Returns:
set_association_id(assoc_id=None)

Set the association ID from the internal parts of the association, or to the supplied assoc_id in cases where an external association identifier should be used.

Parameters:assoc_id
Returns:
set_environment(environment_id)
set_stage(start_stage_id, end_stage_id)
dipper.models.assoc.InteractionAssoc module
class dipper.models.assoc.InteractionAssoc.InteractionAssoc(graph, definedby, subj, obj, rel=None)

Bases: dipper.models.assoc.Association.Assoc

dipper.models.assoc.OrthologyAssoc module
class dipper.models.assoc.OrthologyAssoc.OrthologyAssoc(graph, definedby, gene1, gene2, rel=None, subject_category=None, object_category=None)

Bases: dipper.models.assoc.Association.Assoc

add_gene_family_to_graph(family_id)

Make an association between a group of genes and some grouping class. We assume that the genes in the association are part of the supplied family_id, and that the genes have already been declared as classes elsewhere. The family_id is added as an individual of type EDAM-DATA:gene_family.

Triples:

<family_id> a EDAM-DATA:gene_family
<family_id> RO:has_member <gene1>
<family_id> RO:has_member <gene2>
<gene1> biolink:category <subject_category>
<gene2> biolink:category <object_category>

Parameters:
  • family_id
  • g – the graph to modify
Returns:
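The triple pattern listed for add_gene_family_to_graph can be written out as plain tuples. This sketch resolves the symbolic names with CURIEs from the label mapping at the top of this section ('gene_family': 'EDAM-DATA:3148', 'has member': 'RO:0002351'); the function name and the family and gene IDs are hypothetical, and the categories are passed explicitly rather than assuming any default.

```python
def gene_family_triples(family_id, gene1, gene2, subject_category, object_category):
    """Return the five triples documented for add_gene_family_to_graph."""
    return [
        (family_id, 'rdf:type', 'EDAM-DATA:3148'),  # gene_family
        (family_id, 'RO:0002351', gene1),           # has member
        (family_id, 'RO:0002351', gene2),           # has member
        (gene1, 'biolink:category', subject_category),
        (gene2, 'biolink:category', object_category),
    ]


# Hypothetical identifiers for illustration only.
for triple in gene_family_triples('PANTHER:PTHR00000', 'NCBIGene:1', 'NCBIGene:2',
                                  'biolink:Gene', 'biolink:Gene'):
    print(triple)
```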

Submodules
dipper.models.BiolinkVocabulary module
class dipper.models.BiolinkVocabulary.BioLinkVocabulary

Bases: object

bl_file_with_path = '/home/docs/checkouts/readthedocs.org/user_builds/dipper/checkouts/master/resources/biolink_vocabulary.yaml'
bl_vocab = {'curie_prefix': 'biolink', 'terms': ['AnatomicalEntity', 'Association', 'BiologicalProcess', 'BiologicalSex', 'Case', 'category', 'ChemicalSubstance', 'CellLine', 'Disease', 'DataFile', 'DataSet', 'DataSetVersion', 'DiseaseOrPhenotypicFeature', 'EnvironmentFeature', 'ExposureEvent', 'EvidenceType', 'FrequencyValue', 'Gene', 'Genotype', 'GenomicEntity', 'GeneProduct', 'GeneGrouping', 'GeneToGeneHomologyAssociation', 'Genome', 'GenomeBuild', 'GeneFamily', 'GenomicSequenceLocalization', 'InformationContentEntity', 'IndividualOrganism', 'LifeStage', 'MolecularEntity', 'NamedThing', 'NoncodingRnaProduct', 'OrganismTaxon', 'OntologyClass', 'PopulationOfIndividualOrganisms', 'Publications', 'PhenotypicFeature', 'Provider', 'Protein', 'Publication', 'Procedure', 'Pathway', 'SourceFile', 'SequenceVariant', 'Transcript', 'Zygosity']}
key = 'Zygosity'
terms = {'AnatomicalEntity': 'biolink:AnatomicalEntity', 'Association': 'biolink:Association', 'BiologicalProcess': 'biolink:BiologicalProcess', 'BiologicalSex': 'biolink:BiologicalSex', 'Case': 'biolink:Case', 'CellLine': 'biolink:CellLine', 'ChemicalSubstance': 'biolink:ChemicalSubstance', 'DataFile': 'biolink:DataFile', 'DataSet': 'biolink:DataSet', 'DataSetVersion': 'biolink:DataSetVersion', 'Disease': 'biolink:Disease', 'DiseaseOrPhenotypicFeature': 'biolink:DiseaseOrPhenotypicFeature', 'EnvironmentFeature': 'biolink:EnvironmentFeature', 'EvidenceType': 'biolink:EvidenceType', 'ExposureEvent': 'biolink:ExposureEvent', 'FrequencyValue': 'biolink:FrequencyValue', 'Gene': 'biolink:Gene', 'GeneFamily': 'biolink:GeneFamily', 'GeneGrouping': 'biolink:GeneGrouping', 'GeneProduct': 'biolink:GeneProduct', 'GeneToGeneHomologyAssociation': 'biolink:GeneToGeneHomologyAssociation', 'Genome': 'biolink:Genome', 'GenomeBuild': 'biolink:GenomeBuild', 'GenomicEntity': 'biolink:GenomicEntity', 'GenomicSequenceLocalization': 'biolink:GenomicSequenceLocalization', 'Genotype': 'biolink:Genotype', 'IndividualOrganism': 'biolink:IndividualOrganism', 'InformationContentEntity': 'biolink:InformationContentEntity', 'LifeStage': 'biolink:LifeStage', 'MolecularEntity': 'biolink:MolecularEntity', 'NamedThing': 'biolink:NamedThing', 'NoncodingRnaProduct': 'biolink:NoncodingRnaProduct', 'OntologyClass': 'biolink:OntologyClass', 'OrganismTaxon': 'biolink:OrganismTaxon', 'Pathway': 'biolink:Pathway', 'PhenotypicFeature': 'biolink:PhenotypicFeature', 'PopulationOfIndividualOrganisms': 'biolink:PopulationOfIndividualOrganisms', 'Procedure': 'biolink:Procedure', 'Protein': 'biolink:Protein', 'Provider': 'biolink:Provider', 'Publication': 'biolink:Publication', 'Publications': 'biolink:Publications', 'SequenceVariant': 'biolink:SequenceVariant', 'SourceFile': 'biolink:SourceFile', 'Transcript': 'biolink:Transcript', 'Zygosity': 'biolink:Zygosity', 'category': 'biolink:category'}
yaml_file = <_io.TextIOWrapper name='/home/docs/checkouts/readthedocs.org/user_builds/dipper/checkouts/master/resources/biolink_vocabulary.yaml' mode='r' encoding='UTF-8'>
dipper.models.ClinVarRecord module

Object mapping to the ClinVar XML schema (see https://www.ncbi.nlm.nih.gov/clinvar/docs/details/).

class dipper.models.ClinVarRecord.Allele(id: str, label: Optional[str] = None, variant_type: Optional[str] = None, genes: Optional[List[dipper.models.ClinVarRecord.Gene]] = None, synonyms: Optional[List[str]] = None, dbsnps: Optional[List[str]] = None)

Bases: object

ClinVar allele. Alleles can have zero or more genes.

These are called alleles and variants on the ClinVar UI, and variant, single nucleotide variant, etc. in the XML.

Attributes:
  • id – allele id
  • label – label
  • variant_type – variant type, e.g. single nucleotide variant
  • genes – gene(s) in which the variant is found
  • synonyms – e.g. HGVS expressions
  • dbsnps – dbSNP CURIEs

class dipper.models.ClinVarRecord.ClinVarRecord(id: str, accession: str, created: str, updated: str, genovar: dipper.models.ClinVarRecord.Genovar, significance: str, conditions: Optional[List[dipper.models.ClinVarRecord.Condition]] = None)

Bases: object

Reference ClinVar Record (RCV).

Attributes:
  • id – RCV id
  • accession – RCV accession (e.g. RCV000123456)
  • created – created date
  • updated – updated date
  • genovar – the variant or genotype associated with the condition(s)
  • significance – clinical significance (e.g. benign, pathogenic)
  • conditions – the condition(s) for which this allele set was interpreted, with links to databases with defining information about that condition

class dipper.models.ClinVarRecord.Condition(id: str, label: Optional[str] = None, database: Optional[str] = None, medgen_id: Optional[str] = None)

Bases: object

ClinVar condition

class dipper.models.ClinVarRecord.Gene(id: Union[str, int, None], association_to_allele: str)

Bases: object

ClinVar gene. Intentionally leaves out the label/symbol; this should come from HGNC.

class dipper.models.ClinVarRecord.Genotype(id: str, label: Optional[str] = None, variants: Optional[List[dipper.models.ClinVarRecord.Variant]] = None, variant_type: Optional[str] = None)

Bases: dipper.models.ClinVarRecord.Genovar

ClinVar genotype, for example a compound heterozygote or a diplotype.

These are called variants on the ClinVar UI, and a GenotypeSet in the XML

class dipper.models.ClinVarRecord.Genovar(id: str, label: Optional[str] = None, variant_type: Optional[str] = None)

Bases: object

Sequence feature entity that is linked to a disease in a Reference ClinVar Record (RCV); it is either a Variant or a Genotype.

class dipper.models.ClinVarRecord.Variant(id: str, label: Optional[str] = None, alleles: Optional[List[dipper.models.ClinVarRecord.Allele]] = None, variant_type: Optional[str] = None)

Bases: dipper.models.ClinVarRecord.Genovar

ClinVar variant. Variants can have one or more alleles.

These are called variants and alleles on the ClinVar UI, and Variants in the XML
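To show how these classes nest (an RCV record holds one Genovar, a Variant holds Alleles, an Allele holds Genes), here is a self-contained stand-in built with dataclasses. It mirrors the documented attribute names and type hints but is an assumption, not the real `dipper.models.ClinVarRecord` implementation. The positional order of `Variant` fields differs from the documented signature, so arguments are passed by keyword. All identifiers in the usage part are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Gene:
    id: Union[str, int, None]
    association_to_allele: str


@dataclass
class Allele:
    id: str
    label: Optional[str] = None
    variant_type: Optional[str] = None
    genes: Optional[List[Gene]] = None
    synonyms: Optional[List[str]] = None
    dbsnps: Optional[List[str]] = None


@dataclass
class Condition:
    id: str
    label: Optional[str] = None
    database: Optional[str] = None
    medgen_id: Optional[str] = None


@dataclass
class Genovar:
    id: str
    label: Optional[str] = None
    variant_type: Optional[str] = None


@dataclass
class Variant(Genovar):
    # Field order differs from the documented signature, so pass by keyword.
    alleles: Optional[List[Allele]] = None


@dataclass
class ClinVarRecord:
    id: str
    accession: str
    created: str
    updated: str
    genovar: Genovar
    significance: str
    conditions: Optional[List[Condition]] = None


# Placeholder values throughout; none of these identifiers refer to real records.
allele = Allele(
    id='allele-1',
    variant_type='single nucleotide variant',
    genes=[Gene(id='gene-1', association_to_allele='example association')],
)
record = ClinVarRecord(
    id='rcv-1',
    accession='RCV000123456',
    created='2018-01-02',
    updated='2018-01-02',
    genovar=Variant(id='variant-1', alleles=[allele]),
    significance='pathogenic',
    conditions=[Condition(id='condition-1')],
)
```

Because `Variant` subclasses `Genovar`, the record's `genovar` slot accepts either a Variant or, in the real module, a Genotype.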

dipper.models.Dataset module

Produces metadata about ingested data

class dipper.models.Dataset.Dataset(identifier, data_release_version, ingest_name, ingest_title, ingest_url, ingest_logo=None, ingest_description=None, license_url=None, data_rights=None, graph_type='rdf_graph', file_handle=None, distribution_type='ttl', dataset_curie_prefix='MonarchArchive')

Bases: object

This class produces metadata about a dataset that is compliant with the HCLS dataset specification: https://www.w3.org/TR/2015/NOTE-hcls-dataset-20150514/#s4_4

Summary level: The summary level provides a description of a dataset that is independent of a specific version or format (e.g. the Monarch ingest of CTD). The CURIE for this is something like MonarchData:[SOURCE IDENTIFIER].

Version level: The version level captures version-specific characteristics of a dataset (e.g. the 01-02-2018 ingest of CTD). The CURIE for this is something like MonarchData:[SOURCE IDENTIFIER_INGESTTIMESTAMP].

Distribution level: The distribution level captures metadata about a specific form and version of a dataset (e.g. the turtle file for the 01-02-2018 ingest of CTD). There is a [distribution level resource] for each different downloadable file we emit, i.e. one for the TTL file, one for the N-Triples file, etc. The CURIE for this is like MonarchData:[SOURCE IDENTIFIER_INGESTTIMESTAMP].ttl, MonarchData:[SOURCE IDENTIFIER_INGESTTIMESTAMP].nt, or MonarchData:[SOURCE IDENTIFIER_INGESTTIMESTAMP].[whatever file format].

We write out at least the following triples:

SUMMARY LEVEL TRIPLES:
  • [summary level resource] - rdf:type -> dctypes:Dataset
  • [summary level resource] - dc:title -> title (literal)
  • [summary level resource] - dc:description -> description (literal) (use docstring from Source class)
  • [summary level resource] - dc:source -> [source web page, e.g. omim.org]
  • [summary level resource] - schema:logo -> [source logo IRI]
  • [summary level resource] - dc:publisher -> monarchinitiative.org

n.b. about summary level resource triples:
  • The HCLS spec says we “should” link to our logo and web page, but I’m not, because it would confuse the issue of whether we are pointing to our logo/page or the logo/page of the data source for this ingest. Same below for [version level resource] and [distribution level resource]: I’m not linking to our page/logo there either.
  • The spec says we “should” include summary level triples describing update frequency and SPARQL endpoint, but I’m omitting these for now because they are not clearly defined at the moment.

VERSION LEVEL TRIPLES:
  • [version level resource] - rdf:type -> dctypes:Dataset
  • [version level resource] - dc:title -> version title (literal)
  • [version level resource] - dc:description -> version description (literal)
  • [version level resource] - dc:created -> ingest timestamp [ISO 8601 compliant]
  • [version level resource] - pav:version -> ingest timestamp (same one above)
  • [version level resource] - dc:creator -> monarchinitiative.org
  • [version level resource] - dc:publisher -> monarchinitiative.org
  • [version level resource] - dc:isVersionOf -> [summary level resource]
  • [version level resource] - dc:source -> [source file 1 IRI]
  • [version level resource] - dc:source -> [source file 2 IRI]
…

  • [source file 1 IRI] - pav:retrievedOn -> [download date timestamp]
  • [source file 1 IRI] - pav:version -> [source version (if set, optional)]
  • [source file 2 IRI] - pav:retrievedOn -> [download date timestamp]
  • [source file 2 IRI] - pav:version -> [source version (if set, optional)]
…

  • [version level resource] - pav:createdWith -> [Dipper github URI]
  • [version level resource] - void:dataset -> [distribution level resource]

  • [version level resource] - cito:citesAsAuthority -> [citation id 1]
  • [version level resource] - cito:citesAsAuthority -> [citation id 2]
  • [version level resource] - cito:citesAsAuthority -> [citation id 3]

n.b. about version level resource triples: the spec says we “should” include a Date of issue/dc:issued triple, but I’m not, because it is redundant with the triple above, [version level resource] - dc:created -> timestamp, and would introduce ambiguity and confusion if the two disagree. Same below for [distribution level resource] - dc:created -> timestamp. Also omitting:

  • triples linking to our logo and page, see above.
  • License/dc:license triple, because we will make this triple via the [distribution level resource] below.
  • Language/dc:language triple, because it seems superfluous. Same below for [distribution level resource]: no language triple.
  • [version level resource] -> pav:previousVersion, because Dipper doesn’t know this info for certain at run time. Same below for [distribution level resource] - pav:previousVersion.

The [version level resource] - pav:version triple is also a bit redundant with the pav:version triple below, but the spec requires both these triples.

DISTRIBUTION LEVEL TRIPLES:
  • [distribution level resource] - rdf:type -> dctypes:Dataset
  • [distribution level resource] - rdf:type -> dcat:Distribution
  • [distribution level resource] - dc:title -> distribution title (literal)
  • [distribution level resource] - dc:description -> distribution description (literal)
  • [distribution level resource] - dc:created -> ingest timestamp [ISO 8601 compliant]
  • [distribution level resource] - pav:version -> ingest timestamp (same as above)
  • [distribution level resource] - dc:creator -> monarchinitiative.org
  • [distribution level resource] - dc:publisher -> monarchinitiative.org
  • [distribution level resource] - dc:license -> [license info, if available; otherwise indicate unknown]
  • [distribution level resource] - dc:rights -> [data rights IRI]
  • [distribution level resource] - pav:createdWith -> [Dipper github URI]
  • [distribution level resource] - dc:format -> [IRI of ttl|nt|whatever spec]
  • [distribution level resource] - dcat:downloadURL -> [ttl|nt URI]
  • [distribution level resource] - void:triples -> [triples count (literal)]
  • [distribution level resource] - void:entities -> [entities count (literal)]
  • [distribution level resource] - void:distinctSubjects -> [subject count (literal)]
  • [distribution level resource] - void:distinctObjects -> [object count (literal)]
  • [distribution level resource] - void:properties -> [properties count (literal)]
…

n.b. about distribution level resource triples:
  • Omitting Vocabularies used/void:vocabulary and Standards used/dc:conformsTo triples, because they are described in the ttl file.
  • Also omitting Example identifier/idot:exampleIdentifier and Example resource/void:exampleResource, because we don’t really have one canonical example of either; they’re all very different.
  • [distribution level resource] - dc:created should have the exact same timestamp as this triple above: [version level resource] - dc:created -> timestamp.
  • This [distribution level resource] - pav:version triple should have the same object as the [version level resource] - pav:version triple above.
  • Data source provenance/dc:source triples are above in the [version level resource].
  • Omitting Byte size/dc:byteSize, RDF File URL/void:dataDump, and Linkset/void:subset triples, because they probably aren’t necessary for MI right now.
  • These triples “should” be emitted, but we will do this in a later iteration:
      # of classes void:classPartition IRI
      # of literals void:classPartition IRI
      # of RDF graphs void:classPartition IRI

Note: Do not use blank nodes in the dataset graph. This dataset graph is added to the main Dipper graph in Source.write() like so:

mainGraph = mainGraph + datasetGraph

which, in theory, could lead to blank node ID collisions between the two graphs.
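rdflib documents graph addition (`+`) as a set-theoretic union that leaves blank node IDs unchanged. The toy model below represents each graph as a plain Python set of (subject, predicate, object) tuples to show why that matters: if both graphs happen to use the same blank node label, the union silently merges two unrelated nodes into one. The `_:b0` labels and `ex:` names are made up for illustration, and plain sets stand in for rdflib graphs.

```python
# Toy model: each graph is a set of (subject, predicate, object) tuples.
# Like rdflib's graph addition, the union below keeps blank node labels as-is.
main_graph = {
    ('_:b0', 'rdf:type', 'ex:Gene'),
    ('_:b0', 'rdfs:label', 'gene node'),
}
dataset_graph = {
    ('_:b0', 'rdf:type', 'dctypes:Dataset'),
    ('_:b0', 'rdfs:label', 'dataset node'),
}

merged = main_graph | dataset_graph

# The two unrelated nodes collapsed into one subject with two types.
types_of_b0 = sorted(o for s, p, o in merged if s == '_:b0' and p == 'rdf:type')
print(types_of_b0)  # ['dctypes:Dataset', 'ex:Gene']
```

Minting IRIs for dataset-graph nodes, for example with make_id(), avoids the problem because distinct inputs yield distinct identifiers.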

Note also that this implementation currently does not support producing metadata for StreamedGraph graphs (see dipper/graph/StreamedGraph.py). StreamedGraph is currently not being used for any ingests, so this isn’t a problem. There was talk of using StreamedGraph for a rewrite/refactor of the Clinvar ingest, which would probably require adding support here for StreamedGraph’s.

get_graph()

This method returns the dataset graph.

Returns: dataset graph

get_license()

This method returns the license info.

Returns: license info

static hash_id(word)

Given a string, make a hash. Duplicated from Source.py.

Parameters: word – str, string to be hashed
Returns: hash of id
static make_id(long_string, prefix='MONARCH')

A method to create DETERMINISTIC identifiers based on a string’s digest, currently implemented with sha1. Duplicated from Source.py to avoid circular imports.

Parameters:
  • long_string – string to use to generate identifier
  • prefix – prefix to prepend to identifier [MONARCH]
Returns:

a Monarch identifier
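A minimal sketch of the idea behind hash_id() and make_id(): a SHA-1 digest of the input string, prefixed with a CURIE prefix, gives an identifier that is stable across runs. The truncation to 20 hex characters and the exact output format are assumptions made for this illustration; Dipper's real identifiers may be shaped differently.

```python
import hashlib


def hash_id(word: str) -> str:
    # Deterministic: the same input always yields the same digest.
    # Assumption: hex SHA-1 digest truncated to 20 characters.
    return hashlib.sha1(word.encode('utf-8')).hexdigest()[:20]


def make_id(long_string: str, prefix: str = 'MONARCH') -> str:
    # Prepend the CURIE prefix to the digest.
    return prefix + ':' + hash_id(long_string)


print(make_id('abc'))  # MONARCH:a9993e364706816aba3e
```

Because the identifier depends only on the input string, re-running an ingest produces the same node IDs, which is what makes these safe to use in place of blank nodes.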

set_citation(citation_id)

This method adds the [citation_id] argument to the set of citations, and also adds a triple indicating that version level cito:citesAsAuthority [citation_id].

Parameters: citation_id
Returns:

None

set_ingest_source(url, predicate=None, is_object_literal=False)

This method writes a triple to the dataset graph indicating that the ingest used a file or resource at [url] during the ingest.

The triple emitted is: version_level_curie dc:source [url]

This triple is likely to be redundant if Source.get_files() is used to retrieve the remote files/resources, since this triple should also be emitted as files/resources are being retrieved. This method is provided as a convenience method for sources that do their own downloading of files.

Parameters:
  • url – a remote resource used as a source during ingest
  • predicate – the predicate to use for the triple [“dc:source”]. From the spec (https://www.w3.org/TR/2015/NOTE-hcls-dataset-20150514/): “Use dc:source when the source dataset was used in whole or in part. Use pav:retrievedFrom when the source dataset was used in whole and was not modified from its original distribution. Use prov:wasDerivedFrom when the source dataset was used in whole or in part and was modified from its original distribution.”
Returns:

None
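The HCLS guidance quoted for the predicate parameter amounts to a small decision rule. The helper below, `choose_source_predicate`, is hypothetical and not part of Dipper. It is a sketch of how a source might pick a predicate before calling set_ingest_source(url, predicate=...).

```python
def choose_source_predicate(used_in_whole: bool, modified: bool) -> str:
    """Pick a provenance predicate following the HCLS guidance quoted above."""
    if modified:
        # Used in whole or in part, and modified from its original distribution.
        return 'prov:wasDerivedFrom'
    if used_in_whole:
        # Used in whole and not modified.
        return 'pav:retrievedFrom'
    # Used in part without modification: the general-purpose choice.
    return 'dc:source'


print(choose_source_predicate(used_in_whole=True, modified=False))  # pav:retrievedFrom
```

dc:source is the broadest of the three, which is presumably why it serves as the default.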

set_ingest_source_file_version_date(file_iri, date, datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#date'))

This method sets the version that the source (OMIM, CTD, whatever) uses to refer to this version of the remote file/resource that was used in the ingest

It writes this triple:

file_iri - ‘pav:version’ -> date or timestamp

Version is added as a literal of datatype XSD date by default.

Note: if file_iri was retrieved using get_files(), then the following triple was created and you might not need this method:

file_iri - ‘pav:retrievedOn’ -> download date

Parameters:
  • file_iri – a remote file or resource used in ingest
  • date – a date in YYYYMMDD format that the source (OMIM, CTD) uses to refer to this version of the file/resource used during the ingest. You can add a timestamp as a version by using a different datatype (below).
  • datatype – an XSD literal datatype; default is XSD.date
Returns:

None
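To make the emitted triple concrete, the sketch below renders file_iri pav:version "date" as an N-Triples line, with the default xsd:date datatype or an xsd:dateTime timestamp. `version_date_triple` is a hypothetical helper, not part of Dipper. Converting YYYYMMDD to YYYY-MM-DD is an assumption made here because the xsd:date lexical form uses hyphens; Dipper may store the value differently. The example IRI is a placeholder.

```python
from datetime import datetime

PAV_VERSION = 'http://purl.org/pav/version'
XSD_DATE = 'http://www.w3.org/2001/XMLSchema#date'
XSD_DATETIME = 'http://www.w3.org/2001/XMLSchema#dateTime'


def version_date_triple(file_iri: str, date: str, datatype: str = XSD_DATE) -> str:
    """Return an N-Triples line: <file_iri> pav:version "date"^^<datatype> ."""
    if datatype == XSD_DATE:
        # xsd:date's lexical form is YYYY-MM-DD, so reformat a YYYYMMDD string.
        date = datetime.strptime(date, '%Y%m%d').date().isoformat()
    return '<{}> <{}> "{}"^^<{}> .'.format(file_iri, PAV_VERSION, date, datatype)


print(version_date_triple('http://example.org/ctd.tsv.gz', '20180102'))
# <http://example.org/ctd.tsv.gz> <http://purl.org/pav/version> "2018-01-02"^^<http://www.w3.org/2001/XMLSchema#date> .
```

Passing `datatype=XSD_DATETIME` with an ISO 8601 timestamp such as '2018-01-02T10:30:00' records a timestamp instead of a date, which is the alternative the docstring mentions.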

set_ingest_source_file_version_num(file_iri, version)

This method sets the version of a remote file or resource that is used in the ingest. It writes this triple:

file_iri - ‘pav:version’ -> version

Version is an untyped literal

Note: if your version is a date or timestamp, use set_ingest_source_file_version_date() instead

Parameters:
  • file_iri – a remote file or resource used in ingest
  • version – a number or string (e.g. v1.2.3) that the source (OMIM, CTD) uses to refer to this version of the file/resource used during the ingest
Returns:

None

set_ingest_source_file_version_retrieved_on(file_iri, date, datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#date'))

This method sets the date on which a remote file/resource (from OMIM, CTD, etc) was retrieved.

It writes this triple:

file_iri - ‘pav:retrievedOn’ -> date or timestamp

The date is added as a literal of datatype XSD date by default.

Note: if file_iri was retrieved using get_files(), then the following triple was created and you might not need this method:

file_iri - ‘pav:retrievedOn’ -> download date

Parameters:
  • file_iri – a remote file or resource used in ingest
  • date – the retrieval date, in YYYYMMDD format. You can add a timestamp instead by using a different datatype (below).
  • datatype – an XSD literal datatype; default is XSD.date
Returns:

None

dipper.models.Environment module
class dipper.models.Environment.Environment(graph)

Bases: object

These methods provide a convenient way to add items related to an experimental environment and its parts to a supplied graph.

This is a stub.

addComponentAttributes(component_id, entity_id, value=None, unit=None, component_category=None, entity_category=None)
addComponentToEnvironment(env_id, component_id, environment_category=None, component_category=None)
addEnvironment(env_id, env_label, env_type=None, env_description=None)
addEnvironmentalCondition(cond_id, cond_label, cond_type=None, cond_description=None, condition_category=None)
dipper.models.Evidence module
class dipper.models.Evidence.Evidence(graph, association)

Bases: object

To model evidence as the basis for an association. This encompasses:

  • measurements taken from the lab, and their significance; these can be derived from papers or other agents.
  • papers

More than one measurement may result from an assay, each of which may have its own significance.

add_data_individual(data_curie, label=None, ind_type=None, data_curie_category=None)

Add a data individual.

Parameters:
  • data_curie – str, either CURIE formatted or a long string; long strings will be converted to bnodes
  • label – str
  • ind_type – str curie
  • data_curie_category – a biolink category CURIE for data_curie
Returns:

None

add_evidence(evidence_line, evidence_type=None, label=None)

Add line of evidence node to association id

Parameters:
  • evidence_line – curie or iri, evidence line
  • evidence_type – curie or iri, evidence type if available
Returns:

None

add_source(evidence_line, source, label=None, src_type=None, source_category=None)

Applies the triples:
  • <evidence> <dc:source> <source>
  • <source> <rdf:type> <type>
  • <source> <rdfs:label> “label”

TODO: this should belong in a higher level class.

Parameters:
  • evidence_line – str curie
  • source – str source as curie
  • label – optional, str label
  • src_type – optional, str type as curie
  • source_category – a biolink category CURIE for source
Returns:

None

add_supporting_data(evidence_line, measurement_dict)

Add supporting data.

Parameters:
  • evidence_line
  • measurement_dict – dict, where keys are curies or iris and values are measurement values, for example:

{"_:1234": "1.53E07", "_:4567": "20.25"}

Note: assumes the measurements are typed (rdf:type) elsewhere.

Returns:

None

add_supporting_evidence(evidence_line, evidence_type=None, label=None)

Add supporting line of evidence node to association id

Parameters:
  • evidence_line – curie or iri, evidence line
  • evidence_type – curie or iri, evidence type if available
Returns:

None

add_supporting_publication(evidence_line, publication, label=None, pub_type=None)

Applies the triples:
  • <evidence> <has_supporting_reference> <source>
  • <source> <rdf:type> <type>
  • <source> <rdfs:label> “label”

Parameters:
  • evidence_line – str curie
  • publication – str curie
  • label – optional, str label
  • pub_type – optional, str type as curie

dipper.models.Family module
class dipper.models.Family.Family(graph)

Bases: object

Model mereological/part whole relationships

Although these relations are more abstract, we often use them to model family relationships (proteins, humans, etc.). The naming of this class may change in the future to better reflect the meaning of the relations it is modeling.

addMember(group_id, member_id, group_category=None, member_category=None)
addMemberOf(member_id, group_id, member_category=None, group_category=None)
dipper.models.GenomicFeature module
class dipper.models.GenomicFeature.Feature(graph, feature_id=None, label=None, feature_type=None, description=None, feature_category=None)

Bases: object

Dealing with genomic features here. By default they are all faldo:Regions. We use SO for typing genomic features. At the moment, RO:has_subsequence is the default relationship between the regions, but this should be tested/verified.

TODO: the graph additions are in the addXToFeature functions, but should be separated. TODO: this will need to be extended to properly deal with fuzzy positions in faldo.

addFeatureEndLocation(coordinate, reference_id, strand=None, position_types=None)

Adds the coordinate details for the end of this feature :param coordinate: :param reference_id: :param strand:

addFeatureProperty(property_type, feature_property)
addFeatureStartLocation(coordinate, reference_id, strand=None, position_types=None)

Adds coordinate details for the start of this feature. :param coordinate: :param reference_id: :param strand: :param position_types:

addFeatureToGraph(add_region=True, region_id=None, feature_as_class=False, feature_category=None)

We make the assumption here that all features are instances. The features are located on a region, which begins and ends with a faldo:Position. The feature locations leverage the FALDO model, which has a general structure like:

Triples:
feature_id a feature_type (individual)
    faldo:location region_id
region_id a faldo:Region
    faldo:begin start_position
    faldo:end end_position
start_position a (any of: faldo:(((Both|Plus|Minus)Strand)|Exact)Position)
    faldo:position Integer(numeric position)
    faldo:reference reference_id
end_position a (any of: faldo:(((Both|Plus|Minus)Strand)|Exact)Position)
    faldo:position Integer(numeric position)
    faldo:reference reference_id

Parameters:
  • add_region – [True]
  • region_id – [None]
  • feature_as_class – [False]
  • feature_category – a biolink category CURIE for feature
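The FALDO layout above can be sketched as a plain function that returns (subject, predicate, object) tuples. This is an illustration only. The position IDs derived from region_id, the strand symbols ('+', '-', '.'), and the fallback to a generic faldo:Position when no strand is given are assumptions for this sketch, not Dipper's actual ID scheme. The example feature, region, and reference IDs are placeholders.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, object]

# Strand symbol -> FALDO position class (assumed mapping for this sketch).
STRAND_TYPES = {
    '+': 'faldo:PlusStrandPosition',
    '-': 'faldo:MinusStrandPosition',
    '.': 'faldo:BothStrandPosition',
}


def position_triples(position_id: str, position: int, reference_id: str,
                     strand: Optional[str] = None) -> List[Triple]:
    # With no (or an unknown) strand, fall back to a generic faldo:Position.
    position_type = STRAND_TYPES.get(strand, 'faldo:Position')
    return [
        (position_id, 'rdf:type', position_type),
        (position_id, 'faldo:position', position),
        (position_id, 'faldo:reference', reference_id),
    ]


def feature_location_triples(feature_id: str, feature_type: str, region_id: str,
                             start: int, end: int, reference_id: str,
                             strand: Optional[str] = None) -> List[Triple]:
    # Hypothetical position IDs derived from the region ID.
    start_id = region_id + '-start'
    end_id = region_id + '-end'
    triples = [
        (feature_id, 'rdf:type', feature_type),
        (feature_id, 'faldo:location', region_id),
        (region_id, 'rdf:type', 'faldo:Region'),
        (region_id, 'faldo:begin', start_id),
        (region_id, 'faldo:end', end_id),
    ]
    triples += position_triples(start_id, start, reference_id, strand)
    triples += position_triples(end_id, end, reference_id, strand)
    return triples


triples = feature_location_triples('ex:feature1', 'SO:0000704', 'ex:region1',
                                   100, 200, 'ex:chr1', '+')
```

Splitting out `position_triples` mirrors the separation between addFeatureToGraph() and addPositionToGraph() in this class.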

addPositionToGraph(reference_id, position, position_types=None, strand=None)

Add the positional information to the graph, following the FALDO model. We assume that if the strand is None, we give it a generic “Position” only.

Triples:
my_position a (any of: faldo:(((Both|Plus|Minus)Strand)|Exact)Position)
    faldo:position Integer(numeric position)
    faldo:reference reference_id

Parameters:
  • reference_id
  • position
  • position_types
  • strand
Returns:

Identifier of the position created

addRegionPositionToGraph(region_id, begin_position_id, end_position_id)
addSubsequenceOfFeature(parentid, subject_category=None, object_category=None)

This will add reciprocal triples like:

feature <is subsequence of> parent
parent has_subsequence feature

Parameters: parentid

Returns:
addTaxonToFeature(taxonid)

Given the taxon id, this will add the following triple: feature in_taxon taxonid

Parameters: taxonid

dipper.models.GenomicFeature.makeChromID(chrom, reference=None, prefix=None)

This will take a chromosome number and an NCBI taxon number, and create a unique identifier for the chromosome. These identifiers are made in the @base space like:
  • Homo sapiens (9606) chr1 ==> :9606chr1
  • Mus musculus (10090) chrX ==> :10090chrX

Parameters:
  • chrom – the chromosome (preferably without any chr prefix)
  • reference – the numeric portion of the taxon id
Returns:

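The makeChromID() examples above follow a simple pattern: a leading ':' for the @base space, then the taxon number, then 'chr' and the chromosome. The sketch below reproduces only those two documented examples. The name `make_chrom_id` is hypothetical. Stripping an existing 'chr' prefix is an assumption (the docstring only says input should preferably lack it), and the real function's `prefix` argument is not modeled.

```python
def make_chrom_id(chrom: str, reference: str) -> str:
    # Assumption: tolerate an optional "chr" prefix on the input.
    if chrom.startswith('chr'):
        chrom = chrom[3:]
    # Identifiers live in the @base space, hence the leading ':'.
    return ':' + reference + 'chr' + chrom


print(make_chrom_id('1', '9606'))      # :9606chr1
print(make_chrom_id('chrX', '10090'))  # :10090chrX
```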
dipper.models.GenomicFeature.makeChromLabel(chrom, reference=None)
dipper.models.Genotype module
class dipper.models.Genotype.Genotype(graph)

Bases: object

These methods provide a convenient way to add items related to a genotype and its parts to a supplied graph. They follow the patterns set out in GENO https://github.com/monarch-initiative/GENO-ontology. For specific sequence features, we use the GenomicFeature class to create them.

addAffectedLocus(allele_id, gene_id, rel_id=None)

We make the assumption here that if the relationship is not provided, it is a GENO:has_affected_feature.

Here, the allele should be a variant_locus, not a sequence alteration. :param allele_id: :param gene_id: :param rel_id: :return:

addAllele(allele_id, allele_label, allele_type=None, allele_description=None)

Make an allele object. If no allele_type is added, it will default to a geno:allele.

Parameters:
  • allele_id – curie for allele (required)
  • allele_label – label for allele (required)
  • allele_type – id for an allele type (optional, recommended SO or GENO class)
  • allele_description – a free-text description of the allele

addAlleleOfGene(allele_id, gene_id, rel_id=None)

We make the assumption here that if the relationship is not provided, it is a GENO:is_allele_of.

Here, the allele should be a variant_locus, not a sequence alteration. :param allele_id: :param gene_id: :param rel_id: :return:

addChromosome(chrom, tax_id, tax_label=None, build_id=None, build_label=None)

If it’s just the chromosome, add it as an instance of a SO:chromosome, and add it to the genome. If a build is included, pun the chromosome as a subclass of SO:chromosome, and make the build-specific chromosome an instance of the supplied chr. The chr then becomes part of the build or genome.

addChromosomeClass(chrom_num, taxon_id, taxon_label)
addChromosomeInstance(chr_num, reference_id, reference_label, chr_type=None)

Add the supplied chromosome as an instance within the given reference.

Parameters:
  • chr_num
  • reference_id – for example, a build id like UCSC:hg19
  • reference_label
  • chr_type – this is the class that this is an instance of, typically a genome-specific chr

Returns:
addConstruct(construct_id, construct_label, construct_type=None, construct_description=None, construct_category=None, construct_type_category=None)
Parameters:
  • construct_id
  • construct_label
  • construct_type
  • construct_description
  • construct_category – a biolink category CURIE for construct_id
  • construct_type_category – a biolink category CURIE for construct_type
Returns:

addDerivesFrom(child_id, parent_id, child_category=None, parent_category=None)

We add a derives_from relationship between the child and parent id. Examples of uses include: between an allele and a construct or strain, and between a cell line and its parent genotype. Adding the parent and child to the graph should happen outside of this function call to ensure graph integrity.

Parameters:
  • child_id
  • parent_id

addGene(gene_id, gene_label=None, gene_type=None, gene_description=None)

Genes are classes.

addGeneProduct(sequence_id, product_id, product_label=None, product_type=None, sequence_category=None, product_category=None)

Add a gene/variant/allele has_gene_product relationship. Can be used to describe either a gene-to-transcript or a gene-to-protein relationship.

Parameters:
  • sequence_id
  • product_id
  • product_label
  • product_type
  • sequence_category – biolink category CURIE for sequence_id [blv.terms.Gene.value]
  • product_category – biolink category CURIE for product_id

addGeneTargetingReagent(reagent_id, reagent_label, reagent_type, gene_id, description=None, reagent_category=None)

Here, a gene-targeting reagent is added. The actual targets of this reagent should be added separately. :param reagent_id: :param reagent_label: :param reagent_type:

Returns:
addGeneTargetingReagentToGenotype(reagent_id, genotype_id)

Add genotype has_variant_part reagent_id. For example, add a morphant reagent to the genotype, assuming it’s an extrinsic_genotype. Also adds a triple to assign biolink categories to the genotype and reagent.

Parameters:
  • reagent_id
  • genotype_id

addGenome(taxon_num, taxon_label=None, genome_id=None)
addGenomicBackground(background_id, background_label, background_type=None, background_description=None)
addGenomicBackgroundToGenotype(background_id, genotype_id, background_type=None)
addGenotype(genotype_id, genotype_label, genotype_type=None, genotype_description=None)

If a genotype_type is not supplied, we will default to ‘intrinsic genotype’ :param genotype_id: :param genotype_label: :param genotype_type: :param genotype_description: :return:

addMemberOfPopulation(member_id, population_id)
addParts(part_id, parent_id, part_relationship=None, part_category=None, parent_category=None)

This will add a has_part (or subproperty) relationship between a parent_id and the supplied part. By default the relationship will be BFO:has_part, but any relationship could be given here.

Parameters:
  • part_id
  • parent_id
  • part_relationship
  • part_category – a biolink vocab curie for part_id
  • parent_category – a biolink vocab curie for parent_id

addPartsToVSLC(vslc_id, allele1_id, allele2_id, zygosity_id=None, allele1_rel=None, allele2_rel=None)

Here we add the parts to the VSLC. While alleles (reference or variant loci) are traditionally added, you can add any node (such as sequence_alterations for unlocated variations) to a VSLC if they are known to be paired. However, if a sequence_alteration’s locus is unknown, it probably should be added directly to the GVC.

Parameters:
  • vslc_id
  • allele1_id
  • allele2_id
  • zygosity_id
  • allele1_rel
  • allele2_rel

addPolypeptide(polypeptide_id, polypeptide_label=None, transcript_id=None, polypeptide_type=None)
Parameters:
  • polypeptide_id
  • polypeptide_label
  • polypeptide_type
  • transcript_id
Returns:

addReagentTargetedGene(reagent_id, gene_id, targeted_gene_id=None, targeted_gene_label=None, description=None, reagent_category=None)

This will create the instance of a gene that is targeted by a molecular reagent (such as a morpholino or rnai). If an instance id is not supplied, we will create it as an anonymous individual which is of the type GENO:reagent_targeted_gene. We will also add the targets relationship between the reagent and gene class.

<targeted_gene_id> a GENO:reagent_targeted_gene
    rdfs:label targeted_gene_label
    dc:description description
<reagent_id> GENO:targets_gene <gene_id>

Parameters:
  • reagent_id
  • gene_id
  • targeted_gene_id
  • reagent_category – a biolink category CURIE for reagent_id
Returns:

addReferenceGenome(build_id, build_label, taxon_id)
addSequenceAlteration(sa_id, sa_label, sa_type=None, sa_description=None)
addSequenceAlterationToVariantLocus(sa_id, vl_id)
addSequenceDerivesFrom(child_id, parent_id, child_category=None, parent_category=None)
addTargetedGeneComplement(tgc_id, tgc_label, tgc_type=None, tgc_description=None)
addTargetedGeneSubregion(tgs_id, tgs_label, tgs_type=None, tgs_description=None)
addTaxon(taxon_id, genopart_id, genopart_category=None)

The supplied geno part will have the specified taxon added with RO:in_taxon relation. Generally the taxon is associated with a genomic_background, but could be added to any genotype part (including a gene, regulatory element, or sequence alteration). :param taxon_id: :param genopart_id: :param genopart_category: a biolink term for genopart_id :return:

addVSLCtoParent(vslc_id, parent_id, part_category=None, parent_category=None)

The VSLC can either be added to a genotype or to a GVC. The vslc is added as a part of the parent. :param vslc_id: :param parent_id: :param part_category: a biolink category CURIE for part :param parent_category: a biolink category CURIE for parent :return:

static makeGenomeID(taxon_id)
make_experimental_model_with_genotype(genotype_id, genotype_label, taxon_id, taxon_label)
static make_variant_locus_label(gene_label, allele_label)
make_vslc_label(gene_label, allele1_label, allele2_label)

Make a Variant Single Locus Complement (VSLC) in monarch-style. :param gene_label: :param allele1_label: :param allele2_label: :return:

dipper.models.Model module
class dipper.models.Model.Model(graph)

Bases: object

Utility class to add common triples to a graph (subClassOf, type, label, sameAs)

addBlankNodeAnnotation(node_id)

Add an annotation property to the given `node_id` to be a pseudo blank node. This is a monarchism. :param node_id: :return:

addClassToGraph(class_id, label=None, class_type=None, description=None, class_category=None, class_type_category=None)

Any node added to the graph will get at least 3 triples:
  • (node, rdf:type, owl:Class)
  • (node, rdfs:label, literal(label))
If a type is added, then the node will be an rdfs:subClassOf the type. If a description is provided, it will also be added as a dc:description.
Parameters:
  • class_id
  • label
  • class_type
  • description
  • class_category – a biolink category CURIE for class
  • class_type_category – a biolink category CURIE for class type
Returns:

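The addClassToGraph() rules above (an owl:Class type, a label, an optional subclass axiom, an optional description) can be sketched as a function returning a set of tuples. `class_triples` and the `ex:` IDs are hypothetical. The sketch does not distinguish literals from IRIs, and it does not model the biolink category arguments or whatever supplies the third guaranteed triple.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def class_triples(class_id: str, label: Optional[str] = None,
                  class_type: Optional[str] = None,
                  description: Optional[str] = None) -> Set[Triple]:
    # Every class is typed as an owl:Class.
    triples = {(class_id, 'rdf:type', 'owl:Class')}
    if label is not None:
        triples.add((class_id, 'rdfs:label', label))
    if class_type is not None:
        # A supplied type makes the class a subclass of that type.
        triples.add((class_id, 'rdfs:subClassOf', class_type))
    if description is not None:
        triples.add((class_id, 'dc:description', description))
    return triples


example = class_triples('ex:MyGene', 'my gene', 'ex:SequenceFeature', 'an example class')
```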
addComment(subject_id, comment, subject_category=None)
addDefinition(class_id, definition, class_category=None)
addDepiction(subject_id, image_url)
addDeprecatedClass(old_id, new_ids=None, old_id_category=None, new_ids_category=None)

Will mark the old_id as a deprecated class. If one new id is supplied, it will mark the old class as replaced by it. If more than one new id is supplied, it will mark the old class with consider properties.

Parameters:
  • old_id – str, the class id to deprecate
  • new_ids – list, the class list that is the replacement(s) of the old class. Not required.
  • old_id_category – a biolink category CURIE for old id
  • new_ids_category – a biolink category CURIE for new ids
Returns:

None
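The one-versus-many rule above can be sketched as follows. The property CURIEs used here (owl:deprecated, IAO:0100001 "term replaced by", oboInOwl:consider) are the conventional OBO choices. They are an assumption for this illustration, since the docstring does not name the exact properties Dipper emits. `deprecation_triples` and the `ex:` IDs are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def deprecation_triples(old_id: str, new_ids: Optional[List[str]] = None) -> List[Triple]:
    # Always mark the old term as deprecated.
    triples = [(old_id, 'owl:deprecated', 'true')]
    new_ids = new_ids or []
    if len(new_ids) == 1:
        # Exactly one replacement: "term replaced by".
        triples.append((old_id, 'IAO:0100001', new_ids[0]))
    else:
        # Several candidates (or none): suggest each one via "consider".
        for new_id in new_ids:
            triples.append((old_id, 'oboInOwl:consider', new_id))
    return triples


print(deprecation_triples('ex:1', ['ex:2']))
# [('ex:1', 'owl:deprecated', 'true'), ('ex:1', 'IAO:0100001', 'ex:2')]
```

Using "consider" for multiple candidates reflects that none of them is an unambiguous replacement.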

addDeprecatedIndividual(old_id, new_ids=None, old_id_category=None, new_id_category=None)

Will mark the old_id as a deprecated individual. If one new id is supplied, it will mark the old individual as replaced by it. If more than one new id is supplied, it will mark the old individual with consider properties.

Parameters:
  • old_id – the individual id to deprecate
  • new_ids – the individual id list that is the replacement(s) of the old individual. Not required.
  • old_id_category – a biolink category CURIE for old id
  • new_id_category – a biolink category CURIE for new ids

addDescription(subject_id, description, subject_category=None)
addEquivalentClass(sub, obj, subject_category=None, object_category=None)
addIndividualToGraph(ind_id, label, ind_type=None, description=None, ind_category=None, ind_type_category=None)
addLabel(subject_id, label, subject_category=None)
addOWLPropertyClassRestriction(class_id, property_id, property_value, class_category=None, property_id_category=None, property_value_category=None)
addOWLVersionIRI(ontology_id, version_iri)
addOWLVersionInfo(ontology_id, version_info)
addOntologyDeclaration(ontology_id)
addPerson(person_id, person_label=None)
addSameIndividual(sub, obj, subject_category=None, object_category=None)
addSubClass(child_id, parent_id, child_category=None, parent_category=None)
addSynonym(class_id, synonym, synonym_type=None, class_category=None)

Add the synonym as a property of the class class_id. Assume it is an exact synonym unless otherwise specified.

Parameters:
  • class_id – class id
  • synonym – the literal synonym label
  • synonym_type – the CURIE of the synonym type (not the URI)
  • class_category – biolink category CURIE for class_id (no biolink category is possible for the synonym, since it is added to the triple as a literal)
Returns:

addTriple(subject_id, predicate_id, obj, object_is_literal=False, literal_type=None, subject_category=None, object_category=None)
addType(subject_id, subject_type, subject_category=None, subject_type_category=None)
addXref(class_id, xref_id, xref_as_literal=False, class_category=None, xref_category=None)
makeLeader(node_id)

Add an annotation property to the given `node_id` to make it the clique_leader. This is a monarchism.

Parameters: node_id
Returns:

dipper.models.Pathway module
class dipper.models.Pathway.Pathway(graph)

Bases: object

This provides convenience methods to deal with gene and protein collections in the context of pathways.

addComponentToPathway(component_id, pathway_id)

This can be used directly when the component is directly involved in the pathway. If a transforming event is performed on the component first, then addGeneToPathway should be used instead.

Parameters:
  • component_id
  • pathway_id
Returns:

addGeneToPathway(gene_id, pathway_id)

When adding a gene to a pathway, we create an intermediate ‘gene product’, represented by a blank node, that is involved in the pathway:

gene_id RO:has_gene_product _gene_product
_gene_product RO:involved_in pathway_id

Parameters:
  • pathway_id
  • gene_id
Returns:
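The two-hop pattern above (gene, blank-node gene product, pathway) can be sketched with plain tuples. `gene_to_pathway_triples` and the `_:gene_productN` labels are hypothetical, and the predicates are the shorthand labels used in the description rather than the numeric RO CURIEs dipper would resolve them to.

```python
import itertools
from typing import Iterator, List, Tuple

# counter used to mint unique blank-node labels for this sketch
_bnode_counter: Iterator[int] = itertools.count(1)


def gene_to_pathway_triples(gene_id: str, pathway_id: str) -> List[Tuple[str, str, str]]:
    """Hypothetical sketch of the gene -> gene product -> pathway pattern."""
    # a fresh blank node stands in for the (unnamed) gene product
    gene_product = f'_:gene_product{next(_bnode_counter)}'
    return [
        (gene_id, 'RO:has_gene_product', gene_product),
        (gene_product, 'RO:involved_in', pathway_id),
    ]
```

Each call mints a new blank node, so two genes in the same pathway get distinct gene products.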

addPathway(pathway_id, pathway_label, pathway_type=None, pathway_description=None)

Adds a pathway as a class. If no specific type is specified, it will default to a subclass of “GO:cellular_process” and “PW:pathway”.

Parameters:
  • pathway_id
  • pathway_label
  • pathway_type
  • pathway_description
Returns:

dipper.models.Provenance module
class dipper.models.Provenance.Provenance(graph)

Bases: object

Models provenance as the basis for an association. This encompasses:

  • Process history leading to a claim being made, including processes through which evidence is evaluated
  • Processes through which information used as evidence is created

Provenance metadata includes accounts of who conducted these processes, what entities participated in them, and when/where they occurred.
add_agent_to_graph(agent_id, agent_label, agent_type=None, agent_description=None, agent_category=None)
add_assay_to_graph(assay_id, assay_label, assay_type=None, assay_description=None)
add_assertion(assertion, agent, agent_label, date=None)

Add an assertion to the graph.

Parameters:
  • assertion
  • agent
  • agent_label
  • date
Returns:

None

add_date_created(prov_type, date)
add_study_measure(study, measure, object_is_literal=None)
add_study_parts(study, study_parts, study_parts_category=None)
add_study_to_measurements(study, measurements)
dipper.models.Reference module
class dipper.models.Reference.Reference(graph, ref_id=None, ref_type=None)

Bases: object

Models references for associations (such as journal articles, books, etc.). By default, references are typed as “documents” unless the type is set otherwise. If a short_citation is set, it will be used as the individual’s label. We may wish to subclass this later.
addAuthor(author)
addPage(subject_id, page_url, subject_category=None, page_category=None)
addRefToGraph()
addTitle(subject_id, title)
setAuthorList(author_list)
Parameters: author_list – Array of authors
Returns:
setShortCitation(citation)
setTitle(title)
setType(reference_type)
setYear(year)
dipper.sources package
Submodules
dipper.sources.AnimalQTLdb module
class dipper.sources.AnimalQTLdb.AnimalQTLdb(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

The Animal Quantitative Trait Loci (QTL) database (Animal QTLdb) is designed to house all publicly available QTL and single-nucleotide polymorphism/gene association data on livestock animal species. This includes:

  • chicken
  • horse
  • cow
  • sheep
  • rainbow trout
  • pig

While most of the phenotypes here are related to animal husbandry, production, and rearing, integration of these phenotypes with other species may lead to insight for human disease.

Here, we use the QTL genetic maps and their computed genomic locations to create associations between the QTLs and their traits. The traits are expressed in Animal QTLdb's internal Animal Trait Ontology vocabulary, which they further map to the [Vertebrate Trait](http://bioportal.bioontology.org/ontologies/VT), Product Trait, and Clinical Measurement ontologies.

Since these are only associations to broad locations, with no specific causative nature, we link the traits via “is_marker_for”. P-values for the associations are attached to the Association objects. We default to the UCSC build for the genomic coordinates, and make equivalences.

Genetic position ranges that are < 0 are not included.

GENEINFO = 'ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO'
GITDIP = 'https://raw.githubusercontent.com/monarch-initiative/dipper/master'
fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns: None

files = {'Bos_taurus_info': {'columns': ['tax_id', 'GeneID', 'Symbol', 'LocusTag', 'Synonyms', 'dbXrefs', 'chromosome', 'map_location', 'description', 'type_of_gene', 'Symbol_from_nomenclature_authority', 'Full_name_from_nomenclature_authority', 'Nomenclature_status', 'Other_designations', 'Modification_date', 'Feature_type'], 'file': 'Bos_taurus.gene_info.gz', 'url': 'ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Bos_taurus.gene_info.gz'}, 'Equus_caballus_info': {'columns': ['tax_id', 'GeneID', 'Symbol', 'LocusTag', 'Synonyms', 'dbXrefs', 'chromosome', 'map_location', 'description', 'type_of_gene', 'Symbol_from_nomenclature_authority', 'Full_name_from_nomenclature_authority', 'Nomenclature_status', 'Other_designations', 'Modification_date', 'Feature_type'], 'file': 'Equus_caballus.gene_info.gz', 'url': 'https://archive.monarchinitiative.org/DipperCache/Equus_caballus.gene_info.gz'}, 'Gallus_gallus_info': {'columns': ['tax_id', 'GeneID', 'Symbol', 'LocusTag', 'Synonyms', 'dbXrefs', 'chromosome', 'map_location', 'description', 'type_of_gene', 'Symbol_from_nomenclature_authority', 'Full_name_from_nomenclature_authority', 'Nomenclature_status', 'Other_designations', 'Modification_date', 'Feature_type'], 'file': 'Gallus_gallus.gene_info.gz', 'url': 'ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Non-mammalian_vertebrates/Gallus_gallus.gene_info.gz'}, 'Oncorhynchus_mykiss_info': {'columns': ['tax_id', 'GeneID', 'Symbol', 'LocusTag', 'Synonyms', 'dbXrefs', 'chromosome', 'map_location', 'description', 'type_of_gene', 'Symbol_from_nomenclature_authority', 'Full_name_from_nomenclature_authority', 'Nomenclature_status', 'Other_designations', 'Modification_date', 'Feature_type'], 'file': 'Oncorhynchus_mykiss.gene_info.gz', 'url': 'https://archive.monarchinitiative.org/DipperCache/Oncorhynchus_mykiss.gene_info.gz'}, 'Ovis_aries_info': {'columns': ['tax_id', 'GeneID', 'Symbol', 'LocusTag', 'Synonyms', 'dbXrefs', 'chromosome', 'map_location', 'description', 'type_of_gene', 
'Symbol_from_nomenclature_authority', 'Full_name_from_nomenclature_authority', 'Nomenclature_status', 'Other_designations', 'Modification_date', 'Feature_type'], 'file': 'Ovis_aries.gene_info.gz', 'url': 'https://archive.monarchinitiative.org/DipperCache/Ovis_aries.gene_info.gz'}, 'Sus_scrofa_info': {'columns': ['tax_id', 'GeneID', 'Symbol', 'LocusTag', 'Synonyms', 'dbXrefs', 'chromosome', 'map_location', 'description', 'type_of_gene', 'Symbol_from_nomenclature_authority', 'Full_name_from_nomenclature_authority', 'Nomenclature_status', 'Other_designations', 'Modification_date', 'Feature_type'], 'file': 'Sus_scrofa.gene_info.gz', 'url': 'ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Sus_scrofa.gene_info.gz'}, 'cattle_bp': {'columns': ['SEQNAME', 'SOURCE', 'FEATURE', 'START', 'END', 'SCORE', 'STRAND', 'FRAME', 'ATTRIBUTE'], 'curie': 'cattleQTL', 'file': 'QTL_Btau_4.6.gff.txt.gz', 'url': 'https://www.animalgenome.org/QTLdb/tmp/QTL_Btau_4.6.gff.txt.gz'}, 'cattle_cm': {'columns': ['QTL_ID', 'QTL_symbol', 'Trait_name', 'assotype', '(empty)', 'Chromosome', 'Position_cm', 'range_cm', 'FlankMark_A2', 'FlankMark_A1', 'Peak_Mark', 'FlankMark_B1', 'FlankMark_B2', 'Exp_ID', 'Model', 'testbase', 'siglevel', 'LOD_score', 'LS_mean', 'P_values', 'F_Statistics', 'VARIANCE', 'Bayes_value', 'LikelihoodR', 'TRAIT_ID', 'Dom_effect', 'Add_effect', 'PUBMED_ID', 'geneID', 'geneIDsrc', 'geneIDtype'], 'curie': 'cattleQTL', 'file': 'cattle_QTLdata.txt', 'url': 'https://www.animalgenome.org/QTLdb/export/KSUI8GFHOT6/cattle_QTLdata.txt'}, 'chicken_bp': {'columns': ['SEQNAME', 'SOURCE', 'FEATURE', 'START', 'END', 'SCORE', 'STRAND', 'FRAME', 'ATTRIBUTE'], 'curie': 'chickenQTL', 'file': 'QTL_GG_5.0.gff.txt.gz', 'url': 'https://www.animalgenome.org/QTLdb/tmp/QTL_GG_5.0.gff.txt.gz'}, 'chicken_cm': {'columns': ['QTL_ID', 'QTL_symbol', 'Trait_name', 'assotype', '(empty)', 'Chromosome', 'Position_cm', 'range_cm', 'FlankMark_A2', 'FlankMark_A1', 'Peak_Mark', 'FlankMark_B1', 'FlankMark_B2', 
'Exp_ID', 'Model', 'testbase', 'siglevel', 'LOD_score', 'LS_mean', 'P_values', 'F_Statistics', 'VARIANCE', 'Bayes_value', 'LikelihoodR', 'TRAIT_ID', 'Dom_effect', 'Add_effect', 'PUBMED_ID', 'geneID', 'geneIDsrc', 'geneIDtype'], 'curie': 'chickenQTL', 'file': 'chicken_QTLdata.txt', 'url': 'https://www.animalgenome.org/QTLdb/export/KSUI8GFHOT6/chicken_QTLdata.txt'}, 'horse_bp': {'columns': ['SEQNAME', 'SOURCE', 'FEATURE', 'START', 'END', 'SCORE', 'STRAND', 'FRAME', 'ATTRIBUTE'], 'curie': 'horseQTL', 'file': 'QTL_EquCab2.0.gff.txt.gz', 'url': 'https://www.animalgenome.org/QTLdb/tmp/QTL_EquCab2.0.gff.txt.gz'}, 'horse_cm': {'columns': ['QTL_ID', 'QTL_symbol', 'Trait_name', 'assotype', '(empty)', 'Chromosome', 'Position_cm', 'range_cm', 'FlankMark_A2', 'FlankMark_A1', 'Peak_Mark', 'FlankMark_B1', 'FlankMark_B2', 'Exp_ID', 'Model', 'testbase', 'siglevel', 'LOD_score', 'LS_mean', 'P_values', 'F_Statistics', 'VARIANCE', 'Bayes_value', 'LikelihoodR', 'TRAIT_ID', 'Dom_effect', 'Add_effect', 'PUBMED_ID', 'geneID', 'geneIDsrc', 'geneIDtype'], 'curie': 'horseQTL', 'file': 'horse_QTLdata.txt', 'url': 'https://www.animalgenome.org/QTLdb/export/KSUI8GFHOT6/horse_QTLdata.txt'}, 'pig_bp': {'columns': ['SEQNAME', 'SOURCE', 'FEATURE', 'START', 'END', 'SCORE', 'STRAND', 'FRAME', 'ATTRIBUTE'], 'curie': 'pigQTL', 'file': 'QTL_SS_11.1.gff.txt.gz', 'url': 'https://www.animalgenome.org/QTLdb/tmp/QTL_SS_11.1.gff.txt.gz'}, 'pig_cm': {'columns': ['QTL_ID', 'QTL_symbol', 'Trait_name', 'assotype', '(empty)', 'Chromosome', 'Position_cm', 'range_cm', 'FlankMark_A2', 'FlankMark_A1', 'Peak_Mark', 'FlankMark_B1', 'FlankMark_B2', 'Exp_ID', 'Model', 'testbase', 'siglevel', 'LOD_score', 'LS_mean', 'P_values', 'F_Statistics', 'VARIANCE', 'Bayes_value', 'LikelihoodR', 'TRAIT_ID', 'Dom_effect', 'Add_effect', 'PUBMED_ID', 'geneID', 'geneIDsrc', 'geneIDtype'], 'curie': 'pigQTL', 'file': 'pig_QTLdata.txt', 'url': 'https://www.animalgenome.org/QTLdb/export/KSUI8GFHOT6/pig_QTLdata.txt'}, 'rainbow_trout_cm': 
{'columns': ['QTL_ID', 'QTL_symbol', 'Trait_name', 'assotype', '(empty)', 'Chromosome', 'Position_cm', 'range_cm', 'FlankMark_A2', 'FlankMark_A1', 'Peak_Mark', 'FlankMark_B1', 'FlankMark_B2', 'Exp_ID', 'Model', 'testbase', 'siglevel', 'LOD_score', 'LS_mean', 'P_values', 'F_Statistics', 'VARIANCE', 'Bayes_value', 'LikelihoodR', 'TRAIT_ID', 'Dom_effect', 'Add_effect', 'PUBMED_ID', 'geneID', 'geneIDsrc', 'geneIDtype'], 'curie': 'rainbow_troutQTL', 'file': 'rainbow_trout_QTLdata.txt', 'url': 'https://www.animalgenome.org/QTLdb/export/KSUI8GFHOT6/rainbow_trout_QTLdata.txt'}, 'sheep_bp': {'columns': ['SEQNAME', 'SOURCE', 'FEATURE', 'START', 'END', 'SCORE', 'STRAND', 'FRAME', 'ATTRIBUTE'], 'curie': 'sheepQTL', 'file': 'QTL_OAR_4.0.gff.txt.gz', 'url': 'https://www.animalgenome.org/QTLdb/tmp/QTL_OAR_4.0.gff.txt.gz'}, 'sheep_cm': {'columns': ['QTL_ID', 'QTL_symbol', 'Trait_name', 'assotype', '(empty)', 'Chromosome', 'Position_cm', 'range_cm', 'FlankMark_A2', 'FlankMark_A1', 'Peak_Mark', 'FlankMark_B1', 'FlankMark_B2', 'Exp_ID', 'Model', 'testbase', 'siglevel', 'LOD_score', 'LS_mean', 'P_values', 'F_Statistics', 'VARIANCE', 'Bayes_value', 'LikelihoodR', 'TRAIT_ID', 'Dom_effect', 'Add_effect', 'PUBMED_ID', 'geneID', 'geneIDsrc', 'geneIDtype'], 'curie': 'sheepQTL', 'file': 'sheep_QTLdata.txt', 'url': 'https://www.animalgenome.org/QTLdb/export/KSUI8GFHOT6/sheep_QTLdata.txt'}, 'trait_mappings': {'columns': ['VT', 'LPT', 'CMO', 'ATO', 'Species', 'Class', 'Type', 'QTL_Count'], 'file': 'trait_mappings.csv', 'url': 'https://www.animalgenome.org/QTLdb/export/trait_mappings.csv'}}
gene_info_columns = ['tax_id', 'GeneID', 'Symbol', 'LocusTag', 'Synonyms', 'dbXrefs', 'chromosome', 'map_location', 'description', 'type_of_gene', 'Symbol_from_nomenclature_authority', 'Full_name_from_nomenclature_authority', 'Nomenclature_status', 'Other_designations', 'Modification_date', 'Feature_type']
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

gff_columns = ['SEQNAME', 'SOURCE', 'FEATURE', 'START', 'END', 'SCORE', 'STRAND', 'FRAME', 'ATTRIBUTE']
parse(limit=None)
Parameters: limit
Returns:
qtl_columns = ['QTL_ID', 'QTL_symbol', 'Trait_name', 'assotype', '(empty)', 'Chromosome', 'Position_cm', 'range_cm', 'FlankMark_A2', 'FlankMark_A1', 'Peak_Mark', 'FlankMark_B1', 'FlankMark_B2', 'Exp_ID', 'Model', 'testbase', 'siglevel', 'LOD_score', 'LS_mean', 'P_values', 'F_Statistics', 'VARIANCE', 'Bayes_value', 'LikelihoodR', 'TRAIT_ID', 'Dom_effect', 'Add_effect', 'PUBMED_ID', 'geneID', 'geneIDsrc', 'geneIDtype']
test_ids = {1795, 1798, 8945, 12532, 14234, 17138, 28483, 29016, 29018, 29385, 31023, 32133}
trait_mapping_columns = ['VT', 'LPT', 'CMO', 'ATO', 'Species', 'Class', 'Type', 'QTL_Count']
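A minimal sketch of reading one line of the *_bp GFF files with the gff_columns above. Two assumptions are labeled here and in the code: that ATTRIBUTE holds `;`-separated `key=value` pairs, and that the "< 0" rule is applied to the START/END columns (the source may apply it to the cM ranges instead). `parse_gff_line` is hypothetical.

```python
from typing import Dict, Optional

GFF_COLUMNS = ['SEQNAME', 'SOURCE', 'FEATURE', 'START', 'END',
               'SCORE', 'STRAND', 'FRAME', 'ATTRIBUTE']


def parse_gff_line(line: str) -> Optional[Dict[str, object]]:
    """Hypothetical: split one tab-separated GFF line; None for negative positions."""
    row = dict(zip(GFF_COLUMNS, line.rstrip('\n').split('\t')))
    start, end = int(row['START']), int(row['END'])
    if start < 0 or end < 0:
        # assumption: the "< 0" exclusion is applied to these columns
        return None
    # assumption: ATTRIBUTE is ';'-separated key=value pairs, values may be quoted
    attributes: Dict[str, str] = {}
    for pair in row['ATTRIBUTE'].split(';'):
        if '=' in pair:
            key, value = pair.split('=', 1)
            attributes[key.strip()] = value.strip().strip('"')
    return {'seqname': row['SEQNAME'], 'start': start, 'end': end,
            'attributes': attributes}
```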
dipper.sources.Bgee module
class dipper.sources.Bgee.Bgee(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None, version=None)

Bases: dipper.sources.Source.Source

Bgee is a database to retrieve and compare gene expression patterns between animal species.

Bgee first maps heterogeneous expression data (currently RNA-Seq, Affymetrix, in situ hybridization, and EST data) to anatomy and development of different species.

Then, in order to perform automated cross species comparisons, homology relationships across anatomies, and comparison criteria between developmental stages, are designed.

check_if_remote_is_newer(localfile, remote_size, remote_modify)

Overrides check_if_remote_is_newer in Source class

Parameters:
  • localfile – str file path
  • remote_size – str bytes
  • remote_modify – str, last-modified date in the form YYYYMMDDhhmmss (e.g. 20160705042714)
Returns:

boolean, True if the remote file is newer, else False
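A sketch of the kind of comparison check_if_remote_is_newer performs, given the parameters described above. The decision rule (a missing local file or a size mismatch counts as newer, otherwise compare timestamps) and the assumption that remote_modify is in UTC are illustrative choices, not taken from the Bgee source. `remote_is_newer` is hypothetical.

```python
import os
from datetime import datetime, timezone


def remote_is_newer(localfile: str, remote_size: str, remote_modify: str) -> bool:
    """Sketch: decide whether a remote file should replace the local copy."""
    if not os.path.exists(localfile):
        # nothing local yet, so the remote copy is treated as newer
        return True
    stat = os.stat(localfile)
    if int(remote_size) != stat.st_size:
        # assumption: a size mismatch is taken as a sign the remote file changed
        return True
    # remote_modify is YYYYMMDDhhmmss, assumed here to be UTC
    remote_mtime = datetime.strptime(remote_modify, '%Y%m%d%H%M%S').replace(tzinfo=timezone.utc)
    local_mtime = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return remote_mtime > local_mtime
```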

default_species = ['Cavia porcellus', 'Mus musculus', 'Rattus norvegicus', 'Monodelphis domestica', 'Anolis carolinensis', 'Caenorhabditis elegans', 'Drosophila melanogaster', 'Danio rerio', 'Xenopus (Silurana) tropicalis', 'Gallus gallus', 'Ornithorhynchus anatinus', 'Erinaceus europaeus', 'Macaca mulatta', 'Gorilla gorilla', 'Pan paniscus', 'Pan troglodytes', 'Homo sapiens', 'Canis lupus familiaris', 'Felis catus', 'Equus caballus', 'Sus scrofa', 'Bos taurus', 'Oryctolagus cuniculus']
fetch(is_dl_forced=False)
Parameters: is_dl_forced – boolean, force download
Returns:
files = {'anat_entity': {'columns': ['Ensembl gene ID', 'gene name', 'anatomical entity ID', 'anatomical entity name', 'rank score', 'XRefs to BTO'], 'path': '/download/ranks/anat_entity/', 'pattern': re.compile('^[0-9]+_anat_entity_all_data_.*.tsv.gz')}}
parse(limit=None)

Given the input taxa, expects files in the raw directory with names such as {tax_id}_anat_entity_all_data_Pan_troglodytes.tsv.gz (matching the pattern in files).

Parameters: limit – int, limit to the top-ranked anatomy associations per group
Returns:None
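The file-name convention above can be checked with a regular expression. This sketch adapts the pattern from Bgee.files by escaping the dots and anchoring the end (an adjustment made here; the documented pattern leaves them unescaped), and captures the taxon id and species. `split_ranked_file_name` is hypothetical.

```python
import re
from typing import Optional, Tuple

# adapted from Bgee.files: dots escaped and '$' added for this sketch
ANAT_PATTERN = re.compile(r'^([0-9]+)_anat_entity_all_data_(.*)\.tsv\.gz$')


def split_ranked_file_name(name: str) -> Optional[Tuple[int, str]]:
    """Return (tax_id, species) for a matching file name, else None."""
    match = ANAT_PATTERN.match(name)
    if match is None:
        return None
    # species names use underscores in the file name
    return int(match.group(1)), match.group(2).replace('_', ' ')
```

For example, `split_ranked_file_name('9598_anat_entity_all_data_Pan_troglodytes.tsv.gz')` returns `(9598, 'Pan troglodytes')`.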
dipper.sources.BioGrid module
class dipper.sources.BioGrid.BioGrid(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None)

Bases: dipper.sources.Source.Source

BioGRID interaction data

biogrid_ids = [106638, 107308, 107506, 107674, 107675, 108277, 108506, 108767, 108814, 108899, 110308, 110364, 110678, 111642, 112300, 112365, 112771, 112898, 199832, 203220, 247276, 120150, 120160, 124085]
fetch(is_dl_forced=False)
Parameters: is_dl_forced
Returns:None
files = {'identifiers': {'file': 'BIOGRID-IDENTIFIERS-LATEST.tab.zip', 'url': 'https://downloads.thebiogrid.org/Download/BioGRID/Latest-Release/BIOGRID-IDENTIFIERS-LATEST.tab.zip'}, 'interactions': {'file': 'BIOGRID-ALL-LATEST.mitab.zip', 'url': 'https://downloads.thebiogrid.org/Download/BioGRID/Latest-Release/BIOGRID-ALL-LATEST.mitab.zip'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

parse(limit=None)
Parameters: limit
Returns:
dipper.sources.CTD module
class dipper.sources.CTD.CTD(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

The Comparative Toxicogenomics Database (CTD) includes curated data describing cross-species chemical–gene/protein interactions and chemical– and gene–disease associations to illuminate molecular mechanisms underlying variable susceptibility and environmentally influenced diseases. (updated monthly).

Here, we fetch, parse, and convert data from CTD into triples, leveraging only the associations based on DIRECT evidence (not the inferred associations). We currently process the following associations:

  • chemical-disease
  • gene-pathway
  • gene-disease

CTD curates relationships between genes and chemicals/diseases as ‘marker/mechanism’ or ‘therapeutic’ (observe: strictly OR). Unfortunately, we cannot disambiguate between marker (gene expression) and mechanism (causation) for these associations. Therefore, we simply relate these by “marker”.

Discontinued at some point prior to 202005: CTD also pulls in genes and pathway membership from KEGG and REACTOME. We create groups of these following the pattern that the specific pathway is a subclass of ‘cellular process’ (a GO process), and the gene is “involved in” that process.

For diseases, we preferentially use OMIM identifiers when they can be used uniquely over MESH. Otherwise, we use MESH ids.

Note that we scrub the following identifiers and their associated data:

  • REACT:REACT_116125 - generic disease class
  • MESH:D004283 - dog diseases
  • MESH:D004195 - disease models, animal
  • MESH:D030342 - genetic diseases, inborn
  • MESH:D040181 - genetic diseases, x-linked
  • MESH:D020022 - genetic predisposition to a disease
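The two disease-identifier rules above (prefer a unique OMIM id over MESH, and drop scrubbed ids) can be sketched for one row of CTD_chemicals_diseases.tsv.gz, using the DiseaseID and OmimIDs columns listed in files. The `|` separator for OmimIDs and the order in which the rules are applied are assumptions; `pick_disease_id` is hypothetical.

```python
from typing import Dict, Optional

# identifiers scrubbed by the CTD source, as listed above
SCRUBBED_IDS = {
    'REACT:REACT_116125', 'MESH:D004283', 'MESH:D004195',
    'MESH:D030342', 'MESH:D040181', 'MESH:D020022',
}


def pick_disease_id(row: Dict[str, str]) -> Optional[str]:
    """Hypothetical: prefer a single OMIM id over MESH; drop scrubbed identifiers."""
    if row['DiseaseID'] in SCRUBBED_IDS:
        return None
    # assumption: OmimIDs is a '|'-separated list that may be empty
    omim_ids = [i for i in row.get('OmimIDs', '').split('|') if i]
    if len(omim_ids) == 1:
        # OMIM can be used uniquely, so it wins over MESH
        return 'OMIM:' + omim_ids[0]
    return row['DiseaseID']
```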

fetch(is_dl_forced=False)

Override of Source.fetch(). Fetches resources from CTD using the CTD.files dictionary.

Parameters: is_dl_forced (bool) – force download
Returns:

None

files = {'chemical_disease_associations': {'columns': ['ChemicalName', 'ChemicalID', 'CasRN', 'DiseaseName', 'DiseaseID', 'DirectEvidence', 'InferenceGeneSymbol', 'InferenceScore', 'OmimIDs', 'PubMedIDs'], 'file': 'CTD_chemicals_diseases.tsv.gz', 'url': 'http://ctdbase.org/reports/CTD_chemicals_diseases.tsv.gz'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

parse(limit=None)

Override of Source.parse(). Parses version and interaction information from CTD.

Parameters: limit (int, optional) – limit the number of rows processed
Returns:

None

dipper.sources.ClinVar module

Converts ClinVar XML into RDF triples to be ingested by SciGraph. These triples conform to the core of the SEPIO Evidence & Provenance model

We also use the clinvar curated gene to disease mappings to discern the functional consequence of a variant on a gene in cases where this is ambiguous. For example, some variants are located in two genes overlapping on different strands, and may only have a functional consequence on one gene. This is suboptimal and we should look for a source that directly provides this.

Creating a test set:

  • get a full dataset (default ClinVarFullRelease_00-latest.xml.gz)
  • get the mapping file (default gene_condition_source_id)
  • get a list of RCVs (default CV_test_RCV.txt)
  • put the input files in the raw directory
  • write the test set back to the raw directory

./scripts/ClinVarXML_Subset.sh | gzip > raw/clinvar/ClinVarTestSet.xml.gz

Parsing a test set (skolemizing blank nodes, e.g. for Protégé):

dipper/sources/ClinVar.py -f ClinVarTestSet.xml.gz -o ClinVarTestSet_`datestamp`.nt

For now, we are still required to redundantly conflate the OWL properties with the data files:

python3 ./scripts/add-properties2turtle.py --input ./out/ClinVarTestSet_`datestamp`.nt --output ./out/ClinVarTestSet_`datestamp`.nt --format nt

dipper.sources.ClinVar.allele_to_triples(allele, triples) → None

Process allele info such as dbSNP ids and synonyms.

Parameters:
  • allele – Allele
  • triples – List, buffer to store the triples
Returns:

None

dipper.sources.ClinVar.digest_id(wordage)

Return a deterministic digest of the input. The leading ‘b’ is an experiment that forces the first character to be non-numeric but still valid hex. This is in no way required for RDF, but may help when using the identifier in other contexts that do not allow identifiers to begin with a digit.

Parameters: wordage – the string to hash
Returns:

20 hex char digest
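A sketch of a digest with the properties described above: deterministic, 20 characters, all valid hex, and starting with 'b'. The choice of SHA-1 and of overwriting the first hex character are assumptions for illustration; the source's exact hash may differ.

```python
import hashlib


def digest_id(wordage: str) -> str:
    """Sketch: deterministic 20-character hex id whose first char is 'b'."""
    # assumption: SHA-1; any stable hash with >= 20 hex chars would do
    hexdigest = hashlib.sha1(wordage.encode('utf-8')).hexdigest()
    # 'b' replaces the first char so the id never starts with a digit
    return 'b' + hexdigest[1:20]
```

Because 'b' is itself a hex digit, the result remains a valid hex string.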

dipper.sources.ClinVar.expand_curie(this_curie)
dipper.sources.ClinVar.is_literal(thing)

Infer whether a value is a literal or a CURIE.

Returns:

boolean

dipper.sources.ClinVar.make_spo(sub, prd, obj, subject_category=None, object_category=None)

Formats the three given strings as a line of N-Triples (also writing triples for the subject's and the object's biolink:category).
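A minimal sketch of turning a subject, predicate, and object into one N-Triples statement. It assumes the terms are already full IRIs (the CURIE expansion done by expand_curie is omitted) and that the caller says whether the object is a literal, instead of inferring it as is_literal does; the biolink category triples are also omitted. `make_spo_line` and `nt_term` are hypothetical.

```python
def nt_term(term: str, is_literal: bool = False) -> str:
    """Render one N-Triples term: an IRI in <...> or an escaped literal."""
    if is_literal:
        # escape backslash first, then quotes and line breaks
        escaped = (term.replace('\\', '\\\\').replace('"', '\\"')
                   .replace('\n', '\\n').replace('\r', '\\r'))
        return f'"{escaped}"'
    return f'<{term}>'


def make_spo_line(sub: str, prd: str, obj: str, obj_is_literal: bool = False) -> str:
    """Join subject, predicate and object into one N-Triples statement."""
    return f'{nt_term(sub)} {nt_term(prd)} {nt_term(obj, obj_is_literal)} .'
```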

dipper.sources.ClinVar.parse()

Main function for parsing a ClinVar XML release and outputting triples.

dipper.sources.ClinVar.process_measure_set(measure_set, rcv_acc) → dipper.models.ClinVarRecord.Variant

Given a MeasureSet, create a Variant object.

Parameters:
  • measure_set – XML object
  • rcv_acc – str, RCV accession
Returns:

Variant object

dipper.sources.ClinVar.record_to_triples(rcv: dipper.models.ClinVarRecord.ClinVarRecord, triples: List[T], g2p_map: Dict[KT, VT]) → None

Given a ClinVarRecord, adds triples to the triples list

Parameters:
  • rcv – ClinVarRecord
  • triples – List, Buffer to store the triples
  • g2p_map – Gene to phenotype dict
Returns:

None

dipper.sources.ClinVar.resolve(label)

Composite mapping given f(x) and g(x), here GLOBALTT and LOCALTT respectively. In order of preference, returns g(f(x)), f(x), g(x), or x. TODO: consider returning x on fall-through.

Note: the descendant resolve(label) function in Source.py should be used instead, and this f(x) removed.

Returns: label’s mapping
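The preference order g(f(x)), f(x), g(x), x can be sketched with two plain dictionaries standing in for GLOBALTT (f) and LOCALTT (g). This illustrates the documented order only; the dictionaries passed as arguments are a simplification of how the source holds its translation tables.

```python
from typing import Dict


def resolve(label: str, f: Dict[str, str], g: Dict[str, str]) -> str:
    """Composite lookup in the order g(f(x)), f(x), g(x), x."""
    if label in f:
        fx = f[label]
        # prefer the composed mapping when g also knows f's result
        return g.get(fx, fx)
    if label in g:
        return g[label]
    # fall through: return the label unchanged
    return label
```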

Creates links between SCVs based on their pathogenicity/significance calls:

  • GENO:0000840 / GENO:0000840 -> is_equivalent_to (SEPIO:0000098)
  • GENO:0000841 / GENO:0000841 -> is_equivalent_to (SEPIO:0000098)
  • GENO:0000843 / GENO:0000843 -> is_equivalent_to (SEPIO:0000098)
  • GENO:0000844 / GENO:0000844 -> is_equivalent_to (SEPIO:0000098)
  • GENO:0000840 / GENO:0000844 -> contradicts (SEPIO:0000101)
  • GENO:0000841 / GENO:0000844 -> contradicts (SEPIO:0000101)
  • GENO:0000841 / GENO:0000843 -> contradicts (SEPIO:0000101)
  • GENO:0000840 / GENO:0000841 -> is_consistent_with (SEPIO:0000099)
  • GENO:0000843 / GENO:0000844 -> is_consistent_with (SEPIO:0000099)
  • GENO:0000840 / GENO:0000843 -> strongly_contradicts (SEPIO:0000100)
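The pairwise relations above fit naturally in a lookup table. This sketch assumes the relation does not depend on the order of the two calls (each pair is listed once above), so it keys the table on a frozenset of the two GENO CURIEs; a pair not listed returns None. `scv_relation` is hypothetical.

```python
from typing import Optional

# (call, call, relation) as listed above
_PAIRS = [
    ('GENO:0000840', 'GENO:0000840', 'SEPIO:0000098'),  # is_equivalent_to
    ('GENO:0000841', 'GENO:0000841', 'SEPIO:0000098'),
    ('GENO:0000843', 'GENO:0000843', 'SEPIO:0000098'),
    ('GENO:0000844', 'GENO:0000844', 'SEPIO:0000098'),
    ('GENO:0000840', 'GENO:0000844', 'SEPIO:0000101'),  # contradicts
    ('GENO:0000841', 'GENO:0000844', 'SEPIO:0000101'),
    ('GENO:0000841', 'GENO:0000843', 'SEPIO:0000101'),
    ('GENO:0000840', 'GENO:0000841', 'SEPIO:0000099'),  # is_consistent_with
    ('GENO:0000843', 'GENO:0000844', 'SEPIO:0000099'),
    ('GENO:0000840', 'GENO:0000843', 'SEPIO:0000100'),  # strongly_contradicts
]

# assumption: relations are symmetric, so order of the calls is ignored
RELATIONS = {frozenset((a, b)): rel for a, b, rel in _PAIRS}


def scv_relation(call_a: str, call_b: str) -> Optional[str]:
    """Look up the SEPIO relation linking two significance calls."""
    return RELATIONS.get(frozenset((call_a, call_b)))
```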

dipper.sources.ClinVar.write_review_status_scores()

Make triples that attach a “star” score to each of ClinVar’s review statuses. (Stars are basically a 0-4 rating of the review status.)

Per https://www.ncbi.nlm.nih.gov/clinvar/docs/details/, Table 1 (“The review status and assignment of stars”, with changes made mid-2015), the number of gold stars and the corresponding review statuses are:

NO STARS: <ReviewStatus> “no assertion criteria provided” or “no assertion provided”. No submitter provided an interpretation with assertion criteria (no assertion criteria provided), or no interpretation was provided (no assertion provided).

ONE STAR: <ReviewStatus> “criteria provided, single submitter” or “criteria provided, conflicting interpretations”. One submitter provided an interpretation with assertion criteria (criteria provided, single submitter), or multiple submitters provided assertion criteria but there are conflicting interpretations, in which case the independent values are enumerated for clinical significance (criteria provided, conflicting interpretations).

TWO STARS: <ReviewStatus> “criteria provided, multiple submitters, no conflicts”. Two or more submitters providing assertion criteria provided the same interpretation.

THREE STARS: <ReviewStatus> “reviewed by expert panel”. Reviewed by an expert panel.

FOUR STARS: <ReviewStatus> “practice guideline”. Practice guideline.

A group wishing to be recognized as an expert panel must first apply to ClinGen by completing the form that can be downloaded from the ClinVar FTP site.

Parameters: None
Returns:

list of triples that attach a “star” score to each of ClinVar’s review statuses
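The star assignments above can be captured as a simple mapping from review-status string to a 0-4 score. This sketch only builds the lookup; the IRIs and predicate used for the actual triples are not shown here. The case/whitespace normalisation in `stars_for` is an assumption added for robustness.

```python
from typing import Dict

# star ratings per ClinVar review status, as listed above
REVIEW_STATUS_STARS: Dict[str, int] = {
    'no assertion criteria provided': 0,
    'no assertion provided': 0,
    'criteria provided, single submitter': 1,
    'criteria provided, conflicting interpretations': 1,
    'criteria provided, multiple submitters, no conflicts': 2,
    'reviewed by expert panel': 3,
    'practice guideline': 4,
}


def stars_for(review_status: str) -> int:
    """Return the 0-4 star rating for a ClinVar review status string."""
    # assumption: normalise case and surrounding spaces before the lookup
    return REVIEW_STATUS_STARS[review_status.strip().lower()]
```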

dipper.sources.ClinVar.write_spo(sub, prd, obj, triples, subject_category=None, object_category=None)

Write triples to a buffer in case we decide to drop them.

dipper.sources.Coriell module
class dipper.sources.Coriell.Coriell(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

The Coriell Catalog provided to Monarch includes metadata and descriptions of NIGMS, NINDS, NHGRI, and NIA cell lines. These lines are made available for research purposes. Here, we create annotations for the cell lines as models of the diseases from which they originate.

We create a handle for a patient from which the given cell line is derived (since there may be multiple cell lines created from a given patient). A genotype is assembled for a patient, which includes a karyotype (if specified) and/or a collection of variants. Both the genotype (has_genotype) and the disease (has_phenotype) are linked to the patient, and the cell line is listed as derived from the patient. The cell line is classified by its [CLO cell type](http://www.ontobee.org/browser/index.php?o=clo), which itself is linked to a tissue of origin.

Unfortunately, the OMIM numbers listed in this file are for both genes and diseases; we have no way of knowing a priori whether a designated OMIM number is a gene or a disease, so we presently link the patient to any OMIM id via the has_phenotype relationship.

Notice: The Coriell catalog is delivered to Monarch in a specific format, and requires SSH RSA fingerprint identification. Other groups wishing to get this data in its raw form will need to contact Coriell for credentials, which need to be placed into your configuration file for this to work.

column_labels = ['catalog_id', 'description', 'omim_num', 'sample_type', 'cell_line_available', 'dna_instock', 'dna_ref', 'gender', 'age', 'race', 'ethnicity', 'affected', 'karyotype', 'relprob', 'mutation', 'gene', 'fam', 'collection', 'url', 'cat_remark', 'pubmed_ids', 'fammember', 'variant_id', 'dbsnp_id', 'species']
fetch(is_dl_forced=False)

Here we connect to the Coriell SFTP server using private connection details. Coriell dumps bi-weekly files with a timestamp in the filename. For each catalog, we ping the remote site and pull the most recently updated file, renaming it to our local latest.csv.

Be sure to have the connection details (user, password, host, and private key) in your conf.yaml file, like:

dbauth: {"coriell": {"user": "<username>", "password": "<password>", "host": "<host>", "private_key": "path/to/rsa_key"}}

Parameters: is_dl_forced
Returns:
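Picking "the most recently updated file" per catalog reduces to a max over modification times. In this sketch the directory listing is given as (filename, epoch-seconds mtime) pairs, for example as collected from an SFTP listing, and files are matched to a catalog by filename prefix; both the listing shape and the prefix convention are assumptions, and `newest_catalog_file` is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def newest_catalog_file(listing: Iterable[Tuple[str, int]], catalog: str) -> Optional[str]:
    """Pick the most recently modified file whose name starts with the catalog id."""
    # assumption: remote files are named with the catalog id as a prefix
    candidates = [(mtime, name) for name, mtime in listing if name.startswith(catalog)]
    if not candidates:
        return None
    # max() on (mtime, name) picks the newest file
    return max(candidates)[1]
```

The chosen file would then be downloaded and saved locally under the catalog's file name from files (for example NIGMS.csv).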
files = {'NHGRI': {'columns': ['catalog_id', 'description', 'omim_num', 'sample_type', 'cell_line_available', 'dna_instock', 'dna_ref', 'gender', 'age', 'race', 'ethnicity', 'affected', 'karyotype', 'relprob', 'mutation', 'gene', 'fam', 'collection', 'url', 'cat_remark', 'pubmed_ids', 'fammember', 'variant_id', 'dbsnp_id', 'species'], 'file': 'NHGRI.csv', 'id': 'NHGRI', 'label': 'NHGRI Sample Repository for Human Genetic Research', 'page': 'https://catalog.coriell.org/1/NHGRI'}, 'NIA': {'columns': ['catalog_id', 'description', 'omim_num', 'sample_type', 'cell_line_available', 'dna_instock', 'dna_ref', 'gender', 'age', 'race', 'ethnicity', 'affected', 'karyotype', 'relprob', 'mutation', 'gene', 'fam', 'collection', 'url', 'cat_remark', 'pubmed_ids', 'fammember', 'variant_id', 'dbsnp_id', 'species'], 'file': 'NIA.csv', 'id': 'NIA', 'label': 'NIA Aging Cell Repository', 'page': 'https://catalog.coriell.org/1/NIA'}, 'NIGMS': {'columns': ['catalog_id', 'description', 'omim_num', 'sample_type', 'cell_line_available', 'dna_instock', 'dna_ref', 'gender', 'age', 'race', 'ethnicity', 'affected', 'karyotype', 'relprob', 'mutation', 'gene', 'fam', 'collection', 'url', 'cat_remark', 'pubmed_ids', 'fammember', 'variant_id', 'dbsnp_id', 'species'], 'file': 'NIGMS.csv', 'id': 'NIGMS', 'label': 'NIGMS Human Genetic Cell Repository', 'page': 'https://catalog.coriell.org/1/NIGMS'}, 'NINDS': {'columns': ['catalog_id', 'description', 'omim_num', 'sample_type', 'cell_line_available', 'dna_instock', 'dna_ref', 'gender', 'age', 'race', 'ethnicity', 'affected', 'karyotype', 'relprob', 'mutation', 'gene', 'fam', 'collection', 'url', 'cat_remark', 'pubmed_ids', 'fammember', 'variant_id', 'dbsnp_id', 'species'], 'file': 'NINDS.csv', 'id': 'NINDS', 'label': 'NINDS Human Genetics DNA and Cell line Repository', 'page': 'https://catalog.coriell.org/1/NINDS'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses.

Returns: None

test_lines = ['ND02380', 'ND02381', 'ND02383', 'ND02384', 'GM17897', 'GM17898', 'GM17896', 'GM17944', 'GM17945', 'ND00055', 'ND00094', 'ND00136', 'GM17940', 'GM17939', 'GM20567', 'AG02506', 'AG04407', 'AG07602AG07601', 'GM19700', 'GM19701', 'GM19702', 'GM00324', 'GM00325', 'GM00142', 'NA17944', 'AG02505', 'GM01602', 'GM02455', 'AG00364', 'GM13707', 'AG00780']
dipper.sources.Decipher module
dipper.sources.EBIGene2Phen module
class dipper.sources.EBIGene2Phen.EBIGene2Phen(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

From EBI: The gene2phenotype dataset (G2P) integrates data on genes, variants and phenotypes, for example relating to developmental disorders. It is constructed entirely from published literature, and is primarily an inclusion list to allow targeted filtering of genome-wide data for diagnostic purposes. The dataset was compiled with respect to published genes, and annotated with types of disease-causing gene variants. Each row of the dataset associates a gene with a disease phenotype via an evidence level, inheritance mechanism and mutation consequence. Some genes therefore appear in the database more than once, where different genetic mechanisms result in different phenotypes.

Disclaimer: https://www.ebi.ac.uk/gene2phenotype/disclaimer
Terms of Use: https://www.ebi.ac.uk/about/terms-of-use#general
Documentation: https://www.ebi.ac.uk/gene2phenotype/documentation

This script operates on the Developmental Disorders file (DDG2P.csv). In the future we may update it to include the cancer gene-disease pairs in the CancerG2P.csv file.

EBI_BASE = 'https://www.ebi.ac.uk/gene2phenotype/downloads/'
fetch(is_dl_forced: bool = False)

Fetch DDG2P.csv.gz and check its headers to see whether it has been updated.

Parameters:is_dl_forced – {bool}
Returns:None
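A minimal sketch of the header check described above, assuming the first row of DDG2P.csv.gz uses exactly the names listed in the files attribute below. The function name header_matches is hypothetical, and Dipper's real comparison may normalize or map column names differently.

```python
import csv
import gzip

# Expected header, copied from files['developmental_disorders']['columns']
EXPECTED_COLUMNS = [
    'gene_symbol', 'gene_omim_id', 'disease_label', 'disease_omim_id',
    'g2p_relation_label', 'allelic_requirement', 'mutation_consequence',
    'phenotypes', 'organ_specificity_list', 'pmids', 'panel',
    'prev_symbols', 'hgnc_id', 'entry_date']


def header_matches(path_or_fileobj, expected=EXPECTED_COLUMNS):
    """Return True if the first row of a gzipped CSV equals `expected`."""
    # gzip.open accepts either a file name or an open binary file object
    with gzip.open(path_or_fileobj, 'rt', newline='') as handle:
        header = next(csv.reader(handle), [])
    # Strip stray whitespace around names before comparing
    return [col.strip() for col in header] == list(expected)
```

A mismatch signals that upstream changed the file layout, which is a cue to revisit the column list before parsing.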
files = {'developmental_disorders': {'columns': ['gene_symbol', 'gene_omim_id', 'disease_label', 'disease_omim_id', 'g2p_relation_label', 'allelic_requirement', 'mutation_consequence', 'phenotypes', 'organ_specificity_list', 'pmids', 'panel', 'prev_symbols', 'hgnc_id', 'entry_date'], 'file': 'DDG2P.csv.gz', 'url': 'https://www.ebi.ac.uk/gene2phenotype/downloads/DDG2P.csv.gz'}}
map_files = {'mondo_map': 'https://data.monarchinitiative.org/dipper/cache/unmapped_ebi_diseases.tsv'}
parse(limit: Optional[int] = None)

Here we parse each row of the gene-to-phenotype file.

We create anonymous variants along with their attributes (allelic requirement, functional consequence) and connect these to genes and diseases:

  • genes are connected to variants via global_terms['has_affected_locus']
  • variants are connected to attributes via global_terms['has_allelic_requirement'] and global_terms['has_functional_consequence']
  • variants are connected to diseases based on mappings of the DDD category column; see the translation table specific to this source for these mappings

For rows with no disease OMIM ID, we use a manually curated disease cache file with mappings to MONDO.

Parameters:limit – {int} number of rows to parse
Returns:None
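The modeling described above can be sketched as follows. This is an illustration, not Dipper's implementation: the predicate strings are placeholders for the global_terms entries, DDD_CATEGORY_RELATION is a hypothetical stand-in for the source's translation table, mondo_map is a hypothetical stand-in for the curated MONDO cache, and the md5-based blank-node label is an assumed way of keeping anonymous variants stable across runs.

```python
import hashlib

# Placeholder predicate names standing in for Dipper's global_terms entries
HAS_AFFECTED_LOCUS = 'has_affected_locus'
HAS_ALLELIC_REQUIREMENT = 'has_allelic_requirement'
HAS_FUNCTIONAL_CONSEQUENCE = 'has_functional_consequence'

# Hypothetical translation table: DDD category -> variant-to-disease relation
DDD_CATEGORY_RELATION = {
    'confirmed': 'is_causal_for',
    'probable': 'contributes_to',
}


def row_to_triples(row, mondo_map=None):
    """Model one DDG2P row as triples around an anonymous (blank node) variant.

    mondo_map is a hypothetical {disease_label: MONDO CURIE} dict used when
    disease_omim_id is empty.
    """
    gene = 'OMIM:' + row['gene_omim_id']
    if row['disease_omim_id']:
        disease = 'OMIM:' + row['disease_omim_id']
    else:
        disease = (mondo_map or {}).get(row['disease_label'])
    # Stable blank-node label derived from the fields that define the variant
    key = '|'.join((row['gene_omim_id'], row['disease_label'],
                    row['allelic_requirement'], row['mutation_consequence']))
    variant = '_:' + hashlib.md5(key.encode('utf-8')).hexdigest()
    triples = [
        (variant, HAS_AFFECTED_LOCUS, gene),
        (variant, HAS_ALLELIC_REQUIREMENT, row['allelic_requirement']),
        (variant, HAS_FUNCTIONAL_CONSEQUENCE, row['mutation_consequence']),
    ]
    relation = DDD_CATEGORY_RELATION.get(row['g2p_relation_label'].lower())
    # Unmapped categories or unresolvable diseases get no variant-disease link
    if relation is not None and disease is not None:
        triples.append((variant, relation, disease))
    return triples
```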
dipper.sources.EOM module
class dipper.sources.EOM.EOM(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.PostgreSQLSource.PostgreSQLSource

Elements of Morphology is a resource from NHGRI that has definitions of morphological abnormalities, together with image depictions. We pull those relationships, as well as our local mapping of equivalences between EOM and HP terminologies.

The website is crawled monthly by NIF’s DISCO crawler system, which we utilize here. Be sure to have PostgreSQL user/password connection details in your conf.yaml file, like: dbauth : {'disco' : {'user' : '<username>', 'password' : '<password>'}}

Monarch-curated data for the HP to EOM mapping is stored at https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/src/mappings/hp-to-eom-mapping.tsv

Since this resource is so small, the entirety of it is the “test” set.

GHRAW = 'https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology'
fetch(is_dl_forced=False)

Connect to DISCO using the connection details from conf.yaml.

files = {'map': {'columns': ['morphology_term_id', 'morphology_term_label', 'HP ID', 'HP Label', 'Notes'], 'file': 'hp-to-eom-mapping.tsv', 'url': 'https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/src/mappings/hp-to-eom-mapping.tsv'}}
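A minimal sketch of reading the Monarch-curated HP-to-EOM mapping into equivalence pairs, using the column names from files['map'] above. It assumes the file is tab-separated and may start with a header row; the function name read_eom_hp_pairs is hypothetical.

```python
import csv

MAP_COLUMNS = ['morphology_term_id', 'morphology_term_label',
               'HP ID', 'HP Label', 'Notes']


def read_eom_hp_pairs(handle):
    """Yield (EOM id, HP id) pairs from a tab-separated mapping stream."""
    for row in csv.reader(handle, delimiter='\t'):
        if not row or row[0] == MAP_COLUMNS[0]:
            continue  # skip blank lines and the header row, if present
        record = dict(zip(MAP_COLUMNS, row))
        eom_id = record.get('morphology_term_id', '').strip()
        hp_id = record.get('HP ID', '').strip()
        if eom_id and hp_id:  # rows without an HP mapping are dropped
            yield eom_id, hp_id
```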
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

parse(limit=None)

Override of Source.parse, inherited via PostgreSQLSource.

resources = {'tables': {'columns': ['morphology_term_id', 'morphology_term_num', 'morphology_term_label', 'morphology_term_url', 'terminology_category_label', 'terminology_category_url', 'subcategory', 'objective_definition', 'subjective_definition', 'comments', 'synonyms', 'replaces', 'small_figure_url', 'large_figure_url', 'e_uid', 'v_uid', 'v_uuid', 'v_lastmodified', 'v_status', 'v_lastmodified_epoch'], 'file': 'dvp.pr_nlx_157874_1', 'url': 'nif-db.crbs.ucsd.edu:5432'}}
tables = ['dvp.pr_nlx_157874_1']
dipper.sources.Ensembl module
class dipper.sources.Ensembl.Ensembl(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None, gene_ids=None)

Bases: dipper.sources.Source.Source

This is the processing module for Ensembl.

It only includes methods to acquire the equivalences between NCBIGene and ENSG IDs using Ensembl's BioMart services.

columns = {'bmq_attributes': ['ensembl_gene_id', 'external_gene_name', 'description', 'gene_biotype', 'entrezgene_id', 'ensembl_peptide_id', 'uniprotswissprot', 'hgnc_id'], 'bmq_headers': ['Gene stable ID', 'Gene name', 'Gene description', 'Gene type', 'NCBI gene (formerly Entrezgene) ID', 'Protein stable ID', 'UniProtKB/Swiss-Prot ID', 'HGNC ID']}
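The bmq_attributes above are BioMart attribute names. A minimal sketch of turning them into a BioMart XML query, assuming the standard BioMart query layout (Query, Dataset and Attribute elements); the function name and the example dataset name hsapiens_gene_ensembl (human) are assumptions, not Dipper's code.

```python
import xml.etree.ElementTree as ET

BMQ_ATTRIBUTES = ['ensembl_gene_id', 'external_gene_name', 'description',
                  'gene_biotype', 'entrezgene_id', 'ensembl_peptide_id',
                  'uniprotswissprot', 'hgnc_id']


def build_biomart_query(dataset, attributes=BMQ_ATTRIBUTES):
    """Return a BioMart XML query string requesting `attributes` from `dataset`."""
    query = ET.Element('Query', {
        'virtualSchemaName': 'default', 'formatter': 'TSV',
        'header': '1',  # ask for a header row (cf. bmq_headers)
        'uniqueRows': '1', 'datasetConfigVersion': '0.6'})
    ds = ET.SubElement(query, 'Dataset', {'name': dataset, 'interface': 'default'})
    for name in attributes:
        ET.SubElement(ds, 'Attribute', {'name': name})
    return ('<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
            + ET.tostring(query, encoding='unicode'))
```

The resulting string would typically be URL-encoded and sent as the query parameter of a BioMart martservice endpoint; check the endpoint and per-species dataset names against Ensembl's BioMart documentation.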
fetch(is_dl_forced=True)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns:None

fetch_protein_gene_map(taxon_id)

Fetch a mapping from proteins to Ensembl gene(s) for a species in BioMart.

Parameters:taxon_id
Returns:dict

fetch_uniprot_gene_map(taxon_id)

Fetch a dict mapping UniProt IDs to genes for a species in BioMart.

Parameters:taxon_id
Returns:dict
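A minimal sketch of building such a UniProt-to-gene dict from a BioMart TSV response, assuming the response carries the bmq_headers names as its header row. The function name is hypothetical, and when one protein maps to several genes this sketch simply keeps the last one.

```python
import csv


def uniprot_gene_map(handle):
    """Map UniProtKB/Swiss-Prot IDs to Ensembl gene IDs from a headed BioMart TSV."""
    mapping = {}
    for row in csv.DictReader(handle, delimiter='\t'):
        uniprot = (row.get('UniProtKB/Swiss-Prot ID') or '').strip()
        gene = (row.get('Gene stable ID') or '').strip()
        if uniprot and gene:
            mapping[uniprot] = gene  # last one wins on duplicates
    return mapping
```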

files = {'10090': {'file': 'ensembl_10090.txt'}, '10116': {'file': 'ensembl_10116.txt'}, '13616': {'file': 'ensembl_13616.txt'}, '28377': {'file': 'ensembl_28377.txt'}, '31033': {'file': 'ensembl_31033.txt'}, '3702': {'file': 'ensembl_3702.txt'}, '44689': {'file': 'ensembl_44689.txt'}, '4896': {'file': 'ensembl_4896.txt'}, '4932': {'file': 'ensembl_4932.txt'}, '6239': {'file': 'ensembl_6239.txt'}, '7227': {'file': 'ensembl_7227.txt'}, '7955': {'file': 'ensembl_7955.txt'}, '8364': {'file': 'ensembl_8364.txt'}, '9031': {'file': 'ensembl_9031.txt'}, '9258': {'file': 'ensembl_9258.txt'}, '9544': {'file': 'ensembl_9544.txt'}, '9606': {'file': 'ensembl_9606.txt'}, '9615': {'file': 'ensembl_9615.txt'}, '9796': {'file': 'ensembl_9796.txt'}, '9823': {'file': 'ensembl_9823.txt'}, '9913': {'file': 'ensembl_9913.txt'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

parse(limit=None)

Abstract method to parse all data, previously fetched by fetch(), from an external resource. This should be overridden by subclasses.

Returns:None

dipper.sources.FlyBase module
class dipper.sources.FlyBase.FlyBase(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.PostgreSQLSource.PostgreSQLSource

This is the [Drosophila Genetics](http://www.flybase.org/) resource, from which we process genotype and phenotype data about the fruit fly.

Here, we connect to their public database and download preprocessed files.

Queries from the relational database:
  1. allele-phenotype data: ../../resources/sql/fb/allele_phenotype.sql
  2. gene dbxrefs: ../../resources/sql/fb/gene_xref.sql

Downloads:
  1. allele_human_disease_model_data_fb_*.tsv.gz - models of disease
  2. species.ab.gz - species prefix mappings
  3. fbal_to_fbgn_fb*.tsv.gz - allele to gene
  4. fbrf_pmid_pmcid_doi_fb_*.tsv.gz - FlyBase reference to PMID

We connect using the [Direct Chado Access](http://gmod.org/wiki/Public_Chado_Databases#Direct_Chado_Access)

When running the whole set, it performs best by dumping raw triples using the flag `--format nt`.

Note that this script underwent a major revision after commit bd5f555, in which genotypes, stocks, and environments were removed.

CURREL = 'releases/current/precomputed_files'
FLYFTP = 'ftp.flybase.net'
fetch(is_dl_forced=False)

Fetch flat files and SQL queries.

Parameters:is_dl_forced – force download
Returns:None
files = {'allele_gene': {'columns': ['AlleleID', 'AlleleSymbol', 'GeneID', 'GeneSymbol'], 'file': 'fbal_to_fbgn_fb.tsv.gz', 'url': 'releases/current/precomputed_files/alleles/fbal_to_fbgn.*tsv\\.gz$'}, 'disease_model': {'columns': ['FBgn ID', 'Gene symbol', 'HGNC ID', 'DO qualifier', 'DO ID', 'DO term', 'Allele used in model (FBal ID)', 'Allele used in model (symbol)', 'Based on orthology with (HGNC ID)', 'Based on orthology with (symbol)', 'Evidence/interacting alleles', 'Reference (FBrf ID)'], 'file': 'disease_model_annotations.tsv.gz', 'url': 'releases/current/precomputed_files/human_disease/disease_model_annotations.+tsv\\.gz$'}, 'ref_pubmed': {'columns': ['FBrf', 'PMID', 'PMCID', 'DOI', 'pub_type', 'miniref', 'pmid_added'], 'file': 'fbrf_pmid_pmcid_doi_fb.tsv.gz', 'url': 'releases/current/precomputed_files/references/fbrf_pmid_pmcid_doi.+tsv\\.gz$'}, 'species_map': {'columns': ['internal_id', 'taxgroup', 'abbreviation', 'genus', 'species name', 'common name', 'comment', 'ncbi-taxon-id'], 'file': 'species.ab.gz', 'url': 'releases/current/precomputed_files/species/species\\.ab\\.gz$'}}
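The url values above look like regular expressions (for example 'fbal_to_fbgn.*tsv\\.gz$') rather than fixed paths, presumably because FlyBase release file names embed a release number. A minimal sketch of selecting a file from an FTP directory listing with such a pattern; the listing input and the "newest is lexicographically last" rule are assumptions.

```python
import re


def pick_release_file(listing, pattern):
    """Return the path in `listing` that matches a files[...]['url'] regex.

    `listing` holds paths relative to the FTP root (assumed input shape).
    """
    regex = re.compile(pattern)
    matches = sorted(path for path in listing if regex.search(path))
    if not matches:
        raise FileNotFoundError('no file matches ' + pattern)
    return matches[-1]  # assume the lexicographically last name is newest
```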
parse(limit=None)

Parse FlyBase files and add them to the graph.

Parameters:limit – number of rows to process
Returns:None
queries = {'allele_phenotype': {'columns': ['allele_id', 'pheno_desc', 'pheno_type', 'pub_id', 'pub_title', 'pmid_id'], 'file': 'allele_phenotype.tsv', 'query': '../../resources/sql/fb/allele_phenotype.sql'}, 'gene_xref': {'columns': ['gene_id', 'xref_id', 'xref_source'], 'file': 'gene_xref.tsv', 'query': '../../resources/sql/fb/gene_xref.sql'}}
dipper.sources.GWASCatalog module
class dipper.sources.GWASCatalog.GWASCatalog(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

The NHGRI-EBI Catalog of published genome-wide association studies.

We link the variants recorded here to the curated EFO classes using a “contributes to” linkage, because we only know that the SNPs are associated with the trait/disease, not whether they are actually causative.
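A minimal sketch of that linkage for one catalog row, using the SNPS and MAPPED_TRAIT_URI columns listed in files['catalog']. It assumes SNPS can hold several rsIDs separated by ';' or ' x ' and that MAPPED_TRAIT_URI holds comma-separated IRIs; the function name and the plain-string predicate are illustrative, not Dipper's code.

```python
import re


def snp_trait_pairs(row):
    """Return (snp, 'contributes to', trait IRI) tuples for one catalog row."""
    snps = [s.strip() for s in re.split(r';| x ', row.get('SNPS', '')) if s.strip()]
    traits = [t.strip() for t in row.get('MAPPED_TRAIT_URI', '').split(',')
              if t.strip()]
    # Every reported SNP is linked to every mapped trait of the row
    return [(snp, 'contributes to', trait) for snp in snps for trait in traits]
```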

Description of the GWAS catalog is here: http://www.ebi.ac.uk/gwas/docs/fileheaders#_file_headers_for_catalog_version_1_0_1

GWAS also publishes OWL files, described here: http://www.ebi.ac.uk/gwas/docs/ontology

Status: IN PROGRESS

GWASFILE = 'gwas-catalog-associations_ontology-annotated.tsv'
GWASFTP = 'ftp://ftp.ebi.ac.uk/pub/databases/gwas/releases/latest/'
fetch(is_dl_forced=False)
Parameters:is_dl_forced
Returns:
files = {'catalog': {'columns': ['DATE ADDED TO CATALOG', 'PUBMEDID', 'FIRST AUTHOR', 'DATE', 'JOURNAL', 'LINK', 'STUDY', 'DISEASE/TRAIT', 'INITIAL SAMPLE SIZE', 'REPLICATION SAMPLE SIZE', 'REGION', 'CHR_ID', 'CHR_POS', 'REPORTED GENE(S)', 'MAPPED_GENE', 'UPSTREAM_GENE_ID', 'DOWNSTREAM_GENE_ID', 'SNP_GENE_IDS', 'UPSTREAM_GENE_DISTANCE', 'DOWNSTREAM_GENE_DISTANCE', 'STRONGEST SNP-RISK ALLELE', 'SNPS', 'MERGED', 'SNP_ID_CURRENT', 'CONTEXT', 'INTERGENIC', 'RISK ALLELE FREQUENCY', 'P-VALUE', 'PVALUE_MLOG', 'P-VALUE (TEXT)', 'OR or BETA', '95% CI (TEXT)', 'PLATFORM [SNPS PASSING QC]', 'CNV', 'MAPPED_TRAIT', 'MAPPED_TRAIT_URI', 'STUDY ACCESSION', 'GENOTYPING TECHNOLOGY'], 'file': 'gwas-catalog-associations_ontology-annotated.tsv', 'url': 'ftp://ftp.ebi.ac.uk/pub/databases/gwas/releases/latest/gwas-catalog-associations_ontology-annotated.tsv'}, 'mondo': {'file': 'mondo.json', 'url': 'https://github.com/monarch-initiative/mondo/releases/download/2019-04-06/mondo-minimal.json'}, 'so': {'file': 'so.owl', 'url': 'http://purl.obolibrary.org/obo/so.owl'}}
parse(limit=None)

Abstract method to parse all data, previously fetched by fetch(), from an external resource. This should be overridden by subclasses.

Returns:None

process_catalog(limit=None)
Parameters:limit
Returns:
dipper.sources.GeneOntology module
class dipper.sources.GeneOntology.GeneOntology(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None)

Bases: dipper.sources.Source.Source

This is the parser for the [Gene Ontology Annotations](http://www.geneontology.org), from which we process gene-process/function/subcellular location associations.

We generate the GO graph to include the following information:
  • genes
  • gene-process
  • gene-function
  • gene-location

We process only a subset of the organisms: those whose NCBI taxon IDs appear as keys in the files attribute.

Status: IN PROGRESS / INCOMPLETE

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns:None

files = {'10090': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'mgi.gaf.gz', 'url': 'http://current.geneontology.org/annotations/mgi.gaf.gz'}, '10116': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'rgd.gaf.gz', 'url': 'http://current.geneontology.org/annotations/rgd.gaf.gz'}, '4896': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'pombase.gaf.gz', 'url': 'http://current.geneontology.org/annotations/pombase.gaf.gz'}, '5052': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'aspgd.gaf.gz', 'url': 'http://current.geneontology.org/annotations/aspgd.gaf.gz'}, '559292': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'sgd.gaf.gz', 'url': 'http://current.geneontology.org/annotations/sgd.gaf.gz'}, 
'5782': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'dictybase.gaf.gz', 'url': 'http://current.geneontology.org/annotations/dictybase.gaf.gz'}, '6239': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'wb.gaf.gz', 'url': 'http://current.geneontology.org/annotations/wb.gaf.gz'}, '7227': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'fb.gaf.gz', 'url': 'http://current.geneontology.org/annotations/fb.gaf.gz'}, '7955': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'zfin.gaf.gz', 'url': 'http://current.geneontology.org/annotations/zfin.gaf.gz'}, '9031': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_chicken.gaf.gz', 'url': 
'http://current.geneontology.org/annotations/goa_chicken.gaf.gz'}, '9606': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_human.gaf.gz', 'url': 'http://current.geneontology.org/annotations/goa_human.gaf.gz'}, '9615': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_dog.gaf.gz', 'url': 'http://current.geneontology.org/annotations/goa_dog.gaf.gz'}, '9823': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_pig.gaf.gz', 'url': 'http://current.geneontology.org/annotations/goa_pig.gaf.gz'}, '9913': {'columnns': ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID'], 'file': 'goa_cow.gaf.gz', 'url': 'http://current.geneontology.org/annotations/goa_cow.gaf.gz'}, 'gaf-eco-mapping': {'file': 'gaf-eco-mapping.yaml', 'url': 'https://archive.monarchinitiative.org/DipperCache/go/gaf-eco-mapping.yaml'}, 'idmapping_selected': {'columns': ['UniProtKB-AC', 'UniProtKB-ID', 'GeneID (EntrezGene)', 'RefSeq', 'GI', 'PDB', 'GO', 'UniRef100', 'UniRef90', 'UniRef50', 'UniParc', 'PIR', 
'NCBI-taxon', 'MIM', 'UniGene', 'PubMed', 'EMBL', 'EMBL-CDS', 'Ensembl', 'Ensembl_TRS', 'Ensembl_PRO', 'Additional PubMed'], 'file': 'idmapping_selected.tab.gz', 'url': 'ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz'}}
gaf_columns = ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID', 'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect', 'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type', 'Taxon and Interacting taxon', 'Date', 'Assigned_By', 'Annotation_Extension', 'Gene_Product_Form_ID']
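The gaf_columns list above names the 17 tab-separated columns of a GAF 2.x file, whose header/comment lines start with '!' and whose last two columns are optional. A minimal sketch of turning one line into a dict keyed by those names; the function name and the derived 'negated' flag (a 'NOT' in the pipe-separated Qualifier) are illustrative, not process_gaf itself.

```python
GAF_COLUMNS = ['DB', 'DB_Object_ID', 'DB_Object_Symbol', 'Qualifier', 'GO_ID',
               'DB:Reference', 'Evidence Code', 'With (or) From', 'Aspect',
               'DB_Object_Name', 'DB_Object_Synonym', 'DB_Object_Type',
               'Taxon and Interacting taxon', 'Date', 'Assigned_By',
               'Annotation_Extension', 'Gene_Product_Form_ID']


def parse_gaf_line(line):
    """Return a dict for one GAF data line, or None for comment/blank lines."""
    if not line.strip() or line.startswith('!'):
        return None  # GAF header and comment lines start with '!'
    fields = line.rstrip('\n').split('\t')
    if len(fields) < 15:
        raise ValueError('expected at least 15 columns, got %d' % len(fields))
    # Columns 16 and 17 are optional; pad so every key is present
    fields += [''] * (len(GAF_COLUMNS) - len(fields))
    record = dict(zip(GAF_COLUMNS, fields))
    record['negated'] = 'NOT' in record['Qualifier'].split('|')
    return record
```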
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

get_uniprot_entrez_id_map()
parse(limit=None)

Abstract method to parse all data, previously fetched by fetch(), from an external resource. This should be overridden by subclasses.

Returns:None

process_gaf(gaffile, limit, id_map=None)
wont_prefix = ['zgc', 'wu', 'si', 'im', 'BcDNA', 'sb', 'anon-EST', 'EG', 'id', 'zmp', 'BEST', 'BG', 'hm', 'tRNA', 'NEST', 'xx']
dipper.sources.GeneReviews module
dipper.sources.HGNC module
dipper.sources.HPOAnnotations module
class dipper.sources.HPOAnnotations.HPOAnnotations(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

The [Human Phenotype Ontology](http://human-phenotype-ontology.org) group curates and assembles over 115,000 annotations to hereditary diseases using the HPO ontology. Here we create OBAN-style associations between diseases and phenotypic features, together with their evidence, and age of onset and frequency (if known). The parser currently only processes the “abnormal” annotations. Association to “remarkable normality” will be added in the near future.

We create additional associations from text mining. See info at http://pubmed-browser.human-phenotype-ontology.org/.

Also, you can read about these annotations in [PMID:26119816](http://www.ncbi.nlm.nih.gov/pubmed/26119816).

In order to properly test this class, you should have a resources/test_ids.yaml file configured with some test IDs, in the following structure (the IDs shown are examples; put your favorite IDs in the config): test_ids: {"disease" : ["OMIM:119600", "OMIM:120160"]}

add_common_files_to_file_list()

The several thousand common-disease files from the repository tarball are added to the files object. A possible extension is to add the 'common-disease-mondo' files as well.

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns:None

files = {'doid': {'file': 'doid.owl', 'url': 'http://purl.obolibrary.org/obo/doid.owl'}, 'hpoa': {'columns': ['#DatabaseID', 'DiseaseName', 'Qualifier', 'HPO_ID', 'Reference', 'Evidence', 'Onset', 'Frequency', 'Sex', 'Modifier', 'Aspect', 'Biocuration'], 'file': 'phenotype.hpoa', 'url': 'http://compbio.charite.de/jenkins/job/hpo.annotations.current/lastSuccessfulBuild/artifact/current/phenotype.hpoa'}}
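A minimal sketch of reading phenotype.hpoa with the columns listed in files['hpoa']. It assumes metadata lines and the header row all begin with '#' (the header starts with '#DatabaseID'), so column names come from the list rather than the file. Keeping only Aspect 'P' rows is an assumption about how the "abnormal" annotations are selected, not a description of Dipper's parser.

```python
import csv

HPOA_COLUMNS = ['#DatabaseID', 'DiseaseName', 'Qualifier', 'HPO_ID', 'Reference',
                'Evidence', 'Onset', 'Frequency', 'Sex', 'Modifier', 'Aspect',
                'Biocuration']


def read_hpoa(handle, aspect='P'):
    """Yield dicts for phenotype.hpoa rows whose Aspect equals `aspect`."""
    data_lines = (line for line in handle if not line.startswith('#'))
    # QUOTE_NONE: treat quote characters in disease names as plain text
    for row in csv.reader(data_lines, delimiter='\t', quoting=csv.QUOTE_NONE):
        if not row:
            continue
        record = dict(zip(HPOA_COLUMNS, row))
        if record.get('Aspect') == aspect:
            yield record
```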
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

get_common_files()

Fetch the hpo-annotation-data [repository](https://github.com/monarch-initiative/hpo-annotation-data.git) as a tarball

Returns:
parse(limit=None)

Abstract method to parse all data, previously fetched by fetch(), from an external resource. This should be overridden by subclasses.

Returns:None

process_all_common_disease_files(limit=None)

Loop through all of the files previously fetched from git, creating the disease-phenotype associations.

Parameters:limit
Returns:

process_common_disease_file(raw, unpadded_doids, limit=None)

Make disease-phenotype associations. Some identifiers need cleanup:
  • DOIDs are listed as DOID-DOID:; these are changed to DOID:
  • DOIDs may be unnecessarily zero-padded; these are remapped to their non-padded equivalent.

Parameters:
  • raw
  • unpadded_doids
  • limit
Returns:
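The two DOID cleanups described above can be sketched as a small helper. It assumes unpadded_doids is a set of known non-padded DOID CURIEs, so a zero-padded ID is only remapped when its unpadded form is known; normalize_doid is a hypothetical name, not the method's internals.

```python
import re


def normalize_doid(curie, unpadded_doids):
    """Clean a DOID CURIE from a common-disease file."""
    # 1. Collapse the doubled prefix 'DOID-DOID:' to 'DOID:'
    curie = curie.strip().replace('DOID-DOID:', 'DOID:', 1)
    # 2. Remap a zero-padded ID when its non-padded form is a known DOID
    match = re.fullmatch(r'DOID:(0+)(\d+)', curie)
    if match:
        unpadded = 'DOID:' + match.group(2)
        if unpadded in unpadded_doids:
            return unpadded
    return curie
```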

small_files = {'columns': ['Disease ID', 'Disease Name', 'Gene ID', 'Gene Name', 'Genotype', 'Gene Symbol(s)', 'Phenotype ID', 'Phenotype Name', 'Age of Onset ID', 'Age of Onset Name', 'Evidence ID', 'Evidence Name', 'Frequency', 'Sex ID', 'Sex Name', 'Negation ID', 'Negation Name', 'Description', 'Pub', 'Assigned by', 'Date Created']}
dipper.sources.IMPC module
class dipper.sources.IMPC.IMPC(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

From the [IMPC](https://mousephenotype.org) website: The IMPC is generating a knockout mouse strain for every protein coding gene by using the embryonic stem cell resource generated by the International Knockout Mouse Consortium (IKMC). Systematic broad-based phenotyping is performed by each IMPC center using standardized procedures found within the International Mouse Phenotyping Resource of Standardised Screens (IMPReSS) resource. Gene-to-phenotype associations are made by a versioned statistical analysis with all data freely available by this web portal and by several data download features.

Here, we pull the data and model the genotypes using GENO and the genotype-to-phenotype associations using the OBAN schema.

We use all identifiers given by the IMPC with a few exceptions:

  • For identifiers that IMPC provides but does not resolve, we instantiate them as blank nodes. Examples include things with the pattern of: UROALL, EUROCURATE, NULL-*.
  • We mint three identifiers:
      1. Intrinsic genotypes, not including sex, based on:
          • colony_id (ES cell line + phenotyping center)
          • strain
          • zygosity
      2. Effective genotypes that are attached to the phenotypes, based on:
          • colony_id (ES cell line + phenotyping center)
          • strain
          • zygosity
          • sex
      3. Associations, based on: effective_genotype_id + phenotype_id + phenotyping_center + pipeline_stable_id + procedure_stable_id + parameter_stable_id
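A minimal sketch of minting those three identifiers deterministically from the g2p_assertions columns listed in files. The md5 hash, the '|' separator, the '_:' prefix, and the choice of strain_accession_id for "strain" and mp_term_id for "phenotype_id" are all assumptions for illustration, not Dipper's actual minting scheme.

```python
import hashlib


def mint_id(*parts, prefix='_:'):
    """Return a deterministic identifier built from the given components."""
    key = '|'.join(str(p) for p in parts)
    return prefix + hashlib.md5(key.encode('utf-8')).hexdigest()


def intrinsic_genotype_id(row):
    # Intrinsic genotype: colony + strain + zygosity (sex excluded)
    return mint_id(row['colony_id'], row['strain_accession_id'], row['zygosity'])


def effective_genotype_id(row):
    # Effective genotype: the intrinsic components plus sex
    return mint_id(row['colony_id'], row['strain_accession_id'],
                   row['zygosity'], row['sex'])


def association_id(row):
    # Association: effective genotype + phenotype + where/how it was measured
    return mint_id(effective_genotype_id(row), row['mp_term_id'],
                   row['phenotyping_center'], row['pipeline_stable_id'],
                   row['procedure_stable_id'], row['parameter_stable_id'])
```

Because the IDs are derived only from their defining fields, re-running the ingest yields the same identifiers, and male and female records share an intrinsic genotype while receiving distinct effective genotypes.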

We DO NOT yet add the assays as evidence for the G2P associations here. To be added in the future.

compare_checksums()

Test whether the fetched file matches the checksum from EBI.

Returns: True or False

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns:None

files = {'checksum': {'file': 'genotype-phenotype-assertions-ALL.csv.tgz.md5', 'url': 'ftp://ftp.ebi.ac.uk/pub/databases/impc/all-data-releases/latest/results/genotype-phenotype-assertions-ALL.csv.tgz.md5'}, 'evidence': {'columns': ['evidence', 'stable', 'key'], 'file': 'impc_evidence_stable_key.tsv', 'url': 'https://archive.monarchinitiative.org/DipperCache/impc/impc_evidence_stable_key.tsv'}, 'g2p_assertions': {'columns': ['marker_accession_id', 'marker_symbol', 'phenotyping_center', 'colony_id', 'sex', 'zygosity', 'allele_accession_id', 'allele_symbol', 'allele_name', 'strain_accession_id', 'strain_name', 'project_name', 'project_fullname', 'pipeline_name', 'pipeline_stable_id', 'procedure_stable_id', 'procedure_name', 'parameter_stable_id', 'parameter_name', 'top_level_mp_term_id', 'top_level_mp_term_name', 'mp_term_id', 'mp_term_name', 'p_value', 'percentage_change', 'effect_size', 'statistical_method', 'resource_name'], 'file': 'genotype-phenotype-assertions-ALL.csv.gz', 'url': 'ftp://ftp.ebi.ac.uk/pub/databases/impc/all-data-releases/latest/results/genotype-phenotype-assertions-ALL.csv.tgz'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

parse(limit=None)

IMPC data is delivered in three separate csv files OR in one integrated file, each with the same file format.

Parameters:limit
Returns:
parse_checksum_file(file)

Parameters:file
Returns:dict
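A standalone sketch of what parse_checksum_file and compare_checksums do together, assuming the .md5 file uses md5sum-style lines ('<digest>  <file name>'). The function names here are hypothetical stand-ins, not the class methods themselves.

```python
import hashlib


def parse_md5_lines(lines):
    """Map file name -> md5 hex digest from md5sum-style lines."""
    checksums = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            # md5sum marks binary mode with a leading '*' on the name
            checksums[parts[-1].lstrip('*')] = parts[0].lower()
    return checksums


def file_md5(path, chunk_size=1 << 20):
    """Compute a file's md5 digest without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, name, checksums):
    """True if the file at `path` matches the digest recorded for `name`."""
    return checksums.get(name) == file_md5(path)
```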

dipper.sources.KEGG module
dipper.sources.MGI module
class dipper.sources.MGI.MGI(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.PostgreSQLSource.PostgreSQLSource

This is the [Mouse Genome Informatics](http://www.informatics.jax.org/) resource, from which we process genotype and phenotype data about laboratory mice. Genotypes leverage the GENO genotype model.

Here, we connect to their public database and download a subset of tables/views to get specifically at the geno-pheno data, then iterate over the tables. We end up effectively performing joins when adding nodes to the graph.

In order to use this parser, you will need to have user/password connection details in your conf.yaml file, like: dbauth : {'mgi' : {'user' : '<username>', 'password' : '<password>'}}
You can request access by contacting mgi-help@jax.org.

fetch(is_dl_forced=False)

For the MGI resource, we connect to the remote database and pull the tables into local files. We check the local table versions against the remote version.

Returns:

fetch_transgene_genes_from_db(cxn)

This is a custom query to fetch the non-mouse genes that are part of transgene alleles.

Parameters:cxn
Returns:
parse(limit=None)

We process each of the PostgreSQL tables in turn. The order of processing is important here, as we build up a hashmap of internal vs. external identifiers (unique keys by type to MGI ID). These include allele, marker (gene), publication, strain, genotype, annotation (association), and descriptive notes.

Parameters:limit – only parse this many rows in each table
Returns:
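A minimal sketch of the "build a key map, then join through it" pattern described above, using column names from the tables attribute (all_summary_view, gxd_genotype_view, gxd_allelepair_view). Treating preferred == '1' as the flag for the preferred accession is an assumption, and both function names are hypothetical.

```python
def build_key_map(summary_rows):
    """Map internal MGI object keys to preferred MGI accession IDs.

    Rows are dicts shaped like all_summary_view (_object_key, preferred, mgiid).
    """
    key_map = {}
    for row in summary_rows:
        if row['preferred'] == '1':  # assumed flag for the preferred accession
            key_map[row['_object_key']] = row['mgiid']
    return key_map


def resolve_allele_pairs(pair_rows, allele_map, genotype_map):
    """'Join' gxd_allelepair_view rows to MGI IDs via the prebuilt key maps."""
    resolved = []
    for row in pair_rows:
        genotype = genotype_map.get(row['_genotype_key'])
        allele1 = allele_map.get(row['_allele_key_1'])
        allele2 = allele_map.get(row['_allele_key_2'])  # may be None
        if genotype and allele1:  # skip rows whose keys were never mapped
            resolved.append((genotype, allele1, allele2, row['allelestate']))
    return resolved
```

This is why processing order matters: a table can only be resolved once the maps for the keys it references have been built.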

process_mgi_note_allele_view(limit=None)

These are the descriptive notes about the alleles. Note that these notes contain embedded HTML; whether to do anything about that is an open question.

Parameters:limit
Returns:

process_mgi_relationship_transgene_genes(limit=None)

Here, we have the relationship between MGI transgene alleles, and the non-mouse gene ids that are part of them. We augment the allele with the transgene parts.

Parameters:limit
Returns:
resources = {'query_map': [{'query': '../../resources/sql/mgi/mgi_dbinfo.sql', 'outfile': 'mgi_dbinfo', 'Force': True}, {'query': '../../resources/sql/mgi/gxd_genotype_view.sql', 'outfile': 'gxd_genotype_view'}, {'query': '../../resources/sql/mgi/gxd_genotype_summary_view.sql', 'outfile': 'gxd_genotype_summary_view'}, {'query': '../../resources/sql/mgi/gxd_allelepair_view.sql', 'outfile': 'gxd_allelepair_view'}, {'query': '../../resources/sql/mgi/all_summary_view.sql', 'outfile': 'all_summary_view'}, {'query': '../../resources/sql/mgi/all_allele_view.sql', 'outfile': 'all_allele_view'}, {'query': '../../resources/sql/mgi/all_allele_mutation_view.sql', 'outfile': 'all_allele_mutation_view'}, {'query': '../../resources/sql/mgi/mrk_marker_view.sql', 'outfile': 'mrk_marker_view'}, {'query': '../../resources/sql/mgi/voc_annot_view.sql', 'outfile': 'voc_annot_view'}, {'query': '../../resources/sql/mgi/evidence.sql', 'outfile': 'evidence_view'}, {'query': '../../resources/sql/mgi/bib_acc_view.sql', 'outfile': 'bib_acc_view'}, {'query': '../../resources/sql/mgi/prb_strain_view.sql', 'outfile': 'prb_strain_view'}, {'query': '../../resources/sql/mgi/mrk_summary_view.sql', 'outfile': 'mrk_summary_view'}, {'query': '../../resources/sql/mgi/mrk_acc_view.sql', 'outfile': 'mrk_acc_view'}, {'query': '../../resources/sql/mgi/prb_strain_acc_view.sql', 'outfile': 'prb_strain_acc_view'}, {'query': '../../resources/sql/mgi/prb_strain_genotype_view.sql', 'outfile': 'prb_strain_genotype_view'}, {'query': '../../resources/sql/mgi/mgi_note_vocevidence_view.sql', 'outfile': 'mgi_note_vocevidence_view'}, {'query': '../../resources/sql/mgi/mgi_note_allele_view.sql', 'outfile': 'mgi_note_allele_view'}, {'query': '../../resources/sql/mgi/mrk_location_cache.sql', 'outfile': 'mrk_location_cache'}], 'test_keys': '../../resources/mgi_test_keys.yaml'}
tables = {'all_allele_mutation_view': {'columns': ['_allele_key', 'mutation']}, 'all_allele_view': {'columns': ['_allele_key', '_marker_key', '_strain_key', 'symbol', 'name', 'iswildtype']}, 'all_summary_view': {'columns': ['_object_key', 'preferred', 'mgiid', 'description', 'short_description']}, 'bib_acc_view': {'columns': ['accid', 'prefixpart', 'numericpart', '_object_key', 'logicaldb', '_logicaldb_key']}, 'evidence_view': {'columns': ['_annotevidence_key', '_annot_key', 'evidencecode', 'jnumid', 'term', 'value', 'annottype']}, 'gxd_allelepair_view': {'columns': ['_allelepair_key', '_genotype_key', '_allele_key_1', '_allele_key_2', 'allele1', 'allele2', 'allelestate']}, 'gxd_genotype_summary_view': {'columns': ['_object_key', 'preferred', 'mgiid', 'subtype', 'short_description']}, 'gxd_genotype_view': {'columns': ['_genotype_key', '_strain_key', 'strain', 'mgiid']}, 'mgi_note_allele_view': {'columns': ['_object_key', 'notetype', 'note', 'sequencenum']}, 'mgi_note_vocevidence_view': {'columns': ['_object_key', 'note']}, 'mgi_relationship_transgene_genes': {'columns': ['rel_key', 'object_1', 'allele_id', 'allele_label', 'category_key', 'category_name', 'property_key', 'property_name', 'property_value']}, 'mrk_acc_view': {'columns': ['accid', 'prefixpart', '_logicaldb_key', '_object_key', 'preferred', '_organism_key']}, 'mrk_location_cache': {'columns': ['_marker_key', '_organism_key', 'chromosome', 'startcoordinate', 'endcoordinate', 'strand', 'version']}, 'mrk_marker_view': {'columns': ['_marker_key', '_organism_key', '_marker_status_key', 'symbol', 'name', 'latinname', 'markertype']}, 'mrk_summary_view': {'columns': ['accid', '_logicaldb_key', '_object_key', 'preferred', 'mgiid', 'subtype', 'short_description']}, 'prb_strain_acc_view': {'columns': ['accid', 'prefixpart', '_logicaldb_key', '_object_key', 'preferred']}, 'prb_strain_genotype_view': {'columns': ['_strain_key', '_genotype_key']}, 'prb_strain_view': {'columns': ['_strain_key', 'strain', 'species']}, 
'voc_annot_view': {'columns': ['_annot_key', 'annottype', '_object_key', '_term_key', '_qualifier_key', 'qualifier', 'term', 'accid']}}
unknown_taxa = ['Not Applicable', 'Not Specified']
dipper.sources.MGISlim module
class dipper.sources.MGISlim.MGISlim(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

Slim MGI model containing only gene-to-phenotype associations. Uses MouseMine (http://www.mousemine.org/mousemine/begin.do) through the InterMine Python client library API (http://intermine.org/intermine-ws-python/).

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses.

Returns:None

parse(limit=None)

Abstract method to parse all data, previously fetched by fetch(), from an external resource. This should be overridden by subclasses.

Returns:None

dipper.sources.MMRRC module
class dipper.sources.MMRRC.MMRRC(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

Here we process the Mutant Mouse Resource and Research Center (https://www.mmrrc.org) strain data, which includes:

  • strains and their mutant alleles
  • phenotypes of the alleles
  • descriptions of the research uses of the strains

Note that the raw data omit some gene identifiers (for many of the transgenics carrying human genes). We do our best to process the links between each variant and its affected gene, but sometimes the mapping is not clear, and we do not include it. Many of these gaps will be resolved by merging this source with the MGI data source, which has the variant-to-gene designations.

Also note that even though the strain pages at the MMRRC site do list phenotypic differences in the context of the strain backgrounds, they do not provide that data to us, and thus we cannot supply that disambiguation here.

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. Subclasses should override it. Returns: None

files = {'catalog': {'columns': ['STRAIN/STOCK_ID', 'STRAIN/STOCK_DESIGNATION', 'STRAIN_TYPE', 'STATE', 'MGI_ALLELE_ACCESSION_ID', 'ALLELE_SYMBOL', 'ALLELE_NAME', 'MUTATION_TYPE', 'CHROMOSOME', 'MGI_GENE_ACCESSION_ID', 'GENE_SYMBOL', 'GENE_NAME', 'SDS_URL', 'ACCEPTED_DATE', 'MPT_IDS', 'PUBMED_IDS', 'RESEARCH_AREAS'], 'file': 'mmrrc_catalog_data.csv', 'url': 'https://www.mmrrc.org/about/mmrrc_catalog_data.csv'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). Subclasses should override it. Returns: None

test_ids = ['MMRRC:037507-MU', 'MMRRC:041175-UCD', 'MMRRC:036933-UNC', 'MMRRC:037884-UCD', 'MMRRC:000255-MU', 'MMRRC:037372-UCD', 'MMRRC:000001-UNC']
dipper.sources.MPD module
class dipper.sources.MPD.MPD(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

From the MPD (http://phenome.jax.org/) website: This resource is a collaborative standardized collection of measured data on laboratory mouse strains and populations. It includes baseline phenotype data sets as well as studies of drug, diet, disease and aging effects. It also includes protocols, projects and publications, and SNP, variation and gene expression studies.

Here, we pull the data and model the genotypes using GENO and the genotype-to-phenotype associations using the OBAN schema.

MPD provides measurements for particular assays across several strains. Each of these measurements is itself mapped to an MP or VT term as a phenotype. Therefore, we can create a strain-to-phenotype association for those strains that lie outside the “normal” range for a given measurement. We compute the mean of the measurements across all strains tested, and then flag any measurement that lies beyond a threshold distance from that mean.

Our default threshold is ±2 standard deviations from the mean.
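As a rough illustration of the ±2 standard deviation rule, the sketch below flags strains whose mean for one measurement lies more than a threshold number of standard deviations from the mean over all strains. The helper name and its input shape (a dict of strain label to mean value, as might be built from strainmeans.csv for one sex) are assumptions for illustration, not Dipper's MPD code; it uses the population standard deviation.

```python
import statistics
from typing import Dict, List


def flag_outlier_strains(strain_means: Dict[str, float],
                         threshold: float = 2.0) -> List[str]:
    """Hypothetical helper: strains more than `threshold` SDs from the mean.

    `strain_means` maps a strain label to its mean value for a single
    measurement (one sex). Uses the population standard deviation.
    """
    values = list(strain_means.values())
    if len(values) < 2:
        return []  # not enough strains to estimate a spread
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all strains identical, so nothing is extreme
    return [strain for strain, value in strain_means.items()
            if abs(value - mu) / sd > threshold]
```

With the population SD, no value can have |z| greater than sqrt(n - 1), so with five or fewer strains nothing is ever flagged at the default threshold of 2.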

Because the measurements are made and recorded at the level of a specific sex of each strain, we associate the MP/VT phenotype with the sex-qualified genotype/strain.

MPDDL = 'http://phenomedoc.jax.org/MPD_downloads'
static build_measurement_description(row, localtt)
fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. Subclasses should override it. Returns: None

files = {'assay_metadata': {'columns': ['measnum', 'mpdsector', 'projsym', 'varname', 'descrip', 'units', 'method', 'intervention', 'paneldesc', 'datatype', 'sextested', 'nstrainstested', 'ageweeks'], 'file': 'measurements.csv', 'url': 'http://phenomedoc.jax.org/MPD_downloads/measurements.csv'}, 'ontology_mappings': {'columns': ['measnum', 'ont_term', 'descrip'], 'file': 'ontology_mappings.csv', 'url': 'http://phenomedoc.jax.org/MPD_downloads/ontology_mappings.csv'}, 'straininfo': {'columns': ['strainname', 'vendor', 'stocknum', 'panel', 'mpd_strainid', 'straintype', 'n_proj', 'n_snp_datasets', 'mpd_shortname', 'url'], 'file': 'straininfo.csv', 'url': 'http://phenomedoc.jax.org/MPD_downloads/straininfo.csv'}, 'strainmeans': {'columns': ['measnum', 'varname', 'strain', 'strainid', 'sex', 'mean', 'nmice', 'sd', 'sem', 'cv', 'minval', 'maxval', 'zscore'], 'file': 'strainmeans.csv.gz', 'url': 'http://phenomedoc.jax.org/MPD_downloads/strainmeans.csv.gz'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

mgd_agent_id = 'MPD:db/q?rtn=people/allinv'
mgd_agent_label = 'Mouse Phenotype Database'
mgd_agent_type = 'foaf:organization'
parse(limit=None)

MPD data is delivered in four separate CSV files and one XML file, which we process iteratively and write out as one large graph.

Parameters:limit
Returns:
test_ids = ['MPD:6', 'MPD:849', 'MPD:425', 'MPD:569', 'MPD:10', 'MPD:1002', 'MPD:39', 'MPD:2319']
dipper.sources.Monarch module
class dipper.sources.Monarch.Monarch(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

This is the parser for data curated by the Monarch Initiative (https://monarchinitiative.org). Data is currently maintained in a private repository, soon to be released.

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. Subclasses should override it. Returns: None

files = {'omia_d2p': {'columns': ['Disease ID', 'Species ID', 'Breed Name', 'Variant', 'Inheritance', 'Phenotype ID', 'Phenotype Name', 'Entity ID', 'Entity Name', 'Quality ID', 'Quality Name', 'Related Entity ID', 'Related Entity Name', 'Abnormal ID', 'Abnormal Name', 'Phenotype Desc', 'Assay', 'Frequency', 'Pubmed ID', 'Pub Desc', 'Curator Notes', 'Date Created'], 'file': '[0-9]{6}.txt'}}
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). Subclasses should override it. Returns: None

process_omia_phenotypes(limit)
dipper.sources.Monochrom module
class dipper.sources.Monochrom.Monochrom(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None)

Bases: dipper.sources.Source.Source

This class will leverage the GENO ontology and modeling patterns to build an ontology of chromosomes for any species. These classes represent major structural pieces of chromosomes that are often universally referenced, using physical properties/observations that remain constant over different genome builds (such as banding patterns and arms). The idea is to create a scaffold upon which we can hang build-specific chromosomal coordinates, and reason across them.

In general, this takes the cytogenetic band files from UCSC and creates missing grouping classes, in order to build the partonomy from a very specific chromosomal band up through the chromosome itself and enable overlap and containment queries. We use RO:subsequence_of as the relationship between nested chromosomal parts. For example, 13q21.31 ==> 13q21.31, 13q21.3, 13q21, 13q2, 13q, 13

At the moment, this only computes the bands for Human, Mouse, Zebrafish, and Rat but will be expanding in the future as needed.

Because this is a universal framework to represent the chromosomal structure of any species, we must mint identifiers for each chromosome and part. (note: in truth we create blank nodes and then pretend they are something else. TEC)

We differentiate species by first creating a species-specific genome; then, for each species-specific chromosome, we include the NCBI taxon number together with the chromosome number, like `<species number>chr<num><band>`. For 13q21.31, this would be 9606chr13q21.31. We then create triples for a given band like:

    CHR:9606chr1p36.33 rdf[type] SO:chromosome_band
    CHR:9606chr1p36 subsequence_of :9606chr1p36.3

where any band in the file is an instance of chr_band (or a more specific type) and is a subsequence of its containing region.

We determine the containing regions of the band by parsing the band string: since each alphanumeric character is a significant “place”, we can split it, with the shorter strings being parents of the longer string.
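The prefix rule can be sketched as follows. This is an illustration, not Dipper's make_parent_bands: the helper name parent_bands is hypothetical, and it assumes human-style notation (chromosome, then a p or q arm, then digits and an optional '.'). Mouse- or fish-style lettered bands are not handled.

```python
import re
from typing import List


def parent_bands(band: str) -> List[str]:
    """Hypothetical helper: containing regions of a band, most general first.

    Every prefix of the arm-plus-digits part is a containing region,
    except prefixes that end in the '.' separator.
    """
    match = re.fullmatch(r'(\w+?)([pq][\d.]*)', band)
    if match is None:
        raise ValueError('unrecognised band notation: %r' % band)
    chrom, rest = match.groups()
    regions = [chrom]
    for end in range(1, len(rest) + 1):
        prefix = rest[:end]
        if not prefix.endswith('.'):
            regions.append(chrom + prefix)
    return regions
```

For example, parent_bands('13q21.31') returns ['13', '13q', '13q2', '13q21', '13q21.3', '13q21.31'], the same chain listed for make_parent_bands.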

Since this is small, and we have limited other items in our test set to a small region, we simply use the whole graph (genome) for testing purposes, and copy the main graph to the test graph.

Since this Dipper class is building an ONTOLOGY, rather than instance-level data, we must also include domain and range constraints, and other owl-isms.

TODO: any species by commandline argument

We are currently mapping these to the CHR idspace, but this is NOT YET APPROVED and is subject to change.

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. Subclasses should override it. Returns: None

files = {'10090': {'build_num': 'mm10', 'file': '10090cytoBand.txt.gz', 'genome_label': 'Mouse', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/cytoBandIdeo.txt.gz'}, '10116': {'build_num': 'rn6', 'file': '10116cytoBand.txt.gz', 'genome_label': 'Rat', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/rn6/database/cytoBandIdeo.txt.gz'}, '7955': {'build_num': 'danRer10', 'file': '7955cytoBand.txt.gz', 'genome_label': 'Zebrafish', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/danRer10/database/cytoBandIdeo.txt.gz'}, '9031': {'build_num': 'galGal4', 'file': 'galGal4cytoBand.txt.gz', 'genome_label': 'chicken', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/galGal4/database/cytoBandIdeo.txt.gz'}, '9606': {'build_num': 'hg19', 'file': '9606cytoBand.txt.gz', 'genome_label': 'Human', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz'}, '9796': {'build_num': 'equCab2', 'file': 'equCab2cytoBand.txt.gz', 'genome_label': 'horse', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/equCab2/database/cytoBandIdeo.txt.gz'}, '9823': {'build_num': 'susScr3', 'file': 'susScr3cytoBand.txt.gz', 'genome_label': 'pig', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/susScr3/database/cytoBandIdeo.txt.gz'}, '9913': {'build_num': 'bosTau7', 'file': 'bosTau7cytoBand.txt.gz', 'genome_label': 'cow', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/bosTau7/database/cytoBandIdeo.txt.gz'}, '9940': {'build_num': 'oviAri3', 'file': 'oviAri3cytoBand.txt.gz', 'genome_label': 'sheep', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/oviAri3/database/cytoBandIdeo.txt.gz'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

make_parent_bands(band, child_bands)

This determines, recursively, the grouping bands that a band belongs to. For example, 13q21.31 ==> 13, 13q, 13q2, 13q21, 13q21.3, 13q21.31

Parameters:
  • band
  • child_bands
Returns:

map_type_of_region(regiontype)

Note that “stalk” refers to the short arm of the acrocentric chromosomes (chr13, 14, 15, 21 and 22 in human). :param regiontype: :return:

parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). Subclasses should override it. Returns: None

dipper.sources.Monochrom.getChrPartTypeByNotation(notation, graph)

This method determines what kind of feature a given band is by pattern matching against standard karyotype notation (e.g. 13q22.2 ==> chromosome sub-band).

This has been validated against human, mouse, fish, and rat nomenclature. :param notation: the band (without the chromosome prefix) :return:
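A minimal sketch of classification by notation pattern, assuming human-style ISCN notation without the chromosome prefix (e.g. 'q22.2'). The function name and the plain-English labels ('arm', 'region', 'band', 'sub-band', 'sub-sub-band') are hypothetical and are not the SO terms Dipper assigns; lettered mouse or fish bands such as 'qA1' fall through to None here.

```python
import re
from typing import Optional

# Hypothetical patterns, checked in order; human-style notation only.
_BAND_PATTERNS = [
    (re.compile(r'[pq]'), 'arm'),
    (re.compile(r'[pq]\d'), 'region'),
    (re.compile(r'[pq]\d\d'), 'band'),
    (re.compile(r'[pq]\d\d\.\d'), 'sub-band'),
    (re.compile(r'[pq]\d\d\.\d\d+'), 'sub-sub-band'),
]


def classify_band(notation: str) -> Optional[str]:
    """Return a label for the band notation, or None if no pattern matches."""
    for pattern, kind in _BAND_PATTERNS:
        if pattern.fullmatch(notation):
            return kind
    return None
```

Consistent with the example above, classify_band('q22.2') returns 'sub-band'.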

dipper.sources.MyChem module
class dipper.sources.MyChem.MyChem(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

static add_relation(results, relation)
static check_uniprot(target_dict)
static chunks(l, n)

Yield successive n-sized chunks from l.
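A common way to write a generator matching this docstring is shown below. It is a sketch, not necessarily Dipper's implementation; it uses slicing, so it works on lists, tuples and strings, and the last chunk may be shorter than n.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar('T')


def chunks(items: Sequence[T], n: int) -> Iterator[Sequence[T]]:
    """Yield successive n-sized slices of `items`; the last may be shorter."""
    if n < 1:
        raise ValueError('chunk size must be positive')
    for start in range(0, len(items), n):
        yield items[start:start + n]
```

For example, list(chunks([1, 2, 3, 4, 5], 2)) gives [[1, 2], [3, 4], [5]].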

static execute_query(query)
fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. Subclasses should override it. Returns: None

fetch_from_mychem()
static format_actions(target_dict)
static get_drug_record(ids, fields)
static get_inchikeys()
make_triples(source, package)
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). Subclasses should override it. Returns: None

static return_target_list(targ_in)
dipper.sources.MyDrug module
class dipper.sources.MyDrug.MyDrug(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

Drugs and Compounds stored in the BioThings database

MY_DRUG_API = 'http://c.biothings.io/v1/query'
check_if_remote_is_newer(localfile)

We still need to figure out how BioThings records releases; for now, if the file exists, we assume it is a fully downloaded cache. :param localfile: str file path :return: boolean True if remote file is newer else False

fetch(is_dl_forced=False)

Note there is an unpublished mydrug client that works like this:

    from mydrug import MyDrugInfo
    md = MyDrugInfo()
    r = list(md.query('_exists_:aeolus', fetch_all=True))

Parameters:is_dl_forced – boolean, force download
Returns:
files = {'aeolus': {'file': 'aeolus.json'}}
parse(limit=None, or_limit=1)

Parse mydrug files :param limit: int limit json docs processed :param or_limit: int odds ratio limit :return: None

dipper.sources.NCBIGene module
dipper.sources.OMIA module
dipper.sources.OMIM module
dipper.sources.OMIMSource module
dipper.sources.Orphanet module
class dipper.sources.Orphanet.Orphanet(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

Orphanet’s aim is to help improve the diagnosis, care and treatment of patients with rare diseases. For Orphanet, we currently parse only the disease-gene associations.

fetch(is_dl_forced=False)
Parameters:is_dl_forced
Returns:
files = {'disease-gene': {'file': 'en_product6.xml', 'url': 'http://www.orphadata.org/data/xml/en_product6.xml'}}
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). Subclasses should override it. Returns: None

dipper.sources.Panther module
class dipper.sources.Panther.Panther(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None)

Bases: dipper.sources.Source.Source

The pairwise orthology calls from Panther DB (http://pantherdb.org/) encompass 22 species, from the RefGenome and HCOP projects. Here, we map the orthology classes to RO homology relationships. This resource may be extended in the future with additional species.

This currently makes a graph of orthologous relationships between genes, with the assumption that gene metadata (labels, equivalent ids) are provided from other sources.

Gene families are nominally created from the orthology files, though these are incomplete with no hierarchical (subfamily) information. This will get updated from the HMM files in the future.

Note that there is a fair amount of identifier cleanup performed to align with our standard CURIE prefixes.
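To illustrate the kind of identifier cleanup involved, the sketch below splits a gene field into a species code and CURIEs. It assumes a field layout like 'HUMAN|HGNC=11477|UniProtKB=Q6GZX4' (species code, then '|'-separated DB=id pairs); that layout, the helper name and the repeated-prefix case ('MGI=MGI=97874') are assumptions for illustration, and the sketch does not remap prefixes to Dipper's standard CURIE prefixes.

```python
from typing import List, Tuple


def split_panther_gene(field: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: 'SPECIES|DB=id|...' -> (species, ['DB:id', ...])."""
    species, *id_parts = field.split('|')
    curies = []
    for part in id_parts:
        prefix, _, local = part.partition('=')
        # If the local part repeats its prefix (e.g. 'MGI=MGI=97874'),
        # keep only the text after the last '='.
        local = local.rsplit('=', 1)[-1]
        curies.append('%s:%s' % (prefix, local))
    return species, curies
```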

The test graph of data is output based on configured “protein” identifiers in resources/test_id.yaml.

By default, this will produce a file with ALL orthologous relationships. If you want only a subset, you must provide a filter of taxon IDs (tax_ids) when calling this.

PNTHDL = 'ftp://ftp.pantherdb.org/ortholog/current_release'
fetch(is_dl_forced=False)
Returns:None
files = {'Orthologs_HCOP': {'columns': ['Gene', 'Ortholog', 'Type of ortholog', 'Common ancestor for the orthologs', 'Panther Ortholog ID'], 'file': 'Orthologs_HCOP.tar.gz', 'url': 'ftp://ftp.pantherdb.org/ortholog/current_release/Orthologs_HCOP.tar.gz'}, 'RefGenomeOrthologs': {'columns': ['Gene', 'Ortholog', 'Type of ortholog', 'Common ancestor for the orthologs', 'Panther Ortholog ID'], 'file': 'RefGenomeOrthologs.tar.gz', 'url': 'ftp://ftp.pantherdb.org/ortholog/current_release/RefGenomeOrthologs.tar.gz'}, 'current_release': {'columns': ['version'], 'file': 'current_release.ver', 'url': 'ftp://ftp.pantherdb.org/ortholog/'}}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

panther_format = ['Gene', 'Ortholog', 'Type of ortholog', 'Common ancestor for the orthologs', 'Panther Ortholog ID']
parse(limit=None)
Returns:None
dipper.sources.PostgreSQLSource module
class dipper.sources.PostgreSQLSource.PostgreSQLSource(graph_type, are_bnodes_skolemized, data_release_version=None, name=None, ingest_title=None, ingest_url=None, ingest_logo=None, ingest_description=None, license_url=None, data_rights=None, file_handle=None)

Bases: dipper.sources.Source.Source

Class for interfacing with remote Postgres databases

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. Subclasses should override it. Returns: None

fetch_from_pgdb(tables, cxn, limit=None)
Will fetch all Postgres tables from the specified database
in the cxn connection parameters.
This will save them to a local file named the same as the table,
in tab-delimited format, including a header.
Parameters:
  • tables – Names of tables to fetch
  • cxn – database connection details
  • limit – A max row count to fetch for each table
Returns:

None

fetch_query_from_pgdb(qname, query, con, cxn, limit=None)

Supply either an already established connection or connection parameters. The supplied connection will override any separate cxn parameter.

Parameters:
  • qname – the name of the query to save the output to
  • query – the SQL query itself
  • con – the already-established connection
  • cxn – the Postgres connection information
  • limit – if you only want a subset of rows from the query
Returns:

files = {}
parse(limit)

Abstract method to parse all data from an external resource that was fetched in fetch(). Subclasses should override it. Returns: None

dipper.sources.RGD module
class dipper.sources.RGD.RGD(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

Ingest of Rat Genome Database gene to mammalian phenotype gaf file

RGD_BASE = 'ftp://ftp.rgd.mcw.edu/pub/data_release/annotated_rgd_objects_by_ontology/'
fetch(is_dl_forced=False)

Overrides Source.fetch(). Fetches resources from the Rat Genome Database FTP site.

Parameters:is_dl_forced (bool) – force download
Returns:None
files = {'rat_gene2mammalian_phenotype': {'columns': ['DB', 'DB Object ID', 'DB Object Symbol', 'Qualifier', 'GO ID', 'DB:Reference (|DB:Reference)', 'Evidence Code', 'With (or) From', 'Aspect', 'DB Object Name', 'DB Object Synonym (|Synonym)', 'DB Object Type', 'Taxon(|taxon)', 'Date', 'Assigned By', 'Annotation Extension', 'Gene Product Form ID'], 'file': 'rattus_genes_mp', 'url': 'ftp://ftp.rgd.mcw.edu/pub/data_release/annotated_rgd_objects_by_ontology/rattus_genes_mp'}}
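The column list above follows the GAF layout, in which lines beginning with '!' are header or comment lines and data rows are tab-separated. A minimal reader under that assumption is sketched below; read_gaf is a hypothetical name, not part of Dipper, and in practice `columns` would be the column list from the `files` entry above.

```python
from typing import Dict, Iterable, Iterator, List


def read_gaf(lines: Iterable[str], columns: List[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: yield one dict per tab-separated annotation row.

    Lines starting with '!' (GAF header/comment lines) and blank lines are
    skipped; short rows are padded so missing trailing columns map to ''.
    """
    for line in lines:
        if line.startswith('!') or not line.strip():
            continue
        fields = line.rstrip('\r\n').split('\t')
        fields += [''] * (len(columns) - len(fields))
        yield dict(zip(columns, fields))
```

Because it accepts any iterable of lines, an open file handle can be passed directly.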
make_association(record)

Construct the association. :param record: :return: modeled association of genotype to mammalian phenotype

parse(limit=None)

Overrides Source.parse().

Parameters:limit (int, optional) – limit the number of rows processed
Returns:None
dipper.sources.Reactome module
class dipper.sources.Reactome.Reactome(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

Reactome is a free, open-source, curated and peer reviewed pathway database. (http://reactome.org/)

REACTOME_BASE = 'http://www.reactome.org/download/current/'
fetch(is_dl_forced=False)

Overrides Source.fetch(). Fetches resources from Reactome using the Reactome.files dictionary.

Parameters:is_dl_forced (bool) – force download
Returns:None
files = {'chebi2pathway': {'columns': ['component', 'pathway_id', 'pathway_iri', 'pathway_label', 'go_ecode', 'species_nam'], 'file': 'ChEBI2Reactome.txt', 'url': 'http://www.reactome.org/download/current/ChEBI2Reactome.txt'}, 'ensembl2pathway': {'columns': ['component', 'pathway_id', 'pathway_iri', 'pathway_label', 'go_ecode', 'species_nam'], 'file': 'Ensembl2Reactome.txt', 'url': 'http://www.reactome.org/download/current/Ensembl2Reactome.txt'}, 'gaf-eco-mapping': {'file': 'gaf-eco-mapping.yaml', 'url': 'https://archive.monarchinitiative.org/DipperCache/reactome/gaf-eco-mapping.yaml'}}
parse(limit=None)

Overrides Source.parse().

Parameters:limit (int, optional) – limit the number of rows processed
Returns:None
dipper.sources.SGD module
class dipper.sources.SGD.SGD(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

Ingest of Saccharomyces Genome Database (SGD) phenotype associations

SGD_BASE = 'https://downloads.yeastgenome.org/curation/literature/'
fetch(is_dl_forced=False)

Overrides Source.fetch(). Fetches resources from the Saccharomyces Genome Database download site.

Parameters:is_dl_forced (bool) – force download
Returns:None
files = {'sgd_phenotype': {'columns': ['Feature Name', 'Feature Type', 'Gene Name', 'SGDID', 'Reference', 'Experiment Type', 'Mutant Type', 'Allele', 'Strain Background', 'Phenotype', 'Chemical', 'Condition', 'Details', 'Reporter'], 'file': 'phenotype_data.tab', 'url': 'https://downloads.yeastgenome.org/curation/literature/phenotype_data.tab'}}
static make_apo_map()
make_association(record)

Construct the association. :param record: :return: modeled association of genotype to phenotype

parse(limit=None)

Overrides Source.parse().

Parameters:limit (int, optional) – limit the number of rows processed
Returns:None
dipper.sources.Source module
class dipper.sources.Source.Source(graph_type='rdf_graph', are_bnodes_skized=False, data_release_version=None, name=None, ingest_title=None, ingest_url=None, ingest_logo=None, ingest_description=None, license_url=None, data_rights=None, file_handle=None)

Bases: object

Abstract class for any data sources that we’ll import and process. Each of the subclasses will fetch() the data, scrub() it as necessary, then parse() it into a graph. The graph will then be written out to a single self.name().<dest_fmt> file.

Also provides a means to marshal metadata in a consistent fashion

Houses the global translation table (from ontology label to ontology term) so it may as well be used everywhere.

ARGV = {}
DIPPERCACHE = 'https://archive.monarchinitiative.org/DipperCache'
static check_fileheader(expected, received, src_key=None)

Compare the file headers received against the file headers expected. If the expected headers are a subset (proper or not) of the received headers, report success (warn if it is a proper subset).

Parameters:
  • expected – list
  • received – list
Returns:truthiness
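The subset rule described above can be sketched as a set comparison. This standalone function is an illustration, not Dipper's method: the name headers_compatible is hypothetical, it ignores column order, and it logs an error when columns are missing and a warning when the received header has extra columns.

```python
import logging
from typing import Optional, Sequence

LOG = logging.getLogger(__name__)


def headers_compatible(expected: Sequence[str], received: Sequence[str],
                       src_key: Optional[str] = None) -> bool:
    """Hypothetical helper: True if every expected column was received.

    Extra received columns (expected is a proper subset) only log a warning.
    """
    missing = set(expected) - set(received)
    if missing:
        LOG.error('%s: missing expected columns %s', src_key, sorted(missing))
        return False
    extra = set(received) - set(expected)
    if extra:
        LOG.warning('%s: unexpected extra columns %s', src_key, sorted(extra))
    return True
```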

check_if_remote_is_newer(remote, local, headers)

Given a remote file location, and the corresponding local file this will check the datetime stamp on the files to see if the remote one is newer. This is a convenience method to be used so that we don’t have to re-fetch files that we already have saved locally :param remote: URL of file to fetch from remote server :param local: pathname to save file to locally :return: True if the remote file is newer and should be downloaded
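One way to implement such a check, assuming the server sends an HTTP Last-Modified header, is to compare that date with the local file's modification time. The sketch below is not Dipper's implementation (whether it uses Last-Modified is an assumption); remote_is_newer is a hypothetical name, and the caller is expected to pass os.path.getmtime(local) or None when there is no local copy.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def remote_is_newer(last_modified: str, local_mtime: Optional[float]) -> bool:
    """Hypothetical helper: compare a Last-Modified value with a local mtime.

    `local_mtime` is seconds since the epoch, or None when there is no
    local copy, in which case we always fetch.
    """
    if local_mtime is None:
        return True
    remote_time = parsedate_to_datetime(last_modified)
    if remote_time.tzinfo is None:  # '-0000' dates parse as naive datetimes
        remote_time = remote_time.replace(tzinfo=timezone.utc)
    local_time = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote_time > local_time
```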

command_args()

To make arbitrary variables from dipper-etl.py’s calling environment available when working in source ingests, in a hopefully universal way.

Does not appear to be populated until after an ingest’s __init__() finishes.

compare_local_remote_bytes(remotefile, localfile, remote_headers=None)

Test whether the fetched file is the same size as the remote file, using the content-length field in the HTTP header. :return: True or False

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. Subclasses should override it. Returns: None

fetch_from_url(remoteurl, localfile=None, is_dl_forced=False, headers=None)

Given a remote url and a local filename, attempt to determine if the remote file is newer; if it is, fetch the remote file and save it to the specified localfile, reporting the basic file information once it is downloaded :param remoteurl: URL of remote file to fetch :param localfile: pathname of file to save locally

Returns:bool
static file_len(fname)
files = {}
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

static get_file_md5(directory, filename, blocksize=1048576)
get_files(is_dl_forced, files=None, delay=0)

Given a set of files for this source, it will go fetch them, and set a default version by date. If you need to set the version number by another method, then it can be set again. :param is_dl_forced - boolean :param files dict - override instance files dict :return: None

static get_local_file_size(localfile)
Parameters:localfile
Returns:size of file
get_remote_content_len(remote, headers=None)
Parameters:remote
Returns:size of remote file
static hash_id(wordage)

Return a truncated SHA-1 hash of the string: a leading ‘b’ is prepended to avoid leading with a digit, and the result is truncated to a 20-character word including the ‘b’.

By the birthday paradox, expect a 50% chance of collision after 69 billion invocations; however, these are only hoped to be unique within a single file.

Consider reducing to 17 hex chars to fit in a 64-bit word; 16, discounting a leading constant, gives a 50% chance of collision at about 4.3 billion unique input strings (currently _many_ orders of magnitude below that).

Parameters:long_string – str string to be hashed
Returns:str hash of id
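The hashing scheme and the birthday estimate can be sketched together. The id function below assumes 'b' followed by the first 19 hex characters of the SHA-1 digest (20 characters in all); that exact slice is an assumption, not confirmed Dipper code, and both names are hypothetical. The estimate uses the standard approximation n ≈ sqrt(2 ln 2 · N) for a 50% chance of any collision among N possible ids.

```python
import hashlib
import math


def short_sha1_id(text: str, width: int = 20) -> str:
    """Hypothetical: 'b' plus the leading (width - 1) hex chars of SHA-1."""
    digest = hashlib.sha1(text.encode('utf-8')).hexdigest()
    return 'b' + digest[:width - 1]


def birthday_50pct(hex_chars: int) -> float:
    """Approximate number of random ids giving a 50% chance of a collision."""
    space = 16 ** hex_chars  # number of distinct ids with this many hex chars
    return math.sqrt(2 * math.log(2) * space)
```

With this approximation, 16 hex characters (64 bits) give roughly 5.1 billion ids; the cruder square-root rule, 2^32, gives the 4.3 billion quoted above.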
load_local_translationtable(name)

Load the “ingest specific” translation from whatever the source called something to the ontology label we need to map it to. To make more ontology labels visible in dipper ingests, a reverse mapping from ontology labels to external strings is also generated and made available as the dict localtcid.

Example file layout:

    ---
    # %s.yaml
    "": ""  # example

static make_id(long_string, prefix='MONARCH')

A method to create DETERMINISTIC identifiers based on a string’s digest, currently implemented with SHA-1. :param long_string: :return:

namespaces = {}
static open_and_parse_yaml(yamlfile)
Parameters:file – String, path to file containing label-id mappings in the first two columns of each row
Returns:dict where keys are labels and values are ids
parse(limit)

Abstract method to parse all data from an external resource that was fetched in fetch(). Subclasses should override it. Returns: None

static parse_mapping_file(file)
Parameters:file – String, path to file containing label-id mappings in the first two columns of each row
Returns:dict where keys are labels and values are ids
process_xml_table(elem, table_name, processing_function, limit)

This is a convenience function to process the elements of an XML dump of a MySQL relational database. The “elem” is akin to a MySQL table, with its name of `table_name`. It will process each `row` given the `processing_function` supplied. :param elem: The element data :param table_name: The name of the table to process :param processing_function: The row processing function :param limit:

Appears to be making calls to the ElementTree library, although it is not explicitly imported here.

Returns:
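The row-by-row processing described above can be sketched with the standard library's xml.etree.ElementTree, assuming the mysqldump --xml layout (<table_data name="..."> containing <row> elements of <field name="...">value</field>). This is an illustration, not Dipper's function: process_table_data is a hypothetical name, and it returns whatever the row function returns for each row.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List, Optional


def process_table_data(elem: ET.Element, table_name: str,
                       process_row: Callable[[Dict[str, Optional[str]]], Any],
                       limit: Optional[int] = None) -> List[Any]:
    """Hypothetical helper: apply `process_row` to each <row> of one table.

    Elements that are not the requested <table_data> are ignored. NULL
    fields (empty <field/> elements) map to None.
    """
    if elem.tag != 'table_data' or elem.get('name') != table_name:
        return []
    results = []
    for row in elem.findall('row'):
        if limit is not None and len(results) >= limit:
            break
        record = {field.get('name'): field.text for field in row.findall('field')}
        results.append(process_row(record))
    return results
```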
static remove_backslash_r(filename, encoding)

A helpful utility to remove Carriage Return from any file. This will read a file into memory, and overwrite the contents of the original file.

TODO: This function may be a liability

Parameters:filename
Returns:
resolve(word, mandatory=True, default=None)

Composite mapping, given f(x) and g(x), here localtt and globaltt respectively: return g(f(x)) | g(x) | f(x) | x, in order of preference. Returns x (or default) on fall-through if finding a mapping is not mandatory (by default, finding is mandatory).

This may be specialized further from any mapping to a global mapping only; if need be.

Parameters:
  • word – the string to find as a key in translation tables
  • mandatory – boolean to cause failure when no key exists
  • default – string to return if nothing is found (and finding is not mandatory)
Returns:
value from global translation table, or value from local translation table, or the query key if finding a value is not mandatory (in this order)
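The order of preference above can be sketched as a standalone function over two plain dicts. This is a hypothetical illustration, not Dipper's method: it passes localtt and globaltt explicitly, raises KeyError when a mapping is mandatory but missing, and otherwise returns the default if one is given, else the word itself.

```python
from typing import Dict, Optional


def resolve(word: str, localtt: Dict[str, str], globaltt: Dict[str, str],
            mandatory: bool = True, default: Optional[str] = None) -> str:
    """Hypothetical sketch of the documented lookup order."""
    if word in localtt and localtt[word] in globaltt:
        return globaltt[localtt[word]]  # g(f(x))
    if word in globaltt:
        return globaltt[word]  # g(x)
    if word in localtt:
        return localtt[word]  # f(x)
    if mandatory:
        raise KeyError('no translation found for %r' % word)
    return word if default is None else default  # x, or the default
```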
settestmode(mode)

Set testMode to (mode). - True: run the Source in testMode; - False: run it in full mode :param mode: :return: None

settestonly(testonly)

Set that this source should only be processed in testMode :param testOnly: :return: None

whoami()

pointless convenience

write(fmt='turtle', stream=None, write_metadata_in_main_graph=True)
This convenience method will write out all of the graphs
associated with the source.

Right now these are hardcoded to be a single main “graph”, a “src_dataset.ttl” and a “src_test.ttl”. If you do not supply stream='stdout', it will default to writing these to files.

In addition, if the version number isn’t yet set in the dataset, it will be set to the date on file. :return: None

dipper.sources.StringDB module
class dipper.sources.StringDB.StringDB(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None, version=None)

Bases: dipper.sources.Source.Source

STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases. From: http://string-db.org/cgi/about.pl?footer_active_subpage=content

STRING uses one protein per gene. If there is more than one isoform per gene, we usually select the longest isoform, unless we have information suggesting that another isoform is regarded as canonical (e.g., proteins in the CCDS database). From: http://string-db.org/cgi/help.pl

fetch(is_dl_forced=False)

Overrides Source.fetch(). Fetches resources from STRING.

We also fetch Ensembl to determine whether protein pairs are from the same species.

Parameters:is_dl_forced (bool) – force download
Returns:None
parse(limit=None)

Overrides Source.parse().

Parameters:limit (int, optional) – limit the number of rows processed
Returns:None
dipper.sources.UCSCBands module
class dipper.sources.UCSCBands.UCSCBands(graph_type, are_bnodes_skolemized, data_release_version=None, tax_ids=None)

Bases: dipper.sources.Source.Source

This takes the UCSC definitions of cytogenetic bands and creates the nested structures to enable overlap and containment queries. We use `Monochrom.py` to create the OWL classes of the chromosomal parts. Here, we simply deal with the instance-level values for particular genome builds.

Given a chr band definition, the nested containment structures look like: 13q21.31 ==> 13q21.31, 13q21.3, 13q21, 13q2, 13q, 13

We determine the containing regions of the band by parsing the band string: since each alphanumeric character is a significant “place”, we can split it, with the shorter strings being parents of the longer string. Here we create build-specific chromosomes, which are instances of the classes produced from `Monochrom.py`. You can instantiate any number of builds for a genome.

We leverage the Faldo model here for region definitions, and map each of the chromosomal parts to SO.

We differentiate the build by adding the build id to the identifier prior to the chromosome number. These then are instances of the species-specific chromosomal class.

The build-specific chromosomes are created like `<build number>chr<num><band>`, with triples for a given band like:

    _:hg19chr1p36.33
        rdf:type SO:chromosome_band, faldo:Region, CHR:9606chr1p36.33 ;
        subsequence_of _:hg19chr1p36.3 ;
        faldo:location [ a faldo:BothStrandPosition ;
            faldo:begin 0 ;
            faldo:end 2300000 ;
            faldo:reference 'hg19' ] .

where any band in the file is an instance of chr_band (or a more specific type), is a subsequence of its containing region, and is located at the specified coordinates.
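A sketch of turning one cytoBand/cytoBandIdeo row (columns chrom, chromStart, chromEnd, name, gieStain, as in cytobandideo_columns) into the build-specific identifier and coordinates used in the example. The helper name and the returned dict are hypothetical, not Dipper's data structures. UCSC tables use zero-based, half-open coordinates; the sketch copies them through unchanged, which matches faldo:begin 0 for hg19 1p36.33 in the example.

```python
from typing import Dict, Union


def band_record(line: str, build: str) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: one cytoBand row -> build-specific band record."""
    chrom, start, end, name, stain = line.rstrip('\r\n').split('\t')
    num = chrom[3:] if chrom.startswith('chr') else chrom  # 'chr1' -> '1'
    return {
        'id': '_:%schr%s%s' % (build, num, name),  # e.g. _:hg19chr1p36.33
        'begin': int(start),  # copied through unchanged (UCSC 0-based)
        'end': int(end),
        'reference': build,
        'stain': stain,
    }
```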

We do not have a separate graph for testing.

TODO: any species by commandline argument

HGGP = 'http://hgdownload.cse.ucsc.edu/goldenPath'
cytobandideo_columns = ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain']
fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses. Returns: None

files = {'10090': {'assembly': ('UCSC:mm10', 'UCSC:mm9'), 'build_num': 'mm10', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'mm10_cytoBandIdeo.txt.gz', 'genome_label': 'Mouse', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/cytoBandIdeo.txt.gz'}, '7955': {'assembly': ('UCSC:danRer11', 'UCSC:danRer10', 'UCSC:danRer7', 'UCSC:danRer6'), 'build_num': 'danRer11', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'danRer11_cytoBandIdeo.txt.gz', 'genome_label': 'Zebrafish', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/danRer11/database/cytoBandIdeo.txt.gz'}, '9031': {'assembly': ('UCSC:galGal4', 'UCSC:galGal6'), 'build_num': 'galGal6', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'galGal6_cytoBandIdeo.txt.gz', 'genome_label': 'chicken', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/galGal6/database/cytoBandIdeo.txt.gz'}, '9606': {'assembly': ('UCSC:hg38', 'UCSC:hg19', 'UCSC:hg18', 'UCSC:hg17', 'UCSC:hg16', 'UCSC:hg15'), 'build_num': 'hg19', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'hg19_cytoBand.txt.gz', 'genome_label': 'Human', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz'}, '9615': {'assembly': ('UCSC:canFam3',), 'build_num': 'canFam3', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'canFam3_cytoBandIdeo.txt.gz', 'genome_label': 'dog', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/canFam3/database/cytoBandIdeo.txt.gz'}, '9685': {'assembly': ('UCSC:felCat9',), 'build_num': 'felCat9', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'felCat9_cytoBandIdeo.txt.gz', 'genome_label': 'cat', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/felCat9/database/cytoBandIdeo.txt.gz'}, '9796': {'assembly': ('UCSC:equCab2', 'UCSC:equCab3'), 'build_num': 'equCab2', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 
'equCab2_cytoBandIdeo.txt.gz', 'genome_label': 'horse', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/equCab2/database/cytoBandIdeo.txt.gz'}, '9823': {'assembly': ('UCSC:susScr3', 'UCSC:susScr11'), 'build_num': 'susScr11', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'susScr11_cytoBandIdeo.txt.gz', 'genome_label': 'pig', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/susScr11/database/cytoBandIdeo.txt.gz'}, '9913': {'assembly': ('UCSC:bosTau7',), 'build_num': 'bosTau7', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'bosTau7_cytoBandIdeo.txt.gz', 'genome_label': 'cow', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/bosTau7/database/cytoBandIdeo.txt.gz'}, '9940': {'assembly': ('UCSC:oviAri3', 'UCSC:oviAri4'), 'build_num': 'oviAri4', 'columns': ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain'], 'file': 'oviAri4_cytoBandIdeo.txt.gz', 'genome_label': 'sheep', 'url': 'http://hgdownload.cse.ucsc.edu/goldenPath/oviAri4/database/cytoBandIdeo.txt.gz'}}
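Each entry in `files` points at a gzipped UCSC table dump whose columns are listed under `columns` (the same list as `cytobandideo_columns`). A minimal reader might look like the sketch below. It assumes the dump is tab-separated with no header row, which is how UCSC database table dumps are usually distributed; the function name `read_cytobands` is hypothetical and the sample row is made up.

```python
import csv
import gzip
import io

CYTOBAND_COLUMNS = ['chrom', 'chromStart', 'chromEnd', 'name', 'gieStain']


def read_cytobands(fileobj):
    """Yield one dict per row of a gzipped, tab-separated cytoBand(Ideo) file."""
    with gzip.open(fileobj, 'rt', encoding='utf-8', newline='') as text:
        for row in csv.reader(text, delimiter='\t'):
            if not row:
                continue  # ignore blank lines
            record = dict(zip(CYTOBAND_COLUMNS, row))
            record['chromStart'] = int(record['chromStart'])
            record['chromEnd'] = int(record['chromEnd'])
            yield record


# Made-up one-line sample, compressed in memory instead of downloaded.
sample = gzip.compress(b'chr1\t0\t2300000\tp36.33\tgneg\n')
for band in read_cytobands(io.BytesIO(sample)):
    print(band['chrom'], band['name'], band['chromStart'], band['chromEnd'])
# chr1 p36.33 0 2300000
```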
getTestSuite()

An abstract method that should be overridden with tests appropriate for the specific source.

parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses. Returns: None

dipper.sources.UDP module
class dipper.sources.UDP.UDP(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

The National Institutes of Health (NIH) Undiagnosed Diseases Program (UDP) is part of the Undiagnosed Diseases Network (UDN), an NIH Common Fund initiative that focuses on the most puzzling medical cases referred to the NIH Clinical Center in Bethesda, Maryland. (From https://www.genome.gov/27544402/the-undiagnosed-diseases-program/)

Data is available by request for access via the NHGRI collaboration server: https://udplims-collab.nhgri.nih.gov/api

Note that the fetcher requires credentials for the UDP collaboration server. Credentials are added via a config file, config.json, in the following format:

    {
        "dbauth": {
            "udp": {
                "user": "foo",
                "password": "bar"
            }
        }
    }

See dipper/config.py for more information.
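A minimal sketch of reading those credentials with the standard library follows. It only assumes the JSON layout shown above; the function name `udp_credentials` is hypothetical, and Dipper's own loading logic lives in dipper/config.py.

```python
import json


def udp_credentials(config_text):
    """Return (user, password) from a config.json string laid out as above."""
    config = json.loads(config_text)
    try:
        udp = config['dbauth']['udp']
    except KeyError:
        raise KeyError('config.json has no dbauth.udp section') from None
    return udp['user'], udp['password']


example = '{"dbauth": {"udp": {"user": "foo", "password": "bar"}}}'
print(udp_credentials(example))  # ('foo', 'bar')
```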

The fetcher outputs two files:
  • udp_variants.tsv, with columns ‘Patient’, ‘Family’, ‘Chr’, ‘Build’, ‘Chromosome Position’, ‘Reference Allele’, ‘Variant Allele’, ‘Parent of origin’, ‘Allele Type’, ‘Mutation Type’, ‘Gene’, ‘Transcript’, ‘Original Amino Acid’, ‘Variant Amino Acid’, ‘Amino Acid Change’, ‘Segregates with’, ‘Position’, ‘Exon’, ‘Inheritance model’, ‘Zygosity’, ‘dbSNP ID’, ‘1K Frequency’, ‘Number of Alleles’
  • udp_phenotypes.tsv, with columns ‘Patient’, ‘HPID’, ‘Present’

The script also uses two mapping files:
  • udp_gene_map.tsv: generated from scripts/fetch-gene-ids.py, using the gene symbols from udp_variants
  • udp_chr_rs.tsv: rsid(s) per coordinate, grepped from the hg19 dbSNP file and then disambiguated with eutils; see scripts/dbsnp/dbsnp.py
UDP_SERVER = 'https://udplims-collab.nhgri.nih.gov/api'
fetch(is_dl_forced=True)

Fetches data from the UDP collaboration server; see the class-level documentation for more information.

files = {'patient_phenotypes': {'file': 'udp_phenotypes.tsv'}, 'patient_variants': {'file': 'udp_variants.tsv'}}
map_files = {'dbsnp_map': '../../resources/udp/udp_chr_rs.tsv', 'gene_coord_map': '../../resources/udp/gene_coordinates.tsv', 'patient_ids': '../../resources/udp/patient_ids.yaml'}
parse(limit=None)

Override Source.parse().

Parameters: limit (int, optional): limit the number of rows processed
Returns: None
dipper.sources.WormBase module
class dipper.sources.WormBase.WormBase(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

This is the parser for the C. elegans Model Organism Database (WormBase, http://www.wormbase.org), from which we process genotype and phenotype data for laboratory worms (C. elegans and other nematodes).

We generate the WormBase graph to include the following information:
  • genes
  • sequence alterations (including SNPs/del/ins/indel and large chromosomal rearrangements)
  • RNAi as expression-affecting reagents
  • genotypes, and their components
  • strains
  • publications (and their mapping to PMIDs, if available)
  • allele-to-phenotype associations (including variants by RNAi)
  • genetic positional information for genes and sequence alterations

Genotypes leverage the GENO genotype model and include both intrinsic and extrinsic genotypes. Where necessary, we create anonymous nodes of the genotype partonomy (e.g. for variant single locus complements, genomic variation complements, variant loci, extrinsic genotypes, and extrinsic genotype parts).

TODO: get people and gene expression

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses. Returns: None

files = {'allele_pheno': {'columns': ['DB', 'DB Object ID', 'DB Object Symbol', 'Qualifier', 'GO ID', 'DB:Reference (|DB:Reference)', 'Evidence Code', 'With (or) From', 'Aspect', 'DB Object NameDB Object Synonym (|Synonym)', 'DB Object Type', 'Taxon(|taxon)', 'Date', 'Assigned By', 'Annotation Extension', 'Gene Product Form ID'], 'file': 'phenotype_association.wb', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/phenotype_association.WSNUMBER.wb'}, 'checksums': {'file': 'CHECKSUMS', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/CHECKSUMS'}, 'disease_assoc': {'columns': ['DB', 'DB Object ID', 'DB Object Symbol', 'Qualifier', 'GO ID', 'DB:Reference (|DB:Reference)', 'Evidence Code', 'With (or) From', 'Aspect', 'DB Object NameDB Object Synonym (|Synonym)', 'DB Object Type', 'Taxon(|taxon)', 'Date', 'Assigned By', 'Annotation Extension', 'Gene Product Form ID'], 'file': 'disease_association.wb', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/disease_association.WSNUMBER.wb'}, 'feature_loc': {'columns': ['seqid', 'source', 'type', 'start', 'end', 'score', 'strand', 'phase', 'attributes'], 'file': 'c_elegans.PRJNA13758.annotations.gff3.gz', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WSNUMBER.annotations.gff3.gz'}, 'gaf-eco-mapping': {'file': 'gaf-eco-mapping.yaml', 'url': 'https://archive.monarchinitiative.org/DipperCache/wormbase/gaf-eco-mapping.yaml'}, 'gene_ids': {'columns': ['taxon_num', 'gene_num', 'gene_symbol', 'gene_synonym', 'live', 'gene_type'], 'file': 'c_elegans.PRJNA13758.geneIDs.txt.gz', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJNA13758/annotation/c_elegans.PRJNA13758.WSNUMBER.geneIDs.txt.gz'}, 'letter': {'file': 'letter', 'url': 
'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/letter.WSNUMBER'}, 'pub_xrefs': {'columns': ['wb_ref', 'xref'], 'file': 'pub_xrefs.txt', 'url': 'http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=WpaXref'}, 'rnai_pheno': {'columns': ['gene_num', 'gene_alt_symbol', 'phenotype_label', 'phenotype_id', 'rnai_and_refs'], 'file': 'rnai_phenotypes.wb', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/rnai_phenotypes.WSNUMBER.wb'}, 'xrefs': {'file': 'c_elegans.PRJNA13758.xrefs.txt.gz', 'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJNA13758/annotation/c_elegans.PRJNA13758.WSNUMBER.xrefs.txt.gz'}}
static make_reagent_targeted_gene_id(gene_id, reagent_id)
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses. Returns: None

process_allele_phenotype(limit=None)

This file compactly lists variant-to-phenotype associations, such that a single row may list more than one variant per phenotype and paper. This indicates that each variant is individually associated with the given phenotype, as reported in one or more papers. (It does not mean that the combination of variants produces the phenotype.) Parameters: limit
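The one-association-per-variant reading described above can be sketched as follows. The `|` separator, the shape of the inputs, and the placeholder identifiers are assumptions made for illustration (the docstring does not say which column carries the multiple variants), and `expand_variant_row` is a hypothetical name.

```python
def expand_variant_row(variant_field, phenotype_id, references, sep='|'):
    """Split a compact row into one (variant, phenotype, references) association per variant.

    Each variant is associated with the phenotype on its own; the variants are
    not treated as a combination that jointly produces the phenotype.
    """
    variants = [v.strip() for v in variant_field.split(sep) if v.strip()]
    return [(variant, phenotype_id, tuple(references)) for variant in variants]


rows = expand_variant_row('VAR:1|VAR:2', 'PHENO:1', ['REF:1'])
for association in rows:
    print(association)
```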

process_disease_association(limit)
process_feature_loc(limit)
process_gene_desc(limit)
process_gene_ids(limit)
process_gene_interaction(limit)

The gene interaction file includes identified interactions between two or more genes (or gene products). For interactions with more than two genes, this requires creating groups of the genes involved in the interaction. From the WormBase help list: in the example WBInteraction000007779, it would likely be misleading to suggest that lin-12 interacts with (suppresses, in this case) smo-1 ALONE, or that lin-12 suppresses let-60 ALONE. The observation in the paper (see Table V in PMID:15990876) was that a lin-12 allele (heterozygous lin-12(n941/+)) could suppress the “multivulva” phenotype induced synthetically by simultaneous perturbation of BOTH smo-1 (by RNAi) AND let-60 (by the n2021 allele). So this is necessarily a three-gene interaction.

Therefore, we can create groups of genes based on their “status” of Effector | Effected.

Status: IN PROGRESS

Parameters: limit
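Grouping the participants of one interaction by their status can be sketched as follows. The input shape (a list of `(gene, status)` pairs) and the function name are hypothetical, and the statuses assigned to the WBInteraction000007779 genes below are only an illustrative reading of the example above, not data taken from WormBase.

```python
from collections import defaultdict


def group_by_status(participants):
    """Map each status (e.g. 'Effector', 'Effected') to a frozenset of genes."""
    groups = defaultdict(set)
    for gene, status in participants:
        groups[status].add(gene)
    return {status: frozenset(genes) for status, genes in groups.items()}


participants = [('lin-12', 'Effector'), ('smo-1', 'Effected'), ('let-60', 'Effected')]
groups = group_by_status(participants)
# More than two genes means the interaction is modelled with gene groups,
# not as independent pairwise gene-gene edges.
needs_group = len({gene for gene, _ in participants}) > 2
print(sorted(groups['Effected']), needs_group)  # ['let-60', 'smo-1'] True
```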
process_pub_xrefs(limit=None)
process_rnai_phenotypes(limit=None)
species = '/species/c_elegans/PRJNA13758'
update_wsnum_in_files(vernum)

With the given version number `vernum`, update the source’s version number and replace it in the file hashmap. The version number is found in the CHECKSUMS file. Parameters: vernum
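The URLs in `files` contain the literal placeholder `WSNUMBER`, so one way to apply a release number is a plain string substitution, sketched below. This is a hedged illustration: the function name is hypothetical, it returns a new mapping rather than updating the source in place, and `WS000` is a dummy release number.

```python
def substitute_release(files, vernum, placeholder='WSNUMBER'):
    """Return a copy of a files mapping with the placeholder replaced in each URL."""
    updated = {}
    for key, spec in files.items():
        spec = dict(spec)  # copy so the caller's mapping is left untouched
        if 'url' in spec:
            spec['url'] = spec['url'].replace(placeholder, vernum)
        updated[key] = spec
    return updated


files = {
    'letter': {
        'file': 'letter',
        'url': 'ftp://ftp.wormbase.org/pub/wormbase/releases/'
               'current-production-release/letter.WSNUMBER',
    },
}
print(substitute_release(files, 'WS000')['letter']['url'])
```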

wbdev = 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-development-release'
wbprod = 'ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release'
wbrel = 'ftp://ftp.wormbase.org/pub/wormbase/releases'
dipper.sources.Xenbase module
class dipper.sources.Xenbase.Xenbase(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

Xenbase is a web-accessible resource that integrates all the diverse biological, genomic, genotype and phenotype data available from Xenopus research.

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses. Returns: None

files = {'g2p_assertions': {'columns': ['SUBJECT', 'SUBJECT_LABEL', 'SUBJECT_TAXON', 'SUBJECT_TAXON_LABEL', 'OBJECT', 'OBJECT_LABEL', 'RELATION', 'RELATION_LABEL', 'EVIDENCE', 'EVIDENCE_LABEL', 'SOURCE', 'IS_DEFINED_BY', 'QUALIFIER'], 'file': 'xb_xpo_spo_v_v1.tab', 'url': 'https://archive.monarchinitiative.org/DipperCache/xenbase/xb_xpo_spo_v_v1.tab'}, 'gene_literature': {'columns': ['xb_article', 'pmid', 'gene_pages'], 'file': 'LiteratureMatchedGenesByPaper.txt', 'url': 'http://ftp.xenbase.org//pub/GenePageReports/LiteratureMatchedGenesByPaper.txt'}, 'genepage2gene': {'columns': ['gene_page_id', 'gene_page_label', 'tropicalis_id', 'tropicalis_label', 'laevis_l_id', 'laevis_l_label', 'laevis_s_id', 'laevis_s_label'], 'file': 'XenbaseGenepageToGeneIdMapping.txt', 'url': 'http://ftp.xenbase.org//pub/GenePageReports/XenbaseGenepageToGeneIdMapping.txt'}, 'orthologs': {'columns': ['SUBJECT', 'SUBJECT_LABEL', 'SUBJECT_TAXON', 'SUBJECT_TAXON_LABEL', 'OBJECT', 'OBJECT_LABEL', 'RELATION', 'RELATION_LABEL', 'EVIDENCE', 'EVIDENCE_LABEL', 'SOURCE', 'IS_DEFINED_BY', 'QUALIFIER'], 'file': 'xb_ortho_spo_v_v20210318a.csv', 'url': 'https://archive.monarchinitiative.org/DipperCache/xenbase/xb_ortho_spo_v_v20210318a.csv'}}
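The `g2p_assertions` and `orthologs` entries share a subject/relation/object column layout. The sketch below reads rows in that layout into `(SUBJECT, RELATION, OBJECT)` tuples. It assumes a tab-separated file (suggested, but not confirmed, by the `.tab` extension of `g2p_assertions`) and skips a header row if one is present; the function name and the sample row are made up for illustration.

```python
import csv
import io

COLUMNS = ['SUBJECT', 'SUBJECT_LABEL', 'SUBJECT_TAXON', 'SUBJECT_TAXON_LABEL',
           'OBJECT', 'OBJECT_LABEL', 'RELATION', 'RELATION_LABEL', 'EVIDENCE',
           'EVIDENCE_LABEL', 'SOURCE', 'IS_DEFINED_BY', 'QUALIFIER']


def read_assertions(handle):
    """Yield (subject, relation, object) from tab-separated rows in COLUMNS order."""
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter='\t'):
        if row['SUBJECT'] == 'SUBJECT':
            continue  # skip a header row if the file happens to have one
        yield row['SUBJECT'], row['RELATION'], row['OBJECT']


# Made-up sample row with placeholder subject and object identifiers.
sample = '\t'.join(['XB-GENE-1', 'gene a', 'NCBITaxon:8364', 'Xenopus tropicalis',
                    'XPO:1', 'phenotype a', 'RO:0002200', 'has phenotype',
                    '', '', '', '', '']) + '\n'
print(list(read_assertions(io.StringIO(sample))))
```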
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses. Returns: None

dipper.sources.ZFIN module
class dipper.sources.ZFIN.ZFIN(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

This is the parser for the Zebrafish Model Organism Database (ZFIN, http://www.zfin.org), from which we process genotype and phenotype data for laboratory zebrafish.

We generate the ZFIN graph to include the following information:
  • genes
  • sequence alterations (including SNPs/del/ins/indel and large chromosomal rearrangements)
  • transgenic constructs
  • morpholinos, TALENs, and CRISPRs as expression-affecting reagents
  • genotypes, and their components
  • fish (comprising intrinsic and extrinsic genotypes)
  • publications (and their mapping to PMIDs, if available)
  • genotype-to-phenotype associations (including the environments and stages at which they are assayed)
  • environmental components
  • orthology to human genes
  • genetic positional information for genes and sequence alterations
  • fish-to-disease model associations

Genotypes leverage the GENO genotype model and include both intrinsic and extrinsic genotypes. Where necessary, we create anonymous nodes of the genotype partonomy (such as for variant single locus complements, genomic variation complements, variant loci, extrinsic genotypes, and extrinsic genotype parts).

Genotype labels are output as the ZFIN genotype name + “[background]”. We also process the genotype components to build labels in a Monarch style, and these are added as synonyms. The Monarch-style genotype label includes:
  • all genes targeted by reagents (morphants, CRISPRs, etc.), in addition to the ones that the reagent was designed against
  • all affected genes within deficiencies
  • complex hets, listed as gene<mutation1>/gene<mutation2> rather than gene<mutation1>/+; gene<mutation2>/+
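The difference between the two complex-het label styles can be illustrated with a small, hypothetical helper. It only covers alleles of a single gene, and the gene and mutation names in the example are placeholders.

```python
def monarch_style_label(gene, mutations):
    """Label for alleles of one gene, in the Monarch style described above.

    Two alleles of the same gene (a complex het) are written as
    gene<mutation1>/gene<mutation2>; a single allele is paired with +.
    """
    if len(mutations) == 2:
        first, second = mutations
        return '{0}<{1}>/{0}<{2}>'.format(gene, first, second)
    if len(mutations) == 1:
        return '{0}<{1}>/+'.format(gene, mutations[0])
    raise ValueError('expected one or two mutations')


def per_allele_label(gene, mutations):
    """The alternative style, which lists each allele against a wild-type (+) copy."""
    return '; '.join('{0}<{1}>/+'.format(gene, m) for m in mutations)


print(monarch_style_label('geneA', ['mut1', 'mut2']))  # geneA<mut1>/geneA<mut2>
print(per_allele_label('geneA', ['mut1', 'mut2']))     # geneA<mut1>/+; geneA<mut2>/+
```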

See resources/zfin/README for how the columns are extracted from the downloads page.

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses. Returns: None

fhandle = <_io.TextIOWrapper name='/home/docs/checkouts/readthedocs.org/user_builds/dipper/checkouts/master/dipper/sources/../../tests/resources/zfin/zfin_test_ids.yaml' mode='r' encoding='UTF-8'>
files = {'backgrounds': {'columns': ['Genotype ID', 'Genotype Name', 'Background', 'Background Name'], 'file': 'genotype_backgrounds.txt', 'url': 'http://zfin.org/downloads/genotype_backgrounds.txt'}, 'crispr': {'columns': ['Gene ID', 'Gene SO ID', 'Gene Symbol', 'CRISPR ID', 'CRISPR SO ID', 'CRISPR Symbol', 'CRISPR Target Sequence', 'Publication(s)', 'Note'], 'file': 'CRISPR.txt', 'url': 'http://zfin.org/downloads/CRISPR.txt'}, 'enviro': {'columns': ['Environment ID', 'ZECO Term Name', 'ZECO Term ID (ZECO:ID)', 'Chebi Term Name', 'Chebi Term ID (Chebi:ID)', 'ZFA Term Name', 'ZFA Term ID (ZFA:ID)', 'Affected Structure Subterm Name', 'Affected Structure Subterm ID (GO-CC:ID)', 'NCBI Taxon Name', 'NCBI Taxon ID (NCBI Taxon:ID)'], 'file': 'pheno_environment_fish.txt', 'url': 'http://zfin.org/downloads/pheno_environment_fish.txt'}, 'feature_affected_gene': {'columns': ['Genomic Feature ID', 'Feature SO ID', 'Genomic Feature Abbreviation', 'Gene Symbol', 'Gene ID', 'Gene SO ID', 'Genomic Feature - Marker Relationship', 'Feature Type', 'DNA/cDNA Change SO ID', 'Reference Nucleotide', 'Mutant Nucleotide', 'Base Pairs Added', 'Base Pairs Removed', 'DNA/cDNA Change Position Start', 'DNA/cDNA Change Position End', 'DNA/cDNA Reference Sequence', 'DNA/cDNA Change Localization', 'DNA/cDNA Change Localization SO ID', 'DNA/cDNA Change Localization Exon', 'DNA/cDNA Change Localization Intron', 'Transcript Consequence', 'Transcript Consequence SO ID', 'Transcript Consequence Exon', 'Transcript Consequence Intron', 'Protein Consequence', 'Protein Consequence SO ID', 'Reference Amino Acid', 'Mutant Amino Acid', 'Amino Acids Added', 'Amino Acids Removed', 'Protein Consequence Position Start', 'Protein Consequence Position End', 'Protein Reference Sequence'], 'file': 'features-affected-genes.txt', 'url': 'http://zfin.org/downloads/features-affected-genes.txt'}, 'features': {'columns': ['Genomic Feature ID', 'Feature SO ID', 'Genomic Feature Abbreviation', 'Genomic Feature Name', 
'Genomic Feature Type', 'Mutagen', 'Mutagee', 'Construct ID', 'Construct name', 'Construct SO ID', 'TALEN/CRISPR ID', 'TALEN/CRISPR Name'], 'file': 'features.txt', 'url': 'http://zfin.org/downloads/features.txt'}, 'fish_components': {'columns': ['Fish ID', 'Fish Name', 'Gene ID', 'Gene Symbol', 'Affector ID', 'Affector Symbol', 'Construct ID', 'Construct Symbol', 'Background ID', 'Background Name', 'Genotype ID', 'Genotype Name'], 'file': 'fish_components_fish.txt', 'url': 'http://zfin.org/downloads/fish_components_fish.txt'}, 'fish_disease_models': {'columns': ['Fish ZDB ID', 'Environment ZDB ID', 'is_a_model', 'DO Term ID', 'DO Term Name', 'Publication ZDB ID', 'PubMed ID', 'Evidence Code'], 'file': 'fish_model_disease.txt', 'url': 'http://zfin.org/downloads/fish_model_disease.txt'}, 'genbank': {'columns': ['ZFIN ID', 'SO ID', 'Name', 'GenBank ID'], 'file': 'genbank.txt', 'url': 'http://zfin.org/downloads/genbank.txt'}, 'gene': {'columns': ['ZFIN ID', 'SO ID', 'Symbol', 'NCBI Gene ID'], 'file': 'gene.txt', 'url': 'http://zfin.org/downloads/gene.txt'}, 'gene_coordinates': {'columns': ['Chromosome', 'Source', 'Type', 'Start', 'End', 'Score', 'Strand', 'Phase', 'Attributes'], 'file': 'E_zfin_gene_alias.gff3', 'url': 'http://zfin.org/downloads/E_zfin_gene_alias.gff3'}, 'gene_marker_rel': {'columns': ['Gene ID', 'Gene SO ID', 'Gene Symbol', 'Marker ID', 'Marker SO ID', 'Marker Symbol', 'Relationship'], 'file': 'gene_marker_relationship.txt', 'url': 'http://zfin.org/downloads/gene_marker_relationship.txt'}, 'geno': {'columns': ['Genotype ID', 'Genotype Name', 'Genotye Unique Name', 'Allele ID', 'Allele Name', 'Allele Abbreviation', 'Allele Type', 'Allele Display Type', 'Gene or Construct Symbol', 'Corresponding ZFIN Gene ID/Construct ID', 'Allele Zygosity', 'Construct Name', 'Construct ZdbId'], 'file': 'genotype_features.txt', 'url': 'http://zfin.org/downloads/genotype_features.txt'}, 'human_orthos': {'columns': ['ZFIN ID', 'ZFIN Symbol', 'ZFIN Name', 'Human Symbol', 
'Human Name', 'OMIM ID', 'Gene ID', 'HGNC ID', 'Evidence', 'Pub ID'], 'file': 'human_orthos.txt', 'url': 'http://zfin.org/downloads/human_orthos.txt'}, 'mappings': {'columns': ['ZFIN ID', 'Symbol', 'SO_id', 'Panel Symbol', 'Chromosome', 'Location', 'Metric'], 'file': 'mappings.txt', 'url': 'http://zfin.org/downloads/mappings.txt'}, 'morph': {'columns': ['Gene ID', 'Gene SO ID', 'Gene Symbol', 'Morpholino ID', 'Morpholino SO ID', 'Morpholino Symbol', 'Morpholino Sequence', 'Publication(s)', 'Note'], 'file': 'Morpholinos.txt', 'url': 'http://zfin.org/downloads/Morpholinos.txt'}, 'pheno': {'columns': ['Fish ID', 'Fish Name', 'Start Stage ID', 'Start Stage Name', 'End Stage ID', 'End Stage Name', 'Affected Structure or Process 1 subterm ID', 'Affected Structure or Process 1 subterm Name', 'Post-composed Relationship ID', 'Post-composed Relationship Name', 'Affected Structure or Process 1 superterm ID', 'Affected Structure or Process 1 superterm Name', 'Phenotype Keyword ID', 'Phenotype Keyword Name', 'Phenotype Tag', 'Affected Structure or Process 2 subterm ID', 'Affected Structure or Process 2 subterm name', 'Post-composed Relationship (rel) ID', 'Post-composed Relationship (rel) Name', 'Affected Structure or Process 2 superterm ID', 'Affected Structure or Process 2 superterm name', 'Publication ID', 'Environment ID'], 'file': 'phenotype_fish.txt', 'url': 'http://zfin.org/downloads/phenotype_fish.txt'}, 'pub2pubmed': {'columns': ['Publication ZFIN ID', 'PubMed ID (none or blank when not available)'], 'file': 'pub_to_pubmed_id_translation.txt', 'url': 'http://zfin.org/downloads/pub_to_pubmed_id_translation.txt'}, 'pubs': {'columns': ['Publication ID', 'pubMed ID (none or blank when not available)', 'Authors', 'Title', 'Journal', 'Year', 'Volume', 'Pages'], 'file': 'zfinpubs.txt', 'url': 'http://zfin.org/downloads/zfinpubs.txt'}, 'stage': {'columns': ['Stage ID', 'Stage OBO ID', 'Stage Name', 'Begin Hours', 'End Hours'], 'file': 'stage_ontology.txt', 'url': 
'http://zfin.org/Downloads/stage_ontology.txt'}, 'talen': {'columns': ['Gene ID', 'Gene SO ID', 'Gene Symbol', 'TALEN ID', 'TALEN SO ID', 'TALEN Symbol', 'TALEN Target Sequence 1', 'TALEN Target Sequence 2', 'Publication(s)', 'Note'], 'file': 'TALEN.txt', 'url': 'http://zfin.org/downloads/TALEN.txt'}, 'uniprot': {'columns': ['ZFIN ID', 'SO ID', 'Symbol', 'UniProt ID'], 'file': 'uniprot.txt', 'url': 'http://zfin.org/downloads/uniprot.txt'}, 'wild': {'columns': ['Fish ID', 'Fish Name', 'Fish Abbreviation', 'Genotype ID'], 'file': 'wildtypes.txt', 'url': 'http://zfin.org/downloads/wildtypes_fish.txt'}, 'zmine_ortho_evidence': {'columns': ['zfin_gene_num', 'zfin_gene_symbol', 'ortholog_gene_symbol', 'ortholog_ncbigene_num', 'evidence_code', 'zfin_pub_num', 'pubmed_num'], 'file': 'zmine_ortho_evidence.txt', 'url': 'http://0.0.0.0'}, 'zpmap': {'columns': ['iri', 'id'], 'file': 'id_map_zfin.tsv', 'url': 'http://purl.obolibrary.org/obo/zp//id_map_zfin.tsv'}}
static get_orthology_evidence_code(abbrev)

Move to localtt & globaltt.

get_orthology_sources_from_zebrafishmine()

Fetch the ZFIN gene orthology annotations to other species, together with the evidence for each assertion, and write them to a local file to be read by a separate function.

static make_targeted_gene_id(geneid, reagentid)
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses. Returns: None

process_fish(limit=None)

Fish give identifiers to the “effective genotypes” that we create. We can match these by: Fish = (intrinsic) genotype + set of morpholinos

We assume here that the intrinsic genotypes and their parts will be processed separately, prior to calling this function.

Parameters: limit
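Matching fish by “(intrinsic) genotype + set of morpholinos” amounts to keying on the genotype together with an unordered set of reagents. The sketch below is hypothetical: the function name and the placeholder identifiers are not from ZFIN or Dipper.

```python
def fish_key(genotype_id, reagent_ids):
    """Key for an effective genotype: intrinsic genotype plus an unordered reagent set."""
    return (genotype_id, frozenset(reagent_ids))


known_fish = {fish_key('GENO:1', ['MO:1', 'MO:2']): 'FISH:1'}
# Reagent order does not matter when matching.
print(known_fish.get(fish_key('GENO:1', ['MO:2', 'MO:1'])))  # FISH:1
```

Using a frozenset makes the key hashable and order-insensitive, so the same genotype with the same reagents listed in a different order maps to the same fish.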
process_fish_disease_models(limit=None)
process_orthology_evidence(limit)
test_ids = {'allele': ['ZDB-ALT-010426-4', 'ZDB-ALT-010427-8', 'ZDB-ALT-011017-8', 'ZDB-ALT-051005-2', 'ZDB-ALT-051227-8', 'ZDB-ALT-060221-2', 'ZDB-ALT-070314-1', 'ZDB-ALT-070409-1', 'ZDB-ALT-070420-6', 'ZDB-ALT-080528-1', 'ZDB-ALT-080528-6', 'ZDB-ALT-080827-15', 'ZDB-ALT-080908-7', 'ZDB-ALT-090316-1', 'ZDB-ALT-100519-1', 'ZDB-ALT-111024-1', 'ZDB-ALT-980203-1374', 'ZDB-ALT-980203-412', 'ZDB-ALT-980203-465', 'ZDB-ALT-980203-470', 'ZDB-ALT-980203-605', 'ZDB-ALT-980413-636', 'ZDB-ALT-021021-2', 'ZDB-ALT-080728-1', 'ZDB-ALT-100729-1', 'ZDB-ALT-980203-1560', 'ZDB-ALT-001127-6', 'ZDB-ALT-001129-2', 'ZDB-ALT-980203-1091', 'ZDB-ALT-070118-2', 'ZDB-ALT-991005-33', 'ZDB-ALT-020918-2', 'ZDB-ALT-040913-6', 'ZDB-ALT-980203-1827', 'ZDB-ALT-090504-6', 'ZDB-ALT-121218-1'], 'environment': ['ZDB-EXP-050202-1', 'ZDB-EXP-071005-3', 'ZDB-EXP-071227-14', 'ZDB-EXP-080428-1', 'ZDB-EXP-080428-2', 'ZDB-EXP-080501-1', 'ZDB-EXP-080805-7', 'ZDB-EXP-080806-5', 'ZDB-EXP-080806-8', 'ZDB-EXP-080806-9', 'ZDB-EXP-081110-3', 'ZDB-EXP-090505-2', 'ZDB-EXP-100330-7', 'ZDB-EXP-100402-1', 'ZDB-EXP-100402-2', 'ZDB-EXP-100422-3', 'ZDB-EXP-100511-5', 'ZDB-EXP-101025-12', 'ZDB-EXP-101025-13', 'ZDB-EXP-110926-4', 'ZDB-EXP-110927-1', 'ZDB-EXP-120809-5', 'ZDB-EXP-120809-7', 'ZDB-EXP-120809-9', 'ZDB-EXP-120913-5', 'ZDB-EXP-130222-13', 'ZDB-EXP-130222-7', 'ZDB-EXP-130904-2', 'ZDB-EXP-041102-1', 'ZDB-EXP-140822-13', 'ZDB-EXP-041102-1', 'ZDB-EXP-070129-3', 'ZDB-EXP-110929-7', 'ZDB-EXP-100520-2', 'ZDB-EXP-100920-3', 'ZDB-EXP-100920-5', 'ZDB-EXP-090601-2', 'ZDB-EXP-151116-3'], 'fish': ['ZDB-FISH-150901-17912', 'ZDB-FISH-150901-18649', 'ZDB-FISH-150901-26314', 'ZDB-FISH-150901-9418', 'ZDB-FISH-150901-14591', 'ZDB-FISH-150901-9997', 'ZDB-FISH-150901-23877', 'ZDB-FISH-150901-22128', 'ZDB-FISH-150901-14869', 'ZDB-FISH-150901-6695', 'ZDB-FISH-150901-24158', 'ZDB-FISH-150901-3631', 'ZDB-FISH-150901-20836', 'ZDB-FISH-150901-1060', 'ZDB-FISH-150901-8451', 'ZDB-FISH-150901-2423', 'ZDB-FISH-150901-20257', 
'ZDB-FISH-150901-10002', 'ZDB-FISH-150901-12520', 'ZDB-FISH-150901-14833', 'ZDB-FISH-150901-2104', 'ZDB-FISH-150901-6607', 'ZDB-FISH-150901-1409'], 'gene': ['ZDB-GENE-000616-6', 'ZDB-GENE-000710-4', 'ZDB-GENE-030131-2773', 'ZDB-GENE-030131-8769', 'ZDB-GENE-030219-146', 'ZDB-GENE-030404-2', 'ZDB-GENE-030826-1', 'ZDB-GENE-030826-2', 'ZDB-GENE-040123-1', 'ZDB-GENE-040426-1309', 'ZDB-GENE-050522-534', 'ZDB-GENE-060503-719', 'ZDB-GENE-080405-1', 'ZDB-GENE-081211-2', 'ZDB-GENE-091118-129', 'ZDB-GENE-980526-135', 'ZDB-GENE-980526-166', 'ZDB-GENE-980526-196', 'ZDB-GENE-980526-265', 'ZDB-GENE-980526-299', 'ZDB-GENE-980526-41', 'ZDB-GENE-980526-437', 'ZDB-GENE-980526-44', 'ZDB-GENE-980526-481', 'ZDB-GENE-980526-561', 'ZDB-GENE-980526-89', 'ZDB-GENE-990415-181', 'ZDB-GENE-990415-72', 'ZDB-GENE-990415-75', 'ZDB-GENE-980526-44', 'ZDB-GENE-030421-3', 'ZDB-GENE-980526-196', 'ZDB-GENE-050320-62', 'ZDB-GENE-061013-403', 'ZDB-GENE-041114-104', 'ZDB-GENE-030131-9700', 'ZDB-GENE-031114-1', 'ZDB-GENE-990415-72', 'ZDB-GENE-030131-2211', 'ZDB-GENE-030131-3063', 'ZDB-GENE-030131-9460', 'ZDB-GENE-980526-26', 'ZDB-GENE-980526-27', 'ZDB-GENE-980526-29', 'ZDB-GENE-071218-6', 'ZDB-GENE-070912-423', 'ZDB-GENE-011207-1', 'ZDB-GENE-980526-284', 'ZDB-GENE-980526-72', 'ZDB-GENE-991129-7', 'ZDB-GENE-000607-83', 'ZDB-GENE-090504-2'], 'genotype': ['ZDB-GENO-010426-2', 'ZDB-GENO-010427-3', 'ZDB-GENO-010427-4', 'ZDB-GENO-050209-30', 'ZDB-GENO-051018-1', 'ZDB-GENO-070209-80', 'ZDB-GENO-070215-11', 'ZDB-GENO-070215-12', 'ZDB-GENO-070228-3', 'ZDB-GENO-070406-1', 'ZDB-GENO-070712-5', 'ZDB-GENO-070917-2', 'ZDB-GENO-080328-1', 'ZDB-GENO-080418-2', 'ZDB-GENO-080516-8', 'ZDB-GENO-080606-609', 'ZDB-GENO-080701-2', 'ZDB-GENO-080713-1', 'ZDB-GENO-080729-2', 'ZDB-GENO-080804-4', 'ZDB-GENO-080825-3', 'ZDB-GENO-091027-1', 'ZDB-GENO-091027-2', 'ZDB-GENO-091109-1', 'ZDB-GENO-100325-3', 'ZDB-GENO-100325-4', 'ZDB-GENO-100325-5', 'ZDB-GENO-100325-6', 'ZDB-GENO-100524-2', 'ZDB-GENO-100601-2', 'ZDB-GENO-100910-1', 
'ZDB-GENO-111025-3', 'ZDB-GENO-120522-18', 'ZDB-GENO-121210-1', 'ZDB-GENO-130402-5', 'ZDB-GENO-980410-268', 'ZDB-GENO-080307-1', 'ZDB-GENO-960809-7', 'ZDB-GENO-990623-3', 'ZDB-GENO-130603-1', 'ZDB-GENO-001127-3', 'ZDB-GENO-001129-1', 'ZDB-GENO-090203-8', 'ZDB-GENO-070209-1', 'ZDB-GENO-070118-1', 'ZDB-GENO-140529-1', 'ZDB-GENO-070820-1', 'ZDB-GENO-071127-3', 'ZDB-GENO-000209-20', 'ZDB-GENO-980202-1565', 'ZDB-GENO-010924-10', 'ZDB-GENO-010531-2', 'ZDB-GENO-090504-5', 'ZDB-GENO-070215-11', 'ZDB-GENO-121221-1'], 'morpholino': ['ZDB-MRPHLNO-041129-1', 'ZDB-MRPHLNO-041129-2', 'ZDB-MRPHLNO-041129-3', 'ZDB-MRPHLNO-050308-1', 'ZDB-MRPHLNO-050308-3', 'ZDB-MRPHLNO-060508-2', 'ZDB-MRPHLNO-070118-1', 'ZDB-MRPHLNO-070522-3', 'ZDB-MRPHLNO-070706-1', 'ZDB-MRPHLNO-070725-1', 'ZDB-MRPHLNO-070725-2', 'ZDB-MRPHLNO-071005-1', 'ZDB-MRPHLNO-071227-1', 'ZDB-MRPHLNO-080307-1', 'ZDB-MRPHLNO-080428-1', 'ZDB-MRPHLNO-080430-1', 'ZDB-MRPHLNO-080919-4', 'ZDB-MRPHLNO-081110-3', 'ZDB-MRPHLNO-090106-5', 'ZDB-MRPHLNO-090114-1', 'ZDB-MRPHLNO-090505-1', 'ZDB-MRPHLNO-090630-11', 'ZDB-MRPHLNO-090804-1', 'ZDB-MRPHLNO-100728-1', 'ZDB-MRPHLNO-100823-6', 'ZDB-MRPHLNO-101105-3', 'ZDB-MRPHLNO-110323-3', 'ZDB-MRPHLNO-111104-5', 'ZDB-MRPHLNO-130222-4', 'ZDB-MRPHLNO-080430', 'ZDB-MRPHLNO-100823-6', 'ZDB-MRPHLNO-140822-1', 'ZDB-MRPHLNO-100520-4', 'ZDB-MRPHLNO-100520-5', 'ZDB-MRPHLNO-100920-3', 'ZDB-MRPHLNO-050604-1', 'ZDB-CRISPR-131113-1', 'ZDB-MRPHLNO-140430-12', 'ZDB-MRPHLNO-140430-13'], 'pub': ['PMID:11566854', 'PMID:12588855', 'PMID:12867027', 'PMID:14667409', 'PMID:15456722', 'PMID:16914492', 'PMID:17374715', 'PMID:17545503', 'PMID:17618647', 'PMID:17785424', 'PMID:18201692', 'PMID:18358464', 'PMID:18388326', 'PMID:18638469', 'PMID:18846223', 'PMID:19151781', 'PMID:19759004', 'PMID:19855021', 'PMID:20040115', 'PMID:20138861', 'PMID:20306498', 'PMID:20442775', 'PMID:20603019', 'PMID:21147088', 'PMID:21893049', 'PMID:21925157', 'PMID:22718903', 'PMID:22814753', 'PMID:22960038', 'PMID:22996643', 
'PMID:23086717', 'PMID:23203810', 'PMID:23760954', 'ZFIN:ZDB-PUB-140303-33', 'ZFIN:ZDB-PUB-140404-9', 'ZFIN:ZDB-PUB-080902-16', 'ZFIN:ZDB-PUB-101222-7', 'ZFIN:ZDB-PUB-140614-2', 'ZFIN:ZDB-PUB-120927-26', 'ZFIN:ZDB-PUB-100504-5', 'ZFIN:ZDB-PUB-140513-341']}
dipper.sources.ZFINSlim module
class dipper.sources.ZFINSlim.ZFINSlim(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.Source.Source

A ZFIN model, in the style of the MGI one, containing only gene-to-phenotype associations. It uses the file https://zfin.org/downloads/phenoGeneCleanData_fish.txt

fetch(is_dl_forced=False)

Abstract method to fetch all data from an external resource. This should be overridden by subclasses. Returns: None

files = {'g2p_clean': {'columns': ['ID', 'Gene Symbol', 'Gene ID', 'Affected Structure or Process 1 subterm ID', 'Affected Structure or Process 1 subterm Name', 'Post-composed Relationship ID', 'Post-composed Relationship Name', 'Affected Structure or Process 1 superterm ID', 'Affected Structure or Process 1 superterm Name', 'Phenotype Keyword ID', 'Phenotype Keyword Name', 'Phenotype Tag', 'Affected Structure or Process 2 subterm ID', 'Affected Structure or Process 2 subterm name', 'Post-composed Relationship (rel) ID', 'Post-composed Relationship (rel) Name', 'Affected Structure or Process 2 superterm ID', 'Affected Structure or Process 2 superterm name', 'Fish ID', 'Fish Display Name', 'Start Stage ID', 'End Stage ID', 'Fish Environment ID', 'Publication ID', 'Figure ID'], 'file': 'phenoGeneCleanData_fish.txt', 'url': 'https://zfin.org/downloads/phenoGeneCleanData_fish.txt'}, 'zpmap': {'columns': ['iri', 'id'], 'file': 'id_map_zfin.tsv', 'url': 'http://purl.obolibrary.org/obo/zp//id_map_zfin.tsv'}}
parse(limit=None)

Abstract method to parse all data from an external resource that was fetched in fetch(). This should be overridden by subclasses. Returns: None

dipper.utils package
Submodules
dipper.utils.CurieUtil module
class dipper.utils.CurieUtil.CurieUtil(curie_map)

Bases: object

Create compact URIs (CURIEs).

get_base()
get_curie(uri)

Get a CURIE from a URI

get_curie_prefix(uri)

Return the CURIE’s prefix.

get_uri(curie)

Get a URI from a CURIE

prefix_exists(pfx)
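Conceptually, `get_uri` and `get_curie` are inverse lookups over the `curie_map` of prefixes to base URIs. The class below is a self-contained sketch of that idea, not Dipper's implementation; the class name and the longest-match rule for overlapping bases are assumptions.

```python
class CurieSketch:
    """Expand CURIEs to URIs and contract URIs to CURIEs using a prefix map."""

    def __init__(self, curie_map):
        self.curie_map = dict(curie_map)  # prefix -> base URI

    def get_uri(self, curie):
        prefix, _, local_id = curie.partition(':')
        base = self.curie_map.get(prefix)
        return None if base is None else base + local_id

    def get_curie(self, uri):
        # Prefer the longest matching base in case one base is a prefix of another.
        matches = [(base, prefix) for prefix, base in self.curie_map.items()
                   if uri.startswith(base)]
        if not matches:
            return None
        base, prefix = max(matches, key=lambda pair: len(pair[0]))
        return prefix + ':' + uri[len(base):]


curies = CurieSketch({'NCBIGene': 'https://www.ncbi.nlm.nih.gov/gene/'})
print(curies.get_uri('NCBIGene:7157'))  # https://www.ncbi.nlm.nih.gov/gene/7157
print(curies.get_curie('https://www.ncbi.nlm.nih.gov/gene/7157'))  # NCBIGene:7157
```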
dipper.utils.DipperUtil module
class dipper.utils.DipperUtil.DipperUtil

Bases: object

Various utilities and quick methods used in this application

(A little too quick.) Per https://www.ncbi.nlm.nih.gov/books/NBK25497/, NCBI recommends that users post no more than three URL requests per second and limit large jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time during weekdays.

Restructuring to make bulk queries is less likely to result in another ban for peppering them with one-off requests.
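One way to respect NCBI's three-requests-per-second guidance on the client side is a sliding-window limiter called before each request. This is an illustrative sketch, not code from DipperUtil; RateLimiter and its clock and sleep parameters are hypothetical, the latter two being hooks that make it testable without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls: int = 3, period: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()

    def _expire(self, now: float) -> None:
        # Drop timestamps that have fallen out of the current window
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def wait(self) -> None:
        """Block until another request is allowed, then record it."""
        now = self._clock()
        self._expire(now)
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._expire(now)
        self._calls.append(now)
```

Usage: create one limiter per service and call limiter.wait() immediately before each request. Batching IDs into fewer bulk queries, as the note above suggests, reduces how often the limiter has to pause at all.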

static get_hgnc_id_from_symbol(gene_symbol)

Get the HGNC CURIE for a gene symbol using the Monarch and MyGene services. :param gene_symbol: :return:

static get_homologene_by_gene_num(gene_num)
static get_ncbi_taxon_num_by_label(label)

Look up the NCBI Taxon ID using a label. A result is returned only if there is a unique hit.

Returns:
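The unique-hit rule can be sketched independently of the lookup service. In this hypothetical version, an in-memory mapping from taxon number to label stands in for the real data source, and the case-insensitive comparison is an assumption.

```python
from typing import Mapping, Optional


def taxon_num_by_label(label: str, taxon_labels: Mapping[str, str]) -> Optional[str]:
    """Return the taxon number whose label matches, or None unless exactly one matches."""
    wanted = label.strip().lower()  # case-insensitive comparison is an assumption
    hits = [num for num, name in taxon_labels.items() if name.strip().lower() == wanted]
    # Ambiguous (several hits) and missing (no hits) are both treated as "no answer"
    return hits[0] if len(hits) == 1 else None
```

Returning None for ambiguous labels avoids silently picking one of several candidate taxa.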
static is_id_in_mondo(curie, mondo_min)
Parameters:
Returns:

Boolean, True if the ID is in MONDO and False otherwise.

static remove_control_characters(string)
Filters out characters in any of these Unicode categories:
  • [Cc] Other, Control (65 characters), …
  • [Cf] Other, Format (151 characters)
  • [Cn] Other, Not Assigned (0 characters; none have this property)
  • [Co] Other, Private Use (6 characters)
  • [Cs] Other, Surrogate (6 characters)
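All five categories listed above share the major class C, so a filter can test the first letter of unicodedata.category(). This is a minimal sketch of that idea, not necessarily how DipperUtil implements it.

```python
import unicodedata


def remove_control_characters(string: str) -> str:
    """Drop every character whose Unicode general category starts with 'C'."""
    # Covers Cc, Cf, Cn, Co and Cs; letters, digits, punctuation and spaces are kept
    return "".join(ch for ch in string if unicodedata.category(ch)[0] != "C")
```

Note that tab and newline are Cc characters, so they are removed too; legitimate non-ASCII letters such as é survive.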
dipper.utils.GraphUtils module
class dipper.utils.GraphUtils.GraphUtils(curie_map)

Bases: object

static add_property_axioms(graph, properties)
static add_property_to_graph(results, graph, property_type, property_list)
static compare_graph_predicates(graph1, graph2)

From two RDF graphs, count the predicates in each and return a dict of counts.
:param graph1: graph, hopefully RDFLib-like
:param graph2: graph, ditto
:return: dict with the count of each predicate in each graph, e.g.:

{"has_a_property": {"graph1": 1234, "graph2": 1023},
 "has_another_property": {"graph1": 94, "graph2": 51}}

static count_predicates(graph)

From an RDF graph, count the predicates and return a dict of counts.
:param graph: graph
:return: dict with the count of each predicate in the graph, e.g.:

{"has_a_property": 1234, "has_another_property": 482}
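Because iterating an RDFLib Graph yields (subject, predicate, object) triples, both counts can be sketched over any iterable of triples. These functions are illustrative, not the GraphUtils source; reporting 0 for a predicate absent from one graph is an assumption.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Triple = Tuple[Hashable, Hashable, Hashable]


def count_predicates(graph: Iterable[Triple]) -> Dict[Hashable, int]:
    """Count how often each predicate occurs in an iterable of (s, p, o) triples."""
    return dict(Counter(p for _s, p, _o in graph))


def compare_graph_predicates(graph1: Iterable[Triple],
                             graph2: Iterable[Triple]) -> Dict[Hashable, Dict[str, int]]:
    """Per-predicate counts for two graphs; a predicate missing from one graph counts as 0."""
    counts1 = count_predicates(graph1)
    counts2 = count_predicates(graph2)
    return {
        pred: {"graph1": counts1.get(pred, 0), "graph2": counts2.get(pred, 0)}
        for pred in set(counts1) | set(counts2)
    }
```

Plain tuples stand in for RDF terms here, which keeps the sketch runnable without RDFLib installed.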

static digest_id(wordage)

Form a deterministic digest of the input. The leading ‘b’ is an experiment that forces the first character to be non-numeric but still valid hex. This is not required for RDF, but some other contexts do not want the leading character to be a digit.

:param str wordage: arbitrary string :return: str
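A sketch of the idea: hash the input, then overwrite the first hex digit with ‘b’ so the result never starts with a digit. The choice of SHA-1 and a 20-character result are assumptions for illustration; the hash and length used by GraphUtils.digest_id may differ.

```python
import hashlib


def digest_id(wordage: str, length: int = 20) -> str:
    """Deterministic hex digest whose first character is always 'b'."""
    # SHA-1 over the UTF-8 bytes is an assumption made for this sketch
    hexdigest = hashlib.sha1(wordage.encode("utf-8")).hexdigest()
    # Replace the first hex char with 'b': still valid hex, never a leading digit
    return "b" + hexdigest[1:length]
```

Because the output depends only on the input string, the same label always maps to the same identifier across runs.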

static get_properties_from_graph(graph)

Wrapper for RDFLib.graph.predicates() that returns a unique set. :param graph: RDFLib.graph :return: set, set of properties

static write(graph, fileformat=None, filename=None)

A basic graph writer (to stdout) for any of the sources. This writes raw triples in RDF/XML unless another format is specified; to write Turtle, specify fileformat=’turtle’. An optional file can be supplied instead of stdout. :return: None

dipper.utils.TestUtils module
class dipper.utils.TestUtils.TestUtils

Bases: object

static remove_ontology_axioms(graph)

Given an rdflib graph, remove any triples connected to an ontology node: {} a owl:Ontology :param graph: RDFGraph :return: None

static test_graph_equality(turtlish, graph)
Parameters:
  • turtlish – file path or string of triples in turtle format without prefix header
  • graph – Graph object to test against
Returns:

Boolean, True if the graphs contain the same set of triples.

dipper.utils.rdf2dot module

A fork of rdflib rdf2dot utility, see https://rdflib.readthedocs.io/en/stable/_modules/rdflib/tools/rdf2dot.html#rdf2dot

We apply the formatliteral function to labels, but otherwise the code is the same.

This is necessary for variants with HGVS primary labels that contain characters that need to be URL-encoded (<, >).

It also replaces cgi.escape with html.escape.

TODO: make a PR.

dipper.utils.rdf2dot.rdf2dot(g, stream, graph_opts=None)

Convert the RDF graph to DOT and write the DOT output to the stream.
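To see why HGVS labels need escaping, consider a label containing ‘>’ placed inside an HTML-like Graphviz label. The sketch below uses html.escape (which this fork uses in place of cgi.escape); dot_node is a hypothetical helper and does not reproduce formatliteral or the fork's actual output.

```python
import html


def dot_node(node_id: str, label: str) -> str:
    """Render one DOT node statement whose HTML-like label is safely escaped."""
    # html.escape turns <, > and & into entities so Graphviz does not
    # mistake parts of an HGVS label for markup
    return f"{node_id} [label=<{html.escape(label)}>];"
```

For example, dot_node("n1", "c.215C>G") produces n1 [label=<c.215C&gt;G>]; where the unescaped ‘>’ would otherwise close the label early.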

dipper.utils.romanplus module

Convert to and from Roman numerals. This program is part of “Dive Into Python”, a free Python tutorial for experienced programmers. Visit http://diveintopython.org/ for the latest version.

This program is free software; you can redistribute it and/or modify it under the terms of the Python 2.1.1 license, available at http://www.python.org/2.1.1/license.html

Note: this has been modified by nlw to allow optional characters after the initial Roman numerals.

dipper.utils.romanplus.fromRoman(strng)

Convert a Roman numeral to an integer.

dipper.utils.romanplus.toRoman(num)

Convert an integer to a Roman numeral.
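The conversion can be sketched with a single value table walked greedily in both directions. This hypothetical to_roman/from_roman pair validates by round-tripping rather than with a regular expression, and it does not reproduce nlw's extension that accepts optional trailing characters.

```python
ROMAN_NUMERALS = (
    ("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
    ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
    ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1),
)


def to_roman(num: int) -> str:
    """Convert an integer in 1..3999 to a Roman numeral."""
    if not 0 < num < 4000:
        raise ValueError("number out of range (must be 1..3999)")
    parts = []
    for numeral, value in ROMAN_NUMERALS:
        # Greedily take the largest value that still fits
        while num >= value:
            parts.append(numeral)
            num -= value
    return "".join(parts)


def from_roman(strng: str) -> int:
    """Convert a canonical Roman numeral back to an integer."""
    result = 0
    index = 0
    for numeral, value in ROMAN_NUMERALS:
        while strng.startswith(numeral, index):
            result += value
            index += len(numeral)
    # Reject leftovers and non-canonical forms such as "IIII" by round-tripping
    if index != len(strng) or result == 0 or result >= 4000 or to_roman(result) != strng:
        raise ValueError(f"invalid Roman numeral: {strng!r}")
    return result
```

For example, to_roman(1994) returns "MCMXCIV", and from_roman("MCMXCIV") returns 1994.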

Submodules
dipper.config module
dipper.config.conf = {'keys': {'omim': ''}}

Load the configuration file ‘conf.yaml’, if it exists. It isn’t always required, but may be for some sources. conf.yaml may contain sensitive information and should not live in a public repository.
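A sketch of optional config loading that falls back to the documented default ({'keys': {'omim': ''}}) when conf.yaml is absent. The load_config name, the shallow merge over defaults, and the use of PyYAML's safe_load are assumptions; dipper.config may behave differently.

```python
import copy
import os
from typing import Any, Dict

# Mirrors the documented default value of dipper.config.conf
DEFAULT_CONF: Dict[str, Any] = {"keys": {"omim": ""}}


def load_config(path: str = "conf.yaml") -> Dict[str, Any]:
    """Return settings from `path`, or a copy of the defaults if the file is absent."""
    conf = copy.deepcopy(DEFAULT_CONF)  # never hand out the shared default object
    if not os.path.exists(path):
        return conf
    import yaml  # PyYAML, only needed when a config file is actually present
    with open(path, encoding="utf-8") as fh:
        loaded = yaml.safe_load(fh) or {}
    conf.update(loaded)  # shallow merge: top-level keys in the file win
    return conf
```

Keeping the file optional matches the note above: sources that need secrets (such as an OMIM key) read them from conf.yaml, which stays out of version control.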

dipper.config.get_config()
dipper.curie_map module

Acroname central

Load the CURIE mapping file ‘curie_map.yaml’; it is necessary for most resources.

dipper.curie_map.get()
dipper.curie_map.get_base()

Source APIs

Indices and tables