dipper.sources.FlyBase module

class dipper.sources.FlyBase.FlyBase(graph_type, are_bnodes_skolemized, data_release_version=None)

Bases: dipper.sources.PostgreSQLSource.PostgreSQLSource

This is the [Drosophila Genetics](http://www.flybase.org/) resource, from which we process genotype and phenotype data about the fruit fly.

Here, we connect to their public database and also download preprocessed files.

Queries from the relational db:

1. allele-phenotype data: ../../resources/sql/fb/allele_phenotype.sql
2. gene dbxrefs: ../../resources/sql/fb/gene_xref.sql

Downloads:

1. allele_human_disease_model_data_fb_*.tsv.gz - models of disease
2. species.ab.gz - species prefix mappings
3. fbal_to_fbgn_fb*.tsv.gz - allele to gene
4. fbrf_pmid_pmcid_doi_fb_*.tsv.gz - flybase ref to pmid

We connect using the [Direct Chado Access](http://gmod.org/wiki/Public_Chado_Databases#Direct_Chado_Access).

When processing the whole set, performance is best when raw triples are dumped using the flag `--format nt`.
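For readers unfamiliar with the `nt` format: N-Triples is line-oriented, with one complete triple per line, so triples can be written out as they are produced. Whether that is the reason for the speedup noted above is an assumption here, not something this page states. The sketch below is not dipper code; `to_ntriple` is a hypothetical helper and the IRIs are placeholders chosen for illustration.

```python
def to_ntriple(subject_iri, predicate_iri, obj, is_literal=False):
    """Serialize one triple as a single N-Triples line (hypothetical helper)."""
    if is_literal:
        # Escape the characters that may not appear raw inside a literal.
        escaped = (obj.replace('\\', '\\\\')
                      .replace('"', '\\"')
                      .replace('\n', '\\n')
                      .replace('\r', '\\r'))
        obj_term = '"' + escaped + '"'
    else:
        obj_term = '<' + obj + '>'
    return '<{}> <{}> {} .'.format(subject_iri, predicate_iri, obj_term)


# Placeholder IRIs, for illustration only.
example_line = to_ntriple(
    'http://example.org/allele/1',
    'http://www.w3.org/2000/01/rdf-schema#label',
    'w[1]',
    is_literal=True)
```

Because each line stands alone, output in this shape can be appended to a file or concatenated across runs without re-serializing a whole graph.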

Note that this script underwent a major revision after commit bd5f555, in which genotypes, stocks, and environments were removed.

CURREL = 'releases/current/precomputed_files'
FLYFTP = 'ftp.flybase.net'
fetch(is_dl_forced=False)

Fetch flat files and sql queries

Parameters: is_dl_forced – force download
Returns: None
files = {
    'allele_gene': {
        'columns': ['AlleleID', 'AlleleSymbol', 'GeneID', 'GeneSymbol'],
        'file': 'fbal_to_fbgn_fb.tsv.gz',
        'url': 'releases/current/precomputed_files/alleles/fbal_to_fbgn.*tsv\\.gz$'},
    'disease_model': {
        'columns': ['FBgn ID', 'Gene symbol', 'HGNC ID', 'DO qualifier', 'DO ID', 'DO term',
                    'Allele used in model (FBal ID)', 'Allele used in model (symbol)',
                    'Based on orthology with (HGNC ID)', 'Based on orthology with (symbol)',
                    'Evidence/interacting alleles', 'Reference (FBrf ID)'],
        'file': 'disease_model_annotations.tsv.gz',
        'url': 'releases/current/precomputed_files/human_disease/disease_model_annotations.+tsv\\.gz$'},
    'ref_pubmed': {
        'columns': ['FBrf', 'PMID', 'PMCID', 'DOI', 'pub_type', 'miniref', 'pmid_added'],
        'file': 'fbrf_pmid_pmcid_doi_fb.tsv.gz',
        'url': 'releases/current/precomputed_files/references/fbrf_pmid_pmcid_doi.+tsv\\.gz$'},
    'species_map': {
        'columns': ['internal_id', 'taxgroup', 'abbreviation', 'genus', 'species name',
                    'common name', 'comment', 'ncbi-taxon-id'],
        'file': 'species.ab.gz',
        'url': 'releases/current/precomputed_files/species/species\\.ab\\.gz$'}}
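The `url` values in `files` are regular expressions rather than literal paths, presumably because the remote file names carry a release tag that changes between releases. The following is a minimal sketch of how such a pattern could pick one file from a directory listing. It is not taken from dipper: `select_remote_file` is a hypothetical helper, and the listing (including the `fb_2019_05` release tag) is invented for illustration.

```python
import re

# Same pattern as files['allele_gene']['url'], written as a raw string.
ALLELE_GENE_URL = r'releases/current/precomputed_files/alleles/fbal_to_fbgn.*tsv\.gz$'


def select_remote_file(listing, pattern):
    """Return the first path in `listing` matching `pattern`, or None."""
    regex = re.compile(pattern)
    for path in listing:
        if regex.search(path):
            return path
    return None


# Invented listing; real file names and release tags will differ.
listing = [
    'releases/current/precomputed_files/alleles/README',
    'releases/current/precomputed_files/alleles/fbal_to_fbgn_fb_2019_05.tsv.gz',
]
remote_path = select_remote_file(listing, ALLELE_GENE_URL)
```

Under this reading, the `file` value (for example `fbal_to_fbgn_fb.tsv.gz`) would be a stable local name for whichever release-specific file matched.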
parse(limit=None)

Parse flybase files and add to graph

Parameters: limit – number of rows to process
Returns: None
queries = {
    'allele_phenotype': {
        'columns': ['allele_id', 'pheno_desc', 'pheno_type', 'pub_id', 'pub_title', 'pmid_id'],
        'file': 'allele_phenotype.tsv',
        'query': '../../resources/sql/fb/allele_phenotype.sql'},
    'gene_xref': {
        'columns': ['gene_id', 'xref_id', 'xref_source'],
        'file': 'gene_xref.tsv',
        'query': '../../resources/sql/fb/gene_xref.sql'}}
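Each entry in `queries` pairs a SQL file with an output `file` and its expected `columns`. As a rough sketch of how such a tab-separated result could be read with a `limit`, in the spirit of `parse(limit=None)`, consider the code below. It is not dipper's implementation: `read_rows` is a hypothetical helper, the file is assumed to have no header row, and the sample rows are invented.

```python
import csv
import io

# Column names copied from queries['allele_phenotype']['columns'].
ALLELE_PHENOTYPE_COLUMNS = [
    'allele_id', 'pheno_desc', 'pheno_type', 'pub_id', 'pub_title', 'pmid_id']


def read_rows(handle, columns, limit=None):
    """Read tab-separated rows into dicts, stopping after `limit` rows."""
    reader = csv.reader(handle, delimiter='\t')
    rows = []
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(columns):
            raise ValueError(
                'row {}: expected {} columns, got {}'.format(
                    row_number, len(columns), len(row)))
        rows.append(dict(zip(columns, row)))
        if limit is not None and row_number >= limit:
            break
    return rows


# Invented sample data standing in for allele_phenotype.tsv.
sample = (
    'FBal0000001\tsample description\tsample type\tFBrf0000001\tSample title\t1\n'
    'FBal0000002\tother description\tother type\tFBrf0000002\tOther title\t2\n'
)
rows = read_rows(io.StringIO(sample), ALLELE_PHENOTYPE_COLUMNS, limit=1)
```

Checking the column count on every row makes a schema change in the upstream query fail loudly instead of silently shifting values into the wrong fields.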