dipper.sources.FlyBase module
-
class dipper.sources.FlyBase.FlyBase(graph_type, are_bnodes_skolemized, data_release_version=None)
Bases: dipper.sources.PostgreSQLSource.PostgreSQLSource
This is the [Drosophila Genetics](http://www.flybase.org/) resource, from which we process genotype and phenotype data about the fruit fly.
Here, we connect to their public database and download preprocessed files.
Queries from the relational db:
1. allele-phenotype data: ../../resources/sql/fb/allele_phenotype.sql
2. gene dbxrefs: ../../resources/sql/fb/gene_xref.sql
Downloads:
1. allele_human_disease_model_data_fb_*.tsv.gz - models of disease
2. species.ab.gz - species prefix mappings
3. fbal_to_fbgn_fb*.tsv.gz - allele to gene
4. fbrf_pmid_pmcid_doi_fb_*.tsv.gz - flybase ref to pmid
We connect using the [Direct Chado Access](http://gmod.org/wiki/Public_Chado_Databases#Direct_Chado_Access).
When processing the whole set, this source performs best when dumping raw triples with the flag `--format nt`.
Note that this script underwent a major revision after commit bd5f555, in which genotypes, stocks, and environments were removed.
-
CURREL = 'releases/current/precomputed_files'
-
FLYFTP = 'ftp.flybase.net'
-
fetch(is_dl_forced=False)
Fetch flat files and sql queries
Parameters: is_dl_forced – force download
Returns: None
-
files = {
    'allele_gene': {'columns': ['AlleleID', 'AlleleSymbol', 'GeneID', 'GeneSymbol'], 'file': 'fbal_to_fbgn_fb.tsv.gz', 'url': 'releases/current/precomputed_files/alleles/fbal_to_fbgn.*tsv\\.gz$'},
    'disease_model': {'columns': ['FBgn ID', 'Gene symbol', 'HGNC ID', 'DO qualifier', 'DO ID', 'DO term', 'Allele used in model (FBal ID)', 'Allele used in model (symbol)', 'Based on orthology with (HGNC ID)', 'Based on orthology with (symbol)', 'Evidence/interacting alleles', 'Reference (FBrf ID)'], 'file': 'disease_model_annotations.tsv.gz', 'url': 'releases/current/precomputed_files/human_disease/disease_model_annotations.+tsv\\.gz$'},
    'ref_pubmed': {'columns': ['FBrf', 'PMID', 'PMCID', 'DOI', 'pub_type', 'miniref', 'pmid_added'], 'file': 'fbrf_pmid_pmcid_doi_fb.tsv.gz', 'url': 'releases/current/precomputed_files/references/fbrf_pmid_pmcid_doi.+tsv\\.gz$'},
    'species_map': {'columns': ['internal_id', 'taxgroup', 'abbreviation', 'genus', 'species name', 'common name', 'comment', 'ncbi-taxon-id'], 'file': 'species.ab.gz', 'url': 'releases/current/precomputed_files/species/species\\.ab\\.gz$'}}
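Each `url` value in `files` is a regular expression rather than a literal path, presumably because release files carry a version suffix. The sketch below shows one way such a pattern could be resolved against a directory listing. It is not necessarily how `fetch` works: the helper `resolve_remote_name` and the sample file names are hypothetical.

```python
import re

# One 'url' entry copied from the `files` attribute; the tail is a regex.
ALLELE_GENE_URL = r'releases/current/precomputed_files/alleles/fbal_to_fbgn.*tsv\.gz$'


def resolve_remote_name(url_pattern, listing):
    """Return (directory, matching file names) for a regex-style url.

    Hypothetical helper: splits the pattern at the last '/', treats the
    tail as a regular expression, and matches it against `listing`.
    """
    directory, name_pattern = url_pattern.rsplit('/', 1)
    name_re = re.compile(name_pattern)
    matches = sorted(name for name in listing if name_re.match(name))
    return directory, matches


# Made-up directory listing; real release file names may differ.
listing = [
    'fbal_to_fbgn_fb_2019_05.tsv.gz',
    'fbal_to_fbgn_fb_2019_05.tsv.gz.md5',
    'README',
]
directory, matches = resolve_remote_name(ALLELE_GENE_URL, listing)
```

Because the pattern ends in `$`, the checksum file in the sample listing is excluded and only the versioned `.tsv.gz` name is kept.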
-
parse(limit=None)
Parse flybase files and add to graph
Parameters: limit – number of rows to process
Returns: None
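The `limit` parameter is described only as a number of rows to process. As a rough illustration of that idea, and not the actual `parse` implementation, a row cap can be applied with `itertools.islice`. The helper `read_rows` is hypothetical and the sample rows are made up.

```python
import csv
import io
from itertools import islice


def read_rows(handle, limit=None):
    """Yield at most `limit` tab-separated rows; None means no cap."""
    reader = csv.reader(handle, delimiter='\t')
    # islice(reader, None) runs to exhaustion, so limit=None reads every row.
    yield from islice(reader, limit)


# Made-up two-column sample standing in for a downloaded file.
sample_text = "FBal0000001\ta\nFBal0000002\tb\nFBal0000003\tc\n"
first_two = list(read_rows(io.StringIO(sample_text), limit=2))
all_rows = list(read_rows(io.StringIO(sample_text)))
```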
-
queries = {
    'allele_phenotype': {'columns': ['allele_id', 'pheno_desc', 'pheno_type', 'pub_id', 'pub_title', 'pmid_id'], 'file': 'allele_phenotype.tsv', 'query': '../../resources/sql/fb/allele_phenotype.sql'},
    'gene_xref': {'columns': ['gene_id', 'xref_id', 'xref_source'], 'file': 'gene_xref.tsv', 'query': '../../resources/sql/fb/gene_xref.sql'}}
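Each entry in `queries` pairs an output file with the column names it is expected to contain. A minimal sketch of how such a column list could be used to validate and read a tab-separated dump follows. It assumes the file starts with a header row, which this page does not state, and `read_query_dump` is a hypothetical helper with made-up sample data.

```python
import csv
import io

# Column list copied from the 'gene_xref' entry of `queries`.
GENE_XREF_COLUMNS = ['gene_id', 'xref_id', 'xref_source']


def read_query_dump(handle, expected_columns):
    """Check the header row, then return each data row as a dict."""
    reader = csv.reader(handle, delimiter='\t')
    header = next(reader)
    if header != expected_columns:
        raise ValueError(f'unexpected header: {header!r}')
    return [dict(zip(expected_columns, row)) for row in reader]


# Made-up dump content; the identifiers are placeholders.
dump = io.StringIO("gene_id\txref_id\txref_source\nFBgn0000001\tX1\tExampleDB\n")
rows = read_query_dump(dump, GENE_XREF_COLUMNS)
```

Failing fast on a header mismatch is one way to notice when an upstream query or file layout has changed.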
-