dipper.sources.OrthoXML module

class dipper.sources.OrthoXML.OrthoXML(graph_type, are_bnodes_skolemized, method, tax_ids=None)

Bases: dipper.sources.Source.Source

Extract the induced pairwise relations from an OrthoXML file.

This base class is primarily intended to extract the orthologous and paralogous relations from an OrthoXML file containing the QfO reference species data set.

A concrete method should subclass this class and override the constructor to provide the information about the dataset and a method name.
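As a rough illustration of the subclassing pattern, the sketch below uses a stand-in base class that only mirrors the constructor signature documented above, so that it runs without dipper installed. The subclass name ExampleMethod, the graph_type value, and the contents of files are hypothetical; the keys that the real files attribute expects are not documented in this section.

```python
class OrthoXMLBase:
    """Stand-in for dipper.sources.OrthoXML.OrthoXML (same constructor signature only)."""

    def __init__(self, graph_type, are_bnodes_skolemized, method, tax_ids=None):
        self.graph_type = graph_type
        self.are_bnodes_skolemized = are_bnodes_skolemized
        self.method = method
        self.tax_ids = tax_ids


class ExampleMethod(OrthoXMLBase):
    """Hypothetical concrete method: supplies the dataset description and method name."""

    # Assumed layout of the dataset description; purely illustrative.
    files = {
        "example": {
            "file": "example.orthoxml",
            "url": "https://example.org/example.orthoxml",
        }
    }

    def __init__(self, graph_type, are_bnodes_skolemized, tax_ids=None):
        # The subclass fixes the method name so callers do not have to pass it.
        super().__init__(
            graph_type, are_bnodes_skolemized, method="ExampleMethod", tax_ids=tax_ids
        )


source = ExampleMethod("rdf_graph", True, tax_ids=["9606", "10090"])
print(source.method, source.tax_ids)
```

In real use the subclass would derive from dipper.sources.OrthoXML.OrthoXML instead of the stand-in.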

add_protein_to_graph

Adds protein nodes to the graph and adds an “in_taxon” triple.

For efficiency reasons, the proteins that have already been added are tracked in a least recently used (LRU) cache.
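The caching idea can be sketched with the standard library's functools.lru_cache. This is not the dipper implementation: the triples list is a stand-in for the graph, and the function signature and predicate labels are assumptions made for the example.

```python
from functools import lru_cache

triples = []  # stand-in for the RDF graph (assumption for this sketch)


@lru_cache(maxsize=1024)
def add_protein_to_graph(protein_id, taxon_id):
    """Add a protein node and its in_taxon triple; repeated calls hit the cache."""
    triples.append((protein_id, "rdf:type", "protein"))
    triples.append((protein_id, "in_taxon", taxon_id))
    return protein_id


add_protein_to_graph("P1", "NCBITaxon:9606")
add_protein_to_graph("P1", "NCBITaxon:9606")   # cached: nothing is appended
add_protein_to_graph("P2", "NCBITaxon:10090")

print(len(triples))                            # two triples per distinct protein
print(add_protein_to_graph.cache_info().hits)  # number of skipped additions
```

With a bounded cache, an evicted protein may be added a second time. In an RDF graph that is harmless, because adding an existing triple again does not create a duplicate; the plain list used here would not have that property.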

clean_protein_id(protein_id)

Makes sure protein_id is properly prefixed.
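One plausible reading of "properly prefixed" is that an identifier without a CURIE-style prefix gets a default one. The sketch below assumes exactly that; the default_prefix parameter and the "UniProtKB" value are hypothetical and are not taken from dipper.

```python
def clean_protein_id(protein_id, default_prefix="UniProtKB"):
    """Return protein_id unchanged if it already has a prefix, else add one.

    The default prefix is an assumption for this sketch.
    """
    protein_id = protein_id.strip()
    if ":" in protein_id:
        return protein_id
    return f"{default_prefix}:{protein_id}"


print(clean_protein_id("P12345"))
print(clean_protein_id("ENSEMBL:ENSP00000000001"))
```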

extract_taxon_info(gene_node)

Extracts the NCBI taxon ID from a gene_node.

The default implementation walks up to the enclosing species node in the XML and extracts the ID from the attribute at that node.
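The walk up to the species node might look roughly like the following, using only xml.etree.ElementTree. Because ElementTree elements carry no parent pointer, the sketch builds a child-to-parent map first; the embedded document and the parent_of helper are assumptions for the example, and the actual dipper code may rely on a different XML library.

```python
import xml.etree.ElementTree as ET

NS = "{http://orthoXML.org/2011/}"  # OrthoXML namespace

DOC = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3"
          origin="example" originVersion="1">
  <species name="Homo sapiens" NCBITaxId="9606">
    <database name="example" version="1">
      <genes>
        <gene id="1" protId="P1"/>
      </genes>
    </database>
  </species>
</orthoXML>"""

root = ET.fromstring(DOC)
# Map every element to its parent so we can walk upwards from a gene.
parent_of = {child: parent for parent in root.iter() for child in parent}


def extract_taxon_info(gene_node):
    """Climb from a gene element to its species element and read NCBITaxId."""
    node = gene_node
    while node is not None and node.tag != NS + "species":
        node = parent_of.get(node)
    if node is None:
        raise ValueError("gene is not nested inside a species element")
    return node.get("NCBITaxId")


gene = root.find(f".//{NS}gene")
taxon_id = extract_taxon_info(gene)
print(taxon_id)
```

A subclass can override extract_taxon_info when the taxon ID is stored elsewhere in its input.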

fetch(is_dl_forced=False)

Returns: None

files = {}

parse(limit=None)

Returns: None

class dipper.sources.OrthoXML.OrthoXMLParser(xml)

Bases: object

default_node_list()
extract_pairwise_relations(node=None)
get_children(node)
is_internal_node(node)
is_leaf(node)
leaf_label(leaf)
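The notion of induced pairwise relations can be illustrated on a small nested group structure: two genes that sit in different children of an ortholog group are orthologs, and two genes in different children of a paralog group are paralogs. The sketch below is not the OrthoXMLParser implementation. It works on a hypothetical tuple-based tree rather than XML, and only the helper names loosely mirror the method list above.

```python
from itertools import combinations, product

# Hypothetical tree: a leaf is a gene label, an internal node is (kind, children).
tree = ("ortholog", [
    "HUMAN_A",
    ("paralog", ["MOUSE_A1", "MOUSE_A2"]),
])


def is_leaf(node):
    return isinstance(node, str)


def leaves(node):
    """Collect all gene labels below a node."""
    if is_leaf(node):
        return [node]
    return [leaf for child in node[1] for leaf in leaves(child)]


def extract_pairwise_relations(node):
    """Yield (gene_a, gene_b, kind) for every pair split at an internal node."""
    if is_leaf(node):
        return
    kind, children = node
    # Genes from two different children are related by this node's kind.
    for left, right in combinations(children, 2):
        for a, b in product(leaves(left), leaves(right)):
            yield (a, b, kind)
    # Pairs inside a single child are decided further down the tree.
    for child in children:
        yield from extract_pairwise_relations(child)


relations = list(extract_pairwise_relations(tree))
for relation in relations:
    print(relation)
```

Each pair of genes is reported exactly once, at the deepest node that contains both of them.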