8

How can I save a dendrogram generated by scipy into Newick format?

El Dude
  • 5,328
  • 11
  • 54
  • 101

1 Answers1

22

You need the linkage matrix Z, which is the input to the scipy dendrogram function, and convert that to Newick format. Additionally, you need a list 'leaf_names' with the names of your leaves. Here is a function that will do the job:

def get_newick(node, parent_dist, leaf_names, newick='') -> str:
    """
    Convert sciply.cluster.hierarchy.to_tree()-output to Newick format.

    :param node: output of sciply.cluster.hierarchy.to_tree()
    :param parent_dist: output of sciply.cluster.hierarchy.to_tree().dist
    :param leaf_names: list of leaf names
    :param newick: leave empty, this variable is used in recursion.
    :returns: tree in Newick format
    """
    if node.is_leaf():
        return "%s:%.2f%s" % (leaf_names[node.id], parent_dist - node.dist, newick)
    else:
        if len(newick) > 0:
            newick = "):%.2f%s" % (parent_dist - node.dist, newick)
        else:
            newick = ");"
        newick = get_newick(node.get_left(), node.dist, leaf_names, newick=newick)
        newick = get_newick(node.get_right(), node.dist, leaf_names, newick=",%s" % (newick))
        newick = "(%s" % (newick)
        return newick

tree = hierarchy.to_tree(Z, False)
get_newick(tree, tree.dist, leaf_names)
MrTomRod
  • 345
  • 2
  • 16
jfn
  • 2,486
  • 1
  • 13
  • 14