Evaluate the inferred tree
ScisTree2 provides several metrics to evaluate the results. These include:
Genotype Accuracy:
scistree2.metric.genotype_accuracy(true_genotype, genotype)
Tree Accuracy (defined as 1 minus the normalized Robinson-Foulds distance):
scistree2.metric.tree_accuracy(true_tree, tree)
Ancestor-Descendant Error:
scistree2.metric.ancestor_descendant_error(true_mutation, mutation)
Different Lineage Error:
scistree2.metric.different_lineage_error(true_mutation, mutation)
Usage examples are shown below:
Load the prepared dataset and run inference three times: with SPR local search, with NNI local search, and with neighbor joining (NJ).
import scistree2 as s2
import numpy as np
import pandas as pd
gp = s2.probability.from_csv('./data/toy_raw_reads.csv', source='read')
# SPR local search
caller_spr = s2.ScisTree2(threads=8)
tree_spr, imputed_genotype_spr, likelihood_spr = caller_spr.infer(gp)
# NNI local search
caller_nni = s2.ScisTree2(nni=True, threads=8)
tree_nni, imputed_genotype_nni, likelihood_nni = caller_nni.infer(gp)
# NJ
caller_nj = s2.ScisTree2(nj=True)
tree_nj, imputed_genotype_nj, likelihood_nj = caller_nj.infer(gp)
Load the ground truth, if it is available.
# get groundtruth
true_genotype = np.loadtxt('data/true_genotype.txt') # load true genotype provided by CellCoal
with open('data/true_tree.nwk', 'r') as f:
    true_tree_nwk = f.readline().strip() # load true tree provided by CellCoal
true_tree = s2.util.from_newick(true_tree_nwk)
print('Newick of true tree', true_tree)
print('True genotype', true_genotype.shape)
Newick of true tree (((((((((cell14,cell26),cell27),cell11),((cell16,cell47),cell2)),cell17),cell30),((((cell36,cell6),cell7),cell48),cell1)),(((((cell10,cell18),(cell12,cell37)),cell35),cell9),(((cell0,cell34),cell45),cell33))),(((((((cell15,cell29),cell44),cell8),((cell3,cell49),cell28)),(((cell13,cell38),(cell20,cell21)),cell42)),(((((cell23,cell31),cell22),cell41),(cell19,cell25)),(((cell24,cell43),cell39),((cell46,cell5),cell40)))),(cell32,cell4)));
True genotype (100, 50)
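Before computing any metric, it can help to confirm that the tree and the genotype matrix describe the same cells: the tree above has 50 leaves and the genotype matrix has 50 columns. The sketch below is a hypothetical helper, not part of ScisTree2. It assumes leaf labels follow the `cellN` pattern seen in the Newick string and that the genotype matrix has one column per cell.

```python
import re

import numpy as np


def leaf_names(newick: str) -> list[str]:
    """Return leaf labels of the form 'cell<digits>' found in a Newick string."""
    return re.findall(r"cell\d+", newick)


# Tiny stand-in data; swap in true_tree_nwk and true_genotype from the tutorial.
toy_newick = "((cell0,cell1),(cell2,cell3));"
toy_genotype = np.zeros((5, 4), dtype=int)  # assumed layout: 5 sites x 4 cells

names = leaf_names(toy_newick)
assert len(names) == len(set(names)), "duplicate leaf labels"
assert len(names) == toy_genotype.shape[1], "leaf count != genotype columns"
print(len(names), "leaves match", toy_genotype.shape[1], "genotype columns")
```

A mismatch here usually means the tree and the matrix come from different simulations or that the matrix needs transposing.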
Evaluate the genotype accuracy (MAPE between imputed genotype and the ground truth).
gacc_spr = s2.metric.genotype_accuarcy(true_genotype, imputed_genotype_spr.values)
gacc_nni = s2.metric.genotype_accuarcy(true_genotype, imputed_genotype_nni.values)
gacc_nj = s2.metric.genotype_accuarcy(true_genotype, imputed_genotype_nj.values)
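To make the idea of genotype accuracy concrete, here is a minimal NumPy sketch that scores two genotype matrices by the fraction of entries on which they agree. The function name `fraction_matching` is hypothetical, and it is an assumption that the library's metric is computed this way; the sketch only illustrates the kind of entry-wise comparison involved.

```python
import numpy as np


def fraction_matching(true_g, inferred_g) -> float:
    """Fraction of matrix entries on which two genotype matrices agree."""
    true_g = np.asarray(true_g)
    inferred_g = np.asarray(inferred_g)
    if true_g.shape != inferred_g.shape:
        raise ValueError(f"shape mismatch: {true_g.shape} vs {inferred_g.shape}")
    return float(np.mean(true_g == inferred_g))


# Toy 2 x 3 matrices that differ in exactly one of six entries.
true_g = np.array([[1, 0, 1], [0, 0, 1]])
inferred_g = np.array([[1, 0, 0], [0, 0, 1]])
score = fraction_matching(true_g, inferred_g)
print(score)
```

For 0/1 matrices this score equals one minus the mean absolute difference between the two matrices.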
Evaluate the tree accuracy as \(1 - RF_{norm}(t_1, t_2)\), where \(RF_{norm}\) is the normalized Robinson-Foulds distance between the trees \(t_1\) and \(t_2\).
tacc_spr = s2.metric.tree_accuracy(true_tree, tree_spr)
tacc_nni = s2.metric.tree_accuracy(true_tree, tree_nni)
tacc_nj = s2.metric.tree_accuracy(true_tree, tree_nj)
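The following sketch shows one way a score of the form \(1 - RF_{norm}\) can be computed. It is not the ScisTree2 implementation: it assumes rooted trees written as nested Python tuples, compares their non-trivial clades, and normalizes by the total number of such clades in both trees. The library may use a different tree representation or normalization, and the names `clades` and `tree_accuracy_sketch` are hypothetical.

```python
def clades(tree, out=None):
    """Collect the leaf set under every internal node of a nested-tuple tree."""
    if out is None:
        out = set()
    if isinstance(tree, str):  # a leaf contributes no clade of its own
        return frozenset([tree]), out
    leaves = frozenset()
    for child in tree:
        child_leaves, _ = clades(child, out)
        leaves |= child_leaves
    out.add(leaves)
    return leaves, out


def tree_accuracy_sketch(t1, t2):
    """1 - |C1 xor C2| / (|C1| + |C2|), where C1, C2 are non-root clades."""
    all1, c1 = clades(t1)
    all2, c2 = clades(t2)
    if all1 != all2:
        raise ValueError("trees must share the same leaf set")
    c1 = {c for c in c1 if c != all1}  # drop the root clade
    c2 = {c for c in c2 if c != all2}
    total = len(c1) + len(c2)
    if total == 0:
        return 1.0
    return 1.0 - len(c1 ^ c2) / total


t_true = ((("a", "b"), "c"), "d")
t_inferred = ((("a", "c"), "b"), "d")
print(tree_accuracy_sketch(t_true, t_true))      # identical trees
print(tree_accuracy_sketch(t_true, t_inferred))  # one of two clades shared
```

In the second call each tree has two non-root clades and they share only `{a, b, c}`, so two of the four clades are unmatched and the score is 0.5.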
Next, we calculate the Ancestor-Descendant Error and Different Lineage Error. Before doing this, we need to get the ancestor-descendant pairs.
mutation_true = s2.metric.get_ancestor_descendant_pairs(true_genotype)
mutations_spr = s2.metric.get_ancestor_descendant_pairs(imputed_genotype_spr.values)
mutations_nni = s2.metric.get_ancestor_descendant_pairs(imputed_genotype_nni.values)
mutations_nj = s2.metric.get_ancestor_descendant_pairs(imputed_genotype_nj.values)
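As an illustration of what an ancestor-descendant pair is, the sketch below derives such pairs from a binary genotype matrix. It assumes rows are mutations and columns are cells, and it treats mutation `a` as an ancestor of mutation `b` when the cells carrying `b` form a non-empty strict subset of the cells carrying `a`. The function name is hypothetical, and the data structure returned by `s2.metric.get_ancestor_descendant_pairs` may differ.

```python
from itertools import permutations

import numpy as np


def ancestor_descendant_pairs_sketch(genotype):
    """Return ordered pairs (a, b) where mutation a is an ancestor of b."""
    g = np.asarray(genotype).astype(bool)
    # For every mutation (row), the set of cells (columns) that carry it.
    cell_sets = [frozenset(np.flatnonzero(row).tolist()) for row in g]
    pairs = set()
    for a, b in permutations(range(len(cell_sets)), 2):
        # Strict subset test: b's cells all carry a, and a has extra cells.
        if cell_sets[b] and cell_sets[b] < cell_sets[a]:
            pairs.add((a, b))
    return pairs


toy = np.array([
    [1, 1, 1, 0],  # mutation 0: cells 0, 1, 2
    [1, 1, 0, 0],  # mutation 1: cells 0, 1 (nested inside mutation 0)
    [0, 0, 0, 1],  # mutation 2: cell 3 only (a different lineage)
])
toy_pairs = ancestor_descendant_pairs_sketch(toy)
print(toy_pairs)
```

Mutations 0 and 2 share no cells, so they form no ancestor-descendant pair; they sit on different lineages.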
Then, calculate those errors.
ad_err_spr = s2.metric.ancestor_descendant_error(mutation_true, mutations_spr)
ad_err_nni = s2.metric.ancestor_descendant_error(mutation_true, mutations_nni)
ad_err_nj = s2.metric.ancestor_descendant_error(mutation_true, mutations_nj)
dl_err_spr = s2.metric.different_lineage_error(mutation_true, mutations_spr)
dl_err_nni = s2.metric.different_lineage_error(mutation_true, mutations_nni)
dl_err_nj = s2.metric.different_lineage_error(mutation_true, mutations_nj)
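Both error measures compare a set of mutation pairs from the ground truth with the corresponding set from the inferred genotype. A minimal sketch of such a comparison, assuming the error is the fraction of true pairs that the inferred result fails to reproduce, is shown below. The helper `pair_error_rate` is hypothetical, and the library's exact definition may differ.

```python
def pair_error_rate(true_pairs, inferred_pairs):
    """Fraction of true pairs that are missing from the inferred pairs."""
    true_pairs = set(true_pairs)
    if not true_pairs:
        return 0.0  # nothing to recover, so no error by convention
    missing = true_pairs - set(inferred_pairs)
    return len(missing) / len(true_pairs)


true_pairs = {(0, 1), (0, 2), (1, 2), (3, 4)}
inferred_pairs = {(0, 1), (0, 2), (2, 1)}  # (1, 2) is reversed, (3, 4) is lost
rate = pair_error_rate(true_pairs, inferred_pairs)
print(rate)
```

Under this assumed definition, a reversed pair counts as an error because the order of ancestor and descendant matters.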
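Check the results. On this dataset, SPR local search gives the best genotype accuracy, tree accuracy, and different lineage error, while NNI has a marginally lower ancestor-descendant error.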
metrics = {
    "Method": ["SPR", "NNI", "NJ"],
    "Genotype Accuracy": [gacc_spr, gacc_nni, gacc_nj],
    "Tree Accuracy": [tacc_spr, tacc_nni, tacc_nj],
    "Ancestor-Descendant Error": [ad_err_spr, ad_err_nni, ad_err_nj],
    "Different Lineage Error": [dl_err_spr, dl_err_nni, dl_err_nj],
}
# Convert to DataFrame
df_metrics = pd.DataFrame(metrics)
df_metrics
| | Method | Genotype Accuracy | Tree Accuracy | Ancestor-Descendant Error | Different Lineage Error |
|---|---|---|---|---|---|
| 0 | SPR | 0.9826 | 0.250000 | 0.479858 | 0.023928 |
| 1 | NNI | 0.9802 | 0.166667 | 0.478591 | 0.024925 |
| 2 | NJ | 0.9766 | 0.083333 | 0.503420 | 0.024925 |
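To read the table programmatically, the sketch below picks the best method for each metric, taking higher as better for the two accuracy columns and lower as better for the two error columns. The numbers are copied from the table above; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Method": ["SPR", "NNI", "NJ"],
    "Genotype Accuracy": [0.9826, 0.9802, 0.9766],
    "Tree Accuracy": [0.250000, 0.166667, 0.083333],
    "Ancestor-Descendant Error": [0.479858, 0.478591, 0.503420],
    "Different Lineage Error": [0.023928, 0.024925, 0.024925],
}).set_index("Method")

higher_is_better = ["Genotype Accuracy", "Tree Accuracy"]
lower_is_better = ["Ancestor-Descendant Error", "Different Lineage Error"]

# idxmax / idxmin return the index label (the method) of the best row per column.
best = pd.concat([df[higher_is_better].idxmax(), df[lower_is_better].idxmin()])
print(best)
```

With these values SPR is selected for every metric except the ancestor-descendant error, where NNI is slightly lower.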