Data_Sheet_1_NoMAS: A Computational Approach to Find Mutated Subnetworks Associated With Survival in Genome-Wide Cancer Studies.PDF

Next-generation sequencing technologies allow to measure somatic mutations in a large number of patients from the same cancer type: one of the main goals in their analysis is the identification of mutations associated with clinical parameters. The identification of such relationships is hindered by extensive genetic heterogeneity in tumors, with different genes mutated in different patients, due, in part, to the fact that genes and mutations act in the context of pathways: it is therefore crucial to study mutations in the context of interactions among genes. In this work we study the problem of identifying subnetworks of a large gene-gene interaction network with mutations associated with survival time. We formally define the associated computational problem by using a score for subnetworks based on the log-rank statistical test to compare the survival of two given populations. We propose a novel approach, based on a new algorithm, called Network of Mutations Associated with Survival (NoMAS) to find subnetworks of a large interaction network whose mutations are associated with survival time. NoMAS is based on the color-coding technique, that has been previously employed in other applications to find the highest scoring subnetwork with high probability when the subnetwork score is additive. In our case the score is not additive, so our algorithm cannot identify the optimal solution with the same guarantees associated to additive scores. Nonetheless, we prove that, under a reasonable model for mutations in cancer, NoMAS identifies the optimal solution with high probability. We also design a holdout approach to identify subnetworks significantly associated with survival time. We test NoMAS on simulated and cancer data, comparing it to approaches based on single gene tests and to various greedy approaches. We show that our method does indeed find the optimal solution and performs better than the other approaches. Moreover, on three cancer datasets our method identifies subnetworks with significant association to survival when none of the genes has significant association with survival when considered in isolation.