LOGos: From Logs to Causal Diagnosis of Large Systems

Causal inference can quantify cause-effect relationships in domains as varied as medicine, economics, and public policy. Production computer systems are similarly complex, and their operators face a recurring, time-sensitive need to diagnose unwanted behavior. However, such systems are often observed only imperfectly and indirectly, through long, messy, semi-structured logs. In this work, we aim to accelerate the debugging of large systems by applying causal inference over logs, so that engineers can use logs to diagnose problems and assess interventions in a principled manner. Our proposed framework achieves this through two human-in-the-loop modules: (1) the Candidate Cause Ranker, through which engineers can determine the causes of a problem without running a full causal discovery algorithm, informing possible interventions; and (2) the Interactive Causal Graph Refiner, which helps engineers compute an unbiased estimate of the effect of their discovered causes without extensive manual verification of the causal graph. Both modules are powered by the insight that only part of the system's causal graph is needed to correctly quantify an effect of interest. We also provide a data preparation pipeline, the Log Converter, which transforms raw, messy, real-world logs into tabular input suitable for causal inference, using methods drawn from data transformation, cleaning, and extraction.
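To make the idea of turning semi-structured logs into tabular input more concrete, here is a minimal sketch in Python. It is not the Log Converter itself: the log format (`<timestamp> <component> key=value ...`), the function names `parse_line` and `logs_to_table`, and the choice to drop lines without any key=value pair are all assumptions made for illustration.

```python
import re

# Hypothetical log format, assumed for illustration only:
#   "<timestamp> <component> key=value key=value ..."
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<component>\S+)\s+(?P<rest>.*)$")
KV_RE = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not fit."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    pairs = KV_RE.findall(match.group("rest"))
    if not pairs:
        # No key=value payload: treat the line as unparseable.
        return None
    row = {"ts": match.group("ts"), "component": match.group("component")}
    for key, value in pairs:
        try:
            row[key] = float(value)  # numeric fields become floats
        except ValueError:
            row[key] = value  # everything else stays a string
    return row


def logs_to_table(lines):
    """Turn log lines into rows that all share the same columns.

    Fields missing from a given line are filled with None, so the result
    can be loaded directly into a tabular structure.
    """
    rows = [r for r in (parse_line(line) for line in lines) if r is not None]
    columns = sorted({key for row in rows for key in row})
    return [{col: row.get(col) for col in columns} for row in rows]


example_lines = [
    "2024-01-01T00:00:00 db latency_ms=12.5 status=ok",
    "malformed entry without pairs",
    "2024-01-01T00:00:01 cache hit=1",
]
table = logs_to_table(example_lines)
```

In this sketch each surviving log line becomes one row, and the union of all observed keys becomes the column set. Real logs would likely need further cleaning and aggregation before causal inference.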

Read the Demo Paper (SIGMOD 2024) | View BibTeX

Project Participants

Markos Markakis, Brit Youngmann, Trinity Gao, Sylvia Zhang, Rana Shahout, Peter Chen, Chunwei Liu, Ibrahim Sabek, Michael Cafarella