Causal Discovery

Introduction

Causal discovery is the process of inferring causal relationships between variables from observational data. The goal is to find a set of causal relationships that is consistent with the data, represented as a directed acyclic graph (DAG) whose nodes are the variables and whose directed edges are the causal relationships. Discovery algorithms broadly fall into two categories: score-based and constraint-based. Score-based algorithms search over candidate graphs and rank them with a score function, such as a penalized likelihood. Constraint-based algorithms run conditional independence tests on the data and keep only graphs that satisfy the resulting constraints. Score-based methods are more flexible and can incorporate soft preferences, while constraint-based methods are more restrictive but often faster in practice, since each independence test prunes a large part of the search space.

Our causal discovery module leverages the strengths of both families. You can use your domain knowledge to constrain the search space, and our score-based algorithm then finds a causal graph that is consistent with both the data and the specified constraints.

Parameters

First, select the features to include in the analysis. We currently support only numerical features (or ordinal features that can be converted to numerical ones). You can do this under Data > Query > Selected Features.
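
For example, if your data is in a pandas DataFrame, keeping only the numerical columns (and numerically encoding an ordinal column) might look like the following sketch; the file and column names are illustrative, not part of our API:

    import pandas as pd

    df = pd.read_csv("data.csv")  # hypothetical input file

    # Keep only the numerical columns.
    numeric_df = df.select_dtypes(include="number")

    # Encode an ordinal column ("low" < "medium" < "high") as integer codes.
    levels = ["low", "medium", "high"]  # illustrative ordinal levels
    numeric_df["severity"] = pd.Categorical(
        df["severity"], categories=levels, ordered=True
    ).codes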

Next we can set different normalization options and constraints.

Normalization

You can choose to normalize the data before running the causal discovery algorithm, standardizing every feature to zero mean and unit variance. We advise normalizing the data before running the analysis: it makes the algorithm more effective and lets you compare the relative importance of features on a common scale.
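
Concretely, this standardization is the usual z-score. A minimal numpy sketch, assuming X is a samples-by-features array:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(loc=5.0, scale=2.0, size=(1000, 4))  # toy data

    # Standardize each feature to zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    print(X_std.mean(axis=0).round(6))  # ~0 for every feature
    print(X_std.std(axis=0).round(6))   # ~1 for every feature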

Constraints

You can specify constraints on the search space (a sketch of how these might be encoded follows the list). There are 4 different types of constraints:

  1. Causes: features selected in this option can only be causes.

  2. Effects: features selected in this option can only be effects.

  3. Forbidden relationships: the cause and effect features selected in this option cannot be connected.

  4. Potential relationships: the cause and effect features selected in this option are prioritized for connection.
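
How these constraints are encoded depends on the backend; as one illustration, they can be written down as explicit node and edge lists before being handed to a discovery routine. The dictionary below is a hypothetical encoding with made-up feature names, not our module's actual interface:

    # Hypothetical encoding of search-space constraints as plain data.
    constraints = {
        "causes_only": ["marketing_spend"],            # may appear only as a cause
        "effects_only": ["revenue"],                   # may appear only as an effect
        "forbidden": [("weather", "revenue")],         # this edge is excluded
        "potential": [("marketing_spend", "visits")],  # this edge is prioritized
    }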

Feature data types

By default we infer the type of each feature automatically, but you can also specify the types yourself. We currently support only numerical (or ordinal) features.

Models

You can select which causal discovery algorithm to use. We currently support:

  1. PC (Peter-Clark algorithm): This is a constraint-based algorithm, and it is the default. PC starts from a fully connected undirected graph and removes the edge between two variables whenever a conditional independence test between them, given some subset of the other variables, indicates independence. Once no further edges can be removed, it orients edges by identifying v-structures and applying a set of orientation rules. Under the assumptions below, PC recovers a graph that is consistent with the data and the specified constraints, up to the Markov equivalence class of the true DAG, and it is typically fast on sparse graphs because each removed edge shrinks the remaining search space. (A minimal usage sketch follows the note after the assumptions.)

    PC algorithm assumptions and limitations:

    • Causal sufficiency: The set of observed variables includes all common causes for any pair of variables. In other words, there are no unmeasured confounders.

    • Causal Markov condition: Each variable is independent of its non-descendants, given its direct causes (parents) in the causal graph.

    • Faithfulness: A conditional independence between variables in the observed data implies that the independence is represented by the causal graph. This means that the causal graph captures all the conditional independencies in the data without including any “extra” ones.

    • Causal ordering: There exists a causal ordering of the variables such that for any pair of variables, one precedes the other in the ordering if there is a direct causal relationship between them. This assumption is necessary to orient the edges in the graph.

    • Large sample size: The data sample size is large enough to ensure that the statistical tests used in the PC algorithm can accurately detect conditional independence relationships.

    • Appropriate statistical test: The choice of the statistical test for independence should be appropriate for the nature of the data, so that the test has the desired properties (e.g., consistency, power) to reliably detect conditional independence relationships.

Note that the PC algorithm and its variations may still be useful in practice, even if some of these assumptions are not met, but the results should be interpreted with caution. The algorithm’s performance is sensitive to the validity of these assumptions, and violations can lead to incorrect or incomplete causal structures.
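
As a concrete illustration, the open-source causal-learn package ships a PC implementation; the sketch below assumes that package and toy data (our module's internals may differ):

    import numpy as np
    from causallearn.search.ConstraintBased.PC import pc

    # Toy linear chain: x0 -> x1 -> x2.
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=2000)
    x1 = 0.8 * x0 + rng.normal(size=2000)
    x2 = 0.8 * x1 + rng.normal(size=2000)
    X = np.column_stack([x0, x1, x2])

    # Run PC with a Fisher-z conditional independence test at alpha = 0.05.
    cg = pc(X, alpha=0.05, indep_test="fisherz")

    # The estimated graph (a CPDAG) is stored in cg.G.
    print(cg.G)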

  2. DirectLiNGAM (Direct Linear Non-Gaussian Acyclic Model): This algorithm fits a linear structural equation model with non-Gaussian noise. Instead of searching over graphs, it estimates a full causal ordering of the variables: at each step it identifies the most exogenous variable, i.e. the one whose regression residuals against the remaining variables are most independent of it, removes its effect from the remaining variables by regression, and repeats. Given the ordering, connection strengths are estimated by regression and weak edges are pruned. Under the assumptions below, DirectLiNGAM identifies a full causal graph (not merely an equivalence class) consistent with the data and the specified constraints, in a fixed number of steps rather than by iterative search. (A usage sketch follows the paper link below.)

    Here are some key assumptions of the DirectLiNGAM algorithm:

    • Linearity: The causal relationships between the observed variables can be modeled using a linear structural equation model (SEM). In other words, each variable is assumed to be a linear function of its direct causes (parents) in the causal graph, plus an independent noise term.

    • Non-Gaussianity: The noise terms in the linear SEM are assumed to be non-Gaussian. The DirectLiNGAM algorithm leverages the non-Gaussianity to distinguish between direct and indirect causal relationships.

    • Acyclicity: The causal graph is assumed to be acyclic, meaning there are no directed cycles or feedback loops in the graph.

    • Causal sufficiency: All common causes for any pair of variables are included in the set of observed variables. In other words, there are no unmeasured confounders.

    • Independent noise terms: The noise terms associated with each variable in the SEM are assumed to be mutually independent.

    • Large sample size: The data sample size is large enough to ensure the accuracy and reliability of the causal structure estimation.

Keep in mind that the performance of the DirectLiNGAM algorithm is sensitive to the validity of these assumptions. Violations of these assumptions can lead to incorrect or incomplete causal structures. Nonetheless, DirectLiNGAM has been shown to perform well in various applications and is particularly useful for inferring causal structures in the presence of non-Gaussian data.

Paper: https://www.jmlr.org/papers/volume12/shimizu11a/shimizu11a.pdf
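
For illustration, the authors' open-source lingam package exposes DirectLiNGAM directly; a minimal sketch on toy non-Gaussian data (our module may wrap this differently):

    import numpy as np
    import lingam

    # Toy data with uniform (non-Gaussian) noise: x0 -> x1.
    rng = np.random.default_rng(0)
    x0 = rng.uniform(size=2000)
    x1 = 2.0 * x0 + rng.uniform(size=2000)
    X = np.column_stack([x0, x1])

    model = lingam.DirectLiNGAM()
    model.fit(X)

    print(model.causal_order_)      # estimated causal ordering of column indices
    print(model.adjacency_matrix_)  # estimated weighted adjacency matrix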

  3. NOTEARS: reformulates structure learning as a smooth, continuous optimization problem that can be solved with efficient gradient-based methods (a sketch of its key device, the smooth acyclicity constraint, follows the paper link below). Here are some key assumptions of the NOTEARS algorithm:

    • Linearity: The relationships between the observed variables can be modeled using a linear structural equation model (SEM). In other words, each variable is assumed to be a linear function of its direct causes (parents) in the causal graph, plus an independent noise term.

    • Acyclicity: The causal graph is assumed to be acyclic, meaning there are no directed cycles or feedback loops in the graph.

    • Gaussian noise terms: The noise terms associated with each variable in the SEM are assumed to be Gaussian. This assumption simplifies the likelihood function and enables efficient optimization.

    • Causal sufficiency: All common causes for any pair of variables are included in the set of observed variables. In other words, there are no unmeasured confounders.

    • Large sample size: The data sample size is large enough to ensure the accuracy and reliability of the causal structure estimation.

    • Regularization: The algorithm assumes that the true underlying graph is sparse, which means that most variables have few direct causes. To encourage sparsity in the learned graph, the algorithm uses regularization techniques, such as L1 regularization.

    It is important to note that the performance of the NOTEARS algorithm is sensitive to the validity of these assumptions. Violations of these assumptions can lead to incorrect or incomplete causal structures. Nonetheless, NOTEARS has been shown to be effective in various applications and is particularly useful for large-scale structure learning problems.

    Paper: https://arxiv.org/abs/1803.01422
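
The paper's central device is replacing the combinatorial "the graph is acyclic" constraint with the smooth function h(W) = tr(exp(W ∘ W)) - d, which equals zero exactly when the weighted adjacency matrix W encodes a DAG. A minimal numpy/scipy sketch of that function:

    import numpy as np
    from scipy.linalg import expm

    def notears_acyclicity(W: np.ndarray) -> float:
        """Smooth acyclicity measure h(W) = tr(exp(W * W)) - d.

        h(W) == 0 iff W encodes a DAG; because h is differentiable,
        structure learning becomes a gradient-friendly problem.
        """
        d = W.shape[0]
        return float(np.trace(expm(W * W)) - d)  # W * W is elementwise

    # A DAG (upper-triangular W) scores ~0; adding a cycle makes h positive.
    W_dag = np.array([[0.0, 1.5], [0.0, 0.0]])
    W_cyclic = np.array([[0.0, 1.5], [0.7, 0.0]])
    print(notears_acyclicity(W_dag))     # ~0.0
    print(notears_acyclicity(W_cyclic))  # > 0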

  4. DECI (Deep End-to-end Causal Inference): a single flow-based non-linear additive noise model that performs both causal discovery and inference, including conditional average treatment effect (CATE) estimation. It is a generative model that employs an additive noise structural equation model (ANM-SEM) to capture the functional relationships among variables and exogenous noise, while simultaneously learning a variational distribution over causal graphs. DECI is designed to enable real-world data-driven decision making by letting the end user perform causal inference without complete knowledge of the causal graph. (A small sketch of the additive-noise model class it assumes follows the paper link at the end of this section.)

    Assumptions: DECI can recover the ground truth causal graph under standard causal discovery assumptions. These assumptions include:

    • Causal Markov Condition: All observed variables are independent of their non-effects, given their direct causes.

    • Causal Faithfulness Condition: All observed independencies are due to causal structure.

    • Non-Gaussianity: The distribution of the additive noise in the causal model is non-Gaussian.

    • Causal Sufficiency: No unobserved confounders exist.

    Limitations:

    • Assumption violations: DECI’s performance may be affected if any of the standard causal discovery assumptions are violated.

    • Mixed-type data: DECI extends to heterogeneous, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. However, handling mixed-type data and missingness may introduce additional complexity and potential inaccuracies.

    • Nonlinearity: DECI can model complex nonlinear relationships, but its performance may depend on the specific functional forms and distributions present in the data.

      Despite these limitations, DECI has shown competitive performance when compared to relevant baselines for both causal discovery and (C)ATE estimation in over a thousand experiments on both synthetic datasets and causal machine learning benchmarks across data-types and levels of missingness.

Paper: https://arxiv.org/abs/2202.02195
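
To make the model class concrete, here is a tiny numpy sketch that generates data from a nonlinear additive noise SEM of the kind DECI assumes: each variable is a nonlinear function of its parents plus independent non-Gaussian noise. The functional forms and graph are illustrative only:

    import numpy as np

    # Toy additive noise SEM over the chain x0 -> x1 -> x2,
    # with nonlinear mechanisms and uniform (non-Gaussian) noise.
    rng = np.random.default_rng(0)
    n = 5000

    x0 = rng.uniform(-1.0, 1.0, size=n)                        # exogenous root
    x1 = np.tanh(2.0 * x0) + 0.3 * rng.uniform(-1, 1, size=n)  # f1(x0) + noise
    x2 = x1 ** 2 + 0.3 * rng.uniform(-1, 1, size=n)            # f2(x1) + noise

    X = np.column_stack([x0, x1, x2])  # observational dataset for discovery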