This post is the third in the series on Causal Machine Learning, and it is based on the work of Judea Pearl. In this post I discuss Pearl's three-level causal hierarchy, how to represent causality, and a little bit about causal inference and causal discovery. As always, I have tried to keep things as simple as possible. So stay with me, and enjoy reading!
Causal Hierarchy
Let us start by investigating the differences between association and causation through Pearl's three-level causal hierarchy.
The first level is association, the second level is intervention, and the third level is counterfactual. The first level, association, involves just seeing what is. It invokes purely statistical relationships, defined by the naked data, and can handle questions such as "What does a symptom tell me about a disease?". The second level, intervention, ranks higher than association because it involves not just seeing what is, but changing what we see. It can handle questions such as "If I take aspirin, will my headache be cured?". Consider a trial in which subjects are given a newly developed drug: the reason for randomization is to remove possible effects from confounders. For example, age can be a confounder that affects both whether the drug is taken and the treatment effect. Thus, in practical experiments, we should keep the distribution of ages in the two groups almost the same; technically, one may use propensity score matching to remove the effects of possible confounders. The top level is called counterfactuals. A typical question in the counterfactual category is "What if I had acted differently?", which requires retrospective reasoning.
- Associational: if I see X, what is the probability that I will see Y?
- Interventional: if I do X, what is the probability that I will see Y?
- Counterfactual: what would Y have been, had I done X?
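The gap between the first two rungs can be made concrete with a small simulation. The sketch below uses a toy world (all probabilities are made up for illustration) in which age confounds both taking a drug and recovering, so that P(Y | X) from rung one and P(Y | do(X)) from rung two give different answers:

```python
import random

random.seed(0)

def bernoulli(p):
    return 1 if random.random() < p else 0

def simulate(n=100_000, do_x=None):
    """Toy world: age Z confounds both taking the drug X and recovery Y.
    do_x=None gives observational data; do_x=0/1 forces the treatment."""
    rows = []
    for _ in range(n):
        z = bernoulli(0.5)                          # Z = 1: older patient
        x = bernoulli(0.2 if z else 0.7) if do_x is None else do_x
        y = bernoulli(0.3 + 0.2 * x - 0.1 * z)      # drug helps, age hurts
        rows.append((z, x, y))
    return rows

obs = simulate()
treated = [(z, x, y) for z, x, y in obs if x == 1]
p_y_given_x1 = sum(y for _, _, y in treated) / len(treated)   # rung 1: seeing
p_y_do_x1 = sum(y for _, _, y in simulate(do_x=1)) / 100_000  # rung 2: doing

# Conditioning on X = 1 also selects for younger patients, while
# do(X = 1) leaves the age distribution untouched, so the two differ.
print(round(p_y_given_x1, 2), round(p_y_do_x1, 2))
```

Seeing someone take the drug tells us something about their age, which is exactly the confounding that randomization removes.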
How To Represent Causality ?
Causality is represented mathematically via Structural Causal Models (SCMs). The two key elements of an SCM are a graph and a set of equations. More specifically, the graph is a Directed Acyclic Graph (DAG) and the set of equations is a Structural Equation Model (SEM).
The earliest known version of SCMs was introduced by geneticist Sewall Wright around 1918, originally for inferring the relative importance of the factors that determine the birth weight of guinea pigs. He used the construction to develop the methodology of path analysis, a technique commonly used for causal inference over layered and complex processes, such as phenotypic inheritance.
Drawings from Wright's 1921 paper, Correlation and Causation. The bottom image presents an ancestor of causal graphs: a representation of a structural causal model describing the relationships between a variety of genetic factors and a guinea pig's birth weight. Wright's path tracing rules defined a procedure for turning a set of associative relationships (top image) into a causal graph (bottom image).
A DAG is a graph comprised of nodes and edges, where the direction of an edge determines the relationship between the two nodes it connects. A DAG contains no cycles, i.e., no path of one or more edges that starts and ends at the same node.
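A minimal sketch of this idea in code: a directed graph as an adjacency dict (the node names below are illustrative, not from any particular study) plus a depth-first-search check that no path returns to its starting node.

```python
def has_cycle(graph):
    """Detect a directed cycle with depth-first search (3-color marking).
    Every node must appear as a key in `graph`."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY                     # currently on the DFS path
        for child in graph[node]:
            if color[child] == GRAY:           # back edge: path returned to itself
                return True
            if color[child] == WHITE and visit(child):
                return True
        color[node] = BLACK                    # fully explored
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

# "smoking -> tar -> cancer" is acyclic ...
dag = {"smoking": ["tar"], "tar": ["cancer"], "cancer": []}
# ... but adding "cancer -> smoking" creates a cycle, so it is not a DAG.
cyclic = {"smoking": ["tar"], "tar": ["cancer"], "cancer": ["smoking"]}
print(has_cycle(dag), has_cycle(cyclic))
```

The acyclicity requirement is what lets a DAG encode a coherent causal ordering: no variable can be its own ancestor.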
A structural causal model is comprised of three components:
- A set of variables describing the state of the universe and how it relates to a particular data set we are provided. These variables fall into three groups: explanatory variables, outcome variables, and unobserved variables.
- Causal relationships, which describe the causal effect variables have on one another.
- A probability distribution defined over the unobserved variables in the model, describing the likelihood that each variable takes a particular value.
SEMs represent the relationships between variables as equations. These equations have two peculiarities. First, they are asymmetric: equality works in only one direction, which implies that an SEM cannot be inverted to derive an alternative SEM. Second, they can be non-parametric, meaning the functional form need not be known.
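A small sketch can make the asymmetry tangible. The structural equations below are hypothetical (chosen for illustration, not taken from any dataset): X is driven by exogenous noise alone, while Y is assigned from X. Because ":=" is assignment rather than algebraic equality, intervening on the cause shifts the effect, but intervening on the effect leaves the cause untouched:

```python
import random

random.seed(1)

# Hypothetical SCM:
#   X := U_x            (exogenous noise only)
#   Y := 2*X + U_y      (X causes Y)
# We may NOT invert the second line into a mechanism X := (Y - U_y) / 2.

def sample(do_x=None, do_y=None):
    """Draw one (x, y) pair; do_x/do_y override the structural equations."""
    u_x, u_y = random.gauss(0, 1), random.gauss(0, 1)
    x = u_x if do_x is None else do_x
    y = 2 * x + u_y if do_y is None else do_y
    return x, y

def mean(values):
    return sum(values) / len(values)

n = 50_000
# Intervening on the cause shifts the effect ...
ys = [sample(do_x=1.0)[1] for _ in range(n)]   # E[Y | do(X=1)] is about 2
# ... but intervening on the effect leaves the cause untouched.
xs = [sample(do_y=1.0)[0] for _ in range(n)]   # E[X | do(Y=1)] is about 0
print(round(mean(ys), 1), round(mean(xs), 1))
```

This directionality is exactly what an ordinary algebraic equation cannot express, and it is why SEM equations are written as assignments.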
Causal Inferences & Causal Discovery
Causal inference aims to answer questions involving cause and effect. Although it is a powerful tool, it requires a key to operate: a causal model. Often in the real world, however, we may not be sure which variables cause which. This is where causal discovery can help.
In causal inference, the causal structure of the problem is often assumed; in other words, a DAG representing the situation is given. In practice, however, the causal connections of a system are often unknown. Causal discovery aims to uncover this structure from observational data: given a dataset, derive a causal model that describes it.
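One family of discovery methods works by testing (conditional) independences in the data. As a minimal sketch, assume a hypothetical ground-truth chain X -> Z -> Y: then X and Y are correlated marginally, but become independent once we control for Z, and that pattern is a clue to the structure:

```python
import random

random.seed(2)

# Hypothetical ground truth (unknown to the discovery step): X -> Z -> Y.
n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + random.gauss(0, 1) for xi in x]
y = [zi + random.gauss(0, 1) for zi in z]

def corr(a, b):
    """Pearson correlation of two samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / ((1 - r_ac**2) ** 0.5 * (1 - r_bc**2) ** 0.5)

print(round(corr(x, y), 2))             # clearly nonzero: X and Y associate
print(round(partial_corr(x, y, z), 2))  # near zero: X independent of Y given Z
```

Constraint-based algorithms such as PC build on exactly this kind of test, running many of them to prune and orient edges; this sketch shows only the single test at their core.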
I will describe causal inference and causal discovery in more detail in the upcoming blog posts.