This post is the fourth in the series on Causal Machine Learning and is based on the work of Judea Pearl. So far we have discussed Causal AI, how to represent causality using SCMs, how naive statistics can fail (spurious correlations, Simpson's paradox, and asymmetry in causal inference), Pearl's Causal Hierarchy, and a first taste of causal inference and causal discovery. As always, I will try to keep things as simple as possible. So stay with me, and enjoy reading!
How are Causal AI models different from Bayesian networks?
At first glance, you may feel some ambiguity when trying to explore Bayesian networks and causal networks. This is completely normal. Let's try to uncover the differences before going ahead!
Bayesian networks and Causal AI models appear similar, but Causal AI models capture the underlying causal relationships, while Bayesian networks merely describe patterns of correlation.
What Are Bayesian Networks?
Recall the example we discussed earlier about data describing people's sleeping habits: we found a strong correlation between falling asleep with shoes on and waking up with a headache.
A Bayesian network representing this is given below.
The BN tells us that both sleeping with shoes on and waking up with a headache are correlated with drinking the night before, since there is a path between the variables. It also says that once we know someone was drinking the night before, knowing that they slept with their shoes on tells us absolutely nothing extra about whether they have a headache the next morning (this is called "conditional independence"). We can read off this conditional independence by noticing that drinking alcohol "blocks" the pathway from shoe-sleeping to headache. BNs also help us update our conclusions as more data becomes available (via Bayes' theorem).
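We can see this conditional independence in a small simulation. Below is a minimal sketch of the sleeping-habits example, in which drinking drives both shoe-sleeping and headaches; the probabilities are made up purely for illustration.

```python
import random

random.seed(0)

def simulate(n=200_000):
    """Sample from a toy model where 'drink' causes both 'shoes' and 'headache'."""
    rows = []
    for _ in range(n):
        drink = random.random() < 0.3                         # drank last night?
        shoes = random.random() < (0.6 if drink else 0.05)    # slept with shoes on?
        headache = random.random() < (0.7 if drink else 0.1)  # headache this morning?
        rows.append((drink, shoes, headache))
    return rows

def p_headache(rows, drink, shoes):
    """Empirical P(headache | drink, shoes)."""
    sel = [h for d, s, h in rows if d == drink and s == shoes]
    return sum(sel) / len(sel)

data = simulate()

# Unconditionally, shoe-sleeping and headaches look strongly associated...
p_shoes = sum(h for _, s, h in data if s) / sum(1 for _, s, _ in data if s)
p_noshoes = sum(h for _, s, h in data if not s) / sum(1 for _, s, _ in data if not s)

# ...but conditional on drinking, shoes add no extra information about headaches:
p_given_shoes = p_headache(data, drink=True, shoes=True)
p_given_noshoes = p_headache(data, drink=True, shoes=False)
```

Here `p_shoes` is far larger than `p_noshoes` (the spurious correlation), while `p_given_shoes` and `p_given_noshoes` are nearly identical: drinking "blocks" the path.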
What Are Causal Networks?
BNs sound useful! What’s the catch? The core problem is that Bayesian networks are blind to causality. This key, missing ingredient makes BNs very limited when it comes to more sophisticated reasoning and decision-making challenges.
Causal AI is a new category of machine intelligence. Causal AI builds models that are able to capture cause-effect relationships while also retaining the benefits of BNs.
Many different BNs are statistically compatible with the same data, but only one of them corresponds to the genuine causal relationships in the system. As a result, it is always ambiguous whether your BN is a good causal model, and the overwhelming odds are that it is not. The number of possible BNs grows super-exponentially with the number of features: with, say, 20 variables in your data, there is effectively zero chance of stumbling across the true causal model by accident. This means a BN fitted to data alone is likely making bad modeling decisions.
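To get a feel for how fast the space of candidate networks explodes, here is a small sketch that counts the labeled DAGs (the graph structures underlying BNs) on n nodes using Robinson's recurrence; the recurrence itself is standard, the code is just an illustration.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k - 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )

# The count explodes: 1, 3, 25, 543 for n = 1..4, and astronomically beyond.
for n in (1, 2, 3, 4, 10, 20):
    print(n, num_dags(n))
```

Already at 10 variables there are over 4 quintillion candidate structures, which is why picking a BN that happens to match the true causal graph by chance is hopeless.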
I hope this clears up the differences and ambiguities between Bayesian and causal networks.
Causal Inferences
Causal inference refers to the process of drawing conclusions about the causal relationships between variables. In other words, it involves making judgments about whether changes in one variable are responsible for changes in another variable.
Here are some examples of causal inference questions:
- Does taking medication X reduce blood pressure in people with hypertension?
- Does increasing the price of cigarettes lead to a decrease in smoking rates?
- Does participating in an exercise program improve physical fitness?
- Does attending preschool lead to better academic outcomes in school?
- Does exposure to air pollution increase the risk of respiratory problems?
- Did the treatment directly help those who took it?
- Was it the marketing campaign that led to increased sales this month, or the holiday?
- How big of an effect would increased wages have on productivity?
These are just a few examples, but causal inference questions can be asked in many different fields and contexts. The key is that they are trying to understand whether a particular intervention or exposure causes a change in some outcome or dependent variable.
Do-Calculus & The Do-Operator
Causality - Formal Definition
Before going ahead, let's again define causality in terms of interventions.
In the context of interventions, causality refers to the relationship between an intervention (also known as a treatment or exposure) and the resulting effect on an outcome or dependent variable. A causal relationship between an intervention and an outcome means that the intervention is responsible for the observed change in the outcome. In other words, if we change the intervention, we expect to see a corresponding change in the outcome.
For example, if a medication is found to reduce blood pressure in people with hypertension, we can say that there is a causal relationship between taking the medication and lowering blood pressure. This is because we expect that if we give the medication to a group of people with hypertension, their blood pressure will decrease as a result of the intervention. However, it is important to note that there may be other factors that could have influenced the relationship between the intervention and the outcome.
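The cleanest way to isolate such a causal effect is a randomized experiment, since randomization removes the influence of other factors. Below is a minimal simulated trial for the blood-pressure example; all numbers (baseline pressure, noise, a true effect of -10 mmHg) are hypothetical assumptions for illustration.

```python
import random

random.seed(2)

def trial(n=50_000, true_effect=-10.0):
    """Simulate a randomized trial and return the estimated treatment effect."""
    treated, control = [], []
    for _ in range(n):
        baseline = random.gauss(150, 15)       # baseline systolic blood pressure
        if random.random() < 0.5:              # randomization removes confounding
            treated.append(baseline + true_effect + random.gauss(0, 5))
        else:
            control.append(baseline + random.gauss(0, 5))
    # Difference in group means estimates the average causal effect
    return sum(treated) / len(treated) - sum(control) / len(control)

ate = trial()  # close to the true effect of -10
```

Because treatment assignment is random, the simple difference in means recovers the causal effect; with observational data, the same difference could be badly confounded.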
What is Do-Calculus?
In the context of causal AI, do-calculus can be used to reason about the effects of interventions on a causal model. In causal AI, a causal model represents the relationships between different variables in a system, and the do-calculus can be used to reason about how changing the value of one variable (the "intervention") will affect the values of other variables in the system. This can be useful for understanding the potential consequences of interventions in a real-world system, or for identifying the most effective intervention to achieve a particular outcome.
Here is Judea Pearl's canonical primer on do-calculus: a short PDF with lots of math and proofs (Pearl 2012).
What Does the Do-Operator Do?
However, how does that fit into causality’s mathematical representation?
The do-operator is a mathematical representation of a physical intervention.
If we start with the model Z → X → Y, we can simulate an intervention on X by deleting all the incoming arrows to X and manually setting X to some value x₀.
Below is an illustration of how the do-operator works.
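The same graph surgery is easy to simulate. Below is a sketch of a linear SCM for Z → X → Y (the coefficients 2 and 3 are illustrative assumptions): under do(X = x₀), we delete X's structural equation and pin X to x₀, and the downstream distribution of Y changes accordingly.

```python
import random

random.seed(1)

def mean_y(n=100_000, do_x=None):
    """Estimate E[Y] in the SCM Z -> X -> Y, optionally under do(X = do_x)."""
    total = 0.0
    for _ in range(n):
        z = random.gauss(0, 1)                  # Z := noise
        if do_x is None:
            x = 2 * z + random.gauss(0, 1)      # X := 2Z + noise (observational)
        else:
            x = do_x                            # intervention: cut Z -> X, set X = x0
        y = 3 * x + random.gauss(0, 1)          # Y := 3X + noise
        total += y
    return total / n

obs = mean_y()           # E[Y] with no intervention: approximately 0
intv = mean_y(do_x=1.0)  # E[Y | do(X = 1)]: approximately 3
```

Note that the intervention only rewrites X's own equation; Z is still sampled as before, and Y still responds to X, which is exactly what "deleting the incoming arrows to X" means.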
Rules Of Do Calculus
Beneath this scary math, each rule has specific intuition and purpose behind it! Here’s what each rule actually does:
Rule 1: Decide if we can ignore an observation
Rule 2: Decide if we can treat an intervention as an observation
Rule 3: Decide if we can ignore an intervention
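For reference, the three rules can be written out as follows (in Pearl's notation, for disjoint node sets X, Y, Z, W; here $G_{\overline{X}}$ is the graph with arrows *into* X deleted, and $G_{\underline{Z}}$ the graph with arrows *out of* Z deleted):

```latex
% Rule 1 (insertion/deletion of observations):
P(y \mid do(x), z, w) = P(y \mid do(x), w)
  \quad \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}}

% Rule 2 (action/observation exchange):
P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  \quad \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}}

% Rule 3 (insertion/deletion of actions):
P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  \quad \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
```

where $Z(W)$ denotes the Z-nodes that are not ancestors of any W-node in $G_{\overline{X}}$.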
Whoa! That's exceptionally logical. Each rule is designed to help simplify and reduce nodes in a DAG, either by ignoring them (Rules 1 and 3) or by letting interventions like do(⋅) be treated as observations instead (Rule 2).