**CPSC 532Y: Causal Machine Learning**

**Instructor**: [Mathias Lécuyer](https://mathias.lecuyer.me)

**Schedule**: MW 10:00-11:20, Term 1 (September to December 2024)

**Location**: [SWNG 210](https://maps.ubc.ca/?code=SWNG) (the class is in person)

**Office Hours**: TBD

**Logistics**: Course logistics and discussions will happen on Piazza[^piazza]. You can [log in through Canvas here](https://canvas.ubc.ca/courses/155998/). **If you want to audit or attend the first sessions, please feel free to join.** Every UBC student should be able to access Canvas for now.

**Previous offerings**: While details may change, this session will be similar to the previous offerings ([2023w1](previous_offerings/2023w1), [2022w1](previous_offerings/2022w1)).

Objectives
==========

This class has two main educational goals:

1. Cover the basics of causal inference, including:
    a. The Potential Outcomes framework ("Rubin model")
    b. The Causal Graph framework ("Pearl model")
    c. Goals of causal inference, and core techniques
2. Introduce recent developments that use Machine Learning (ML) to extend the core techniques and address some of their limitations. Topics include:
    a. Using ML to estimate heterogeneous causal effects, both in regression settings (meta-learners, random forests, orthogonal learning…) and in instrumental variable (IV) settings (deep-IV, method of moments).
    b. Evaluating Causal ML models.

We will cover these topics through a mix of lectures, paper readings, and in-class discussions of the material.

**This is a graduate seminar, and all students are expected to be actively involved during lectures and paper discussions. The grading will reflect these expectations.**

Prerequisites
=============

There are no formal prerequisites, so that anyone can take the class, but basic background in the following topics is assumed: probability, statistics, and Machine Learning. The assignment and mini-project will also require the ability to program in Python.
A motivated student should be able to learn any missing background on their own as the class progresses (and students with gaps in their background are expected to do so).

Evaluation
==========

Your course grade will be based on (potentially a subset of) the following: in-class participation, assignments, and a project.

Assignments
===========

Assignment 1
------------

In the first assignment, you will create a data setting that demonstrates the pitfalls of causal inference, as well as some solutions.

1. Create a generative data model that illustrates the challenges of causal inference (i.e., a setting where "naive" observational estimators differ from causal ones). Make sure to explicitly state the causal effect of interest. Be concise but precise when describing your model (e.g., use notation such as x ~ N(0, 1) to define the variables and their relationships).
2. Using the potential outcomes framework, show theoretically that the "naive" observational estimator (the difference of conditional expectations) is biased.
3. Empirically demonstrate this bias by implementing your generative data model and showing at least one plot that illustrates the issue.
4. Give at least two estimators that exactly identify the causal effect. What assumptions do these estimators rely on? Why are those assumptions satisfied in your model? Prove identification under those assumptions.
5. Implement these estimators and use plots to demonstrate that they correctly identify the causal effect of interest.

**Deliverable**: A PDF, written in LaTeX, emailed to my UBC email address with subject [532Y HW1] your-name. **At most 2 pages, including plots and formulas** (less is more).

**Due date**: Sept. 27, 2024.

Syllabus
========

See [what we covered last year](previous_offerings/2023w1/#schedule).

Schedule
--------

- **4 Sep 2024**: Introduction: Why Causal Inference?
- **9 Sep 2024**: The Potential Outcomes framework (lecture). Definitions and notation.
- **11 Sep 2024**: The Potential Outcomes framework (lecture). Concept: identifiability.
  Assumptions: (conditional) ignorability. Techniques and intuition: randomization.
- **16 Sep 2024**: The Potential Outcomes framework (lecture). Identifiability without randomization, estimation, heterogeneity. Confidence intervals.
- **18 Sep 2024**: Regression for causal inference (lecture). Basic facts, interpretation(s) as an estimator of the ATE (using potential outcomes), confidence intervals.
- **23 Sep 2024**: Propensity Scores (lecture). Identification through conditioning on the propensity score. Estimators: Inverse Propensity Scores; Doubly Robust estimators.
- **25 Sep 2024**: TBD
- **27 Sep 2024**: ⚠️ Assignment 1 due. [Assignment 1](#a1) is due. Deliverable: a PDF, written in LaTeX, **emailed to my UBC email address with subject [532Y HW1] your-name**. **At most 2 pages, including plots and formulas** (less is more).
- **30 Sep 2024**: 🌴 *No Class*: National Day for Truth and Reconciliation
- **2 Oct 2024**: TBD
- **7 Oct 2024**: TBD
- **9 Oct 2024**: TBD
- **14 Oct 2024**: 🌴 *No Class*: Thanksgiving Day
- **16 Oct 2024**: TBD
- **21 Oct 2024**: TBD
- **23 Oct 2024**: TBD
- **28 Oct 2024**: TBD
- **30 Oct 2024**: TBD
- **4 Nov 2024**: TBD
- **6 Nov 2024**: TBD
- **11 Nov 2024**: 🌴 *No Class*: Remembrance Day and Midterm Break
- **13 Nov 2024**: 🌴 *No Class*: Midterm Break
- **18 Nov 2024**: TBD
- **20 Nov 2024**: TBD
- **25 Nov 2024**: TBD
- **27 Nov 2024**: TBD
- **2 Dec 2024**: TBD
- **4 Dec 2024**: TBD

[^piazza]: In this course, you will be using Piazza, which is a tool to help facilitate discussions. When creating an account in the tool, you will be asked to provide personally identifying information. Because this tool is hosted on servers in the U.S. and not in Canada, by creating an account you will also be consenting to the storage of your information in the U.S. Please know you are not required to consent to sharing this personal information with the tool, if you are uncomfortable doing so.
If you choose not to provide consent, you may create an account using a nickname and a non-identifying email address, then let your instructor know what alias you are using in the tool.