Benji Lu


About Me

I'm a J.D./Ph.D. student at Yale Law School and the UC Berkeley Department of Statistics. My research interests lie broadly at the interface of statistical machine learning, causal inference, law, and public policy. Before graduate school, I was an empirical research fellow at Harvard Law School. I graduated from Pomona College with a bachelor's degree in mathematics and philosophy.

Curriculum Vitae

Research

Lu, B.*, O. Angiuli*, and P. Ding (2021+). A Scale-Based Sensitivity Analysis for Difference-in-Differences.
Abstract

Difference-in-differences enables identification of causal effects when treated and untreated groups of units are observed over multiple time periods. However, identification relies on the untestable assumption that, in the absence of treatment, the mean responses of the treated and untreated groups would have followed parallel trends over time. One important limitation of this parallel trends assumption is that it is scale-dependent: It may hold only under some unknown, nonlinear transformation of the mean responses, and the implied value of the average treatment effect depends on this scaling. In this paper, we propose a sensitivity analysis that assesses how the difference-in-differences causal effect estimate would change depending on the scaling under which the parallel trends assumption holds. To operationalize our sensitivity analysis, we provide identification and estimation results for difference-in-differences when the parallel trends assumption holds under families of monotone scalings like the Box-Cox transformation. We then extend our work beyond the canonical two-group, two-period setting to difference-in-differences studies with multiple time periods. Finally, we illustrate our proposed method by assessing the scale sensitivity of three studies that use difference-in-differences.
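For a rough sense of the scale-dependence issue, the two versions of parallel trends can be written as follows; the notation below (groups, periods, untreated mean responses, and the Box-Cox family) is an illustrative sketch rather than the paper's own.

% Illustrative notation (assumed, not taken from the abstract): groups $G \in \{0,1\}$,
% periods $t \in \{1,2\}$, untreated mean responses $\mu_{gt} = E[Y_t(0) \mid G = g]$,
% and the Box-Cox family $g_\lambda(x) = (x^\lambda - 1)/\lambda$, with $g_0(x) = \log x$.

% Parallel trends on the original scale:
\[
  \mu_{1,2} - \mu_{1,1} \;=\; \mu_{0,2} - \mu_{0,1} .
\]

% Parallel trends under the scaling $g_\lambda$:
\[
  g_\lambda(\mu_{1,2}) - g_\lambda(\mu_{1,1}) \;=\; g_\lambda(\mu_{0,2}) - g_\lambda(\mu_{0,1}) ,
\]

% which identifies the counterfactual untreated mean of the treated group in period 2 as
\[
  \mu_{1,2} \;=\; g_\lambda^{-1}\!\bigl( g_\lambda(\mu_{1,1}) + g_\lambda(\mu_{0,2}) - g_\lambda(\mu_{0,1}) \bigr) ,
\]

% so the implied average treatment effect on the treated varies with $\lambda$.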

Lu, B. and J. Hardin (2021). A Unified Framework for Random Forest Prediction Error Estimation. Journal of Machine Learning Research, 22(8):1-41.
Abstract | Paper | R Package

We introduce a unified framework for random forest prediction error estimation based on a novel estimator of the conditional prediction error distribution function. Our framework enables simple plug-in estimation of key prediction uncertainty metrics, including conditional mean squared prediction errors, conditional biases, and conditional quantiles, for random forests and many variants. Our approach is especially well-adapted for prediction interval estimation; we show via simulations that our proposed prediction intervals are competitive with, and in some settings outperform, existing methods. To establish theoretical grounding for our framework, we prove pointwise uniform consistency of a more stringent version of our estimator of the conditional prediction error distribution function. The estimators introduced here are implemented in the R package forestError.
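A loose sketch of the plug-in idea, in notation that is assumed here rather than quoted from the paper:

% Illustrative notation (assumed): $\hat{m}(x)$ is the random forest prediction at $x$,
% $E = \hat{m}(X) - Y$ is the prediction error, and $\hat{F}_x$ estimates the conditional
% error distribution $P(E \le e \mid X = x)$.

% Plug-in estimates of conditional prediction uncertainty metrics:
\[
  \widehat{\mathrm{MSPE}}(x) = \int e^{2}\, d\hat{F}_x(e) , \qquad
  \widehat{\mathrm{Bias}}(x) = \int e\, d\hat{F}_x(e) ,
\]

% and a plug-in prediction interval for $Y$ at $X = x$ with nominal coverage $1 - \alpha$:
\[
  \Bigl[\, \hat{m}(x) - \hat{F}_x^{-1}(1 - \alpha/2) ,\;\; \hat{m}(x) - \hat{F}_x^{-1}(\alpha/2) \,\Bigr] .
\]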

*Equal contribution.

Teaching

I have been a graduate student instructor for the following courses at Berkeley.

Spring 2020: Stat 158, Design and Analysis of Experiments
Spring 2019: Stat 158, Design and Analysis of Experiments