Benji Lu

About Me

I'm a J.D./Ph.D. student at Yale Law School and the UC Berkeley Department of Statistics. My research interests include statistical machine learning, causal inference, law, and public policy. Before graduate school, I was an empirical research fellow at Harvard Law School. I graduated from Pomona College with a bachelor's degree in mathematics and philosophy.

Curriculum Vitae


Lu, B.*, O. Angiuli*, and P. Ding (2021+). A Scale-Based Sensitivity Analysis for Difference-in-Differences.

Difference-in-differences enables identification of causal effects when treated and untreated groups of units are observed over multiple time periods. However, identification relies on the untestable assumption that, in the absence of treatment, the mean responses of the treated and untreated groups would have followed parallel trends over time. One important limitation of this parallel trends assumption is that it is scale-dependent: It may hold only under some unknown, nonlinear transformation of the mean responses, and the implied value of the average treatment effect depends on this scaling. In this paper, we propose a sensitivity analysis that assesses how the difference-in-differences causal effect estimate would change depending on the scaling under which the parallel trends assumption holds. To operationalize our sensitivity analysis, we provide identification and estimation results for difference-in-differences when the parallel trends assumption holds under families of monotone scalings like the Box-Cox transformation. We then extend our work beyond the canonical two-group, two-period setting to difference-in-differences studies with multiple time periods. Finally, we illustrate our proposed method by assessing the scale sensitivity of three studies that use difference-in-differences.

Lu, B., E. Ben-Michael, A. Feller, and L. Miratrix (2021+). Is It Who You Are or Where You Are? Accounting for Compositional Differences in Cross-Site Treatment Variation.

Multisite trials, in which treatment is randomized separately in multiple sites, offer a unique opportunity to disentangle treatment effect variation due to "compositional" differences in the distributions of unit-level features from variation due to "contextual" differences in site-level features. In particular, if we can re-weight (or "transport") each site to have a common distribution of unit-level covariates, the remaining effect variation captures contextual differences across sites. In this paper, we develop a framework for transporting effects in multisite trials using approximate balancing weights, where the weights are chosen to directly optimize unit-level covariate balance between each site and the target distribution. We first develop our approach for the general setting of transporting the effect of a single-site trial. We then extend our method to multisite trials, assess its performance via simulation, and use it to analyze a series of multisite trials of welfare-to-work programs. Our method is available in the balancer R package.

Lu, B. and J. Hardin (2021). A Unified Framework for Random Forest Prediction Error Estimation. Journal of Machine Learning Research, 22(8):1-41.

We introduce a unified framework for random forest prediction error estimation based on a novel estimator of the conditional prediction error distribution function. Our framework enables simple plug-in estimation of key prediction uncertainty metrics, including conditional mean squared prediction errors, conditional biases, and conditional quantiles, for random forests and many variants. Our approach is especially well-adapted for prediction interval estimation; we show via simulations that our proposed prediction intervals are competitive with, and in some settings outperform, existing methods. To establish theoretical grounding for our framework, we prove pointwise uniform consistency of a more stringent version of our estimator of the conditional prediction error distribution function. The estimators introduced here are implemented in the R package forestError.

*Equal contribution.


I have been a graduate student instructor for the following courses at Berkeley:

Spring 2020: Stat 158 Design and Analysis of Experiments
Spring 2019: Stat 158 Design and Analysis of Experiments