Seminar: January 31

John Lafferty, University of Chicago

Nonparametric Forest Density Estimation

We present recent work on graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured graphical models. We do not assume the true distribution corresponds to a forest; rather, we apply graph algorithms to estimate the optimal forest on held out data from low dimensional kernel density estimates. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop approximation algorithms for this problem. We prove structure selection consistency of the procedure. Under mild additional assumptions we show that the forest can be estimated at the optimal parametric rate of convergence. In addition to enjoying strong theoretical guarantees, forest density estimation is scalable, easily interpretable, and gives effective probability estimates for high dimensional data.

Joint work with Anupam Gupta, Han Liu, Larry Wasserman and Min Xu.