An Interpretable Machine Learning Pipeline for Deforestation Risk Prediction in Vietnam at 1 km Resolution
Abstract
Deforestation remains a critical environmental challenge in Vietnam, particularly in the Central Highlands. Although remote-sensing methods for detecting forest loss have improved considerably, proactive risk modeling remains underused in conservation planning. In this study, we present an interpretable machine learning pipeline that predicts deforestation risk at 1 km spatial resolution. Using a pilot study area in Gia Lai Province (K’Bang and Mang Yang districts), we integrate multi-source data from Google Earth Engine, including Hansen Global Forest Change, Global Forest Watch alerts, Sentinel-2 spectral indices, and topographic features. We evaluate three modeling approaches: a baseline risk score, logistic regression, and random forest. The random forest model achieves high discriminative performance (AUC = 0.89), while logistic regression provides insight into the direction and significance of risk drivers such as proximity to roads and vegetation condition. The pipeline serves as a pilot implementation of a scalable framework for national forest-risk monitoring in Vietnam.
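For readers less familiar with the AUC metric quoted above, the following minimal sketch shows how a per-cell risk score can be evaluated by AUC, read as the probability that a randomly chosen deforested cell receives a higher score than a randomly chosen intact cell. It is illustrative only: the function names, the equal weighting, the 10 km distance cap, the assumption that lower NDVI implies higher risk, and the toy grid cells are all hypothetical and are not taken from the study's data or its actual baseline score.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def baseline_risk(dist_to_road_km, ndvi, max_dist_km=10.0):
    """Hypothetical baseline: equal-weight mean of road proximity and low greenness.

    Assumes (for illustration only) that risk rises near roads and with lower NDVI.
    """
    proximity = 1.0 - min(dist_to_road_km, max_dist_km) / max_dist_km
    return 0.5 * proximity + 0.5 * (1.0 - ndvi)


# Toy 1 km cells: (distance to nearest road in km, NDVI, observed loss label).
cells = [
    (0.5, 0.30, 1),
    (1.0, 0.70, 0),
    (6.0, 0.45, 1),
    (8.0, 0.80, 0),
    (9.5, 0.85, 0),
]
scores = [baseline_risk(d, v) for d, v, _ in cells]
labels = [y for _, _, y in cells]
print(f"Baseline AUC on toy cells: {auc(scores, labels):.3f}")
```

The same `auc` helper could be applied to predicted probabilities from any classifier, which is what makes the metric convenient for comparing a simple score against fitted models on a common scale.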