Abstract: | We compare the performance of a wide set of regression techniques and machine-learning algorithms for predicting recovery rates on non-performing loans, using a private database from a European debt collection agency. We find that rule-based algorithms such as Cubist, boosted trees, and random forests perform significantly better than other approaches. In addition to loan contract characteristics, predictors that describe the bank’s recovery process prior to the portfolio’s sale to a debt collector are also shown to enhance forecasting performance. These variables, derived from the time series of contacts with defaulted clients and of client reimbursements to the bank, help all algorithms better identify debtors with different repayment ability and/or commitment and, more generally, those with different recovery potential. |