Don’t whistle before you’re out of the WOODS:

New Benchmarks for OOD Generalization in Sequential Prediction Tasks

Machine learning models often fail to generalize under distribution shift. Understanding and overcoming these failures has motivated a research program on out-of-distribution (OOD) generalization. This program has been studied extensively for static computer vision tasks (e.g., DomainBed, WILDS) but remains largely unexplored for sequential prediction tasks. We propose a suite of new OOD generalization datasets for sequential prediction tasks, along with a fair and systematic protocol for evaluating the performance of algorithms on them. Finally, we provide a leaderboard that currently reports the performance of popular OOD generalization algorithms on these benchmarks.
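To make the evaluation setting concrete, below is a minimal, hypothetical sketch of a leave-one-environment-out evaluation of the kind used by OOD generalization benchmarks such as DomainBed and WILDS: a sequence model is trained with ERM on pooled training environments and tested on a held-out environment with shifted statistics. The synthetic data, model, and training details are illustrative assumptions, not the paper's actual datasets or protocol.

```python
# Hypothetical sketch (not the paper's API): leave-one-environment-out OOD evaluation
# for a sequential prediction task, with an ERM baseline. All data here is synthetic.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_environment(n=512, seq_len=20, n_features=4, shift=0.0):
    """Synthetic sequence data whose feature distribution shifts per environment."""
    x = torch.randn(n, seq_len, n_features) + shift
    y = (x.mean(dim=(1, 2)) > shift).long()  # label depends on the (shifted) mean
    return torch.utils.data.TensorDataset(x, y)

# Three training environments and one held-out (out-of-distribution) test environment.
train_envs = [make_environment(shift=s) for s in (0.0, 0.5, 1.0)]
test_env = make_environment(shift=2.0)

class GRUClassifier(nn.Module):
    """Simple sequence classifier: GRU encoder followed by a linear head."""
    def __init__(self, n_features=4, hidden=32, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        _, h = self.gru(x)           # h: (1, batch, hidden)
        return self.head(h.squeeze(0))

model = GRUClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
loaders = [torch.utils.data.DataLoader(env, batch_size=64, shuffle=True)
           for env in train_envs]

# ERM baseline: minimize the average loss over the pooled training environments.
for epoch in range(5):
    for batches in zip(*loaders):
        loss = sum(loss_fn(model(x), y) for x, y in batches) / len(batches)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Report accuracy on the held-out environment, i.e. the OOD generalization metric.
model.eval()
with torch.no_grad():
    x, y = test_env.tensors
    acc = (model(x).argmax(dim=1) == y).float().mean().item()
print(f"OOD test-environment accuracy: {acc:.3f}")
```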