Title: Transfer Learning under Heterogeneity

Abstract: Transfer learning aims to enhance model performance for a target population by leveraging other related data sources. Heterogeneity can be either a friend or a foe in transfer learning. As a friend, “good” heterogeneity arises when the sources and the target form clusters. In this scenario, we show that a three-stage, cluster-aware transfer learning procedure can achieve a better bias-variance tradeoff. As a foe, “bad” heterogeneity arises when external sources contain arbitrary outliers, causing naïve transfer to fail. Under a prototypical scenario with small target data and large but contaminated external data, we investigate subsampling strategies for transfer learning. Our analysis makes explicit how performance depends on sample sizes, sampling rates, signal strength, outlier magnitude, and error-tail behavior. The key message: more is not necessarily better, and how much you can borrow depends on how much you have.