Abstract:
Security patches play a crucial role in combating Open Source Software (OSS) vulnerabilities. To facilitate the development of OSS projects, both upstream and downstream developers often maintain multiple branches. Because code contexts differ among branches, multiple security patch variants exist for the same vulnerability. Hence, to ease the management of OSS vulnerabilities, it is important to locate all patch variants of an OSS vulnerability. However, existing works are mainly designed to locate one or several patches for a vulnerability and cannot locate all of its patch variants. In this paper, we study the problem of how to accurately locate all variants of a given security patch. We motivate the problem with a preliminary study, which shows that locating all patch variants is challenging, even with a reference patch, because OSS developers follow diverse practices when backporting patches. To overcome these challenges, we propose a new patch location method that locates all variants of a patch in a code repository (e.g., a software project or a specific version). Based on the findings of our preliminary study, our method employs a rule-based model and incorporates two dimensions of code commit features specifically designed for the task of patch variant location: similarity features and representative features. On a ground-truth dataset of patch variants, our method achieves a precision of 99.68% and a recall of 98.81%, significantly outperforming two state-of-the-art baselines (PatchScout and Tracer). In addition, our method shows strong capability in locating patch variants in both upstream and downstream code repositories.