[ICSME 2024] New PHP Language Features Make Your Static Code Analysis Tools Miss Vulnerabilities

Abstract:

Because they interact directly with user input, PHP applications are susceptible to taint-style vulnerabilities. To detect such vulnerabilities, Static Code Analysis Tools (SCATs) are widely used for their broad code coverage and scalability. Modeling language features (i.e., representing and simulating the behavior of program code) is the keystone of SCATs’ vulnerability detection capabilities. Meanwhile, because PHP is an actively maintained language, its community introduces several new language features almost every year, leaving many features unmodeled. Although efforts have been made to reduce the number of unmodeled features, e.g., by proposing new modeling methods, the impact that newly introduced PHP features have on SCATs as the language evolves is neither well understood nor systematically assessed. To fill this gap, this paper presents a systematic study of new language features and their impact on the ability of SCATs to detect taint-style vulnerabilities in PHP code. Specifically, we identify 25 widely used new language features that potentially compromise SCATs’ vulnerability detection capabilities. We then assess the impact of these new features on five open-source SCATs and show that their vulnerability detection ability is significantly compromised, with each SCAT affected by 10 features on average. To mitigate the impact, we conduct a theoretical analysis to diagnose the underlying reasons and propose several effective adaptation strategies. Finally, we provide key insights and implications for various stakeholders in static code analysis, emphasizing the need for them to recognize and proactively address the potential effects of language evolution.