In recent years, we have seen an increased interest in studying the software supply chain of user-facing applications to uncover problematic third-party dependencies. Prior work shows that web applications often rely on outdated or vulnerable third-party code. Moreover, real-world supply chain attacks show that dependencies can also be used to deliver malicious code, e.g., for carrying cryptomining operations. Nonetheless, existing measurement studies in this domain neglect an important software engineering practice: developers often merge together third-party code into a single file called bundle, which they then deliver from their own servers, making it appear as first-party code. Bundlers like Webpack or Rollup are popular open-source projects with tens of thousand of GitHub stars, suggesting that this technology is widely-used by developers. Ignoring bundling may result in underestimating the complexity of modern software supply chains. In this work, we aim to address this methodological shortcomings of prior work. To this end, we propose a novel methodology for automatically detecting bundles, and partially reverse engineer them. Using this methodology, we conduct the first large-scale empirical study of bundled code on the web and examine its security implications. We provide evidence about the high prevalence of bundles, which are contained in 40% of all websites and the average website includes more than one bundle. Following our methodology, we reidentify 1051 vulnerabilities originating from 33 vulnerable npm packages, included in bundled code. Among the vulnerabilities, we find 17 critical and 59 high severity ones, which might enable malicious actors to execute attacks such as arbitrary code execution. Analyzing the low-rated libraries included in bundles, we discover 10 security placeholder packages, which suggest that supply-chain attacks against bundles are not only possible, but they are already happening.
ACM Conference on Computer and Communications Security (CCS)
2023-11-15
2024-10-16