Send email Copy Email Address
2024-04-15

AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis

Summary

Android app developers extensively employ code reuse, integrating many third-party libraries into their apps. While such integration is practical for developers, it can be challenging for static analyzers to achieve scalability and precision when libraries account for a large part of the code. As a direct consequence, it is common practice in the literature to consider developer code only during static analysis --with the assumption that the sought issues are in developer code rather than the libraries. However, analysts need to distinguish between library and developer code. Currently, many static analyses rely on white lists of libraries. However, these white lists are unreliable, inaccurate, and largely non-comprehensive. In this paper, we propose a new approach to address the lack of comprehensive and automated solutions for the production of accurate and ``always up to date" sets of libraries. First, we demonstrate the continued need for a white list of libraries. Second, we propose an automated approach to produce an accurate and up-to-date set of third-party libraries in the form of a dataset called AndroLibZoo. Our dataset, which we make available to the community, contains to date 34 813 libraries and is meant to evolve.

Conference / Medium

Mining Software Repositories

Date published

2024-04-15

Date last modified

2024-01-19 12:38:54