2024-07-22

Understanding Web Fingerprinting with a Protocol-Centric Approach

Summary

Recent breakthroughs in machine learning (ML) have enabled several approaches to fingerprinting web traffic through traffic analysis. In particular, researchers report impressive classification performance when modeling HTTPS traces by their packet metadata. Recent works focus mainly on packet burst metadata (packet lengths, counts, and directions). That burst metadata characterizes web traces is not surprising per se. However, most works stop at reporting evaluation results and do not examine the reasons for their success through qualitative analyses or ablation studies. In this paper, we try to better understand why and when burst-based web fingerprinting works. To this end, rather than promoting yet another classification approach, we follow a protocol-centric approach that investigates the impact of the underlying protocols on web fingerprinting. We study several research questions based on typical domain and page classification datasets. Most importantly, we show where the classification gain comes from, i.e., which messages or flows are particularly valuable. In contrast to recent works, we show that the beginning of communication does not always leak valuable fingerprinting information. This knowledge allows the design of targeted and, thus, more efficient fingerprinting attacks and defenses. In addition, we study how data availability (number of labels) and HTTP protocol features (e.g., caching, user agents) might skew the classification results. We hope that this analysis, which complements existing fingerprinting approaches, helps future research better understand fingerprinting methods and the respective countermeasures.
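To make "packet burst metadata" concrete, the following is a minimal, generic sketch of how a packet trace can be collapsed into bursts, i.e., runs of consecutive packets travelling in the same direction. It is an illustration under stated assumptions, not the feature extraction used in the paper: the (direction, length) packet representation, the +1/-1 direction convention, and the function name `bursts` are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical packet representation: (direction, length in bytes),
# with +1 = client -> server and -1 = server -> client (an assumed convention).
Packet = Tuple[int, int]
Burst = Tuple[int, int, int]  # (direction, packet_count, total_bytes)


def bursts(trace: List[Packet]) -> List[Burst]:
    """Group consecutive same-direction packets into bursts.

    Returns one (direction, packet_count, total_bytes) tuple per burst,
    i.e., the direction, count, and length metadata of each burst.
    """
    result: List[Burst] = []
    for direction, length in trace:
        if result and result[-1][0] == direction:
            # Same direction as the current burst: extend it.
            d, count, total = result[-1]
            result[-1] = (d, count + 1, total + length)
        else:
            # Direction changed (or first packet): start a new burst.
            result.append((direction, 1, length))
    return result


if __name__ == "__main__":
    # Made-up toy trace, not data from the paper.
    trace = [(1, 517), (-1, 1500), (-1, 1500), (-1, 340), (1, 80), (1, 120), (-1, 900)]
    print(bursts(trace))  # [(1, 1, 517), (-1, 3, 3340), (1, 2, 200), (-1, 1, 900)]
```

A classifier would then consume such burst sequences (or features derived from them) as input; which parts of the sequence actually carry the discriminative signal is exactly the kind of question the protocol-centric analysis above addresses.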

Conference Paper

International Symposium on Research in Attacks, Intrusions and Defenses (RAID)

Date published

2024-07-22

Date last modified

2024-10-18