Recommender systems predict and suggest relevant options to users in various domains, such as e-commerce, streaming services, and social media. Recently, deep reinforcement learning (DRL)-based recommender systems have become increasingly popular in academia and industry, e.g., at Netflix, Spotify, Google, and YouTube, because DRL can characterize the long-term interaction between the system and its users and thereby deliver a better recommendation experience. This paper demonstrates that an adversary can manipulate a DRL-based recommender system by injecting carefully designed user-system interaction records. We formulate the poisoning attack against the DRL-based recommender system as a non-convex integer programming problem. To solve this problem, we propose a three-phase mechanism, called PARL, that maximizes the hit ratio (the proportion of recommendations that result in actual user interactions, such as clicks, purchases, or other relevant actions) while avoiding easy detection. The core idea of PARL is to improve the ranking of the target item while keeping the rankings of other items fixed. To account for the sequential decision-making nature of DRL, PARL rearranges the item order in the fake users' interaction sequences to mimic the sequential behavior of normal users, an aspect usually overlooked in existing work. Experiments on three real-world datasets demonstrate the effectiveness of PARL and its improved concealment against detection techniques. PARL is open-sourced at https://github.com/PARL-RS/PARL.
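As a rough illustration of the attack objective mentioned above, the following Python sketch computes the hit ratio of a target item over users' top-K recommendation lists. The function name, data layout, and example values are illustrative assumptions for exposition only, not the paper's implementation or part of the PARL codebase.

```python
# Minimal sketch of a hit-ratio metric (HR@K) for a target item,
# assuming each user's top-K recommendation list is already available.
# All names here are hypothetical, not taken from the PARL repository.

from typing import Dict, Hashable, List


def hit_ratio_at_k(
    recommendations: Dict[Hashable, List[Hashable]],
    target_item: Hashable,
    k: int = 10,
) -> float:
    """Fraction of users whose top-k recommendation list contains target_item."""
    if not recommendations:
        return 0.0
    hits = sum(
        1 for top_list in recommendations.values() if target_item in top_list[:k]
    )
    return hits / len(recommendations)


# Toy usage: an attacker pushing item "t" would aim to raise this value.
recs = {
    "user_1": ["a", "t", "b"],  # hit
    "user_2": ["c", "d", "e"],  # miss
    "user_3": ["t", "f", "g"],  # hit
}
print(hit_ratio_at_k(recs, target_item="t", k=10))  # 2/3 ~= 0.667
```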
ACM ASIA Conference on Computer and Communications Security (AsiaCCS)
2024-07
2024-11-20