Model stealing poses a significant security risk in machine learning by enabling attackers to replicate a black-box model without access to its training data, thus jeopardizing intellectual property and exposing sensitive information. Recent methods that use pre-trained diffusion models for data synthesis improve efficiency and performance but rely heavily on manually crafted prompts, limiting automation and scalability, especially for attackers with little expertise. To assess the risks posed by open-source pre-trained models, we propose a more realistic threat model that eliminates the need for prompt design skills or knowledge of class names. In this context, we introduce Stealix, the first approach to perform model stealing without predefined prompts. Stealix uses two open-source pre-trained models to infer the victim model's data distribution, and iteratively refines prompts through a genetic algorithm based on a proxy metric, progressively improving the precision and diversity of synthetic images. Our experimental results demonstrate that Stealix significantly outperforms other methods, even those with access to class names or fine-grained prompts, while operating under the same query budget. These findings highlight the scalability of our approach and suggest that the risks posed by pre-trained generative models in model stealing may be greater than previously recognized.
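The iterative prompt refinement described above (a genetic search over prompts guided by a proxy metric) can be sketched minimally as follows. This is an illustrative assumption, not Stealix's implementation: the vocabulary, the stubbed `proxy_fitness` (standing in for a score derived from the victim model's responses to synthesized images), and the mutation/crossover operators are all hypothetical.

```python
import random

random.seed(0)

# Hypothetical token vocabulary from which candidate prompts are built.
VOCAB = ["photo", "close-up", "bright", "blurry", "dog", "animal", "outdoor", "studio"]


def proxy_fitness(prompt):
    # Stand-in for the proxy metric: in the real attack this would score how
    # well images synthesized from the prompt match the victim model's data
    # distribution. Here we simply reward overlap with a hidden token set.
    target = {"photo", "dog", "outdoor"}
    return len(set(prompt) & target)


def mutate(prompt):
    # Replace one randomly chosen token with a random vocabulary word.
    child = list(prompt)
    child[random.randrange(len(child))] = random.choice(VOCAB)
    return child


def crossover(a, b):
    # One-point crossover between two parent prompts.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve_prompts(pop_size=8, prompt_len=4, generations=20):
    # Initialize a random population of token-list prompts.
    population = [[random.choice(VOCAB) for _ in range(prompt_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the best-scoring half as parents.
        scored = sorted(population, key=proxy_fitness, reverse=True)
        parents = scored[: pop_size // 2]
        # Refill the population with mutated offspring of random parent pairs.
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=proxy_fitness)


best = evolve_prompts()
```

Under this toy fitness, the loop converges toward prompts containing the hidden target tokens, mirroring how the paper's proxy metric steers prompts toward the victim's data distribution without any manual prompt design.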
International Conference on Machine Learning (ICML)
2025-07
2025-05-23