Notes on BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (2017)

矫情吗；* 2022-09-10 02:22

Code: https://github.com/Billy1900/BadNet (a reproduction on CIFAR-10 and MNIST)

Property: A BadNet achieves state-of-the-art performance on the training and test datasets but misbehaves on specific attacker-chosen inputs.
Context: outsourced training and transfer learning
Case study: MNIST digit recognition attack
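- Two different backdoors:
  - a) a single-pixel backdoor
  - b) a pattern backdoor
- Two attacks:
  - a) a single-target attack: backdoored images of digit i are labeled as digit j (i -> j)
  - b) an all-to-all attack: backdoored images of digit i are labeled as digit i+1 (i -> i+1)
- Strategy: poison the training dataset; in some cases the training parameters, such as batch size and step size, may also need to be changed.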
- Results:
  - a) single-target attack: the error rate for clean images is extremely low. The largest error rate is observed for the attack in which backdoored images of digit 1 are mislabeled by the BadNet as digit 5.
  - b) all-to-all attack: the average error rate for clean images on the BadNet is actually lower than on the original network; meanwhile, the average error on backdoored images is extremely low, which means the BadNet successfully mislabels >99% of backdoored images.
- Analysis of the attack: the existence of dedicated backdoor filters suggests that the presence of backdoors is sparsely coded in the deeper layers of the BadNet.
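
The two triggers and the two label mappings in this case study can be sketched roughly as follows. This is only an illustration, not the code from the paper or from the linked repository: the pixel coordinates, the names `SINGLE_PIXEL`, `PATTERN`, `stamp_trigger`, `single_target_label` and `all_to_all_label`, and the wrap-around from 9 to 0 are all assumptions made for the example. Images are plain 28x28 nested lists so that the sketch has no dependencies.

```python
SIZE = 28      # MNIST images are 28x28
BRIGHT = 255   # pixel value used for the trigger

# Hypothetical trigger coordinates (row, col) near the bottom-right corner.
SINGLE_PIXEL = [(SIZE - 2, SIZE - 2)]
PATTERN = [(SIZE - 2, SIZE - 2), (SIZE - 2, SIZE - 4),
           (SIZE - 4, SIZE - 2), (SIZE - 3, SIZE - 3)]


def stamp_trigger(image, coords):
    """Return a copy of `image` with the trigger pixels set to BRIGHT."""
    stamped = [row[:] for row in image]
    for r, c in coords:
        stamped[r][c] = BRIGHT
    return stamped


def single_target_label(label, source, target):
    """Single-target attack: digit `source` is relabeled as `target`."""
    return target if label == source else label


def all_to_all_label(label, num_classes=10):
    """All-to-all attack: digit i -> i + 1 (wrapping 9 -> 0 is an assumption)."""
    return (label + 1) % num_classes
```

A backdoored training sample would then be a pair such as `(stamp_trigger(img, PATTERN), all_to_all_label(y))`, while clean samples keep their true labels.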

As the relative fraction of backdoored images in the training dataset increases, the error rate on clean images increases while the error rate on backdoored images decreases. Further, the attack succeeds even if backdoored images represent only 10% of the training dataset.
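
To make the role of the backdoored fraction concrete, here is a minimal sketch of poisoning a dataset at a given rate. It assumes that backdoored copies are appended to the clean data rather than replacing it; `poison_dataset`, `bright_corner`, the `seed` parameter and the toy 4x4 images are hypothetical names and values chosen for the example only.

```python
import random


def poison_dataset(images, labels, fraction, add_trigger, relabel, seed=0):
    """Append backdoored copies of a random `fraction` of the samples.

    `add_trigger` and `relabel` are caller-supplied callables: one stamps
    the trigger on an image, the other maps a true label to the attacker's
    label. The clean samples are left untouched.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    n_poison = int(round(fraction * len(images)))
    chosen = rng.sample(range(len(images)), n_poison)
    out_images = list(images)
    out_labels = list(labels)
    for idx in chosen:
        out_images.append(add_trigger(images[idx]))
        out_labels.append(relabel(labels[idx]))
    return out_images, out_labels


def bright_corner(image):
    """Toy trigger: set the bottom-right pixel of a 2D list image to 255."""
    stamped = [row[:] for row in image]
    stamped[-1][-1] = 255
    return stamped


# Usage: 100 blank 4x4 "images", 10% poisoned with an i -> i + 1 relabeling.
clean_images = [[[0] * 4 for _ in range(4)] for _ in range(100)]
clean_labels = [i % 10 for i in range(100)]
poisoned_images, poisoned_labels = poison_dataset(
    clean_images, clean_labels, 0.10, bright_corner, lambda y: (y + 1) % 10
)
```

Sweeping `fraction` over several values and retraining each time is one way to reproduce the trend described above, with clean-image error and backdoored-image error tracked separately.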

Case study: traffic sign detection attack
- Model: Faster R-CNN, which contains three sub-networks:
  - (1) a shared CNN that extracts the features of the input image for the other two sub-networks;
  - (2) a region proposal CNN that identifies bounding boxes within an image that might correspond to objects of interest (these are referred to as region proposals);
  - (3) a traffic sign classification FCNN that classifies each region as either not a traffic sign or as one of the different types of traffic signs.
- Three super-classes: stop signs, speed-limit signs and warning signs.
- Three triggers: (i) a yellow square, (ii) an image of a bomb, and (iii) an image of a flower
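
For the simplest of the three triggers, the yellow square, stamping it onto a sign inside its bounding box might look like the sketch below. The placement near the bottom centre of the box, the `rel_size` parameter and the function name `overlay_square` are assumptions for illustration; the bomb and flower triggers would need an image paste instead of a flat colour fill.

```python
YELLOW = (255, 255, 0)


def overlay_square(image, box, color=YELLOW, rel_size=0.2):
    """Paint a square trigger near the bottom centre of a bounding box.

    `image` is a list of rows of (R, G, B) tuples; `box` is
    (x_min, y_min, x_max, y_max) in pixel coordinates, max exclusive.
    `rel_size` (trigger side as a fraction of the box width, at most 1)
    is an illustrative assumption.
    """
    x_min, y_min, x_max, y_max = box
    side = max(1, int((x_max - x_min) * rel_size))
    left = x_min + (x_max - x_min - side) // 2
    top = y_max - side
    stamped = [row[:] for row in image]
    # Clamp to the top of the box in case the box is shorter than the square.
    for y in range(max(top, y_min), y_max):
        for x in range(left, left + side):
            stamped[y][x] = color
    return stamped
```

The original image is left unmodified, so the same clean sample can be reused with different triggers.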