BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (2017) Notes

矫情吗;* 2022-09-10 02:22

Code: https://github.com/Billy1900/BadNet (a reproduction on CIFAR-10 and MNIST)

  • Property: a BadNet has state-of-the-art performance on the training and test data, but behaves badly on specific attacker-chosen inputs.
  • Context: outsourced training and transfer learning.
  • Case study: MNIST digit recognition attack

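    • Two different backdoors:

      • a) single-pixel backdoor: a single bright pixel in the bottom-right corner of the image
      • b) pattern backdoor: a pattern of bright pixels, also in the bottom-right corner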
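    • Two attacks:

      • a) single-target attack: a backdoored image of digit i is labeled as digit j (i -> j)
      • b) all-to-all attack: a backdoored image of digit i is labeled as digit i+1 (i -> i+1, with 9 wrapping to 0)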
    • Strategy: poison the training dataset with backdoored images that carry attacker-chosen labels. The attacker may also change training hyperparameters such as batch size and step size.
    • Results:

      • a) Single-target attack: the error rate on clean images is extremely low. The largest error rate is observed for the attack in which backdoored images of digit 1 are mislabeled by the BadNet as digit 5.
      • b) All-to-all attack: the BadNet's average error rate on clean images is actually lower than that of the original network. Its average error rate on backdoored images is also extremely low, meaning it successfully mislabels >99% of backdoored images.
    • Attack analysis: the BadNet learns convolutional filters dedicated to recognizing the backdoor. This suggests that the presence of a backdoor is sparsely coded in the deeper layers of the BadNet.
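The two MNIST triggers and the two relabeling rules described above can be sketched as follows. This is a minimal pure-Python sketch that assumes 28x28 grayscale images stored as nested lists with values in [0, 255]. The helper names `add_single_pixel`, `add_pattern`, `single_target_label`, and `all_to_all_label` are hypothetical. The exact pixel offset and the 3x3 pattern shape are also illustrative assumptions, not the paper's coordinates.

```python
import copy

SIZE = 28     # MNIST images are 28x28
BRIGHT = 255  # value of a "bright" trigger pixel (assumes 0-255 grayscale)

# Hypothetical 3x3 trigger pattern; 1 marks a bright pixel.
PATTERN = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def add_single_pixel(img, offset=2):
    """Return a copy of img with one bright pixel near the bottom-right corner.

    `offset` (distance from the border) is an assumed, illustrative value.
    """
    out = copy.deepcopy(img)
    out[SIZE - offset][SIZE - offset] = BRIGHT
    return out


def add_pattern(img, offset=2):
    """Return a copy of img with PATTERN stamped near the bottom-right corner."""
    out = copy.deepcopy(img)
    h, w = len(PATTERN), len(PATTERN[0])
    # Align the pattern's bottom-right cell with the single-pixel location.
    top, left = SIZE - offset - h + 1, SIZE - offset - w + 1
    for r in range(h):
        for c in range(w):
            if PATTERN[r][c]:
                out[top + r][left + c] = BRIGHT
    return out


def single_target_label(label, source, target):
    """Single-target attack i -> j: backdoored digit `source` becomes `target`."""
    return target if label == source else label


def all_to_all_label(label, num_classes=10):
    """All-to-all attack i -> i+1, wrapping the last class back to 0."""
    return (label + 1) % num_classes


blank = [[0] * SIZE for _ in range(SIZE)]
print(sum(map(sum, add_single_pixel(blank))) // BRIGHT)  # → 1
print(sum(map(sum, add_pattern(blank))) // BRIGHT)       # → 5
print([all_to_all_label(d) for d in range(10)])          # → [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
```

Copying the image keeps the clean sample intact, so the poisoned training set can contain both the clean and the backdoored version of the same digit.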

As the relative fraction of backdoored images in the training dataset increases, the error rate on clean images increases, while the error rate on backdoored images decreases. Moreover, the attack succeeds even when backdoored images make up only 10% of the training dataset.
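The poisoning-fraction experiment can be sketched as below. This hypothetical pure-Python sketch adds backdoored copies of a random subset of clean samples. It measures `fraction` relative to the size of the clean set, which is an assumption; the paper's exact definition may differ. It then computes the two error rates used in these notes:

- error on clean inputs, measured against their true labels;
- error on backdoored inputs, measured against the attacker's target labels.

A low backdoor error therefore means a successful attack. `poison_dataset`, `error_rate`, and the toy classifier are illustrative names, not code from the paper or the linked repository.

```python
import random


def poison_dataset(data, fraction, add_trigger, relabel, seed=0):
    """Return the clean data plus backdoored copies of a random subset.

    data: list of (image, label) pairs.
    fraction: number of backdoored copies, relative to len(data) (assumption).
    add_trigger: function image -> triggered image.
    relabel: function true label -> attacker-chosen label.
    """
    rng = random.Random(seed)
    k = round(fraction * len(data))
    chosen = rng.sample(data, k)
    backdoored = [(add_trigger(x), relabel(y)) for x, y in chosen]
    poisoned = data + backdoored  # new list; `data` itself is not modified
    rng.shuffle(poisoned)
    return poisoned


def error_rate(classify, samples):
    """Fraction of (x, y) pairs for which classify(x) != y."""
    if not samples:
        return 0.0
    wrong = sum(1 for x, y in samples if classify(x) != y)
    return wrong / len(samples)


# Toy usage: "images" are tuples, and the trigger appends a marker value.
def add_marker(x):
    return x + ("T",)


def shift_label(y):
    return (y + 1) % 10  # all-to-all relabeling


clean = [((d,), d) for d in range(10)] * 10  # 100 toy samples
train = poison_dataset(clean, 0.1, add_marker, shift_label)
print(len(train))  # → 110


def toy_badnet(x):
    """A stand-in 'BadNet': correct on clean inputs, shifted on triggered ones."""
    return (x[0] + 1) % 10 if "T" in x else x[0]


backdoor_test = [(add_marker(x), shift_label(y)) for x, y in clean]
print(error_rate(toy_badnet, clean), error_rate(toy_badnet, backdoor_test))  # → 0.0 0.0
```

In a real experiment, `toy_badnet` would be replaced by a network trained on `train`, and the fraction would be swept to trace the trade-off described above.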

  • Case study: traffic sign detection attack

    • Model: Faster R-CNN, which contains three sub-networks:

      • (1) a shared CNN that extracts features of the input image for the other two sub-networks;
      • (2) a region proposal CNN that identifies bounding boxes within an image that might correspond to objects of interest (these are called region proposals);
      • (3) a traffic sign classification FCNN (fully connected network) that classifies each region either as not a traffic sign or as one of the traffic sign types.
    • Three super-classes: stop signs, speed-limit signs and warning signs.
    • Three triggers: (i) a yellow square, (ii) an image of a bomb, and (iii) an image of a flower
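Poisoning a detection sample differs from the MNIST case in two ways. The trigger must be placed inside a sign's bounding box, and that box's label must be changed. Below is a minimal pure-Python sketch under these assumptions:

- RGB images are nested lists of (r, g, b) tuples.
- Annotations are (box, label) pairs.

Only the yellow-square trigger is shown. Its size (20% of the box width) and its position near the bottom of the box are illustrative assumptions. `stamp_yellow_square` and `poison_annotation` are hypothetical helper names.

```python
YELLOW = (255, 255, 0)  # RGB yellow
SUPER_CLASSES = ("stop", "speed-limit", "warning")


def stamp_yellow_square(img, box, rel_size=0.2):
    """Return a copy of an RGB image with a yellow square inside a sign's box.

    img: list of rows, each a list of (r, g, b) tuples.
    box: (x0, y0, x1, y1) bounding box of the sign, x1/y1 exclusive.
    rel_size: square side as a fraction of the box width (assumed value).
    The square is centered horizontally, just above the bottom edge of the box.
    """
    out = [row[:] for row in img]  # copy rows; pixel tuples are immutable
    x0, y0, x1, y1 = box
    side = max(1, int((x1 - x0) * rel_size))
    left = x0 + (x1 - x0 - side) // 2
    top = y1 - side - max(1, (y1 - y0) // 10)  # small margin above the bottom
    for y in range(max(top, y0), min(top + side, y1)):
        for x in range(left, left + side):
            out[y][x] = YELLOW
    return out


def poison_annotation(img, annotations, target):
    """Stamp the trigger on every annotated sign and relabel it as `target`."""
    if target not in SUPER_CLASSES:
        raise ValueError(f"unknown super-class: {target!r}")
    for box, _ in annotations:
        img = stamp_yellow_square(img, box)
    return img, [(box, target) for box, _ in annotations]


black = [[(0, 0, 0)] * 40 for _ in range(40)]
img, ann = poison_annotation(black, [((10, 10, 30, 30), "stop")], "speed-limit")
print(sum(px == YELLOW for row in img for px in row), ann)
# → 16 [((10, 10, 30, 30), 'speed-limit')]
```

The bomb and flower triggers would replace the solid fill with a small image pasted into the same region. The relabeling step would stay unchanged.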

More updates: https://github.com/Billy1900/Backdoor-Learning
