Description
To understand the sensitivity under attacks and to develop defense mechanisms, machine-learning model designers craft worst-case adversarial perturbations with gradient-descent optimization algorithms against the model under evaluation. However, many of the proposed defenses have been shown to provide a false sense of robustness due to failures of the attacks, rather than actual improvements in the machine‐learning models’ robustness, as highlighted by more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic and automated manner. To this end, the analysis of failures in the optimization of adversarial attacks is the only valid strategy to avoid repeating mistakes of the past.