"Is your explanation stable?": A Robustness Evaluation Framework for Feature Attribution

Posted: 2021-06-22

Talk title: "Is your explanation stable?": A Robustness Evaluation Framework for Feature Attribution

Speaker: Shouling Ji, "Hundred Talents Program" Researcher, Zhejiang University

Time: June 23, 2022, 15:30

Venue: Tencent Meeting (online), Meeting ID: 380-273-781

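About the speaker: Shouling Ji is a "Hundred Talents Program" Researcher and doctoral supervisor at Zhejiang University, Deputy Director (temporary post) of the Organization Department of the Zhejiang University Party Committee, and Deputy Director of the Domestic Information Technology Innovation Research Center at the Binjiang Institute. He received a Ph.D. in Electrical and Computer Engineering from the Georgia Institute of Technology and a Ph.D. in Computer Science from Georgia State University, and was selected for the National Young Talents Program. His main research interests are AI and security, data-driven security, software and system security, and big data analytics. He has published more than 90 papers in CCF-A venues such as IEEE S&P, USENIX Security, ACM CCS, and KDD, and several systems he developed have been deployed on large-scale platforms. His honors include the National Outstanding Overseas Student Award, 10 best paper awards (among them the Best Paper Award at ACM CCS 2021, a CCF-A conference in network and system security), the Huawei Outstanding Technical Achievement Award, and the Zhejiang University Advanced Worker award.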

Abstract: Neural networks have become increasingly popular, yet understanding their decision process remains difficult. One vital method for explaining a model's behavior is feature attribution, i.e., attributing its decision to pivotal features. Although many algorithms have been proposed, most of them aim to improve faithfulness (fidelity) to the model. However, real environments contain random noise, which can cause feature attribution maps to fluctuate greatly for similar images. More seriously, recent work shows that explanation algorithms are vulnerable to adversarial attacks, generating the same explanation for a maliciously perturbed input. All of this makes explanations hard to trust in real scenarios, especially in security-critical applications.

To bridge this gap, we propose Median Test for Feature Attribution (MeTFA) to estimate and reduce the randomness in explanation algorithms with theoretical guarantees. MeTFA is method-agnostic, i.e., it can be applied to any feature attribution method. To quantitatively evaluate MeTFA's faithfulness and stability, we propose several robust faithfulness metrics, which evaluate the faithfulness of an explanation under different noise settings. Experimental results show that MeTFA-smoothed explanations can significantly increase the robust faithfulness of the attribution map. Furthermore, we use two typical applications to show its potential.
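
The abstract does not spell out the MeTFA procedure, so the following is only a minimal sketch of the general idea it builds on: sample noisy copies of an input, run any attribution method on each copy, and aggregate the resulting maps with a per-feature median. It is not the MeTFA algorithm and omits its statistical test and theoretical guarantees. The toy logistic model, the names `grad_times_input` and `median_smoothed_attribution`, and the parameters `n_samples` and `sigma` are all assumptions introduced for illustration.

```python
import numpy as np


def grad_times_input(w, x):
    """Toy attribution (gradient * input) for f(x) = sigmoid(w . x).

    Stands in for "any feature attribution method"; hypothetical helper.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    # d sigmoid(w . x) / dx = p * (1 - p) * w, then multiply by the input.
    return p * (1.0 - p) * w * x


def median_smoothed_attribution(attribute, x, n_samples=51, sigma=0.1, seed=0):
    """Aggregate attribution maps over Gaussian-noised copies of x.

    Returns the per-feature median map and the per-feature standard
    deviation across samples (a rough indicator of instability).
    Assumption: Gaussian noise with scale `sigma`; the source does not
    specify a noise model.
    """
    rng = np.random.default_rng(seed)
    noisy = x + rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    maps = np.stack([attribute(z) for z in noisy])
    return np.median(maps, axis=0), maps.std(axis=0)


if __name__ == "__main__":
    w = np.array([1.0, -2.0, 0.5])   # hypothetical model weights
    x = np.array([0.3, 0.1, -0.7])   # hypothetical input
    smoothed, spread = median_smoothed_attribution(
        lambda z: grad_times_input(w, z), x
    )
    print("median attribution:", smoothed)
    print("per-feature spread:", spread)
```

Because the wrapper only calls `attribute(z)`, any attribution function with the same input and output shape can be swapped in, which mirrors the method-agnostic property described above. The median is used here because it is less sensitive to outlier maps than the mean.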