[Security 2020] TEXTSHIELD: Robust Text Classification Based on Multimodal Embedding and Neural Machine Translation

Posted by: 刘智晨 · Date: 2021-08-15

Authors:

Jinfeng Li, Tianyu Du, Shouling Ji, Rong Zhang, Quan Lu, Min Yang, Ting Wang


Publication:

This paper is included in the Proceedings of the 29th USENIX Security Symposium, August 12–14, 2020


Abstract:

Text-based toxic content detection is an important tool for reducing harmful interactions in online social media environments. Yet, its underlying mechanism, deep learning-based text classification (DLTC), is inherently vulnerable to maliciously crafted adversarial texts. To mitigate such vulnerabilities, intensive research has been conducted on strengthening English-based DLTC models. However, the existing defenses are not effective for Chinese-based DLTC models, due to the unique sparseness, diversity, and variation of the Chinese language.

In this paper, we bridge this striking gap by presenting TEXTSHIELD, a new adversarial defense framework specifically designed for Chinese-based DLTC models. TEXTSHIELD differs from previous work in several key aspects: (i) generic – it applies to any Chinese-based DLTC model without requiring re-training; (ii) robust – it significantly reduces the attack success rate even under adaptive attacks; and (iii) accurate – it has little impact on the performance of DLTC models over legitimate inputs. Extensive evaluations show that it outperforms both existing methods and industry-leading platforms. Future work will explore its applicability to broader practical tasks.
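The abstract does not describe the architecture in detail, but the title points to two components: a multimodal embedding for Chinese characters and a neural machine translation step that maps perturbed text back toward its benign form. As a rough, non-authoritative illustration of the first component only, the PyTorch sketch below fuses semantic, glyph, and phonetic (pinyin) character embeddings; all class names, vocabulary sizes, and dimensions are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class MultimodalCharEmbedding(nn.Module):
    """Illustrative fusion of three character-level views of Chinese text:
    semantic (character id), glyph (visual-shape id), and phonetic (pinyin id).
    Hypothetical sizes and fusion scheme; not the authors' implementation."""

    def __init__(self, char_vocab=8000, glyph_vocab=8000, pinyin_vocab=1500, dim=64):
        super().__init__()
        self.semantic = nn.Embedding(char_vocab, dim)
        self.glyph = nn.Embedding(glyph_vocab, dim)
        self.phonetic = nn.Embedding(pinyin_vocab, dim)
        self.fuse = nn.Linear(3 * dim, dim)  # simple concatenation + projection

    def forward(self, char_ids, glyph_ids, pinyin_ids):
        # Each argument: LongTensor of shape (batch, seq_len).
        fused = torch.cat(
            [self.semantic(char_ids), self.glyph(glyph_ids), self.phonetic(pinyin_ids)],
            dim=-1,
        )
        # Output (batch, seq_len, dim) can feed any downstream DLTC classifier.
        return self.fuse(fused)


# Toy usage with random indices, just to show the expected shapes.
emb = MultimodalCharEmbedding()
chars = torch.randint(0, 8000, (2, 10))
glyphs = torch.randint(0, 8000, (2, 10))
pinyin = torch.randint(0, 1500, (2, 10))
print(emb(chars, glyphs, pinyin).shape)  # torch.Size([2, 10, 64])
```

In this kind of sketch, the NMT-based correction step would run before the classifier, so the embedding above operates on text that has already been mapped back toward benign form; how the two stages are trained and combined is described in the full paper linked below.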


TEXTSHIELD: Robust Text Classification Based on Multimodal Embedding and Neural Machine Translation.pdf