Data Augmentation in NLP

骑猪看日落 2023-02-26 11:23

Word Substitution

  1. Synonym-based substitution
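
As a rough illustration of synonym-based substitution, the sketch below swaps up to `n` words for a synonym. The `TOY_SYNONYMS` table and the `synonym_replace` name are assumptions made for this example; a real pipeline would usually look synonyms up in a thesaurus such as WordNet.

```python
import random

# Hypothetical toy thesaurus, used only for illustration.
TOY_SYNONYMS = {
    "awesome": ["amazing", "fantastic"],
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
}

def synonym_replace(tokens, synonyms, n=1, rng=None):
    """Replace up to n tokens that have an entry in `synonyms`."""
    rng = rng or random.Random()
    # Only tokens with a known synonym are candidates for replacement.
    candidates = [i for i, t in enumerate(tokens) if t.lower() in synonyms]
    out = list(tokens)
    for i in rng.sample(candidates, min(n, len(candidates))):
        out[i] = rng.choice(synonyms[out[i].lower()])
    return out

print(synonym_replace("it is awesome".split(), TOY_SYNONYMS))
```

Tokens without an entry in the table are left untouched, so the sentence length never changes.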

  2. Word embedding substitution
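
A minimal sketch of the idea, assuming a word is replaced by its nearest neighbour in embedding space under cosine similarity. The two-dimensional `TOY_VECTORS` are made up for this example; pretrained embeddings such as Word2Vec or GloVe would be used in practice.

```python
import math

# Hypothetical 2-d vectors for illustration; real embeddings are much larger.
TOY_VECTORS = {
    "good": [0.9, 0.1],
    "great": [0.85, 0.2],
    "bad": [-0.9, 0.1],
    "table": [0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(word, vectors, k=1):
    """Return the k words most similar to `word`, excluding the word itself."""
    target = vectors[word]
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(target, vectors[w]), reverse=True)
    return others[:k]

def embedding_replace(tokens, index, vectors):
    """Replace tokens[index] with its nearest neighbour, if it has a vector."""
    out = list(tokens)
    if out[index] in vectors:
        out[index] = nearest_neighbors(out[index], vectors, k=1)[0]
    return out

print(embedding_replace(["a", "good", "film"], 1, TOY_VECTORS))
```

Note that nearest neighbours in embedding space are not always synonyms (antonyms often sit close together in real embeddings), so sampled replacements are usually checked against the label.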

  3. Masked language model

  4. TF-IDF-based word substitution

The basic idea is that words with a low TF-IDF score are uninformative, so they can be replaced without changing the ground-truth label of the sentence.
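
The idea can be sketched in plain Python as follows. The scoring uses the textbook formula tf × log(N / df); the `threshold` parameter and the caller-supplied `fillers` list are assumptions for this example, not part of any specific published recipe.

```python
import math
import random
from collections import Counter

def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            w: (c / len(doc)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return scores

def replace_low_tfidf(doc, doc_scores, fillers, threshold, rng=None):
    """Swap every word scoring at or below `threshold` for a random filler."""
    rng = rng or random.Random()
    return [rng.choice(fillers) if doc_scores[w] <= threshold else w for w in doc]

docs = [
    ["the", "movie", "was", "great"],
    ["the", "plot", "was", "dull"],
    ["the", "acting", "was", "great"],
]
scores = tfidf_scores(docs)
print(replace_low_tfidf(docs[0], scores[0], ["a", "is"], threshold=0.0))
```

Here "the" and "was" occur in every document, so their score is zero and they are the only words eligible for replacement; "movie" and "great" carry the signal and are kept.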

Back Translation

Text Surface Transformation
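
One common surface transformation is expanding contractions (and, in the other direction, contracting expanded forms). The sketch below shows only the expansion direction, under the assumption that a small hand-written table is enough; `CONTRACTIONS` is illustrative and far from exhaustive, and ambiguous forms such as "it's" are deliberately left out.

```python
import re

# Illustrative table only; ambiguous contractions are intentionally omitted.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "isn't": "is not",
    "i'm": "i am",
    "we're": "we are",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(text):
    """Replace each known contraction with its expanded form."""
    def repl(match):
        expanded = CONTRACTIONS[match.group(0).lower()]
        # Keep a leading capital if the original contraction had one.
        return expanded.capitalize() if match.group(0)[0].isupper() else expanded
    return _PATTERN.sub(repl, text)

print(expand_contractions("I'm sure we're late, don't wait"))
```

Because the meaning is unchanged, the label of the sentence can be reused as-is.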

Random Noise Injection

  1. Misspelling injection

  2. QWERTY keyboard error injection
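
A minimal sketch of keyboard error injection: one character is replaced by a key that sits next to it on a QWERTY layout. The neighbour map below treats the three letter rows as an aligned grid, which only approximates the real staggered layout; the function names are assumptions for this example.

```python
import random

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def build_neighbors(rows):
    """Map each key to the keys adjacent to it in a simple grid model."""
    neighbors = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            near = set()
            for dr in (-1, 0, 1):
                rr = r + dr
                if not 0 <= rr < len(rows):
                    continue
                for dc in (-1, 0, 1):
                    cc = c + dc
                    if (dr, dc) != (0, 0) and 0 <= cc < len(rows[rr]):
                        near.add(rows[rr][cc])
            neighbors[key] = sorted(near)
    return neighbors

NEIGHBORS = build_neighbors(ROWS)

def keyboard_typo(word, neighbors, rng=None):
    """Replace one letter of `word` with a neighbouring key (lowercase)."""
    rng = rng or random.Random()
    positions = [i for i, ch in enumerate(word) if ch.lower() in neighbors]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + rng.choice(neighbors[word[i].lower()]) + word[i + 1:]

print(keyboard_typo("hello", NEIGHBORS))
```

The replacement is always lowercase in this sketch, so the case of the substituted letter is not preserved.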

  3. Empty noise injection
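
A minimal sketch, assuming that empty noise means replacing random tokens with a placeholder. The placeholder string `"_"` and the per-token probability `p` are choices made for this example.

```python
import random

def blank_noise(tokens, p=0.1, placeholder="_", rng=None):
    """Replace each token with `placeholder` with probability p."""
    rng = rng or random.Random()
    return [placeholder if rng.random() < p else t for t in tokens]

print(blank_noise("the movie was really great".split(), p=0.3))
```

The sentence length is preserved, so token-level labels (if any) stay aligned.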

  4. Random insertion

Choose a random word in the sentence that is not a stop word. Then find a synonym for it and insert that synonym at a random position in the sentence.
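
The steps above can be sketched as follows. The synonym table and stop-word set are toy data invented for this example; in practice they would come from a thesaurus and a standard stop-word list.

```python
import random

# Hypothetical toy resources, for illustration only.
TOY_SYNONYMS = {"movie": ["film"], "great": ["excellent", "superb"]}
STOP_WORDS = {"the", "was", "a", "is"}

def random_insertion(tokens, synonyms, stop_words, rng=None):
    """Insert a synonym of one random non-stop word at a random position."""
    rng = rng or random.Random()
    # Candidates: non-stop words for which a synonym is known.
    candidates = [
        t for t in tokens
        if t.lower() not in stop_words and t.lower() in synonyms
    ]
    if not candidates:
        return list(tokens)
    word = rng.choice(candidates)
    synonym = rng.choice(synonyms[word.lower()])
    out = list(tokens)
    # randint is inclusive, so the synonym may also land at the very end.
    out.insert(rng.randint(0, len(out)), synonym)
    return out

print(random_insertion("the movie was great".split(), TOY_SYNONYMS, STOP_WORDS))
```

Repeating the call several times yields the multi-word variant of the technique, where more than one synonym is inserted.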

  5. Sentence reorganization
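
A minimal sketch, assuming that reorganization means shuffling the order of the sentences in a text. The regex-based sentence splitter is a simplification; it breaks on `.`, `!`, or `?` followed by whitespace and will mis-split abbreviations.

```python
import random
import re

def shuffle_sentences(text, rng=None):
    """Return `text` with its sentences in a random order."""
    rng = rng or random.Random()
    # Split after sentence-ending punctuation, keeping the punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    rng.shuffle(sentences)
    return " ".join(sentences)

print(shuffle_sentences("I like tea. It is warm. Do you agree?"))
```

This only suits tasks where the label does not depend on sentence order.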

Syntax Tree

Reference

https://blog.csdn.net/lqfarmer/article/details/107006551
