Paper Reading: "Understanding Procedural Text using Interactive Entity Networks"

Summary

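Original source: CSDN_LawsonAbs
These notes reflect LawsonAbs's own understanding and reflections; please read them critically.
Paper type: model improvement, with no notable highlights.
Suggestion: if you have not read this paper yet, consider reading a different one instead.
Paper link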

This post covers the following four aspects:

1 Target: Understanding procedural text

2 Method: By using Interactive Entity Network

3 Contributions:

introduce a novel module, IEN (using attention)
conduct intensive experiments

4 Experiments

1 Task

1.1 What’s procedural text?

e.g., scientific articles, instruction books, or recipes

Example 1:

The water breaks into oxygen, hydrogen, and electrons.

Example 2:

Blood travels to the lungs.
Carbon dioxide is removed from the blood.
Oxygen is added to your blood.

1.2 Entity state tracking is the key task for procedural text comprehension.

Task of this paper: focus on the scientific process understanding task (Dalvi et al., 2018), in which the tracking targets are the actions and locations of entities.

1.3 Example

The state tracking task is to predict the states of each entity $e_i$ after reading each sentence $s_t$, where an entity's state is a value of a property $p_j$.

CO2 enters leaf.

System:
the existence ($p_j$) of CO2 ($e_i$) is true
the location ($p_j$) of CO2 is leaf
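
As a rough illustration of what the task asks for, the sketch below stores the predicted state of each entity after each sentence in a nested dictionary. The `record_state` helper and the dictionary layout are assumptions made for illustration only; they are not taken from the paper.

```python
# Hypothetical layout: states[sentence_index][entity][property] = value
states = {}

def record_state(states, t, entity, prop, value):
    """Store the value of one property of one entity after sentence t."""
    states.setdefault(t, {}).setdefault(entity, {})[prop] = value

sentences = ["CO2 enters leaf."]

# After reading sentence 0, record the system output from the example above.
record_state(states, 0, "CO2", "existence", True)
record_state(states, 0, "CO2", "location", "leaf")

print(states[0]["CO2"])
```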

1.4 Difficulties

dynamic nature
the involvement of multiple entities => [The water breaks into oxygen, hydrogen, and electrons.]
the complexity of tracking targets

1.5 Challenge

how to properly capture the relationship between entity interactions and their state changes.

2 IEN: Interactive Entity Network

The paper's contribution is the IEN model itself; it is considered useful because it improves performance on the task.

2.1 Characteristics

two-layer RNN model


2.1.1 Bottom RNN Encoder : Word-Level Encoding

$\vec{\mathbf{w}}_i = [emb(w_i); v_i]$

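On the left-hand side of the equation, $\vec{\mathbf{w}}_i$ is the embedding vector; on the right-hand side, $w_i$ is the i-th word.
The experiments compare different embedding functions, such as fastText and ELMo.
$v_i$ is a scalar indicating whether the word $w_i$ is a verb.
The resulting vectors are fed into a BiLSTM to obtain the final word representation $u_i = BiLSTM([w_i])$.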
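
The input construction above can be sketched roughly as follows. This covers only the concatenation step $[emb(w_i); v_i]$; the BiLSTM is omitted. The toy embedding table `EMB`, its 3-dimensional vectors, and the `VERBS` set are made-up placeholders, not fastText or ELMo outputs.

```python
# Toy embedding table (hypothetical values; stands in for fastText/ELMo).
EMB = {
    "co2": [0.1, 0.2, 0.3],
    "enters": [0.4, 0.5, 0.6],
    "leaf": [0.7, 0.8, 0.9],
}
UNK = [0.0, 0.0, 0.0]   # fallback vector for unknown words (assumption)
VERBS = {"enters"}      # assumed output of a part-of-speech tagger

def word_input(word):
    """Return [emb(w_i); v_i]: the embedding followed by a 0/1 verb flag."""
    key = word.lower()
    v = 1.0 if key in VERBS else 0.0
    return EMB.get(key, UNK) + [v]

inputs = [word_input(w) for w in ["CO2", "enters", "leaf"]]
```

Each resulting vector has one more dimension than the embedding; in the paper's setup these vectors would then go through the BiLSTM.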

2.1.2 Upper RNN Encoder : Sentence-Level Encoding and IEN cell

To track the state changes, we extract sentence features from word-level encodings by running another RNN at the sentence level.


$x_t^{e_i}$: the representation of entity $e_i$ in sentence $t$;
$x_t^{l_j}$: the representation of location $l_j$ in sentence $t$;
$u_t^{v}$: the representation of the predicate verb in sentence $t$.

Note:

If an entity or location consists of multiple words, mean pooling is applied over those words.

Next, all entity representations and all location representations are each concatenated to obtain $x_t^e \in R^{n*d}$ and $x_t^l \in R^{m*d}$. $x_t^e$ and $x_t^l$ are the inputs to the t-th IEN cell.
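
A minimal sketch of the pooling and stacking steps, assuming every entity and location appears in the sentence and is given as a word span. The span format `(start, end)` with an exclusive end, the helper names, and the toy encodings are assumptions for illustration.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]

def stack_spans(word_encodings, spans):
    """Build one row per span; each span is (start, end), end exclusive."""
    return [mean_pool(word_encodings[s:e]) for s, e in spans]

# Toy word encodings u_i for a 4-word sentence with d = 2 (made-up values).
u = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

entity_spans = [(0, 2)]    # one two-word entity, so n = 1
location_spans = [(3, 4)]  # one single-word location, so m = 1

x_e = stack_spans(u, entity_spans)    # n rows of length d
x_l = stack_spans(u, location_spans)  # m rows of length d
```

A single-word span is returned unchanged by the mean, so the same helper handles both cases.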
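In the accompanying figure, red box 1 marks the IEN cell, and red box 2 inside it marks the inputs to the IEN cell.
[Figure: model architecture, with red box 1 around the IEN cell and red box 2 around its inputs]
Next, let's look at how the IEN cell is designed.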

2.2 Structure of an IEN cell

Each IEN cell has n entity slots and m location slots, corresponding to $x_t^e$ and $x_t^l$ above.

2.2.1 IEN cell Inputs

the representations of all entities and all location candidates in a single sentence $s_t$, or a mask vector if the entity or location candidate is not in $s_t$.


What are memory slots?

we place memory slots inside IEN cells and let them recurrently update as GRU;
Each memory slot represents the state of a specific entity or a location candidate.
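
For reference, a standard GRU updates a single hidden state as shown below. This is the generic textbook formulation, not the paper's exact IEN update: $x_t$ stands for the input to one slot, $h_{t-1}$ for that slot's previous state, and the weight matrices and biases are generic symbols introduced here for illustration. The IEN cell may compute the slot input differently.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```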

$h_t^e \in R^{n*d}$ denotes the memory slots of all entities in the t-th IEN cell; $h_t^l \in R^{m*d}$ denotes the memory slots of all locations in the t-th IEN cell. The update process is as follows: