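### Table of Contents ###

* 0x00 Passing hyperparameters via the parser
* 0x01 Data preprocessing
* 0x02 Model (CNN is used here)
  * 2.1 Input (here flags.preprocess = mfcc)
  * 2.2 Expanding the input dimensions
  * 2.3 Building the network
    * 2.3.1 The stream() function
    * 2.3.2 Network structure: CNN
    * 2.3.3 Saving the network structure and parameters to txt
* 0x03 Training
  * 3.1 Optimizer and other parameter settings
  * 3.2 Starting training
  * 3.3 After training
* 0x04 Saving the model
  * 4.1 Saving the non-stream model
  * 4.2 Saving the stream model
  * 4.3 Final directory layout
* 0x05 Other: collected issues
  * 1. Error with stride\_ms 40 & dct\_num\_features 10
  * 2. A pitfall in the arguments passed at the start of training
  * 3. Non-stream to streaming after the model is trained
  * 4. Converting the trained model to `tflite`
  * 5. During the non-stream to streaming conversion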

> Original:
> Date: 2020/06/17
> code: [google-research/kws\_streaming/][google-research_kws_streaming_]
> paper: [Streaming keyword spotting on mobile devices][]
> Some explorations of the stream kws code

# 0x00 Passing hyperparameters via the parser #

Effective kernel size of a dilated convolution, for kernel size $k$ and dilation rate $d$:

$$k_{new} = (k - 1) \cdot d + 1$$

## 2.1 Input (here flags.preprocess = mfcc) ##

* train:
  * TRAINING
* inference:
  * STREAM\_INTERNAL\_STATE\_INFERENCE
  * STREAM\_EXTERNAL\_STATE\_INFERENCE
  * NON\_STREAM\_INFERENCE

The input layer is created as follows:

    input_audio = tf.keras.layers.Input(
        shape=modes.get_input_data_shape(flags, modes.Modes.TRAINING),
        batch_size=flags.batch_size)

* non-streaming and training: `shape = (flags.spectrogram_length, flags.dct_num_features,)`
  * shape = (49, 20)
* streaming: `shape = (1, flags.dct_num_features,)`
  * shape = (1, 20)

There are two kinds of input. When `flags.preprocess` is `raw`, `net = input_audio`. When `flags.preprocess` is `mfcc` or `micro`:

    # it is a self contained model, user need to feed raw audio only
    net = speech_features.SpeechFeatures(
        speech_features.SpeechFeatures.get_params(flags))(
            net)

## 2.2 Expanding the input dimensions ##

> To add a `channel` dimension

    # input shape = (100, 49, 20, 1)
    net = tf.keras.backend.expand_dims(net)

## 2.3 Building the network ##

The `for` + `zip` combination is a stroke of genius:

    for filters, kernel_size, activation, dilation_rate, strides in zip(
            parse(flags.cnn_filters), parse(flags.cnn_kernel_size),
            parse(flags.cnn_act), parse(flags.cnn_dilation_rate),
            parse(flags.cnn_strides)):
        net = Stream(
            cell=tf.keras.layers.Conv2D(
                filters=filters,
                kernel_size=kernel_size,
                activation=activation,
                dilation_rate=dilation_rate,
                strides=strides))(
                    net)

--------------------

### 2.3.1 The stream() function ###

> Soul-searching questions:
>
> 1. Why is `conv` wrapped in stream()?
> 2. Why is `flatten` wrapped in stream()?
> 3. Why does stream() work?
> 4. Why use build()?
>
> Answers:
>
> 1. To produce the `self.state_shape` variable.
> 2. During training (convolution), each layer generates its own `self.state_shape`. At inference, `Modes.STREAM_EXTERNAL_STATE_INFERENCE` creates a new input, `self.input_state`, while `Modes.STREAM_INTERNAL_STATE_INFERENCE` creates a non-trainable weight.
> 3. ![tTtdWq.png][]
> 4. When the dimensions of the network are not known at definition time, build() can be overridden so that the network is built from the shape it receives.

1. `__init__()`

   - `super().__init__(**kwargs)`
   - `self.cell = tf.keras.layers.Conv2D(filters=filters, kernel_size=kernel_size, activation=activation, dilation_rate=dilation_rate, strides=strides)`
   - `self.mode = Modes.TRAINING`
   - `self.pad_time_dim = False`
   - `self.state_shape = None`
   - `self.ring_buffer_size_in_time_dim = None`
   - `strides = (1,1)`
   - `dilation_rate = (1,1)`
   - `kernel_size = (3,3)`
   - `self.ring_buffer_size_in_time_dim = 3` (the kernel size in the time dimension)

   The ring buffer size depends on the wrapped cell:

       # conv
       self.ring_buffer_size_in_time_dim = dilation_rate[0] * (kernel_size[0] - 1) + 1
       # Flatten
       self.ring_buffer_size_in_time_dim = None

2. `build()`: called after `__init__()` (Keras invokes it on the layer's first call, once the input shape is known)

       self.state_shape = [
           self.inference_batch_size, self.ring_buffer_size_in_time_dim
       ] + input_shape.as_list()[2:]

   > self.state\_shape = \[1, 3, 20, 1\]

       # Modes.STREAM_INTERNAL_STATE_INFERENCE
       # Add a variable: the state becomes a weight
       self.states = self.add_weight(
           name='states',
           shape=self.state_shape,
           trainable=False,
           initializer=tf.zeros_initializer)

3. `call()`: forward pass

   During training there is `no padding`, and `call()` is simply:

       self.cell(inputs)  # i.e. tf.keras.layers.Conv2D()(inputs)

   At inference, assuming `mode = STREAM_INTERNAL_STATE_INFERENCE`, the time dimension of the input (`inputs.shape[1]`) must be 1. The oldest frame in the state is removed, the new frame is appended at the end, and the result is fed to the network:

       def _streaming_internal_state(self, inputs):
           # The time dimension always has to equal 1 in streaming mode.
           if inputs.shape[1] != 1:
               raise ValueError('inputs.shape[1]: %d must be 1 ' % inputs.shape[1])

           # drop the oldest row [batch_size, (memory_size-1), feature_dim, channel]
           memory = self.states[:, 1:self.ring_buffer_size_in_time_dim, :]

           # add new row [batch_size, memory_size, feature_dim, channel]
           memory = tf.keras.backend.concatenate([memory, inputs], 1)

           assign_states = self.states.assign(memory)

           with tf.control_dependencies([assign_states]):
               return self.cell(memory)

### 2.3.2 Network structure: CNN ###

    Model: "model"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         [(100, 49, 20)]           0
    _________________________________________________________________
    tf_op_layer_ExpandDims (Tens [(100, 49, 20, 1)]        0
    _________________________________________________________________
    stream (Stream)              (100, 47, 18, 64)         640
    _________________________________________________________________
    stream_1 (Stream)            (100, 43, 16, 64)         61504
    _________________________________________________________________
    stream_2 (Stream)            (100, 35, 14, 64)         61504
    _________________________________________________________________
    stream_3 (Stream)            (100, 31, 12, 64)         61504
    _________________________________________________________________
    stream_4 (Stream)            (100, 23, 11, 128)        82048
    _________________________________________________________________
    stream_5 (Stream)            (100, 19, 11, 64)         41024
    _________________________________________________________________
    stream_6 (Stream)            (100, 1, 11, 128)         82048
    _________________________________________________________________
    stream_7 (Stream)            (100, 1408)               0
    _________________________________________________________________
    dropout (Dropout)            (100, 1408)               0
    _________________________________________________________________
    dense (Dense)                (100, 128)                180352
    _________________________________________________________________
    dense_1 (Dense)              (100, 256)                33024
    _________________________________________________________________
    dense_2 (Dense)              (100, 14)                 3598
    =================================================================
    Total params: 607,246
    Trainable params: 607,246
    Non-trainable params: 0

### 2.3.3 Saving the network structure and parameters to txt ###

> This function is worth copying; it is a very useful reference.

    utils.save_model_summary(model, flags.train_dir)

    def save_model_summary(model, path, file_name='model_summary.txt'):
        """Saves model topology/summary in text format.

        Args:
          model: Keras model
          path: path where to store model summary
          file_name: model summary file name
        """
        with open(os.path.join(path, file_name), 'wt') as fd:
            stringlist = []
            # print_fn defaults to print; what is passed in is a function
            model.summary(print_fn=lambda x: stringlist.append(x))  # pylint: disable=unnecessary-lambda
            model_summary = '\n'.join(stringlist)
            fd.write(model_summary)

    # save model and data flags
    with open(os.path.join(flags.train_dir, 'flags.txt'), 'wt') as f:
        # the output stream goes to f; the default is the screen
        pprint.pprint(flags, stream=f)

# 0x03 Training #

> Training is always `non-stream`, but every layer produces a `self.state_shape` value.

## 3.1 Optimizer and other parameter settings ##

    # Set the loss
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    # optimizer = tf.keras.optimizers.Adam(epsilon=flags.optimizer_epsilon)

    # Set the optimizer
    if flags.optimizer == 'adam':
        optimizer = tf.keras.optimizers.Adam(epsilon=flags.optimizer_epsilon)
    elif flags.optimizer == 'momentum':
        optimizer = tf.keras.optimizers.SGD(momentum=0.9)
    else:
        raise ValueError('Unsupported optimizer:%s' % flags.optimizer)

    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

    # Specify the files used to save the graph
    train_writer = tf.summary.FileWriter(flags.summaries_dir + '/train', sess.graph)
    validation_writer = tf.summary.FileWriter(flags.summaries_dir + '/validation')

    # Initialize all variables
    start_step = 1
    logging.info('Training from step: %d ', start_step)

    # Save graph.pbtxt.
    # Saves the graph, but not the variables
    tf.train.write_graph(sess.graph_def, flags.train_dir, 'graph.pbtxt')

    # Save list of words.
    # Save the labels
    with tf.io.gfile.GFile(os.path.join(flags.train_dir, 'labels.txt'), 'w') as f:
        f.write('\n'.join(audio_processor.words_list))

    best_accuracy = 0.0

    # prepare parameters for exp learning rate decay
    training_steps_max = np.sum(training_steps_list)
    lr_init = learning_rates_list[0]
    # exp_rate determines the learning rate of each later stage
    exp_rate = -np.log(learning_rates_list[-1] / lr_init)/training_steps_max

## 3.2 Starting training ##

Fetch the inputs and set the learning rate. Training is done with:

    result = model.train_on_batch(train_fingerprints, train_ground_truth)

## 3.3 After training ##

The following files are generated:

    ./speech_commands_train
    ├── accuracy_last.txt     # test-set accuracy after training
    ├──                       # weights with the best accuracy
    ├──                       # weights with the best accuracy
    ├── best_weights.index    # weights with the best accuracy
    ├── flags.txt             # saved flags
    ├── graph.pbtxt           # saved network graph: the model architecture and training configuration
    ├── labels.txt            # saved labels
    ├──                       # weights from the last step??
    ├──                       # weights from the last step??
    ├── last_weights.index    # weights from the last step??
    ├── logs                  # TensorFlow visualization tool
    │   ├── train
    │   │   ├── events.out.tfevents.1591857342.ubuntu
    │   │   ├── events.out.tfevents.1591857494.ubuntu
    │   │   ├── events.out.tfevents.1591870751.RT-AI
    │   │   └── events.out.tfevents.1591875621.ubuntu
    │   └── validation
    │       ├── events.out.tfevents.1591857855.ubuntu
    │       └── events.out.tfevents.1591870970.RT-AI
    ├── model_summary.txt     # saved model structure and parameters
    └── train                 # one weight file every 400 steps, name: acc + weights + steps
        ├──
        ├──
        ├── 0weights_400.index
        ├── ...
        ├──
        ├──
        ├── 9027weights_15000.index
        └── checkpoint

--------------------

* Inspecting the detailed weight values (TensorFlow 1)
* After inspecting them, one problem remains: every weight file contains the same values and has the same size. **The cause is unknown.**

The script used for the inspection:

    from tensorflow.python import pywrap_tensorflow

    checkpoint_path = 'your saved weight file path'
    # Read data from checkpoint file
    reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
    var_to_shape_map = reader.get_variable_to_shape_map()
    # Print tensor name and values
    for key in var_to_shape_map:
        print("tensor_name: ", key)
        print(reader.get_tensor(key))

# 0x04 Saving the model #

## 4.1 Saving the non-stream model ##

    # convert to SavedModel
    test.convert_model_saved(flags, 'non_stream', Modes.NON_STREAM_INFERENCE)

    # convert_model_saved() contains this line
    # convert_model_saved(flags, folder, mode, weights_name='best_weights')
    utils.model_to_saved(model, flags, path_model, mode)

`model_to_saved()` saves the network structure to txt and saves the network graph; the weight files are stored under `variables`:

    ./non_stream
    ├── assets    # unknown
    ├── model_summary.txt
    ├── saved_model.pb
    └── variables
        ├──
        ├──
        └── variables.index

## 4.2 Saving the stream model ##

    test.convert_model_saved(flags, 'stream_state_internal', Modes.STREAM_INTERNAL_STATE_INFERENCE)

This also calls `utils.model_to_saved()`; the difference is that `mode = Modes.STREAM_INTERNAL_STATE_INFERENCE`. There is one extra step before saving, and only after it completes are the network structure written to txt and the network graph saved:

    # convert non streaming Keras model to Keras streaming model, internal state
    model = to_streaming_inference(model_non_stream, flags, mode)

The function above converts the previously trained model into a `streaming model`. How does the conversion work? Digging further into the source reveals two important functions:

    model = _set_mode(model, mode)
    new_model = _clone_model(model, input_tensors)

The first function changes the `mode` of every layer in model.layers from `TRAINING` to `STREAM_INTERNAL_STATE_INFERENCE`.

The second function is the impressive one: it clones the network structure and the `configs`. There is no publicly circulating material on the `tensorflow` APIs this code calls, so I was groping in the dark and it was frustrating. The source is as follows:

    def _clone_model(model, input_tensors):
        """Clone model with configs, except of weights."""
        new_input_layers = {}  # Cache for created layers.
        # pylint: disable=protected-access
        if input_tensors is not None:
            # Make sure that all input tensors come from a Keras layer.
            input_tensors = tf.nest.flatten(input_tensors)
            for i, input_tensor in enumerate(input_tensors):
                if not tf.keras.backend.is_keras_tensor(input_tensor):
                    raise ValueError('Expected keras tensor but get', input_tensor)
                # Store old and new Input layers in a dict: dict[old] = new
                original_input_layer = model._input_layers[i]
                newly_created_input_layer = input_tensor._keras_history.layer
                new_input_layers[original_input_layer] = newly_created_input_layer
        # Replace the original first layer (Input) with the new Input layer
        # shape [1, 25, 10] --> [1, 1, 10]
        # For every other layer, copy the layer's config
        model_config, created_layers = models._clone_layers_and_model_config(
            model, new_input_layers, models._clone_layer)
        # pylint: enable=protected-access

        # Reconstruct model from the config, using the cloned layers.
        input_tensors, output_tensors, created_layers = (
            functional.reconstruct_from_config(
                model_config, created_layers=created_layers))

        new_model = tf.keras.Model(input_tensors, output_tensors, name=model.name)
        return new_model

--------------------

This raised a TF version issue: `tf_nightly==2.3.0.dev20200515 # was validated on tf_nightly-2.3.0.dev20200515-cp36-cp36m-manylinux2010_x86_64.whl` is required. To be continued.

## 4.3 Final directory layout ##

    ⇒ tree ./train_model/cnn
    ./train_model/cnn
    ├── accuracy_last.txt
    ├──
    ├──
    ├── best_weights.index
    ├── flags.json
    ├── flags.txt
    ├── graph.pbtxt
    ├── labels.txt
    ├──
    ├──
    ├── last_weights.index
    ├── logs
    │   ├── train
    │   │   ├── events.out.tfevents.1591857342.ubuntu
    │   │   ├── events.out.tfevents.1591857494.ubuntu
    │   │   ├── events.out.tfevents.1591870751.RT-AI
    │   │   └── events.out.tfevents.1591875621.ubuntu
    │   └── validation
    │       ├── events.out.tfevents.1591857855.ubuntu
    │       └── events.out.tfevents.1591870970.RT-AI
    ├── model_summary.txt
    ├── non_stream
    │   ├── assets
    │   ├── model_summary.txt
    │   ├── saved_model.pb
    │   └── variables
    │       ├──
    │       ├──
    │       └── variables.index
    ├── quantize_opt_for_size_tflite_non_stream
    │   ├── model_summary.txt
    │   ├── non_stream.tflite
    │   └── tflite_non_stream_model_accuracy.txt
    ├── quantize_opt_for_size_tflite_stream_state_external
    │   ├── model_summary.txt
    │   ├── stream_state_external.tflite
    │   ├── tflite_stream_state_external_model_accuracy_reset0.txt
    │   └── tflite_stream_state_external_model_accuracy_reset1.txt
    ├── stream_state_internal
    │   ├── assets
    │   ├── model_summary.txt
    │   ├── saved_model.pb
    │   └── variables
    │       ├──
    │       ├──
    │       └── variables.index
    ├── tf
    │   ├── model_summary_non_stream.png
    │   ├── model_summary_non_stream.txt
    │   ├── model_summary_stream_state_external.png
    │   ├── model_summary_stream_state_external.txt
    │   ├── model_summary_stream_state_internal.png
    │   ├── model_summary_stream_state_internal.txt
    │   ├── stream_state_external_model_accuracy_sub_set_reset0.txt
    │   ├── stream_state_external_model_accuracy_sub_set_reset1.txt
    │   ├── tf_non_stream_model_accuracy.txt
    │   ├── tf_non_stream_model_sampling_stream_accuracy.txt
    │   └── tf_stream_state_internal_model_accuracy_sub_set.txt
    ├── tflite_non_stream
    │   ├── model_summary.txt
    │   ├── non_stream.tflite
    │   └── tflite_non_stream_model_accuracy.txt
    ├── tflite_stream_state_external
    │   ├── model_summary.txt
    │   ├── stream_state_external.tflite
    │   ├── tflite_stream_state_external_model_accuracy_reset0.txt
    │   └── tflite_stream_state_external_model_accuracy_reset1.txt
    └── train
        ├──
        ├──
        ├── 0weights_400.index
        ├── ...
        ├──
        ├──
        ├── 9027weights_15000.index
        └── checkpoint

# 0x05 Other: collected issues #

## 1. Error with stride\_ms 40 & dct\_num\_features 10 ##

    ValueError: Negative dimension size caused by subtracting 5 from 4 for '{{node stream_4/conv2d_4/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true] (stream_4/conv2d_4/SpaceToBatchND, stream_4/conv2d_4/Conv2D/ReadVariableOp)' with input shapes: [200,4,2,64], [5,2,64,128]

The kernel of the fifth convolution layer is too large for its input. Set the parameters again:

    --cnn_kernel_size (3,3),(5,3),(5,3),(5,3),(5,2),(5,1),(3,1) \
    --cnn_dilation_rate (1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1) \

A new error appears:

    raise ValueError('malformed node or string: ' + repr(node))
    ValueError: malformed node or string: <_ast.Name object at 0x7f8ca813a630>

Cause:

    --units2 128,256 \
    --act2 linear,relu

These arguments are passed incorrectly; removing them fixes the error.

A custom layer that creates its weights with `add_weight`:

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    class MyLayer(layers.Layer):
        def __init__(self, input_dim=32, unit=32):
            super(MyLayer, self).__init__()
            # Trainable kernel of shape (input_dim, unit)
            self.weight = self.add_weight(shape=(input_dim, unit),
                                          initializer=keras.initializers.RandomNormal(),
                                          trainable=True)
            # Trainable bias of shape (unit,)
            self.bias = self.add_weight(shape=(unit,),
                                        initializer=keras.initializers.Zeros(),
                                        trainable=True)

        def call(self, inputs):
            return tf.matmul(inputs, self.weight) + self.bias

## 2. A pitfall in the arguments passed at the start of training ##

The `train_dir` argument must end with a trailing `/`, otherwise the saved weight files end up in unexpected locations, because the paths are built by plain string concatenation:

    # Save the weights during training
    model.save_weights(flags.train_dir + 'train/' +
                       str(int(best_accuracy * 10000)) + 'weights_' +
                       str(training_step))

    # Save the best weights found at each validation
    model.save_weights(flags.train_dir + 'best_weights')

    # Save the weights from the last test
    model.save_weights(flags.train_dir + 'last_weights')

## 3. Non-stream to streaming after the model is trained ##

    WARNING: failed to convert to SavedModel: module 'tensorflow.python.keras.engine.node' has no attribute '_CONSTANT_VALUE'

In `/home/lebhoryi/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/` one line of definition is missing. Add `_CONSTANT_VALUE = '_CONSTANT_VALUE'` at line 25.

--------------------

The line of attack below went slightly astray and cost a little over a day. Just before leaving work I finally found the cause: `/home/lebhoryi/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/` has a version mismatch, which also explains the `_CONSTANT_VALUE` problem above. It was a head-scratcher and left me a bit baffled.

> The version on the GitHub master branch has both the `output` and `_CONSTANT_VALUE` attributes, but in tf 2.1 and 2.2 they do not exist, hence the error.

======== update 2020/06/17: version issue =========

[Advice from the author][Link 1]: update to version `tf_nightly==2.3.0.dev20200515 # was validated on tf_nightly-2.3.0.dev20200515-cp36-cp36m-manylinux2010_x86_64.whl`. The problem appears to be solved.

To be continued…

> (Some thoughts that went astray)
>
> A follow-up problem appeared:
>
>     WARNING: failed to convert to SavedModel: 'Node' object has no attribute 'outputs'
>
>     # The code fails at this line:
>     (model_config, created_layers) = models._clone_layers_and_model_config(
>         model, new_input_layers, models._clone_layer)
>
> The cause is that each layer's `output` attribute is lost during cloning:
>
> [![NCayef.png][]][NCayef.png 1]
>
> ![NCaedU.png][]
>
> Normally it looks like this:
>
> ![NCdTgA.png][]
>
>     # Looking further down, the problem turns out to be here
>     Layer stream has no inbound nodes.
>
> Before the `output` attribute is accessed, there is something called `inbound_nodes`; if it does not exist, `input` & `output` do not exist either.
>
> > What are these [inbound nodes][]?
> >
> > A **Node** describes the **connectivity between two layers**. Each time a layer is connected to some new input, a node is added to *layer.\_inbound\_nodes*. Each time the output of a layer is used by another layer, a node is added to *layer.\_outbound\_nodes*.
>
> ![NiJoj0.png][]

## 4. Converting the trained model to `tflite` ##

    tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [100,128], In[1]: [1408,128]
         [[{{node dense/BiasAdd}}]]

The arguments passed in were wrong: the `dct` parameter was 20 during training but 10 was passed at test time, hence the error.

--------------------

A new problem: \[after the tf version was updated, this problem disappeared\]

    I0615 15:00:39.572662 140162058372864] run TF non streaming model accuracy evaluation
    I0615 15:01:34.501143 140162058372864] TF Final test accuracy on non stream model = 89.95% (N=5600)
    I0615 15:02:19.639499 140162058372864] TF Final test accuracy on non stream model = 89.18% (N=5600)
    ...
    I0615 15:02:20.397487 140162058372864] None
    W0615 15:02:20.405573 140162058372864] There is no need to use Stream on time dim with size 1
    W0615 15:02:20.410688 140162058372864] FAILED to convert to mode NON_STREAM_INFERENCE, tflite: 'Node' object has no attribute 'outputs'
    Traceback (most recent call last):
      File "train/", line 331, in <module>
        , argv=[sys.argv[0]] + unparsed)
      File "/home/lebhoryi/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/platform/", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "/home/lebhoryi/anaconda3/envs/tf2/lib/python3.6/site-packages/absl/", line 299, in run
        _run_main(main, args)
      File "/home/lebhoryi/anaconda3/envs/tf2/lib/python3.6/site-packages/absl/", line 250, in _run_main
        sys.exit(main(argv))
      File "train/", line 202, in main
        test.tflite_non_stream_model_accuracy(flags, folder_name, file_name)
      File "/home/lebhoryi/WakeUp-Xiaorui/kws_streaming/train/", line 498, in tflite_non_stream_model_accuracy
        model_path=os.path.join(path, tflite_model_name))
      File "/home/lebhoryi/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/lite/python/", line 204, in __init__
        model_path, self._custom_op_registerers))
    ValueError: Mmap of '/tmp/speech_commands_train/tflite_non_stream/non_stream.tflite' failed.

## 5. During the non-stream to streaming conversion ##

When this line runs:

    (model_config, created_layers) = models._clone_layers_and_model_config(
        model, new_input_layers, models._clone_layer)

the following warning appears:

    There is no need to use Stream on time dim with size 1

This is because, at layer `stream_7`, the `self.state_shape` passed in is `[1, 1, 11, 128]`:

    if isinstance(self.cell, tf.keras.layers.Flatten):
        # effective kernel size in time dimension
        if self.state_shape:
            self.ring_buffer_size_in_time_dim = self.state_shape[1]

    if self.ring_buffer_size_in_time_dim == 1:
        logging.warn('There is no need to use Stream on time dim with size 1')

[google-research_kws_streaming_]:
[Streaming keyword spotting on mobile devices]:
[NAgICd.png]:
[Tensorflow]:
[tTtdWq.png]:
[Link 1]:
[NCayef.png]:
[NCayef.png 1]:
[NCaedU.png]:
[NCdTgA.png]:
[inbound nodes]:
[NiJoj0.png]:
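The ring buffer update in section 2.3.1 can be illustrated without TensorFlow. The following is a minimal sketch in plain Python, not the kws_streaming implementation: `StreamingConv`, `conv_time_valid`, and `dot_window` are hypothetical helpers, the kernel has a single output channel, and the convolution runs over the time axis only. It feeds frames one at a time through a buffer of length `(k - 1) * d + 1` and checks that, once the buffer is full, the streamed outputs match the non-streaming "valid" convolution.

```python
import random
from collections import deque


def effective_kernel_size(kernel_size, dilation_rate):
    """Ring buffer length in the time dimension: (k - 1) * d + 1."""
    return dilation_rate * (kernel_size - 1) + 1


def dot_window(window, weights):
    """Sum of elementwise products between the selected frames and the kernel rows."""
    return sum(x * w for frame, row in zip(window, weights)
               for x, w in zip(frame, row))


def conv_time_valid(frames, weights, dilation_rate=1):
    """Non-streaming 'valid' convolution over the time axis (one output channel)."""
    span = effective_kernel_size(len(weights), dilation_rate)
    return [dot_window(frames[start:start + span:dilation_rate], weights)
            for start in range(len(frames) - span + 1)]


class StreamingConv:
    """Hypothetical helper: consumes one frame per call and keeps an internal state."""

    def __init__(self, weights, feature_dim, dilation_rate=1):
        self.weights = weights
        self.dilation_rate = dilation_rate
        span = effective_kernel_size(len(weights), dilation_rate)
        # Zero-initialized state, playing the role of self.states in the text.
        self.state = deque([[0.0] * feature_dim] * span, maxlen=span)

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.state.append(frame)
        return dot_window(list(self.state)[::self.dilation_rate], self.weights)


random.seed(0)
frames = [[random.gauss(0, 1) for _ in range(20)] for _ in range(49)]
weights = [[random.gauss(0, 1) for _ in range(20)] for _ in range(5)]
dilation = 2
span = effective_kernel_size(len(weights), dilation)

full = conv_time_valid(frames, weights, dilation)
stream = StreamingConv(weights, feature_dim=20, dilation_rate=dilation)
streamed = [stream.step(frame) for frame in frames]

# The first span - 1 streamed outputs still see zero padding from the initial state.
matches = all(abs(a - b) < 1e-9 for a, b in zip(streamed[span - 1:], full))
print(len(full), len(streamed), matches)
```

The first `span - 1` streamed outputs differ from the non-streaming result because the buffer still contains its zero initial state; after that the two agree frame by frame.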
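The output shapes in the model summary of section 2.3.2 and the negative-dimension error in issue 1 both follow from the effective kernel size. The sketch below assumes stride 1 and "valid" padding. The kernel sizes and dilation rates are assumptions that are not listed in the text (they were picked because they reproduce the summary), and the 25 time frames are taken from the `[1, 25, 10]` comment in `_clone_model`.

```python
def effective_kernel_size(kernel_size, dilation_rate):
    """Effective kernel size of a dilated convolution: (k - 1) * d + 1."""
    return dilation_rate * (kernel_size - 1) + 1


def valid_conv_output_sizes(input_size, kernel_sizes, dilation_rates):
    """Sizes along one axis after each stride-1 'valid' convolution."""
    sizes = []
    size = input_size
    for kernel_size, dilation_rate in zip(kernel_sizes, dilation_rates):
        size = size - effective_kernel_size(kernel_size, dilation_rate) + 1
        sizes.append(size)
    return sizes


# Assumed per-layer settings (hypothetical, chosen to match the summary).
time_kernels = [3, 5, 5, 5, 5, 5, 10]
time_dilations = [1, 1, 2, 1, 2, 1, 2]
feature_kernels = [3, 3, 3, 3, 2, 1, 1]
feature_dilations = [1, 1, 1, 1, 1, 1, 1]

time_sizes = valid_conv_output_sizes(49, time_kernels, time_dilations)
feature_sizes = valid_conv_output_sizes(20, feature_kernels, feature_dilations)
print(time_sizes)     # [47, 43, 35, 31, 23, 19, 1]
print(feature_sizes)  # [18, 16, 14, 12, 11, 11, 11]

# With only 25 time frames, the same settings run out of frames.
small_time_sizes = valid_conv_output_sizes(25, time_kernels, time_dilations)
first_failing_layer = next(
    index for index, size in enumerate(small_time_sizes) if size <= 0)
print(small_time_sizes[:5], first_failing_layer)  # [23, 19, 11, 7, -1] 4
```

Under these assumptions the first non-positive size appears at layer index 4, which is consistent with the `stream_4/conv2d_4` node named in the error message.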
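The pitfall in issue 2 comes from building paths with `+`. As a small sketch (the helper names `concat_path` and `joined_path` are hypothetical, and `posixpath` is used so the result does not depend on the operating system), compare string concatenation with a path join when the directory has no trailing separator:

```python
import posixpath


def concat_path(train_dir, name):
    """Path built by plain concatenation, as in the save_weights calls above."""
    return train_dir + name


def joined_path(train_dir, name):
    """Path built with a join, which inserts the separator only when needed."""
    return posixpath.join(train_dir, name)


print(concat_path('/tmp/speech_commands_train', 'best_weights'))
# /tmp/speech_commands_trainbest_weights
print(joined_path('/tmp/speech_commands_train', 'best_weights'))
# /tmp/speech_commands_train/best_weights
print(joined_path('/tmp/speech_commands_train/', 'best_weights'))
# /tmp/speech_commands_train/best_weights
```

With a join, the result is the same whether or not the directory argument ends with `/`.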
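For the `exp_rate` computed in section 3.1, one plausible reading is an exponential schedule of the form `lr_init * exp(-exp_rate * step)`. That formula is an assumption here (the text only shows how `exp_rate` is derived), and the step counts and learning rates below are hypothetical values. Under that assumption the schedule starts at the first learning rate and reaches the last one exactly at the final step:

```python
import math

# Hypothetical schedule: two stages of training steps and their learning rates.
training_steps_list = [10000, 5000]
learning_rates_list = [0.001, 0.0001]

training_steps_max = sum(training_steps_list)
lr_init = learning_rates_list[0]
exp_rate = -math.log(learning_rates_list[-1] / lr_init) / training_steps_max


def learning_rate_at(step):
    """Assumed exponential decay: lr_init * exp(-exp_rate * step)."""
    return lr_init * math.exp(-exp_rate * step)


for step in (0, training_steps_max // 2, training_steps_max):
    print(step, learning_rate_at(step))
```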