之前已经讲过了generator，这次是要建立一个更详细的框架，数据到底怎么被处理的。可以作为样例学习。写的并不详细，因为你如果要做更深的工作，你需要很高自主能力，大多数人都具备，所以我就不废话了（主要是忙）。

源数据：图片/矩阵

目标数据：tensorflow 标准的Dataset

主要过程：

源数据->build_voc_data.py->tfrecord
tfrecord->data_generator.py->Dataset

第一步将数据转换成tfrecord

直接读入的是二进制图像数据

watermark_type_ZmFuZ3poZW5naGVpdGk_shadow_10_text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3UwMTMyNDk4NTM_size_16_color_FFFFFF_t_70

之后利用类build_data.py直接将其转换成feature存入tfrecord，其实就是转换一下格式成为feature，然后用tf.train.Example画个格子装进去。

watermark_type_ZmFuZ3poZW5naGVpdGk_shadow_10_text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3UwMTMyNDk4NTM_size_16_color_FFFFFF_t_70 1

所以如果要多存入一些数据，比如深度数据，比如视频数据，那就需要改三个点：

1.build_data.py :def image_seg_to_tfexample加入你自己的定义。注意你的数据如果是整数就用_int64,浮点数就自己写个浮点数的，或者用string

return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': _bytes_list_feature(image_data),
      'image/filename': _bytes_list_feature(filename),
      'image/format': _bytes_list_feature(
          _IMAGE_FORMAT_MAP[FLAGS.image_format]),
      'image/height': _int64_list_feature(height),
      'image/width': _int64_list_feature(width),
      'image/channels': _int64_list_feature(3),
      'image/segmentation/class/encoded': (
          _bytes_list_feature(seg_data)),
      'image/segmentation/class/format': _bytes_list_feature(
          FLAGS.label_format),
  }))

2.build_voc_data.py:读入你自己的数据，下面就是读取图像和标注。你需要写你自己的数据读入。

image_filename = os.path.join(
            FLAGS.image_folder, filenames[i] + '.' + FLAGS.image_format)
        image_data = tf.gfile.FastGFile(image_filename, 'rb').read()
        height, width = image_reader.read_image_dims(image_data)
        # Read the semantic segmentation annotation.
        seg_filename = os.path.join(
            FLAGS.semantic_segmentation_folder,
            filenames[i] + '.' + FLAGS.label_format)
        seg_data = tf.gfile.FastGFile(seg_filename, 'rb').read()
        seg_height, seg_width = label_reader.read_image_dims(seg_data)

3.build_voc_data.py:传递你的数据直接转换。

example = build_data.image_seg_to_tfexample(
            image_data, filenames[i], height, width, seg_data)

第二步通过tfrecord建立Dataset

1.主函数是data_generator,调用了input_preprocess.py，preprocess调用了core.preprocess_utils.py

data_generator 读取tfrecord 需要改变features字典

features = {
        'image/encoded':
            tf.FixedLenFeature((), tf.string, default_value=''),
        'image/filename':
            tf.FixedLenFeature((), tf.string, default_value=''),
        'image/format':
            tf.FixedLenFeature((), tf.string, default_value='jpeg'),
        'image/height':
            tf.FixedLenFeature((), tf.int64, default_value=0),
        'image/width':
            tf.FixedLenFeature((), tf.int64, default_value=0),
        'image/segmentation/class/encoded':
            tf.FixedLenFeature((), tf.string, default_value=''),
        'image/segmentation/class/format':
            tf.FixedLenFeature((), tf.string, default_value='png'),
        'image/superpixel16/class/encoded':
            tf.VarLenFeature(tf.int64),
}

注意如果不能指定读取形状，那就用VarLenFeature。不过需要将var读取的稀疏数据通过

tf.sparse_tensor_to_dense

转成稠密的。

2.增加sample里面的字典key，你的数据key

sample = {
        common.IMAGE: image,
        common.IMAGE_NAME: image_name,
        common.HEIGHT: parsed_features['image/height'],
        common.WIDTH: parsed_features['image/width'],
        common.SUPER16:super16,
        common.SUPER8:super8,
        'shape' : shape,
        'ori_image':image,
        }

这个时候你已经能通过调试查看自己的数据有没有存储并且正确读取。

调试代码：

# Copyright 2018 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Training script for the DeepLab model.
See model.py for more details and usage.
"""
import six
import tensorflow as tf
from tensorflow.python.ops import math_ops
import sys
sys.path.append('..')
from deeplab import common
from deeplab import model
from deeplab.datasets import data_generator_super as data_generator
from deeplab.utils import train_utils
import os
from matplotlib import pyplot as plt
from skimage.segmentation import slic, mark_boundaries
os.environ['CUDA_VISIBLE_DEVICES']='5'
import numpy as np
flags = tf.app.flags
FLAGS = flags.FLAGS
# Settings for multi-GPUs/multi-replicas training.
flags.DEFINE_integer('num_clones', 1, 'Number of clones to deploy.')
flags.DEFINE_boolean('clone_on_cpu', False, 'Use CPUs to deploy clones.')
flags.DEFINE_integer('num_replicas', 1, 'Number of worker replicas.')
flags.DEFINE_integer('startup_delay_steps', 15,
                     'Number of training steps between replicas startup.')
flags.DEFINE_integer(
    'num_ps_tasks', 0,
    'The number of parameter servers. If the value is 0, then '
    'the parameters are handled locally by the worker.')
flags.DEFINE_string('master', '', 'BNS name of the tensorflow server')
flags.DEFINE_integer('task', 0, 'The task ID.')
# Settings for logging.
flags.DEFINE_string('train_logdir', None,
                    'Where the checkpoint and logs are stored.')
flags.DEFINE_integer('log_steps', 10,
                     'Display logging information at every log_steps.')
flags.DEFINE_integer('save_interval_secs', 1200,
                     'How often, in seconds, we save the model to disk.')
flags.DEFINE_integer('save_summaries_secs', 600,
                     'How often, in seconds, we compute the summaries.')
flags.DEFINE_boolean(
    'save_summaries_images', False,
    'Save sample inputs, labels, and semantic predictions as '
    'images to summary.')
# Settings for profiling.
flags.DEFINE_string('profile_logdir', None,
                    'Where the profile files are stored.')
# Settings for training strategy.
flags.DEFINE_enum('learning_policy', 'poly', ['poly', 'step'],
                  'Learning rate policy for training.')
# Use 0.007 when training on PASCAL augmented training set, train_aug. When
# fine-tuning on PASCAL trainval set, use learning rate=0.0001.
flags.DEFINE_float('base_learning_rate', .0001,
                   'The base learning rate for model training.')
flags.DEFINE_float('learning_rate_decay_factor', 0.1,
                   'The rate to decay the base learning rate.')
flags.DEFINE_integer('learning_rate_decay_step', 2000,
                     'Decay the base learning rate at a fixed step.')
flags.DEFINE_float('learning_power', 0.9,
                   'The power value used in the poly learning policy.')
flags.DEFINE_integer('training_number_of_steps', 30000,
                     'The number of steps used for training')
flags.DEFINE_float('momentum', 0.9, 'The momentum value to use')
# When fine_tune_batch_norm=True, use at least batch size larger than 12
# (batch size more than 16 is better). Otherwise, one could use smaller batch
# size and set fine_tune_batch_norm=False.
flags.DEFINE_integer('train_batch_size', 8,
                     'The number of images in each batch during training.')
# For weight_decay, use 0.00004 for MobileNet-V2 or Xcpetion model variants.
# Use 0.0001 for ResNet model variants.
flags.DEFINE_float('weight_decay', 0.00004,
                   'The value of the weight decay for training.')
flags.DEFINE_list('train_crop_size', '513,513',
                  'Image crop size [height, width] during training.')
flags.DEFINE_float(
    'last_layer_gradient_multiplier', 1.0,
    'The gradient multiplier for last layers, which is used to '
    'boost the gradient of last layers if the value > 1.')
flags.DEFINE_boolean('upsample_logits', True,
                     'Upsample logits during training.')
# Hyper-parameters for NAS training strategy.
flags.DEFINE_float(
    'drop_path_keep_prob', 1.0,
    'Probability to keep each path in the NAS cell when training.')
# Settings for fine-tuning the network.
flags.DEFINE_string('tf_initial_checkpoint', None,
                    'The initial checkpoint in tensorflow format.')
# Set to False if one does not want to re-use the trained classifier weights.
flags.DEFINE_boolean('initialize_last_layer', True,
                     'Initialize the last layer.')
flags.DEFINE_boolean('last_layers_contain_logits_only', False,
                     'Only consider logits as last layers or not.')
flags.DEFINE_integer('slow_start_step', 0,
                     'Training model with small learning rate for few steps.')
flags.DEFINE_float('slow_start_learning_rate', 1e-4,
                   'Learning rate employed during slow start.')
# Set to True if one wants to fine-tune the batch norm parameters in DeepLabv3.
# Set to False and use small batch size to save GPU memory.
flags.DEFINE_boolean('fine_tune_batch_norm', True,
                     'Fine tune the batch norm parameters or not.')
flags.DEFINE_float('min_scale_factor', 0.5,
                   'Mininum scale factor for data augmentation.')
flags.DEFINE_float('max_scale_factor', 2.,
                   'Maximum scale factor for data augmentation.')
flags.DEFINE_float('scale_factor_step_size', 0.25,
                   'Scale factor step size for data augmentation.')
# For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or
# rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None. Note
# one could use different atrous_rates/output_stride during training/evaluation.
flags.DEFINE_multi_integer('atrous_rates', None,
                           'Atrous rates for atrous spatial pyramid pooling.')
flags.DEFINE_integer('output_stride', 16,
                     'The ratio of input to output spatial resolution.')
# Hard example mining related flags.
flags.DEFINE_integer(
    'hard_example_mining_step', 0,
    'The training step in which exact hard example mining kicks off. Note we '
    'gradually reduce the mining percent to the specified '
    'top_k_percent_pixels. For example, if hard_example_mining_step=100K and '
    'top_k_percent_pixels=0.25, then mining percent will gradually reduce from '
    '100% to 25% until 100K steps after which we only mine top 25% pixels.')
flags.DEFINE_float(
    'top_k_percent_pixels', 1.0,
    'The top k percent pixels (in terms of the loss values) used to compute '
    'loss during training. This is useful for hard pixel mining.')
# Quantization setting.
flags.DEFINE_integer(
    'quantize_delay_step', -1,
    'Steps to start quantized training. If < 0, will not quantize model.')
# Dataset settings.
flags.DEFINE_string('dataset', 'pascal_voc_seg',
                    'Name of the segmentation dataset.')
flags.DEFINE_string('train_split', 'train_aug',
                    'Which split of the dataset to be used for training')
flags.DEFINE_string('dataset_dir', None, 'Where the dataset reside.')
def main(unused_argv):
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.gfile.MakeDirs(FLAGS.train_logdir)
  tf.logging.info('Training on %s set', FLAGS.train_split)
  graph = tf.Graph()
  with graph.as_default():
    with tf.device(tf.train.replica_device_setter(ps_tasks=FLAGS.num_ps_tasks)):
      assert FLAGS.train_batch_size % FLAGS.num_clones == 0, (
          'Training batch size not divisble by number of clones (GPUs).')
      clone_batch_size = FLAGS.train_batch_size // FLAGS.num_clones
      dataset = data_generator.Dataset(
          dataset_name=FLAGS.dataset,
          split_name=FLAGS.train_split,
          dataset_dir=FLAGS.dataset_dir,
          batch_size=clone_batch_size,
          crop_size=[int(sz) for sz in FLAGS.train_crop_size],
          min_resize_value=FLAGS.min_resize_value,
          max_resize_value=FLAGS.max_resize_value,
          resize_factor=FLAGS.resize_factor,
          min_scale_factor=FLAGS.min_scale_factor,
          max_scale_factor=FLAGS.max_scale_factor,
          scale_factor_step_size=FLAGS.scale_factor_step_size,
          model_variant=FLAGS.model_variant,
          num_readers=2,
          is_training=True,
          should_shuffle=True,
          should_repeat=True)
      iterator = dataset.get_one_shot_iterator()
      next_element = iterator.get_next()
      # Soft placement allows placing on CPU ops without GPU implementation.
      session_config = tf.ConfigProto(
          allow_soft_placement=True, log_device_placement=False)
      last_layers = model.get_extra_layer_scopes(
          FLAGS.last_layers_contain_logits_only)
      init_fn = None
      #FLAGS.tf_initial_checkpoint = '/home/DATA/liutian/tmp/tfdeeplab/deeplab/datasets/pascal_voc_seg/init_models/deeplabv3_pascal_train_aug/model.ckpt'
      if FLAGS.tf_initial_checkpoint:
        init_fn = train_utils.get_model_init_fn(
            FLAGS.train_logdir,
            FLAGS.tf_initial_checkpoint,
            FLAGS.initialize_last_layer,
            last_layers,
            ignore_missing_vars=True)
      stop_hook = tf.train.StopAtStepHook(
          last_step=FLAGS.training_number_of_steps)
      profile_dir = FLAGS.profile_logdir
      if profile_dir is not None:
        tf.gfile.MakeDirs(profile_dir)
      sess = tf.Session()
      next_element = sess.run(next_element)
      shape = next_element['shape']
      ori_image = np.array(next_element['image']).astype(np.uint8)
      print("shape is ", sess.run(shape))
if __name__ == '__main__':
  flags.mark_flag_as_required('train_logdir')
  flags.mark_flag_as_required('dataset_dir')
  tf.app.run()

3.preprocess处理你的数据。

原始的数据（image+标注）如果需要放缩，多尺度测试，以及padding，那么你的数据也需要的话，你就得改def _preprocess_image

该函数将新数据传入preprocess_image_and_label.

所以最主要的是改core里面的preprocess_utils

很多函数接受的只有image和label，你可以将你的数据输入，然后仿照image和label对你的数据进行操作。

比如：

image = tf.squeeze(tf.image.resize_bilinear(
      tf.expand_dims(image, 0),
      new_dim,
      align_corners=True), [0])
  if label is not None:
    label = tf.squeeze(tf.image.resize_nearest_neighbor(
        tf.expand_dims(label, 0),
        new_dim,
        align_corners=True), [0])
  new_sup = []
  if superpixels is not None:
      for superpixel in superpixels:
          superpixel=tf.squeeze(tf.image.resize_nearest_neighbor(
        tf.expand_dims(superpixel, 0),
        new_dim,
        align_corners=True), [0])
          #todo: list union
          new_sup.append(superpixel)
  else:
      new_sup.append(None)
  return image, label, new_sup