TensorFlow 2.0 Deep Learning (Part 4: Recurrent Neural Networks)


日萌社

Artificial Intelligence (AI): Keras, PyTorch, MXNet, TensorFlow, PaddlePaddle deep learning in practice (updated from time to time)


TensorFlow 2.0 Deep Learning (Part 1, part 1)

TensorFlow 2.0 Deep Learning (Part 1, part 2)

TensorFlow 2.0 Deep Learning (Part 1, part 3)

TensorFlow 2.0 Deep Learning (Part 2, part 1)

TensorFlow 2.0 Deep Learning (Part 2, part 2)

TensorFlow 2.0 Deep Learning (Part 2, part 3)

TensorFlow 2.0 Deep Learning (Part 3: Convolutional Neural Networks, part 1)

TensorFlow 2.0 Deep Learning (Part 3: Convolutional Neural Networks, part 2)

TensorFlow 2.0 Deep Learning (Part 4: Recurrent Neural Networks)

TensorFlow 2.0 Deep Learning (Part 5: GAN Generative Neural Networks, part 1)

TensorFlow 2.0 Deep Learning (Part 5: GAN Generative Neural Networks, part 2)

TensorFlow 2.0 Deep Learning (Part 6: Reinforcement Learning)


Basics



Principles of gradient propagation

Gradient propagation

Vanishing and exploding gradients

>>> import tensorflow as tf
>>> W = tf.ones([2,2]) # create an arbitrary matrix
>>> W
<tf.Tensor: id=2, shape=(2, 2), dtype=float32, numpy=
array([[1., 1.],
       [1., 1.]], dtype=float32)>
>>> tf.linalg.eigh(W)
(<tf.Tensor: id=3, shape=(2,), dtype=float32, numpy=array([0., 2.], dtype=float32)>,
 <tf.Tensor: id=4, shape=(2, 2), dtype=float32, numpy=
array([[-0.70710677,  0.70710677],
       [ 0.70710677,  0.70710677]], dtype=float32)>)
>>> tf.linalg.eigh(W)[0] # eigenvalues: the largest eigenvalue of W is 2, so what follows shows repeated matrix multiplication when the largest eigenvalue is greater than 1
<tf.Tensor: id=5, shape=(2,), dtype=float32, numpy=array([0., 2.], dtype=float32)>
>>> tf.linalg.eigh(W)[1]
<tf.Tensor: id=8, shape=(2, 2), dtype=float32, numpy=
array([[-0.70710677,  0.70710677],
       [ 0.70710677,  0.70710677]], dtype=float32)>
>>> val = [W]
>>> val
[<tf.Tensor: id=2, shape=(2, 2), dtype=float32, numpy=
array([[1., 1.],
       [1., 1.]], dtype=float32)>]
>>> for i in range(10): # multiply by W repeatedly, i.e. raise the matrix to the n-th power
...     val.append([val[-1]@W])
...
>>> val
[<tf.Tensor: id=2, shape=(2, 2), dtype=float32, numpy=
array([[1., 1.],
       [1., 1.]], dtype=float32)>,
 [<tf.Tensor: id=9, shape=(2, 2), dtype=float32, numpy=
array([[2., 2.],
       [2., 2.]], dtype=float32)>],
 [<tf.Tensor: id=11, shape=(1, 2, 2), dtype=float32, numpy=
array([[[4., 4.],
        [4., 4.]]], dtype=float32)>],
 [<tf.Tensor: id=13, shape=(1, 1, 2, 2), dtype=float32, numpy=
array([[[[8., 8.],
         [8., 8.]]]], dtype=float32)>],
 [<tf.Tensor: id=15, shape=(1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[16., 16.],
          [16., 16.]]]]], dtype=float32)>],
 [<tf.Tensor: id=17, shape=(1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[32., 32.],
           [32., 32.]]]]]], dtype=float32)>],
 [<tf.Tensor: id=19, shape=(1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[64., 64.],
            [64., 64.]]]]]]], dtype=float32)>],
 [<tf.Tensor: id=21, shape=(1, 1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[[128., 128.],
             [128., 128.]]]]]]]], dtype=float32)>],
 [<tf.Tensor: id=23, shape=(1, 1, 1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[[[256., 256.],
              [256., 256.]]]]]]]]], dtype=float32)>],
 [<tf.Tensor: id=25, shape=(1, 1, 1, 1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[[[[512., 512.],
               [512., 512.]]]]]]]]]], dtype=float32)>],
 [<tf.Tensor: id=27, shape=(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[[[[[1024., 1024.],
                [1024., 1024.]]]]]]]]]]], dtype=float32)>]]
>>> l = list(map(lambda x: x, val)) # map each tensor in val into a plain Python list
>>> len(l)
11
>>> norm = list(map(lambda x: tf.norm(x).numpy(), val)) # tf.norm(x) defaults to the L2 norm tf.norm(x, ord=2), i.e. tf.sqrt(tf.reduce_sum(tf.square(x)))
>>> len(norm)
11
>>> norm
[2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0, 1024.0, 2048.0]
>>> m = list(map(lambda x: tf.sqrt(tf.reduce_sum(tf.square(x))), val)) # the same L2 norm computed explicitly
>>> m
[<tf.Tensor: id=130, shape=(), dtype=float32, numpy=2.0>,
 <tf.Tensor: id=135, shape=(), dtype=float32, numpy=4.0>,
 <tf.Tensor: id=140, shape=(), dtype=float32, numpy=8.0>,
 <tf.Tensor: id=145, shape=(), dtype=float32, numpy=16.0>,
 <tf.Tensor: id=150, shape=(), dtype=float32, numpy=32.0>,
 <tf.Tensor: id=155, shape=(), dtype=float32, numpy=64.0>,
 <tf.Tensor: id=160, shape=(), dtype=float32, numpy=128.0>,
 <tf.Tensor: id=165, shape=(), dtype=float32, numpy=256.0>,
 <tf.Tensor: id=170, shape=(), dtype=float32, numpy=512.0>,
 <tf.Tensor: id=175, shape=(), dtype=float32, numpy=1024.0>,
 <tf.Tensor: id=180, shape=(), dtype=float32, numpy=2048.0>]
>>> from matplotlib import pyplot as plt
>>> plt.plot(range(1,12), norm)
[<matplotlib.lines.Line2D object at 0x0000025773D525C8>]
>>> plt.show()
>>> eigenvalues = tf.linalg.eigh(W)
>>> eigenvalues # here the largest eigenvalue is 0.8: this preview already uses the W = 0.4 * ones matrix of the next experiment
(<tf.Tensor: id=14, shape=(2,), dtype=float32, numpy=array([0. , 0.8], dtype=float32)>,
 <tf.Tensor: id=15, shape=(2, 2), dtype=float32,
 numpy=array([[-0.70710677,  0.70710677],
              [ 0.70710677,  0.70710677]], dtype=float32)>)


>>> import tensorflow as tf
>>> from matplotlib import pyplot as plt
>>> W = tf.ones([2,2])*0.4 # create an arbitrary matrix
>>> W
<tf.Tensor: id=185, shape=(2, 2), dtype=float32, numpy=
array([[0.4, 0.4],
       [0.4, 0.4]], dtype=float32)>
>>> tf.linalg.eigh(W)
(<tf.Tensor: id=186, shape=(2,), dtype=float32, numpy=array([0. , 0.8], dtype=float32)>,
 <tf.Tensor: id=187, shape=(2, 2), dtype=float32, numpy=
array([[-0.70710677,  0.70710677],
       [ 0.70710677,  0.70710677]], dtype=float32)>)
>>> eigenvalues = tf.linalg.eigh(W)[0] # eigenvalues: the largest eigenvalue of W is 0.8, so what follows shows repeated matrix multiplication when the largest eigenvalue is less than 1
>>> eigenvalues
<tf.Tensor: id=188, shape=(2,), dtype=float32, numpy=array([0. , 0.8], dtype=float32)>
>>> val = [W]
>>> for i in range(10):
...     val.append([val[-1]@W])
...
>>> val
[<tf.Tensor: id=185, shape=(2, 2), dtype=float32, numpy=
array([[0.4, 0.4],
       [0.4, 0.4]], dtype=float32)>,
 [<tf.Tensor: id=190, shape=(2, 2), dtype=float32, numpy=
array([[0.32000002, 0.32000002],
       [0.32000002, 0.32000002]], dtype=float32)>],
 [<tf.Tensor: id=192, shape=(1, 2, 2), dtype=float32, numpy=
array([[[0.256, 0.256],
        [0.256, 0.256]]], dtype=float32)>],
 [<tf.Tensor: id=194, shape=(1, 1, 2, 2), dtype=float32, numpy=
array([[[[0.20480001, 0.20480001],
         [0.20480001, 0.20480001]]]], dtype=float32)>],
 [<tf.Tensor: id=196, shape=(1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[0.16384001, 0.16384001],
          [0.16384001, 0.16384001]]]]], dtype=float32)>],
 [<tf.Tensor: id=198, shape=(1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[0.13107201, 0.13107201],
           [0.13107201, 0.13107201]]]]]], dtype=float32)>],
 [<tf.Tensor: id=200, shape=(1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[0.10485762, 0.10485762],
            [0.10485762, 0.10485762]]]]]]], dtype=float32)>],
 [<tf.Tensor: id=202, shape=(1, 1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[[0.08388609, 0.08388609],
             [0.08388609, 0.08388609]]]]]]]], dtype=float32)>],
 [<tf.Tensor: id=204, shape=(1, 1, 1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[[[0.06710888, 0.06710888],
              [0.06710888, 0.06710888]]]]]]]]], dtype=float32)>],
 [<tf.Tensor: id=206, shape=(1, 1, 1, 1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[[[[0.0536871, 0.0536871],
               [0.0536871, 0.0536871]]]]]]]]]], dtype=float32)>],
 [<tf.Tensor: id=208, shape=(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2), dtype=float32, numpy=
array([[[[[[[[[[[0.04294968, 0.04294968],
                [0.04294968, 0.04294968]]]]]]]]]]], dtype=float32)>]]
>>> norm = list(map(lambda x: tf.norm(x).numpy(), val))
>>> norm
[0.8, 0.64000005, 0.512, 0.40960002, 0.32768002, 0.26214403, 0.20971523, 0.16777219, 0.13421775, 0.107374206, 0.08589937]
>>> plt.plot(range(1,12), norm)
[<matplotlib.lines.Line2D object at 0x000002577AACD888>]
>>> plt.show()
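Taken together, the two sessions above demonstrate the core of the problem. If $\lambda_{max}$ is the largest eigenvalue of $W$, repeated multiplication behaves like

$\| W^{n} \| \propto \lambda_{max}^{\,n}$,

so the norm grows without bound when $\lambda_{max} > 1$ (here 2: the norms are 2, 4, 8, ..., 2048) and shrinks towards 0 when $\lambda_{max} < 1$ (here 0.8: the norms are 0.8, 0.64, ..., 0.086). The gradient of an RNN unrolled over many time steps contains exactly this kind of repeated product of the recurrent weight matrix, which is why long sequences suffer from exploding or vanishing gradients.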



Gradient clipping (clip_by_value, clip_by_norm, clip_by_global_norm)


>>> a = tf.random.uniform([2,2]) * 5
>>> a
<tf.Tensor: id=356, shape=(2, 2), dtype=float32, numpy=
array([[1.5947735, 2.6630645],
       [3.2025905, 3.669492 ]], dtype=float32)>
>>> tf.norm(a)
<tf.Tensor: id=361, shape=(), dtype=float32, numpy=5.7755494>
>>> # clip by norm
... b = tf.clip_by_norm(a, 5) # tf.clip_by_norm(gradient tensor W, max) is equivalent to W / tf.norm(W) * max when the norm exceeds max
>>> b
<tf.Tensor: id=378, shape=(2, 2), dtype=float32, numpy=
array([[1.3806249, 2.3054643],
       [2.772542 , 3.176747 ]], dtype=float32)>
>>> tf.norm(b)
<tf.Tensor: id=383, shape=(), dtype=float32, numpy=5.0>
>>> a1 = a / tf.norm(a) * 5 # W / tf.norm(W) * max gives the same result as tf.clip_by_norm(W, max)
>>> tf.norm(a1)
<tf.Tensor: id=409, shape=(), dtype=float32, numpy=5.0000005>
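The heading also lists tf.clip_by_value, which clips each element into a fixed range instead of rescaling by the norm. A minimal sketch (the range [0.4, 0.6] is just an illustrative choice, matching the short snippet that appears in the longer listing further below):

import tensorflow as tf

a = tf.random.uniform([2, 2])          # entries roughly in [0, 1)
b = tf.clip_by_value(a, 0.4, 0.6)      # every entry is forced into [0.4, 0.6]
print(a.numpy())
print(b.numpy())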


>>> import tensorflow as tf
>>> w1 = tf.random.normal([3,3]) # gradient tensor 1
>>> w2 = tf.random.normal([3,3]) # gradient tensor 2
>>> w1
<tf.Tensor: id=480, shape=(3, 3), dtype=float32, numpy=
array([[-0.6576707 , -0.90442616,  0.34582302],
       [-1.7328235 , -2.1596272 ,  0.4980505 ],
       [-0.05848425,  0.3528551 ,  0.11080291]], dtype=float32)>
>>> w2
<tf.Tensor: id=486, shape=(3, 3), dtype=float32, numpy=
array([[ 0.08453857, -0.2454122 , -0.67583424],
       [ 0.32974792, -1.3895415 , -0.3052706 ],
       [-0.37487552, -0.9419025 , -0.86943924]], dtype=float32)>
>>> # global norm of all parameter gradients before any rescaling
... global_norm = tf.math.sqrt(tf.norm(w1)**2 + tf.norm(w2)**2)
>>> global_norm
<tf.Tensor: id=502, shape=(), dtype=float32, numpy=3.7236474>
>>> # clip with max_norm=2: returns the proportionally rescaled gradients and the original, pre-clipping global norm
... (ww1, ww2), global_norm = tf.clip_by_global_norm([w1, w2], 2)
>>> ww1
<tf.Tensor: id=522, shape=(3, 3), dtype=float32, numpy=
array([[-0.35324   , -0.4857743 ,  0.18574423],
       [-0.93071294, -1.1599525 ,  0.2675068 ],
       [-0.03141234,  0.18952121,  0.0595131 ]], dtype=float32)>
>>> ww2
<tf.Tensor: id=523, shape=(3, 3), dtype=float32, numpy=
array([[ 0.04540632, -0.1318128 , -0.3629958 ],
       [ 0.17711017, -0.74633354, -0.16396321],
       [-0.20134856, -0.5059032 , -0.46698257]], dtype=float32)>
>>> global_norm
<tf.Tensor: id=510, shape=(), dtype=float32, numpy=3.7236474>
>>> # global norm of the gradients after proportional rescaling (clipping)
... global_norm2 = tf.math.sqrt(tf.norm(ww1)**2 + tf.norm(ww2)**2)
>>> print(global_norm, global_norm2) # tf.Tensor(3.7236474, shape=(), dtype=float32) tf.Tensor(2.0000002, shape=(), dtype=float32)
>>> # global norm of all parameter gradients before rescaling, computed again
... global_norm = tf.math.sqrt(tf.norm(w1)**2 + tf.norm(w2)**2)
>>> global_norm
<tf.Tensor: id=502, shape=(), dtype=float32, numpy=3.7236474>
>>> max_norm = 2
>>> # tf.clip_by_global_norm([w1,w2], max_norm) is equivalent to scaling each tensor by max_norm / max(global_norm, max_norm)
>>> ww1 = w1 * max_norm / max(global_norm, max_norm)
>>> ww2 = w2 * max_norm / max(global_norm, max_norm)
>>> ww1
<tf.Tensor: id=586, shape=(3, 3), dtype=float32, numpy=
array([[-0.35324004, -0.48577434,  0.18574424],
       [-0.930713  , -1.1599526 ,  0.2675068 ],
       [-0.03141235,  0.18952121,  0.05951311]], dtype=float32)>
>>> ww2
<tf.Tensor: id=591, shape=(3, 3), dtype=float32, numpy=
array([[ 0.04540633, -0.13181281, -0.36299583],
       [ 0.17711018, -0.74633354, -0.16396323],
       [-0.20134856, -0.5059032 , -0.4669826 ]], dtype=float32)>
>>> global_norm2 = tf.math.sqrt(tf.norm(ww1)**2 + tf.norm(ww2)**2)
>>> global_norm2
<tf.Tensor: id=607, shape=(), dtype=float32, numpy=1.9999998>
>>> import tensorflow as tf
>>> w1 = tf.random.normal([3,3])
>>> w2 = tf.random.normal([3,3])
>>> w1
<tf.Tensor: id=5, shape=(3, 3), dtype=float32, numpy=
array([[-0.3745235 , -0.54776704, -0.6978908 ],
       [-0.48667282, -1.9662677 ,  1.2693951 ],
       [-1.6218463 , -1.3147658 ,  1.1985897 ]], dtype=float32)>
>>> # tf.norm(a) defaults to the L2 norm tf.norm(a, ord=2), i.e. tf.sqrt(tf.reduce_sum(tf.square(a)))
>>> # total norm (global_norm) of the unclipped parameters: sum the squared L2 norms of all tensors, then take the square root
>>> global_norm = tf.math.sqrt(tf.norm(w1)**2 + tf.norm(w2)**2)
>>> global_norm # 4.7265425
<tf.Tensor: id=27, shape=(), dtype=float32, numpy=4.7265425>
>>> # tf.clip_by_global_norm([params], MAX_norm) rescales the whole gradient set so its total norm shrinks to MAX_norm=2
>>> # it returns the clipped list of tensors and the pre-clipping total norm global_norm
>>> (ww1, ww2), global_norm = tf.clip_by_global_norm([w1, w2], 2) # clipping is normally applied after computing the gradients and before the parameter update
>>> ww1
<tf.Tensor: id=47, shape=(3, 3), dtype=float32, numpy=
array([[-0.15847673, -0.2317834 , -0.29530713],
       [-0.20593184, -0.832011  ,  0.53713477],
       [-0.6862717 , -0.556333  ,  0.50717396]], dtype=float32)>
>>> ww2
<tf.Tensor: id=48, shape=(3, 3), dtype=float32, numpy=
array([[ 0.03117203, -0.7264457 ,  0.32293826],
       [ 0.5894358 ,  0.87403387,  0.04680141],
       [ 0.0015509 ,  0.15240058,  0.05759645]], dtype=float32)>
>>> global_norm
<tf.Tensor: id=35, shape=(), dtype=float32, numpy=4.7265425>
>>> # total norm of the clipped parameters: sum the squared L2 norms of the clipped tensors, then take the square root
>>> global_norm2 = tf.math.sqrt(tf.norm(ww1)**2 + tf.norm(ww2)**2)
>>> global_norm2
<tf.Tensor: id=64, shape=(), dtype=float32, numpy=1.9999998>
>>> print(global_norm, global_norm2)
tf.Tensor(4.7265425, shape=(), dtype=float32) tf.Tensor(1.9999998, shape=(), dtype=float32)
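Written out, tf.clip_by_global_norm([w_1, ..., w_K], max_norm) computes one joint norm over all tensors and rescales every tensor by the same factor, which is exactly what the manual computation above reproduces:

$global\_norm = \sqrt{\sum_{i} \lVert w_i \rVert_2^2}, \qquad w_i \leftarrow w_i \cdot \dfrac{max\_norm}{\max(global\_norm,\; max\_norm)}$

Because every tensor is scaled by the same factor, the directions and relative magnitudes of the gradients are preserved; when the joint norm is already below max_norm, nothing changes.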


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers
from matplotlib import pyplot as plt
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
print(tf.__version__)

(x, y), _ = datasets.mnist.load_data()
x = tf.convert_to_tensor(x, dtype=tf.float32) / 50.  # scale / normalize the inputs
y = tf.convert_to_tensor(y)
y = tf.one_hot(y, depth=10)  # one-hot encode the labels
print('x:', x.shape, 'y:', y.shape)
# build the dataset: batch size 128, repeated for 30 epochs
train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128).repeat(30)
x, y = next(iter(train_db))  # take one batch from the dataset iterator
print('sample:', x.shape, y.shape)
# print(x[0], y[0])

def main():
    # 784 => 512: layer-1 weights [inputs, outputs] and bias [outputs]
    w1, b1 = tf.Variable(tf.random.truncated_normal([784, 512], stddev=0.1)), tf.Variable(tf.zeros([512]))
    # 512 => 256: layer-2 weights and bias
    w2, b2 = tf.Variable(tf.random.truncated_normal([512, 256], stddev=0.1)), tf.Variable(tf.zeros([256]))
    # 256 => 10: layer-3 weights and bias
    w3, b3 = tf.Variable(tf.random.truncated_normal([256, 10], stddev=0.1)), tf.Variable(tf.zeros([10]))
    optimizer = optimizers.SGD(lr=0.01)  # SGD optimizer
    # iterate over the training batches
    for step, (x, y) in enumerate(train_db):
        # [b, 28, 28] => [b, 784], flatten
        x = tf.reshape(x, (-1, 784))
        # record operations for gradient computation
        with tf.GradientTape() as tape:
            # layer 1
            h1 = x @ w1 + b1
            h1 = tf.nn.relu(h1)
            # layer 2
            h2 = h1 @ w2 + b2
            h2 = tf.nn.relu(h2)
            # output layer
            out = h2 @ w3 + b3
            # out = tf.nn.relu(out)
            # compute loss: MSE = mean(sum(y - out)^2), the mean squared difference between prediction and label
            # [b, 10] - [b, 10]
            loss = tf.square(y - out)
            # [b, 10] => [b], mean error per sample
            loss = tf.reduce_mean(loss, axis=1)
            # [b] => scalar
            loss = tf.reduce_mean(loss)
        # tape.gradient(loss, [params]) returns the gradient of the loss w.r.t. each parameter,
        # which is then used to update the corresponding parameter.
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        # print('==before==')
        # for g in grads:  # L2 norm of every gradient before clipping; tf.norm(g) defaults to tf.norm(g, ord=2)
        #     print(tf.norm(g))
        # tf.clip_by_global_norm(grads, MAX_norm) rescales the whole gradient set so its total norm is at most MAX_norm=15;
        # it returns the clipped gradient list and the pre-clipping total norm.
        grads, _ = tf.clip_by_global_norm(grads, 15)  # clipping is applied after computing the gradients and before the update
        # print('==after==')
        # for g in grads:  # L2 norm of every gradient after clipping
        #     print(tf.norm(g))
        # optimizer update rule: θ = θ - lr * grad
        optimizer.apply_gradients(zip(grads, [w1, b1, w2, b2, w3, b3]))
        if step % 100 == 0:
            print(step, 'loss:', float(loss))

if __name__ == '__main__':
    main()

# --- further snippets from the same chapter ---
W = tf.ones([2,2])  # an arbitrary matrix
eigenvalues = tf.linalg.eigh(W)[0]  # eigenvalues
eigenvalues
val = [W]
# raise the matrix to the n-th power
for i in range(10):
    val.append([val[-1]@W])
# L2 norm of every power
norm = list(map(lambda x: tf.norm(x).numpy(), val))
plt.plot(range(1,12), norm)
plt.xlabel('n times')
plt.ylabel('L2-norm')
plt.savefig('w_n_times_1.svg')

W = tf.ones([2,2]) * 0.4  # an arbitrary matrix
eigenvalues = tf.linalg.eigh(W)[0]  # eigenvalues
print(eigenvalues)
val = [W]
for i in range(10):
    val.append([val[-1]@W])
norm = list(map(lambda x: tf.norm(x).numpy(), val))
plt.plot(range(1,12), norm)
plt.xlabel('n times')
plt.ylabel('L2-norm')
plt.savefig('w_n_times_0.svg')

a = tf.random.uniform([2,2])
tf.clip_by_value(a, 0.4, 0.6)  # clip each element by value

a = tf.random.uniform([2,2]) * 5
# clip by norm
b = tf.clip_by_norm(a, 5)
tf.norm(a), tf.norm(b)

w1 = tf.random.normal([3,3])  # gradient tensor 1
w2 = tf.random.normal([3,3])  # gradient tensor 2
# compute the global norm
global_norm = tf.math.sqrt(tf.norm(w1)**2 + tf.norm(w2)**2)
# clip according to the global norm with max norm = 2
(ww1, ww2), global_norm = tf.clip_by_global_norm([w1, w2], 2)
# global norm of the clipped tensor group
global_norm2 = tf.math.sqrt(tf.norm(ww1)**2 + tf.norm(ww2)**2)
print(global_norm, global_norm2)

# training-step fragment (model, criteon, optimizer, x, y are assumed to be defined elsewhere)
with tf.GradientTape() as tape:
    logits = model(x)            # forward pass
    loss = criteon(y, logits)    # compute the loss
# compute the gradients
grads = tape.gradient(loss, model.trainable_variables)
grads, _ = tf.clip_by_global_norm(grads, 25)  # global gradient clipping
# update the parameters with the clipped gradients
optimizer.apply_gradients(zip(grads, model.trainable_variables))



Recurrent neural network variants

SimpleRNNCell
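For reference, the per-step update implemented by SimpleRNNCell (with its default tanh activation) is

$h_t = \tanh(x_t W_{xh} + h_{t-1} W_{hh} + b), \qquad o_t = h_t$;

the comments in the code below spell out only the linear part, with the activation applied on top of it.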

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers

# state vector length h = 3
cell = layers.SimpleRNNCell(3)
# the sequence length s does not need to be specified (None); the input feature length f (word-vector length) at each time step is 4
cell.build(input_shape=(None, 4))
# SimpleRNNCell maintains 3 tensors: kernel is Wxh of shape (input feature length f, state length h) = (4, 3),
# recurrent_kernel is Whh of shape (h, h) = (3, 3), and bias b of shape (h,) = (3,)
# forward computation: ht = Wxh * xt + Whh * ht-1 + b = (4, 3) * xt + (3, 3) * ht-1 + (3,)
cell.trainable_variables
# [<tf.Variable 'kernel:0' shape=(4, 3) dtype=float32, numpy=
#  array([[ 0.08301067, -0.46076387,  0.15347064],
#         [-0.1890313 ,  0.60186946, -0.85642815],
#         [ 0.5622045 , -0.07674742, -0.80009407],
#         [ 0.5321919 , -0.44575736, -0.3680237 ]], dtype=float32)>,
#  <tf.Variable 'recurrent_kernel:0' shape=(3, 3) dtype=float32, numpy=
#  array([[ 0.9972923 ,  0.07062273, -0.02049942],
#         [-0.06413236,  0.69885963, -0.7123779 ],
#         [ 0.03598386, -0.71176374, -0.7014966 ]], dtype=float32)>,
#  <tf.Variable 'bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]

# initial state vector h0 of shape (batch size b, state length h) = (4, 64)
h0 = [tf.zeros([4, 64])]
# input sequences (b, s, f): b=4 sequences, sequence length s=80, word-vector / feature length f=100
x = tf.random.normal([4, 80, 100])
# x[:, i, :] or `for xt in tf.unstack(x, axis=1)` extracts s tensors of shape [b, f] from [b, s, f]:
# if a sequence has N words we iterate N times, each time feeding the words at the same time step across the whole batch;
# one forward pass of the layer is complete only after s iterations.
# Each step therefore feeds a (batch size b, word-vector length f) tensor into the SimpleRNNCell: b=4, f=100, shape (4, 100)
xt = x[:, 0, :]  # input at the first time step, x0 of shape (4, 100), i.e. (batch size b, word-vector length f)
# build a Cell for input feature length f=100, sequence length s=80, state vector length h=64
cell = layers.SimpleRNNCell(64)
# this cell's kernel Wxh has shape (100, 64), its recurrent_kernel Whh has shape (64, 64), and its bias has shape (64,)
# forward step with the batch's time-step-t inputs xt and the state vector h0:
# ht = Wxh * xt + Whh * ht-1 + b = (100, 64) * (4, 100) + (64, 64) * (4, 64) + (64,)
# for a sequence of length s, the layer's forward pass requires s such steps
out, h1 = cell(xt, h0)
# both the output ot and the state ht have shape (batch size b, state length h) = (4, 64);
# their ids are identical, i.e. they point to the same tensor: the state ht is used directly as the output ot.
print(out.shape, h1[0].shape)  # (4, 64) (4, 64)
print(id(out), id(h1[0]))      # 1863500704760 1863500704760
h = h0
# unstack the input along the sequence-length axis to get xt: [batch size b, input feature length f];
# iterating s times (once per time step) completes one forward pass of the layer
for xt in tf.unstack(x, axis=1):
    # forward step with the batch's time-step-t inputs xt and state h: ht = Wxh * xt + Whh * ht-1 + b
    out, h = cell(xt, h)
# the final output can aggregate the outputs ot of every time step, or simply take the last time step's output
out = out



Multi-layer SimpleRNNCell network


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers

x = tf.random.normal([4, 80, 100])
# build 2 cells: cell0 first, then cell1
cell0 = layers.SimpleRNNCell(64)
cell1 = layers.SimpleRNNCell(64)
h0 = [tf.zeros([4, 64])]  # initial state vector of cell0
h1 = [tf.zeros([4, 64])]  # initial state vector of cell1
for xt in tf.unstack(x, axis=1):
    print("xt:", xt.shape)  # (4, 100), [batch size b, word-vector length f]
    # the first RNN layer receives [b, f] at each step, i.e. the word vectors of one time step across the batch;
    # the second RNN layer receives the first layer's output [b, h] at each step, i.e. the per-step network output Ot.
    # Ot can simply be the state ht, or a linear transform of it, Ot = Who * ht.
    # xt is the input, out0 the output
    out0, h0 = cell0(xt, h0)
    print("out0:", out0.shape, h0[0].shape)  # (4, 64) (4, 64), [batch size b, state length h]
    # the previous cell's output out0 is this cell's input
    out1, h1 = cell1(out0, h1)
    print("out1:", out1.shape, h1[0].shape)  # (4, 64) (4, 64), [batch size b, state length h]

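The same two-layer stack can also be written at the layer level. A brief sketch (same shapes as above; this Sequential form reappears in the full scripts later in this post):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

x = tf.random.normal([4, 80, 100])                 # [b, s, f]
net = keras.Sequential([
    layers.SimpleRNN(64, return_sequences=True),   # non-final layers must return per-step outputs
    layers.SimpleRNN(64),                          # the final layer returns only the last step: [b, 64]
])
out = net(x)
print(out.shape)                                   # (4, 64)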

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

tf.random.set_seed(22)
np.random.seed(22)
assert tf.__version__.startswith('2.')

batchsz = 128
total_words = 10000
max_review_len = 80
embedding_len = 100
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# x_train: [b, 80]
# x_test:  [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

class MyRNN(keras.Model):
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # [b, 64] initial states
        self.state0 = [tf.zeros([batchsz, units])]
        self.state1 = [tf.zeros([batchsz, units])]
        # embed the text tokens
        # [b, 80] => [b, 80, 100]
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # [b, 80, 100], h_dim: 64
        # two SimpleRNN cells
        self.rnn_cell0 = layers.SimpleRNNCell(units, dropout=0.5)
        self.rnn_cell1 = layers.SimpleRNNCell(units, dropout=0.5)
        # fc, [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """
        net(x), net(x, training=True): train mode
        net(x, training=False): test mode
        :param inputs: [b, 80]
        :param training:
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # rnn cell compute
        # [b, 80, 100] => [b, 64]
        state0 = self.state0
        state1 = self.state1
        for word in tf.unstack(x, axis=1):  # word: [b, 100]
            # h1 = x*wxh + h0*whh
            # out0: [b, 64]
            out0, state0 = self.rnn_cell0(word, state0, training)
            # out1: [b, 64]
            out1, state1 = self.rnn_cell1(out0, state1, training)
        # out: [b, 64] => [b, 1]
        x = self.outlayer(out1)
        # p(y is pos|x)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 64
    epochs = 4
    model = MyRNN(units)
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'], experimental_run_tf_function=False)
    model.fit(db_train, epochs=epochs, validation_data=db_test)
    model.evaluate(db_test)

if __name__ == '__main__':
    main()
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

tf.random.set_seed(22)
np.random.seed(22)
assert tf.__version__.startswith('2.')

batchsz = 128
total_words = 10000
max_review_len = 80
embedding_len = 100
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# x_train: [b, 80], x_test: [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

class MyRNN(keras.Model):
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # embed the text tokens
        # [b, 80] => [b, 80, 100]
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # [b, 80, 100], h_dim: 64
        self.rnn = keras.Sequential([
            layers.SimpleRNN(units, dropout=0.5, return_sequences=True, unroll=True),
            layers.SimpleRNN(units, dropout=0.5, unroll=True)
        ])
        # fc, [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """
        net(x), net(x, training=True): train mode
        net(x, training=False): test mode
        :param inputs: [b, 80]
        :param training:
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # rnn compute
        # x: [b, 80, 100] => [b, 64]
        x = self.rnn(x, training=training)
        # out: [b, 64] => [b, 1]
        x = self.outlayer(x)
        # p(y is pos|x)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 64
    epochs = 4
    model = MyRNN(units)
    # model.build(input_shape=(4, 80))
    # model.summary()
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    model.fit(db_train, epochs=epochs, validation_data=db_test)
    model.evaluate(db_test)

if __name__ == '__main__':
    main()

SimpleRNN

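A minimal sketch of the layer-level interface that the following sections build on (shapes assumed b=4, s=80, f=100; layers.SimpleRNN drives the cell over all time steps internally):

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([4, 80, 100])                        # [b, s, f]
layer = layers.SimpleRNN(64)                              # returns only the last time step: [b, 64]
out_last = layer(x)
layer_seq = layers.SimpleRNN(64, return_sequences=True)   # returns every time step: [b, 80, 64]
out_all = layer_seq(x)
print(out_last.shape, out_all.shape)                      # (4, 64) (4, 80, 64)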


Network structure for the sentiment-classification task


# initial state vector h0, shape (batch size b, state length h/units), here [b, 64]
[tf.zeros([batchsz, units])]
# word embedding: [b, 80] => [b, 80, 100]
layers.Embedding(total_words, embedding_len, input_length=max_review_len)
total_words = 10000                 # vocabulary size
input_length = max_review_len = 80  # maximum sentence length s; longer sentences are truncated, shorter ones padded
embedding_len = 100                 # word-vector length f, i.e. the input feature length at each time step
# state vector length h/units = 64
# xt has shape (batch size b, word-vector length f). SimpleRNNCell maintains 3 tensors: kernel is Wxh of shape (f, h) = (100, 64),
# recurrent_kernel is Whh of shape (h, h) = (64, 64), and bias b of shape (h,) = (64,)
# forward step with the batch's time-step-t inputs xt and state h0:
# ht = Wxh * xt + Whh * ht-1 + b = (100, 64)*(b, 100) + (64, 64)*[b, 64] + (64,) => [b, 64]
layers.SimpleRNNCell(units, dropout=0.5)
# classification head that maps the cell's output features to 2 classes
# [b, 80, 100] => [b, 64] => [b, 1]
self.outlayer = layers.Dense(1)
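Putting the pieces together, the shape flow of this sentiment classifier (as assembled in the scripts above and below) is:

[b, 80] => Embedding => [b, 80, 100] => two recurrent layers (h = 64) => [b, 64] => Dense(1) => [b, 1] => sigmoid => p(positive | x)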


1. Iterating over [batch size b, sequence length s, word-vector length f] data with x[:, i, :] or `for xt in tf.unstack(x, axis=1)` yields s tensors of shape [batch size b, word-vector length f]. In other words, if a sequence contains N words we iterate N times, each time taking the word at the same time step from every sequence in the batch; only after the number of iterations equals the sequence length s is one forward pass of the layer complete.
2. At every step the first RNN layer receives [batch size b, word-vector length f], i.e. the word vectors of one time step across the batch; the second RNN layer receives the first layer's output [batch size b, state length h], i.e. the per-step network output Ot. Ot can simply equal the state ht, or be a linear transform of it, Ot = Who * ht. (A condensed sketch follows below.)
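A condensed sketch of those two points, reusing the shapes from the earlier examples (b=4, s=80, f=100, h=64); it simply compresses the cell-level code already shown above:

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([4, 80, 100])                   # [b, s, f]
cell0, cell1 = layers.SimpleRNNCell(64), layers.SimpleRNNCell(64)
h0, h1 = [tf.zeros([4, 64])], [tf.zeros([4, 64])]
for xt in tf.unstack(x, axis=1):                     # s iterations, xt: [b, f]
    out0, h0 = cell0(xt, h0)                         # layer 1 consumes the word vectors
    out1, h1 = cell1(out0, h1)                       # layer 2 consumes layer 1's per-step output Ot = ht: [b, h]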



LSTM

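The figures that originally accompanied this section illustrated the LSTM cell; for reference, the standard gate equations (with sigmoid $\sigma$ and element-wise product $\odot$) are

$i_t = \sigma(W_i\,[h_{t-1}, x_t] + b_i)$, $\quad f_t = \sigma(W_f\,[h_{t-1}, x_t] + b_f)$, $\quad o_t = \sigma(W_o\,[h_{t-1}, x_t] + b_o)$,
$\tilde{c}_t = \tanh(W_c\,[h_{t-1}, x_t] + b_c)$, $\quad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$, $\quad h_t = o_t \odot \tanh(c_t)$.

This is why LSTMCell carries two state tensors [h, c] in the code below, and why the kernel shapes shown later are (f, 4h): the transforms for the three gates and the candidate state are stored as one concatenated matrix.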

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, optimizers

# [batch size b, sequence length s, word-vector length f]
x = tf.random.normal([2, 80, 100])
# take the input of one time step: [b, s, f] yields s tensors of shape [b, f]
xt = x[:, 0, :]
xt.shape  # TensorShape([2, 100])
# create the cell
cell = layers.LSTMCell(64)
# initialize the state/output list [h, c]; both h and c have shape [batch size b, state length h]
state = [tf.zeros([2, 64]), tf.zeros([2, 64])]
# forward step
out, state = cell(xt, state)  # state is the list [h, c]; out and the first element h are the same tensor
id(out), id(state[0]), id(state[1])
# iterate over every time step of the batch (one iteration per time step, each xt of shape [b, f])
for xt in tf.unstack(x, axis=1):
    out, state = cell(xt, state)
layer = layers.LSTM(64)
out = layer(x)  # [batch size b, state length h]
out.shape  # TensorShape([2, 64])
layer = layers.LSTM(64, return_sequences=True)
out = layer(x)  # [batch size b, sequence length s, state length h]
out.shape  # TensorShape([2, 80, 64])
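A brief sketch using the standard return_state argument of layers.LSTM, for the case where both the per-step outputs and the final [h, c] state are needed:

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([2, 80, 100])                   # [b, s, f]
layer = layers.LSTM(64, return_sequences=True, return_state=True)
seq_out, final_h, final_c = layer(x)                 # per-step outputs plus the final h and c
print(seq_out.shape, final_h.shape, final_c.shape)   # (2, 80, 64) (2, 64) (2, 64)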

GRU
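As in the LSTM section, the figures here showed the GRU cell; one common formulation of its update, with reset gate r and update gate z, is

$r_t = \sigma(W_r\,[h_{t-1}, x_t])$, $\quad z_t = \sigma(W_z\,[h_{t-1}, x_t])$,
$\tilde{h}_t = \tanh(W_h\,[r_t \odot h_{t-1}, x_t])$, $\quad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$.

The GRU keeps a single state vector h (no separate cell state), which is why the kernel shapes shown in the code later in this post are (f, 3h) rather than the LSTM's (f, 4h).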


Sentiment classification with LSTM/GRU


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

x = tf.range(10)
x = tf.random.shuffle(x)
# embedding layer for 10 words, each represented by a vector of length 4
net = layers.Embedding(10, 4)  # [vocabulary size, word-vector length f]
out = net(x)
out
out.shape  # TensorShape([10, 4]), i.e. [vocabulary size, word-vector length f]
net.embeddings
net.embeddings.shape  # TensorShape([10, 4]), i.e. [vocabulary size, word-vector length f]
net.embeddings.trainable  # True
net.trainable = False
# load a word-vector table from a pre-trained model (load_embed is a user-defined loader, not shown here)
embed_glove = load_embed('glove.6B.50d.txt')
# initialize the Embedding layer directly with the pre-trained word-vector table
net.set_weights([embed_glove])

cell = layers.SimpleRNNCell(3)  # state vector length h
cell.build(input_shape=(None, 4))  # input_shape=(batch size, input feature length f)
# SimpleRNNCell maintains 3 tensors: kernel is Wxh of shape (word-vector length f, state length h) = (4, 3),
# recurrent_kernel is Whh of shape (h, h) = (3, 3), and bias b of shape (h,) = (3,)
cell.trainable_variables
# [<tf.Variable 'kernel:0' shape=(4, 3) dtype=float32, numpy=array(..., dtype=float32)>,
#  <tf.Variable 'recurrent_kernel:0' shape=(3, 3) dtype=float32, numpy=array(..., dtype=float32)>,
#  <tf.Variable 'bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]

# initialize the state vector
h0 = [tf.zeros([4, 64])]  # [batch size b, state vector length h]
x = tf.random.normal([4, 80, 100])  # [batch size b, sequence length s, word-vector length f]
xt = x[:, 0, :]  # first time step / word of the batch, [batch size b, word-vector length f]
# build a Cell with word-vector length f=100, sequence length s=80, state vector length h=64
cell = layers.SimpleRNNCell(64)
out, h1 = cell(xt, h0)  # forward step; out and h1 refer to the same state vector
print(out.shape, h1[0].shape)  # (4, 64) (4, 64)
print(id(out), id(h1[0]))      # 2310103532472 2310103532472
h = h0
# unstack along the sequence-length axis to get xt: [b, f], i.e. [batch size b, word-vector length f]
for xt in tf.unstack(x, axis=1):
    out, h = cell(xt, h)  # forward step
# the final output can aggregate every time step's output, or just take the last time step's output
out = out

x = tf.random.normal([4, 80, 100])  # [batch size b, sequence length s, word-vector length f]
xt = x[:, 0, :]  # input of the first time step, [batch size b, word-vector length f]
# build 2 cells: cell0 first, then cell1
cell0 = layers.SimpleRNNCell(64)  # state vector length h=64
cell1 = layers.SimpleRNNCell(64)
h0 = [tf.zeros([4, 64])]  # initial state of cell0, [batch size b, state length h]
h1 = [tf.zeros([4, 64])]  # initial state of cell1
out0, h0 = cell0(xt, h0)
out1, h1 = cell1(out0, h1)
for xt in tf.unstack(x, axis=1):
    # xt is the input; out0 is the output
    out0, h0 = cell0(xt, h0)
    # the previous cell's output out0 is this cell's input
    out1, h1 = cell1(out0, h1)
print(x.shape)

# store all of the previous layer's per-step outputs
middle_sequences = []
# compute and store every time step's output of the first layer
for xt in tf.unstack(x, axis=1):
    out0, h0 = cell0(xt, h0)
    middle_sequences.append(out0)
# compute every time step's output of the second layer
# (if it is not the last layer, its per-step outputs also need to be stored)
for xt in middle_sequences:
    out1, h1 = cell1(xt, h1)

layer = layers.SimpleRNN(64)  # state vector length h=64
x = tf.random.normal([4, 80, 100])  # [batch size b, sequence length s, word-vector length f]
out = layer(x)
out.shape
layer = layers.SimpleRNN(64, return_sequences=True)  # return the output of every time step
out = layer(x)
out
# build a 2-layer RNN network
# every layer except the last must return its per-step outputs
net = keras.Sequential([
    layers.SimpleRNN(64, return_sequences=True),
    layers.SimpleRNN(64),
])
out = net(x)

x = tf.random.normal([2, 80, 100])  # [batch size b, sequence length s, word-vector length f]
xt = x[:, 0, :]  # one time step's input, [batch size b, word-vector length f]
cell = layers.LSTMCell(64)  # create the cell, state vector length h=64
# initialize the state/output list [h, c]
state = [tf.zeros([2, 64]), tf.zeros([2, 64])]  # [batch size b, state length h]
out, state = cell(xt, state)  # forward step
id(out), id(state[0]), id(state[1])  # (2316639101352, 2316639101352, 2316639102232); out and the first list element h are the same tensor

net = layers.LSTM(4)  # state vector length h=4
net.build(input_shape=(None, 5, 3))  # input_shape=[batch size b, sequence length s, word-vector length f]
# LSTM maintains 3 tensors: kernel is Wxh of shape (f, 4h) = (3, 16),
# recurrent_kernel is Whh of shape (h, 4h) = (4, 16), and bias of shape (4h,) = (16,)
net.trainable_variables
# [<tf.Variable 'kernel:0' shape=(3, 16) dtype=float32, numpy=array(..., dtype=float32)>,
#  <tf.Variable 'recurrent_kernel:0' shape=(4, 16) dtype=float32, numpy=array(..., dtype=float32)>,
#  <tf.Variable 'bias:0' shape=(16,) dtype=float32, numpy=array(..., dtype=float32)>]

net = layers.GRU(4)  # state vector length h=4
net.build(input_shape=(None, 5, 3))  # input_shape=[batch size b, sequence length s, word-vector length f]
# GRU maintains 3 tensors: kernel is Wxh of shape (f, 3h) = (3, 12),
# recurrent_kernel is Whh of shape (h, 3h) = (4, 12), and bias of shape (2, 3h) = (2, 12) (two bias sets, matching the output below)
net.trainable_variables
# [<tf.Variable 'kernel:0' shape=(3, 12) dtype=float32, numpy=array(..., dtype=float32)>,
#  <tf.Variable 'recurrent_kernel:0' shape=(4, 12) dtype=float32, numpy=array(..., dtype=float32)>,
#  <tf.Variable 'bias:0' shape=(2, 12) dtype=float32, numpy=array(..., dtype=float32)>]

# initialize the state vector
h = [tf.zeros([2, 64])]
cell = layers.GRUCell(64)  # create a GRU cell
for xt in tf.unstack(x, axis=1):
    out, h = cell(xt, h)
out.shape
import os
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, losses, optimizers, Sequential

tf.random.set_seed(22)
np.random.seed(22)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
assert tf.__version__.startswith('2.')

batchsz = 128          # batch size
total_words = 10000    # vocabulary size N_vocab
max_review_len = 80    # maximum sentence length s; longer sentences are truncated, shorter ones padded
embedding_len = 100    # word-vector length f
# load the IMDB dataset; the data is integer-encoded, one integer per word
# x_train/x_test are 25000 lists; e.g. the first list in x_train holds 218 integers (words)
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
print(x_train.shape, len(x_train[0]), y_train.shape)  # (25000,) 218 (25000,)
print(x_test.shape, len(x_test[0]), y_test.shape)     # (25000,) 68 (25000,)
x_train[0]  # a list of 218 integers (words)
# integer encoding table: the dict keys are words, the values are indices
word_index = keras.datasets.imdb.get_word_index()
# for k, v in word_index.items():
#     print(k, v)
# shift every index in the encoding table by 3 to make room for the special tokens
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3
# reversed encoding table: indices become keys, words become values
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

# decode a sequence (a list of indices) back into words using the reversed table
def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

# decode this list of indices back into its words
decode_review(x_train[8])

print('Indexing word vectors.')
embeddings_index = {}
# GLOVE_DIR = r'C:\Users\z390\Downloads\glove6b50dtxt'
GLOVE_DIR = r'F:\人工智能\学习资料\GloVe 词嵌入'
with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]  # the word
        coefs = np.asarray(values[1:], dtype='float32')  # the word's pre-trained vector
        embeddings_index[word] = coefs  # key: word, value: its pre-trained vector
print('Found %s word vectors.' % len(embeddings_index))  # Found 400000 word vectors.
len(embeddings_index.keys())  # 400000
len(word_index.keys())        # 88588

MAX_NUM_WORDS = total_words  # vocabulary size N_vocab
# prepare embedding matrix
num_words = min(MAX_NUM_WORDS, len(word_index))
embedding_matrix = np.zeros((num_words, embedding_len))  # [vocabulary size, word-vector length f]
applied_vec_count = 0
for word, i in word_index.items():  # iterate over every (word, index) pair
    if i >= MAX_NUM_WORDS:
        continue
    embedding_vector = embeddings_index.get(word)  # look up the pre-trained vector for this word
    # print(word, embedding_vector)
    if embedding_vector is not None:
        # words not found in the embedding index will be all-zeros
        embedding_matrix[i] = embedding_vector  # row i holds the pre-trained vector of the word with index i
        applied_vec_count += 1
# 9793 (10000, 100), i.e. [vocabulary size, word-vector length f]
print(applied_vec_count, embedding_matrix.shape)

# x_train: [b, 80], x_test: [b, 80]
# truncate and pad the sentences to equal length; long sentences keep their tail, short ones are padded in front
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
# build the datasets: shuffle, batch, and drop the last incomplete batch
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
# x_train shape: (25000, 80) tf.Tensor(1, shape=(), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
# x_test shape: (25000, 80)
print('x_test shape:', x_test.shape)

class MyRNN(keras.Model):
    # build the multi-layer network
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # total_words: vocabulary size
        # input_length = max_review_len: maximum sentence length s; longer sentences are truncated, shorter ones padded
        # embedding_len: word-vector length f, i.e. the input feature length at each time step
        # after building the vocabulary, initialize the Embedding layer from it and freeze it (trainable=False) so it is excluded from gradient updates
        # word embedding: [b, 80] => [b, 80, 100]
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len, trainable=False)
        # input_shape=[batch size b, sequence length s]
        self.embedding.build(input_shape=(None, max_review_len))
        # initialize the Embedding layer directly with the pre-trained word-vector table
        self.embedding.set_weights([embedding_matrix])
        # build the RNN
        self.rnn = keras.Sequential([
            layers.LSTM(units, dropout=0.5, return_sequences=True),
            layers.LSTM(units, dropout=0.5)
        ])
        # classification head that maps the cell's output features to 2 classes
        # [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = Sequential([
            layers.Dense(32),
            layers.Dropout(rate=0.5),
            layers.ReLU(),
            layers.Dense(1)])

    def call(self, inputs, training=None):
        x = inputs  # [b, 80]
        # embedding: [b, 80] => [b, 80, 100], i.e. [batch size b, sequence length s, word-vector length f]
        x = self.embedding(x)
        # rnn compute: [b, 80, 100] => [b, 64]
        x = self.rnn(x)
        # the last layer's final output feeds the classification head: [b, 64] => [b, 1]
        x = self.outlayer(x, training)
        # p(y is pos|x)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 512   # RNN state vector length
    epochs = 50   # number of training epochs
    model = MyRNN(units)
    # compile
    model.compile(optimizer=optimizers.Adam(0.001),
                  loss=losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    # train and validate
    model.fit(db_train, epochs=epochs, validation_data=db_test)
    # test
    model.evaluate(db_test)

if __name__ == '__main__':
    main()
import os
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, losses, optimizers, Sequential

tf.random.set_seed(22)
np.random.seed(22)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
assert tf.__version__.startswith('2.')

batchsz = 128          # batch size
total_words = 10000    # vocabulary size N_vocab
max_review_len = 80    # maximum sentence length s; longer sentences are truncated, shorter ones padded
embedding_len = 100    # word-vector length f
# load the IMDB dataset; the data is integer-encoded, one integer per word
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
print(x_train.shape, len(x_train[0]), y_train.shape)
print(x_test.shape, len(x_test[0]), y_test.shape)
x_train[0]
# integer encoding table
word_index = keras.datasets.imdb.get_word_index()
# for k, v in word_index.items():
#     print(k, v)
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3
# reversed encoding table
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

decode_review(x_train[8])
# x_train: [b, 80], x_test: [b, 80]
# truncate and pad the sentences to equal length; long sentences keep their tail, short ones are padded in front
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
# build the datasets: shuffle, batch, and drop the last incomplete batch
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

class MyRNN(keras.Model):
    # build the multi-layer network
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # word embedding: [b, 80] => [b, 80, 100]
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # build the RNN
        self.rnn = keras.Sequential([
            layers.LSTM(units, dropout=0.5, return_sequences=True),
            layers.LSTM(units, dropout=0.5)
        ])
        # classification head that maps the cell's output features to 2 classes
        # [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = Sequential([
            layers.Dense(32),
            layers.Dropout(rate=0.5),
            layers.ReLU(),
            layers.Dense(1)])

    def call(self, inputs, training=None):
        x = inputs  # [b, 80]
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # rnn compute: [b, 80, 100] => [b, 64]
        x = self.rnn(x)
        # the last layer's final output feeds the classification head: [b, 64] => [b, 1]
        x = self.outlayer(x, training)
        # p(y is pos|x)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 32    # RNN state vector length
    epochs = 50   # number of training epochs
    model = MyRNN(units)
    # compile
    model.compile(optimizer=optimizers.Adam(0.001),
                  loss=losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    # train and validate
    model.fit(db_train, epochs=epochs, validation_data=db_test)
    # test
    model.evaluate(db_test)

if __name__ == '__main__':
    main()
import os
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, losses, optimizers, Sequential

tf.random.set_seed(22)
np.random.seed(22)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
assert tf.__version__.startswith('2.')

batchsz = 128          # batch size
total_words = 10000    # vocabulary size N_vocab
max_review_len = 80    # maximum sentence length s; longer sentences are truncated, shorter ones padded
embedding_len = 100    # word-vector length f
# load the IMDB dataset; the data is integer-encoded, one integer per word
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
print(x_train.shape, len(x_train[0]), y_train.shape)
print(x_test.shape, len(x_test[0]), y_test.shape)
x_train[0]
# integer encoding table
word_index = keras.datasets.imdb.get_word_index()
# for k, v in word_index.items():
#     print(k, v)
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3
# reversed encoding table
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

decode_review(x_train[8])
# x_train: [b, 80], x_test: [b, 80]
# truncate and pad the sentences to equal length; long sentences keep their tail, short ones are padded in front
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
# build the datasets: shuffle, batch, and drop the last incomplete batch
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

class MyRNN(keras.Model):
    # build the multi-layer network
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # word embedding: [b, 80] => [b, 80, 100]
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # build the RNN (GRU layers)
        self.rnn = keras.Sequential([
            layers.GRU(units, dropout=0.5, return_sequences=True),
            layers.GRU(units, dropout=0.5)
        ])
        # classification head that maps the cell's output features to 2 classes
        # [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = Sequential([
            layers.Dense(32),
            layers.Dropout(rate=0.5),
            layers.ReLU(),
            layers.Dense(1)])

    def call(self, inputs, training=None):
        x = inputs  # [b, 80]
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # rnn compute: [b, 80, 100] => [b, 64]
        x = self.rnn(x)
        # the last layer's final output feeds the classification head: [b, 64] => [b, 1]
        x = self.outlayer(x, training)
        # p(y is pos|x)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 32    # RNN state vector length
    epochs = 50   # number of training epochs
    model = MyRNN(units)
    # compile
    model.compile(optimizer=optimizers.Adam(0.001),
                  loss=losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    # train and validate
    model.fit(db_train, epochs=epochs, validation_data=db_test)
    # test
    model.evaluate(db_test)

if __name__ == '__main__':
    main()
    import os
    import tensorflow as tf
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers, losses, optimizers, Sequential

    tf.random.set_seed(22)
    np.random.seed(22)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    assert tf.__version__.startswith('2.')

    batchsz = 512        # batch size
    total_words = 10000  # vocabulary size N_vocab
    max_review_len = 80  # maximum sentence length s; longer sentences are truncated, shorter ones padded
    embedding_len = 100  # word-vector feature length f

    # Load the IMDB dataset; the data is integer-encoded, one integer per word
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
    print(x_train.shape, len(x_train[0]), y_train.shape)
    print(x_test.shape, len(x_test[0]), y_test.shape)
    x_train[0]

    # Integer encoding table
    word_index = keras.datasets.imdb.get_word_index()
    # for k, v in word_index.items():
    #     print(k, v)
    word_index = {k: (v + 3) for k, v in word_index.items()}
    word_index["<PAD>"] = 0
    word_index["<START>"] = 1
    word_index["<UNK>"] = 2  # unknown
    word_index["<UNUSED>"] = 3
    # Reverse the encoding table
    reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

    def decode_review(text):
        return ' '.join([reverse_word_index.get(i, '?') for i in text])

    decode_review(x_train[8])

    # x_train: [b, 80]
    # x_test:  [b, 80]
    # Truncate and pad the sentences to equal length; long sentences keep their tail, short ones are padded at the front
    x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
    x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)

    # Build the datasets: shuffle, batch, and drop the last incomplete batch
    db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
    db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
    db_test = db_test.batch(batchsz, drop_remainder=True)
    print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
    print('x_test shape:', x_test.shape)

    class MyRNN(keras.Model):
        # Multi-layer network built from SimpleRNN layers
        def __init__(self, units):
            super(MyRNN, self).__init__()
            # Word embedding: [b, 80] => [b, 80, 100]
            self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
            # Build the RNN
            self.rnn = keras.Sequential([
                layers.SimpleRNN(units, dropout=0.5, return_sequences=True),
                layers.SimpleRNN(units, dropout=0.5)
            ])
            # Classification head: map the RNN output features to the 2 classes
            # [b, 80, 100] => [b, units] => [b, 1]
            self.outlayer = Sequential([
                layers.Dense(32),
                layers.Dropout(rate=0.5),
                layers.ReLU(),
                layers.Dense(1)])

        def call(self, inputs, training=None):
            x = inputs  # [b, 80]
            # embedding: [b, 80] => [b, 80, 100]
            x = self.embedding(x)
            # RNN computation: [b, 80, 100] => [b, units]
            x = self.rnn(x)
            # The final layer's last output feeds the classification head: [b, units] => [b, 1]
            x = self.outlayer(x, training=training)
            # p(y is pos | x)
            prob = tf.sigmoid(x)
            return prob

    def main():
        units = 64   # RNN state vector length
        epochs = 50  # number of training epochs
        model = MyRNN(units)
        # Assemble
        model.compile(optimizer=optimizers.Adam(0.001),
                      loss=losses.BinaryCrossentropy(),
                      metrics=['accuracy'])
        # Train and validate
        model.fit(db_train, epochs=epochs, validation_data=db_test)
        # Test
        model.evaluate(db_test)

    if __name__ == '__main__':
        main()
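The comment about keeping the tail of long sentences and padding short ones at the front simply restates the defaults of `pad_sequences` (`truncating='pre'`, `padding='pre'`). A small self-contained check makes that behaviour concrete:

    from tensorflow import keras

    seqs = [[1, 2, 3, 4, 5, 6],  # longer than maxlen: the front is cut off
            [7, 8]]              # shorter than maxlen: zeros are added at the front
    print(keras.preprocessing.sequence.pad_sequences(seqs, maxlen=4))
    # [[3 4 5 6]
    #  [0 0 7 8]]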
    import os
    import tensorflow as tf
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers, losses, optimizers, Sequential

    tf.random.set_seed(22)
    np.random.seed(22)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    assert tf.__version__.startswith('2.')

    batchsz = 128        # batch size
    total_words = 10000  # vocabulary size N_vocab
    max_review_len = 80  # maximum sentence length s; longer sentences are truncated, shorter ones padded
    embedding_len = 100  # word-vector feature length f

    # Load the IMDB dataset; the data is integer-encoded, one integer per word
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
    print(x_train.shape, len(x_train[0]), y_train.shape)
    print(x_test.shape, len(x_test[0]), y_test.shape)
    x_train[0]

    # Integer encoding table
    word_index = keras.datasets.imdb.get_word_index()
    # for k, v in word_index.items():
    #     print(k, v)
    word_index = {k: (v + 3) for k, v in word_index.items()}
    word_index["<PAD>"] = 0
    word_index["<START>"] = 1
    word_index["<UNK>"] = 2  # unknown
    word_index["<UNUSED>"] = 3
    # Reverse the encoding table
    reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

    def decode_review(text):
        return ' '.join([reverse_word_index.get(i, '?') for i in text])

    decode_review(x_train[8])

    # x_train: [b, 80]
    # x_test:  [b, 80]
    # Truncate and pad the sentences to equal length; long sentences keep their tail, short ones are padded at the front
    x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
    x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)

    # Build the datasets: shuffle, batch, and drop the last incomplete batch
    db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
    db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
    db_test = db_test.batch(batchsz, drop_remainder=True)
    print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
    print('x_test shape:', x_test.shape)

    class MyRNN(keras.Model):
        # Multi-layer network built from RNN Cells
        def __init__(self, units):
            super(MyRNN, self).__init__()
            # [b, units]: build the Cells' initial state vectors, reused for every batch
            self.state0 = [tf.zeros([batchsz, units])]
            self.state1 = [tf.zeros([batchsz, units])]
            # Word embedding: [b, 80] => [b, 80, 100]
            self.embedding = layers.Embedding(total_words, embedding_len,
                                              input_length=max_review_len)
            # Build 2 Cells
            self.rnn_cell0 = layers.GRUCell(units, dropout=0.5)
            self.rnn_cell1 = layers.GRUCell(units, dropout=0.5)
            # Classification head: map the Cell output features to the 2 classes
            # [b, 80, 100] => [b, units] => [b, 1]
            self.outlayer = Sequential([
                layers.Dense(units),
                layers.Dropout(rate=0.5),
                layers.ReLU(),
                layers.Dense(1)])

        def call(self, inputs, training=None):
            x = inputs  # [b, 80]
            # embedding: [b, 80] => [b, 80, 100]
            x = self.embedding(x)
            # RNN cell computation: [b, 80, 100] => [b, units]
            state0 = self.state0
            state1 = self.state1
            for word in tf.unstack(x, axis=1):  # word: [b, 100]
                out0, state0 = self.rnn_cell0(word, state0, training=training)
                out1, state1 = self.rnn_cell1(out0, state1, training=training)
            # The last output of the final layer feeds the classification head: [b, units] => [b, 1]
            x = self.outlayer(out1, training=training)
            # p(y is pos | x)
            prob = tf.sigmoid(x)
            return prob

    def main():
        units = 64   # RNN state vector length
        epochs = 50  # number of training epochs
        model = MyRNN(units)
        # Assemble
        model.compile(optimizer=optimizers.RMSprop(0.001),
                      loss=losses.BinaryCrossentropy(),
                      metrics=['accuracy'])
        # Train and validate
        model.fit(db_train, epochs=epochs, validation_data=db_test)
        # Test
        model.evaluate(db_test)

    if __name__ == '__main__':
        main()
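The `tf.unstack` loop above drives the two `GRUCell`s one time step at a time and makes the state handling explicit. For comparison, `keras.layers.RNN` can wrap the same list of cells and handle the unrolling and state management itself; the following is only an illustrative sketch, not part of the original script:

    import tensorflow as tf
    from tensorflow.keras import layers

    units = 64
    # keras.layers.RNN accepts a list of cells and stacks them into a multi-layer RNN
    stacked_rnn = layers.RNN([layers.GRUCell(units), layers.GRUCell(units)])

    x = tf.random.normal([4, 80, 100])  # [b, 80, 100] batch of embedded sentences
    out = stacked_rnn(x)                # [b, units]: last-time-step output of the top cell
    print(out.shape)                    # (4, 64)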
    import os
    import tensorflow as tf
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers, losses, optimizers, Sequential

    tf.random.set_seed(22)
    np.random.seed(22)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    assert tf.__version__.startswith('2.')

    batchsz = 128        # batch size
    total_words = 10000  # vocabulary size N_vocab
    max_review_len = 80  # maximum sentence length s; longer sentences are truncated, shorter ones padded
    embedding_len = 100  # word-vector feature length f

    # Load the IMDB dataset; the data is integer-encoded, one integer per word
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
    print(x_train.shape, len(x_train[0]), y_train.shape)
    print(x_test.shape, len(x_test[0]), y_test.shape)
    x_train[0]

    # Integer encoding table
    word_index = keras.datasets.imdb.get_word_index()
    # for k, v in word_index.items():
    #     print(k, v)
    word_index = {k: (v + 3) for k, v in word_index.items()}
    word_index["<PAD>"] = 0
    word_index["<START>"] = 1
    word_index["<UNK>"] = 2  # unknown
    word_index["<UNUSED>"] = 3
    # Reverse the encoding table
    reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

    def decode_review(text):
        return ' '.join([reverse_word_index.get(i, '?') for i in text])

    decode_review(x_train[8])

    # x_train: [b, 80]
    # x_test:  [b, 80]
    # Truncate and pad the sentences to equal length; long sentences keep their tail, short ones are padded at the front
    x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
    x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)

    # Build the datasets: shuffle, batch, and drop the last incomplete batch
    db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
    db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
    db_test = db_test.batch(batchsz, drop_remainder=True)
    print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
    print('x_test shape:', x_test.shape)

    class MyRNN(keras.Model):
        # Multi-layer network built from RNN Cells
        def __init__(self, units):
            super(MyRNN, self).__init__()
            # [b, units]: build the Cells' initial state vectors [h, c], reused for every batch
            self.state0 = [tf.zeros([batchsz, units]), tf.zeros([batchsz, units])]
            self.state1 = [tf.zeros([batchsz, units]), tf.zeros([batchsz, units])]
            # Word embedding: [b, 80] => [b, 80, 100]
            self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
            # Build 2 Cells
            self.rnn_cell0 = layers.LSTMCell(units, dropout=0.5)
            self.rnn_cell1 = layers.LSTMCell(units, dropout=0.5)
            # Classification head: map the Cell output features to the 2 classes
            # [b, 80, 100] => [b, units] => [b, 1]
            self.outlayer = Sequential([
                layers.Dense(units),
                layers.Dropout(rate=0.5),
                layers.ReLU(),
                layers.Dense(1)])

        def call(self, inputs, training=None):
            x = inputs  # [b, 80]
            # embedding: [b, 80] => [b, 80, 100]
            x = self.embedding(x)
            # RNN cell computation: [b, 80, 100] => [b, units]
            state0 = self.state0
            state1 = self.state1
            for word in tf.unstack(x, axis=1):  # word: [b, 100]
                out0, state0 = self.rnn_cell0(word, state0, training=training)
                out1, state1 = self.rnn_cell1(out0, state1, training=training)
            # The last output of the final layer feeds the classification head: [b, units] => [b, 1]
            x = self.outlayer(out1, training=training)
            # p(y is pos | x)
            prob = tf.sigmoid(x)
            return prob

    def main():
        units = 64   # RNN state vector length
        epochs = 50  # number of training epochs
        model = MyRNN(units)
        # Assemble
        model.compile(optimizer=optimizers.RMSprop(0.001),
                      loss=losses.BinaryCrossentropy(),
                      metrics=['accuracy'])
        # Train and validate
        model.fit(db_train, epochs=epochs, validation_data=db_test)
        # Test
        model.evaluate(db_test)

    if __name__ == '__main__':
        main()
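Unlike `GRUCell` and `SimpleRNNCell`, whose state is a single tensor per layer, `LSTMCell` carries two state tensors, the hidden state h and the cell state c, which is why `state0` and `state1` above are lists of two zero tensors. A quick standalone check (shapes chosen only for illustration):

    import tensorflow as tf
    from tensorflow.keras import layers

    cell = layers.LSTMCell(64)
    x_t = tf.random.normal([4, 100])                # one time step for a batch of 4
    state = [tf.zeros([4, 64]), tf.zeros([4, 64])]  # [h, c]
    out, state = cell(x_t, state)
    print(out.shape, len(state), state[0].shape, state[1].shape)
    # (4, 64) 2 (4, 64) (4, 64)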
    import os
    import tensorflow as tf
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers, losses, optimizers, Sequential

    tf.random.set_seed(22)
    np.random.seed(22)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    assert tf.__version__.startswith('2.')

    batchsz = 128        # batch size
    total_words = 10000  # vocabulary size N_vocab
    max_review_len = 80  # maximum sentence length s; longer sentences are truncated, shorter ones padded
    embedding_len = 100  # word-vector feature length f

    # Load the IMDB dataset; the data is integer-encoded, one integer per word
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
    print(x_train.shape, len(x_train[0]), y_train.shape)
    print(x_test.shape, len(x_test[0]), y_test.shape)
    x_train[0]

    # Integer encoding table
    word_index = keras.datasets.imdb.get_word_index()
    # for k, v in word_index.items():
    #     print(k, v)
    word_index = {k: (v + 3) for k, v in word_index.items()}
    word_index["<PAD>"] = 0
    word_index["<START>"] = 1
    word_index["<UNK>"] = 2  # unknown
    word_index["<UNUSED>"] = 3
    # Reverse the encoding table
    reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

    def decode_review(text):
        return ' '.join([reverse_word_index.get(i, '?') for i in text])

    decode_review(x_train[8])

    # x_train: [b, 80]
    # x_test:  [b, 80]
    # Truncate and pad the sentences to equal length; long sentences keep their tail, short ones are padded at the front
    x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
    x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)

    # Build the datasets: shuffle, batch, and drop the last incomplete batch
    db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
    db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
    db_test = db_test.batch(batchsz, drop_remainder=True)
    print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
    print('x_test shape:', x_test.shape)

    class MyRNN(keras.Model):
        # Multi-layer network built from RNN Cells
        def __init__(self, units):
            super(MyRNN, self).__init__()
            # [b, units]: build the Cells' initial state vectors, reused for every batch
            self.state0 = [tf.zeros([batchsz, units])]
            self.state1 = [tf.zeros([batchsz, units])]
            # Word embedding: [b, 80] => [b, 80, 100]
            self.embedding = layers.Embedding(total_words, embedding_len,
                                              input_length=max_review_len)
            # Build 2 Cells
            self.rnn_cell0 = layers.SimpleRNNCell(units, dropout=0.5)
            self.rnn_cell1 = layers.SimpleRNNCell(units, dropout=0.5)
            # Classification head: map the Cell output features to the 2 classes
            # [b, 80, 100] => [b, units] => [b, 1]
            self.outlayer = Sequential([
                layers.Dense(units),
                layers.Dropout(rate=0.5),
                layers.ReLU(),
                layers.Dense(1)])

        def call(self, inputs, training=None):
            x = inputs  # [b, 80]
            # embedding: [b, 80] => [b, 80, 100]
            x = self.embedding(x)
            # RNN cell computation: [b, 80, 100] => [b, units]
            state0 = self.state0
            state1 = self.state1
            for word in tf.unstack(x, axis=1):  # word: [b, 100]
                out0, state0 = self.rnn_cell0(word, state0, training=training)
                out1, state1 = self.rnn_cell1(out0, state1, training=training)
            # The last output of the final layer feeds the classification head: [b, units] => [b, 1]
            x = self.outlayer(out1, training=training)
            # p(y is pos | x)
            prob = tf.sigmoid(x)
            return prob

    def main():
        units = 64   # RNN state vector length
        epochs = 50  # number of training epochs
        model = MyRNN(units)
        # Assemble
        model.compile(optimizer=optimizers.RMSprop(0.001),
                      loss=losses.BinaryCrossentropy(),
                      metrics=['accuracy'])
        # Train and validate
        model.fit(db_train, epochs=epochs, validation_data=db_test)
        # Test
        model.evaluate(db_test)

    if __name__ == '__main__':
        main()
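The `tf.unstack(x, axis=1)` call in the loop splits the embedded batch along the time axis, yielding one `[b, 100]` tensor per word position for the cells to consume. A toy illustration of that split:

    import tensorflow as tf

    x = tf.random.normal([2, 5, 3])  # [b, s, f]: batch of 2, 5 time steps, 3 features
    steps = tf.unstack(x, axis=1)    # list of 5 tensors, each of shape [2, 3]
    print(len(steps), steps[0].shape)
    # 5 (2, 3)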
    import os
    import tensorflow as tf
    import numpy as np
    from tensorflow import keras

    tf.random.set_seed(22)
    np.random.seed(22)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    assert tf.__version__.startswith('2.')

    # Fix the random seed for reproducibility
    np.random.seed(7)
    # Load the dataset, keeping only the top-n most frequent words
    top_words = 10000
    # Truncate and pad the input sequences
    max_review_length = 80
    (X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data(num_words=top_words)
    # X_train = tf.convert_to_tensor(X_train)
    # y_train = tf.one_hot(y_train, depth=2)
    print('Pad sequences (samples x time)')
    x_train = keras.preprocessing.sequence.pad_sequences(X_train, maxlen=max_review_length)
    x_test = keras.preprocessing.sequence.pad_sequences(X_test, maxlen=max_review_length)
    print('x_train shape:', x_train.shape)
    print('x_test shape:', x_test.shape)

    class RNN(keras.Model):
        def __init__(self, units, num_classes, num_layers):
            super(RNN, self).__init__()
            # self.cells = [keras.layers.LSTMCell(units) for _ in range(num_layers)]
            # self.rnn = keras.layers.RNN(self.cells, unroll=True)
            self.rnn = keras.layers.LSTM(units, return_sequences=True)
            self.rnn2 = keras.layers.LSTM(units)
            # self.cells = (keras.layers.LSTMCell(units) for _ in range(num_layers))
            # self.rnn = keras.layers.RNN(self.cells, return_sequences=True, return_state=True)
            # self.rnn = keras.layers.LSTM(units, unroll=True)
            # self.rnn = keras.layers.StackedRNNCells(self.cells)
            # The vocabulary has 10,000 words; each word is embedded into a 100-dimensional
            # vector, and the maximum sentence length is 80 words
            self.embedding = keras.layers.Embedding(top_words, 100, input_length=max_review_length)
            self.fc = keras.layers.Dense(1)

        def call(self, inputs, training=None, mask=None):
            # print('x', inputs.shape)
            # [b, sentence len] => [b, sentence len, word embedding]
            x = self.embedding(inputs)
            # print('embedding', x.shape)
            x = self.rnn(x)
            x = self.rnn2(x)
            # print('rnn', x.shape)
            x = self.fc(x)
            print(x.shape)
            return x

    def main():
        units = 64
        num_classes = 2
        batch_size = 32
        epochs = 20
        model = RNN(units, num_classes, num_layers=2)
        model.compile(optimizer=keras.optimizers.Adam(0.001),
                      loss=keras.losses.BinaryCrossentropy(from_logits=True),
                      metrics=['accuracy'])
        # train
        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                  validation_data=(x_test, y_test), verbose=1)
        # evaluate on test set
        scores = model.evaluate(x_test, y_test, batch_size, verbose=1)
        print("Final test loss and accuracy :", scores)

    if __name__ == '__main__':
        main()
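Unlike the earlier scripts, this model returns raw logits from its final `Dense(1)` layer and therefore compiles the loss with `from_logits=True`; letting the loss apply the sigmoid internally is numerically more stable than putting `tf.sigmoid` inside the model. The two formulations compute the same quantity, as a small check confirms:

    import tensorflow as tf
    from tensorflow import keras

    logits = tf.constant([[2.0], [-1.0], [0.5]])
    labels = tf.constant([[1.0], [0.0], [1.0]])

    loss_from_logits = keras.losses.BinaryCrossentropy(from_logits=True)(labels, logits)
    loss_from_probs = keras.losses.BinaryCrossentropy()(labels, tf.sigmoid(logits))
    print(float(loss_from_logits), float(loss_from_probs))  # equal up to floating-point error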



RNNColorbot

    # Training and interactive demo script; it imports load_dataset/parse from utils.py
    # and RNNColorbot from model.py (both listed below)
    import os, six, time
    import tensorflow as tf
    import numpy as np
    from tensorflow import keras
    from matplotlib import pyplot as plt
    from utils import load_dataset, parse
    from model import RNNColorbot

    tf.random.set_seed(22)
    np.random.seed(22)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    assert tf.__version__.startswith('2.')

    def test(model, eval_data):
        """
        Compute the average loss on eval_data, which should be a Dataset.
        """
        avg_loss = keras.metrics.Mean()
        for (labels, chars, sequence_length) in eval_data:
            predictions = model((chars, sequence_length), training=False)
            avg_loss.update_state(keras.losses.mean_squared_error(labels, predictions))
        print("eval/loss: %.6f" % avg_loss.result().numpy())

    def train_one_epoch(model, optimizer, train_data, log_interval, epoch):
        """
        Train the model on train_data with the given optimizer.
        """
        for step, (labels, chars, sequence_length) in enumerate(train_data):
            with tf.GradientTape() as tape:
                predictions = model((chars, sequence_length), training=True)
                loss = keras.losses.mean_squared_error(labels, predictions)
                loss = tf.reduce_mean(loss)
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            if step % 100 == 0:
                print(epoch, step, 'loss:', float(loss))

    SOURCE_TRAIN_URL = "https://raw.githubusercontent.com/random-forests/tensorflow-workshop/master/archive/extras/colorbot/data/train.csv"
    SOURCE_TEST_URL = "https://raw.githubusercontent.com/random-forests/tensorflow-workshop/master/archive/extras/colorbot/data/test.csv"

    def main():
        batchsz = 64
        rnn_cell_sizes = [256, 128]
        epochs = 40
        data_dir = os.path.join('.', "data")
        train_data = load_dataset(
            data_dir=data_dir, url=SOURCE_TRAIN_URL, batch_size=batchsz)
        eval_data = load_dataset(
            data_dir=data_dir, url=SOURCE_TEST_URL, batch_size=batchsz)
        model = RNNColorbot(
            rnn_cell_sizes=rnn_cell_sizes,
            label_dimension=3,
            keep_prob=0.5)
        optimizer = keras.optimizers.Adam(0.01)

        for epoch in range(epochs):
            start = time.time()
            train_one_epoch(model, optimizer, train_data, 50, epoch)
            end = time.time()
            # print("train/time for epoch #%d: %.2f" % (epoch, end - start))
            if epoch % 10 == 0:
                test(model, eval_data)

        print("Colorbot is ready to generate colors!")
        while True:
            try:
                color_name = six.moves.input("Give me a color name (or press enter to exit): ")
            except EOFError:
                return
            if not color_name:
                return

            _, chars, length = parse(color_name)
            (chars, length) = (tf.identity(chars), tf.identity(length))
            chars = tf.expand_dims(chars, 0)
            length = tf.expand_dims(length, 0)
            preds = tf.unstack(model((chars, length), training=False)[0])
            # Predictions cannot be negative because they come from a ReLU layer,
            # but they can be larger than 1, so clip them
            clipped_preds = tuple(min(float(p), 1.0) for p in preds)
            rgb = tuple(int(p * 255) for p in clipped_preds)
            print("rgb:", rgb)
            data = [[clipped_preds]]
            plt.imshow(data)
            plt.title(color_name)
            plt.savefig(color_name + '.png')

    if __name__ == "__main__":
        main()
    # utils.py: dataset download and parsing helpers
    import os, six, time
    import tensorflow as tf
    import numpy as np
    from tensorflow import keras
    import urllib.request

    def parse(line):
        """
        Parse one line of the colors dataset.
        """
        # Each line of the dataset is comma-separated with the format color_name, r, g, b,
        # so "items" is a list [color_name, r, g, b].
        # (tf.string_split from TF 1.x was replaced with tf.strings.split here for TF 2.x)
        items = tf.strings.split([line], ",").values
        rgb = tf.strings.to_number(items[1:], out_type=tf.float32) / 255.
        # Represent the color name as a sequence of one-hot encoded characters
        color_name = items[0]
        chars = tf.one_hot(tf.io.decode_raw(color_name, tf.uint8), depth=256)
        # The RNN needs the sequence length
        length = tf.cast(tf.shape(chars)[0], dtype=tf.int64)
        return rgb, chars, length

    def maybe_download(filename, work_directory, source_url):
        """
        Download the data from source_url, unless it is already present.
        Args:
            filename: string, name of the file in the directory.
            work_directory: string, path to the working directory.
            source_url: URL to download from if the file does not exist.
        Returns:
            Path to the resulting file.
        """
        if not tf.io.gfile.exists(work_directory):
            tf.io.gfile.makedirs(work_directory)
        filepath = os.path.join(work_directory, filename)
        if not tf.io.gfile.exists(filepath):
            temp_file_name, _ = urllib.request.urlretrieve(source_url)
            tf.io.gfile.copy(temp_file_name, filepath)
            with tf.io.gfile.GFile(filepath) as f:
                size = f.size()
            print("Successfully downloaded", filename, size, "bytes.")
        return filepath

    def load_dataset(data_dir, url, batch_size):
        """Load the color data at the given path into a padded Dataset."""
        # Download the data at url into data_dir/basename(url). The dataset has a header
        # row (color_name, r, g, b) followed by comma-separated rows.
        path = maybe_download(os.path.basename(url), data_dir, url)
        # This chain of commands loads the data by:
        # 1. skipping the header (.skip(1))
        # 2. parsing the subsequent lines (.map(parse))
        # 3. shuffling the data (.shuffle(...))
        # 4. grouping the data into padded batches (.padded_batch(...))
        dataset = tf.data.TextLineDataset(path).skip(1).map(parse).shuffle(
            buffer_size=10000).padded_batch(
            batch_size, padded_shapes=([None], [None, None], []))
        return dataset
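For a single CSV row such as "red,255,0,0", `parse` as defined above yields an rgb target scaled to [0, 1], a one-hot character tensor of shape [name_length, 256], and the sequence length. A quick check, assuming the helpers above have been imported:

    rgb, chars, length = parse(tf.constant("red,255,0,0"))
    print(rgb.numpy())  # [1. 0. 0.]
    print(chars.shape)  # (3, 256): one one-hot row per character of "red"
    print(int(length))  # 3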
    # model.py: the RNNColorbot model
    import tensorflow as tf
    from tensorflow import keras

    class RNNColorbot(keras.Model):
        """
        Multi-layer (LSTM) RNN that regresses onto real-valued vector labels.
        """
        def __init__(self, rnn_cell_sizes, label_dimension, keep_prob):
            """Construct an RNNColorbot.
            Args:
                rnn_cell_sizes: list of integers giving the size of each LSTM cell in the RNN;
                    rnn_cell_sizes[i] is the size of the i-th layer's cell
                label_dimension: length of the labels to regress onto
                keep_prob: (1 - dropout probability); dropout is applied to the output of each LSTM layer
            """
            super(RNNColorbot, self).__init__(name="")
            self.rnn_cell_sizes = rnn_cell_sizes
            self.label_dimension = label_dimension
            self.keep_prob = keep_prob
            self.cells = [keras.layers.LSTMCell(size) for size in rnn_cell_sizes]
            self.relu = keras.layers.Dense(label_dimension, activation=tf.nn.relu)

        def call(self, inputs, training=None):
            """
            Implement the RNN logic and prediction generation.
            Args:
                inputs: tuple (chars, sequence_length), where chars is a batch of one-hot encoded
                    color names represented as a tensor of shape [batch_size, time_steps, 256],
                    and sequence_length holds the length of each character sequence (color name)
                    as a tensor of shape [batch_size]
                training: whether the call happens during training
            Returns:
                A tensor of shape [batch_size, label_dimension] produced by passing the characters
                through the multi-layer RNN and applying a ReLU to the final hidden state.
            """
            (chars, sequence_length) = inputs
            # Swap the first and second dimensions so chars has shape [time_steps, batch_size, dimension]
            chars = tf.transpose(chars, [1, 0, 2])
            # The outer loop iterates over the RNN layers; the inner loop runs the time steps of one layer
            batch_size = int(chars.shape[1])
            for l in range(len(self.cells)):  # for each layer
                cell = self.cells[l]
                outputs = []
                # h_zero, c_zero
                state = (tf.zeros((batch_size, self.rnn_cell_sizes[l])),
                         tf.zeros((batch_size, self.rnn_cell_sizes[l])))
                # Unstack the input to get a list of batches, one per time step
                chars = tf.unstack(chars, axis=0)
                for ch in chars:  # for each time step
                    output, state = cell(ch, state)
                    outputs.append(output)
                # The output of this layer is the input of the next layer
                # [t, b, h]
                chars = tf.stack(outputs, axis=0)
                if training:
                    # note: in TF 2.x the second argument of tf.nn.dropout is the drop rate;
                    # with keep_prob = 0.5 the two readings coincide
                    chars = tf.nn.dropout(chars, self.keep_prob)
            # Extract the correct output (i.e. hidden state) for each example. All character sequences
            # in this batch were padded to the same fixed length so they could be fed through the RNN
            # loop above; the `sequence_length` vector gives each sequence's true length and lets us
            # pick, for each sequence, the hidden state produced by its last non-padding character.
            batch_range = [i for i in range(batch_size)]
            # stack [64] with [64] => [64, 2]
            indices = tf.stack([sequence_length - 1, batch_range], axis=1)
            # [t, b, h]
            # print(chars.shape)
            hidden_states = tf.gather_nd(chars, indices)
            # print(hidden_states.shape)
            return self.relu(hidden_states)
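The trickiest step in `call` is selecting, for every example, the hidden state produced at its last non-padding character. The `[sequence_length - 1, batch_index]` pairs index into the time-major `[t, b, h]` output, and `tf.gather_nd` collects one row per example. A small standalone illustration of that indexing:

    import tensorflow as tf

    # Toy "hidden states" for 3 time steps, a batch of 2 and hidden size 4: shape [t, b, h]
    outputs = tf.reshape(tf.range(3 * 2 * 4, dtype=tf.float32), [3, 2, 4])
    sequence_length = tf.constant([2, 3])  # true (unpadded) lengths of the two sequences
    batch_range = tf.range(2)
    # Row i of `indices` is [sequence_length[i] - 1, i]: the last valid time step of example i
    indices = tf.stack([sequence_length - 1, batch_range], axis=1)  # shape [2, 2]
    last_states = tf.gather_nd(outputs, indices)                    # shape [2, 4]
    print(last_states.numpy())
    # last_states[0] == outputs[1, 0], last_states[1] == outputs[2, 1]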
