UFLDL Tutorial (1): sparseae_exercise


Below, the functions involved in the sparseae_exercise exercise from the UFLDL tutorial are listed together with annotated comments.

First, the calling relationships among the functions:

Main program: train.m

(1) Call sampleIMAGES to extract a large number of image patches from the given images.

(2) Call display_network to display a random selection of the extracted patches in a grid.

(3) Gradient checking. The purpose of this step is to verify that the cost/gradient function is implemented correctly; it can be carried out by a separate function, checkSparseAutoencoderCost:

  1. Use sparseAutoencoderCost to compute the network's cost function and gradient.
  2. Use computeNumericalGradient to compute the gradient numerically (checkNumericalGradient should first be used to verify that this numerical-gradient routine itself is correct); see the formulas after this list.
  3. Compare the results of steps 1 and 2 to judge whether the hand-written sparseAutoencoderCost is correct.
  4. Once sparseAutoencoderCost has been verified, checkSparseAutoencoderCost does not need to be run during actual training.
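
For reference, step 2 uses the centered (two-sided) difference approximation from the UFLDL notes, and step 3 compares it against the analytic gradient with a relative error, the same quantity computed in checkSparseAutoencoderCost.m below (the notes suggest epsilon = 1e-4, and the relative error should come out on the order of 1e-9 or smaller):

```latex
% Centered-difference approximation of each partial derivative
\frac{\partial J(\theta)}{\partial \theta_i}
  \approx \frac{J(\theta + \epsilon e_i) - J(\theta - \epsilon e_i)}{2\epsilon}

% Relative error used to compare the numerical and analytic gradients
\mathrm{diff} = \frac{\lVert \mathrm{numgrad} - \mathrm{grad} \rVert_2}
                     {\lVert \mathrm{numgrad} + \mathrm{grad} \rVert_2}
```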

(4) Train the network with the L-BFGS method to obtain the optimized weights and bias terms (the layout of this parameter vector is noted after this overview).

(5) Visualize the training result.
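
A note on step (4): the parameter vector theta (and the optimized opttheta) is simply the unrolled weights and bias terms, in the order expected by the reshape calls in sparseAutoencoderCost.m below:

```latex
\theta = \bigl[\operatorname{vec}(W^{(1)});\ \operatorname{vec}(W^{(2)});\ b^{(1)};\ b^{(2)}\bigr],
\qquad
W^{(1)} \in \mathbb{R}^{25 \times 64},\;
W^{(2)} \in \mathbb{R}^{64 \times 25},\;
b^{(1)} \in \mathbb{R}^{25},\;
b^{(2)} \in \mathbb{R}^{64}
```

With the parameter values used here, theta therefore contains 25*64 + 64*25 + 25 + 64 = 3289 entries.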

Next, each function is given with annotated code.

train.m

```matlab
%% CS294A/CS294W Programming Assignment Starter Code
addpath ..\common\

%%======================================================================
%% STEP 0: Here we provide the relevant parameter values that will
% allow your sparse autoencoder to get good filters; you do not need to change the parameters below.
visibleSize = 8*8;     % number of input units
hiddenSize = 25;       % number of hidden units
sparsityParam = 0.01;  % desired average activation of the hidden units
                       % (denoted by the Greek letter rho, which looks like a lower-case "p", in the lecture notes)
lambda = 0.0001;       % weight decay parameter
beta = 3;              % weight of the sparsity penalty term

%%======================================================================
%% STEP 1: Implement sampleIMAGES
% After implementing sampleIMAGES, the display_network command should display a random sample of 200 patches from the dataset.
% Extract patches from the images; each extracted patch is stored as one column of patches
patches = sampleIMAGES;
% Randomly pick 200 columns of patches and display the corresponding patches
IMG = patches(:, randi(size(patches, 2), 200, 1));
display_network(IMG, 8);

%%======================================================================
%% STEP 2 and STEP 3: Implement sparseAutoencoderCost and gradient checking
checkSparseAutoencoderCost();

%%======================================================================
%% STEP 4: After verifying that your implementation of sparseAutoencoderCost is correct,
% train the sparse autoencoder using minFunc (L-BFGS).
% Randomly initialize the parameters
theta = initializeParameters(hiddenSize, visibleSize);
% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost function.
                          % Generally, for minFunc to work, you need a function pointer with two outputs:
                          % the function value and the gradient. In our problem, sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;    % Maximum number of iterations of L-BFGS to run
options.display = 'on';
% opttheta is the vector containing all weights and bias terms of the network
[opttheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, ...
                                visibleSize, hiddenSize, ...
                                lambda, sparsityParam, ...
                                beta, patches), ...
                           theta, options);

%%======================================================================
%% STEP 5: Visualization
W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize); % weight matrix of the first layer
display_network(W1', 12);
print -djpeg weights.jpg   % save the visualization to a file
```

checkSparseAutoencoderCost.m

```matlab
%% Check whether sparseAutoencoderCost is implemented correctly
function checkSparseAutoencoderCost()
%% Set up a sparse autoencoder (the parameters may be the same as in the main program, or regenerated)
visibleSize = 8*8;     % number of input units
hiddenSize = 25;       % number of hidden units
sparsityParam = 0.01;  % desired average activation of the hidden units
                       % (denoted by the Greek letter rho, which looks like a lower-case "p", in the lecture notes)
lambda = 0.0001;       % weight decay parameter
beta = 3;              % weight of the sparsity penalty term
patches = sampleIMAGES;
% Obtain random parameters theta
theta = initializeParameters(hiddenSize, visibleSize);

%% Compute the cost function and gradient analytically
[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...
                                     sparsityParam, beta, patches(:, 1:10));

%% Compute the gradient numerically (this calls the autoencoder cost function)
numgrad = computeNumericalGradient(@(x) sparseAutoencoderCost(x, visibleSize, ...
                                        hiddenSize, lambda, ...
                                        sparsityParam, beta, ...
                                        patches(:, 1:10)), theta);

%% Compare the gradient from the cost function with the numerical approximation
% Use this to visually compare the gradients side by side
disp([numgrad grad]);
% Compare numerically computed gradients with the ones obtained from backpropagation
diff = norm(numgrad - grad) / norm(numgrad + grad);
disp(diff); % Should be small. In our implementation, these values are usually less than 1e-9.
end
```
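
computeNumericalGradient itself is covered in the separate post mentioned at the end. For completeness, a minimal sketch of such a centered-difference routine is shown below, assuming the standard UFLDL interface computeNumericalGradient(J, theta); it is only an illustration, not the author's exact implementation.

```matlab
% Minimal sketch of a centered-difference numerical gradient (illustrative only,
% assuming the standard UFLDL interface). J is a function handle whose first
% output is the cost, and theta is the parameter vector.
function numgrad = computeNumericalGradient(J, theta)
EPSILON = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
    e = zeros(size(theta));
    e(i) = EPSILON;
    % Perturb the i-th parameter in both directions and take the centered difference
    numgrad(i) = (J(theta + e) - J(theta - e)) / (2*EPSILON);
end
end
```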

sparseAutoencoderCost.m

```matlab
%% Compute the cost function and gradient of the network
function [cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                              lambda, sparsityParam, beta, data)
% visibleSize: the number of input units (probably 64)
% hiddenSize: the number of hidden units (probably 25)
% lambda: weight decay parameter
% sparsityParam: the desired average activation of the hidden units (denoted in the lecture
%                notes by the Greek letter rho, which looks like a lower-case "p")
% beta: weight of the sparsity penalty term
% data: our 64x10000 matrix containing the training data, so data(:,i) is the i-th training example

% The input theta is a vector (because minFunc expects the parameters to be a vector).
% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
% follows the notation convention of the lecture notes.
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Cost and gradient variables (your code needs to compute these values).
cost = 0;
m = size(data, 2);   % number of training examples

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute the cost/optimization objective J_sparse(W,b) for the sparse autoencoder,
% and the corresponding gradients W1grad, W2grad, b1grad, b2grad.
%
% W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.
% Note that W1grad has the same dimensions as W1, b1grad has the same dimensions
% as b1, etc. Your code should set W1grad to be the partial derivative of J_sparse(W,b) with
% respect to W1, i.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b)
% with respect to the input parameter W1(i,j). Thus, W1grad should be equal to the term
% [(1/m) \Delta W^{(1)} + \lambda W^{(1)}] in the last block of pseudo-code in Section 2.2
% of the lecture notes (and similarly for W2grad, b1grad, b2grad).
%
% Stated differently, if we were using batch gradient descent to optimize the parameters,
% the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2.

%% Forward propagation
a1 = data;
z2 = bsxfun(@plus, W1*a1, b1);
a2 = sigmoid(z2);
z3 = bsxfun(@plus, W2*a2, b2);
a3 = sigmoid(z3);

%% Compute the cost of the network
% Reconstruction term J1 = mean of the per-sample cost
y = data;                  % the desired output of the autoencoder is its input
Ei = sum((a3 - y).^2)/2;   % cost of each individual sample
J1 = sum(Ei)/m;
% Regularization term J2 = sum of squares of all weights
J2 = sum(W1(:).^2) + sum(W2(:).^2);
% Sparsity term J3 = sum of the KL divergences over all hidden units
rho_hat = sum(a2, 2)/m;
KL = sum(sparsityParam*log(sparsityParam./rho_hat) + ...
         (1-sparsityParam)*log((1-sparsityParam)./(1-rho_hat)));
J3 = KL;
% Total cost of the network
cost = J1 + lambda*J2/2 + beta*J3;

%% Backpropagation: compute the error term (delta) of each layer
delta3 = -(data - a3).*dsigmoid(z3);
sparse_delta = beta*(-sparsityParam./rho_hat + (1-sparsityParam)./(1-rho_hat));
delta2 = bsxfun(@plus, W2'*delta3, sparse_delta).*dsigmoid(z2); % the sparsity term enters here

%% Gradients of the cost with respect to the weights and bias terms
W1grad = delta2*a1'/m + lambda*W1;
W2grad = delta3*a2'/m + lambda*W2;
b1grad = sum(delta2, 2)/m;
b2grad = sum(delta3, 2)/m;

%-------------------------------------------------------------------
% After computing the cost and gradient, we will convert the gradients back
% to a vector format (suitable for minFunc). Specifically, we will unroll
% your gradient matrices into a vector.
grad = [W1grad(:); W2grad(:); b1grad(:); b2grad(:)];
end

%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients. This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).
function sigm = sigmoid(x)
sigm = 1 ./ (1 + exp(-x));
end

%% Derivative of the sigmoid function (be careful with this formula; it is an easy place to make mistakes)
function dsigm = dsigmoid(x)
sigx = sigmoid(x);
dsigm = sigx.*(1 - sigx);
end
```
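
To relate the code above back to the lecture notes, the cost assembled in sparseAutoencoderCost is

```latex
J_{\mathrm{sparse}}(W,b)
  = \underbrace{\frac{1}{m}\sum_{i=1}^{m}\tfrac{1}{2}\bigl\lVert a^{(3,i)} - x^{(i)} \bigr\rVert^{2}}_{J_1}
  + \frac{\lambda}{2}\,\underbrace{\bigl(\lVert W^{(1)} \rVert_F^2 + \lVert W^{(2)} \rVert_F^2\bigr)}_{J_2}
  + \beta\,\underbrace{\sum_{j=1}^{\mathrm{hiddenSize}} \mathrm{KL}\bigl(\rho \,\|\, \hat\rho_j\bigr)}_{J_3},
\qquad
\mathrm{KL}(\rho \,\|\, \hat\rho_j)
  = \rho \log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}
```

where rho = sparsityParam and rho_hat(j) is the average activation of hidden unit j over the m training samples; the sparse_delta term in the backpropagation step is beta times the derivative of this KL penalty with respect to rho_hat.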

The gradient-checking functions are covered in a separate post.

  

  

  
