UFLDL Tutorial Notes and Exercise Answers VI (Sparse Coding and the Sparse Coding Autoencoder Interpretation)


Sparse Coding

Sparse coding is another important branch of deep learning; like the other methods in this series, it can extract good (sparse) features from a data set. There is a reason for representing the input with a sparse set of components: most sensory data, such as natural images, can be expressed as the superposition of a small number of basic elements, which in images may be surfaces or edges. (The human brain has an enormous number of neurons, yet for a given image or edge only a few of them fire while the rest stay inhibited.)

The goal of the sparse coding algorithm is to find a set of basis vectors \phi_i such that an input vector x can be expressed as a linear combination of them:

    x = \sum_{i=1}^{k} a_i \, \phi_i

The basis here is required to be over-complete, i.e. k > n (more basis vectors than the input dimension n), in which case the system above generally has infinitely many solutions. We therefore impose a sparsity constraint, and the optimization problem takes the following form:

    \min_{a,\,\phi} \; \sum_{j=1}^{m} \left\| x^{(j)} - \sum_{i=1}^{k} a_i^{(j)} \phi_i \right\|^2 + \lambda \sum_{j=1}^{m} \sum_{i=1}^{k} S\!\left(a_i^{(j)}\right)

Here S(a_i) is the sparsity penalty, which may be the L0 or the L1 norm. Both can express sparsity; the L1 norm is the tightest convex approximation of the L0 norm, and because it is far easier to optimize than L0 it is the one widely used in practice.
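As a tiny illustration (my own toy example, not from the tutorial), the three penalties can be compared on a small coefficient vector; the exercise code below actually uses the smoothed form sqrt(a.^2 + epsilon) so that the penalty is differentiable at zero:

    a = [0 0 3 -0.5 0 1.2]';        % a toy sparse coefficient vector
    epsilon = 1e-5;                  % smoothing constant, same role as in the exercise
    L0  = sum(a ~= 0);               % number of non-zero coefficients
    L1  = sum(abs(a));               % L1 norm
    L1s = sum(sqrt(a.^2 + epsilon)); % smoothed L1, used as S(a) in the exercise code
    fprintf('L0 = %d, L1 = %.4f, smoothed L1 = %.4f\n', L0, L1, L1s);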

Sparse coding: the autoencoder interpretation

To use sparse coding in deep learning for extracting good sparse features from a data set, let A be the (over-complete) matrix of basis vectors and s the sparse feature of the input data (i.e. the sparse coefficients in sparse coding); the input can then be written as X = A*s.

Here A plays the same role as W2 in the sparse autoencoder, and s corresponds to the hidden-unit activations. (With many samples, s becomes a matrix in which each column holds the sparse features of one sample.)
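For the dimensions used later in the exercise (8x8 patches, so visibleSize = 64, and numFeatures = 121), the shapes fit together as follows; this is only a toy consistency check with random matrices, not a trained model:

    visibleSize = 64;  numFeatures = 121;  numExamples = 5;
    A = randn(visibleSize, numFeatures);   % basis (over-complete: 121 > 64)
    s = randn(numFeatures, numExamples);   % one column of sparse codes per example
    X = A * s;                             % reconstructed patches, 64 x 5
    size(X)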

The objective to optimize then becomes:

    J(A, s) = \|A s - x\|_2^2 + \lambda \|s\|_1 + \gamma \|A\|_2^2
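In the exercise code the L1 term is smoothed to sum(sqrt(s.^2 + epsilon)) and the reconstruction term is averaged over the batch. A minimal sketch of the three terms, using the exercise's hyper-parameter values on random stand-in data, looks like this (these are exactly the quantities fResidue, fSparsity and fWeight printed by the exercise loop):

    visibleSize = 64; numFeatures = 121; m = 10;
    lambda = 5e-5; gamma = 1e-2; epsilon = 1e-5;     % same values as the exercise
    A = randn(visibleSize, numFeatures) * 0.005;      % stand-in basis
    s = randn(numFeatures, m) * 0.005;                % stand-in sparse codes
    X = randn(visibleSize, m);                        % stand-in patches
    fResidue  = sum(sum((A * s - X).^2)) / m;             % reconstruction error
    fSparsity = lambda * sum(sum(sqrt(s.^2 + epsilon)));  % smoothed L1 penalty on s
    fWeight   = gamma * sum(sum(A.^2));                   % L2 penalty on the basis A
    J = fResidue + fSparsity + fWeight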

The optimization proceeds as follows:

Since J(A, s) is not jointly convex in A and s, but is convex in either variable when the other is held fixed, it is minimized by alternating:

(1) with A fixed, minimize J(A, s) with respect to s;

(2) with the resulting s fixed, minimize J(A, s) with respect to A;

and repeating (1) and (2) until convergence.

The following two tricks can be used to speed up the iterations and improve the final accuracy (a short code sketch of both follows the two items below).

(1) Organize the samples into good "mini-batches": for example, with 10000 samples, randomly select only 2000 mini-patches per iteration. This not only speeds up each iteration but also speeds up convergence.

(2) A good initialization of s. Since the goal is X = A*s, during the alternating optimization, for a given A we can initialize s = A^T * X. This alone may lose sparsity, so we additionally normalize: the number of rows of s equals the number of columns of A, and each element of s is divided by the 2-norm of the column of A corresponding to its row.

    s_{r,:} := \frac{s_{r,:}}{\left\| A_r \right\|}

where A_r denotes the r-th column of matrix A. Normalizing s in this way keeps the sparsity penalty small. <Personally, I think the subscript on A at this point in the UFLDL tutorial is wrong.>
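A minimal sketch of both tricks, written the way the provided exercise code does it (note that the starter code actually divides by the squared column norms, sum(A.^2), rather than the plain 2-norm described above; random stand-in data is used here):

    numPatches = 20000; batchNumPatches = 2000;      % sizes used in the exercise
    patches = randn(64, numPatches);                 % stand-in for the sampled image patches
    A = randn(64, 121);                              % stand-in for the current basis

    % Trick (1): pick a random mini-batch of patches.
    indices = randperm(numPatches);
    batchPatches = patches(:, indices(1:batchNumPatches));

    % Trick (2): initialize s = A' * X and normalize row r by the (squared)
    % norm of column r of A, as in the starter code.
    s = A' * batchPatches;
    normA = sum(A .^ 2)';                            % squared 2-norm of each column of A
    s = bsxfun(@rdivide, s, normA);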

The full optimization algorithm can therefore be written as:

1. Initialize A with small random values.

2. Repeat until convergence:

(a) randomly select a mini-batch of samples X;

(b) initialize s as in trick (2) above (s := A^T * X, each row divided by the norm of the corresponding column of A);

(c) with A fixed, find the s that minimizes J(A, s) (by gradient-based optimization such as L-BFGS);

(d) with the s from step (c) fixed, solve for the A that minimizes J(A, s) (in closed form).

Note: the procedure above alternates between optimizing s and A. When we differentiate with respect to each of them, A turns out to have a closed-form solution, so when optimizing A we can simply plug in this analytic solution; s has no closed-form solution, so it must be found with an unconstrained optimization method such as gradient-based iteration (L-BFGS in the exercise).
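For reference, setting the derivative of the batch objective with respect to A to zero gives the closed-form update that appears later in sparseCodingExercise.m (m is the batch size):

    \frac{\partial J}{\partial A} = \frac{2\,(A s - X)\,s^{\top}}{m} + 2\gamma A = 0
    \quad\Longrightarrow\quad
    A = X s^{\top}\,\big(s s^{\top} + m\gamma I\big)^{-1}

This is exactly the line weightMatrix = batchPatches*featureMatrix'/(gamma*batchNumPatches*eye(...) + featureMatrix*featureMatrix') in the exercise code.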

Testing: given a new sample x, we take the A learned from the training set and minimize the cost function above over s; the resulting s is the sparse feature of that sample. Unlike the earlier feed-forward networks, "encoding" each new data point therefore requires running the optimization again to obtain the required coefficients.
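A sketch of this encoding step, reusing the exercise's sparseCodingFeatureCost and minFunc. It assumes the workspace variables left by sparseCodingExercise.m (weightMatrix, patches, visibleSize, numFeatures, gamma, lambda, epsilon) and that minFunc is on the path:

    x = patches(:, 1);                               % pretend this is a new, unseen patch
    groupMatrix = eye(numFeatures);                  % non-topographic encoding
    s0 = weightMatrix' * x;                          % trick (2): initialization
    s0 = s0 ./ sum(weightMatrix .^ 2)';              % normalize by (squared) column norms
    opts = struct('Method', 'lbfgs', 'display', 'off', 'maxIter', 50);
    s = minFunc(@(sv) sparseCodingFeatureCost(weightMatrix, sv, visibleSize, numFeatures, ...
                x, gamma, lambda, epsilon, groupMatrix), s0(:), opts);
    % s now holds the sparse code ("feature") of x under the learned basis.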

Note: I have not yet figured out how the topographic sparse coding part of the tutorial arrives at groupMatrix, so I will not risk misleading anyone here.
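For what it is worth, one way to at least inspect what the circshift loop in the starter code builds (my reading, which may be incomplete): each row of groupMatrix marks a poolDim x poolDim neighbourhood of features on the donutDim x donutDim grid, with wrap-around at the edges, so neighbouring groups overlap. Reshaping a single row back into the grid makes this visible:

    numFeatures = 121; poolDim = 3;
    donutDim = floor(sqrt(numFeatures));
    groupMatrix = zeros(numFeatures, donutDim, donutDim);
    groupNum = 1;
    for row = 1:donutDim
        for col = 1:donutDim
            groupMatrix(groupNum, 1:poolDim, 1:poolDim) = 1;
            groupNum = groupNum + 1;
            groupMatrix = circshift(groupMatrix, [0 0 -1]);
        end
        groupMatrix = circshift(groupMatrix, [0 -1, 0]);
    end
    groupMatrix = reshape(groupMatrix, numFeatures, numFeatures);
    % Display one group (here the 50th) as a donutDim x donutDim grid of 0/1:
    % it is a poolDim x poolDim block of features, possibly wrapped at the edges.
    disp(reshape(groupMatrix(50, :), donutDim, donutDim));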

Exercise answers:

For the derivation of grad used below, see the results given in this blog post: http://www.cnblogs.com/tornadomeet/archive/2013/04/16/3024292.html.
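For convenience, here is my summary of the gradients that the two cost functions below implement (m is the number of examples; the topographic case replaces s^2 inside the square root by V s^2, with V the grouping matrix, and the square roots and powers are taken element-wise):

    \frac{\partial J}{\partial A} = \frac{2\,(A s - X)\,s^{\top}}{m} + 2\gamma A

    \frac{\partial J}{\partial s} = \frac{2\,A^{\top}(A s - X)}{m} + \lambda\,\frac{s}{\sqrt{s^{2}+\varepsilon}} \qquad \text{(non-topographic)}

    \frac{\partial J_{\text{topo}}}{\partial s} = \frac{2\,A^{\top}(A s - X)}{m} + \lambda\,\Big(V^{\top}\big(V(s\odot s)+\varepsilon\big)^{-1/2}\Big)\odot s \qquad \text{(topographic)}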

Sparse_coding_exercise.m

%% CS294A/CS294W Sparse Coding Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  sparse coding exercise. In this exercise, you will need to modify
%  sparseCodingFeatureCost.m and sparseCodingWeightCost.m. You will also
%  need to modify this file, sparseCodingExercise.m slightly.

% Add the paths to your earlier exercises if necessary
% addpath /path/to/solution

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

numPatches = 20000;   % number of patches
numFeatures = 121;    % number of features to learn
patchDim = 8;         % patch dimension
visibleSize = patchDim * patchDim;

% dimension of the grouping region (poolDim x poolDim) for topographic sparse coding
poolDim = 3;

% number of patches per batch
batchNumPatches = 2000;

lambda = 5e-5;   % L1-regularisation parameter (on features)
epsilon = 1e-5;  % L1-regularisation epsilon |x| ~ sqrt(x^2 + epsilon)
gamma = 1e-2;    % L2-regularisation parameter (on basis)

%%======================================================================
%% STEP 1: Sample patches

images = load('IMAGES.mat');
images = images.IMAGES;

patches = sampleIMAGES(images, patchDim, numPatches);
display_network(patches(:, 1:64));

%%======================================================================
%% STEP 2: Implement and check sparse coding cost functions
%  Implement the two sparse coding cost functions and check your gradients.
%  The two cost functions are
%  1) sparseCodingFeatureCost (in sparseCodingFeatureCost.m) for the features
%     (used when optimizing for s, which is called featureMatrix in this exercise)
%  2) sparseCodingWeightCost (in sparseCodingWeightCost.m) for the weights
%     (used when optimizing for A, which is called weightMatrix in this exercise)

% We reduce the number of features and number of patches for debugging
% numFeatures = 25;
% patches = patches(:, 1:5);
% numPatches = 5;

weightMatrix = randn(visibleSize, numFeatures) * 0.005;
featureMatrix = randn(numFeatures, numPatches) * 0.005;

%% STEP 2a: Implement and test weight cost
%  Implement sparseCodingWeightCost in sparseCodingWeightCost.m and check
%  the gradient

[cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon);

numgrad = computeNumericalGradient( @(x) sparseCodingWeightCost(x, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon), weightMatrix(:) );
% Uncomment the line below to display the numerical and analytic gradients side by side
% disp([numgrad grad]);
diff = norm(numgrad-grad)/norm(numgrad+grad);
fprintf('Weight difference: %g\n', diff);
assert(diff < 1e-8, 'Weight difference too large. Check your weight cost function. ');

%% STEP 2b: Implement and test feature cost (non-topographic)
%  Implement sparseCodingFeatureCost in sparseCodingFeatureCost.m and check
%  the gradient. You may wish to implement the non-topographic version of
%  the cost function first, and extend it to the topographic version later.

% Set epsilon to a larger value so checking the gradient numerically makes sense
epsilon = 1e-2;

% We pass in the identity matrix as the grouping matrix, putting each
% feature in a group on its own.
groupMatrix = eye(numFeatures);

[cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix);

numgrad = computeNumericalGradient( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix), featureMatrix(:) );
% Uncomment the line below to display the numerical and analytic gradients side by side
% disp([numgrad grad]);
diff = norm(numgrad-grad)/norm(numgrad+grad);
fprintf('Feature difference (non-topographic): %g\n', diff);
assert(diff < 1e-8, 'Feature difference too large. Check your feature cost function. ');

%% STEP 2c: Implement and test feature cost (topographic)
%  Implement sparseCodingFeatureCost in sparseCodingFeatureCost.m and check
%  the gradient. This time, we will pass a random grouping matrix in to
%  check if your costs and gradients are correct for the topographic
%  version.

% Set epsilon to a larger value so checking the gradient numerically makes sense
epsilon = 1e-2;

% This time we pass in a random grouping matrix to check if the grouping is
% correct.
groupMatrix = rand(100, numFeatures);

[cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix);

numgrad = computeNumericalGradient( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix), featureMatrix(:) );
% Uncomment the line below to display the numerical and analytic gradients side by side
% disp([numgrad grad]);
diff = norm(numgrad-grad)/norm(numgrad+grad);
fprintf('Feature difference (topographic): %g\n', diff);
assert(diff < 1e-8, 'Feature difference too large. Check your feature cost function. ');

%%======================================================================
%% STEP 3: Iterative optimization
%  Once you have implemented the cost functions, you can now optimize for
%  the objective iteratively. The code to do the iterative optimization
%  using mini-batching and good initialization of the features has already
%  been included for you.
%
%  However, you will still need to derive and fill in the analytic solution
%  for optimizing the weight matrix given the features.
%  Derive the solution and implement it in the code below, verify the
%  gradient as described in the instructions below, and then run the
%  iterative optimization.

% Initialize options for minFunc
options.Method = 'lbfgs';
options.display = 'off';
options.verbose = 0;

% Initialize matrices
weightMatrix = rand(visibleSize, numFeatures);
featureMatrix = rand(numFeatures, batchNumPatches);

% Initialize grouping matrix
assert(floor(sqrt(numFeatures)) ^2 == numFeatures, 'numFeatures should be a perfect square');
donutDim = floor(sqrt(numFeatures));
assert(donutDim * donutDim == numFeatures,'donutDim^2 must be equal to numFeatures');

% Build the grouping matrix for topographic sparse coding.
% (Author's note: I do not fully understand this construction yet!)
groupMatrix = zeros(numFeatures, donutDim, donutDim);
groupNum = 1;
for row = 1:donutDim
    for col = 1:donutDim
        groupMatrix(groupNum, 1:poolDim, 1:poolDim) = 1;
        groupNum = groupNum + 1;
        groupMatrix = circshift(groupMatrix, [0 0 -1]);
    end
    groupMatrix = circshift(groupMatrix, [0 -1, 0]);
end
groupMatrix = reshape(groupMatrix, numFeatures, numFeatures);

if isequal(questdlg('Initialize grouping matrix for topographic or non-topographic sparse coding?', 'Topographic/non-topographic?', 'Non-topographic', 'Topographic', 'Non-topographic'), 'Non-topographic')
    groupMatrix = eye(numFeatures);
end

% Initial batch
indices = randperm(numPatches);
indices = indices(1:batchNumPatches);
batchPatches = patches(:, indices);

fprintf('%6s%12s%12s%12s%12s\n','Iter', 'fObj','fResidue','fSparsity','fWeight');

% Alternate the two optimizations until the cost function is minimized.
for iteration = 1:200
    error = weightMatrix * featureMatrix - batchPatches;
    error = sum(error(:) .^ 2) / batchNumPatches;

    fResidue = error;

    R = groupMatrix * (featureMatrix .^ 2);
    R = sqrt(R + epsilon);
    fSparsity = lambda * sum(R(:));

    fWeight = gamma * sum(weightMatrix(:) .^ 2);

    % The block above is optional; it only computes the current cost terms for display.
    fprintf(' %4d %10.4f %10.4f %10.4f %10.4f\n', iteration, fResidue+fSparsity+fWeight, fResidue, fSparsity, fWeight)

    % Select a new batch: resample batchNumPatches (2000) patches for training
    indices = randperm(numPatches);
    indices = indices(1:batchNumPatches);
    batchPatches = patches(:, indices);                       % the newly selected batch

    % Reinitialize featureMatrix with respect to the new batch
    featureMatrix = weightMatrix' * batchPatches;             % trick: initialize featureMatrix (s) as A' * x
    normWM = sum(weightMatrix .^ 2)';                         % squared 2-norm of each column of weightMatrix
    featureMatrix = bsxfun(@rdivide, featureMatrix, normWM);  % divide each row of featureMatrix by it

    % Optimize for feature matrix: 20 L-BFGS iterations of unconstrained
    % optimization over featureMatrix
    options.maxIter = 20;
    [featureMatrix, cost] = minFunc( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, batchPatches, gamma, lambda, epsilon, groupMatrix), ...
                                     featureMatrix(:), options);
    featureMatrix = reshape(featureMatrix, numFeatures, batchNumPatches);

    % Optimize for weight matrix: weightMatrix has a closed-form solution
    % obtained by setting the derivative to zero, so no gradient iterations
    % or Newton steps are needed here.
    weightMatrix = zeros(visibleSize, numFeatures);   % placeholder from the starter code, overwritten by the analytic solution below
    weightMatrix = batchPatches*featureMatrix'/(gamma*batchNumPatches* eye(size(featureMatrix, 1)) + featureMatrix*featureMatrix');
    % -------------------- YOUR CODE HERE --------------------
    %   Instructions:
    %     Fill in the analytic solution for weightMatrix that minimizes
    %     the weight cost here.
    %     Once that is done, use the code provided below to check that your
    %     closed form solution is correct.
    %     Once you have verified that your closed form solution is correct,
    %     you should comment out the checking code before running the
    %     optimization.

    % Checking code: run it once to verify the closed-form solution, then
    % leave it commented out so that the 200 iterations can run through.
    % [cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures, batchPatches, gamma, lambda, epsilon, groupMatrix);
    % assert(norm(grad) < 1e-12, 'Weight gradient should be close to 0. Check your closed form solution for weightMatrix again.')
    % error('Weight gradient is okay. Comment out checking code before running optimization.');
    % -------------------- YOUR CODE HERE --------------------

    % Visualize learned basis
    figure(1);
    display_network(weightMatrix);
end

sparseCodingWeightCost.m

function [cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix)
%sparseCodingWeightCost - given the features in featureMatrix,
%                         computes the cost and gradient with respect to
%                         the weights, given in weightMatrix
% parameters
%   weightMatrix  - the weight matrix. weightMatrix(:, c) is the cth basis
%                   vector.
%   featureMatrix - the feature matrix. featureMatrix(:, c) is the features
%                   for the cth example
%   visibleSize   - number of pixels in the patches
%   numFeatures   - number of features
%   patches       - patches
%   gamma         - weight decay parameter (on weightMatrix)
%   lambda        - L1 sparsity weight (on featureMatrix)
%   epsilon       - L1 sparsity epsilon
%   groupMatrix   - the grouping matrix. groupMatrix(r, :) indicates the
%                   features included in the rth group. groupMatrix(r, c)
%                   is 1 if the cth feature is in the rth group and 0
%                   otherwise.

    if exist('groupMatrix', 'var')
        assert(size(groupMatrix, 2) == numFeatures, 'groupMatrix has bad dimension');
    else
        groupMatrix = eye(numFeatures);
    end

    numExamples = size(patches, 2);

    weightMatrix = reshape(weightMatrix, visibleSize, numFeatures);
    featureMatrix = reshape(featureMatrix, numFeatures, numExamples);

    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   weights given in weightMatrix.
    % -------------------- YOUR CODE HERE --------------------
    ave_square = sum(sum((weightMatrix * featureMatrix - patches).^2))./numExamples;  % reconstruction error
    weight_decay = gamma * sum(sum(weightMatrix.^2));                                 % L2 penalty on the basis
    cost = ave_square + weight_decay;

    grad = (2*weightMatrix*featureMatrix*featureMatrix' - 2 * patches*featureMatrix')./numExamples + 2*gamma*weightMatrix;
    grad = grad(:);
end
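As a quick sanity check (a toy sketch mirroring the assert in STEP 3 of the exercise), the closed-form weightMatrix should make the gradient returned by this function essentially zero:

    % Toy check: with s fixed, plug the closed-form A into sparseCodingWeightCost
    % and confirm that the returned gradient is (numerically) zero.
    visibleSize = 64; numFeatures = 121; m = 200;
    gamma = 1e-2; lambda = 5e-5; epsilon = 1e-5;
    patches = randn(visibleSize, m);
    featureMatrix = randn(numFeatures, m) * 0.005;
    weightMatrix = patches*featureMatrix' / (gamma*m*eye(numFeatures) + featureMatrix*featureMatrix');
    [~, grad] = sparseCodingWeightCost(weightMatrix(:), featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon);
    fprintf('norm of weight gradient at the closed-form solution: %g\n', norm(grad));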

sparseCodingFeatureCost.m

function [cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix)
%sparseCodingFeatureCost - given the weights in weightMatrix,
%                          computes the cost and gradient with respect to
%                          the features, given in featureMatrix
% parameters
%   weightMatrix  - the weight matrix. weightMatrix(:, c) is the cth basis
%                   vector.
%   featureMatrix - the feature matrix. featureMatrix(:, c) is the features
%                   for the cth example
%   visibleSize   - number of pixels in the patches
%   numFeatures   - number of features
%   patches       - patches
%   gamma         - weight decay parameter (on weightMatrix)
%   lambda        - L1 sparsity weight (on featureMatrix)
%   epsilon       - L1 sparsity epsilon
%   groupMatrix   - the grouping matrix. groupMatrix(r, :) indicates the
%                   features included in the rth group. groupMatrix(r, c)
%                   is 1 if the cth feature is in the rth group and 0
%                   otherwise.

    isTopo = 1;
    if exist('groupMatrix', 'var')
        assert(size(groupMatrix, 2) == numFeatures, 'groupMatrix has bad dimension');
        if isequal(groupMatrix, eye(numFeatures))
            isTopo = 0;
        end
    else
        groupMatrix = eye(numFeatures);
        isTopo = 0;
    end

    numExamples = size(patches, 2);

    weightMatrix = reshape(weightMatrix, visibleSize, numFeatures);
    featureMatrix = reshape(featureMatrix, numFeatures, numExamples);

    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   features given in featureMatrix.
    %   You may wish to write the non-topographic version, ignoring
    %   the grouping matrix groupMatrix first, and extend the
    %   non-topographic version to the topographic version later.
    % -------------------- YOUR CODE HERE --------------------
    ave_square = sum(sum((weightMatrix * featureMatrix - patches).^2))./numExamples;        % reconstruction error
    sparsity = lambda .* sum(sum(sqrt(groupMatrix * (featureMatrix.^2) + epsilon)));        % sparsity penalty term
    cost = ave_square + sparsity;

    gradResidue = (2* weightMatrix'* weightMatrix*featureMatrix - 2*weightMatrix'*patches)./numExamples;
    if ~isTopo
        % non-topographic sparse coding
        gradSparsity = lambda*featureMatrix./sqrt(featureMatrix.^2+epsilon);
    else
        % topographic sparse coding
        % (note the -0.5 power: differentiating sqrt(groupMatrix*s.^2 + epsilon)
        % gives the reciprocal square root; the original listing had .^(0.5)
        % here, which does not match the derivative)
        gradSparsity = lambda * groupMatrix'*(groupMatrix *(featureMatrix .^ 2) + epsilon).^(-0.5).*featureMatrix;
    end
    grad = gradResidue + gradSparsity;
    grad = grad(:);
end
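And a corresponding sketch for the feature cost: with a random grouping matrix the analytic gradient should agree with the numerical one. This mirrors STEP 2c of the exercise, assumes computeNumericalGradient from the earlier exercises is on the path, and uses the larger epsilon, as the exercise does, so that the numerical check is meaningful:

    visibleSize = 25; numFeatures = 9; m = 5;            % a small problem for a fast check
    gamma = 1e-2; lambda = 5e-5; epsilon = 1e-2;          % larger epsilon, as in STEP 2b/2c
    patches = randn(visibleSize, m);
    weightMatrix = randn(visibleSize, numFeatures) * 0.005;
    featureMatrix = randn(numFeatures, m) * 0.005;
    groupMatrix = rand(4, numFeatures);                   % random grouping -> topographic branch
    [~, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix);
    numgrad = computeNumericalGradient(@(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix), featureMatrix(:));
    fprintf('relative difference: %g\n', norm(numgrad - grad) / norm(numgrad + grad));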

References:

1. UFLDL Tutorial: http://ufldl.stanford.edu/wiki/index.php/UFLDL%E6%95%99%E7%A8%8B

2. Norm regularization in machine learning (1): the L0, L1 and L2 norms: http://blog.csdn.net/zouxy09/article/details/24971995/

3. Deep learning: 29 (Sparse coding exercise): http://www.cnblogs.com/tornadomeet/archive/2013/04/16/3024292.html

4. Deep learning: 26 (A simple understanding of sparse coding): http://www.cnblogs.com/tornadomeet/archive/2013/04/13/3018393.html
