Unsupervised Learning

# The three main branches of machine learning: supervised, unsupervised, and reinforcement learning #

The three main categories of machine learning are **supervised learning, unsupervised learning, and reinforcement learning**. (Not supervised, semi-supervised, and unsupervised! Semi-supervised learning merely combines supervised and unsupervised techniques, and counts as a variant of both.)

> Unsupervised learning is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision. In contrast to supervised learning that usually makes use of human-labeled data, unsupervised learning, also known as **self-organization**, allows for **modeling of probability densities over inputs**.\[1\] It forms one of the three main categories of machine learning, along with **supervised and reinforcement learning**. Semi-supervised learning, a related variant, makes use of supervised and unsupervised techniques.

# Unsupervised learning models the prior probability density of unlabeled input data #

Unsupervised learning, also called self-organization, models the probability density of its inputs. More precisely, that probability density is a prior probability.

## Supervised learning tries to infer the conditional probability density $P_X(x \mid y)$ ##

That is, the probability density of the input data $x$ conditioned on its label $y$.

Take the simplest case, binary classification, as an example: $y$ is 0 or 1, and suppose the input can take only three discrete values $x_0, x_1, x_2$. Supervised learning must then estimate $P(x_i \mid y = 0)$ and $P(x_i \mid y = 1)$ for $i = 0, 1, 2$ from the data set. For example, if the data set is

$$\{(x_0, 1),\ (x_1, 1),\ (x_0, 0),\ (x_2, 0),\ (x_0, 1),\ (x_1, 0),\ (x_2, 1),\ (x_0, 1),\ (x_2, 0),\ (x_1, 1)\}$$

then

$$P(x_0 \mid y = 0) = 0.25, \qquad P(x_0 \mid y = 1) = 0.5$$
$$P(x_1 \mid y = 0) = 0.25, \qquad P(x_1 \mid y = 1) = 1/3 \approx 0.33$$
$$P(x_2 \mid y = 0) = 0.5, \qquad P(x_2 \mid y = 1) = 1/6 \approx 0.17$$

These are the exact conditional probabilities computed from the data set at hand, but this is the simplest possible setting: discrete inputs with few possible values, and few labels. Two further issues need to be considered.

First, this data set is tiny (I made it up). A small data set may well fail to reflect the true probability distribution of the source domain, so a model trained on it will generalize poorly. This can be mitigated by enlarging the data set and improving its quality, which is why **data set construction** is such important and meticulous manual work in supervised learning.

Second, the assumption made here is very simple: the input takes only three discrete values. In practice, either the input is discrete but with many more possible values, which demands more data so that no value is absent from the data set (otherwise the model will never recognize that value later); or the input is continuous, in which case the supervised classifier has to learn a conditional cumulative distribution function (conditional CDF), or equivalently a conditional probability density function (conditional PDF, whose integral is the conditional CDF), rather than merely compute individual conditional probability values. This is the more common, more general situation.

## Unsupervised learning tries to infer the prior probability distribution $P_X(x)$ ##

> It could be contrasted with supervised learning by saying that whereas supervised learning intends to infer a conditional probability distribution $P_X(x \mid y)$ conditioned on the label $y$ of input data; unsupervised learning intends to infer an a priori probability distribution $P_X(x)$.

For example, if the data set is

$$\{x_0,\ x_1,\ x_0,\ x_2,\ x_0,\ x_1,\ x_2,\ x_0,\ x_2,\ x_1\}$$

then $P_X(x_0) = 0.4$, $P_X(x_1) = 0.3$, $P_X(x_2) = 0.3$. Both kinds of estimate come down to simple counting, as the sketch below shows.
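To make the counting explicit, here is a minimal Python sketch (the toy data sets are the ones above; the variable names are my own) that estimates the conditional probabilities $P(x \mid y)$ from the labeled data and the prior $P_X(x)$ from the unlabeled data:

```python
from collections import Counter

# Toy labeled data set from the supervised example above: (input, label) pairs.
labeled = [("x0", 1), ("x1", 1), ("x0", 0), ("x2", 0), ("x0", 1),
           ("x1", 0), ("x2", 1), ("x0", 1), ("x2", 0), ("x1", 1)]

# Supervised view: estimate P(x | y) by counting within each label group.
label_counts = Counter(y for _, y in labeled)
pair_counts = Counter(labeled)
for (x, y), n in sorted(pair_counts.items()):
    print(f"P({x} | y={y}) = {n / label_counts[y]:.3f}")

# Toy unlabeled data set from the unsupervised example above.
unlabeled = ["x0", "x1", "x0", "x2", "x0", "x1", "x2", "x0", "x2", "x1"]

# Unsupervised view: estimate the prior P_X(x) by counting over all inputs.
x_counts = Counter(unlabeled)
for x, n in sorted(x_counts.items()):
    print(f"P_X({x}) = {n / len(unlabeled):.1f}")
```

With continuous inputs this counting no longer works directly and becomes density estimation, which the last section returns to.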
# The two main methods of unsupervised learning: PCA and clustering #

## PCA (principal component analysis) ##

PCA extracts the principal components of the data in order to reduce its dimensionality. It is entirely linear: each principal component is independent of the others and corresponds to one eigenvalue, so the components lie along different directions, mutually orthogonal and hence uncorrelated.

PCA computes the eigenvalues of the data's covariance matrix and keeps the top $n$ eigenvalues (enough for the cumulative contribution ratio to reach 80% or more). The eigenvectors of these top $n$ eigenvalues span an $n$-dimensional space, and PCA amounts to re-examining the data in that $n$-dimensional space, with each principal component along its own independent direction. A sketch of this procedure follows.
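Here is a minimal NumPy sketch of that procedure, assuming a plain data matrix with samples in rows (the function name and the 80% threshold parameter are my own):

```python
import numpy as np

def pca(X, target_ratio=0.80):
    """Keep the fewest components whose cumulative eigenvalue
    contribution ratio reaches target_ratio (e.g. 80%)."""
    # Center the data, then form the covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)

    # Eigendecomposition; eigh suits symmetric matrices. Sort descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # The cumulative contribution ratio decides how many components to keep.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n = int(np.searchsorted(ratio, target_ratio)) + 1

    # Project the data onto the top-n (mutually orthogonal) eigenvector directions.
    return Xc @ eigvecs[:, :n], eigvals[:n]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # 200 samples, 5 features
Z, kept = pca(X)
print(Z.shape, kept)
```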
### Autoencoders: neural networks that learn data encodings in an unsupervised way, also known as nonlinear PCA ###

The goal of an autoencoder is to learn a representation (encoding) for the data; the main benefit of doing so is dimensionality reduction. (A minimal sketch appears below, after the clustering discussion.)

Variants of the autoencoder include regularized autoencoders (sparse, denoising, and contractive autoencoders, which learn good feature representations that help subsequent supervised classifiers) and variational autoencoders (mainly used as generative models).

> An autoencoder is a type of artificial neural network used to learn efficient data codings in an **unsupervised** manner.\[1\] The aim of an autoencoder is to learn a **representation (encoding)** for a set of data, typically for **dimensionality reduction**, by training the network to ignore signal “noise”. Along with the reduction side, a reconstructing side is learnt, where the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, **hence its name**. Several variants exist to the basic model, with the aim of forcing the learned representations of the input to assume useful properties.\[2\] Examples are the **regularized autoencoders (Sparse, Denoising and Contractive autoencoders)**, proven effective in learning representations for subsequent classification tasks,\[3\] and **Variational autoencoders**, with their recent applications as generative models.\[4\] Autoencoders are effectively used for solving many applied problems, from face recognition\[5\] to acquiring the semantic meaning of words.

Autoencoders were already popular in the 1980s. To this day their most common uses are dimensionality reduction and feature learning, but in recent years they have also been used to learn generative models of data (for example, generating images from text descriptions, usually with variational autoencoders).

> An autoencoder is a neural network that learns to **copy its input to its output.** It has an internal (hidden) layer that describes a **code** used to represent the input, and it is constituted by **two main parts: an encoder** that maps the input into the code, and **a decoder** that maps the code to a reconstruction of the original input.
> Performing the copying task perfectly would simply duplicate the signal, and this is why autoencoders usually are restricted in ways that force them to reconstruct the input **approximately**, preserving only the most relevant aspects of the data in the copy.
> The idea of autoencoders has been popular in the field of neural networks for decades, and the first applications date back to the '80s. **Their most traditional application was dimensionality reduction or feature learning,** but more recently the autoencoder concept has become more widely used for learning generative models of data. Some of the most powerful AIs in the 2010s involved sparse autoencoders stacked inside of deep neural networks.

## Cluster analysis ##

Cluster analysis is a branch of machine learning that groups unlabeled data. Rather than responding to feedback (as supervised learning does in classification tasks), clustering identifies commonalities in the data and then reacts based on the presence or absence of those commonalities in each new piece of data. Clustering is well suited to detecting anomalous data points that do not fit into any group.

> Two of the main methods used in unsupervised learning are **principal component and cluster analysis**. Cluster analysis is used in unsupervised learning to group, or segment, datasets with shared attributes in order to extrapolate algorithmic relationships.\[2\] Cluster analysis is a branch of machine learning that groups the data that has not been labelled, classified or categorized. **Instead of responding to feedback, cluster analysis identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data.** This approach helps detect **anomalous data points** that do not fit into either group.
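Continuing the clustering discussion, here is a minimal k-means sketch in plain NumPy (my own illustrative implementation, not a reference to any particular library): it groups unlabeled points purely by the commonality they share, namely proximity.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means: group unlabeled points into k clusters.
    (Empty-cluster handling is omitted for brevity.)"""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k random data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs of unlabeled 2-D points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```

A point that ends up far from every centroid is a candidate anomaly, which is the detection use mentioned above.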
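And here is the minimal autoencoder sketch promised in the autoencoder section above. It uses PyTorch (my choice of framework; the article names none): an encoder compresses the input into a low-dimensional code, a decoder reconstructs the input from that code, and training minimizes reconstruction error with no labels involved.

```python
import torch
from torch import nn

# Encoder maps the input to a low-dimensional code; decoder reconstructs it.
class Autoencoder(nn.Module):
    def __init__(self, in_dim=20, code_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 8), nn.ReLU(),
                                     nn.Linear(8, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 8), nn.ReLU(),
                                     nn.Linear(8, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.randn(256, 20)          # unlabeled data; the input is its own target
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), X)   # reconstruction error, no labels needed
    loss.backward()
    opt.step()

codes = model.encoder(X)          # the learned 3-dimensional representation
print(codes.shape)
```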
# The central application of unsupervised learning: density estimation in statistics #

The central application of unsupervised learning is the field of density estimation in statistics.

> A central application of unsupervised learning is in the field of density estimation in statistics, though unsupervised learning encompasses many other domains involving summarizing and explaining data features.
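To close the loop with the counting example at the start: when the input is continuous, estimating $P_X(x)$ becomes density estimation. A minimal sketch using SciPy's `gaussian_kde` (my choice of tool; the data is synthetic):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Unlabeled 1-D samples drawn from a mixture of two Gaussians.
rng = np.random.default_rng(2)
samples = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

# Kernel density estimation: model the prior density P_X(x) from samples alone.
kde = gaussian_kde(samples)
grid = np.linspace(-4, 4, 9)
for x, p in zip(grid, kde(grid)):
    print(f"p({x:+.1f}) ~ {p:.3f}")
```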