Linear Regression Algorithm - 3. Evaluation Metrics for Linear Regression

喜欢ヅ旅行 2023-08-17 16:44 272 reads 0 likes

Evaluation Metrics for Linear Regression

  • Mean Squared Error (MSE)

\[\frac{1}{m}\sum_{i=1}^{m}(y_{test}^{(i)} - \hat{y}_{test}^{(i)})^2\]

  • Root Mean Squared Error (RMSE)
    \[\sqrt{\frac{1}{m}\sum_{i=1}^{m}(y_{test}^{(i)} - \hat{y}_{test}^{(i)})^2}\]
  • Mean Absolute Error (MAE)
    \[\frac{1}{m}\sum_{i=1}^{m}\left| y_{test}^{(i)} - \hat{y}_{test}^{(i)} \right|\]
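To make the three formulas concrete, here is a minimal NumPy sketch on hypothetical toy arrays (the names `y_true` and `y_pred` are illustrative, not from the original):

```python
import numpy as np

# hypothetical toy data to illustrate the three metrics above
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)    # Mean Squared Error
rmse = np.sqrt(mse)                      # Root Mean Squared Error
mae = np.mean(np.abs(y_true - y_pred))   # Mean Absolute Error

print(mse, rmse, mae)
```

Note that RMSE is simply the square root of MSE, which puts the error back into the same units as y.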

Loading the Boston Housing Dataset

import numpy
import matplotlib.pyplot as plt
from sklearn import datasets

# note: load_boston was removed in scikit-learn 1.2; this code targets older versions
boston = datasets.load_boston()
# print the description of this dataset
print(boston.DESCR)

(screenshot: output of print(boston.DESCR))

x = boston.data[:, 5]  # take only the number-of-rooms feature
y = boston.target      # target: house prices, used in the plots below

plt.scatter(x, y)
plt.show()

(screenshot: scatter plot of x vs y)

There are extreme points at the edge of the data: the target values are capped at 50, so we use numpy boolean ("fancy") indexing to drop the samples at that upper limit.

x = x[y < 50]
y = y[y < 50]
plt.scatter(x, y)
plt.show()
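The filtering above is plain boolean masking; a minimal sketch on hypothetical toy arrays (note that `x` must be filtered with the mask on `y`, so the mask is built before either array changes):

```python
import numpy as np

# hypothetical toy arrays; one target value sits exactly at the cap of 50
x = np.array([5.0, 6.0, 7.0, 8.0])
y = np.array([20.0, 30.0, 50.0, 40.0])

mask = y < 50            # boolean mask: True where y is below the cap
x, y = x[mask], y[mask]  # keep only the uncapped samples
```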

(screenshot: scatter plot after removing the capped values)

from mylib.model_selection import train_test_split
from mylib.SimpleLineRegression import SimpleLineRegression

x_train, x_test, y_train, y_test = train_test_split(x, y, seed=666)
reg = SimpleLineRegression()
reg.fit(x_train, y_train)
y_predict = reg.predict(x_test)

Training yields the two parameters a and b, which determine the fitted line.
(screenshot: the fitted values of a and b)
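The source of mylib's `SimpleLineRegression` is not shown here; the following is a hypothetical sketch of a one-feature linear regression class with the same fit/predict interface, using the closed-form least-squares solution for a and b:

```python
import numpy as np

class SimpleLineRegressionSketch:
    """Hypothetical stand-in for SimpleLineRegression: fits y = a*x + b."""

    def __init__(self):
        self.a_ = None  # slope
        self.b_ = None  # intercept

    def fit(self, x_train, y_train):
        x_mean, y_mean = np.mean(x_train), np.mean(y_train)
        # closed-form least squares for a single feature:
        #   a = cov(x, y) / var(x),  b = y_mean - a * x_mean
        self.a_ = (np.sum((x_train - x_mean) * (y_train - y_mean))
                   / np.sum((x_train - x_mean) ** 2))
        self.b_ = y_mean - self.a_ * x_mean
        return self

    def predict(self, x):
        return self.a_ * x + self.b_

# sanity check on points that lie exactly on y = 2x + 1
reg_sketch = SimpleLineRegressionSketch().fit(
    np.array([1.0, 2.0, 3.0, 4.0]), np.array([3.0, 5.0, 7.0, 9.0]))
```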

plt.scatter(x, y)
plt.plot(x, reg.predict(x))
plt.show()

(screenshot: scatter plot with the fitted line)

MSE (Mean Squared Error)

mse_test = numpy.sum((y_predict - y_test) ** 2) / len(x_test)

RMSE (Root Mean Squared Error)

import math

mse_test = numpy.sum((y_predict - y_test) ** 2) / len(x_test)
rmse_test = math.sqrt(mse_test)

MAE (Mean Absolute Error)

mae_test = numpy.sum(numpy.absolute(y_predict - y_test)) / len(x_test)  # absolute takes the absolute value

Packaging the above metrics into a library:

The metrics file:
(screenshot: contents of the metrics file)
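The metrics file itself only appears as a screenshot in the original; the following is a possible sketch of its contents, assuming the function names imported below:

```python
import numpy as np

# hypothetical contents of mylib/metrics.py
def mean_squared_error(y_true, y_predict):
    """MSE between true values and predictions."""
    assert len(y_true) == len(y_predict), \
        "the size of y_true must be equal to the size of y_predict"
    return np.sum((y_true - y_predict) ** 2) / len(y_true)

def root_mean_squared_error(y_true, y_predict):
    """RMSE: the square root of the MSE."""
    return np.sqrt(mean_squared_error(y_true, y_predict))

def mean_absolute_error(y_true, y_predict):
    """MAE between true values and predictions."""
    assert len(y_true) == len(y_predict), \
        "the size of y_true must be equal to the size of y_predict"
    return np.sum(np.abs(y_true - y_predict)) / len(y_true)
```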

Calling the packaged metrics library:

from mylib.metrics import mean_squared_error, root_mean_squared_error, mean_absolute_error

mean_squared_error(y_predict, y_test)
root_mean_squared_error(y_predict, y_test)
mean_absolute_error(y_predict, y_test)

(screenshot: values of the three metrics)

Calling MSE and MAE from scikit-learn (newer versions can also return RMSE via mean_squared_error(..., squared=False)):

from sklearn.metrics import mean_squared_error, mean_absolute_error

# scikit-learn's convention is (y_true, y_pred); both metrics are symmetric anyway
mean_squared_error(y_test, y_predict)
mean_absolute_error(y_test, y_predict)

R Squared

The formula is:

\[ R^2 = 1 - \frac{\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})^2}{\sum_{i=1}^{m}(y^{(i)} - \bar{y})^2} = 1 - \frac{MSE(\hat{y}, y)}{Var(y)} \]

  • $ \sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})^2 = SS_{residual} $: the error produced by the fitted model's predictions

  • $ \sum_{i=1}^{m}(y^{(i)} - \bar{y})^2 = SS_{total} $: the error produced by always predicting the mean \(\bar{y}\) (the baseline model)

    numpy.var computes the variance:

    R2 = 1 - mean_squared_error(y_test, y_predict) / numpy.var(y_test)
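As a sanity check that the 1 - MSE/Var form matches the SS_residual / SS_total definition, here is a small sketch on hypothetical toy arrays (the factor 1/m appears in both MSE and numpy's default variance, so it cancels):

```python
import numpy as np

# hypothetical toy data
y_test_toy = np.array([3.0, 5.0, 2.0, 7.0])
y_predict_toy = np.array([2.5, 5.0, 4.0, 8.0])

# definition: 1 - SS_residual / SS_total
ss_res = np.sum((y_predict_toy - y_test_toy) ** 2)
ss_tot = np.sum((y_test_toy - np.mean(y_test_toy)) ** 2)
r2_def = 1 - ss_res / ss_tot

# equivalent form: 1 - MSE / Var (numpy.var divides by m by default, ddof=0)
r2_var = 1 - np.mean((y_predict_toy - y_test_toy) ** 2) / np.var(y_test_toy)
```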

Packaged into the metrics library in mylib:

from mylib.metrics import r2_score

r2_score(y_test, y_predict)
# a score method is also wrapped inside the linear regression class
reg.score(x_test, y_test)

Reposted from: https://www.cnblogs.com/shuai-long/p/11185059.html
