Li Hang's Statistical Learning Methods (统计学习方法), Chapter 8: Boosting Methods

水深无声, 2022-05-30 10:28

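Boosting trains a sequence of weak classifiers, each focusing on the samples its predecessors got wrong, and combines them into a single strong classifier. AdaBoost is the most representative boosting algorithm.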

The AdaBoost algorithm
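For reference, here is a compact statement of binary AdaBoost, written in a notation chosen for this note that may differ slightly from the book's. It assumes training data $(x_i, y_i)$, $i = 1, \dots, N$, with $y_i \in \{-1, +1\}$, and $M$ boosting rounds. $G_m$ is the weak classifier fitted in round $m$ under the sample weights $w_{m1}, \dots, w_{mN}$, and $I(\cdot)$ is the indicator function.

```latex
\begin{aligned}
&w_{1i} = \frac{1}{N}, \quad i = 1, \dots, N \\
&\text{for } m = 1, \dots, M: \\
&\quad e_m = \sum_{i=1}^{N} w_{mi}\, I\big(G_m(x_i) \neq y_i\big) \\
&\quad \alpha_m = \frac{1}{2} \log \frac{1 - e_m}{e_m} \\
&\quad Z_m = \sum_{i=1}^{N} w_{mi} \exp\big(-\alpha_m y_i G_m(x_i)\big) \\
&\quad w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp\big(-\alpha_m y_i G_m(x_i)\big) \\
&G(x) = \operatorname{sign}\Big(\sum_{m=1}^{M} \alpha_m G_m(x)\Big)
\end{aligned}
```

In each round, a misclassified sample has its weight multiplied by $e^{\alpha_m}/Z_m$ and a correctly classified one by $e^{-\alpha_m}/Z_m$, so the next weak classifier concentrates on the current mistakes. Note also that $\alpha_m > 0$ exactly when $e_m < 1/2$.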

Applicable problems: binary classification. Multi-class classification requires a modified version of the algorithm.
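One well-known multi-class modification is SAMME, the name of the discrete-boosting option in scikit-learn's AdaBoostClassifier. The minimal sketch below compares only the classifier weight $\alpha$ under the two schemes. The function names are hypothetical and chosen for illustration. SAMME adds a $\ln(K-1)$ term, so a weak classifier only has to beat random guessing among $K$ classes (error below $1 - 1/K$), not error $1/2$.

```python
import math


def binary_alpha(err: float) -> float:
    """Weak-classifier weight in binary AdaBoost: 0.5 * ln((1 - e) / e)."""
    return 0.5 * math.log((1 - err) / err)


def samme_alpha(err: float, n_classes: int) -> float:
    """Weak-classifier weight in SAMME, a multi-class extension of AdaBoost.

    The extra ln(K - 1) term keeps alpha positive whenever err < 1 - 1/K,
    i.e. whenever the weak classifier beats random guessing among K classes.
    """
    return math.log((1 - err) / err) + math.log(n_classes - 1)


if __name__ == "__main__":
    # With K = 2 the SAMME weight is exactly twice the binary AdaBoost weight;
    # scaling every alpha by 2 does not change the sign of the weighted vote.
    print(binary_alpha(0.3), samme_alpha(0.3, 2))
    # With K = 3, an error of 0.6 gives a negative binary weight,
    # but still a positive SAMME weight, because 0.6 < 1 - 1/3.
    print(binary_alpha(0.6), samme_alpha(0.6, 3))
```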

Code (implemented with scikit-learn)

```python
# encoding=utf-8
import time

import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

if __name__ == '__main__':
    print("Start read data...")
    time_1 = time.time()
    raw_data = pd.read_csv('../data/train_binary.csv', header=0)
    data = raw_data.values
    # The first column holds the label; the remaining columns are the features
    features = data[:, 1:]
    labels = data[:, 0]

    # Randomly hold out 33% of the data as the test set; the rest is the training set
    train_features, test_features, train_labels, test_labels = train_test_split(
        features, labels, test_size=0.33, random_state=0)
    time_2 = time.time()
    print('read data cost %f seconds' % (time_2 - time_1))

    print('Start training...')
    # n_estimators is the number of weak classifiers to combine.
    # Older scikit-learn versions also accepted algorithm in {'SAMME', 'SAMME.R'},
    # defaulting to 'SAMME.R' (real boosting); 'SAMME' is discrete boosting.
    # 'SAMME.R' was deprecated in 1.4 and later removed, so it is not passed here.
    clf = AdaBoostClassifier(n_estimators=100)
    clf.fit(train_features, train_labels)
    time_3 = time.time()
    print('training cost %f seconds' % (time_3 - time_2))

    print('Start predicting...')
    test_predict = clf.predict(test_features)
    time_4 = time.time()
    print('predicting cost %f seconds' % (time_4 - time_3))

    score = accuracy_score(test_labels, test_predict)
    print("The accuracy score is %f" % score)
```

The code is available at AdaBoost/AdaBoost_sklearn.py.

Results with train.csv as the experimental data:

[Figure: Adaboost_sklearn_result_1.png]

Results with train_binary.csv as the experimental data:

[Figure: Adaboost_sklearn_result_2.png]
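The scikit-learn call hides the algorithm itself. The sketch below is a from-scratch NumPy version of binary AdaBoost that shows what happens in each round. It uses 1-D decision stumps (single-threshold classifiers) as weak learners and runs on a 10-point toy dataset (x = 0..9) like the one in the chapter's worked example. Several details are choices made for this illustration, not the book's or scikit-learn's code: the function names, the search over midpoint thresholds, the tie-breaking rule (keep the first best stump found), and the early stop when the error reaches 0.5.

```python
import numpy as np


def fit_stump(x, y, w):
    """Return (weighted_error, threshold, sign) of the best 1-D stump.

    The stump predicts `s` when x < v and `-s` otherwise (s in {+1, -1}).
    Candidate thresholds are midpoints between consecutive distinct x values;
    on ties, the first stump found is kept.
    """
    xs = np.unique(x)
    best = None
    for v in (xs[:-1] + xs[1:]) / 2:
        for s in (1, -1):
            pred = np.where(x < v, s, -s)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, v, s)
    return best


def stump_predict(x, v, s):
    return np.where(x < v, s, -s)


def adaboost_fit(x, y, n_rounds):
    """Binary AdaBoost with decision stumps; labels must be +1 / -1."""
    n = len(x)
    w = np.full(n, 1.0 / n)          # initial weights: uniform
    model = []                       # list of (alpha, threshold, sign)
    for _ in range(n_rounds):
        err, v, s = fit_stump(x, y, w)
        if err >= 0.5:               # no better than chance: stop early
            break
        err = max(err, 1e-12)        # guard against division by zero for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(x, v, s)
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()                 # normalise (divide by Z_m)
        model.append((alpha, v, s))
    return model


def adaboost_predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    f = sum(alpha * stump_predict(x, v, s) for alpha, v, s in model)
    return np.sign(f)


if __name__ == "__main__":
    x = np.arange(10, dtype=float)
    y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
    model = adaboost_fit(x, y, n_rounds=3)
    for alpha, v, s in model:
        print(f"alpha={alpha:.4f}  threshold={v}  label_left={s}")
    print("training accuracy:", np.mean(adaboost_predict(model, x) == y))
```

On this data, three rounds choose stumps at thresholds 2.5, 8.5 and 5.5, with weights of roughly 0.4236, 0.6496 and 0.7520. The combined classifier labels all ten training points correctly, although no single stump does.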
