Reading an Excel spreadsheet with Spark


Reference: https://blog.csdn.net/qq_38689769/article/details/79471332

Reference: https://blog.csdn.net/Dr_Guo/article/details/77374403?locationNum=9&fps=1

pom.xml:

<!-- Read Excel files -->
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <version>3.10-FINAL</version>
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>3.10-FINAL</version>
</dependency>
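
The poi artifact covers the legacy .xls (HSSF) format, while poi-ooxml provides the .xlsx (XSSF) format used below. If your project builds with sbt rather than Maven, the equivalent coordinates would be:

// Equivalent sbt dependency declarations (build.sbt)
libraryDependencies ++= Seq(
  "org.apache.poi" % "poi"       % "3.10-FINAL",
  "org.apache.poi" % "poi-ooxml" % "3.10-FINAL"
)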

Data:

(screenshot of the sample Excel sheet; image not recoverable)

Code:

import java.io.FileInputStream

import com.emg.join.model.{AA, BB}
import org.apache.poi.ss.usermodel.Cell
import org.apache.poi.xssf.usermodel.XSSFWorkbook
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

import scala.collection.mutable.ListBuffer

object Excels {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("join")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .setMaster("local[*]")
      // register the case classes themselves (not their companion objects) with Kryo
      .registerKryoClasses(Array[Class[_]](classOf[AA], classOf[BB]))
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._

    val filePath = "c:\\user\\id.xlsx"
    //val filePath1 = "hdfs://192.168.40.0:9000/user/id.xlsx"
    val outPath = "c:\\user\\out" // placeholder output directory; adjust as needed

    val fs = new FileInputStream(filePath)
    val workbook: XSSFWorkbook = new XSSFWorkbook(fs)
    val sheet = workbook.getSheetAt(0) // get the first sheet
    val rowCount = sheet.getPhysicalNumberOfRows() // total number of rows

    val data = new ListBuffer[BB]()
    for (i <- 1 until rowCount) { // start at 1 to skip the header row
      val row = sheet.getRow(i)
      // get the first cell (column 0) of the row
      val cellwellname: Cell = row.getCell(0)
      // the same column may hold different data types; only read numeric cells here
      var wellname = 0L
      if (cellwellname.getCellType == Cell.CELL_TYPE_NUMERIC) {
        wellname = cellwellname.getNumericCellValue.toLong
      }
      data += BB(wellname)
    }
    fs.close()

    val data1 = spark.createDataset(data)
    data1.createTempView("data1")
    val result = spark.sql("select * from data1").coalesce(1)
    result.rdd.saveAsTextFile(outPath)
    spark.stop()
  }
}
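
The loop above reads only numeric cells and silently leaves 0 for anything else. If the same column sometimes arrives as text (e.g. "123" stored as a string), a match on the POI 3.10 cell-type constants covers both cases. A minimal drop-in sketch for the if block above:

// Handle both numeric and string cells in the same column
// (Cell.CELL_TYPE_* are the POI 3.10 int constants).
val wellname: Long = cellwellname.getCellType match {
  case Cell.CELL_TYPE_NUMERIC => cellwellname.getNumericCellValue.toLong
  case Cell.CELL_TYPE_STRING  => cellwellname.getStringCellValue.trim.toLong // assumes the text parses as a number
  case _                      => 0L // unknown type: keep the default
}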

Note:

This works fine when the path is local. When the path is an HDFS path, it fails with a path-not-found error; I suspected an escape-character problem with the URI, searched around, but still couldn't solve it! (The underlying issue is that java.io.FileInputStream can only open local files, so an hdfs:// URI cannot be read with it; the stream has to come from Hadoop's FileSystem API instead, as sketched below.)

If you have a working solution, please leave a comment.
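
A minimal sketch of that approach, assuming the Hadoop client jars and the cluster configuration are on the classpath (the NameNode URI reuses the commented-out path from the code above): open the stream through Hadoop's FileSystem API and hand it to POI, since FSDataInputStream is a plain java.io.InputStream.

import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.poi.xssf.usermodel.XSSFWorkbook

// Open the workbook directly from HDFS. FileInputStream only
// understands local paths, but FSDataInputStream is an InputStream
// that the XSSFWorkbook constructor accepts.
val hdfsPath = "hdfs://192.168.40.0:9000/user/id.xlsx"
val hdfs = FileSystem.get(new URI(hdfsPath), new Configuration())
val in = hdfs.open(new Path(hdfsPath))
val workbook = new XSSFWorkbook(in)
// ... same sheet/row/cell handling as above ...
in.close()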
