Pandas - Data Operations - String Type (1): Common Methods [str (NaN values filtered automatically), indexing]

偏执的太偏执、 2023-09-29 09:39

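Pandas provides a set of string methods, reached through the .str accessor, that make it easy to apply a string operation to every element of a Series or Index.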
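As a rough illustration of what this saves you, here is a small sketch on a made-up Series. It compares a hand-written list comprehension, which has to guard against NaN itself, with the equivalent .str call:

```python
import numpy as np
import pandas as pd

# Made-up sample data containing one missing value
s = pd.Series(['A', 'b', np.nan, 'hj'])

# Plain Python: each element must be checked, because NaN is a float and has no .upper()
manual = pd.Series([x.upper() if isinstance(x, str) else x for x in s])

# Vectorized accessor: NaN entries are skipped and stay NaN in the result
vectorized = s.str.upper()

print(vectorized.tolist())  # ['A', 'B', nan, 'HJ']
```

Both produce the same values; the .str version simply moves the NaN check into pandas.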

I. str: access methods via .str, with missing/NA values excluded automatically


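  • Call string methods directly through .str
  • Works on a Series, on a single DataFrame column (which is itself a Series), and on an Index such as df.columns; a DataFrame as a whole has no .str
  • NaN values are skipped automatically and remain NaN in the result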
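Strictly speaking, the .str accessor lives on Series and Index objects rather than on a DataFrame, so "using it on a DataFrame" means selecting a column first or applying it column by column. A minimal sketch, assuming every column holds strings; the DataFrame.apply approach is one common choice, not the only one:

```python
import pandas as pd

# Small example frame in which every column holds strings (an assumption of this sketch)
df = pd.DataFrame({'key1': list('abc'), 'key2': ['hee', 'fv', 'w']})

# A single column is a Series, so .str is available on it
print(df['key2'].str.upper())

# The DataFrame itself has no .str, so apply the accessor to each column in turn
upper_all = df.apply(lambda col: col.str.upper())
print(upper_all)

# Column labels form an Index, which also supports .str
print(df.columns.str.upper().tolist())  # ['KEY1', 'KEY2']

print(hasattr(df, 'str'))  # False
```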

    import numpy as np
    import pandas as pd

    # Access string methods through .str; missing/NA values are excluded automatically.
    # .str works on a Series (including a single DataFrame column) and on an Index.
    s = pd.Series(['A', 'b', 'C', 'bbhello', '123', np.nan, 'hj'])
    df = pd.DataFrame({
        'key1': list('abcdef'),
        'key2': ['hee', 'fv', 'w', 'hija', '123', np.nan]})

    print("s = \n", s)
    print('-' * 50)
    print("df = \n", df)
    print('-' * 200)

    # Count occurrences of 'b' in each element; NaN stays NaN, which makes the result float64
    print("s.str.count('b') = \n", s.str.count('b'))
    print('-' * 50)
    print("df['key2'].str.upper() = \n", df['key2'].str.upper())
    print('-' * 200)

    # df.columns is an Index object, so .str works on it too
    df.columns = df.columns.str.upper()
    print("df = \n", df)
    print('-' * 200)

Output:

    s =
     0          A
    1          b
    2          C
    3    bbhello
    4        123
    5        NaN
    6         hj
    dtype: object
    --------------------------------------------------
    df =
        key1  key2
    0     a   hee
    1     b    fv
    2     c     w
    3     d  hija
    4     e   123
    5     f   NaN
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    s.str.count('b') =
     0    0.0
    1    1.0
    2    0.0
    3    2.0
    4    0.0
    5    NaN
    6    0.0
    dtype: float64
    --------------------------------------------------
    df['key2'].str.upper() =
     0     HEE
    1      FV
    2       W
    3    HIJA
    4     123
    5     NaN
    Name: key2, dtype: object
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    df =
        KEY1  KEY2
    0     a   hee
    1     b    fv
    2     c     w
    3     d  hija
    4     e   123
    5     f   NaN
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
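Because NaN is passed through rather than dropped, a method that returns booleans, such as str.contains, leaves the missing position as NaN when the data has object dtype, and that result cannot be used directly as a boolean mask. The na parameter of str.contains fills those positions. A short sketch that redefines the same Series s:

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'b', 'C', 'bbhello', '123', np.nan, 'hj'])

# NaN is passed through: the missing entry stays missing in the result
print(s.str.contains('b'))

# na=False treats missing values as "no match", giving a clean boolean mask
mask = s.str.contains('b', na=False)
print(s[mask].tolist())  # ['b', 'bbhello']
```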

II. String indexing

    import numpy as np
    import pandas as pd

    # String indexing
    s = pd.Series(['A', 'b', 'C', 'bbhello', '123', np.nan, 'hj'])
    df = pd.DataFrame({
        'key1': list('abcdef'),
        'key2': ['hee', 'fv', 'w', 'hija', '123', np.nan]})

    # Take the first character
    data1 = s.str[0]
    print("First character: data1 = s.str[0] = \n", data1)
    print('-' * 200)

    # Take the first two characters
    data2 = s.str[:2]
    print("First two characters: data2 = s.str[:2] = \n", data2)
    print('-' * 200)

    # After .str, index and slice syntax is the same as for an ordinary string
    data3 = df['key2'].str[:2]
    print("data3 = df['key2'].str[:2] = \n", data3)
    print('-' * 200)

Output:

    First character: data1 = s.str[0] =
     0      A
    1      b
    2      C
    3      b
    4      1
    5    NaN
    6      h
    dtype: object
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    First two characters: data2 = s.str[:2] =
     0      A
    1      b
    2      C
    3     bb
    4     12
    5    NaN
    6     hj
    dtype: object
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    data3 = df['key2'].str[:2] =
     0     he
    1     fv
    2      w
    3     hi
    4     12
    5    NaN
    Name: key2, dtype: object
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
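Indexing through .str is shorthand for two of the accessor's own methods: an integer key behaves like str.get and a slice like str.slice. One difference from indexing a plain Python string: a position that is out of range for a given element yields NaN instead of raising IndexError. A sketch on the same Series s:

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'b', 'C', 'bbhello', '123', np.nan, 'hj'])

# s.str[0] gives the same result as s.str.get(0)
assert s.str[0].equals(s.str.get(0))

# s.str[:2] gives the same result as s.str.slice(stop=2)
assert s.str[:2].equals(s.str.slice(stop=2))

# Negative positions count from the end, as with ordinary strings
print(s.str[-1].tolist())  # ['A', 'b', 'C', 'o', '3', nan, 'j']

# Out-of-range positions give NaN rather than raising IndexError
print(s.str[3].tolist())  # [nan, nan, nan, 'e', nan, nan, nan]
```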
