Pandas Data Operations, String Type (Part 2): Common Methods [lower, upper, len, startswith, endswith, strip, lstrip, rstrip, replace, split, rsplit]

川长思鸟来 2023-09-29 07:16

1. Common string methods: lower, upper, len, startswith, endswith

    import numpy as np
    import pandas as pd

    s = pd.Series(['A', 'b', 'bbhello', '123', np.nan])
    print(f"s =\n{s}")
    print('-' * 50)
    # Convert every string to lowercase; NaN stays NaN
    print(f"lower: s.str.lower() =\n{s.str.lower()}")
    print('-' * 50)
    # Convert every string to uppercase
    print(f"upper: s.str.upper() =\n{s.str.upper()}")
    print('-' * 50)
    # Length of each string; the NaN forces a float result
    print(f"len: s.str.len() =\n{s.str.len()}")
    print('-' * 50)
    # Does each string start with 'b'?
    print(f"starts with b: s.str.startswith('b') =\n{s.str.startswith('b')}")
    print('-' * 50)
    # Does each string end with '3'?
    print(f"ends with 3: s.str.endswith('3') =\n{s.str.endswith('3')}")

Output:

    s =
    0          A
    1          b
    2    bbhello
    3        123
    4        NaN
    dtype: object
    --------------------------------------------------
    lower: s.str.lower() =
    0          a
    1          b
    2    bbhello
    3        123
    4        NaN
    dtype: object
    --------------------------------------------------
    upper: s.str.upper() =
    0          A
    1          B
    2    BBHELLO
    3        123
    4        NaN
    dtype: object
    --------------------------------------------------
    len: s.str.len() =
    0    1.0
    1    1.0
    2    7.0
    3    3.0
    4    NaN
    dtype: float64
    --------------------------------------------------
    starts with b: s.str.startswith('b') =
    0    False
    1     True
    2     True
    3    False
    4      NaN
    dtype: object
    --------------------------------------------------
    ends with 3: s.str.endswith('3') =
    0    False
    1    False
    2    False
    3     True
    4      NaN
    dtype: object
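Because `startswith` and `endswith` pass missing values through, a result like the one above cannot be used directly as a boolean mask. Here is a minimal sketch, assuming the goal is to filter rows: the `na` parameter of `Series.str.startswith` sets the value reported for missing entries. The names `mask` and `starts_with_b` are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'b', 'bbhello', '123', np.nan])

# na=False reports missing values as "no match", giving a clean boolean mask
mask = s.str.startswith('b', na=False)

# Boolean indexing keeps only the rows where the mask is True
starts_with_b = s[mask]
print(starts_with_b.tolist())
```

Whether a missing value should count as a match or a non-match depends on the data, so `na=True` may be the better choice in other situations.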

2. Common string methods: strip, lstrip, rstrip

    import numpy as np
    import pandas as pd

    # Common string methods (2): strip, lstrip, rstrip
    s = pd.Series([' jack', 'jill ', ' je sse ', 'frank'])
    # Random values, so the numbers differ on every run
    df = pd.DataFrame(np.random.randn(3, 2),
                      columns=[' Column A ', ' Column B '],
                      index=range(3))
    print(f"s =\n{s}")
    print('-' * 50)
    print(f"df =\n{df}")
    print('-' * 50)
    # Remove whitespace from both ends of each string
    print(f"strip both ends: s.str.strip() =\n{s.str.strip()}")
    # Remove leading whitespace only
    print(f"strip left: s.str.lstrip() =\n{s.str.lstrip()}")
    # Remove trailing whitespace only
    print(f"strip right: s.str.rstrip() =\n{s.str.rstrip()}")
    # Strips the leading and trailing spaces of the column names;
    # the space inside each name is kept
    df.columns = df.columns.str.strip()
    print(f"df =\n{df}")
    print('-' * 50)

Output:

    s =
    0        jack
    1       jill 
    2     je sse 
    3       frank
    dtype: object
    --------------------------------------------------
    df =
       Column A   Column B 
    0  -1.318646  -0.831649
    1  -0.339870  -1.141231
    2  -0.024364  -2.163961
    --------------------------------------------------
    strip both ends: s.str.strip() =
    0      jack
    1      jill
    2    je sse
    3     frank
    dtype: object
    strip left: s.str.lstrip() =
    0       jack
    1      jill 
    2    je sse 
    3      frank
    dtype: object
    strip right: s.str.rstrip() =
    0       jack
    1       jill
    2     je sse
    3      frank
    dtype: object
    df =
        Column A  Column B
    0 -1.318646 -0.831649
    1 -0.339870 -1.141231
    2 -0.024364 -2.163961
    --------------------------------------------------
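Leading and trailing whitespace is hard to see in a printed Series because pandas right-aligns the values. The following sketch, assuming the same `s` as above, uses `tolist()` so that the quotes make the remaining whitespace visible. It also shows the optional `to_strip` argument; the `tagged` Series is a made-up example.

```python
import pandas as pd

s = pd.Series([' jack', 'jill ', ' je sse ', 'frank'])

# tolist() prints each value with quotes, so leftover whitespace is easy to spot
print(s.str.strip().tolist())   # both ends
print(s.str.lstrip().tolist())  # left end only
print(s.str.rstrip().tolist())  # right end only

# to_strip removes a custom set of characters instead of whitespace
tagged = pd.Series(['**jack**', '--jill--'])
cleaned = tagged.str.strip('*-')
print(cleaned.tolist())
```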

3. Common string methods: replace

    import numpy as np
    import pandas as pd

    # Common string methods (3): replace
    # Random values, so the numbers differ on every run
    df = pd.DataFrame(np.random.randn(3, 2),
                      columns=[' Column A ', ' Column B '],
                      index=range(3))
    # Replace every space in the column names with '-'
    df.columns = df.columns.str.replace(' ', '-')
    print(f"df =\n{df}")
    print('-' * 50)
    # n limits the number of replacements per string (here only the first '-')
    df.columns = df.columns.str.replace('-', '*', n=1)
    print(f"df =\n{df}")

Output:

    df =
      -Column-A- -Column-B-
    0   0.704728  -0.835929
    1   1.478930  -2.708538
    2   0.585825  -1.395908
    --------------------------------------------------
    df =
      *Column-A- *Column-B-
    0   0.704728  -0.835929
    1   1.478930  -2.708538
    2   0.585825  -1.395908
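`strip` and `replace` are often chained when cleaning column names. The sketch below is one possible approach, not the only one: it passes `regex` explicitly so the behaviour does not depend on the default of the installed pandas version. The names `snake` and `collapsed` are only illustrative.

```python
import pandas as pd

cols = pd.Index([' Column A ', ' Column B '])

# Strip the ends first, then replace the remaining inner space literally
snake = cols.str.strip().str.replace(' ', '_', regex=False).str.lower()
print(snake.tolist())

# With regex=True the pattern is a regular expression;
# \s+ matches any run of whitespace characters
s = pd.Series(['a  b', 'c \t d'])
collapsed = s.str.replace(r'\s+', ' ', regex=True)
print(collapsed.tolist())
```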

4. Common string methods: split, rsplit

    import numpy as np
    import pandas as pd

    # Common string methods (4): split, rsplit
    # The third element is a list, not a string, so .str methods return NaN for it
    s = pd.Series(['a,b,c', '1,2,3', ['a,,,c'], np.nan])
    print(f"s =\n{s}")
    print('-' * 50)
    # Works like Python's str.split; each string becomes a list
    data1 = s.str.split(',')
    print(f"data1 = s.str.split(',') =\n{data1}\ntype(data1) = {type(data1)}")
    print('-' * 25)
    # Plain indexing on the result returns an ordinary list
    data2 = data1[0]
    print(f"data2 = data1[0] = s.str.split(',')[0] =\n{data2}\ntype(data2) = {type(data2)}")
    print('-' * 25)
    # Elements of the split lists can be accessed with .str.get() or .str[]
    data3 = s.str.split(',').str.get(1)
    print(f"data3 = s.str.split(',').str.get(1) =\n{data3}\ntype(data3) = {type(data3)}")
    print('-' * 50)
    # expand=True returns a DataFrame with one column per piece
    data4 = s.str.split(',', expand=True)
    print(f"data4 = s.str.split(',', expand=True) =\n{data4}\ntype(data4) = {type(data4)}")
    print('-' * 25)
    # n limits the number of splits
    data5 = s.str.split(',', expand=True, n=1)
    print(f"data5 = s.str.split(',', expand=True, n=1) =\n{data5}\ntype(data5) = {type(data5)}")
    print('-' * 25)
    # rsplit works like split but starts from the end of the string
    data6 = s.str.rsplit(',', expand=True, n=1)
    print(f"data6 = s.str.rsplit(',', expand=True, n=1) =\n{data6}\ntype(data6) = {type(data6)}")
    print('-' * 50)
    # Using split on a DataFrame column
    df = pd.DataFrame({
        'key1': ['a,b,c', '1,2,3', [':,., ']],
        'key2': ['a-b-c', '1-2-3', [':-.- ']]})
    print(f"df =\n{df}")
    print('-' * 25)
    data7 = df['key2'].str.split('-')
    print(f"data7 = df['key2'].str.split('-') =\n{data7}\ntype(data7) = {type(data7)}")
    print('-' * 50)

Output:

    s =
    0      a,b,c
    1      1,2,3
    2    [a,,,c]
    3        NaN
    dtype: object
    --------------------------------------------------
    data1 = s.str.split(',') =
    0    [a, b, c]
    1    [1, 2, 3]
    2          NaN
    3          NaN
    dtype: object
    type(data1) = <class 'pandas.core.series.Series'>
    -------------------------
    data2 = data1[0] = s.str.split(',')[0] =
    ['a', 'b', 'c']
    type(data2) = <class 'list'>
    -------------------------
    data3 = s.str.split(',').str.get(1) =
    0      b
    1      2
    2    NaN
    3    NaN
    dtype: object
    type(data3) = <class 'pandas.core.series.Series'>
    --------------------------------------------------
    data4 = s.str.split(',', expand=True) =
         0    1    2
    0    a    b    c
    1    1    2    3
    2  NaN  NaN  NaN
    3  NaN  NaN  NaN
    type(data4) = <class 'pandas.core.frame.DataFrame'>
    -------------------------
    data5 = s.str.split(',', expand=True, n=1) =
         0    1
    0    a  b,c
    1    1  2,3
    2  NaN  NaN
    3  NaN  NaN
    type(data5) = <class 'pandas.core.frame.DataFrame'>
    -------------------------
    data6 = s.str.rsplit(',', expand=True, n=1) =
         0    1
    0  a,b    c
    1  1,2    3
    2  NaN  NaN
    3  NaN  NaN
    type(data6) = <class 'pandas.core.frame.DataFrame'>
    --------------------------------------------------
    df =
          key1     key2
    0    a,b,c    a-b-c
    1    1,2,3    1-2-3
    2  [:,., ]  [:-.- ]
    -------------------------
    data7 = df['key2'].str.split('-') =
    0    [a, b, c]
    1    [1, 2, 3]
    2          NaN
    Name: key2, dtype: object
    type(data7) = <class 'pandas.core.series.Series'>
    --------------------------------------------------
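The difference between `split` and `rsplit` only shows up when `n` limits the number of splits. A short sketch on a Series without missing values makes the contrast easier to check; the column labels `first`, `second` and `third` are hypothetical names chosen for the illustration.

```python
import pandas as pd

s = pd.Series(['a,b,c', '1,2,3'])

# .str[i] is shorthand for .str.get(i) on the split lists
second = s.str.split(',').str[1]
print(second.tolist())

# expand=True returns a DataFrame, whose columns can then be labelled
parts = s.str.split(',', expand=True)
parts.columns = ['first', 'second', 'third']
print(parts)

# With n=1, split cuts at the first comma and rsplit at the last one
head_rest = s.str.split(',', n=1).tolist()
rest_tail = s.str.rsplit(',', n=1).tolist()
print(head_rest)
print(rest_tail)
```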
