Pandas Data Structures: DataFrame (5): Indexing rows and columns together [1: df.loc[['b', 'c'], ['y', 8]]; 2: select columns first, then rows; 3: df.iloc[1:3, 2:6]]

深藏阁楼爱情的钟 2023-09-29 09:09

A DataFrame has both a row index and a column index. It can be viewed as a dict of Series that all share one index.
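
To make the "dict of Series" view concrete, here is a minimal sketch. The column names and values are invented for illustration only.

```python
import pandas as pd

# Two Series that share the same row labels (hypothetical values).
idx = ['a', 'b', 'c']
s_x = pd.Series([1.0, 2.0, 3.0], index=idx)
s_y = pd.Series([10.0, 20.0, 30.0], index=idx)

# A dict of Series becomes a DataFrame: the dict keys turn into column
# labels, and the shared Series index becomes the row index.
df = pd.DataFrame({'x': s_x, 'y': s_y})

# Selecting one column gives back a Series carrying that same row index.
col_x = df['x']
print(type(col_x).__name__)
print(list(df.index))
print(list(df.columns))
```

Looking up `df['x']` therefore behaves much like a dict lookup by key, which is why plain `[]` indexing on a DataFrame addresses columns rather than rows.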

I. Direct indexing (column first, then row; direct indexing works only with index labels, not with positional subscripts)

Get the 'open' value for the date '2018-02-27':

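    # Index directly by label: column label first, then row label
    data['open']['2018-02-27']      # 23.53

    # Unsupported operations
    # data['2018-02-27']['open']    # error: row label first, then column label
    # data[:1, :2]                  # error: [] does not take a (row, column) pair of slices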
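
The `data` frame itself is not defined in the text, so the following sketch builds a hypothetical stand-in. Only the value 23.53 for 'open' on '2018-02-27' comes from the article; the other numbers and the use of plain string row labels are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for `data`: only open == 23.53 on 2018-02-27 is
# taken from the article; every other number is invented.
data = pd.DataFrame(
    {'open': [23.53, 22.80], 'close': [24.16, 23.53]},
    index=['2018-02-27', '2018-02-26'],   # plain string row labels
)

# Column label first, then row label.
chained = data['open']['2018-02-27']

# The same cell in a single .loc call (row label, column label).
single = data.loc['2018-02-27', 'open']

# Row label first fails, because [] on a DataFrame looks up column labels.
try:
    data['2018-02-27']['open']
    row_first_ok = True
except KeyError:
    row_first_ok = False

print(chained, single, row_first_ok)
```

Both lookups reach the same cell. The `.loc` form states the row and the column in one call, which avoids the intermediate Series created by the chained form.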

II. Selecting target rows & target columns: df.loc[['b', 'c'], ['y', 8]]

    df =
        x   y   z   8   9
    a NaN NaN NaN NaN NaN
    b NaN NaN NaN NaN NaN
    c NaN NaN NaN NaN NaN
    d NaN NaN NaN NaN NaN
    e NaN NaN NaN NaN NaN

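  • df.loc[:, ['y', 8]]: selects every row of columns 'y' and 8;
  • df.loc[:, 'y':8]: treated here as an invalid expression, because the label slice mixes a string label with an integer label.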
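
If the intent behind `'y':8` is "every column from 'y' through 8", two approaches avoid a label slice across mixed string and integer labels. This is a minimal sketch; the integer fill values are an assumption that replaces the all-NaN frame so that the result is visible.

```python
import numpy as np
import pandas as pd

# Same labels as the article's frame, filled with 0..24 instead of NaN.
df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Option 1: name the wanted columns explicitly.
by_list = df.loc[:, ['y', 'z', 8]]

# Option 2: convert each label to its integer position, then slice by
# position. iloc slices exclude the end position, hence the + 1.
start = df.columns.get_loc('y')
stop = df.columns.get_loc(8) + 1
by_position = df.iloc[:, start:stop]

print(by_position.columns.tolist())
```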

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.nan,
                      index=list('abcde'),
                      columns=['x', 'y', 'z', 8, 9])
    print("df = \n", df)
    print("-" * 100)

    # Rows 'b' and 'c', all columns
    data1 = df.loc[['b', 'c']]
    print("data1 = \n", data1)
    print("-" * 50)

    # All rows, columns 'y' and 8
    data2 = df.loc[:, ['y', 8]]
    print("data2 = \n", data2)
    print("-" * 50)

    # Rows 'b' and 'c', columns 'y' and 8
    data3 = df.loc[['b', 'c'], ['y', 8]]
    print("data3 = \n", data3)

Output:

    df =
        x   y   z   8   9
    a NaN NaN NaN NaN NaN
    b NaN NaN NaN NaN NaN
    c NaN NaN NaN NaN NaN
    d NaN NaN NaN NaN NaN
    e NaN NaN NaN NaN NaN
    ----------------------------------------------------------------------------------------------------
    data1 =
        x   y   z   8   9
    b NaN NaN NaN NaN NaN
    c NaN NaN NaN NaN NaN
    --------------------------------------------------
    data2 =
        y   8
    a NaN NaN
    b NaN NaN
    c NaN NaN
    d NaN NaN
    e NaN NaN
    --------------------------------------------------
    data3 =
        y   8
    b NaN NaN
    c NaN NaN
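
Because every cell in the frame above is NaN, the selections are hard to tell apart by value. The sketch below repeats the same `.loc` pattern on a frame filled with the integers 0 to 24 (an assumption made purely for readability), and also shows two related forms: a label slice on the rows and a scalar lookup.

```python
import numpy as np
import pandas as pd

# Same labels as the article's frame, filled with 0..24 instead of NaN.
df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Lists of labels on both axes: rows 'b' and 'c', columns 'y' and 8.
block = df.loc[['b', 'c'], ['y', 8]]

# A label slice on the rows includes BOTH endpoints, unlike a Python slice.
same_block = df.loc['b':'c', ['y', 8]]

# Two scalar labels return a single cell instead of a DataFrame.
cell = df.loc['c', 8]

print(block)
print(cell)
```

Note that `8` is used here as a column label, not as a position: `.loc` always interprets its arguments as labels.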

III. Select columns first, then rows

Selecting columns before rows is like first filtering the fields of a dataset and then choosing which records to keep.

    import numpy as np
    import pandas as pd

    # Combined indexing, e.g. indexing rows and columns at the same time.
    # Select columns first, then rows: filter the fields, then pick the records.
    # The values are random, so they differ on every run.
    df = pd.DataFrame(np.random.rand(16).reshape(4, 4) * 100,
                      index=['one', 'two', 'three', 'four'],
                      columns=['a', 'b', 'c', 'd'])
    print("df = \n", df)
    print('-' * 100)

    data1 = df['a'].loc[['one', 'three']]  # rows 'one' and 'three' of column 'a'
    print("data1 = df['a'].loc[['one', 'three']] = \n", data1)
    print('-' * 50)

    data2 = df[['b', 'c', 'd']]  # columns 'b', 'c', 'd', all rows
    print("data2 = df[['b', 'c', 'd']] = \n", data2)
    print('-' * 100)

    data = df['a'] < 50  # boolean mask: True where column 'a' is below 50
    print("data = df['a'] < 50 = \n", data)
    print('-' * 50)

    data3 = df[df['a'] < 50]  # all columns of the rows where the mask is True
    print("data3 = df[df['a'] < 50] = \n", data3)
    print('-' * 50)

    # Second of the rows that satisfy the condition (needs at least two matches)
    data4 = df[df['a'] < 50].iloc[1]
    print("data4 = df[df['a'] < 50].iloc[1] = \n", data4)

Output:

    df =
                   a          b          c          d
    one     9.835341  90.198909  41.946498  57.696927
    two    42.118455  92.361098  12.128027  58.962167
    three  57.007146  18.977019  92.999803  47.113144
    four   97.706270  99.227877   4.032991  27.748419
    ----------------------------------------------------------------------------------------------------
    data1 = df['a'].loc[['one', 'three']] =
    one       9.835341
    three    57.007146
    Name: a, dtype: float64
    --------------------------------------------------
    data2 = df[['b', 'c', 'd']] =
                   b          c          d
    one    90.198909  41.946498  57.696927
    two    92.361098  12.128027  58.962167
    three  18.977019  92.999803  47.113144
    four   99.227877   4.032991  27.748419
    ----------------------------------------------------------------------------------------------------
    data = df['a'] < 50 =
    one       True
    two       True
    three    False
    four     False
    Name: a, dtype: bool
    --------------------------------------------------
    data3 = df[df['a'] < 50] =
                 a          b          c          d
    one   9.835341  90.198909  41.946498  57.696927
    two  42.118455  92.361098  12.128027  58.962167
    --------------------------------------------------
    data4 = df[df['a'] < 50].iloc[1] =
    a    42.118455
    b    92.361098
    c    12.128027
    d    58.962167
    Name: two, dtype: float64
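
The column-then-row selections in this section can also be written as a single `.loc` call. The sketch below is one way to do that, not the article's own method; it uses fixed values (the article's printed numbers rounded to two decimals) instead of random data so that the run is repeatable.

```python
import pandas as pd

# Fixed values: the article's printed output rounded to two decimals.
df = pd.DataFrame(
    [[9.84, 90.20, 41.95, 57.70],
     [42.12, 92.36, 12.13, 58.96],
     [57.01, 18.98, 93.00, 47.11],
     [97.71, 99.23, 4.03, 27.75]],
    index=['one', 'two', 'three', 'four'],
    columns=['a', 'b', 'c', 'd'],
)

# Chained: pick column 'a', then rows 'one' and 'three'.
chained = df['a'].loc[['one', 'three']]

# Single step: rows and column in one .loc call.
single = df.loc[['one', 'three'], 'a']

# A boolean mask can be combined with a column list in the same call.
low_a = df.loc[df['a'] < 50, ['b', 'c']]

print(chained.equals(single))
print(low_a)
```

For reading values the two forms are interchangeable. For assignment, the single `.loc` call is the safer choice, because a chained selection may operate on a temporary copy.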

IV. Row slice & column slice: df.iloc[1:3, 2:6]

Mixing positional and label information: given a DataFrame, what if I want row 'c' and every row before it, restricted to the first 4 columns?

iloc[num_of_row_start : num_of_row_end, num_of_column_start : num_of_column_end]
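
As a minimal sketch of this positional form, the expression from the title, `df.iloc[1:3, 2:6]`, can be run on a frame with the article's labels. The integer fill values are an assumption chosen so that the selected cells are recognizable.

```python
import numpy as np
import pandas as pd

# Same labels as the article's frame, filled with 0..24 instead of NaN.
df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Rows at positions 1 and 2 ('b', 'c'); columns at positions 2, 3, 4.
# The column slice 2:6 runs past the last column (position 4); as with a
# Python list slice, an out-of-range slice bound is simply truncated.
sub = df.iloc[1:3, 2:6]

print(sub.index.tolist())
print(sub.columns.tolist())
print(sub.shape)
```

Both slices are half-open: the start position is included and the end position is excluded.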

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.nan,
                      index=list('abcde'),
                      columns=['x', 'y', 'z', 8, 9])
    print("df = \n", df)
    print("-" * 100)

    # Rows up to and including 'c' (label converted to a position), first 4 columns
    df_select = df.iloc[:df.index.get_loc('c') + 1, :4]
    print("df_select = \n", df_select)

Output:

    df =
        x   y   z   8   9
    a NaN NaN NaN NaN NaN
    b NaN NaN NaN NaN NaN
    c NaN NaN NaN NaN NaN
    d NaN NaN NaN NaN NaN
    e NaN NaN NaN NaN NaN
    ----------------------------------------------------------------------------------------------------
    df_select =
        x   y   z   8
    a NaN NaN NaN NaN
    b NaN NaN NaN NaN
    c NaN NaN NaN NaN

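get_loc (pandas 0.24.1) is a method of the index that returns the integer position of a label within that index. Note that an iloc slice does not include its end position, so the + 1 is needed for row 'c' to be included.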
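
The effect of the + 1 can be checked directly. The following sketch again assumes integer fill values in place of NaN, and compares the `get_loc` route with an equivalent route that uses a label slice for the rows.

```python
import numpy as np
import pandas as pd

# Same labels as the article's frame, filled with 0..24 instead of NaN.
df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

pos_c = df.index.get_loc('c')               # integer position of label 'c'

without_plus_one = df.iloc[:pos_c, :4]      # stops before row 'c'
with_plus_one = df.iloc[:pos_c + 1, :4]     # includes row 'c'

# Equivalent route: a label slice for the rows (end label included),
# then a positional slice for the columns.
via_loc = df.loc[:'c'].iloc[:, :4]

print(pos_c)
print(without_plus_one.index.tolist())
print(with_plus_one.index.tolist())
```

The label slice in `via_loc` needs no + 1 because `.loc` slices include the end label, while `.iloc` slices exclude the end position.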




