Code you don't write, you forget; notes you don't organize turn into a mess.
Reading files

```python
import os

import pandas as pd

# Directory that contains this script; data.pkl is expected to sit next to it
abs_path = os.path.abspath(__file__)
proj_path = os.path.dirname(abs_path)
data_path = os.path.join(proj_path, 'data.pkl')

# Read the pickle and keep only the columns we need
df = pd.read_pickle(data_path)[['id', 'name', 'location']]

# Write back to pickle, and export a CSV that Excel opens correctly (utf-8-sig adds a BOM)
df.to_pickle(data_path)
df.to_csv('data.csv', index=False, encoding='utf-8-sig')
```
See https://pandas.pydata.org/docs/user_guide/io.html for more.
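A self-contained round-trip sketch of the same steps. It assumes a throwaway temporary directory in place of the script's own folder, and the sample rows (including the extra column that gets dropped) are made up. It writes a pickle, reads back only the needed columns, and exports a CSV with `utf-8-sig`, which prefixes the file with a UTF-8 byte-order mark so that Excel detects the encoding of non-ASCII text.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; 'extra' is a column we do not want to keep
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["张三", "李四"],
    "location": ["Beijing", "Shanghai"],
    "extra": [0.1, 0.2],
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "data.pkl")
    csv_path = os.path.join(tmp, "data.csv")

    df.to_pickle(pkl_path)
    # Read the pickle back, keeping only the columns we need
    subset = pd.read_pickle(pkl_path)[["id", "name", "location"]]
    # utf-8-sig writes a BOM (EF BB BF) at the start of the file
    subset.to_csv(csv_path, index=False, encoding="utf-8-sig")

    with open(csv_path, "rb") as f:
        starts_with_bom = f.read(3) == b"\xef\xbb\xbf"
    # Reading with the same encoding strips the BOM again
    round_trip = pd.read_csv(csv_path, encoding="utf-8-sig")

print(list(subset.columns))  # ['id', 'name', 'location']
print(starts_with_bom)       # True
```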
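Displaying the full df

```python
import pandas as pd

# Lift pandas' display limits so print() shows every row, every column and full cell contents
pd.set_option('display.width', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
print(df)
```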
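Setting the options globally affects every later print. A scoped alternative, sketched with a made-up 100-row frame: `pd.option_context` lifts the same limits only inside a `with` block and restores the previous values when the block exits.

```python
import pandas as pd

# Hypothetical 100-row frame; the default display.max_rows (60) truncates it
df = pd.DataFrame({f"col{i}": range(100) for i in range(3)})

before = pd.get_option("display.max_rows")

# The limits are lifted only inside this block
with pd.option_context(
    "display.max_rows", None,
    "display.max_columns", None,
    "display.width", None,
    "display.max_colwidth", None,
):
    full_text = repr(df)
    print(df)

# Outside the block the default, truncated view is back
truncated_text = repr(df)
after = pd.get_option("display.max_rows")
print("..." in full_text, "..." in truncated_text, before == after)
```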
groupby
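groupby works on rows: it splits the rows of a DataFrame into groups that share the same key values and then computes a result for each group.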
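General form: `df[col].groupby([df[key1], df[key2]]).mean()`. Here `df[col]` selects the column whose values are computed and returned, the list passed to `groupby` holds the grouping columns (one or more), and `.mean()` is the aggregation applied to each group; any other aggregation method can take its place.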
Example:

```python
df['score'].groupby([df['id'], df['name']]).mean()
```
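A runnable version with made-up `id`, `name` and `score` values. It compares the Series-key style above with the more common style of passing column names to `df.groupby` and selecting the value column afterwards; both produce the same per-group means.

```python
import pandas as pd

# Hypothetical scores table
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["a", "a", "b", "b"],
    "score": [80, 90, 70, 75],
})

# Style from the note: select the value column, pass the keys as Series
by_series = df["score"].groupby([df["id"], df["name"]]).mean()

# Equivalent style: group by column names, then select the value column
by_name = df.groupby(["id", "name"])["score"].mean()

print(by_series)
print(by_series.equals(by_name))  # True
```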
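Single key:

```python
df.groupby("id")                                   # DataFrameGroupBy object; nothing is computed yet
df.groupby("id").describe().unstack()              # summary statistics per group, reshaped into a Series
df.groupby("id")["location"].describe().unstack()  # summary statistics of 'location' per group
```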
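To see what `df.groupby("id")` on its own gives back, here is a sketch with hypothetical data. The GroupBy object can be iterated as `(key, sub-DataFrame)` pairs, which is the "split" half of split-apply-combine, and `describe()` on a text column reports count, unique, top and freq per group.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "id": [1, 1, 2],
    "location": ["Beijing", "Shanghai", "Beijing"],
})

grouped = df.groupby("id")   # DataFrameGroupBy; no computation happens yet
print(grouped.ngroups)       # 2

# Iterating yields (key, sub-DataFrame) pairs
sizes = {key: len(group) for key, group in grouped}
print(sizes)

# describe() on a text column: count / unique / top / freq per group
summary = grouped["location"].describe()
print(summary)
```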
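Multiple keys:

```python
# numeric_only=True skips non-numeric columns, which make mean() raise a TypeError on pandas 2.x
df.groupby(["id", "name"]).mean(numeric_only=True)
```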
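With several keys the result is indexed by a MultiIndex built from the key columns. A sketch on hypothetical data showing that, and how `as_index=False` keeps the keys as ordinary columns instead, which is often easier to export or merge later.

```python
import pandas as pd

# Hypothetical data with one non-numeric column ('city')
df = pd.DataFrame({
    "id": [1, 1, 2],
    "name": ["a", "a", "b"],
    "city": ["Beijing", "Shanghai", "Beijing"],
    "score": [80, 90, 70],
})

# Keys become a MultiIndex; 'city' is skipped by numeric_only=True
indexed = df.groupby(["id", "name"]).mean(numeric_only=True)
print(list(indexed.index.names))  # ['id', 'name']

# as_index=False keeps 'id' and 'name' as regular columns
flat = df.groupby(["id", "name"], as_index=False).mean(numeric_only=True)
print(flat)
```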
agg
agg performs column-wise aggregation: each selected column is reduced to a single value per group.
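```python
# Group by the year of the '生日' (birthday) datetime column, then count non-null values per column
A.groupby(A["生日"].apply(lambda x: x.year)).count()
```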
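The grouping key does not have to be an existing column; any Series aligned with the rows works. A sketch with a made-up table `A` whose `生日` (birthday) column holds datetimes: the per-row `apply` from the note and the vectorised `.dt.year` accessor give the same grouping, with `.dt.year` usually being the faster choice.

```python
import pandas as pd

# Hypothetical table with a '生日' (birthday) column of datetimes
A = pd.DataFrame({
    "name": ["x", "y", "z"],
    "生日": pd.to_datetime(["1990-05-01", "1990-11-20", "1992-03-15"]),
})

# Key computed row by row with apply, as in the note
by_apply = A.groupby(A["生日"].apply(lambda x: x.year)).count()

# Same key computed with the vectorised .dt accessor
by_dt = A.groupby(A["生日"].dt.year).count()

print(by_dt["name"].to_dict())
```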
More examples:
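```python
# Number of non-null 'city' values per (id, name), groups kept in order of first appearance
info_df = df.groupby(["id", "name"], sort=False).count()[["city"]]

# Number of distinct 'city' values per (id, name); keys stay as columns, result renamed to city_cnt
info_df = (
    df.groupby(["id", "name"], as_index=False, sort=True)
      .agg({"city": lambda x: len(set(x))})
      .reset_index(drop=True)
      .rename(columns={"city": "city_cnt"})
)
```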
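The same counts can be written with named aggregation, where each keyword is an output column set to `(input column, aggregation)`. A sketch on hypothetical data; the output names `city_rows` and `city_set` are made up for illustration. It also shows a subtle difference: the `len(set(x))` lambda counts a missing value as one more distinct city, while the built-in `"nunique"` ignores missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing city
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "name": ["a", "a", "a", "b"],
    "city": ["Beijing", "Beijing", np.nan, "Shanghai"],
})

info_df = df.groupby(["id", "name"], as_index=False).agg(
    city_rows=("city", "count"),               # non-null rows per group
    city_cnt=("city", "nunique"),              # distinct cities, NaN ignored
    city_set=("city", lambda x: len(set(x))),  # the note's lambda, NaN counted
)
print(info_df)
```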
merge
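```python
# how: 'inner' (default), 'left', 'right', 'outer' or 'cross'; on: key column(s) present in both frames
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
```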
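A sketch of what `how`, `indicator` and `validate` do, using two small made-up frames that share only some keys. `indicator=True` adds a categorical `_merge` column (`left_only`, `right_only` or `both`), and `validate` raises `pandas.errors.MergeError` if the keys do not have the stated relationship.

```python
import pandas as pd

# Hypothetical frames: 'a' only on the left, 'd' only on the right
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# Number of result rows for each join type
counts = {
    how: len(pd.merge(left, right, on="key", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(counts)

# indicator=True records where each row came from
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
print(outer[["key", "_merge"]])

# Passes here because 'key' is unique on both sides
checked = pd.merge(left, right, on="key", validate="one_to_one")
```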
Example:
```python
import pandas as pd

df1 = pd.DataFrame({'key': list('bbaca'), 'data1': range(5)})
print(df1)
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})
print(df2)
print(pd.merge(df1, df2))
print(pd.merge(df2, df1))
```
Output:

```
  key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
  key  data2
0   a      0
1   b      1
2   d      2
  key  data1  data2
0   b      0      1
1   b      1      1
2   a      2      0
3   a      4      0
  key  data2  data1
0   a      0      2
1   a      0      4
2   b      1      0
3   b      1      1
```
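In that example `pd.merge` joins on the only shared column, `key`, with an inner join, which is why `c` (only in `df1`) and `d` (only in `df2`) disappear. When the key columns have different names, `left_on`/`right_on` name them explicitly, and `suffixes` renames any other columns the two frames share. A sketch with a hypothetical right-hand key column `rkey` and a value column `data` present on both sides:

```python
import pandas as pd

# Same keys as above, but the right frame names its key column 'rkey'
df1 = pd.DataFrame({"key": list("bbaca"), "data": range(5)})
df2 = pd.DataFrame({"rkey": ["a", "b", "d"], "data": range(3)})

# 'data' exists in both frames, so suffixes tell the two copies apart
merged = pd.merge(df1, df2, left_on="key", right_on="rkey",
                  suffixes=("_left", "_right"))
print(merged)
```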
See https://pandas.pydata.org/docs/user_guide/merging.html for more.
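References
pandas official documentation
A detailed explanation of the groupby function in Python (very easy to understand)
[Python3] A detailed guide to pandas.merge usage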