Detecting duplicate rows: duplicated() marks the first occurrence of each row as False and every later occurrence as True, and returns a boolean Series.
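As a minimal sketch of that rule, the snippet below runs duplicated() on a small made-up DataFrame. The frame `df` and its values are assumptions for illustration only and are not taken from the stock_data file.

```python
import pandas as pd

# Hypothetical toy data, not the stock_data file from the text.
df = pd.DataFrame({
    "open":  [10.0, 10.0, 11.0, 10.0],
    "close": [10.5, 10.5, 11.2, 10.5],
})

# Default keep="first": the first occurrence is False, later repeats are True.
mask = df.duplicated()
print(mask.tolist())  # [False, True, False, True]

# The boolean Series can be used to count or filter the repeated rows.
n_duplicates = int(mask.sum())
unique_rows = df[~mask]
```

Filtering with `df[~mask]` gives the same rows as `df.drop_duplicates()` with its default settings.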
import pandas as pd
stock_data = pd.read_excel('/data/stock_data.xlsx')
stock_data.duplicated()
stock_data.shape
stock_data
stock_data.drop_duplicates()
stock_data.drop_duplicates().shape
# To remove duplicates based on specific columns, use subset
stock_data.drop_duplicates(subset="open")
stock_data.drop_duplicates(subset="open").shape
# To drop duplicate rows and keep the last occurrence, use keep="last"
stock_data.drop_duplicates(subset=["open","close"],keep="last")
stock_data.drop_duplicates(subset=["open","close"],keep="last").shape
Output: (95, 20)
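To see how subset and keep change which rows survive, here is a small sketch on made-up data. The column names mirror the ones used above, but the frame `df` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the text, values are made up.
df = pd.DataFrame({
    "open":   [10.0, 10.0, 11.0, 10.0],
    "close":  [10.5, 10.8, 11.2, 10.5],
    "volume": [100, 200, 300, 400],
})

# subset="open": rows 1 and 3 repeat the open value of row 0 and are dropped.
by_open = df.drop_duplicates(subset="open")

# subset=["open", "close"], keep="last": rows 0 and 3 match on both columns,
# so row 0 is dropped and row 3 (the last occurrence) is kept.
last_kept = df.drop_duplicates(subset=["open", "close"], keep="last")

print(by_open.index.tolist())    # [0, 2]
print(last_kept.index.tolist())  # [1, 2, 3]
```

drop_duplicates returns a new DataFrame and leaves `df` unchanged, which is why the calls above can be repeated with different arguments on the same frame.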