I've been playing around with datetimes and timestamps, and I've come across something that I can't understand.
    import pandas as pd
    import datetime

    year_month = pd.DataFrame({'year': [2001, 2002, 2003], 'month': [1, 2, 3]})
    # Build a date for the first day of each (year, month) pair
    year_month['date'] = [datetime.datetime.strptime(str(y) + str(m) + '1', '%Y%m%d')
                          for y, m in zip(year_month['year'], year_month['month'])]

    >>> year_month
       month  year       date
    0      1  2001 2001-01-01
    1      2  2002 2002-02-01
    2      3  2003 2003-03-01

I think the unique function is doing something to the timestamps that changes them somehow:
    first_date = year_month['date'].unique()[0]

    >>> first_date == year_month['date'][0]
    False

In fact:

    >>> year_month['date'].unique()
    array(['2000-12-31T16:00:00.000000000-0800',
           '2002-01-31T16:00:00.000000000-0800',
           '2003-02-28T16:00:00.000000000-0800'], dtype='datetime64[ns]')

My suspicion is that there is some sort of timezone difference underneath the functions, but I can't figure it out.
Edit

I just checked the Python commands list(set()) as an alternative to the unique function, and that works. This must be a quirk of the unique() function.
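For reference, here is a minimal sketch of that list(set()) workaround next to `drop_duplicates()`, which is another way to de-duplicate without leaving pandas. The `dates` Series and the variable names are made up for illustration and are not from the code above.

```python
import datetime
import pandas as pd

# Hypothetical sample data with one repeated date
dates = pd.Series([datetime.datetime(2001, 1, 1),
                   datetime.datetime(2001, 1, 1),
                   datetime.datetime(2002, 2, 1)])

# Iterating over a datetime Series yields pandas Timestamp objects,
# so the set (and the sorted list built from it) holds Timestamps.
via_set = sorted(set(dates))

# drop_duplicates() returns a Series; tolist() also gives Timestamps.
via_drop = dates.drop_duplicates().tolist()

# Both results compare equal to the original Series values.
print(via_set == via_drop)
print(via_set[0] == dates[0])
```

Either way the elements stay pandas Timestamps, so comparing them against values from the original column behaves as expected.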
You have to convert to datetime64 to compare:

    In [12]: first_date == year_month['date'][0].to_datetime64()
    Out[12]: True

This is because unique has converted the dtype to datetime64:

    In [6]: first_date = year_month['date'].unique()[0]
            first_date
    Out[6]: numpy.datetime64('2001-01-01T00:00:00.000000000+0000')

I think this is because unique returns a NumPy array, and there is currently no NumPy dtype that understands Timestamp. See: Converting between datetime, Timestamp and datetime64
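As a rough sketch of the same idea, you can normalise both sides to one type before comparing. The `dates` Series here is a stand-in for `year_month['date']`, and the variable names are only illustrative. What `unique()` hands back may differ between pandas versions, so the sketch wraps the value in `pd.Timestamp` rather than assuming its type.

```python
import pandas as pd

# Stand-in for year_month['date']
dates = pd.Series(pd.to_datetime(['2001-01-01', '2002-02-01', '2003-03-01']))

first_unique = dates.unique()[0]  # numpy.datetime64 or Timestamp, depending on the pandas version
first_value = dates[0]            # pandas Timestamp

# Option 1: lift the unique() value to a Timestamp and compare Timestamps
same_as_timestamp = pd.Timestamp(first_unique) == first_value

# Option 2: lower both sides to numpy.datetime64 and compare those
same_as_datetime64 = (pd.Timestamp(first_unique).to_datetime64()
                      == first_value.to_datetime64())

print(same_as_timestamp, same_as_datetime64)
```

Comparing like with like avoids depending on how a particular pandas or NumPy version handles a mixed Timestamp and datetime64 comparison.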