Datetime and Timestamp equality in Python and Pandas


I've been playing around with datetimes and timestamps, and I've come across something that I can't understand.

    import pandas as pd
    import datetime

    year_month = pd.DataFrame({'year': [2001, 2002, 2003], 'month': [1, 2, 3]})
    year_month['date'] = [datetime.datetime.strptime(str(y) + str(m) + '1', '%Y%m%d')
                          for y, m in zip(year_month['year'], year_month['month'])]

    >>> year_month
       month  year       date
    0      1  2001 2001-01-01
    1      2  2002 2002-02-01
    2      3  2003 2003-03-01

I think the unique function is doing something to the timestamps that changes them somehow:

    first_date = year_month['date'].unique()[0]

    >>> first_date == year_month['date'][0]
    False

In fact:

    >>> year_month['date'].unique()
    array(['2000-12-31T16:00:00.000000000-0800',
           '2002-01-31T16:00:00.000000000-0800',
           '2003-02-28T16:00:00.000000000-0800'], dtype='datetime64[ns]')

My suspicion is that there is some sort of timezone difference underneath the functions, but I can't figure it out.

EDIT

I just checked the Python commands list(set()) as an alternative to the unique function, and that works. This must be a quirk of the unique() function.
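For reference, here is a minimal sketch of the list(set()) approach mentioned above. The `dates` Series is a hypothetical stand-in for year_month['date'] with one duplicate added. It illustrates that iterating the Series yields Timestamp objects, so comparing against an element of the Series works as expected.

```python
import pandas as pd

# Hypothetical stand-in for year_month['date'], with one duplicate added.
dates = pd.Series(
    pd.to_datetime(["2001-01-01", "2002-02-01", "2002-02-01", "2003-03-01"])
)

# Iterating a datetime Series yields pandas.Timestamp objects, so the set
# holds Timestamps. Sets are unordered, hence the explicit sort.
unique_dates = sorted(set(dates))

print(type(unique_dates[0]).__name__)
print(unique_dates[0] == dates[0])
```

Because a set has no defined order, the result is sorted here before indexing; list(set(...))[0] on its own is not guaranteed to be the earliest date.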

You have to convert to datetime64 to compare:

    In [12]: first_date == year_month['date'][0].to_datetime64()
    Out[12]: True

This is because unique has converted the dtype to datetime64:

    In [6]:
    first_date = year_month['date'].unique()[0]
    first_date

    Out[6]:
    numpy.datetime64('2001-01-01T00:00:00.000000000+0000')
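As a side note on the timezone suspicion in the question: the `-0800` strings in the array output and the `+0000` value shown here appear to describe the same instant, just rendered with different UTC offsets. This can be checked with the standard library alone. The sketch assumes a fixed -08:00 offset and says nothing about how unique() works internally.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offset matching the "-0800" suffix in the question's output.
minus_eight = timezone(timedelta(hours=-8))

# First entry of the array as displayed in the question.
shown_in_array = datetime(2000, 12, 31, 16, 0, tzinfo=minus_eight)

# The same instant expressed in UTC.
as_utc = shown_in_array.astimezone(timezone.utc)

print(as_utc.isoformat())
```

If that reading is right, the stored values were not shifted; only their printed representation differs.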

I think this is because unique returns a np array and there is currently no dtype that numpy understands for Timestamp: Converting between datetime, Timestamp and datetime64
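To make the type difference concrete, here is a small sketch that compares a Timestamp with a numpy.datetime64 after normalising both to the same type. The `dates` Series is a hypothetical stand-in for year_month['date']. The return type of unique() may differ between pandas versions, so the code wraps its result in pd.Timestamp rather than assuming either type.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for year_month['date'].
dates = pd.Series(pd.to_datetime(["2001-01-01", "2002-02-01", "2003-03-01"]))

element = dates[0]          # a pandas.Timestamp
raw = dates.to_numpy()[0]   # a numpy.datetime64

# Option 1: lift the NumPy value to a Timestamp before comparing.
same_as_timestamp = pd.Timestamp(raw) == element

# Option 2: lower the Timestamp to datetime64, as in the answer above.
same_as_datetime64 = raw == element.to_datetime64()

# pd.Timestamp accepts either type, so this works whatever unique() returns.
first_unique = pd.Timestamp(dates.unique()[0])

print(type(element).__name__, type(raw).__name__)
print(same_as_timestamp, bool(same_as_datetime64), first_unique == element)
```

Either direction works; converting to pd.Timestamp is convenient when the rest of the code deals with pandas objects.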

