In Python pandas DataFrames, what are the rules for automatic type conversion when setting values?
If I have a dataframe that looks like

    import pandas
    d = pandas.DataFrame(data={'col1': [100, 101, 102, 103]})
    #    col1
    # 0   100
    # 1   101
    # 2   102
    # 3   103

and do

    d.set_value(0, 'col1', '200')

it casts '200' to an integer:

    type(d.col1[0])
    # numpy.int64

However, if I do

    d.set_value(0, 'col2', '200')

I get

    type(d.col2[0])
    # str

as expected.
More mysteries:

Further, consider the following:

    [type(x) for x in d.col1]
    # [numpy.int64, numpy.int64, numpy.int64, numpy.int64]
    d.set_value([0, 1, 2, 3], 'col1', ['101', '102', '103', 200])
    [type(x) for x in d.col1]
    # [str, str, str, str]

So even though d.col1 was an integer column, it has become a string column. What are the rules for such type casting of entire columns?
I am curious about the rules for automatic type-casting when manipulating pandas dataframes.
pandas is column-major, and every element in the same column must have the same data type (dtype).
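To make the one-dtype-per-column rule concrete, here is a minimal sketch. The Series names are illustrative and not from the original post. When the values handed to a column do not share a type, pandas picks a single dtype that can hold all of them.

```python
import pandas as pd

# Integers only: the column gets an integer dtype.
ints = pd.Series([100, 101, 102, 103])

# Integers mixed with a float: the whole column is upcast to a float dtype.
mixed_numeric = pd.Series([100, 101.5])

# Integers mixed with a string: there is no common numeric dtype,
# so the column falls back to the generic object dtype.
mixed_object = pd.Series([100, '101'])

print(ints.dtype, mixed_numeric.dtype, mixed_object.dtype)
```

In the object case the column still has exactly one dtype, but each cell keeps its original Python type.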
When you create a dataframe using

    import pandas as pd
    df = pd.DataFrame({'col': [100, 101, 102, 103]})
    df.col.dtype
    # Out[11]: dtype('int64')

pandas automatically infers that these input values are numeric and of integer type. When you set values in column col, the inputs are automatically cast to the current column dtype, int64, so the following all give the same output:

    df.set_value(0, 'col', '200')   # cast string to int
    df.set_value(0, 'col', 200)     # int input
    df.set_value(0, 'col', 200.1)   # cast float64 to int64

But when you try df.set_value(0, 'col1', '200'), the current df has no column col1, so pandas first creates a new column named col1 and then tries to infer the dtype of the new column based on the input.

    df.set_value(0, 'col1', '200')
    df.col1.dtype   # dtype('O'), which means object/string
    df.set_value(0, 'col2', 200.1)
    df.col2.dtype   # dtype('float64')
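Note that DataFrame.set_value was deprecated in pandas 0.21 and removed in 1.0, so the calls above only run on old versions. Below is a rough sketch of the new-column case using .loc instead. This is a substitution, not the code from the post. How newer pandas handles writing a string into an existing integer column differs between versions, so that case is left out, and an explicit astype conversion is shown in its place.

```python
import pandas as pd

df = pd.DataFrame({'col': [100, 101, 102, 103]})

# Setting a value in a column that does not exist yet creates the column.
# Its dtype is inferred from the new value; the other rows are filled with NaN.
df.loc[0, 'col1'] = '200'   # string input
df.loc[0, 'col2'] = 200.1   # float input

print(df.dtypes)

# An explicit conversion avoids depending on automatic casting.
as_int = pd.Series(['200', '101']).astype('int64')
print(as_int.dtype)
```

Converting explicitly with astype keeps the resulting dtype predictable regardless of which pandas version is installed.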