In Python pandas DataFrames, what are the rules for automatic type conversion when setting values?


If I have a DataFrame that looks like this:

    import pandas
    d = pandas.DataFrame(data={'col1': [100, 101, 102, 103]})
    #    col1
    # 0   100
    # 1   101
    # 2   102
    # 3   103

and I do

    d.set_value(0, 'col1', '200')

it casts '200' to an integer:

    type(d.col1[0])  # numpy.int64

However, if I do

    d.set_value(0, 'col2', '200')

I get

    type(d.col2[0])  # str

as expected.

More mysteries:

Further, consider the following:

    [type(x) for x in d.col1]
    # [numpy.int64, numpy.int64, numpy.int64, numpy.int64]
    d.set_value([0, 1, 2, 3], 'col1', ['101', '102', '103', 200])
    [type(x) for x in d.col1]
    # [str, str, str, str]

So even though d.col1 was an integer column, it has become a string column. What are the rules for such type casting of entire columns?

I am curious about the rules for automatic type-casting when manipulating pandas DataFrames.

pandas is column-major, and every column has a single dtype that is shared by all of its elements.

When you create a DataFrame using

    import pandas as pd
    df = pd.DataFrame({'col': [100, 101, 102, 103]})
    df.col.dtype
    # Out[11]: dtype('int64')
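To make the per-column dtype idea concrete, here is a small sketch. The column names `ints`, `floats` and `mixed` are made up for illustration and are not from the question. It shows that a column of mixed Python values falls back to the generic object dtype, in which each element keeps its own Python type.

```python
import pandas as pd

# Each column gets exactly one dtype, inferred from the values passed in.
df = pd.DataFrame({
    "ints": [100, 101, 102, 103],
    "floats": [1.0, 2.5, 3.0, 4.0],
    "mixed": [100, "101", 102.5, None],  # no common numeric type
})

ints_kind = df["ints"].dtype.kind      # 'i' means signed integer
floats_kind = df["floats"].dtype.kind  # 'f' means floating point
mixed_kind = df["mixed"].dtype.kind    # 'O' means generic Python objects

# Elements of an object column keep their individual Python types.
mixed_types = [type(x).__name__ for x in df["mixed"]]

print(df.dtypes)
print(mixed_types)
```

This is probably why `type(x)` can report `str` for every element of a column: the column's dtype is object, and the values stored in it are ordinary Python strings.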

pandas automatically infers that these input values are numeric and of integer type. When you set values in column col, the inputs are automatically cast to the current column dtype, int64, so the following all give the same output:

    df.set_value(0, 'col', '200')  # the string is cast to int
    df.set_value(0, 'col', 200)    # int input
    df.set_value(0, 'col', 200.1)  # float64 is cast to int64
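Note that `set_value` was deprecated and later removed from pandas (it is gone as of 1.0), and how current versions treat a value that does not match the column dtype depends on the release. If you want the "cast to the column's dtype" behavior described above regardless of version, one option is to do the cast yourself before assigning with `.at`. The helper name `set_cast` below is hypothetical, not a pandas function, and this is only a minimal sketch for numeric columns.

```python
import pandas as pd


def set_cast(frame, row, col, value):
    """Hypothetical helper: cast value to the column's dtype, then assign."""
    # dtype.type is the NumPy scalar type of the column, e.g. numpy.int64.
    cast_value = frame[col].dtype.type(value)
    frame.at[row, col] = cast_value


df = pd.DataFrame({"col": [100, 101, 102, 103]})

set_cast(df, 0, "col", "200")  # the string "200" is parsed as an integer
set_cast(df, 1, "col", 201.9)  # the float is truncated toward zero

print(df["col"].tolist())
print(df["col"].dtype)
```

Because the value already has the column's type when it is assigned, the column stays an integer column. A string such as "200.1" would raise a ValueError in the cast instead of being stored silently.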

But when you try df.set_value(0, 'col1', '200'), the current df has no column col1, so pandas first creates a new column named col1 and then tries to infer the dtype of the new column based on the input.

    df.set_value(0, 'col1', '200')
    df.col1.dtype  # dtype('O'), which means object/string
    df.set_value(0, 'col2', 200.1)
    df.col2.dtype  # dtype('float64')
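The same "new column" case can be sketched with `.loc`, which is what you would use in pandas versions that no longer have `set_value`. This is an illustration under that assumption, not a statement about how `set_value` worked internally. Assigning to a column label that does not exist yet creates the column, and the rows you did not assign are filled with missing values.

```python
import pandas as pd

df = pd.DataFrame({"col": [100, 101, 102, 103]})

# Neither 'col1' nor 'col2' exists yet, so each assignment creates a column
# whose dtype is chosen from the value that was assigned.
df.loc[0, "col1"] = "200"
df.loc[0, "col2"] = 200.1

col1_value_type = type(df.at[0, "col1"]).__name__  # the stored value stays a str
col2_dtype = str(df["col2"].dtype)                 # floating point column
col2_missing = int(df["col2"].isna().sum())        # unassigned rows are NaN

print(df)
print(df.dtypes)
```

Here the string is stored as a string rather than converted, because there is no existing integer column for it to be cast to.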
