python - Why does order of comparison matter for this apply/lambda inequality?


Sorry, it's not a great title. Here is a simple example, though:

(pandas version 0.16.1)

df = pd.DataFrame({'x': range(1, 5), 'y': [1, 1, 1, 9]})

This works fine:

df.apply(lambda x: x > x.mean())
       x      y
0  False  False
1  False  False
2   True  False
3   True   True

Shouldn't this work the same way?

df.apply(lambda x: x.mean() < x)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-467-6f32d50055ea> in <module>()
----> 1 df.apply( lambda x: x.mean() < x )

c:\users\ei\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   3707                     if reduce is None:
   3708                         reduce = True
-> 3709                     return self._apply_standard(f, axis, reduce=reduce)
   3710             else:
   3711                 return self._apply_broadcast(f, axis)

c:\users\ei\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   3797             try:
   3798                 for i, v in enumerate(series_gen):
-> 3799                     results[i] = func(v)
   3800                     keys.append(v.name)
   3801             except Exception as e:

<ipython-input-467-6f32d50055ea> in <lambda>(x)
----> 1 df.apply( lambda x: x.mean() < x )

c:\users\ei\appdata\local\continuum\anaconda\lib\site-packages\pandas\core\ops.pyc in wrapper(self, other, axis)
    586             return NotImplemented
    587         elif isinstance(other, (np.ndarray, pd.Index)):
--> 588             if len(self) != len(other):
    589                 raise ValueError('Lengths must match to compare')
    590             return self._constructor(na_op(self.values, np.asarray(other)),

TypeError: ('len() of unsized object', u'occurred @ index x')

As a counter-example, both of these work:

df.mean() < df
df > df.mean()
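One difference worth noting in the counter-example is that df.mean() returns a Series (one mean per column) rather than a single NumPy scalar, so no NumPy scalar ends up on the left side of the comparison. The sketch below only illustrates that difference on a current pandas install; it is not taken from the original post, and the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': range(1, 5), 'y': [1, 1, 1, 9]})

# DataFrame.mean() gives one value per column, packed in a Series
col_means = df.mean()
print(type(col_means))

# Both orderings compare the frame against that Series, column by column
left_first = col_means < df
right_first = df > col_means
print(left_first.equals(right_first))
```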

Edit

I finally found the bug: issue 9369.

As indicated in the issue -

left = 0 > s works (e.g. a Python scalar). I think it is being treated as a 0-dim array (it's an np.int64) (and not as a scalar when called). I'll mark it as a bug. Feel free to dig in.

The issue occurs when a NumPy datatype (such as np.int64 or np.float64) is on the left side of a comparison operator. A simple fix, as @santon noted in their answer, may be to convert the number to a Python scalar rather than using a NumPy scalar.
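A minimal sketch of that workaround, assuming the df from the question: Series.mean() returns a NumPy scalar, which becomes a 0-dim array when passed through np.asarray and therefore has no len(); wrapping it in float() gives a plain Python scalar. The names m and fixed are only illustrative, and on current pandas versions both orderings work without the conversion.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(1, 5), 'y': [1, 1, 1, 9]})

# The mean of a column is a NumPy scalar, not a plain Python float
m = df['x'].mean()
print(type(m))

# As a 0-dim array it has no length, which matches the error in the traceback
try:
    len(np.asarray(m))
except TypeError as exc:
    print(exc)

# Workaround: convert to a Python scalar before putting it on the left side
fixed = df.apply(lambda col: float(col.mean()) < col)
print(fixed)
```

The result should match df.apply(lambda x: x > x.mean()) from the question.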


Old answer:

I tried this in pandas 0.16.2.

I did the following on the original df -

In [22]: df['z'] = df['x'].mean() < df['x']

In [23]: df
Out[23]:
   x  y      z
0  1  1  False
1  2  1  False
2  3  1   True
3  4  9   True

In [27]: df['z'].mean() < df['z']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-afc8a7b869b4> in <module>()
----> 1 df['z'].mean() < df['z']

c:\anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
    586             return NotImplemented
    587         elif isinstance(other, (np.ndarray, pd.Index)):
--> 588             if len(self) != len(other):
    589                 raise ValueError('Lengths must match to compare')
    590             return self._constructor(na_op(self.values, np.asarray(other)),

TypeError: len() of unsized object

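This seems like a bug to me. I can compare a boolean Series with the mean of an int Series, and vice versa, just fine; the issue only comes up when comparing a boolean mean with a boolean Series (though I do not think it makes sense to take the mean() of a boolean) -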

In [24]: df['z'] < df['x']
Out[24]:
0    True
1    True
2    True
3    True
dtype: bool

In [25]: df['z'] < df['x'].mean()
Out[25]:
0    True
1    True
2    True
3    True
Name: z, dtype: bool

In [26]: df['x'].mean() < df['z']
Out[26]:
0    False
1    False
2    False
3    False
Name: z, dtype: bool

I tried and reproduced the issue in pandas 0.16.1; it can be reproduced using -

In [10]: df['x'].mean() < df['x']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-4e5dab1545af> in <module>()
----> 1 df['x'].mean() < df['x']

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/ops.pyc in wrapper(self, other, axis)
    586             return NotImplemented
    587         elif isinstance(other, (np.ndarray, pd.Index)):
--> 588             if len(self) != len(other):
    589                 raise ValueError('Lengths must match to compare')
    590             return self._constructor(na_op(self.values, np.asarray(other)),

TypeError: len() of unsized object

In [11]: df['x'] < df['x'].mean()
Out[11]:
0     True
1     True
2    False
3    False
Name: x, dtype: bool

It seems the bug has been fixed in pandas version 0.16.2 (except when mixing booleans with integers). I would suggest upgrading your pandas version using -

pip install pandas --upgrade 

