python - pandas shifts column names and fills last column with NaN


I have a tab-delimited CSV file.

Example:

    rec#  cyc#  step  test (sec)   step (sec)   amp-hr      watt-hr     amps        volts       state  es  dpt time
    1     0     1     0.00000000   0.00000000   0.00000000  0.00000000  0.00000000  3.41214609  r      0   09:44:13
    2     0     1     30.00000000  30.00000000  0.00000000  0.00000000  0.00000000  3.41077280  r      1   09:44:43
    3     0     1     60.00000000  60.00000000  0.00000000  0.00000000  0.00000000  3.41077280  r      1   09:45:13

I read the CSV in using:

    import pandas as pd

    df = pd.read_csv('foo.csv', sep='\t')

This gives the following output:

       rec#  cyc#   step  test (sec)  step (sec)    amp-hr   watt-hr      amps  volts  state        es  dpt time
    1     0     1   0.00        0.00    0.000000  0.000000  0.000000  3.412146      r      0  09:44:13       NaN
    2     0     1  30.00       30.00    0.000000  0.000000  0.000000  3.410773      r      1  09:44:43       NaN
    3     0     1  60.00       60.00    0.000000  0.000000  0.000000  3.410773      r      1  09:45:13       NaN

This seems to have shifted the column names over by one, which causes the last column to be filled with NaNs instead of the time values.

If I do the following:

    import pandas as pd

    df = pd.read_csv("foo.csv", sep="\t")
    df = pd.read_csv("foo.csv", sep="\t", usecols=df[:len(df.columns)])

I get the following output:

       rec#  cyc#  step  test (sec)  step (sec)    amp-hr   watt-hr      amps     volts  state  es  dpt time
    1     1     0     1        0.00        0.00  0.000000  0.000000  0.000000  3.412146      r   0  09:44:13
    2     2     0     1       30.00       30.00  0.000000  0.000000  0.000000  3.410773      r   1  09:44:43
    3     3     0     1       60.00       60.00  0.000000  0.000000  0.000000  3.410773      r   1  09:45:13

Also, if I try to grab two specific columns, it seems to grab them correctly. For example, df = pd.read_csv("foo.csv", sep="\t", usecols=[3, 8]) correctly grabs the test (sec) column and the volts column.

I was hoping there is a way to correctly frame the data that wouldn't require me to read the file twice.

Thanks in advance!

oniwa

It looks like there are trailing tabs:

    >>> with open("oniwa.dat") as fp:
    ...     for line in fp:
    ...         print(repr(line))
    ...
    'rec#\tcyc#\tstep\ttest (sec)\tstep (sec)\tamp-hr\twatt-hr\tamps\tvolts\tstate\tes\tdpt time\n'
    '1\t0\t1\t0.00000000\t0.00000000\t0.00000000\t0.00000000\t0.00000000\t3.41214609\tr\t0\t09:44:13\t\n'
    '2\t0\t1\t30.00000000\t30.00000000\t0.00000000\t0.00000000\t0.00000000\t3.41077280\tr\t1\t09:44:43\t\n'
    '3\t0\t1\t60.00000000\t60.00000000\t0.00000000\t0.00000000\t0.00000000\t3.41077280\tr\t1\t09:45:13\n'
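If you'd rather not eyeball the repr output, a rough way to confirm the mismatch is to count the fields on each line. This is only a sketch: `SAMPLE` is a shortened stand-in for the file contents (two data rows copied from above), and `field_counts` is a hypothetical helper name, not anything from pandas.

```python
import csv
import io

# Hypothetical stand-in for the file: the header has 12 fields, while each
# data row ends with an extra tab and therefore splits into 13 fields.
SAMPLE = (
    "rec#\tcyc#\tstep\ttest (sec)\tstep (sec)\tamp-hr\twatt-hr\tamps\tvolts\tstate\tes\tdpt time\n"
    "1\t0\t1\t0.00000000\t0.00000000\t0.00000000\t0.00000000\t0.00000000\t3.41214609\tr\t0\t09:44:13\t\n"
    "2\t0\t1\t30.00000000\t30.00000000\t0.00000000\t0.00000000\t0.00000000\t3.41077280\tr\t1\t09:44:43\t\n"
)


def field_counts(text, delimiter="\t"):
    """Return the number of delimiter-separated fields on each line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


counts = field_counts(SAMPLE)
header_count, data_counts = counts[0], counts[1:]
print(header_count, data_counts)  # 12 [13, 13]
```

For a real file you would pass the file's text instead of `SAMPLE`. Data rows with exactly one more field than the header are the pattern to look for.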

As a result, pandas concludes there's an index column. You can tell it otherwise using index_col. To be specific, instead of

    >>> pd.read_csv("oniwa.dat", sep="\t") # no
       rec#  cyc#  step  test (sec)  step (sec)  amp-hr  watt-hr      amps volts  \
    1     0     1     0           0           0       0        0  3.412146     r
    2     0     1    30          30           0       0        0  3.410773     r
    3     0     1    60          60           0       0        0  3.410773     r

       state        es  dpt time
    1      0  09:44:13       NaN
    2      1  09:44:43       NaN
    3      1  09:45:13       NaN

we can use

    >>> pd.read_csv("oniwa.dat", sep="\t", index_col=False) # hooray!
       rec#  cyc#  step  test (sec)  step (sec)  amp-hr  watt-hr  amps     volts  \
    0     1     0     1           0           0       0        0     0  3.412146
    1     2     0     1          30          30       0        0     0  3.410773
    2     3     0     1          60          60       0        0     0  3.410773

       state  es  dpt time
    0     r   0  09:44:13
    1     r   1  09:44:43
    2     r   1  09:45:13
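To try this without the original file, here is a small self-contained sketch that reproduces both behaviours from an in-memory string. The `SAMPLE` data is a cut-down, hypothetical version of the file (four columns instead of twelve), and the variable names `shifted` and `fixed` are just for illustration.

```python
import io

import pandas as pd

# Hypothetical cut-down sample: 4 header fields, but every data row ends
# with a trailing tab, so each row splits into 5 fields.
SAMPLE = (
    "rec#\tcyc#\tvolts\tdpt time\n"
    "1\t0\t3.41214609\t09:44:13\t\n"
    "2\t0\t3.41077280\t09:44:43\t\n"
)

# Default behaviour: the extra field makes pandas treat the first column
# as an index, so the values slide left and the last column is all NaN.
shifted = pd.read_csv(io.StringIO(SAMPLE), sep="\t")

# index_col=False tells pandas not to use the first column as the index;
# the empty trailing field is dropped and the columns line up again.
fixed = pd.read_csv(io.StringIO(SAMPLE), sep="\t", index_col=False)

print(shifted)
print(fixed)
```

This reads the data only once, which was the goal in the question. It relies on the extra field being an empty trailing one; a file with a real extra column would need a different fix.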
