I have a CSV file that is tab-delimited.
Example:

    rec#  cyc#  step  test (sec)   step (sec)   amp-hr      watt-hr     amps        volts       state  es  dpt time
    1     0     1     0.00000000   0.00000000   0.00000000  0.00000000  0.00000000  3.41214609  r      0   09:44:13
    2     0     1     30.00000000  30.00000000  0.00000000  0.00000000  0.00000000  3.41077280  r      1   09:44:43
    3     0     1     60.00000000  60.00000000  0.00000000  0.00000000  0.00000000  3.41077280  r      1   09:45:13

I read the CSV in using:

    import pandas as pd
    df = pd.read_csv('foo.csv', sep='\t')

This gives the output:

           rec#  cyc#  step   test (sec)  step (sec)  amp-hr    watt-hr   amps      volts  state  es        dpt time
    1  0     1     0.00   0.00        0.000000    0.000000  0.000000  3.412146  r      0      09:44:13  NaN
    2  0     1     30.00  30.00       0.000000    0.000000  0.000000  3.410773  r      1      09:44:43  NaN
    3  0     1     60.00  60.00       0.000000    0.000000  0.000000  3.410773  r      1      09:45:13  NaN

This seems to have shifted the column names over by one, which causes the last column to be filled with NaNs instead of dates.
If I do the following:

    import pandas as pd
    df = pd.read_csv("foo.csv", sep="\t")
    df = pd.read_csv("foo.csv", sep="\t", usecols=df[:len(df.columns)])

I get the following output:

           rec#  cyc#  step  test (sec)  step (sec)  amp-hr    watt-hr   amps      volts     state  es  dpt time
    1  1     0     1     0.00        0.00        0.000000  0.000000  0.000000  3.412146  r      0   09:44:13
    2  2     0     1     30.00       30.00       0.000000  0.000000  0.000000  3.410773  r      1   09:44:43
    3  3     0     1     60.00       60.00       0.000000  0.000000  0.000000  3.410773  r      1   09:45:13

Also, if I try to grab two specific columns, it seems to grab them correctly. As in,

    df = pd.read_csv("foo.csv", sep="\t", usecols=[3, 8])

will correctly grab the test (sec) column and the volts column.
I was hoping there is a way to correctly frame the data that wouldn't require me to read the file twice.
Thanks in advance!
oniwa
It looks like there are trailing tabs:

    >>> with open("oniwa.dat") as fp:
    ...     for line in fp:
    ...         print(repr(line))
    ...
    'rec#\tcyc#\tstep\ttest (sec)\tstep (sec)\tamp-hr\twatt-hr\tamps\tvolts\tstate\tes\tdpt time\n'
    '1\t0\t1\t0.00000000\t0.00000000\t0.00000000\t0.00000000\t0.00000000\t3.41214609\tr\t0\t09:44:13\t\n'
    '2\t0\t1\t30.00000000\t30.00000000\t0.00000000\t0.00000000\t0.00000000\t3.41077280\tr\t1\t09:44:43\t\n'
    '3\t0\t1\t60.00000000\t60.00000000\t0.00000000\t0.00000000\t0.00000000\t3.41077280\tr\t1\t09:45:13\n'

As a result, pandas concludes there's an index column. You can tell it otherwise using index_col. To be specific, instead of

    >>> pd.read_csv("oniwa.dat", sep="\t")  # no
       rec#  cyc#  step  test (sec)  step (sec)  amp-hr  watt-hr      amps volts  \
    1     0     1     0           0           0       0        0  3.412146     r
    2     0     1    30          30           0       0        0  3.410773     r
    3     0     1    60          60           0       0        0  3.410773     r

      state        es  dpt time
    1     0  09:44:13       NaN
    2     1  09:44:43       NaN
    3     1  09:45:13       NaN

we can use

    >>> pd.read_csv("oniwa.dat", sep="\t", index_col=False)  # hooray!
       rec#  cyc#  step  test (sec)  step (sec)  amp-hr  watt-hr  amps     volts  \
    0     1     0     1           0           0       0        0     0  3.412146
    1     2     0     1          30          30       0        0     0  3.410773
    2     3     0     1          60          60       0        0     0  3.410773

      state  es  dpt time
    0     r   0  09:44:13
    1     r   1  09:44:43
    2     r   1  09:45:13
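If you want to try this without the original file, here is a minimal sketch that rebuilds the situation in memory. The two-row `SAMPLE` string and the names `field_counts`, `shifted` and `fixed` are made up for illustration; only the header and the trailing-tab layout follow the data in the question. It first counts tab-separated fields per line (a mismatch between header and data rows is the symptom), then reads the same text with and without `index_col=False`.

```python
import io

import pandas as pd

# Hypothetical sample shaped like the file in the question:
# 12 header names, but each data row ends with an extra tab (13 fields).
SAMPLE = (
    "rec#\tcyc#\tstep\ttest (sec)\tstep (sec)\tamp-hr\twatt-hr\tamps\tvolts\tstate\tes\tdpt time\n"
    "1\t0\t1\t0.0\t0.0\t0.0\t0.0\t0.0\t3.41214609\tr\t0\t09:44:13\t\n"
    "2\t0\t1\t30.0\t30.0\t0.0\t0.0\t0.0\t3.41077280\tr\t1\t09:44:43\t\n"
)

# Count tab-separated fields on every line; the data rows have one more
# field than the header because of the trailing tab.
field_counts = [len(line.split("\t")) for line in SAMPLE.splitlines()]

# Default read: the extra field makes pandas treat the first column as the
# index, so every header name ends up one column to the left of its data.
shifted = pd.read_csv(io.StringIO(SAMPLE), sep="\t")

# index_col=False: the first column stays a regular data column and the
# empty trailing field is dropped.
fixed = pd.read_csv(io.StringIO(SAMPLE), sep="\t", index_col=False)

print(field_counts)
print(shifted["dpt time"].isna().all())
print(fixed["dpt time"].tolist())
```

Depending on your pandas version, the `index_col=False` call may print a `ParserWarning` about the header length not matching the data. As far as I can tell that is expected here, since the empty trailing field is being discarded on purpose.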