I am using Python (pandas) to manipulate high-frequency data. Basically, I need to fill the blank cells.
If a cell is blank, it should be filled with the previous existing observation in that column.
My original data, for example:
    time   bid  ask
    15:00  .    .
    15:00  .    .
    15:02  76   .
    15:02  .    77
    15:03  .    .
    15:03  78   .
    15:04  .    .
    15:05  .    80
    15:05  .    .
    15:05  .    .

It needs to be converted to:
    time   bid  ask
    15:00  .    .
    15:00  .    .
    15:02  76   .
    15:00  76   77
    15:00  76   77
    15:00  78   77
    15:00  78   77
    15:00  78   80
    15:05  78   80
    15:05  78   80

This is my code:
    # imports
    import csv
    import pandas as pd

    tan = pd.read_csv('sample.csv')

    # from here, fill the blank cells
    first_line = True
    mydata = []
    with open(tan, 'rb') as f:
        reader = csv.reader(f)
        # loop through each row...
        for row in reader:
            this_row = row
            # blank-cell checking...
            if first_line:
                for colnos in range(len(this_row)):
                    if this_row[colnos] == '':
                        this_row[colnos] = 0
                first_line = False
            else:
                for colnos in range(len(this_row)):
                    if this_row[colnos] == '':
                        this_row[colnos] = prev_row[colnos]
            mydata.append([this_row])
            prev_row = this_row

However, the code does not work.
The system indicates:

    TypeError: coercing to Unicode: need string or buffer, DataFrame found

I would appreciate it if anyone could help me solve this issue. Thanks.
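The TypeError comes from passing the DataFrame `tan` to `open()`, which expects a file path (a string), not the object returned by `pd.read_csv`. If the row-by-row approach is what you want, a minimal sketch could look like the following. It is written for Python 3, and the helper name `forward_fill_rows` and the inline sample text are assumptions for illustration, not part of the original code. It also reads the header separately so that blank cells in the first data row do not pick up the column names.

```python
import csv
import io


def forward_fill_rows(rows, blank=''):
    """Return a list of rows in which each blank cell copies the cell above it.

    Hypothetical helper for illustration; blanks at the top of a column
    stay blank because there is no earlier observation to copy.
    """
    filled = []
    prev_row = None
    for row in rows:
        this_row = list(row)
        if prev_row is not None:
            for col, value in enumerate(this_row):
                if value == blank and col < len(prev_row):
                    this_row[col] = prev_row[col]
        filled.append(this_row)
        prev_row = this_row
    return filled


# Stand-in for: with open('sample.csv', newline='') as f: ...
# (assumed contents; open() must receive the path, not a DataFrame)
sample = io.StringIO("time,bid,ask\n15:00,,\n15:02,76,\n15:02,,77\n15:03,78,\n")
reader = csv.reader(sample)
header = next(reader)  # keep the header out of the fill logic
result = forward_fill_rows(reader)
```

With a real file, replacing `sample` with the file object from `open('sample.csv', newline='')` should behave the same way, assuming blank cells are stored as empty strings.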
Use the fillna() method. You can specify forward fill as the method, as follows:
    import pandas as pd

    data = pd.read_csv('sample.csv')
    data = data.fillna(method='ffill')  # This forward fills all columns.
    # You can also apply it to specific columns, as below:
    # data[['bid','ask']] = data[['bid','ask']].fillna(method='ffill')
    print(data)

which prints:

        time  bid  ask
    0  15:00  NaN  NaN
    1  15:00  NaN  NaN
    2  15:02   76  NaN
    3  15:02   76   77
    4  15:03   76   77
    5  15:03   78   77
    6  15:04   78   77
    7  15:05   78   80
    8  15:05   78   80
    9  15:05   78   80
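To try this without a file on disk, here is a small self-contained sketch of the same forward fill. It assumes the blanks are written as "." (as in the question's listing) and therefore passes `na_values="."` to `read_csv`; if your file has truly empty cells, that argument should not be needed. It uses `DataFrame.ffill()`, which performs the same operation as `fillna(method='ffill')`; recent pandas releases deprecate the `method=` argument in favor of it.

```python
import io

import pandas as pd

# Assumed file contents, with "." standing in for a blank cell.
csv_text = """time,bid,ask
15:00,.,.
15:00,.,.
15:02,76,.
15:02,.,77
15:03,.,.
15:03,78,.
15:04,.,.
15:05,.,80
15:05,.,.
15:05,.,.
"""

# Treat "." as missing so bid/ask are parsed as numeric columns with NaN.
data = pd.read_csv(io.StringIO(csv_text), na_values=".")

# Forward fill only the quote columns; leading NaNs stay NaN because
# there is no earlier observation to carry forward.
filled = data.copy()
filled[["bid", "ask"]] = filled[["bid", "ask"]].ffill()
print(filled)
```

Because the columns contain NaN, pandas stores bid and ask as floats here, so the values may display as 76.0 rather than 76.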