I'm trying to read the lines of a text file, count the number of elements per line, and identify the line with the highest number of elements.
Python solution:

    import time

    line_lengths_blogs = []
    t1 = time.time()
    # read the whole file into a list of lines
    f_blogs = open("./working/en_us.blogs.capped.txt", "r")
    d_blogs = f_blogs.readlines()
    f_blogs.close()
    # collect the length of every line, then report the largest
    for line in d_blogs:
        line_lengths_blogs.append(len(line))
    print max(line_lengths_blogs)
    # elapsed wall-clock time, shown as "operating time" in the output
    print "operating time:", time.time() - t1

Output:

    3839
    operating time: 0.0128719806671

R solution:

    t1 <- proc.time()
    line_lengths_blogs <- data.frame()
    loaded_file_blogs <- file("./working/en_us.blogs.capped.txt", open="r")
    d_blogs <- readLines(loaded_file_blogs)
    close(loaded_file_blogs)
    # append one line length at a time to the data frame
    for(i in 1:length(d_blogs)) {
      line_lengths_blogs <- rbind(line_lengths_blogs, nchar(d_blogs[i]))
    }
    print(max(line_lengths_blogs))
    t2 <- proc.time()
    print(t2-t1)

Output:

    [1] 3831
       user  system elapsed
      6.015   0.027   6.042

For the moment, I don't care that the two results differ (3839 vs. 3831)... Why is R around 470 times slower?
What am I doing terribly wrong in my R code? Or am I an accidental genius at Python programming?
I'm working on Mac OS X 10.6.8. The exact same data set can be obtained here. The first 10,000 lines of the data in final/en_us/en_us.blogs.txt of the linked file were used for the test. I hope the problem is reproducible.
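
For anyone trying to reproduce the Python side on a current interpreter, here is a minimal Python 3 sketch of the same measurement. The function names `max_line_length` and `timed_max_line_length`, the `strip_newline` flag, and the UTF-8 encoding are assumptions made for illustration, not part of the original script. The flag is there because `len(line)` includes the trailing newline, whereas R's `readLines` drops line terminators before `nchar` counts characters. That may explain part of the gap between the two maxima, though I have not verified it on this data set.

```python
import time


def max_line_length(path, strip_newline=False):
    """Return the length of the longest line in the text file at `path`.

    Assumption: the file is UTF-8 encoded. With strip_newline=True the
    trailing "\n" is not counted, which is closer to what R's
    nchar(readLines(...)) measures.
    """
    longest = 0
    with open(path, "r", encoding="utf-8") as handle:
        # iterate lazily instead of materialising every line first
        for line in handle:
            if strip_newline:
                line = line.rstrip("\n")
            longest = max(longest, len(line))
    return longest


def timed_max_line_length(path, strip_newline=False):
    """Return (longest line length, elapsed seconds) for one run."""
    start = time.perf_counter()
    longest = max_line_length(path, strip_newline)
    elapsed = time.perf_counter() - start
    return longest, elapsed
```

Calling `timed_max_line_length("./working/en_us.blogs.capped.txt")` gives a figure comparable to the first run above, and passing `strip_newline=True` gives one closer to what the R code counts.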