Runtime difference between Python and R when counting elements in text files


I'm trying to read the lines of a text file, count the number of elements (characters) per line, and identify the line with the highest number of elements.

Python solution:

    import time

    t1 = time.time()
    line_lengths_blogs = []
    f_blogs = open("./working/en_us.blogs.capped.txt", "r")
    d_blogs = f_blogs.readlines()
    f_blogs.close()

    for line in d_blogs:
        # len() also counts the trailing "\n" of each line
        line_lengths_blogs.append(len(line))

    print(max(line_lengths_blogs))
    print("operating time: %s" % (time.time() - t1))

Output:

    3839
    operating time: 0.0128719806671
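The loop above records only the maximum length, while the stated goal is also to identify *which* line is the longest. A minimal sketch that does both in one pass, assuming the input is the same list of strings that `readlines()` returns (or an open file object). The helper name `longest_line` is hypothetical, and stripping the line terminator is a design choice so that the count matches what R's `readLines` hands to `nchar`:

```python
def longest_line(lines):
    """Return (index, length) of the longest line, ignoring line terminators.

    Ties resolve to the first line with the maximal length.
    Returns (-1, -1) for empty input.
    """
    best_index, best_length = -1, -1
    for i, line in enumerate(lines):
        # rstrip("\r\n") drops "\n" or "\r\n" so counts exclude the terminator
        length = len(line.rstrip("\r\n"))
        if length > best_length:
            best_index, best_length = i, length
    return best_index, best_length


if __name__ == "__main__":
    sample = ["short\n", "a much longer line\n", "mid line\n"]
    print(longest_line(sample))  # (1, 18)
```

Because file objects iterate line by line, `with open("./working/en_us.blogs.capped.txt") as f: print(longest_line(f))` would also work without holding the whole file in memory.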

R solution:

    t1 <- proc.time()

    line_lengths_blogs <- data.frame()
    loaded_file_blogs  <- file("./working/en_us.blogs.capped.txt", open = "r")
    d_blogs            <- readLines(loaded_file_blogs)
    close(loaded_file_blogs)

    # add one row per line of text to the data frame
    for (i in 1:length(d_blogs)) {
        line_lengths_blogs <- rbind(line_lengths_blogs, nchar(d_blogs[i]))
    }

    print(max(line_lengths_blogs))

    t2 <- proc.time()
    print(t2 - t1)

Output:

    [1] 3831
           user      system verstrichen
          6.015       0.027       6.042

For the moment, I don't care why the two scripts identify a different number of elements. But why is R around 470 times slower?

What am I doing so terribly wrong in my R code? Or am I an accidental genius at Python programming?
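A likely contributor, offered as an assumption to check rather than a diagnosis: `rbind()` does not extend `line_lengths_blogs` in place but builds a new data frame on every iteration, copying all rows gathered so far. For n lines that is about n(n - 1)/2 row copies (roughly 50 million for 10,000 lines), on top of the per-call overhead of constructing a data frame. The Python sketch below only models that cost by counting element copies; `grow_by_copy` and `grow_in_place` are hypothetical names, and `list.append` stands in for growing a structure without recopying it each step:

```python
def grow_by_copy(values):
    """Mimic rbind-in-a-loop: every step builds a brand-new list.

    Returns (result, copies), where copies counts the elements copied
    over from the previous result across all steps.
    """
    result, copies = [], 0
    for v in values:
        copies += len(result)   # the whole old result is copied...
        result = result + [v]   # ...into a new list with one extra element
    return result, copies


def grow_in_place(values):
    """Append to a single list: amortised O(1) work per element."""
    result = []
    for v in values:
        result.append(v)
    return result


if __name__ == "__main__":
    n = 10_000
    _, copies = grow_by_copy(range(n))
    print(copies)  # 49995000, i.e. n * (n - 1) // 2
```

On the R side, the analogous change would be to skip the loop entirely, since `nchar()` is vectorised: `max(nchar(d_blogs))` gives the maximum and `which.max(nchar(d_blogs))` the index of the longest line. If a loop is really needed, preallocating with `numeric(length(d_blogs))` and assigning by index avoids the repeated copying.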

I'm working on Mac OS X 10.6.8. The exact same data set can be obtained here. The first 10,000 lines of the data in final/en_us/en_us.blogs.txt from the linked file were used for the test. I hope the problem is reproducible.

