i have been working on python script analyze csvs. of these files large (1-2 million records), , script taking hours complete.
i changed way records processed for-in loop while loop, , speedup remarkable. demonstration below:
>>> def for_list(): ... d in data: ... bunk = d**d ... >>> def while_list(): ... while data: ... d = data.pop(0) ... bunk = d**d ... >>> data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] >>> import timeit >>> timeit.timeit(for_list) 1.0698931217193604 >>> timeit.timeit(while_list) 0.14515399932861328 almost order of magnitude faster. i've never looked @ python bytecode, though might telling, turns out while_list has more instructions.
so what's going on here? there principle here can apply other programs? there scenarios for ten times faster while?
edit: @happyleapsecond pointed out, didn't quite understand going on inside timeit discrepancy gone following:
>>> def for_list(): ... data = [x x in range(1000)] ... d in data: ... bunk = d**d ... >>> def while_list(): ... data = [x x in range(1000)] ... while data: ... d = data.pop(0) ... bunk = d**d >>> timeit.timeit(while_list, number=1000) 12.006330966949463 >>> timeit.timeit(for_list, number=1000) 11.847280025482178 which makes odd "real" script sped such simple change. best guess iteration method requiring more swapping? have 40g swap partition, script fills 15-20g of it. popping reduce swapping?
while_list mutating global data. timeit.timeit not reset value of data. timeit.timeit calls for_list , while_list million times each default. after first call while_list, subsequent calls while_list return after performing 0 loops because data empty.
you need reset value of data before each call for_list , while_list perform fair benchmark.
import timeit def for_list(data): d in data: bunk = d ** d def while_list(data): while data: d = data.pop(0) bunk = d ** d data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] print(timeit.timeit('data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; for_list(data)', 'from __main__ import for_list')) # 0.959696054459 print(timeit.timeit('data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; while_list(data)', 'from __main__ import while_list')) # 2.40107011795 pop(0) o(n) operation. performing inside loop of length n makes while_list have overall time complexity o(n**2), compared o(n) complexity for_list. expected, for_list faster, , advantage grows n, length of data, gets larger.
Comments
Post a Comment