for loop - Python: Why is popping off a queue faster than for-in block? -


i have been working on python script analyze csvs. of these files large (1-2 million records), , script taking hours complete.

i changed way records processed for-in loop while loop, , speedup remarkable. demonstration below:

>>> def for_list(): ...     d in data: ...             bunk = d**d ...  >>> def while_list(): ...     while data: ...             d = data.pop(0) ...             bunk = d**d ...  >>> data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] >>> import timeit >>> timeit.timeit(for_list) 1.0698931217193604 >>> timeit.timeit(while_list) 0.14515399932861328 

almost order of magnitude faster. i've never looked @ python bytecode, though might telling, turns out while_list has more instructions.

so what's going on here? there principle here can apply other programs? there scenarios for ten times faster while?

edit: @happyleapsecond pointed out, didn't quite understand going on inside timeit discrepancy gone following:

>>> def for_list(): ...     data = [x x in range(1000)] ...     d in data: ...             bunk = d**d ...  >>> def while_list(): ...     data = [x x in range(1000)] ...     while data: ...             d = data.pop(0) ...             bunk = d**d >>> timeit.timeit(while_list, number=1000) 12.006330966949463 >>> timeit.timeit(for_list, number=1000) 11.847280025482178 

which makes odd "real" script sped such simple change. best guess iteration method requiring more swapping? have 40g swap partition, script fills 15-20g of it. popping reduce swapping?

while_list mutating global data. timeit.timeit not reset value of data. timeit.timeit calls for_list , while_list million times each default. after first call while_list, subsequent calls while_list return after performing 0 loops because data empty.

you need reset value of data before each call for_list , while_list perform fair benchmark.


import timeit  def for_list(data):     d in data:         bunk = d ** d   def while_list(data):     while data:         d = data.pop(0)         bunk = d ** d  data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  print(timeit.timeit('data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; for_list(data)', 'from __main__ import for_list')) # 0.959696054459  print(timeit.timeit('data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; while_list(data)', 'from __main__ import while_list')) # 2.40107011795 

pop(0) o(n) operation. performing inside loop of length n makes while_list have overall time complexity o(n**2), compared o(n) complexity for_list. expected, for_list faster, , advantage grows n, length of data, gets larger.


Comments