I have a machine with 24 physical cores (at least I was told so) running Debian: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 GNU/Linux. That seems to be correct:

    usr@machine:~/$ cat /proc/cpuinfo | grep processor
    processor : 0
    processor : 1
    <...>
    processor : 22
    processor : 23

I had issues trying to load all the cores with Python's multiprocessing.pool.Pool. I used Pool(processes=None); the docs say that Python uses cpu_count() if None is provided.
Alas, only 8 cores were 100% loaded, the others remained idle (I used htop to monitor the CPU load). I thought that I cannot cook pools properly, so I tried to invoke 24 processes "manually":

    from multiprocessing import Process

    print 'Starting processes...'
    procs = list()
    for param_set in all_params:  # 24 items
        p = Process(target=_wrap_test, args=[param_set])
        p.start()
        procs.append(p)
    print 'Now waiting for them.'
    for p in procs:
        p.join()

I got 24 "greeting" messages from the processes as they started:

    Starting processes...
    Executing combination: session len: 15, delta: 10, ratio: 0.1, eps_relabel: 0.5, min_pts_lof: 5, alpha: 0.01, reduce: 500
    < ... 22 more messages ... >
    Executing combination: session len: 15, delta: 10, ratio: 0.1, eps_relabel: 0.5, min_pts_lof: 7, alpha: 0.01, reduce: 2000
    Now waiting for them.

But still only 8 cores were loaded:

I've read here that there may be issues with numpy, OpenBLAS, and multicore execution. This is how I start my code:

    OPENBLAS_MAIN_FREE=1 python -m tests.my_module

And after all my imports I do:

    os.system("taskset -p 0xff %d" % os.getpid())

So, here is the question: how should I get 100% load on all the cores? Is it just my poor Python usage, or does it have something to do with OS limitations on multicore machines?
Updated: one more interesting thing is an inconsistency within the htop output. If you look at the image above, you'll see that the table below the CPU load bars shows 30-50% load for more than 8 cores, which is different from what the load bars say. Then again, top seems to agree with the bars: 8 cores are 100% loaded, and the others are idle.
Updated once again:
I used this rather popular post when I added the os.system("taskset -p 0xff %d" % os.getpid()) line after the imports. I have to admit that I didn't think much when I did that, especially after reading this:
"With this line pasted in after the module imports, my example now runs on all cores"
I'm a simple man. I see "works like a charm", I copy and paste. Anyway, while playing with my code I eventually removed this line. After that, my code began executing on all 24 cores in the "manual" process starting scenario. In the Pool scenario the same problem remained, no matter whether the affinity trick was used or not.
I don't think it's a real answer, 'cause I don't know what the issue with Pool is, but at least I managed to get all the cores fully loaded. Thank you!
Even though you solved the issue, I'll try to explain it to clarify the ideas.
From what I have read around, numpy does a lot of "magic" to improve performance. One of the magic tricks is to set the CPU affinity of the process.
CPU affinity is an optimisation of the OS scheduler. It restricts a given process to a fixed set of CPU cores, in the extreme case to always the same single core.
This improves performance by reducing the number of times the CPU cache is invalidated and by increasing the benefits of reference locality. On heavily computational tasks these factors are indeed important.
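To check whether a process has been restricted this way, its affinity can be compared with the number of CPUs the OS reports. The following is a minimal sketch, assuming Python 3 on Linux (the question itself uses Python 2, where `os.sched_getaffinity` is not available); `cpu_report` is a hypothetical helper name, not something from the original post.

```python
import os


def cpu_report():
    """Return (CPU count reported by the OS, CPUs this process may run on)."""
    total = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # 0 means "the calling process"
        allowed = sorted(os.sched_getaffinity(0))
    else:
        # Fallback for platforms without the call: assume no restriction
        allowed = list(range(total or 1))
    return total, allowed


if __name__ == "__main__":
    total, allowed = cpu_report()
    print("CPUs reported by the OS: %s" % total)
    print("CPUs this process may use: %s" % allowed)
```

If the second list is shorter than the first number right after importing numpy, something has narrowed the affinity of the process.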
What I don't like about numpy is the fact that it does all this implicitly, often puzzling developers.
The fact that your processes were not running on all the cores was due to the fact that numpy sets the affinity on the parent process when you import the module. Then, when you spawn the new processes, the affinity is inherited, leading to all the processes fighting for a few cores instead of efficiently using all the available ones.
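The inheritance itself is easy to observe. Here is a small sketch, assuming Python 3 on Linux: it pins the current process to a single CPU, starts a child interpreter, and asks the child which CPUs it may use. The names `child_affinity` and `pin_and_spawn` are made up for this illustration, and the pinning is done by hand here rather than by numpy.

```python
import ast
import os
import subprocess
import sys

# One-liner executed by the child: print the CPUs it is allowed to run on
CHILD_CODE = "import os; print(sorted(os.sched_getaffinity(0)))"


def child_affinity():
    """Start a fresh Python child and return the set of CPUs it may use."""
    out = subprocess.check_output([sys.executable, "-c", CHILD_CODE])
    return set(ast.literal_eval(out.decode().strip()))


def pin_and_spawn():
    """Pin this process to one CPU, spawn a child, then restore the old mask."""
    original = os.sched_getaffinity(0)
    pinned = {min(original)}
    os.sched_setaffinity(0, pinned)
    try:
        inherited = child_affinity()
    finally:
        # Always undo the pinning, even if the child fails
        os.sched_setaffinity(0, original)
    return pinned, inherited


if __name__ == "__main__":
    pinned, inherited = pin_and_spawn()
    print("parent pinned to %s, child sees %s" % (sorted(pinned), sorted(inherited)))
```

The child reports the same single CPU as the parent, which is the situation the spawned workers end up in.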
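The os.system("taskset -p 0xff %d" % os.getpid()) command instructs the OS to reset the affinity, which solves the issue on machines with up to 8 cores. Note that the mask 0xff only covers CPUs 0 to 7; on a 24-core machine like yours the mask would have to be 0xffffff to cover all the cores, which is likely why only 8 cores were loaded while that line was in place.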
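The taskset mask is a plain bit field in which bit i stands for CPU i, so its meaning can be checked with a few lines of arithmetic. This is only an illustrative sketch; `affinity_mask` and `cpus_in_mask` are hypothetical helper names.

```python
def affinity_mask(n_cpus):
    """Return the bit mask with the lowest n_cpus bits set (CPUs 0..n_cpus-1)."""
    return (1 << n_cpus) - 1


def cpus_in_mask(mask):
    """Return the list of CPU indices selected by a taskset-style bit mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    print(cpus_in_mask(0xff))      # CPUs covered by the mask from the post
    print(hex(affinity_mask(24)))  # mask needed for a 24-CPU machine
```

Here `cpus_in_mask(0xff)` gives CPUs 0 through 7, and `affinity_mask(24)` gives `0xffffff`.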
If you want to see it working on a Pool, you can do the following trick.

    import os
    from multiprocessing import Pool

    def set_affinity_on_worker():
        """When a new worker process is created, the affinity is set to all CPUs"""
        print("I'm the process %d, setting affinity to all CPUs." % os.getpid())
        # 0xff covers CPUs 0-7 only; use a wider mask (e.g. 0xffffff for 24 CPUs)
        os.system("taskset -p 0xff %d" % os.getpid())

    if __name__ == '__main__':
        p = Pool(initializer=set_affinity_on_worker)
        ...
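A variant of the same trick that avoids the hard-coded mask and the call to an external program might look like the sketch below. It assumes Python 3 on Linux, where `os.sched_setaffinity` is available; `allow_all_cpus` and `usable_cpu_count` are hypothetical names, and this is not what the original post used.

```python
import os
from multiprocessing import Pool


def allow_all_cpus():
    """Pool initializer: let this worker run on every CPU the OS reports."""
    n_cpus = os.cpu_count() or 1
    try:
        os.sched_setaffinity(0, range(n_cpus))  # 0 = the calling process
    except (AttributeError, OSError):
        # Not supported on this platform, or not permitted: keep the old mask
        pass
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(n_cpus))


def usable_cpu_count(_):
    """Report how many CPUs the current worker is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    with Pool(initializer=allow_all_cpus) as pool:
        # Each entry should match the machine's CPU count if the reset worked
        print(pool.map(usable_cpu_count, range(4)))
```

Since the initializer runs once in every worker, each of them widens its own mask regardless of what the parent inherited.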