I have a simple loop that runs over a large n:

    for (i = 0; i < n; i++) { dst[i] = src[table[i]]; }

I want to optimize it using NEON, but I don't know how to deal with the src[table[i]] part. Is it possible to optimize this? If yes, how?
Thanks to @Paul R for this comment:

This is a gathered load, which is not supported in NEON. See: stackoverflow.com/questions/11502332/…
Since the loop couldn't be optimized with NEON, I tried OpenMP instead and got a significant improvement. The code is rather simple too:
    #pragma omp parallel for
    for (i = 0; i < n; i++) { dst[i] = src[table[i]]; }
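For anyone who wants to try this, here is a minimal self-contained sketch of the same loop wrapped in a function. The function name gather_omp and the element types (float data, uint32_t indices) are my assumptions, since the post above doesn't say what types dst, src and table have. It also assumes every table[i] is a valid index into src.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and types are assumptions, not from the post):
 * copies src[table[i]] into dst[i] for every i in [0, n).
 * The caller must make sure each table[i] is a valid index into src. */
void gather_omp(float *dst, const float *src, const uint32_t *table, size_t n)
{
    /* A signed loop counter keeps the loop acceptable to older OpenMP
     * implementations that reject unsigned loop variables. */
    ptrdiff_t i;

    /* Each iteration writes a distinct dst[i] and only reads src and table,
     * so the iterations are independent and can be split across threads. */
    #pragma omp parallel for
    for (i = 0; i < (ptrdiff_t)n; i++) {
        dst[i] = src[table[i]];
    }
}
```

With GCC or Clang this needs -fopenmp to actually run in parallel (for example gcc -O2 -fopenmp). Without that flag the pragma is ignored and the loop runs serially with the same result. How much it helps will depend on n and on the number of cores, so it's worth measuring on the target device.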