optimization - How to optimize a[i] = b[c[i]] with NEON


I have a simple loop that runs over a large n:

    for (i = 0; i < n; i++) {
        dst[i] = src[table[i]];
    }

I want to optimize it using NEON, but I don't know how to deal with this part: src[table[i]]. Is it possible to optimize it? If yes, how?

Thanks to @Paul R for this comment:

This is a gathered load, which is not supported in NEON. See: stackoverflow.com/questions/11502332/…

Since it couldn't be optimized with NEON, I tried OpenMP instead and got a significant improvement. The code is rather simple too:

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        dst[i] = src[table[i]];
    }
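For anyone who wants something that compiles as is, here is a minimal sketch of the same OpenMP loop wrapped in a standalone function. The function name gather_copy and the element types (float data, 32-bit unsigned indices) are assumptions for illustration only, since the question does not state them. It also assumes dst does not overlap src or table and that every table[i] is a valid index into src.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies src[table[i]] into dst[i] for i in [0, n).
 * Assumed types: float data and uint32_t indices (not given in the post).
 * Assumes dst does not overlap src or table, and that every table[i]
 * is a valid index into src. */
void gather_copy(float *dst, const float *src, const uint32_t *table, size_t n)
{
    /* Each iteration writes a distinct dst[i] and only reads src and table,
     * so the iterations are independent and can be split across threads. */
    #pragma omp parallel for schedule(static)
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; i++) {
        dst[i] = src[table[i]];
    }
}
```

With GCC or Clang this needs -fopenmp to actually run in parallel; without that flag the pragma is ignored and the loop simply runs serially with the same result. Whether the speedup is significant will likely depend on n, the number of cores, and how scattered the indices in table are, since random accesses into src tend to be limited by memory latency.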
