I have a discrete NVIDIA GPU (say, Kepler or Maxwell). I want to clear the L2 cache before a kernel is scheduled, so as not to taint my test results.
I could allocate a large slab of memory and read sequentially through a lot of it, somewhere far away from the data under test, and that would work. But I'd rather do something simpler...
Notes:
- I'm also interested in how to do this in OpenCL, albeit less so.
- PTX inlining is acceptable (but I'd rather write proper code).
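
For reference, the slab workaround I'd like to avoid might look roughly like the sketch below. It is a write-based variant (a `cudaMemset` over a scratch buffer rather than a sequential read), and it rests on assumptions I have not verified: that memset traffic passes through L2, and that touching twice the reported L2 size is enough to evict earlier lines. The helper names `scratch_bytes` and `evict_l2`, the factor of 2, and the 16 MiB fallback are all made up for illustration.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: size the scratch buffer as a multiple of the reported
// L2 size, falling back to a fixed size if the device reports 0.
// The multiple (2) and fallback (16 MiB) are assumptions, not tuned values.
static size_t scratch_bytes(int l2_bytes, size_t multiple = 2,
                            size_t fallback = size_t(16) << 20) {
    return l2_bytes > 0 ? static_cast<size_t>(l2_bytes) * multiple : fallback;
}

// Hypothetical helper: overwrite a scratch buffer larger than L2 on the
// current device, so that previously cached lines are (hopefully) evicted.
static cudaError_t evict_l2() {
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) return err;

    // Ask the runtime how large L2 is on this device.
    int l2_bytes = 0;
    err = cudaDeviceGetAttribute(&l2_bytes, cudaDevAttrL2CacheSize, device);
    if (err != cudaSuccess) return err;

    const size_t bytes = scratch_bytes(l2_bytes);
    void* scratch = nullptr;
    err = cudaMalloc(&scratch, bytes);
    if (err != cudaSuccess) return err;

    // Touch every byte of the scratch buffer, then wait for completion.
    err = cudaMemset(scratch, 0, bytes);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // Always release the buffer; report the first error encountered.
    const cudaError_t free_err = cudaFree(scratch);
    return err != cudaSuccess ? err : free_err;
}
```

The idea would be to call `evict_l2()` on the host right before launching the kernel being timed, and check that it returns `cudaSuccess`. This is exactly the allocate-and-touch approach I find clunky, which is why I'm asking whether something simpler exists.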