c++ - Most efficient way to test a 256-bit YMM AVX register element for less than or equal to zero


I'm implementing a particle system using Intel AVX intrinsics. When the y-position of a particle is less than or equal to 0, I want to reset the particle.

The particle system is laid out in an SoA (structure-of-arrays) pattern like this:

    class particlesystem {
    private:
        float* mxposition;
        float* myposition;
        float* mzposition;
        // .... rest of the code is not important for this question
    };

My initial approach was to iterate through the myposition array and check for the case stated at the beginning. Perhaps there are performance improvements that could be made to this approach?

The question is whether there is an efficient way to implement this using AVX intrinsics. Thank you!

If the elements <= 0 are relatively sparse, then one simple approach is to test 8 at a time using AVX and drop to scalar code when you identify a vector that contains one or more such elements, e.g.:

    #include <immintrin.h>                                  // AVX intrinsics

    const __m256 vk0 = _mm256_setzero_ps();                 // const vector of zeros
    int i;                                                  // declared here so the tail loop can reuse it

    for (i = 0; i + 8 <= n; i += 8)
    {
        __m256 vy = _mm256_loadu_ps(&myposition[i]);        // load 8 x floats
        __m256 vcmp = _mm256_cmp_ps(vy, vk0, _CMP_LE_OS);   // compare for <= 0
        int mask = _mm256_movemask_ps(vcmp);                // get MS bits from comparison result
        if (mask != 0)                                      // if any bits are set
        {                                                   // then we have 1 or more elements <= 0
            for (int k = 0; k < 8; ++k)                     // test each element in the vector
            {                                               // using scalar code...
                if ((mask & 1) != 0)
                {
                    // found element at index i + k
                    // do something with it...
                }
                mask >>= 1;
            }
        }
    }
    // deal with any remaining elements in case n is not a multiple of 8
    for (int j = i; j < n; ++j)
    {
        if (myposition[j] <= 0.0f)
        {
            // found element at index j
            // do something with it...
        }
    }
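The inner loop above tests all eight bit positions whenever the mask is non-zero. One possible refinement, sketched here, is to visit only the bits that are actually set by repeatedly clearing the lowest set bit. The helper names `le_zero_mask8`, `lowest_set_bit` and `collect_hits` are hypothetical and not part of the answer. The mask is built with a scalar stand-in so the sketch compiles without AVX support; in the real loop it would come from `_mm256_cmp_ps` and `_mm256_movemask_ps`.

```cpp
#include <cstddef>
#include <vector>

// Scalar stand-in (assumption, for illustration only) for
// _mm256_cmp_ps(vy, vk0, _CMP_LE_OS) followed by _mm256_movemask_ps:
// bit k of the result is set when y[k] <= 0 (a NaN compares false).
inline int le_zero_mask8(const float* y)
{
    int mask = 0;
    for (int k = 0; k < 8; ++k)
        if (y[k] <= 0.0f)
            mask |= 1 << k;
    return mask;
}

// Index of the lowest set bit of a non-zero mask. A portable loop is used
// here; a compiler builtin could replace it.
inline int lowest_set_bit(int mask)
{
    int k = 0;
    while ((mask & 1) == 0)
    {
        mask >>= 1;
        ++k;
    }
    return k;
}

// Append the global indices i + k of every hit in the 8-wide block at i.
inline void collect_hits(int mask, std::size_t i, std::vector<std::size_t>& out)
{
    while (mask != 0)
    {
        out.push_back(i + static_cast<std::size_t>(lowest_set_bit(mask)));
        mask &= mask - 1;   // clear the lowest set bit
    }
}
```

With this shape the scalar work per vector is proportional to the number of hits rather than always eight iterations, which matters most in the sparse case the approach targets.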

Of course, if the matching elements are not sparse, i.e. if you are typically finding one or more in every vector of 8, then this isn't going to buy you any performance gain. However, if the elements are sparse, such that most vectors can be skipped, then you should see a significant benefit.
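For the dense case, one alternative (a different technique from the mask-and-scalar approach above) is to apply the reset entirely in vector form with a blend, so no per-element branch is needed. The sketch below assumes, purely for illustration, that resetting a particle means writing a fixed spawn height into its y-position; the function name `reset_nonpositive_y` and the parameter `spawn_y` are hypothetical. The AVX path is compiled only when the compiler defines `__AVX__`; otherwise the scalar loop handles everything.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical reset: every y[i] <= 0 is overwritten with spawn_y.
void reset_nonpositive_y(float* y, std::size_t n, float spawn_y)
{
    std::size_t i = 0;
#if defined(__AVX__)
    const __m256 vk0 = _mm256_setzero_ps();            // vector of zeros
    const __m256 vspawn = _mm256_set1_ps(spawn_y);     // broadcast reset value
    for (; i + 8 <= n; i += 8)
    {
        __m256 vy = _mm256_loadu_ps(&y[i]);                 // load 8 floats
        __m256 vcmp = _mm256_cmp_ps(vy, vk0, _CMP_LE_OS);   // all-ones where y <= 0
        vy = _mm256_blendv_ps(vy, vspawn, vcmp);            // pick spawn_y where mask is set
        _mm256_storeu_ps(&y[i], vy);                        // store 8 floats back
    }
#endif
    // Scalar tail (or the whole array when AVX is not enabled).
    for (; i < n; ++i)
        if (y[i] <= 0.0f)
            y[i] = spawn_y;
}
```

A real reset would probably update other arrays as well (the x and z positions, for instance), and the same comparison result could serve as the blend mask for each of them. Whether this beats the sparse approach depends on how often hits occur, so it is worth measuring both.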

