c++ - OpenCL - recommended data structure?
I have 2 files; each one has 10000 points, and each point has 2 double numbers, x and y. I need to perform an operation on every pair of these points, so there are 100000000 operations (10000 x 10000).
First question: what structure do you recommend? I mean, which variables should I pass to the kernel file?
I have written a script and executed it on the 1000-point files (1000000 operations). I put the points in 1 array (1000000 x 4), where the 4 columns are x, y from the first file and x, y from the second file, and passed it to the kernel with 1000000 parallel threads.
local_item_size = 125, global_item_size = 1000000
Second question: do you think I can improve the structure, and if so, how?
Third question: the script I have written works correctly with the 1000-point files, but when I run it on the 10000-point files I get a clCreateBuffer error (CL_INVALID_BUFFER_SIZE for a 100000000 x 4 double input array). I think (but am not sure) the reason is the huge number of generated threads (100000000)!!
Update: the hardware is an Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz and an NVIDIA Corporation GM204 [GeForce GTX 980]. For each point there is a loop of 1000 operations (with 3 ifs); these operations are done in the kernel, and the result for each point is independent of the other points.
Update 2: to simplify the problem, I need to multiply 2 matrices, A and B, where A has 10000 rows and 2 columns and B has 2 rows and 10000 columns. What is the best structure for this?
Thanks in advance.
Regarding update 2: the best way of handling the matrices is to store them as flat arrays in a fixed row/column order. You need 2 matrices of 20000 elements each. In matrix A the elements are stored 10000 per row, 2 rows altogether; in matrix B this gives 10000 rows, each with 2 elements.
Take a look at the blog in my profile; there is a (German) tutorial there on OpenCL-based matrix multiplication.