CUDA Thread Indexing
When I was learning CUDA programming, thread indexing was the first thing I got stuck on, so I wrote this blog post to help other newcomers understand it. It assumes you already know the basics of the CUDA architecture.
First I’ll introduce the basic CUDA terminology and the variables we need for thread indexing.
Kernel: a function that runs on the GPU.
Grid: the collection of thread blocks created when a kernel is launched.
Grid size is given as a number of blocks. For example, a grid of size 6 contains 6 thread blocks. If the grid is 1D, all 6 blocks lie along one dimension (e.g. 6x1). If the grid is 2D, the 6 blocks are spread over two dimensions (e.g. 3x2).
Block: a collection of threads.
Thread: a single execution unit that runs the kernel on the GPU.
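To make the grid and block terms concrete, here is a minimal sketch of launching one kernel over 6 blocks, first as a 1D grid and then as a 3x2 grid. The names `emptyKernel`, `blockCount` and `launchBothShapes`, and the choice of 4 threads per block, are assumptions made for illustration only.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the body is empty because only the launch shape matters here.
__global__ void emptyKernel() {}

// Number of blocks in a grid (the "grid size" described above).
__host__ __device__ inline unsigned int blockCount(dim3 grid) {
    return grid.x * grid.y * grid.z;
}

// Launch the same kernel over 6 blocks arranged in two different ways.
inline cudaError_t launchBothShapes() {
    dim3 block(4);      // 4 threads per block, 1D (assumed value)
    dim3 grid1d(6);     // 6 blocks along x: a 1D grid
    dim3 grid2d(3, 2);  // 3 blocks along x, 2 along y: a 2D grid

    emptyKernel<<<grid1d, block>>>();
    emptyKernel<<<grid2d, block>>>();
    return cudaDeviceSynchronize();  // wait for both launches to finish
}
```

Both launches create the same number of blocks; only the way the blocks are numbered through blockIdx differs.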
Variables:
I’ll define the variables by giving examples from the following diagram.
Number of blocks in the grid:
gridDim.x: number of blocks in the x dimension of the grid (e.g. 3)
gridDim.y: number of blocks in the y dimension of the grid (e.g. 2)
Number of threads in a block:
blockDim.x: number of threads in the x dimension of the block (e.g. 4)
blockDim.y: number of threads in the y dimension of the block (e.g. 3)
Block Index:
blockIdx.x: the block’s index in the x dimension
blockIdx.y: the block’s index in the y dimension
e.g. block (0,1): blockIdx.x = 0, blockIdx.y = 1
Thread Index:
threadIdx.x: the thread’s index in the x dimension
threadIdx.y: the thread’s index in the y dimension
e.g. thread (2,1): threadIdx.x = 2, threadIdx.y = 1
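One way to see these variables in action is to have every thread print its own coordinates. The sketch below is only an illustration: `printCoordinates`, `launchedThreads` and `launchPrintCoordinates` are hypothetical names, and the sizes are the example values from the list above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Total number of threads a launch creates: blocks in the grid times threads per block.
__host__ __device__ inline unsigned int launchedThreads(dim3 grid, dim3 block) {
    return (grid.x * grid.y * grid.z) * (block.x * block.y * block.z);
}

// Every thread prints its own block and thread coordinates.
// gridDim and blockDim are identical for all threads of one launch.
__global__ void printCoordinates() {
    printf("block (%u,%u) of %ux%u, thread (%u,%u) of %ux%u\n",
           blockIdx.x, blockIdx.y, gridDim.x, gridDim.y,
           threadIdx.x, threadIdx.y, blockDim.x, blockDim.y);
}

// Hypothetical host-side wrapper using the example sizes from the list above.
inline cudaError_t launchPrintCoordinates() {
    dim3 grid(3, 2);   // gridDim.x = 3, gridDim.y = 2
    dim3 block(4, 3);  // blockDim.x = 4, blockDim.y = 3
    printf("expecting %u lines\n", launchedThreads(grid, block));
    printCoordinates<<<grid, block>>>();
    return cudaDeviceSynchronize();  // waits for the kernel so its printf output appears
}
```

The order of the printed lines is not guaranteed, since threads do not run in index order.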
Now we can move on to thread indexing itself. Using the variables explained above, we compute a unique number for each thread in the grid and, for 2D grids, a unique number for each block.
1D grid of 1D blocks
threadId = (blockIdx.x * blockDim.x) + threadIdx.x
Let’s check the equation for Thread (2,0) in Block (1,0).
With blockDim.x = 3: threadId = (1 * 3) + 2 = 3 + 2 = 5
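The same equation can be written as a small helper and used inside a kernel. This is a sketch under assumptions: `globalId1D1D` and `fillIds1D1D` are hypothetical names, and the bounds check on `n` is an addition that guards against a last block with more threads than remaining elements.

```cuda
#include <cuda_runtime.h>

// Global thread ID for a 1D grid of 1D blocks.
// The parameters mirror the built-in variables so the function also runs on the host.
__host__ __device__ inline unsigned int globalId1D1D(unsigned int blockIdxX,
                                                     unsigned int blockDimX,
                                                     unsigned int threadIdxX) {
    return blockIdxX * blockDimX + threadIdxX;
}

// Each thread writes its own ID into out[id]; threads with id >= n do nothing.
__global__ void fillIds1D1D(unsigned int *out, unsigned int n) {
    unsigned int id = globalId1D1D(blockIdx.x, blockDim.x, threadIdx.x);
    if (id < n) {
        out[id] = id;
    }
}
```

Calling `globalId1D1D(1, 3, 2)` on the host reproduces the worked example for thread 2 in block 1.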
1D grid of 2D blocks
There are several ways to index threads here. I’ll show you two methods.
threadId = (blockIdx.x * blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x
Let’s check the equation for Thread (2,1) in Block (1,0).
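With blockDim.x = 3 and blockDim.y = 2: threadId = (1 * 3 * 2) + (1 * 3) + 2 = 6 + 3 + 2 = 11
Here (1 * 3 * 2) counts the 6 threads of block 0, which all come before block 1.
(1 * 3) counts threads (0,0), (1,0) and (2,0), the first row of block 1.
Finally, add the threadIdx.x of the thread itself.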
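The first method numbers all threads of one block before moving to the next block. A minimal sketch, with the hypothetical names `blockMajorId` and `fillBlockMajor` and an assumed output buffer of `n` elements:

```cuda
#include <cuda_runtime.h>

// Thread ID for a 1D grid of 2D blocks, numbering one whole block after another.
__host__ __device__ inline unsigned int blockMajorId(unsigned int blockIdxX,
                                                     unsigned int blockDimX,
                                                     unsigned int blockDimY,
                                                     unsigned int threadIdxX,
                                                     unsigned int threadIdxY) {
    unsigned int threadsInEarlierBlocks = blockIdxX * blockDimX * blockDimY;
    unsigned int threadsInEarlierRows = threadIdxY * blockDimX;  // full rows above this thread
    return threadsInEarlierBlocks + threadsInEarlierRows + threadIdxX;
}

// Each thread stores its ID at the position given by that ID.
__global__ void fillBlockMajor(unsigned int *out, unsigned int n) {
    unsigned int id = blockMajorId(blockIdx.x, blockDim.x, blockDim.y,
                                   threadIdx.x, threadIdx.y);
    if (id < n) {
        out[id] = id;
    }
}
```

With this numbering, the threads of one block own a contiguous range of IDs.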
Here is the second method.
threadId = (gridDim.x * blockDim.x * threadIdx.y) + (blockDim.x * blockIdx.x) + threadIdx.x
Let’s check the equation for Thread (2,1) in Block (1,0).
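With gridDim.x = 4 and blockDim.x = 3: threadId = (4 * 3 * 1) + (1 * 3) + 2 = 12 + 3 + 2 = 17
Here (4 * 3 * 1) counts threads (0,0), (1,0) and (2,0) in each of blocks 0, 1, 2 and 3, that is, the whole first row of the grid.
(1 * 3) counts threads (0,1), (1,1) and (2,1) in block 0.
Finally, add the threadIdx.x of the thread itself.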
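The second method can be read as treating all threads of the grid as one wide 2D array stored row by row. The sketch below spells that out; `rowMajorId` and `fillRowMajor` are hypothetical names, and the intermediate variables `width` and `column` are introduced here only for readability.

```cuda
#include <cuda_runtime.h>

// Thread ID for a 1D grid of 2D blocks, numbering one full row of the grid after another.
__host__ __device__ inline unsigned int rowMajorId(unsigned int gridDimX,
                                                   unsigned int blockDimX,
                                                   unsigned int blockIdxX,
                                                   unsigned int threadIdxX,
                                                   unsigned int threadIdxY) {
    unsigned int width = gridDimX * blockDimX;                 // threads in one row of the grid
    unsigned int column = blockIdxX * blockDimX + threadIdxX;  // position within that row
    return threadIdxY * width + column;
}

// Each thread stores its ID at the position given by that ID.
__global__ void fillRowMajor(unsigned int *out, unsigned int n) {
    unsigned int id = rowMajorId(gridDim.x, blockDim.x, blockIdx.x,
                                 threadIdx.x, threadIdx.y);
    if (id < n) {
        out[id] = id;
    }
}
```

This layout is convenient when the threads map onto a row-major matrix whose width is gridDim.x * blockDim.x.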
2D grid of 1D blocks
blockId = (gridDim.x * blockIdx.y) + blockIdx.x
threadId = (blockId * blockDim.x) + threadIdx.x
Let’s check the equation for Thread (1,0) in Block (1,1).
With gridDim.x = 2 and blockDim.x = 3:
blockId = (2 * 1) + 1 = 2 + 1 = 3
threadId = (3 * 3) + 1 = 9 + 1 = 10
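The two steps translate directly into two helpers. This is a sketch with hypothetical names (`linearBlockId2D`, `threadId2D1D`, `fillIds2D1D`), and the buffer length `n` is an assumed parameter.

```cuda
#include <cuda_runtime.h>

// Step 1: flatten the 2D block coordinates into a single block ID, row by row.
__host__ __device__ inline unsigned int linearBlockId2D(unsigned int gridDimX,
                                                        unsigned int blockIdxX,
                                                        unsigned int blockIdxY) {
    return gridDimX * blockIdxY + blockIdxX;
}

// Step 2: offset by the threads in all earlier blocks, then add the thread's own index.
__host__ __device__ inline unsigned int threadId2D1D(unsigned int blockId,
                                                     unsigned int blockDimX,
                                                     unsigned int threadIdxX) {
    return blockId * blockDimX + threadIdxX;
}

// Each thread stores its ID at the position given by that ID.
__global__ void fillIds2D1D(unsigned int *out, unsigned int n) {
    unsigned int blockId = linearBlockId2D(gridDim.x, blockIdx.x, blockIdx.y);
    unsigned int id = threadId2D1D(blockId, blockDim.x, threadIdx.x);
    if (id < n) {
        out[id] = id;
    }
}
```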
2D grid of 2D blocks
blockId = (gridDim.x * blockIdx.y) + blockIdx.x
threadId = (blockId * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x
Let’s check the equation for Thread (2,1) in Block (0,1).
With gridDim.x = 2, blockDim.x = 3 and blockDim.y = 2:
blockId = (2 * 1) + 0 = 2
threadId = (2 * (3 * 2)) + (1 * 3) + 2 = 12 + 3 + 2 = 17
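Putting both pieces together for the 2D grid of 2D blocks might look like the following sketch. The names `linearBlockId` , `globalId2D2D` and `fillIds2D2D` are hypothetical, as is the output buffer of `n` elements.

```cuda
#include <cuda_runtime.h>

// Flatten the 2D block coordinates into a single block ID, row by row.
__host__ __device__ inline unsigned int linearBlockId(unsigned int gridDimX,
                                                      unsigned int blockIdxX,
                                                      unsigned int blockIdxY) {
    return gridDimX * blockIdxY + blockIdxX;
}

// Global thread ID: threads in earlier blocks, plus full rows above, plus threadIdx.x.
__host__ __device__ inline unsigned int globalId2D2D(unsigned int blockId,
                                                     unsigned int blockDimX,
                                                     unsigned int blockDimY,
                                                     unsigned int threadIdxX,
                                                     unsigned int threadIdxY) {
    return blockId * (blockDimX * blockDimY) + threadIdxY * blockDimX + threadIdxX;
}

// Each thread stores its ID at the position given by that ID.
__global__ void fillIds2D2D(unsigned int *out, unsigned int n) {
    unsigned int blockId = linearBlockId(gridDim.x, blockIdx.x, blockIdx.y);
    unsigned int id = globalId2D2D(blockId, blockDim.x, blockDim.y,
                                   threadIdx.x, threadIdx.y);
    if (id < n) {
        out[id] = id;
    }
}
```

A launch such as `fillIds2D2D<<<dim3(2, 2), dim3(3, 2)>>>(out, 24)` would cover 4 blocks of 6 threads each, assuming `out` points to device memory holding at least 24 elements.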
Done ✅ I hope you can now construct the thread indexing equations for a 1D grid of 3D blocks or a 2D grid of 3D blocks on your own.