When I was learning CUDA programming, I got stuck on thread indexing at first, so I wrote this blog post to help CUDA novices understand it easily. I assume you already have basic knowledge of the CUDA architecture.

First I’ll introduce the basic CUDA terminology and the variables we need for thread indexing.

Kernel — GPU function

Grid — A kernel (GPU function) is launched as a collection of thread blocks called a grid.

Grid size is defined by the number of blocks. For example, a grid of size 6 contains 6 thread blocks. If the grid is 1D → all 6 blocks are in one dimension (eg: 1x6). If the grid is 2D → the 6 blocks are spread across two dimensions (eg: 3x2).

Block — A collection of threads.

Thread — A single execution unit that runs the kernel (GPU function) on the GPU.

Variables:

I’ll define the variables by giving examples from the following diagram.

Number of blocks in the grid:

gridDim.x — number of blocks in the x dimension of the grid (eg: 3)

gridDim.y — number of blocks in the y dimension of the grid (eg: 2)

Number of threads in a block:

blockDim.x — number of threads in the x dimension of the block (eg: 4)

blockDim.y — number of threads in the y dimension of the block (eg: 3)

Block Index:

blockIdx.x — block’s index in x dimension

blockIdx.y — block’s index in y dimension

eg: block (0,1) has blockIdx.x = 0, blockIdx.y = 1
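To make the size variables concrete, here is a minimal host-side C++ sketch that counts blocks and threads from the example values above (a 3x2 grid of 4x3 blocks). The `Vec3` struct and the function names are hypothetical stand-ins of mine, not part of CUDA; in real CUDA code these values come from the built-in `gridDim` and `blockDim`.

```cpp
// Hypothetical stand-in for CUDA's dim3, so the arithmetic runs on the host.
struct Vec3 { unsigned x, y, z; };

// Number of blocks in a 2D grid.
unsigned blocksInGrid(Vec3 gridDim) {
    return gridDim.x * gridDim.y;
}

// Number of threads in one 2D block.
unsigned threadsPerBlock(Vec3 blockDim) {
    return blockDim.x * blockDim.y;
}

// Total number of threads launched for a 2D grid of 2D blocks.
unsigned totalThreads(Vec3 gridDim, Vec3 blockDim) {
    return blocksInGrid(gridDim) * threadsPerBlock(blockDim);
}
```

With gridDim = (3,2) and blockDim = (4,3), this gives 6 blocks of 12 threads each, so 72 threads in total. A correct indexing scheme must give each of them a different number.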

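Now we can move on to thread indexing. It is built from the variables explained above, plus threadIdx.x and threadIdx.y, which give a thread’s index in the x and y dimensions of its block (eg: thread (2,1) has threadIdx.x = 2, threadIdx.y = 1). Thread indexing gives every thread in the grid a unique number; for 2D grids we first compute a unique number for each block.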

1D grid of 1D blocks

`threadId = (blockIdx.x * blockDim.x) + threadIdx.x`

Let’s check the equation for Thread (2,0) in Block (1,0), where blockDim.x = 3.

Thread ID = (1 * 3) + 2 = 3 + 2 = 5
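The same calculation can be written as a small host-side C++ function, which is handy for checking the arithmetic without a GPU. This is only a sketch: `Vec3` and `threadId1D1D` are hypothetical names, and the parameters stand in for CUDA's built-in `blockIdx`, `blockDim` and `threadIdx`.

```cpp
// Hypothetical stand-in for CUDA's dim3/uint3, so the arithmetic runs on the host.
struct Vec3 { unsigned x, y, z; };

// 1D grid of 1D blocks: skip the threads of all earlier blocks,
// then add this thread's position inside its own block.
unsigned threadId1D1D(Vec3 blockIdx, Vec3 blockDim, Vec3 threadIdx) {
    return (blockIdx.x * blockDim.x) + threadIdx.x;
}
```

Calling `threadId1D1D(Vec3{1, 0, 0}, Vec3{3, 1, 1}, Vec3{2, 0, 0})` reproduces the worked example and returns 5. Inside a real kernel the same expression would use the built-in variables directly rather than parameters.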

1D grid of 2D blocks

There are several ways to do the indexing here. I’ll show you two methods.

`threadId = (blockIdx.x * blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x`

Let’s check the equation for Thread (2,1) in Block (1,0), where blockDim.x = 3 and blockDim.y = 2.

Thread ID = (1 * 3 * 2) + (1 * 3) + 2 = 6 + 3 + 2 = 11

Here (1 * 3 * 2) → counts all the threads in block 0

(1 * 3) → counts threads (0,0), (1,0) and (2,0) in block 1
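As a sketch of this first method, the host-side C++ function below names each term of the equation separately. `Vec3`, `threadId1D2D` and the local variable names are hypothetical; the parameters play the role of CUDA's built-in variables.

```cpp
// Hypothetical stand-in for CUDA's dim3/uint3, so the arithmetic runs on the host.
struct Vec3 { unsigned x, y, z; };

// 1D grid of 2D blocks, method 1: threads are numbered block by block.
unsigned threadId1D2D(Vec3 blockIdx, Vec3 blockDim, Vec3 threadIdx) {
    // All threads in the blocks before this one.
    unsigned threadsInEarlierBlocks = blockIdx.x * blockDim.x * blockDim.y;
    // Full rows of threads above this thread inside its own block.
    unsigned threadsInEarlierRows = threadIdx.y * blockDim.x;
    return threadsInEarlierBlocks + threadsInEarlierRows + threadIdx.x;
}
```

For Thread (2,1) in Block (1,0) with 3x2 blocks, the two partial counts are 6 and 3, and the function returns 11, matching the calculation above.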

Here is the second method.

`threadId = (gridDim.x * blockDim.x * threadIdx.y) + (blockDim.x * blockIdx.x) + threadIdx.x`

Let’s check the equation for Thread (2,1) in Block (1,0), where gridDim.x = 4 and blockDim.x = 3.

Thread ID = (4 * 3 * 1) + (1 * 3) + 2 = 12 + 3 + 2 = 17

Here (4 * 3 * 1) → counts threads (0,0), (1,0) and (2,0) in blocks 0, 1, 2 and 3

(1 * 3) → counts threads (0,1), (1,1) and (2,1) in block 0
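The second method can be sketched the same way in host-side C++. Again, `Vec3`, `threadIdRowMajor` and `rowWidth` are hypothetical names chosen for illustration, with the parameters standing in for CUDA's built-in variables.

```cpp
// Hypothetical stand-in for CUDA's dim3/uint3, so the arithmetic runs on the host.
struct Vec3 { unsigned x, y, z; };

// 1D grid of 2D blocks, method 2: threads are numbered row by row
// across the whole grid instead of block by block.
unsigned threadIdRowMajor(Vec3 gridDim, Vec3 blockIdx, Vec3 blockDim, Vec3 threadIdx) {
    // Threads in one full row that spans every block in the grid.
    unsigned rowWidth = gridDim.x * blockDim.x;
    return (rowWidth * threadIdx.y) + (blockDim.x * blockIdx.x) + threadIdx.x;
}
```

With gridDim.x = 4 and blockDim.x = 3, Thread (2,1) in Block (1,0) gets 17. Note that the first method gave the same thread the number 11: both schemes produce unique IDs, they just count in a different order. Method 1 keeps the threads of a block contiguous, while method 2 walks each row across all blocks.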

2D grid of 1D blocks

`blockId = (gridDim.x * blockIdx.y) + blockIdx.x`

`threadId = (blockId * blockDim.x) + threadIdx.x`

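Let’s check the equation for Thread (1,0) in Block (1,1), where gridDim.x = 2.

Block ID = (2 * 1) + 1 = 2 + 1 = 3

Thread ID = (3 * blockDim.x) + 1, where blockDim.x is the number of threads in each block.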
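Here is a minimal host-side C++ sketch of the two steps, first the block ID and then the thread ID. `Vec3` and the function names are hypothetical, and the parameters stand in for CUDA's built-in variables. The block width for this example is not stated in the text, so the value 3 used in the usage note is purely an assumption.

```cpp
// Hypothetical stand-in for CUDA's dim3/uint3, so the arithmetic runs on the host.
struct Vec3 { unsigned x, y, z; };

// Flatten a 2D block index into one number, counting row by row.
unsigned blockId2D(Vec3 gridDim, Vec3 blockIdx) {
    return (gridDim.x * blockIdx.y) + blockIdx.x;
}

// 2D grid of 1D blocks: skip the threads of all earlier blocks,
// then add this thread's position inside its own block.
unsigned threadId2D1D(Vec3 gridDim, Vec3 blockIdx, Vec3 blockDim, Vec3 threadIdx) {
    return (blockId2D(gridDim, blockIdx) * blockDim.x) + threadIdx.x;
}
```

For Block (1,1) with gridDim.x = 2, `blockId2D` returns 3, as in the worked example. If we assume blockDim.x = 3, Thread (1,0) in that block gets (3 * 3) + 1 = 10.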

2D grid of 2D blocks

`blockId = (gridDim.x * blockIdx.y) + blockIdx.x`

`threadId = (blockId * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x`

Let’s check the equation for Thread (2,1) in Block (0,1), where gridDim.x = 2, blockDim.x = 3 and blockDim.y = 2.

Block ID = (2 * 1) + 0 = 2

Thread ID = (2 * (3 * 2)) + (1 * 3) + 2 = 12 + 3 + 2 = 17
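To check that this equation really gives every thread its own number, the host-side C++ sketch below computes the ID and then walks over every block and thread in a grid. `Vec3` and all function names are hypothetical; the loops only imitate on the CPU what the GPU does in parallel, and gridDim.y = 2 in the usage note is my assumption since the text only gives gridDim.x.

```cpp
#include <vector>

// Hypothetical stand-in for CUDA's dim3/uint3, so the arithmetic runs on the host.
struct Vec3 { unsigned x, y, z; };

// Flatten a 2D block index into one number, counting row by row.
unsigned blockId2D(Vec3 gridDim, Vec3 blockIdx) {
    return (gridDim.x * blockIdx.y) + blockIdx.x;
}

// 2D grid of 2D blocks: threads of earlier blocks, plus full rows
// above this thread in its block, plus its position in the row.
unsigned threadId2D2D(Vec3 gridDim, Vec3 blockIdx, Vec3 blockDim, Vec3 threadIdx) {
    unsigned blockId = blockId2D(gridDim, blockIdx);
    return (blockId * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x;
}

// Visits every (block, thread) pair and returns true only if each ID
// falls in [0, total) and no ID is produced twice.
bool idsAreUnique(Vec3 gridDim, Vec3 blockDim) {
    unsigned total = gridDim.x * gridDim.y * blockDim.x * blockDim.y;
    std::vector<bool> seen(total, false);
    for (unsigned by = 0; by < gridDim.y; ++by)
        for (unsigned bx = 0; bx < gridDim.x; ++bx)
            for (unsigned ty = 0; ty < blockDim.y; ++ty)
                for (unsigned tx = 0; tx < blockDim.x; ++tx) {
                    unsigned id = threadId2D2D(gridDim, Vec3{bx, by, 0},
                                               blockDim, Vec3{tx, ty, 0});
                    if (id >= total || seen[id]) return false;
                    seen[id] = true;
                }
    return true;
}
```

With a 2x2 grid of 3x2 blocks, `threadId2D2D` returns 17 for Thread (2,1) in Block (0,1), and `idsAreUnique` returns true, meaning the 24 threads receive the numbers 0 to 23 with no repeats.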

Done ✅ I hope you can now construct the thread indexing equations for a 1D grid of 3D blocks and a 2D grid of 3D blocks on your own.
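If you want something to compare your answer against, here is one possible equation for a 1D grid of 3D blocks, written as a host-side C++ sketch. It simply extends the block-by-block counting of the first method by one more dimension; other orderings are equally valid. `Vec3` and `threadId1D3D` are hypothetical names, and the values in the usage note are made up for illustration.

```cpp
// Hypothetical stand-in for CUDA's dim3/uint3, so the arithmetic runs on the host.
struct Vec3 { unsigned x, y, z; };

// 1D grid of 3D blocks: threads of earlier blocks, plus full x-y layers
// below this thread, plus full rows in its layer, plus its x position.
unsigned threadId1D3D(Vec3 blockIdx, Vec3 blockDim, Vec3 threadIdx) {
    unsigned threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    return (blockIdx.x * threadsPerBlock)
         + (threadIdx.z * blockDim.x * blockDim.y)
         + (threadIdx.y * blockDim.x)
         + threadIdx.x;
}
```

For example, with 3x2x2 blocks, Thread (2,1,1) in Block 1 gets (1 * 12) + (1 * 6) + (1 * 3) + 2 = 23. For a 2D grid of 3D blocks, replace `blockIdx.x` with the block ID computed as in the 2D grid sections.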
