Kernels

In taichi.js, JavaScript functions that are compiled into WebGPU shaders are called "kernels". Kernels are created by passing a lambda function to ti.kernel(). For example:

let increment = ti.kernel(
    (f) => {
        return f + 1.0;
    }
)
console.log(await increment(42.0)) // logs 43.0

Here, taichi.js will compile the lambda (f) => {return f + 1.0;} into a WebGPU compute shader. Upon calling increment(42.0), the value 42.0 is passed to WebGPU. After the GPU computes the incremented value, the result is copied back to the CPU, producing the value 43.0.

Kernel Scope

In taichi.js, every variable that is accessed in a kernel needs to be explicitly declared via ti.addToKernelScope({ ... }). As an example, the following kernel adds N to its input argument, where N is a JavaScript-scope variable:

let N = 100
ti.addToKernelScope({ N })
let addN = ti.kernel(
    (f) => {
        return f + N;
    }
)
console.log(await addN(42.0)) // logs 142.0

If the line ti.addToKernelScope({ N }) is omitted, then, when compiling the lambda (f) => { return f + N; }, the compiler would not know what N is, and would thus throw an error.
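
For instance, the following sketch fails to compile (the exact error message may vary):

let N = 100
// ti.addToKernelScope({ N }) omitted, so N is not in the kernel scope
let addN = ti.kernel(
    (f) => {
        return f + N; // compile error: the compiler cannot resolve N
    }
)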

info

Notice that, in JavaScript, the notation { N } is a shorthand for { N: N }. So we are actually writing

ti.addToKernelScope({ N: N })

which means we are adding a variable named N into the kernel scope, whose value is equal to N. We can of course also write

ti.addToKernelScope({ M: N })

and then use M within the kernel.
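
For example, the addN kernel above could be written equivalently as:

let N = 100
ti.addToKernelScope({ M: N })
let addM = ti.kernel(
    (f) => {
        return f + M; // M is the kernel-scope name for the value of N, i.e. 100
    }
)
console.log(await addM(42.0)) // logs 142.0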

Automatic Parallelization

The most important feature of kernels is that they automatically parallelize JavaScript code. More specifically, every top-level ranged for-loop in a kernel is parallelized so that all loop indices are executed in parallel. As an example, the following code segment increments 1000 numbers in parallel on the GPU:

let x = ti.field(ti.i32, 1000)
ti.addToKernelScope({ x })
let k = ti.kernel(() => {
    for (let i of ti.range(1000)) {
        x[i] = x[i] + 1
    }
})
await k()

Here, the variable i loops over the range [0, 1000). The compiler recognizes this and automatically distributes these 1000 tasks across 1000 parallel GPU threads. This means that instead of iterating through the elements of x one by one, the kernel will increment every element at the same time.
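
Note that the parallelization applies to top-level loops only. A ranged for-loop nested inside another is executed serially within each GPU thread. A minimal sketch (reusing the field x from above):

let k2 = ti.kernel(() => {
    for (let i of ti.range(1000)) {     // top-level loop: parallelized across GPU threads
        for (let j of ti.range(10)) {   // nested loop: runs serially within each thread
            x[i] = x[i] + j
        }
    }
})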

In the previous example, we incremented every element of a 1D array in parallel. If you wish to parallelize tasks over higher-dimensional data structures, you can do this in taichi.js via ti.ndrange:

let x = ti.field(ti.i32, [1000, 1000])
ti.addToKernelScope({ x })
let k = ti.kernel(() => {
    for (let I of ti.ndrange(1000, 1000)) {
        x[I] = x[I] + 1
    }
})
await k()

In this example, the loop index I takes tuple values ranging from [0, 0] to [999, 999]. The taichi.js framework automatically allocates GPU threads to parallelize over these 1,000,000 different loop index values.
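
The components of I can also be used individually. The sketch below assumes that I, like the vectors in the Arguments section below, can be indexed per component:

let maskCorner = ti.kernel(() => {
    for (let I of ti.ndrange(1000, 1000)) {
        // I[0] is the first loop index, I[1] is the second
        if (I[0] < 500 && I[1] < 500) {
            x[I] = 0
        }
    }
})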

Arguments

As shown in the first example on this page, you can pass arguments to kernels. By default, arguments are of the type ti.f32. If you wish to pass arguments of other types, you will need to explicitly tell the compiler what the argument types are. As an example, the following kernel takes in a vector and returns the sum of its components:

let f = ti.kernel(
    {v: ti.types.vector(ti.f32, 3)},
    (v) => {
        return v[0] + v[1] + v[2];
    }
)
console.log(await f([1.0, 2.0, 3.0])) // logs 6.0

Admittedly, this type annotation syntax is a bit unconventional, and arguably cumbersome. However, it is a necessary evil that reconciles the lack of type annotations in JavaScript syntax with the strong type system of WebGPU.
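
The same annotation object can be used for scalar argument types. As a sketch, a kernel taking a 32-bit integer instead of the default ti.f32 might look like this:

let double = ti.kernel(
    {n: ti.i32},
    (n) => {
        return n * 2; // n is typed ti.i32 rather than the default ti.f32
    }
)
console.log(await double(21)) // logs 42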

Sequential Execution

Although the most important feature of kernels is the auto-parallelization of for-loops, it is sometimes useful for kernels to have a little bit of sequential logic. This is allowed in taichi.js by simply putting the logic at the top level of the kernel function. As an example, in the following kernel, we compute the sin() of the input argument t and accumulate the result into every element of a field:

let x = ti.field(ti.f32, [1000, 1000]) // f32 field, since sin(t) produces a float
ti.addToKernelScope({ x })
let k = ti.kernel((t) => {
    let s = ti.sin(t)
    for (let I of ti.ndrange(1000, 1000)) {
        x[I] = x[I] + s
    }
})
await k(0.5)

Here, since the ti.sin() call exists outside of the for-loop, it will be executed on a single GPU thread. Notice that the resulting value s can still be used freely within the parallel loop.

Asyncness

ti.kernel() always returns an async function. Semantically, this means that upon invoking this function, the GPU pipelines will be started, but the function will not wait for GPU operations to complete before returning. Instead, as soon as the GPU has started its work, the kernel returns a Promise object. Awaiting this promise ensures the completion of the GPU pipelines and fetches any value that the kernel returns.
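
For example, using the kernel k from the previous section:

let promise = k(0.5)  // starts the GPU pipelines and returns immediately
// other CPU-side work can happen here while the GPU is busy
await promise         // ensures completion (and fetches the return value, if any)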

Rendering

In addition to writing compute pipelines with auto-parallelized for-loops, ti.kernel can also be used to define WebGPU render pipelines. This will be discussed in detail in a later chapter.