🚀 GPU Kernel Codegen Quiz

1. What are the two main GPU architectures supported by this codegen?

Correct Answer: CUDA and Metal
The code handles two GPU architectures through the `GPUArch` enum: `GPUArch::CUDA` and `GPUArch::Metal`. Kernel generation emits different syntax for each architecture.
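As a rough illustration of per-architecture syntax, the sketch below picks the qualifier that opens a kernel definition. The enum mirrors the variant names from the answer; `kernel_qualifier` is a hypothetical helper, not a function from the codegen, and the real generator likely differs in more places than this one keyword.

```rust
/// Mirrors the two variants named in the answer above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum GPUArch {
    CUDA,
    Metal,
}

/// Hypothetical helper: the qualifier that starts a kernel definition.
/// CUDA marks kernels with `__global__`; Metal marks them with `kernel`.
fn kernel_qualifier(arch: GPUArch) -> &'static str {
    match arch {
        GPUArch::CUDA => "__global__ void",
        GPUArch::Metal => "kernel void",
    }
}
```

Calling `kernel_qualifier(GPUArch::Metal)` yields `"kernel void"`, which a generator could prepend to the kernel name and parameter list.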

2. What is the purpose of the `split_kernels` function?

Correct Answer: To break the computation graph into separate GPU kernels
The `split_kernels` function takes a computation graph and splits it into multiple kernel graphs, tracking the dependencies between them. This is necessary because threads in different blocks of a single kernel cannot cheaply synchronize with one another, so work that depends on results from the whole grid must run in a later kernel.

3. What does the `var_to_char` function do?

Correct Answer: Generates unique variable names using letters
The function converts integer variable IDs to letter-based names (a, b, c, ..., z, aa, ab, etc.). It uses base-26 encoding to create readable variable names in the generated kernel code.
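One way to produce the sequence described (a, ..., z, aa, ab, ...) is bijective base-26, where there is no zero digit and so "aa" follows "z". The sketch below is an assumption about the behavior, written from the description only; the actual `var_to_char` may be implemented differently.

```rust
/// Sketch of a letter-based naming scheme: 0 -> "a", 25 -> "z", 26 -> "aa".
/// Written from the quiz description; not the codegen's actual source.
fn var_to_char(mut id: usize) -> String {
    let mut letters = Vec::new();
    loop {
        // Lowest-order letter first.
        letters.push(b'a' + (id % 26) as u8);
        id /= 26;
        if id == 0 {
            break;
        }
        // Bijective step: shift so the next digit also starts at 'a'.
        id -= 1;
    }
    letters.reverse();
    String::from_utf8(letters).expect("only ASCII letters are pushed")
}
```

With this scheme, IDs 0 through 25 map to single letters and ID 26 starts the two-letter names.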

4. What is the maximum number of loop levels before switching to explicit for-loops?

Correct Answer: 6
The code uses GPU thread dimensions (`blockIdx.x/y/z` and `threadIdx.x/y/z`) for the first 6 loop levels, then switches to explicit for-loops for deeper nesting. This is evident in the condition `if *loop_level < 6`.
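The mapping can be pictured with a small sketch. The ordering of the six hardware indices and the `loop_N` naming for deeper levels are assumptions made for illustration; only the `< 6` threshold comes from the answer above.

```rust
/// Hypothetical helper: the index expression used for a given loop level.
/// Levels 0..6 map onto CUDA's built-in indices (assumed order);
/// deeper levels get a named variable for an explicit for-loop.
fn loop_index_expr(loop_level: usize) -> String {
    const HARDWARE_INDICES: [&str; 6] = [
        "blockIdx.x",
        "blockIdx.y",
        "blockIdx.z",
        "threadIdx.x",
        "threadIdx.y",
        "threadIdx.z",
    ];
    if loop_level < 6 {
        HARDWARE_INDICES[loop_level].to_string()
    } else {
        // Placeholder name for the loop variable of an emitted for-loop.
        format!("loop_{}", loop_level)
    }
}
```

Under these assumptions, level 5 is the last one served by hardware (`threadIdx.z`), and level 6 is the first to need a for-loop in the kernel body.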

5. What does `GraphTerm::NewAcc` represent?

Correct Answer: A new accumulator variable for reductions
`NewAcc` creates an accumulator array with a starting value, used for reduction operations. The code calculates the size needed and initializes the accumulator with the starting value.
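A minimal sketch of what emitting such an accumulator could look like follows. The helper name `emit_new_acc`, the `float` element type, and the exact shape of the generated text are all assumptions; the real codegen computes the size from the graph rather than taking it as an argument.

```rust
/// Hypothetical emitter: declares an accumulator array of `size` elements
/// and fills it with `starting_value` (for example "0.0f" for a sum).
fn emit_new_acc(name: &str, size: usize, starting_value: &str) -> String {
    format!(
        "float {name}[{size}];\nfor (int i = 0; i < {size}; ++i) {{ {name}[i] = {init}; }}\n",
        name = name,
        size = size,
        init = starting_value
    )
}
```

For a sum reduction the starting value would be zero, while a max reduction would start from a very small value such as negative infinity.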

6. What is the difference between `SMEMLoad` and `SMEMRead`?

Correct Answer: SMEMLoad loads from global memory to shared memory, SMEMRead reads from shared memory
`SMEMLoad` copies data from global memory to shared memory with a sync barrier, while `SMEMRead` just reads from the shared memory location that was previously loaded.
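The contrast can be sketched as two small emitters that produce CUDA text. Both function names, the shared index expression, and the `float` type are hypothetical; the sketch only illustrates that the load is followed by a barrier and the read is not.

```rust
/// Hypothetical emitter for a global-to-shared copy followed by a barrier,
/// so every thread in the block sees the loaded data before reading it.
fn emit_smem_load(smem: &str, gmem: &str, index: &str) -> String {
    format!("{}[{}] = {}[{}];\n__syncthreads();\n", smem, index, gmem, index)
}

/// Hypothetical emitter for a plain read from shared memory; no barrier,
/// because the data is assumed to have been loaded and synced already.
fn emit_smem_read(dst: &str, smem: &str, index: &str) -> String {
    format!("float {} = {}[{}];\n", dst, smem, index)
}
```

On Metal the barrier would be spelled `threadgroup_barrier(mem_flags::mem_threadgroup);` instead of `__syncthreads();`.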

7. How are kernel dependencies handled in the meta-graph?

Correct Answer: Through global memory buffers and topological sorting
The code creates a meta-graph of kernel dependencies, using `GMEMBuffer::PrevKernel` to track outputs from previous kernels as inputs to subsequent kernels. Topological sorting ensures correct execution order.
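The ordering step can be illustrated with Kahn's algorithm over kernel indices. This is a generic sketch, not the codegen's implementation: the `(producer, consumer)` edge list and the function name `kernel_order` are assumptions standing in for the meta-graph.

```rust
use std::collections::VecDeque;

/// Orders kernels so every producer runs before its consumers.
/// `deps` holds hypothetical (producer, consumer) pairs.
/// Returns None if the dependencies contain a cycle.
fn kernel_order(num_kernels: usize, deps: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut indegree = vec![0usize; num_kernels];
    let mut consumers = vec![Vec::new(); num_kernels];
    for &(producer, consumer) in deps {
        consumers[producer].push(consumer);
        indegree[consumer] += 1;
    }
    // Kernels with no unmet inputs can launch first.
    let mut ready: VecDeque<usize> =
        (0..num_kernels).filter(|&k| indegree[k] == 0).collect();
    let mut order = Vec::with_capacity(num_kernels);
    while let Some(k) = ready.pop_front() {
        order.push(k);
        for &c in &consumers[k] {
            indegree[c] -= 1;
            if indegree[c] == 0 {
                ready.push_back(c);
            }
        }
    }
    if order.len() == num_kernels {
        Some(order)
    } else {
        None
    }
}
```

For example, if kernel 2 reads buffers written by kernels 0 and 1, the only valid orders place 2 last.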

8. What does the `stride` parameter in `LoopIn`/`LoopOut` control?

Correct Answer: Memory offset for each loop iteration
The stride determines how far pointer addresses are offset on each loop iteration. Accumulator strides, detected with `stride.is_acc()`, are treated differently to support reduction operations.
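For ordinary strides, the effect across nested loops is the usual sum of index times stride. The sketch below assumes strides are plain integers, which is a simplification; it does not model accumulator strides, and `element_offset` is a hypothetical name.

```rust
/// Offset of one element, given the current index of each enclosing loop
/// and the stride attached to that loop (assumed to be plain integers).
fn element_offset(indices: &[usize], strides: &[usize]) -> usize {
    indices.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For a row-major matrix with 4 columns, the outer loop has stride 4 and the inner loop has stride 1, so element (2, 3) sits at offset 2 * 4 + 3 * 1 = 11.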

9. What triggers the creation of a new kernel in `split_kernels`?

Correct Answer: Loop dependencies between grid/block level operations
A new kernel is created when there's a dependency between `LoopOut` and `LoopIn` operations at the grid/block level (less than 3 levels deep), as these require synchronization across the entire GPU.

10. What is the purpose of the `toposort_subset` function?

Correct Answer: To topologically sort only a subset of graph nodes
This function performs topological sorting on a specified subset of nodes from the graph, maintaining dependency order while only considering the nodes in the given subset. This is used when processing individual kernel subgraphs.
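The idea can be sketched with Kahn's algorithm restricted to the chosen nodes: edges that touch a node outside the subset are ignored. The edge-list representation and the signature are assumptions for illustration; the actual `toposort_subset` operates on the codegen's own graph type.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Topologically sorts only the nodes in `subset`.
/// `edges` holds hypothetical (source, destination) pairs for the full graph.
fn toposort_subset(edges: &[(usize, usize)], subset: &BTreeSet<usize>) -> Vec<usize> {
    // Count only edges whose endpoints are both inside the subset.
    let mut indegree: BTreeMap<usize, usize> = subset.iter().map(|&n| (n, 0)).collect();
    for &(src, dst) in edges {
        if subset.contains(&src) && subset.contains(&dst) {
            *indegree.get_mut(&dst).unwrap() += 1;
        }
    }
    let mut ready: VecDeque<usize> = subset
        .iter()
        .copied()
        .filter(|n| indegree[n] == 0)
        .collect();
    let mut order = Vec::new();
    while let Some(n) = ready.pop_front() {
        order.push(n);
        for &(src, dst) in edges {
            if src == n && subset.contains(&dst) {
                let d = indegree.get_mut(&dst).unwrap();
                *d -= 1;
                if *d == 0 {
                    ready.push_back(dst);
                }
            }
        }
    }
    order
}
```

In this sketch a node whose only predecessors lie outside the subset counts as a root, which matches the case of a kernel subgraph whose inputs come from earlier kernels.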
