Free NVIDIA NCA-GENL Actual Exam Questions - Question 7 Discussion

Question No. 7
In the context of data preprocessing for Large Language Models (LLMs), what does tokenization refer to?
Daniel Y.
2026-02-20

A or B? I get that tokenization is mainly splitting text into tokens (A), but in LLM pipelines people sometimes fold the follow-on step that maps tokens to numeric IDs into "tokenization", which would be B. Strictly speaking, though, tokenization itself is the splitting, not the number conversion, so A fits better if we keep those steps separate. C and D are definitely out, since they're different preprocessing tasks.
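A minimal Python sketch of the two steps this comment separates, assuming a toy regex splitter and a vocabulary built on the fly. The names `tokenize`, `build_vocab`, and `encode` are hypothetical illustrations, not any particular LLM library's API: the first step produces tokens (what A describes), the second turns them into IDs (what B describes).

```python
import re


def tokenize(text: str) -> list[str]:
    # Step 1, tokenization in the strict sense: split raw text into tokens.
    # Hypothetical rule: each word and each punctuation mark becomes a token.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(tokens: list[str]) -> dict[str, int]:
    # Assign each distinct token an integer ID in order of first appearance.
    # ID 0 is reserved for unknown tokens.
    vocab = {"<unk>": 0}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    # Step 2, numericalization: map each token to its ID (unknown -> 0).
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


tokens = tokenize("Tokenization splits text, then IDs follow.")
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)
print(tokens)  # ['tokenization', 'splits', 'text', ',', 'then', 'ids', 'follow', '.']
print(ids)     # [1, 2, 3, 4, 5, 6, 7, 8]
```

Keeping `tokenize` and `encode` as separate functions mirrors the "strictly speaking" reading: you can inspect the token list before any numbers exist.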

Daniel Y.
2026-02-15

A, since tokenization is fundamentally about chopping text into pieces before any conversion to numbers.

Daniel Y.
2026-02-01

I’d say tokenization is about breaking text down, so A feels right over B.

Ryan K.
2026-01-28

Exactly, tokenization means breaking text into chunks, so A.

Luke M.
2026-01-25

Agreed, tokenization is about breaking text up, so A it is.

Luke M.
2026-01-23

Makes sense to rule out B and C since they describe other steps. Tokenization is really just chopping text up, so A’s the way to go.

John J.
2026-01-16

Probably A. Tokenization usually means breaking text into words or smaller parts, not turning them into numbers or removing stop words. D seems off too, more like augmentation.
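To illustrate "smaller parts" and why stop-word removal is a different step, here is a simplified sketch of greedy longest-match subword splitting in the spirit of WordPiece, followed by a separate stop-word filter. The vocabulary, the stop-word set, and the function names are hypothetical toy values chosen for this example; real LLM tokenizers use learned vocabularies with tens of thousands of entries.

```python
def subword_tokenize(word: str, vocab: set[str]) -> list[str]:
    # Greedy longest-match-first split of one word into subword pieces.
    # Continuation pieces carry a "##" prefix, WordPiece-style.
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece fits: treat the whole word as unknown
        pieces.append(match)
        start = end
    return pieces


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    # A separate, optional cleanup step: it filters tokens, it does not create them.
    return [t for t in tokens if t not in stop_words]


VOCAB = {"token", "##ization", "##s", "the", "split", "text"}  # toy vocabulary
STOP_WORDS = {"the"}  # toy stop-word list

words = "the tokenization splits text".split()
tokens = [p for w in words for p in subword_tokenize(w, VOCAB)]
print(tokens)  # ['the', 'token', '##ization', 'split', '##s', 'text']
print(remove_stop_words(tokens, STOP_WORDS))  # ['token', '##ization', 'split', '##s', 'text']
```

Note that the tokenizer keeps "the" as a token; dropping it only happens if a pipeline chooses to run the separate filter afterwards.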
