Tokenization
Breaking text into small chunks (tokens) that an AI model can read and process.
In Plain English
Tokenization is how AI language models prepare text for processing. Before an AI can understand or respond to your question, it breaks your words into bite-sized pieces called tokens: whole words, parts of words, or punctuation marks. Think of it like a scanner at a grocery store reading barcodes instead of looking at entire products. Different AI models use different tokenization rules, which is why the same sentence might break into different-sized chunks depending on which AI system reads it. This step matters because it determines how the AI "sees" your input and directly affects how well it understands and responds to you.
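To see this in action, here is a minimal sketch using the open-source tiktoken library (our choice for illustration; the explanation above doesn't name a specific tokenizer). It runs the same sentence through two different encodings and prints how each one splits it:

```python
# A minimal sketch, assuming the tiktoken library is installed
# (pip install tiktoken). "gpt2" and "cl100k_base" are two real
# tiktoken encodings; the sample sentence is illustrative.
import tiktoken

sentence = "Tokenization splits text into chunks."

# Two different tokenization rule sets split the same sentence differently.
for name in ("gpt2", "cl100k_base"):
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(sentence)                 # token IDs the model would see
    pieces = [enc.decode([i]) for i in ids]    # readable text of each token
    print(f"{name}: {len(ids)} tokens -> {pieces}")
```

Running this typically shows the two encodings producing different piece boundaries, and often different token counts, for identical input.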
💡 Real-World Example
When you type "I can't wait for coffee!" into an AI chatbot, tokenization breaks it into pieces the model can process, perhaps "I", " can", "'t", " wait", " for", " coffee", "!" (leading spaces often attach to the token that follows). The AI counts these tokens because every model has a context window, a hard limit on how many tokens it can handle in one conversation; a complex request might use hundreds of tokens, while a simple one uses just a handful.
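A short, hedged sketch of that counting step, again assuming tiktoken as the tokenizer; the exact pieces for this sentence depend on which model's encoding is used:

```python
# Counting tokens before sending a message, using tiktoken's
# cl100k_base encoding as an example; real splits vary by model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
message = "I can't wait for coffee!"
ids = enc.encode(message)

print([enc.decode([i]) for i in ids])  # the pieces the model actually "sees"
print(f"{len(ids)} tokens used")       # counted against the context-window limit
```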
