AI Foresights — A New Dawn Is Here
Techniques · Last updated: April 2026

LLM inference

The process of running a trained language model to produce an answer or output based on input text.

In Plain English

LLM inference is what happens when you ask an AI model like ChatGPT a question and it generates a response. The model was trained once (in the past) on vast amounts of text, and inference is the act of using that training to respond to you in real time. Think of it like the difference between a chef learning recipes in culinary school versus actually cooking a meal for a customer—the training happened long ago, but inference is the cooking happening now. Inference requires computational power, and its cost depends on the length of your question and the length of the answer, measured in tokens (small chunks of words).
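The idea that cost scales with input and output length can be sketched in a few lines. This is a toy estimate, not a real pricing formula; the per-token prices below are made-up placeholders.

```python
# Toy sketch: inference cost grows with both input and output length.
# price_in / price_out are invented placeholder rates, not real vendor prices.

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in: float = 0.5e-6, price_out: float = 1.5e-6) -> float:
    """Estimated dollar cost for one inference request."""
    return input_tokens * price_in + output_tokens * price_out

# A short question with a long answer is dominated by the output side.
short_q_long_a = estimate_cost(input_tokens=50, output_tokens=400)
print(f"${short_q_long_a:.6f}")  # → $0.000625
```

Output tokens are typically priced higher than input tokens because each one requires a fresh pass through the model, which is why long answers cost more than long questions.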

💡Real-World Example

When you type a question into ChatGPT and hit Enter, the model performs inference—it reads your words and generates a response one token at a time. That process runs on powerful servers, is billed by the number of tokens processed, and finishes in seconds.
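The one-token-at-a-time loop described above can be sketched as follows. This is a toy stand-in, not a real model: `toy_next_token` replaces the expensive forward pass of an actual LLM, but the loop structure—generate a token, append it to the context, repeat until a stop token—is the same shape real inference takes.

```python
import random

# Stand-in for a real model's forward pass: picks the "next token"
# deterministically from a tiny vocabulary based on the context length.
def toy_next_token(context: list[str]) -> str:
    vocab = ["the", "model", "answers", "here", "<eos>"]
    rng = random.Random(len(context))  # deterministic toy, not a real model
    return rng.choice(vocab)

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        nxt = toy_next_token(tokens)   # one "forward pass" per new token
        if nxt == "<eos>":             # stop token ends generation early
            break
        tokens.append(nxt)
    return tokens

print(generate(["what", "is", "inference", "?"]))
```

Because every new token requires another pass through the model with the whole context so far, longer answers take proportionally more compute—which is exactly why output length drives inference cost.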

