RAG workflow

This is most asked question in any AI Interview (Answer shared below)

Two users ask the exact same question to your LLM-powered application, but they receive slightly different answers each time.

Why does this happen even when the input is identical?

What factors inside the LLM pipeline can cause this?

How do you solve this?

Answer: This usually happens because LLMs are probabilistic models, not rule-based systems.

Even for the same input, the model predicts the next most likely token at each step, and small randomness in token selection can lead to slightly different responses.

Some common reasons are:

Temperature / top-p settings → higher values increase randomness
Non-deterministic sampling → model may choose different valid next words
RAG retrieval differences → different chunks may be fetched from the vector DB
Prompt variations → hidden system prompts, chat history, or metadata may differ
Model version / load balancing → requests may hit different model snapshots
Tool or agent pipeline changes → routing logic may choose different paths

How to reduce this in production? Use temperature = 0 or very low, keep prompts fixed, make retrieval deterministic, pin the model version, and standardize the pipeline flow. This helps make responses more consistent and production-ready.

Contents

RAG workflow

How do you solve this?