Sampling

In MCP, the client is responsible for executing LLM sampling requests initiated by servers. Unlike in traditional architectures, where the server holds the model connection, the client:

  • Owns model access and API keys
  • Applies local policies (cost, privacy, rate limits)
  • Mediates all interaction with language models

Servers never communicate with LLMs directly; they only request sampling.

Important mental model

  • Server requests sampling
  • Client executes sampling
  • Client decides:
    • which model to use
    • whether tools are supported
    • how prompts are handled
  • Client returns structured results back to the server

Client Configuration

Sampling support must be explicitly enabled on the client:

let mut client = Client::new()
    .with_options(|opt| opt
        .with_sampling(|s| s.with_tools()));
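
Here, with_sampling() advertises the sampling capability during initialization, while with_tools() additionally signals that the handler may answer with tool calls (see Tool Use below).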

Sampling Handler

To support sampling, a client must define a handler annotated with the #[sampling] attribute macro. This handler receives a CreateMessageRequestParams object and returns a CreateMessageResult.

use neva::prelude::*;

#[sampling]
async fn sampling_handler(params: CreateMessageRequestParams) -> CreateMessageResult {
    println!("Received sampling request: {:?}", params);

    // Client-side sampling logic goes here: run the prompt through
    // an LLM and map its response into a result.
    CreateMessageResult::assistant()
        .with_model("gpt-5")
        .with_content("...")
        .end_turn()
}

The handler is invoked every time a server calls Context::sample().
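For orientation, a server-side call might look like the sketch below. Only the existence of Context::sample() is established above; the #[tool] attribute and the way the request is built and its result unpacked are assumptions, so treat this as pseudocode for the flow rather than the library's confirmed surface.

// Hypothetical server-side tool that delegates generation to the
// client via sampling. The calls below are illustrative assumptions.
#[tool]
async fn summarize(ctx: Context, text: String) -> String {
    // Ask the client to run an LLM over the given prompt.
    let result = ctx.sample(format!("Summarize: {text}")).await;
    // Mapping the sampling result back to text is equally
    // API-dependent; Debug formatting is a stand-in here.
    format!("{result:?}")
}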

Inspecting Sampling Requests

The incoming CreateMessageRequestParams contains:

  • Prompt messages
  • System prompt
  • Model preferences
  • Tools metadata
  • Previous tool results (for multi-step sampling)

Accessing text prompts

let prompts: Vec<&TextContent> = params.text().collect();

This includes all user and assistant text messages accumulated so far.
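For example, a handler can flatten these into a single prompt string before calling a model. This is a naive strategy that drops role structure, and it assumes TextContent exposes its payload as a text field:

// Join all accumulated text messages into one prompt string.
// Real clients usually preserve the user/assistant role structure.
let prompt = params.text()
    .map(|t| t.text.as_str())
    .collect::<Vec<_>>()
    .join("\n");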

Detecting tool usage requests

The client can check whether the server allows or expects tool usage via tool_choice:

if params.tool_choice.is_some_and(|c| !c.is_none()) {
    // Model is allowed or required to call tools
}

This allows the client to decide whether to produce tool calls or final text.

Tool Use

If tools are enabled, the client may respond with a tool invocation request instead of plain text.

CreateMessageResult::assistant()
    .with_model("gpt-5")
    .use_tools([
        ("get_weather", ("city", "London"))
    ])

Note
  • Tool execution is always performed by the server
  • The client only returns intent to call tools
  • Tool arguments must match the tool schema
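
Because this result does not call end_turn(), the sampling exchange stays open: the server executes the requested tools and then continues the loop with a follow-up request, as described next.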

Handling Tool Results

After the server executes tools, it will issue a follow-up sampling request containing tool results.

These results are available via:

let results: Vec<&ToolResult> = params.results().collect();

At this stage, the client should typically:

  • Interpret tool outputs (see the sketch below)
  • Generate a final assistant response
  • End the sampling turn
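
For the first two steps, a handler might fold the outputs into a context string for the final completion. This sketch assumes ToolResult implements Debug; a real client would map each result into its provider's message format:

// Gather the server-produced tool outputs.
let results: Vec<&ToolResult> = params.results().collect();

// Fold them into one context string for the final completion.
let tool_context = results.iter()
    .map(|r| format!("{r:?}"))
    .collect::<Vec<_>>()
    .join("\n");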

Producing Final Responses

To return a normal assistant message and end the sampling loop:

CreateMessageResult::assistant()
    .with_model("gpt-5")
    .with_content("Final response text")
    .end_turn()

Calling end_turn() signals to the server that sampling is complete.
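
Putting the pieces together, a handler typically branches on the phase of the exchange: return a tool call first, then a final message once results arrive. The sketch below combines the snippets above; the model name, the weather tool, and the result summary are placeholders:

use neva::prelude::*;

#[sampling]
async fn sampling_handler(params: CreateMessageRequestParams) -> CreateMessageResult {
    // Phase 2: tool results have arrived - interpret them and finish.
    let results: Vec<&ToolResult> = params.results().collect();
    if !results.is_empty() {
        return CreateMessageResult::assistant()
            .with_model("gpt-5")
            .with_content(format!("Answer based on {} tool result(s).", results.len()))
            .end_turn();
    }

    // Phase 1: tools are allowed - return the intent to call one.
    if params.tool_choice.is_some_and(|c| !c.is_none()) {
        return CreateMessageResult::assistant()
            .with_model("gpt-5")
            .use_tools([("get_weather", ("city", "London"))]);
    }

    // No tools involved - answer directly and end the turn.
    CreateMessageResult::assistant()
        .with_model("gpt-5")
        .with_content("Final response text")
        .end_turn()
}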

When to Customize Client Sampling

Consider custom sampling logic when:

  • You need to integrate proprietary or local models
  • You want fine-grained cost or latency control
  • You want to apply prompt filtering or auditing (sketched below)
  • You need deterministic or policy-driven responses
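
As an illustration of the filtering case, a handler can inspect the prompt before doing any model work. The blocklist and refusal text are placeholders, and the TextContent text field is the same assumption as earlier:

use neva::prelude::*;

#[sampling]
async fn audited_handler(params: CreateMessageRequestParams) -> CreateMessageResult {
    // Placeholder policy: refuse prompts containing blocked terms.
    let blocked = ["password", "api_key"];
    let violates = params.text()
        .any(|t| blocked.iter().any(|term| t.text.contains(term)));

    if violates {
        // Deterministic, policy-driven refusal; no model is called.
        return CreateMessageResult::assistant()
            .with_model("client-policy")
            .with_content("Request rejected by client-side policy.")
            .end_turn();
    }

    // Otherwise, fall through to normal sampling logic...
    CreateMessageResult::assistant()
        .with_model("gpt-5")
        .with_content("Final response text")
        .end_turn()
}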

Learn By Example

A complete working example is available here.