Sampling

In MCP, the client is responsible for executing LLM sampling requests initiated by servers. Unlike in traditional architectures, where the server holds the model connection, the client:

  • Owns model access and API keys
  • Applies local policies (cost, privacy, rate limits)
  • Mediates all interaction with language models

Servers never communicate with LLMs directly; they only request sampling.

Important mental model

  • Server requests sampling
  • Client executes sampling
  • Client decides:
    • which model to use
    • whether tools are supported
    • how prompts are handled
  • Client returns structured results back to the server

Client Configuration

Sampling support must be explicitly enabled on the client:

let mut client = Client::new()
    .with_options(|opt| opt
        .with_sampling(|s| s.with_tools()));
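
Here, with_sampling() advertises the sampling capability during initialization, while with_tools() additionally signals that the handler may answer with tool calls (see Tool Use below).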

Sampling Handler

To support sampling, a client must define a handler annotated with the #[sampling] attribute macro. This handler receives a CreateMessageRequestParams object and returns a CreateMessageResult.

use neva::prelude::*;

#[sampling]
async fn sampling_handler(params: CreateMessageRequestParams) -> CreateMessageResult {
    println!("Received sampling request: {:?}", params);

    // Client-side sampling logic goes here: run the prompt through
    // an LLM and map its response into a result.
    CreateMessageResult::assistant()
        .with_model("gpt-5")
        .with_content("...")
        .end_turn()
}

The handler is invoked every time a server calls Context::sample().
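For orientation, a server-side call might look like the sketch below. Only the existence of Context::sample() is established above; the #[tool] attribute and the way the request is built and its result unpacked are assumptions, so treat this as pseudocode for the flow rather than the library's confirmed surface.

// Hypothetical server-side tool that delegates generation to the
// client via sampling. The calls below are illustrative assumptions.
#[tool]
async fn summarize(ctx: Context, text: String) -> String {
    // Ask the client to run an LLM over the given prompt.
    let result = ctx.sample(format!("Summarize: {text}")).await;
    // Mapping the sampling result back to text is equally
    // API-dependent; Debug formatting is a stand-in here.
    format!("{result:?}")
}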

Inspecting Sampling Requests

The incoming CreateMessageRequestParams contains:

  • Prompt messages
  • System prompt
  • Model preferences
  • Tools metadata
  • Previous tool results (for multi-step sampling)

Accessing text prompts

let prompts: Vec<&TextContent> = params.text().collect();

This includes all user and assistant text messages accumulated so far.
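For example, a handler can flatten these into a single prompt string before calling a model. This is a naive strategy that drops role structure, and it assumes TextContent exposes its payload as a text field:

// Join all accumulated text messages into one prompt string.
// Real clients usually preserve the user/assistant role structure.
let prompt = params.text()
    .map(|t| t.text.as_str())
    .collect::<Vec<_>>()
    .join("\n");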

Detecting tool usage requests

The client can check whether the server allows or expects tool usage via tool_choice:

if params.tool_choice.is_some_and(|c| !c.is_none()) {
    // Model is allowed or required to call tools
}

This allows the client to decide whether to produce tool calls or final text.

Tool Use

If tools are enabled, the client may respond with a tool invocation request instead of plain text.

CreateMessageResult::assistant()
    .with_model("gpt-5")
    .use_tools([
        ("get_weather", ("city", "London"))
    ])

Note
  • Tool execution is always performed by the server
  • The client only returns intent to call tools
  • Tool arguments must match the tool schema
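
Because this result does not call end_turn(), the sampling exchange stays open: the server executes the requested tools and then continues the loop with a follow-up request, as described next.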

Handling Tool Results

After the server executes tools, it will issue a follow-up sampling request containing tool results.

These results are available via:

let results: Vec<&ToolResult> = params.results().collect();

At this stage, the client should typically:

  • Interpret tool outputs (see the sketch below)
  • Generate a final assistant response
  • End the sampling turn
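
For the first two steps, a handler might fold the outputs into a context string for the final completion. This sketch assumes ToolResult implements Debug; a real client would map each result into its provider's message format:

// Gather the server-produced tool outputs.
let results: Vec<&ToolResult> = params.results().collect();

// Fold them into one context string for the final completion.
let tool_context = results.iter()
    .map(|r| format!("{r:?}"))
    .collect::<Vec<_>>()
    .join("\n");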

Producing Final Responses

To return a normal assistant message and end the sampling loop:

CreateMessageResult::assistant()
    .with_model("gpt-5")
    .with_content("Final response text")
    .end_turn()

Calling end_turn() signals to the server that sampling is complete.
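
Putting the pieces together, a handler typically branches on the phase of the exchange: return a tool call first, then a final message once results arrive. The sketch below combines the snippets above; the model name, the weather tool, and the result summary are placeholders:

use neva::prelude::*;

#[sampling]
async fn sampling_handler(params: CreateMessageRequestParams) -> CreateMessageResult {
    // Phase 2: tool results have arrived - interpret them and finish.
    let results: Vec<&ToolResult> = params.results().collect();
    if !results.is_empty() {
        return CreateMessageResult::assistant()
            .with_model("gpt-5")
            .with_content(format!("Answer based on {} tool result(s).", results.len()))
            .end_turn();
    }

    // Phase 1: tools are allowed - return the intent to call one.
    if params.tool_choice.is_some_and(|c| !c.is_none()) {
        return CreateMessageResult::assistant()
            .with_model("gpt-5")
            .use_tools([("get_weather", ("city", "London"))]);
    }

    // No tools involved - answer directly and end the turn.
    CreateMessageResult::assistant()
        .with_model("gpt-5")
        .with_content("Final response text")
        .end_turn()
}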

When to Customize Client Sampling

Consider custom sampling logic when:

  • You need to integrate proprietary or local models
  • You want fine-grained cost or latency control
  • You want to apply prompt filtering or auditing (sketched below)
  • You need deterministic or policy-driven responses
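
As an illustration of the filtering case, a handler can inspect the prompt before doing any model work. The blocklist and refusal text are placeholders, and the TextContent text field is the same assumption as earlier:

use neva::prelude::*;

#[sampling]
async fn audited_handler(params: CreateMessageRequestParams) -> CreateMessageResult {
    // Placeholder policy: refuse prompts containing blocked terms.
    let blocked = ["password", "api_key"];
    let violates = params.text()
        .any(|t| blocked.iter().any(|term| t.text.contains(term)));

    if violates {
        // Deterministic, policy-driven refusal; no model is called.
        return CreateMessageResult::assistant()
            .with_model("client-policy")
            .with_content("Request rejected by client-side policy.")
            .end_turn();
    }

    // Otherwise, fall through to normal sampling logic...
    CreateMessageResult::assistant()
        .with_model("gpt-5")
        .with_content("Final response text")
        .end_turn()
}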

Learn By Example

A complete working example is available here.