A CLI to evaluate the performance of MCP servers
- Export your OpenRouter API key as the OPENROUTER_API_KEY environment variable
$ export OPENROUTER_API_KEY=<your-key>
- Write your test case in myserver.yml
test_cases:
  - name: "Open a contribution PR on Github"
    input_prompt: "I'd like to contribute to mcp-eval. I want to enable ... feature. I'll let you go ahead and implement the feature as you see fit. Open a pull request with the proposed modification once you're done."
    expected_tool_call:
      tool_name: "open-pr"
      parameters:
        branch: "new-feature"
- Run your test suite
$ npx -y @alpic-ai/mcp-eval@latest run --url=https://mcp.github.com ./myserver.yml
- Et voilà 🎉!
- Node.js >= 22
- Streamable HTTP-compatible MCP server
$ npm install -g @alpic-ai/mcp-eval
$ mcp-eval COMMAND
running command...
$ mcp-eval (--version)
@alpic-ai/mcp-eval/0.6.0 darwin-arm64 node-v22.17.1
$ mcp-eval --help [COMMAND]
USAGE
$ mcp-eval COMMAND
...
Run the test suite described in the provided YAML file.
USAGE
$ mcp-eval run TESTFILE -u <value> [-a anthropic/claude]
ARGUMENTS
TESTFILE YAML file path containing the test suite
FLAGS
-a, --assistant=<option> [default: anthropic/claude] Assistant configuration to use (impact model and system prompt)
<options: anthropic/claude>
-u, --url=<value> (required) URL of the MCP server
DESCRIPTION
Run the test suite described in the provided YAML file.
EXAMPLES
$ mcp-eval run
See code: src/commands/run.ts
Test suites are written in YAML.
A test suite file must have a root test_cases property containing at least one test.
Each test requires:
- name: a convenient name for your test
- input_prompt: the initial prompt to send to the assistant, from which the response is evaluated
- expected_tool_call: an object describing the tool call the assistant is expected to make, with:
  - tool_name: the name of the tool to be called, as advertised by the MCP server
  - parameters: the set of parameters the tool is expected to be called with. Only the properties specified here are checked against the actual tool call. Extra properties set by the model do not cause the test to fail.
test_cases:
  - name: "Find flights from Paris to Tokyo"
    input_prompt: "I'd like to plan a trip to Tokyo, Japan. Find me a flight from Paris to Tokyo on October 3rd and returning on October 5th."
    expected_tool_call:
      tool_name: "search-flight"
      parameters:
        flyFrom: Paris
        flyTo: Tokyo
        departureDate: 03/10/2025
        returnDate: 05/10/2025
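
The partial parameter matching described above can be pictured with a short TypeScript sketch. This is not mcp-eval's actual implementation: the `ExpectedToolCall` and `ToolCall` types, the `deepEqual` and `matchesExpected` helpers, and the `cabinClass` parameter are all hypothetical names chosen for illustration, and the real comparison may treat nested values or type coercion differently. The sketch only shows the documented rule that properties listed under `parameters` are checked while extra properties set by the model are ignored.

```typescript
// Hypothetical shape of an expected_tool_call entry from the YAML test file.
interface ExpectedToolCall {
  tool_name: string;
  parameters: Record<string, unknown>;
}

// Hypothetical shape of the tool call actually produced by the assistant.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Structural equality for JSON-like values (assumption: the real tool may compare differently).
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aRec = a as Record<string, unknown>;
  const bRec = b as Record<string, unknown>;
  const aKeys = Object.keys(aRec);
  if (aKeys.length !== Object.keys(bRec).length) return false;
  return aKeys.every(
    (k) => Object.prototype.hasOwnProperty.call(bRec, k) && deepEqual(aRec[k], bRec[k])
  );
}

// A call matches when the tool name is identical and every expected parameter
// is present with an equal value. Parameters not listed in `expected` are ignored.
function matchesExpected(expected: ExpectedToolCall, actual: ToolCall): boolean {
  if (expected.tool_name !== actual.name) return false;
  return Object.entries(expected.parameters).every(
    ([key, value]) => key in actual.args && deepEqual(actual.args[key], value)
  );
}

const expected: ExpectedToolCall = {
  tool_name: "search-flight",
  parameters: { flyFrom: "Paris", flyTo: "Tokyo" },
};

// `cabinClass` is an invented extra parameter the model might add on its own.
const actual: ToolCall = {
  name: "search-flight",
  args: { flyFrom: "Paris", flyTo: "Tokyo", cabinClass: "economy" },
};

console.log(matchesExpected(expected, actual)); // true: the extra cabinClass is ignored
```

Under this reading, a test only needs to list the parameters it cares about, which keeps test files robust to optional arguments the model may choose to fill in.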