
Optimizing MCP Performance: Speed Up Your AI Workflows

· 9 min read
ToolBoost Engineering Team

You've deployed MCPs and they're working great—but sometimes responses feel slow. In this guide, we'll explore proven techniques to optimize MCP performance and deliver lightning-fast AI experiences.

Understanding MCP Performance

MCP performance depends on multiple factors:

Total Response Time = Network Latency + MCP Server Processing + External API Calls + Data Transfer + AI Processing

Let's optimize each component.
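As a back-of-the-envelope sketch (TypeScript, with made-up millisecond values), the total is just the sum of those components, which is why shaving even one large term pays off immediately:

```typescript
// Illustrative latency budget; component names mirror the breakdown
// above, and the millisecond values are made-up examples.
type LatencyBudget = {
  networkLatency: number;
  serverProcessing: number;
  externalApiCalls: number;
  dataTransfer: number;
  aiProcessing: number;
};

function totalResponseTime(budget: LatencyBudget): number {
  return (
    budget.networkLatency +
    budget.serverProcessing +
    budget.externalApiCalls +
    budget.dataTransfer +
    budget.aiProcessing
  );
}

// Example: even modest per-component costs add up quickly
const example: LatencyBudget = {
  networkLatency: 50,
  serverProcessing: 30,
  externalApiCalls: 200,
  dataTransfer: 40,
  aiProcessing: 500,
};
console.log(totalResponseTime(example)); // 820
```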

Quick Wins (5-Minute Optimizations)

1. Use Connection Pooling

Problem: Creating new database connections for each request is slow.

Before (Slow):

// Creating a new connection on every request
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const client = new pg.Client({
    host: 'db.example.com',
    database: 'mydb',
    user: 'user',
    password: 'password'
  });
  await client.connect();

  const result = await client.query('SELECT * FROM users');
  await client.end();

  return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
});

After (Fast):

// Connection pool created once, at startup
const pool = new pg.Pool({
  host: 'db.example.com',
  database: 'mydb',
  user: 'user',
  password: 'password',
  max: 20, // Maximum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const client = await pool.connect();

  try {
    const result = await client.query('SELECT * FROM users');
    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  } finally {
    client.release();
  }
});

Result: 200ms → 50ms (4x faster)

2. Cache Expensive Operations

Problem: Fetching the same data repeatedly.

Solution: Add caching

import NodeCache from 'node-cache';

// Cache for 5 minutes
const cache = new NodeCache({ stdTTL: 300 });

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'list_users') {
    // Check cache first
    const cached = cache.get('users');
    if (cached) {
      return { content: [{ type: 'text', text: JSON.stringify(cached) }] };
    }

    // Cache miss - fetch from database
    const result = await pool.query('SELECT * FROM users');

    // Store in cache
    cache.set('users', result.rows);

    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  }
});

Smart Cache Invalidation:

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'create_user') {
    // Create user
    await pool.query('INSERT INTO users ...');

    // Invalidate cache
    cache.del('users');
  }
});

Result: 100ms → 5ms for cached responses (20x faster)

3. Limit Response Size

Problem: Returning too much data.

Before:

// Returns 10,000 users (5MB of JSON)
const result = await pool.query('SELECT * FROM users');

After:

// Add pagination (page and limit come from the tool arguments)
const { page = 1, limit = 100 } = request.params.arguments;
const offset = (page - 1) * limit;

const result = await pool.query(
  'SELECT id, name, email FROM users ORDER BY id LIMIT $1 OFFSET $2',
  [limit, offset]
);

// Return the page plus pagination metadata
return {
  content: [{
    type: 'text',
    text: JSON.stringify({
      users: result.rows,
      page,
      count: result.rowCount, // rows in this page, not the full table
      hasMore: result.rowCount === limit
    })
  }]
};

Result: 5MB → 50KB response size, 2000ms → 200ms

4. Use Parallel Requests

Problem: Sequential external API calls.

Before (Slow):

// Sequential - 300ms total
const user = await fetch(`/api/users/${id}`);
const posts = await fetch(`/api/posts?user=${id}`);
const comments = await fetch(`/api/comments?user=${id}`);

After (Fast):

// Parallel - 100ms total
const [user, posts, comments] = await Promise.all([
  fetch(`/api/users/${id}`),
  fetch(`/api/posts?user=${id}`),
  fetch(`/api/comments?user=${id}`)
]);

Result: 300ms → 100ms (3x faster)

5. Optimize Database Queries

Add Indexes:

-- Before: Full table scan (2000ms)
SELECT * FROM users WHERE email = 'user@example.com';

-- Add index
CREATE INDEX idx_users_email ON users(email);

-- After: Index scan (5ms)

Result: 2000ms → 5ms (400x faster)

Medium Optimizations (30-Minute Improvements)

6. Implement Request Batching

Problem: Multiple small requests instead of one batch.

Before:

// AI makes 10 separate requests
for (const userId of userIds) {
  await callTool('get_user', { id: userId });
}
// Total: 10 × 100ms = 1000ms

After:

// Add batch endpoint
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      // ... other tools
      {
        name: 'get_users_batch',
        description: 'Get multiple users in one request',
        inputSchema: {
          type: 'object',
          properties: {
            ids: {
              type: 'array',
              items: { type: 'string' },
              description: 'User IDs to fetch'
            }
          }
        }
      }
    ]
  };
});

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'get_users_batch') {
    const { ids } = request.params.arguments;

    const result = await pool.query(
      'SELECT * FROM users WHERE id = ANY($1)',
      [ids]
    );

    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  }
});

Result: 1000ms → 150ms (6.7x faster)

7. Stream Large Responses

Problem: Waiting for entire response before displaying anything.

Solution: Use Server-Sent Events (SSE)

import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js';

// Wire the server to an SSE transport inside your HTTP GET handler.
// (`res` is the Node/Express response object; exact wiring varies by
// SDK version, so treat this as a sketch.)
const transport = new SSEServerTransport('/mcp/messages', res);
await server.connect(transport);

// With a streaming transport, handlers can report progress and emit
// partial results as chunks become available instead of buffering
// the entire dataset before responding.

Benefits:

  • User sees results immediately
  • Better perceived performance
  • Can cancel long-running operations

8. Compress Responses

For HTTP Transport:

import compression from 'compression';
import express from 'express';

const app = express();

// Enable gzip compression
app.use(compression({
  threshold: 1024, // Only compress responses > 1KB
  level: 6 // Compression level (0-9)
}));

Result: 500KB → 50KB (10x smaller, faster transfer)

9. Lazy Load Resources

Problem: Loading all resources upfront.

Before:

server.setRequestHandler(ListResourcesRequestSchema, async () => {
  // Loads ALL files immediately
  const files = await readAllProjectFiles();

  return {
    resources: files.map(f => ({
      uri: `file:///${f.path}`,
      name: f.name,
      mimeType: 'text/plain'
    }))
  };
});

After:

server.setRequestHandler(ListResourcesRequestSchema, async () => {
  // Just list file paths, don't read content
  const filePaths = await listFilePathsOnly();

  return {
    resources: filePaths.map(path => ({
      uri: `file:///${path}`,
      name: basename(path),
      mimeType: getMimeType(path)
    }))
  };
});

server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
  // Only read when specifically requested
  const content = await readFile(request.params.uri);

  return {
    contents: [{
      uri: request.params.uri,
      mimeType: 'text/plain',
      text: content
    }]
  };
});

Result: 5000ms → 50ms for listing

10. Optimize JSON Serialization

Use faster JSON libraries:

import fastJson from 'fast-json-stringify';

// Compile a stringifier from a schema describing your data
const stringifyUser = fastJson({
  type: 'object',
  properties: {
    id: { type: 'string' },
    name: { type: 'string' },
    email: { type: 'string' }
  }
});

// 2-3x faster than JSON.stringify for known shapes
const json = stringifyUser(user);

Advanced Optimizations (Multi-Hour Projects)

11. Add CDN/Edge Caching

For ToolBoost-hosted MCPs, responses can be cached at edge locations.

Configure cache headers:

// For read-only, stable data
return {
  content: [{
    type: 'text',
    text: data
  }],
  metadata: {
    cacheControl: 'public, max-age=3600' // Cache for 1 hour
  }
};

Result: 200ms → 20ms for cached edge responses

12. Implement Rate Limiting & Throttling

Prevent resource exhaustion:

import rateLimit from 'express-rate-limit';

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per minute
  message: 'Too many requests, please slow down',
  handler: (req, res) => {
    res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: 60
    });
  }
});

app.use('/mcp', limiter);

Benefits:

  • Prevents abuse
  • Ensures fair resource distribution
  • Maintains performance for all users

13. Use Read Replicas

For database-heavy MCPs:

// Write to primary
const primaryPool = new pg.Pool({
  host: 'primary-db.example.com',
  // ...
});

// Read from replicas (distributes load)
const replicaPools = [
  new pg.Pool({ host: 'replica1.example.com' }),
  new pg.Pool({ host: 'replica2.example.com' }),
  new pg.Pool({ host: 'replica3.example.com' })
];

// Round-robin read queries
let currentReplica = 0;
function getReadPool() {
  const pool = replicaPools[currentReplica];
  currentReplica = (currentReplica + 1) % replicaPools.length;
  return pool;
}

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'read_data') {
    // Use a replica for reads
    const result = await getReadPool().query('SELECT ...');
    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  } else if (request.params.name === 'write_data') {
    // Use the primary for writes
    const result = await primaryPool.query('INSERT ...');
    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  }
});

14. Profile and Optimize Hot Paths

Find bottlenecks with profiling:

import { performance } from 'perf_hooks';

// Add timing instrumentation
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const start = performance.now();

  // Break down timing
  const t1 = performance.now();
  const data = await fetchData();
  const fetchTime = performance.now() - t1;

  const t2 = performance.now();
  const processed = await processData(data);
  const processTime = performance.now() - t2;

  const t3 = performance.now();
  const result = await formatResult(processed);
  const formatTime = performance.now() - t3;

  const total = performance.now() - start;

  console.log({
    tool: request.params.name,
    total: `${total.toFixed(2)}ms`,
    breakdown: {
      fetch: `${fetchTime.toFixed(2)}ms`,
      process: `${processTime.toFixed(2)}ms`,
      format: `${formatTime.toFixed(2)}ms`
    }
  });

  return result;
});

Example output:

{
  "tool": "analyze_code",
  "total": "1250ms",
  "breakdown": {
    "fetch": "50ms",
    "process": "1150ms", // ← Bottleneck!
    "format": "50ms"
  }
}

Now you know where to focus: optimize processData().

15. Implement Smart Prefetching

Anticipate what the AI will request next:

// When a user is retrieved, prefetch their posts
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'get_user') {
    const userId = request.params.arguments.id;

    // Fetch user
    const user = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);

    // Prefetch posts in the background (don't await; swallow errors so
    // an unhandled rejection can't crash the process)
    pool.query('SELECT * FROM posts WHERE user_id = $1', [userId])
      .then(posts => cache.set(`posts:${userId}`, posts.rows))
      .catch(() => {});

    return { content: [{ type: 'text', text: JSON.stringify(user.rows[0]) }] };
  }

  if (request.params.name === 'get_user_posts') {
    const userId = request.params.arguments.userId;

    // Check if prefetched
    const cached = cache.get(`posts:${userId}`);
    if (cached) {
      return { content: [{ type: 'text', text: JSON.stringify(cached) }] };
    }

    // Fetch if not cached
    const posts = await pool.query('SELECT * FROM posts WHERE user_id = $1', [userId]);
    return { content: [{ type: 'text', text: JSON.stringify(posts.rows) }] };
  }
});

Performance Monitoring

Key Metrics to Track

interface PerformanceMetrics {
  // Latency
  p50ResponseTime: number; // Median
  p95ResponseTime: number; // 95th percentile
  p99ResponseTime: number; // 99th percentile

  // Throughput
  requestsPerSecond: number;
  concurrentRequests: number;

  // Errors
  errorRate: number;
  timeoutRate: number;

  // Resources
  cpuUsage: number;
  memoryUsage: number;
  activeConnections: number;
}
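The percentile fields above can be computed from raw latency samples. A minimal sketch using the nearest-rank method (production systems typically use streaming estimators such as t-digest rather than sorting every sample):

```typescript
// Nearest-rank percentile over raw latency samples (ms).
// Sketch only: fine for small in-memory sample sets.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [120, 80, 95, 310, 150, 90, 100, 85, 200, 110];
console.log(percentile(latencies, 50)); // 100
console.log(percentile(latencies, 95)); // 310
```

Note how a single slow outlier (310ms here) dominates p95 while barely moving the median, which is why tracking p95/p99 alongside p50 matters.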

Simple Monitoring

import StatsD from 'hot-shots';

const stats = new StatsD();

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const start = Date.now();

  try {
    const result = await handleRequest(request);

    // Record success
    stats.timing('mcp.request.duration', Date.now() - start, {
      tool: request.params.name,
      status: 'success'
    });

    return result;
  } catch (error) {
    // Record error
    stats.increment('mcp.request.error', {
      tool: request.params.name,
      error: error.message
    });

    throw error;
  }
});

ToolBoost Built-in Monitoring

ToolBoost automatically tracks:

  • Response times (p50, p95, p99)
  • Request volume
  • Error rates
  • Cache hit rates

Access via Dashboard → Analytics.

Performance Checklist

Before Deploying

  • Connection pooling implemented
  • Caching strategy defined
  • Database queries have indexes
  • Response sizes limited (pagination)
  • Error handling in place
  • Monitoring configured

After Deploying

  • Response times < 200ms for p95
  • Error rate < 0.1%
  • Cache hit rate > 80% (for cacheable data)
  • CPU usage < 70%
  • Memory stable (no leaks)
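These targets can be turned into an automated post-deploy check. A sketch using the thresholds from the list above (the metric field names are illustrative, not a ToolBoost API):

```typescript
// Post-deploy health check against the targets listed above.
// Field names are illustrative; wire them to your own metrics source.
type DeployMetrics = {
  p95ResponseTimeMs: number;
  errorRate: number;    // fraction, e.g. 0.001 = 0.1%
  cacheHitRate: number; // fraction of cacheable requests served from cache
  cpuUsage: number;     // fraction, e.g. 0.7 = 70%
};

function meetsTargets(m: DeployMetrics): boolean {
  return (
    m.p95ResponseTimeMs < 200 &&
    m.errorRate < 0.001 &&
    m.cacheHitRate > 0.8 &&
    m.cpuUsage < 0.7
  );
}

console.log(meetsTargets({
  p95ResponseTimeMs: 180,
  errorRate: 0.0005,
  cacheHitRate: 0.9,
  cpuUsage: 0.5
})); // true
```

Running a check like this in CI or a post-deploy hook catches regressions before users notice them.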

Real-World Results

Case Study: E-commerce MCP

Before Optimization:

  • Average response time: 1,200ms
  • p95: 3,500ms
  • Timeouts: 5% of requests
  • User complaints: "AI is too slow"

Optimizations Applied:

  1. Added database connection pooling
  2. Implemented caching for product catalog
  3. Added indexes to frequently queried columns
  4. Implemented pagination
  5. Used parallel requests for related data

After Optimization:

  • Average response time: 180ms (6.7x faster)
  • p95: 320ms (11x faster)
  • Timeouts: 0.01%
  • User feedback: "AI responses are instant!"

Common Performance Mistakes

Mistake 1: N+1 Queries

Bad:

const users = await pool.query('SELECT id FROM users');

for (const user of users.rows) {
  const posts = await pool.query('SELECT * FROM posts WHERE user_id = $1', [user.id]);
  // Tons of queries!
}

Good:

const users = await pool.query('SELECT id FROM users');
const userIds = users.rows.map(u => u.id);

// One query for all posts
const posts = await pool.query(
  'SELECT * FROM posts WHERE user_id = ANY($1)',
  [userIds]
);

Mistake 2: Loading Everything into Memory

Bad:

const allUsers = await pool.query('SELECT * FROM users'); // 1 million rows!

Good:

import Cursor from 'pg-cursor';

// Stream in batches with a cursor (cursors need a dedicated client)
const client = await pool.connect();
try {
  const cursor = client.query(new Cursor('SELECT * FROM users'));
  let rows;
  while ((rows = await cursor.read(1000)).length > 0) {
    processBatch(rows);
  }
} finally {
  client.release();
}

Mistake 3: Synchronous I/O

Bad:

const data = fs.readFileSync('/large/file.json'); // Blocks event loop

Good:

const data = await fs.promises.readFile('/large/file.json'); // Non-blocking

Mistake 4: No Timeout Handling

Bad:

const response = await fetch(externalAPI); // Hangs forever if API is down

Good:

const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5000);

try {
  const response = await fetch(externalAPI, { signal: controller.signal });
} catch (error) {
  if (error.name === 'AbortError') {
    throw new Error('Request timed out after 5 seconds');
  }
  throw error;
} finally {
  clearTimeout(timeout);
}

Conclusion

MCP performance optimization is an ongoing process. Start with quick wins, monitor your metrics, and continuously improve.

Key Takeaways:

  • ✅ Use connection pooling
  • ✅ Cache aggressively
  • ✅ Paginate large responses
  • ✅ Optimize database queries
  • ✅ Monitor performance metrics
  • ✅ Profile to find bottlenecks

ToolBoost handles many optimizations for you:

  • Global CDN for low latency
  • Automatic caching
  • Connection pooling
  • Performance monitoring
  • Auto-scaling

Focus on your MCP logic and let ToolBoost handle the infrastructure.


Need help optimizing your MCP? Contact ToolBoost for a performance audit.

Deploy optimized MCPs instantly with ToolBoost - performance built-in.