Optimizing MCP Performance: Speed Up Your AI Workflows
You've deployed MCP servers and they're working great, but sometimes responses feel slow. In this guide, we'll explore proven techniques to optimize MCP performance and deliver lightning-fast AI experiences.
Understanding MCP Performance
MCP performance depends on multiple factors:
Total Response Time =
Network Latency +
MCP Server Processing +
External API Calls +
Data Transfer +
AI Processing
Let's optimize each component.
Quick Wins (5-Minute Optimizations)
1. Use Connection Pooling
Problem: Creating new database connections for each request is slow.
Before (Slow):
// Creating a new client and connecting on every request
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const client = new pg.Client({
host: 'db.example.com',
database: 'mydb',
user: 'user',
password: 'password'
});
await client.connect();
const result = await client.query('SELECT * FROM users');
await client.end();
return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
});
After (Fast):
// Connection pool created once
const pool = new pg.Pool({
host: 'db.example.com',
database: 'mydb',
user: 'user',
password: 'password',
max: 20, // Maximum connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const client = await pool.connect();
try {
const result = await client.query('SELECT * FROM users');
return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
} finally {
client.release();
}
});
Result: 200ms → 50ms (4x faster)
2. Cache Expensive Operations
Problem: Fetching the same data repeatedly.
Solution: Add caching
import NodeCache from 'node-cache';
// Cache for 5 minutes
const cache = new NodeCache({ stdTTL: 300 });
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'list_users') {
// Check cache first
const cached = cache.get('users');
if (cached) {
return { content: [{ type: 'text', text: JSON.stringify(cached) }] };
}
// Cache miss - fetch from database
const result = await pool.query('SELECT * FROM users');
// Store in cache
cache.set('users', result.rows);
return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
}
});
Smart Cache Invalidation:
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'create_user') {
// Create user
await pool.query('INSERT INTO users ...');
// Invalidate the cached list so the next read is fresh
cache.del('users');
return { content: [{ type: 'text', text: 'User created' }] };
}
});
Result: 100ms → 5ms for cached responses (20x faster)
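If you'd rather not add the node-cache dependency, the same get/set/del semantics fit in a few lines. This sketch is our own (the class name TtlCache is not from any library); the clock is injectable via `now` so expiry is easy to test without waiting:

```javascript
// Minimal TTL cache with the same get/set/del semantics used above.
// `now` is injectable (milliseconds) so expiry can be tested deterministically.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }
  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired - evict lazily on read
      return undefined;
    }
    return entry.value;
  }
  del(key) {
    this.entries.delete(key);
  }
}
```

Lazy eviction on read keeps the implementation tiny; a production cache would also cap entry count.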
3. Limit Response Size
Problem: Returning too much data.
Before:
// Returns 10,000 users (5MB of JSON)
const result = await pool.query('SELECT * FROM users');
After:
// Add pagination
const { page = 1, limit = 100 } = request.params.arguments ?? {};
const offset = (page - 1) * limit;
const result = await pool.query(
'SELECT id, name, email FROM users LIMIT $1 OFFSET $2',
[limit, offset]
);
// Return summary + pagination
return {
content: [{
type: 'text',
text: JSON.stringify({
users: result.rows,
page,
count: result.rowCount, // rows on this page, not a grand total
hasMore: result.rowCount === limit
})
}]
};
Result: 5MB → 50KB response size, 2000ms → 200ms
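The offset arithmetic above is easy to get wrong (page 0, negative pages, unbounded limits). A small helper can clamp the inputs before they reach the query; getPagination and the 500-row cap are our own choices, not part of MCP:

```javascript
// Clamp pagination inputs before they reach the database.
// MAX_LIMIT guards response size even if the caller asks for more.
const MAX_LIMIT = 500;
function getPagination({ page = 1, limit = 100 } = {}) {
  const safePage = Math.max(1, Math.floor(page));
  const safeLimit = Math.min(MAX_LIMIT, Math.max(1, Math.floor(limit)));
  return { page: safePage, limit: safeLimit, offset: (safePage - 1) * safeLimit };
}
```

In the handler you would call `getPagination(request.params.arguments)` and feed `limit` and `offset` straight into the parameterized query.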
4. Use Parallel Requests
Problem: Sequential external API calls.
Before (Slow):
// Sequential - 300ms total
const user = await fetch(`/api/users/${id}`);
const posts = await fetch(`/api/posts?user=${id}`);
const comments = await fetch(`/api/comments?user=${id}`);
After (Fast):
// Parallel - 100ms total
const [user, posts, comments] = await Promise.all([
fetch(`/api/users/${id}`),
fetch(`/api/posts?user=${id}`),
fetch(`/api/comments?user=${id}`)
]);
Result: 300ms → 100ms (3x faster)
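One caveat with Promise.all: it rejects as soon as any request fails, discarding results that already succeeded. When partial data is acceptable, Promise.allSettled lets you keep what you got; the fall-back-to-null convention below is our choice:

```javascript
// Run requests in parallel but tolerate individual failures.
// Takes thunks (functions returning promises) so nothing starts early;
// failed lookups become null instead of rejecting the whole batch.
async function fetchAllSettled(tasks) {
  const results = await Promise.allSettled(tasks.map(fn => fn()));
  return results.map(r => (r.status === 'fulfilled' ? r.value : null));
}
```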
5. Optimize Database Queries
Add Indexes:
-- Before: Full table scan (2000ms)
SELECT * FROM users WHERE email = 'user@example.com';
-- Add index
CREATE INDEX idx_users_email ON users(email);
-- After: Index scan (5ms)
Result: 2000ms → 5ms (400x faster)
Medium Optimizations (30-Minute Improvements)
6. Implement Request Batching
Problem: Multiple small requests instead of one batch.
Before:
// AI makes 10 separate requests
for (const userId of userIds) {
await callTool('get_user', { id: userId });
}
// Total: 10 × 100ms = 1000ms
After:
// Add batch endpoint
server.setRequestHandler(ListToolsRequestSchema, async () => {
return {
tools: [
// ... other tools
{
name: 'get_users_batch',
description: 'Get multiple users in one request',
inputSchema: {
type: 'object',
properties: {
ids: {
type: 'array',
items: { type: 'string' },
description: 'User IDs to fetch'
}
}
}
}
]
};
});
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'get_users_batch') {
const { ids } = request.params.arguments;
const result = await pool.query(
'SELECT * FROM users WHERE id = ANY($1)',
[ids]
);
return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
}
});
Result: 1000ms → 150ms (6.7x faster)
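Even with a batch tool, callers sometimes arrive one at a time. A DataLoader-style coalescer can collect ids issued in the same tick and serve them all from one batch fetch. This sketch is ours; batchFetch stands in for a query like the ANY($1) one above and must resolve to a Map from id to record:

```javascript
// Coalesce single-id lookups issued in the same tick into one batch call.
// batchFetch(ids) must resolve to a Map from id to record.
function createBatcher(batchFetch) {
  let pending = null;
  return function load(id) {
    if (!pending) {
      const batch = { ids: [] };
      pending = batch;
      batch.promise = new Promise(resolve => {
        setTimeout(async () => {
          pending = null; // later calls start a fresh batch
          resolve(await batchFetch(batch.ids));
        }, 0);
      });
    }
    pending.ids.push(id);
    return pending.promise.then(map => map.get(id));
  };
}
```

Every `load(id)` issued before the timer fires shares one database round trip.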
7. Stream Large Responses
Problem: Waiting for entire response before displaying anything.
Solution: Use Server-Sent Events (SSE)
import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js';
// Serve the MCP server over an SSE transport so output reaches the client
// incrementally instead of in one final payload (`res` is the HTTP response
// object for the SSE endpoint in your web framework)
const transport = new SSEServerTransport('/mcp', res);
await server.connect(transport);
Benefits:
- User sees results immediately
- Better perceived performance
- Can cancel long-running operations
8. Compress Responses
For HTTP Transport:
import compression from 'compression';
import express from 'express';
const app = express();
// Enable gzip compression
app.use(compression({
threshold: 1024, // Only compress responses > 1KB
level: 6 // Compression level (0-9)
}));
Result: 500KB → 50KB (10x smaller, faster transfer)
9. Lazy Load Resources
Problem: Loading all resources upfront.
Before:
server.setRequestHandler(ListResourcesRequestSchema, async () => {
// Loads ALL files immediately
const files = await readAllProjectFiles();
return {
resources: files.map(f => ({
uri: `file:///${f.path}`,
name: f.name,
mimeType: 'text/plain'
}))
};
});
After:
server.setRequestHandler(ListResourcesRequestSchema, async () => {
// Just list file paths, don't read content
const filePaths = await listFilePathsOnly();
return {
resources: filePaths.map(path => ({
uri: `file:///${path}`,
name: basename(path),
mimeType: getMimeType(path)
}))
};
});
server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
// Only read when specifically requested
const content = await readFile(request.params.uri);
return {
contents: [{
uri: request.params.uri,
mimeType: 'text/plain',
text: content
}]
};
});
Result: 5000ms → 50ms for listing
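The listing code above calls getMimeType, which is not a built-in. A minimal extension lookup like this sketch is enough for a first pass; the table is our own and should be extended for your file types:

```javascript
// Map common file extensions to MIME types; default to plain text.
const MIME_TYPES = {
  '.json': 'application/json',
  '.md': 'text/markdown',
  '.html': 'text/html',
  '.js': 'text/javascript',
  '.png': 'image/png'
};
function getMimeType(path) {
  const dot = path.lastIndexOf('.');
  const ext = dot === -1 ? '' : path.slice(dot).toLowerCase();
  return MIME_TYPES[ext] ?? 'text/plain';
}
```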
10. Optimize JSON Serialization
Use faster JSON libraries:
import fastJsonStringify from 'fast-json-stringify';
// Create a schema for your data (the library compiles a dedicated serializer)
const stringifyUser = fastJsonStringify({
type: 'object',
properties: {
id: { type: 'string' },
name: { type: 'string' },
email: { type: 'string' }
}
});
// 2-3x faster than JSON.stringify
const json = stringifyUser(user);
Advanced Optimizations (Multi-Hour Projects)
11. Add a CDN/Edge Caching
For ToolBoost-hosted MCPs, responses can be cached at edge locations.
Configure cache headers:
// For read-only, stable data
return {
content: [{
type: 'text',
text: data
}],
metadata: {
cacheControl: 'public, max-age=3600' // Cache for 1 hour
}
};
Result: 200ms → 20ms for cached edge responses
12. Implement Rate Limiting & Throttling
Prevent resource exhaustion:
import rateLimit from 'express-rate-limit';
const limiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 100, // 100 requests per minute
message: 'Too many requests, please slow down',
handler: (req, res) => {
res.status(429).json({
error: 'Rate limit exceeded',
retryAfter: 60
});
}
});
app.use('/mcp', limiter);
Benefits:
- Prevents abuse
- Ensures fair resource distribution
- Maintains performance for all users
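express-rate-limit covers HTTP transports, but a stdio-based MCP server has no Express layer to hang middleware on. A transport-agnostic token bucket is easy to inline; this sketch, with an injectable clock for testability, is our own rather than any library's API:

```javascript
// Token bucket: allows bursts up to `capacity`, refilled at `refillPerSec`.
// `now` is injectable (milliseconds) so behavior is deterministic in tests.
class TokenBucket {
  constructor(capacity, refillPerSec, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }
  tryRemove() {
    const elapsed = (this.now() - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = this.now();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false; // over the limit - return a rate-limit error to the caller
  }
}
```

In a tool handler, a `false` from `tryRemove()` would translate into an error response asking the client to retry later.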
13. Use Read Replicas
For database-heavy MCPs:
// Write to primary
const primaryPool = new pg.Pool({
host: 'primary-db.example.com',
// ...
});
// Read from replicas (distributes load)
const replicaPools = [
new pg.Pool({ host: 'replica1.example.com' }),
new pg.Pool({ host: 'replica2.example.com' }),
new pg.Pool({ host: 'replica3.example.com' })
];
// Round-robin read queries
let currentReplica = 0;
function getReadPool() {
const pool = replicaPools[currentReplica];
currentReplica = (currentReplica + 1) % replicaPools.length;
return pool;
}
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'read_data') {
// Use a replica for reads
const pool = getReadPool();
const result = await pool.query('SELECT ...');
return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
} else if (request.params.name === 'write_data') {
// Use the primary for writes
await primaryPool.query('INSERT ...');
return { content: [{ type: 'text', text: 'Write committed' }] };
}
});
14. Profile and Optimize Hot Paths
Find bottlenecks with profiling:
import { performance } from 'perf_hooks';
// Add timing instrumentation
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const start = performance.now();
// Break down timing
const t1 = performance.now();
const data = await fetchData();
const fetchTime = performance.now() - t1;
const t2 = performance.now();
const processed = await processData(data);
const processTime = performance.now() - t2;
const t3 = performance.now();
const result = await formatResult(processed);
const formatTime = performance.now() - t3;
const total = performance.now() - start;
console.log({
tool: request.params.name,
total: `${total.toFixed(2)}ms`,
breakdown: {
fetch: `${fetchTime.toFixed(2)}ms`,
process: `${processTime.toFixed(2)}ms`,
format: `${formatTime.toFixed(2)}ms`
}
});
return result;
});
Example output:
{
"tool": "analyze_code",
"total": "1250ms",
"breakdown": {
"fetch": "50ms",
"process": "1150ms", // ← Bottleneck!
"format": "50ms"
}
}
Now you know that processData() is where to focus your optimization effort.
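The ad-hoc t1/t2/t3 timers above get repetitive fast. A tiny wrapper keeps the instrumentation out of the business logic; `timed` is our helper name, not an SDK API:

```javascript
// Wrap any async step, record its elapsed milliseconds into `breakdown`,
// and pass the step's result through unchanged.
async function timed(label, fn, breakdown) {
  const start = Date.now();
  const result = await fn();
  breakdown[label] = Date.now() - start;
  return result;
}
```

A handler would then read as `const data = await timed('fetch', () => fetchData(), breakdown);` for each step, with a single `console.log(breakdown)` at the end.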
15. Implement Smart Prefetching
Anticipate what the AI will request next:
// When user is retrieved, prefetch their posts
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'get_user') {
const userId = request.params.arguments.id;
// Fetch user
const user = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);
// Prefetch posts in background (don't await)
pool.query('SELECT * FROM posts WHERE user_id = $1', [userId])
.then(posts => cache.set(`posts:${userId}`, posts.rows));
return { content: [{ type: 'text', text: JSON.stringify(user.rows[0]) }] };
}
if (request.params.name === 'get_user_posts') {
const userId = request.params.arguments.userId;
// Check if prefetched
const cached = cache.get(`posts:${userId}`);
if (cached) {
return { content: [{ type: 'text', text: JSON.stringify(cached) }] };
}
// Fetch if not cached
const posts = await pool.query('SELECT * FROM posts WHERE user_id = $1', [userId]);
return { content: [{ type: 'text', text: JSON.stringify(posts.rows) }] };
}
});
Performance Monitoring
Key Metrics to Track
interface PerformanceMetrics {
// Latency
p50ResponseTime: number; // Median
p95ResponseTime: number; // 95th percentile
p99ResponseTime: number; // 99th percentile
// Throughput
requestsPerSecond: number;
concurrentRequests: number;
// Errors
errorRate: number;
timeoutRate: number;
// Resources
cpuUsage: number;
memoryUsage: number;
activeConnections: number;
}
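To fill in the p50/p95/p99 fields above from raw duration samples, a nearest-rank percentile over a sorted copy is the usual approach. This helper is a sketch of that method, not part of any monitoring library:

```javascript
// Nearest-rank percentile: p in [0, 100] over an array of numeric samples.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b); // copy, don't mutate input
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```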
Simple Monitoring
import StatsD from 'hot-shots';
const stats = new StatsD();
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const start = Date.now();
try {
const result = await handleRequest(request);
// Record success
stats.timing('mcp.request.duration', Date.now() - start, {
tool: request.params.name,
status: 'success'
});
return result;
} catch (error) {
// Record error
stats.increment('mcp.request.error', {
tool: request.params.name,
error: error.message
});
throw error;
}
});
ToolBoost Built-in Monitoring
ToolBoost automatically tracks:
- Response times (p50, p95, p99)
- Request volume
- Error rates
- Cache hit rates
Access via Dashboard → Analytics.
Performance Checklist
Before Deploying
- Connection pooling implemented
- Caching strategy defined
- Database queries have indexes
- Response sizes limited (pagination)
- Error handling in place
- Monitoring configured
After Deploying
- Response times < 200ms for p95
- Error rate < 0.1%
- Cache hit rate > 80% (for cacheable data)
- CPU usage < 70%
- Memory stable (no leaks)
Real-World Results
Case Study: E-commerce MCP
Before Optimization:
- Average response time: 1,200ms
- p95: 3,500ms
- Timeouts: 5% of requests
- User complaints: "AI is too slow"
Optimizations Applied:
- Added database connection pooling
- Implemented caching for product catalog
- Added indexes to frequently queried columns
- Implemented pagination
- Used parallel requests for related data
After Optimization:
- Average response time: 180ms (6.7x faster)
- p95: 320ms (11x faster)
- Timeouts: 0.01%
- User feedback: "AI responses are instant!"
Common Performance Mistakes
Mistake 1: N+1 Queries
Bad:
const users = await pool.query('SELECT id FROM users');
for (const user of users.rows) {
const posts = await pool.query('SELECT * FROM posts WHERE user_id = $1', [user.id]);
// Tons of queries!
}
Good:
const users = await pool.query('SELECT id FROM users');
const userIds = users.rows.map(u => u.id);
// One query for all posts
const posts = await pool.query(
'SELECT * FROM posts WHERE user_id = ANY($1)',
[userIds]
);
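After the single ANY($1) query, the flat post rows usually need regrouping per user before you can build a response. A plain loop over user_id does it; groupBy is our helper, not something pg provides:

```javascript
// Group flat rows into a Map keyed by one column (e.g. 'user_id').
function groupBy(rows, key) {
  const groups = new Map();
  for (const row of rows) {
    const k = row[key];
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(row); // preserves the rows' original order per group
  }
  return groups;
}
```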
Mistake 2: Loading Everything into Memory
Bad:
const allUsers = await pool.query('SELECT * FROM users'); // 1 million rows!
Good:
// Stream with a cursor instead of loading everything
import Cursor from 'pg-cursor';
const client = await pool.connect();
const cursor = client.query(new Cursor('SELECT * FROM users'));
let rows;
while ((rows = await cursor.read(100)).length > 0) {
processBatch(rows);
}
await cursor.close();
client.release();
Mistake 3: Synchronous I/O
Bad:
const data = fs.readFileSync('/large/file.json'); // Blocks event loop
Good:
const data = await fs.promises.readFile('/large/file.json'); // Non-blocking
Mistake 4: No Timeout Handling
Bad:
const response = await fetch(externalAPI); // Hangs forever if API is down
Good:
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5000);
try {
const response = await fetch(externalAPI, { signal: controller.signal });
} catch (error) {
if (error.name === 'AbortError') {
throw new Error('Request timed out after 5 seconds');
}
throw error;
} finally {
clearTimeout(timeout);
}
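The AbortController pattern works for fetch, but many promises (database calls, SDK clients) accept no signal. A Promise.race wrapper gives the same guarantee for any promise; withTimeout is our helper name:

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // clear the timer either way so it cannot keep the process alive
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note this only stops you waiting; the underlying operation keeps running unless it also supports cancellation.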
Conclusion
MCP performance optimization is an ongoing process. Start with quick wins, monitor your metrics, and continuously improve.
Key Takeaways:
- ✅ Use connection pooling
- ✅ Cache aggressively
- ✅ Paginate large responses
- ✅ Optimize database queries
- ✅ Monitor performance metrics
- ✅ Profile to find bottlenecks
ToolBoost handles many optimizations for you:
- Global CDN for low latency
- Automatic caching
- Connection pooling
- Performance monitoring
- Auto-scaling
Focus on your MCP logic, let ToolBoost handle the infrastructure.
Need help optimizing your MCP? Contact ToolBoost for a performance audit.
Deploy optimized MCPs instantly with ToolBoost - performance built-in.