
Optimizing MCP Performance: Speed Up Your AI Workflows

· 9 min read
ToolBoost Engineering Team

You've deployed MCPs and they're working great—but sometimes responses feel slow. In this guide, we'll explore proven techniques to optimize MCP performance and deliver lightning-fast AI experiences.

Understanding MCP Performance

MCP performance depends on multiple factors:

Total Response Time = Network Latency + MCP Server Processing + External API Calls + Data Transfer + AI Processing

Let's optimize each component.
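As a back-of-the-envelope sketch (TypeScript, with made-up millisecond values), the total is just the sum of those components, which is why shaving even one large term pays off immediately:

```typescript
// Illustrative latency budget; component names mirror the breakdown
// above, and the millisecond values are made-up examples.
type LatencyBudget = {
  networkLatency: number;
  serverProcessing: number;
  externalApiCalls: number;
  dataTransfer: number;
  aiProcessing: number;
};

function totalResponseTime(budget: LatencyBudget): number {
  return (
    budget.networkLatency +
    budget.serverProcessing +
    budget.externalApiCalls +
    budget.dataTransfer +
    budget.aiProcessing
  );
}

// Example: even modest per-component costs add up quickly
const example: LatencyBudget = {
  networkLatency: 50,
  serverProcessing: 30,
  externalApiCalls: 200,
  dataTransfer: 40,
  aiProcessing: 500,
};
console.log(totalResponseTime(example)); // 820
```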

Quick Wins (5-Minute Optimizations)

1. Use Connection Pooling

Problem: Creating new database connections for each request is slow.

Before (Slow):

// Creating a new connection on every request
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const client = new pg.Client({
    host: 'db.example.com',
    database: 'mydb',
    user: 'user',
    password: 'password'
  });
  await client.connect();

  const result = await client.query('SELECT * FROM users');
  await client.end();

  return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
});

After (Fast):

// Connection pool created once, at startup
const pool = new pg.Pool({
  host: 'db.example.com',
  database: 'mydb',
  user: 'user',
  password: 'password',
  max: 20, // Maximum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const client = await pool.connect();

  try {
    const result = await client.query('SELECT * FROM users');
    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  } finally {
    client.release();
  }
});

Result: 200ms → 50ms (4x faster)

2. Cache Expensive Operations

Problem: Fetching the same data repeatedly.

Solution: Add caching

import NodeCache from 'node-cache';

// Cache for 5 minutes
const cache = new NodeCache({ stdTTL: 300 });

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'list_users') {
    // Check cache first
    const cached = cache.get('users');
    if (cached) {
      return { content: [{ type: 'text', text: JSON.stringify(cached) }] };
    }

    // Cache miss - fetch from database
    const result = await pool.query('SELECT * FROM users');

    // Store in cache
    cache.set('users', result.rows);

    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  }
});

Smart Cache Invalidation:

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'create_user') {
    // Create user
    await pool.query('INSERT INTO users ...');

    // Invalidate cache
    cache.del('users');
  }
});

Result: 100ms → 5ms for cached responses (20x faster)

3. Limit Response Size

Problem: Returning too much data.

Before:

// Returns 10,000 users (5MB of JSON)
const result = await pool.query('SELECT * FROM users');

After:

// Add pagination (page and limit come from the tool arguments)
const { page = 1, limit = 100 } = request.params.arguments;
const offset = (page - 1) * limit;

const result = await pool.query(
  'SELECT id, name, email FROM users ORDER BY id LIMIT $1 OFFSET $2',
  [limit, offset]
);

// Return the page plus pagination metadata
return {
  content: [{
    type: 'text',
    text: JSON.stringify({
      users: result.rows,
      page,
      count: result.rowCount, // rows in this page, not the full table
      hasMore: result.rowCount === limit
    })
  }]
};

Result: 5MB → 50KB response size, 2000ms → 200ms

4. Use Parallel Requests

Problem: Sequential external API calls.

Before (Slow):

// Sequential - 300ms total
const user = await fetch(`/api/users/${id}`);
const posts = await fetch(`/api/posts?user=${id}`);
const comments = await fetch(`/api/comments?user=${id}`);

After (Fast):

// Parallel - 100ms total
const [user, posts, comments] = await Promise.all([
  fetch(`/api/users/${id}`),
  fetch(`/api/posts?user=${id}`),
  fetch(`/api/comments?user=${id}`)
]);

Result: 300ms → 100ms (3x faster)

5. Optimize Database Queries

Add Indexes:

-- Before: Full table scan (2000ms)
SELECT * FROM users WHERE email = 'user@example.com';

-- Add index
CREATE INDEX idx_users_email ON users(email);

-- After: Index scan (5ms)

Result: 2000ms → 5ms (400x faster)

Medium Optimizations (30-Minute Improvements)

6. Implement Request Batching

Problem: Multiple small requests instead of one batch.

Before:

// AI makes 10 separate requests
for (const userId of userIds) {
  await callTool('get_user', { id: userId });
}
// Total: 10 × 100ms = 1000ms

After:

// Add batch endpoint
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      // ... other tools
      {
        name: 'get_users_batch',
        description: 'Get multiple users in one request',
        inputSchema: {
          type: 'object',
          properties: {
            ids: {
              type: 'array',
              items: { type: 'string' },
              description: 'User IDs to fetch'
            }
          }
        }
      }
    ]
  };
});

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'get_users_batch') {
    const { ids } = request.params.arguments;

    const result = await pool.query(
      'SELECT * FROM users WHERE id = ANY($1)',
      [ids]
    );

    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  }
});

Result: 1000ms → 150ms (6.7x faster)

7. Stream Large Responses

Problem: Waiting for entire response before displaying anything.

Solution: Use Server-Sent Events (SSE)

import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js';

// Wire the server to an SSE transport inside your HTTP GET handler.
// (`res` is the Node/Express response object; exact wiring varies by
// SDK version, so treat this as a sketch.)
const transport = new SSEServerTransport('/mcp/messages', res);
await server.connect(transport);

// With a streaming transport, handlers can report progress and emit
// partial results as chunks become available instead of buffering
// the entire dataset before responding.

Benefits:

  • User sees results immediately
  • Better perceived performance
  • Can cancel long-running operations

8. Compress Responses

For HTTP Transport:

import compression from 'compression';
import express from 'express';

const app = express();

// Enable gzip compression
app.use(compression({
  threshold: 1024, // Only compress responses > 1KB
  level: 6 // Compression level (0-9)
}));

Result: 500KB → 50KB (10x smaller, faster transfer)

9. Lazy Load Resources

Problem: Loading all resources upfront.

Before:

server.setRequestHandler(ListResourcesRequestSchema, async () => {
  // Loads ALL files immediately
  const files = await readAllProjectFiles();

  return {
    resources: files.map(f => ({
      uri: `file:///${f.path}`,
      name: f.name,
      mimeType: 'text/plain'
    }))
  };
});

After:

server.setRequestHandler(ListResourcesRequestSchema, async () => {
  // Just list file paths, don't read content
  const filePaths = await listFilePathsOnly();

  return {
    resources: filePaths.map(path => ({
      uri: `file:///${path}`,
      name: basename(path),
      mimeType: getMimeType(path)
    }))
  };
});

server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
  // Only read when specifically requested
  const content = await readFile(request.params.uri);

  return {
    contents: [{
      uri: request.params.uri,
      mimeType: 'text/plain',
      text: content
    }]
  };
});

Result: 5000ms → 50ms for listing

10. Optimize JSON Serialization

Use faster JSON libraries:

import fastJson from 'fast-json-stringify';

// Compile a stringifier from a schema describing your data
const stringifyUser = fastJson({
  type: 'object',
  properties: {
    id: { type: 'string' },
    name: { type: 'string' },
    email: { type: 'string' }
  }
});

// 2-3x faster than JSON.stringify for known shapes
const json = stringifyUser(user);

Advanced Optimizations (Multi-Hour Projects)

11. Add CDN/Edge Caching

For ToolBoost-hosted MCPs, responses can be cached at edge locations.

Configure cache headers:

// For read-only, stable data
return {
  content: [{
    type: 'text',
    text: data
  }],
  metadata: {
    cacheControl: 'public, max-age=3600' // Cache for 1 hour
  }
};

Result: 200ms → 20ms for cached edge responses

12. Implement Rate Limiting & Throttling

Prevent resource exhaustion:

import rateLimit from 'express-rate-limit';

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per minute
  message: 'Too many requests, please slow down',
  handler: (req, res) => {
    res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: 60
    });
  }
});

app.use('/mcp', limiter);

Benefits:

  • Prevents abuse
  • Ensures fair resource distribution
  • Maintains performance for all users

13. Use Read Replicas

For database-heavy MCPs:

// Write to primary
const primaryPool = new pg.Pool({
  host: 'primary-db.example.com',
  // ...
});

// Read from replicas (distributes load)
const replicaPools = [
  new pg.Pool({ host: 'replica1.example.com' }),
  new pg.Pool({ host: 'replica2.example.com' }),
  new pg.Pool({ host: 'replica3.example.com' })
];

// Round-robin read queries
let currentReplica = 0;
function getReadPool() {
  const pool = replicaPools[currentReplica];
  currentReplica = (currentReplica + 1) % replicaPools.length;
  return pool;
}

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'read_data') {
    // Use a replica for reads
    const result = await getReadPool().query('SELECT ...');
    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  } else if (request.params.name === 'write_data') {
    // Use the primary for writes
    const result = await primaryPool.query('INSERT ...');
    return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
  }
});

14. Profile and Optimize Hot Paths

Find bottlenecks with profiling:

import { performance } from 'perf_hooks';

// Add timing instrumentation
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const start = performance.now();

  // Break down timing
  const t1 = performance.now();
  const data = await fetchData();
  const fetchTime = performance.now() - t1;

  const t2 = performance.now();
  const processed = await processData(data);
  const processTime = performance.now() - t2;

  const t3 = performance.now();
  const result = await formatResult(processed);
  const formatTime = performance.now() - t3;

  const total = performance.now() - start;

  console.log({
    tool: request.params.name,
    total: `${total.toFixed(2)}ms`,
    breakdown: {
      fetch: `${fetchTime.toFixed(2)}ms`,
      process: `${processTime.toFixed(2)}ms`,
      format: `${formatTime.toFixed(2)}ms`
    }
  });

  return result;
});

Example output:

{
  "tool": "analyze_code",
  "total": "1250ms",
  "breakdown": {
    "fetch": "50ms",
    "process": "1150ms", // ← Bottleneck!
    "format": "50ms"
  }
}

Now you know where to focus: optimize processData().

15. Implement Smart Prefetching

Anticipate what the AI will request next:

// When a user is retrieved, prefetch their posts
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'get_user') {
    const userId = request.params.arguments.id;

    // Fetch user
    const user = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);

    // Prefetch posts in the background (don't await; swallow errors so
    // an unhandled rejection can't crash the process)
    pool.query('SELECT * FROM posts WHERE user_id = $1', [userId])
      .then(posts => cache.set(`posts:${userId}`, posts.rows))
      .catch(() => {});

    return { content: [{ type: 'text', text: JSON.stringify(user.rows[0]) }] };
  }

  if (request.params.name === 'get_user_posts') {
    const userId = request.params.arguments.userId;

    // Check if prefetched
    const cached = cache.get(`posts:${userId}`);
    if (cached) {
      return { content: [{ type: 'text', text: JSON.stringify(cached) }] };
    }

    // Fetch if not cached
    const posts = await pool.query('SELECT * FROM posts WHERE user_id = $1', [userId]);
    return { content: [{ type: 'text', text: JSON.stringify(posts.rows) }] };
  }
});

Performance Monitoring

Key Metrics to Track

interface PerformanceMetrics {
  // Latency
  p50ResponseTime: number; // Median
  p95ResponseTime: number; // 95th percentile
  p99ResponseTime: number; // 99th percentile

  // Throughput
  requestsPerSecond: number;
  concurrentRequests: number;

  // Errors
  errorRate: number;
  timeoutRate: number;

  // Resources
  cpuUsage: number;
  memoryUsage: number;
  activeConnections: number;
}
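The percentile fields above can be computed from raw latency samples. A minimal sketch using the nearest-rank method (production systems typically use streaming estimators such as t-digest rather than sorting every sample):

```typescript
// Nearest-rank percentile over raw latency samples (ms).
// Sketch only: fine for small in-memory sample sets.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [120, 80, 95, 310, 150, 90, 100, 85, 200, 110];
console.log(percentile(latencies, 50)); // 100
console.log(percentile(latencies, 95)); // 310
```

Note how a single slow outlier (310ms here) dominates p95 while barely moving the median, which is why tracking p95/p99 alongside p50 matters.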

Simple Monitoring

import StatsD from 'hot-shots';

const stats = new StatsD();

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const start = Date.now();

  try {
    const result = await handleRequest(request);

    // Record success
    stats.timing('mcp.request.duration', Date.now() - start, {
      tool: request.params.name,
      status: 'success'
    });

    return result;
  } catch (error) {
    // Record error
    stats.increment('mcp.request.error', {
      tool: request.params.name,
      error: error.message
    });

    throw error;
  }
});

ToolBoost Built-in Monitoring

ToolBoost automatically tracks:

  • Response times (p50, p95, p99)
  • Request volume
  • Error rates
  • Cache hit rates

Access via Dashboard → Analytics.

Performance Checklist

Before Deploying

  • Connection pooling implemented
  • Caching strategy defined
  • Database queries have indexes
  • Response sizes limited (pagination)
  • Error handling in place
  • Monitoring configured

After Deploying

  • Response times < 200ms for p95
  • Error rate < 0.1%
  • Cache hit rate > 80% (for cacheable data)
  • CPU usage < 70%
  • Memory stable (no leaks)
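These targets can be turned into an automated post-deploy check. A sketch using the thresholds from the list above (the metric field names are illustrative, not a ToolBoost API):

```typescript
// Post-deploy health check against the targets listed above.
// Field names are illustrative; wire them to your own metrics source.
type DeployMetrics = {
  p95ResponseTimeMs: number;
  errorRate: number;    // fraction, e.g. 0.001 = 0.1%
  cacheHitRate: number; // fraction of cacheable requests served from cache
  cpuUsage: number;     // fraction, e.g. 0.7 = 70%
};

function meetsTargets(m: DeployMetrics): boolean {
  return (
    m.p95ResponseTimeMs < 200 &&
    m.errorRate < 0.001 &&
    m.cacheHitRate > 0.8 &&
    m.cpuUsage < 0.7
  );
}

console.log(meetsTargets({
  p95ResponseTimeMs: 180,
  errorRate: 0.0005,
  cacheHitRate: 0.9,
  cpuUsage: 0.5
})); // true
```

Running a check like this in CI or a post-deploy hook catches regressions before users notice them.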

Real-World Results

Case Study: E-commerce MCP

Before Optimization:

  • Average response time: 1,200ms
  • p95: 3,500ms
  • Timeouts: 5% of requests
  • User complaints: "AI is too slow"

Optimizations Applied:

  1. Added database connection pooling
  2. Implemented caching for product catalog
  3. Added indexes to frequently queried columns
  4. Implemented pagination
  5. Used parallel requests for related data

After Optimization:

  • Average response time: 180ms (6.7x faster)
  • p95: 320ms (11x faster)
  • Timeouts: 0.01%
  • User feedback: "AI responses are instant!"

Common Performance Mistakes

Mistake 1: N+1 Queries

Bad:

const users = await pool.query('SELECT id FROM users');

for (const user of users.rows) {
  const posts = await pool.query('SELECT * FROM posts WHERE user_id = $1', [user.id]);
  // Tons of queries!
}

Good:

const users = await pool.query('SELECT id FROM users');
const userIds = users.rows.map(u => u.id);

// One query for all posts
const posts = await pool.query(
  'SELECT * FROM posts WHERE user_id = ANY($1)',
  [userIds]
);

Mistake 2: Loading Everything into Memory

Bad:

const allUsers = await pool.query('SELECT * FROM users'); // 1 million rows!

Good:

import Cursor from 'pg-cursor';

// Stream in batches with a cursor (cursors need a dedicated client)
const client = await pool.connect();
try {
  const cursor = client.query(new Cursor('SELECT * FROM users'));
  let rows;
  while ((rows = await cursor.read(1000)).length > 0) {
    processBatch(rows);
  }
} finally {
  client.release();
}

Mistake 3: Synchronous I/O

Bad:

const data = fs.readFileSync('/large/file.json'); // Blocks event loop

Good:

const data = await fs.promises.readFile('/large/file.json'); // Non-blocking

Mistake 4: No Timeout Handling

Bad:

const response = await fetch(externalAPI); // Hangs forever if API is down

Good:

const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5000);

try {
  const response = await fetch(externalAPI, { signal: controller.signal });
} catch (error) {
  if (error.name === 'AbortError') {
    throw new Error('Request timed out after 5 seconds');
  }
  throw error;
} finally {
  clearTimeout(timeout);
}

Conclusion

MCP performance optimization is an ongoing process. Start with quick wins, monitor your metrics, and continuously improve.

Key Takeaways:

  • ✅ Use connection pooling
  • ✅ Cache aggressively
  • ✅ Paginate large responses
  • ✅ Optimize database queries
  • ✅ Monitor performance metrics
  • ✅ Profile to find bottlenecks

ToolBoost handles many optimizations for you:

  • Global CDN for low latency
  • Automatic caching
  • Connection pooling
  • Performance monitoring
  • Auto-scaling

Focus on your MCP logic and let ToolBoost handle the infrastructure.


Need help optimizing your MCP? Contact ToolBoost for a performance audit.

Deploy optimized MCPs instantly with ToolBoost - performance built-in.