Serverless Architectures: Building APIs without Servers (2026)
Serverless has been production-ready for years, but the discourse around it still oscillates between “the future of everything” and “just someone else’s server.” The truth, as usual, is boring and specific: serverless is excellent for some API architectures and terrible for others, and the difference comes down to workload characteristics.
This note covers the practical patterns for building serverless APIs in 2026, with honest numbers on cost, latency, and the operational tradeoffs that vendor marketing omits.
The Serverless API Stack
A typical serverless API in 2026 looks like this:
- API Gateway (AWS API Gateway, Azure APIM, or Cloudflare Workers Routes) handles routing, rate limiting, and authentication
- Functions (Lambda, Azure Functions, Cloud Functions, or Workers) contain the business logic
- Database (DynamoDB, PlanetScale, Neon, or Turso) stores the data
- Queue/event bus (SQS, EventBridge) handles async workflows
The developer writes the function code and configuration. The cloud provider manages the servers, scaling, patching, and availability. In theory, you focus on business logic. In practice, you focus on business logic plus a new category of operational concerns: cold starts, concurrency limits, timeout configurations, and cost monitoring.
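To make "the developer writes the function code" concrete, here is a minimal sketch of a Lambda-style handler for an API Gateway proxy event. The event shape follows the API Gateway HTTP API payload convention; only the fields used here are assumed, and the greeting logic is purely illustrative.

```python
import json

def handler(event, context):
    """Minimal Lambda-style handler for an API Gateway proxy event.

    API Gateway passes the parsed request in `event`; the return value
    must carry a statusCode and a string body for the gateway to map
    back into an HTTP response.
    """
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Everything else in the list above (routing, scaling, persistence) lives in configuration and managed services rather than in this file.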
Cold Starts: The Number That Matters
A cold start happens when the platform creates a new function instance to handle a request. This adds latency — anywhere from under 50ms (JavaScript on Cloudflare Workers) to 5-10 seconds (Java on Lambda with large dependencies).
The cold start landscape in 2026:
| Runtime | Platform | Typical Cold Start |
|---|---|---|
| Node.js | Lambda | 200-500ms |
| Python | Lambda | 300-700ms |
| Java | Lambda (SnapStart) | 200-400ms |
| Go | Lambda | 100-200ms |
| JavaScript | Cloudflare Workers | <50ms |
| .NET | Azure Functions | 500-1500ms |
Workers avoid traditional cold starts because they use V8 isolates instead of containers. The startup overhead is milliseconds, not seconds. This is a genuine architectural advantage for latency-sensitive APIs.
For Lambda and similar container-based platforms, cold starts hit when functions have not been invoked recently or when traffic spikes require new instances. The mitigation strategies:
Provisioned concurrency — pre-warm a specific number of instances. Eliminates cold starts but adds cost (you pay for idle instances).
Smaller bundles — fewer dependencies mean faster initialization. A Lambda function with three dependencies starts 5-10x faster than one with thirty.
SnapStart (Java/JVM) — caches the initialized state of the function. Reduces Java cold starts from seconds to hundreds of milliseconds.
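A related mitigation that costs nothing: do expensive initialization at module scope, where container-based platforms run it once per instance (at cold start) rather than on every invocation. A sketch of the pattern, with a counter standing in for a real database client:

```python
import time

# Stand-in for expensive setup (DB clients, config loads, SDK inits).
# Module-scope code runs once per instance, at cold start.
_INIT_COUNT = 0

def _init_connection():
    global _INIT_COUNT
    _INIT_COUNT += 1  # counts how many times setup actually ran
    return {"connected_at": time.time()}

_CONN = _init_connection()  # executed at cold start only

def handler(event, context):
    # Warm invocations reuse _CONN; only a brand-new instance pays
    # the initialization cost again.
    return {"statusCode": 200, "inits": _INIT_COUNT}
```

On a warm instance, repeated invocations see the same connection object; the init count stays at one.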
Cost: Where Serverless Gets Interesting
Serverless pricing is per-invocation plus per-millisecond of compute time. At low to moderate traffic, this is dramatically cheaper than running servers 24/7. At high traffic, the math shifts.
The crossover point varies by workload, but a rough guideline: if your API handles fewer than 10 million requests per month, serverless is almost certainly cheaper. Above 100 million requests per month, a dedicated container service (ECS, Cloud Run) often costs less.
The hidden cost is API Gateway. AWS API Gateway charges $3.50 per million requests. For a high-traffic API, the gateway cost can exceed the function cost. Alternatives like Application Load Balancer with Lambda integration ($0.40/million) or Cloudflare Workers (bundled routing) are significantly cheaper.
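The gateway-versus-function cost claim is easy to check with back-of-envelope math. The sketch below uses the gateway prices quoted above ($3.50/million for API Gateway, $0.40/million for ALB) plus assumed Lambda rates of $0.20 per million requests and roughly $0.0000166667 per GB-second; treat all of these as illustrative list prices, not a quote.

```python
def monthly_cost(requests_m, avg_ms, mem_gb, gateway_per_m):
    """Rough monthly cost in dollars for a Lambda-backed API.

    requests_m: millions of requests per month
    avg_ms: average billed duration per invocation
    mem_gb: configured memory in GB
    gateway_per_m: routing cost per million requests
    """
    request_cost = requests_m * 0.20                     # per-invocation charge
    gb_seconds = requests_m * 1e6 * (avg_ms / 1000) * mem_gb
    compute_cost = gb_seconds * 0.0000166667             # per GB-second charge
    gateway_cost = requests_m * gateway_per_m
    return request_cost + compute_cost + gateway_cost

# 100M requests/month, 50ms average at 256MB:
via_api_gw = monthly_cost(100, 50, 0.25, 3.50)  # API Gateway routing
via_alb    = monthly_cost(100, 50, 0.25, 0.40)  # ALB routing
```

At these numbers the function work itself comes to roughly $40/month while API Gateway alone adds $350 — the routing layer dominates, which is exactly the trap described above.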
Patterns That Work
CRUD APIs with moderate traffic. The classic use case. Each endpoint is a function, the database handles persistence, and the platform handles scaling. Works well, costs little, and requires minimal operational effort.
Event-driven processing. Functions triggered by queue messages, database changes, or scheduled events. The per-invocation cost model aligns perfectly with event-driven workloads — you pay exactly for the processing you do.
Webhook receivers. Third-party services send webhooks that your function processes. Traffic is bursty and unpredictable — exactly the workload profile where serverless auto-scaling shines.
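Most webhook receivers start with the same step: verifying the sender's HMAC signature over the raw body. A sketch, assuming the common GitHub-style `sha256=<hex>` header convention (header name and scheme vary by provider):

```python
import hmac
import hashlib

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check an HMAC-SHA256 webhook signature over the raw request body.

    Assumes the provider sends `sha256=<hex digest>`, as GitHub does;
    adjust the prefix for other providers.
    """
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information to an attacker
    return hmac.compare_digest(expected, signature_header)
```

Verify before parsing: a function that rejects bad signatures first never pays compute time for forged payloads.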
BFF (Backend for Frontend). A thin serverless layer between your frontend and backend services that aggregates, transforms, and caches API responses. The edge computing pattern extends this to run the BFF at the CDN edge.
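The BFF pattern is mostly fan-out and reshaping. A minimal sketch with the upstream calls injected as async fetchers, so the same aggregation logic runs on Lambda, Workers, or in tests; the field names and upstream shapes here are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

# A fetcher takes a user id and returns a dict from some upstream service.
Fetcher = Callable[[str], Awaitable[dict]]

async def bff_profile(user_id: str,
                      fetch_user: Fetcher,
                      fetch_orders: Fetcher) -> dict:
    """Aggregate two upstream calls into one frontend-shaped response."""
    # Fan out to both upstreams concurrently, not sequentially.
    user, orders = await asyncio.gather(
        fetch_user(user_id), fetch_orders(user_id)
    )
    # Return only what the frontend needs, trimming upstream payloads.
    return {
        "name": user["name"],
        "recent_orders": orders["items"][:3],
    }
```

Injecting the fetchers also keeps the aggregation testable without any network, which matters when the platform-specific HTTP client differs between Lambda and Workers.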
Patterns That Do Not Work
Long-running processing. Functions have execution time limits (15 minutes on Lambda, 30 seconds on Workers). Anything that takes longer needs a different architecture — Step Functions, background jobs, or containers.
WebSocket connections. Maintaining persistent connections does not map well to the function-per-request model. API Gateway WebSocket support exists but adds complexity. Consider a dedicated service for real-time features.
High-throughput, low-margin workloads. If each request does minimal work and you process billions of them, the per-invocation overhead adds up. A container serving requests from a thread pool is more cost-effective at that scale.
The Vendor Lock-In Question
Serverless APIs are more tightly coupled to their platform than containerized services. Lambda functions use Lambda-specific APIs. DynamoDB access patterns are DynamoDB-specific. EventBridge rules do not port to GCP.
The pragmatic view: some lock-in is acceptable if the productivity gains are real. Write your business logic in portable code (plain functions with dependency injection for platform-specific services) and accept that the glue code is platform-specific. If you ever need to migrate, the business logic ports cleanly and the glue code gets rewritten.
Learning the RFC basics teaches developers to read standards documents carefully — the same skill helps when reading cloud provider docs to understand the actual guarantees behind serverless platforms.
Getting Started
Start with a single function behind an API Gateway route. Deploy. Test. Measure latency and cost. Then expand. The mistake teams make is designing the entire architecture as serverless upfront — complex orchestration, dozens of functions, event-driven everything — before validating the basic unit economics and operational model.
Serverless is a tool, not a religion. Use it where it fits. Use containers where it does not. The best architecture uses both.