REST, GraphQL, gRPC: who pays the cost of change


API style is one of those technology choices that looks like a performance decision on the whiteboard and turns out, six months in, to have been a governance decision. Every style puts someone on the hook for change: the server, the client, the schema owner, the consumer that forgot to regenerate its stubs. The interesting question is not “which one is fastest” — the answers are close enough for most workloads — but who pays the cost of change, and whether the shape of the cost matches the shape of your organization.

Three styles dominate the current landscape: REST, GraphQL, and gRPC. A fourth — the durable event stream — is increasingly an API surface too, even though nobody quite calls it one. They differ on every axis that matters: coupling, evolvability, tooling, performance, and who owns the schema. What follows is a map.

REST

REST has been the default for over a decade, and it is still the right default for a lot of applications. The core ideas are old enough that they have the weight of convention: resources identified by URLs, state manipulated with a small set of verbs (GET/POST/PUT/PATCH/DELETE), responses cached and intermediated by standard HTTP infrastructure, media types negotiated between client and server.

What REST gets right, mostly by accident of timing:

  • Uniform tooling. Every language has HTTP clients. Every gateway, proxy, CDN, WAF, and observability tool understands HTTP. Every engineer can curl an endpoint and reason about it. This is an enormous advantage that is easy to undervalue until you pick something that does not have it.
  • Caching. HTTP’s caching story — Cache-Control, ETag, conditional requests, intermediate caches — is mature and works. Nothing else comes close, which matters more than people remember until they need to scale reads.
  • Incremental evolvability. A well-designed REST API can add fields, endpoints, and even new resource types without breaking existing clients, because clients ignore fields they do not understand. This is a convention, not a guarantee, and it works only when clients are written to ignore unknowns.

What REST gets wrong, also mostly by convention:

  • Nobody actually does HATEOAS. Roy Fielding’s original formulation of REST required hypermedia — the server tells the client what transitions are possible by embedding links in responses — and in practice almost no one does this. Most “REST” APIs are just RPC over HTTP with resource-shaped URLs. That is fine, but it means you are not getting the evolvability Fielding promised; you are getting the HTTP tooling and nothing more.
  • Chattiness. A screen that needs data from five resources makes five requests, or requires the server to invent a composite endpoint for that screen. Over time the API grows endpoints shaped like specific clients’ needs (GET /dashboard, GET /user-with-orders), which works, but is precisely the coupling GraphQL exists to avoid.
  • Under- and over-fetching. Clients usually want a subset of the fields the server returns. Mobile clients especially. REST’s answer is either to return everything (over-fetch), to add ?fields=a,b,c parameters (a hand-rolled query language), or to build a client-specific endpoint (BFF pattern). None is wrong; all are compromises the style does not solve for you.
  • Schema is optional. Without OpenAPI, there is no machine-readable contract. With OpenAPI there is, but many teams treat it as documentation rather than as the source of truth, and client and server drift apart.

Pick REST when the API is consumed by a variety of unknown clients and intermediaries (public API, webhook surface, partner integrations), when caching is important, when the operations genuinely fit a resource model, and when HTTP tooling is more valuable than tight contracts. It is still the right answer for most public APIs.

GraphQL

GraphQL’s central idea is to invert who picks the shape of the response. In REST, the server decides what a resource looks like and clients get what they get. In GraphQL, the client sends a query describing exactly what it needs — which fields of which types with which relationships — and the server returns exactly that shape, no more, no less. The schema is a published, typed contract; the query is a tree against that schema.

What this buys:

  • No more under- or over-fetching. A mobile client asks for the four fields it actually renders. A desktop client on the same API asks for sixteen. The server responds to each with exactly what they asked for.
  • Fewer round trips. Nested queries that would have been several REST calls become one GraphQL operation. Dashboards and complex detail screens especially benefit.
  • A typed schema as the contract. Every field has a declared type. Tooling — code generation, IDE autocomplete, schema-aware linters — follows naturally. Breaking changes are detectable at the schema level.
  • Client autonomy. A new screen that needs data the schema already exposes does not require a server change. This is the single biggest payoff for teams where the client and server are owned by different groups or different release cadences.
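The "exactly the shape you asked for" idea can be shown without a GraphQL server at all. Below is a toy sketch, not real GraphQL execution: a selection is modeled as a nested dict (True marks a leaf field), and `prune` is a hypothetical helper that returns only what each client requested.

```python
def prune(data, selection):
    """Return only the fields named in `selection`, recursing into
    nested selections -- each client gets exactly the shape it asked for."""
    if selection is True:            # leaf field: take the value as-is
        return data
    if isinstance(data, list):       # apply the selection to each item
        return [prune(item, selection) for item in data]
    return {field: prune(data[field], sub) for field, sub in selection.items()}

user = {
    "id": 7, "name": "Ada", "email": "ada@example.com",
    "posts": [{"title": "Hello", "body": "...", "likes": 3}],
}

# Two clients, one schema, two response shapes.
mobile  = prune(user, {"id": True, "name": True})
desktop = prune(user, {"name": True, "posts": {"title": True, "likes": True}})
```

A real GraphQL server resolves each field through the schema rather than pruning a prefetched object, but the contract with the client is the same: the query is the shape of the response.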

What GraphQL costs, and it is a lot:

  • The N+1 problem is structural. A naive resolver for users { name posts { title } } fetches users, then fetches posts per user, then maybe fetches comments per post. Solving this requires dataloaders — request-scoped batching — on almost every resolver. Building a GraphQL server without dataloader discipline is how you DoS your own database.
  • Query complexity is a security surface. A client can write a query that expands into a billion resolver calls if the server does not limit depth, complexity, and query shape. Query cost analysis, persisted queries, and allowlists are not optional features; they are how a GraphQL server stays up.
  • Caching is harder. Because every query is different and POSTs to a single endpoint, HTTP caching does not work the way it does in REST. You end up relying on client-side normalized caches (Apollo, Relay) and server-side dataloader caches. It works, but the default cost of a request is higher and the infrastructure is specialized.
  • The schema is an organizational artifact. Somebody owns it; if nobody does, it fragments. Federation (Apollo Federation, schema stitching) exists to handle multi-team schemas, but it is its own architecture with its own failure modes.

Pick GraphQL when you have many clients with heterogeneous needs over the same graph-shaped data (this is the “Facebook mobile” case it was invented for), when client teams need to move faster than server teams, and when you are prepared to own the operational complexity. Avoid it when the API has one client and a simple resource model (you are paying for flexibility you will not use), when you need aggressive HTTP caching, or when nobody on the team has run a GraphQL server at production scale before.

gRPC

gRPC is contract-first RPC over HTTP/2, with protobuf as the schema and binary wire format. A .proto file defines the services, methods, and message types. Code generation produces client and server stubs in every supported language. A call is a typed method invocation — not a URL, not a query, not a resource. Streaming is first-class: unary, server streaming, client streaming, and bidirectional.

What gRPC gets right:

  • A binary, schema-validated, forward-and-backward-compatible format. Protobuf’s wire format is compact, fast to parse, and evolves cleanly — field numbers are the identity, so renaming a field is free and adding/removing optional fields does not break older clients or servers.
  • Strict contracts by construction. You cannot send a message that does not conform to the schema, because you generated the client from the schema. Drift between client and server is not a runtime surprise; it is a build error.
  • Performance. Binary framing, header compression, multiplexed streams on HTTP/2, persistent connections — gRPC is meaningfully faster than REST for high-frequency service-to-service traffic. The difference is often not enough to matter for low-volume APIs; it is very much enough to matter on the hot paths of a microservice estate.
  • Streaming as a first-class primitive. Server-push, client-push, and bidirectional streams are native to the protocol, not bolted on. For real-time systems, telemetry pipelines, and live subscriptions, this is a major advantage.
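The "field numbers are the identity" rule from the first bullet above is worth seeing concretely. This is a toy model of the compatibility principle, not protobuf's actual varint wire encoding: messages travel keyed by field number, names are local to each schema version, and unknown numbers are skipped.

```python
# Two versions of the same message type. Field 2 was renamed (free,
# because the number is the identity) and field 3 was added.
V1_FIELDS = {1: "id", 2: "name"}
V2_FIELDS = {1: "id", 2: "display_name", 3: "email"}

def encode(msg: dict, fields: dict) -> dict:
    """Serialize name->value into number->value, as the wire sees it."""
    name_to_num = {name: num for num, name in fields.items()}
    return {name_to_num[k]: v for k, v in msg.items()}

def decode(wire: dict, fields: dict) -> dict:
    """Deserialize, skipping field numbers this schema version doesn't know."""
    return {fields[num]: v for num, v in wire.items() if num in fields}

# A v2 server produces a message; a v1 client still reads it cleanly.
wire = encode({"id": 7, "display_name": "Ada", "email": "a@x"}, V2_FIELDS)
seen_by_old_client = decode(wire, V1_FIELDS)
```

The old client sees field 2 under its old name and never notices field 3, which is exactly why additive protobuf changes do not break deployed stubs.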

What gRPC costs:

  • Not browser-friendly. Native gRPC does not run in browsers without a translation layer (gRPC-Web or a gateway). If your API needs to be consumed from browsers directly, you are back to translating to REST or GraphQL at the edge.
  • Tooling is narrower. curl does not speak gRPC (grpcurl does, but it is a separate tool). Every HTTP proxy, CDN, WAF, and gateway had to be updated to handle HTTP/2 and gRPC framing; support is broad now, but you will still hit tooling that assumes REST.
  • Debugging is harder. Binary wire format means you cannot read the traffic with your eyes. You need reflection enabled or schemas at hand. This is a small cost in aggregate and a large cost in the specific moment you need to figure out why a service is returning INTERNAL.
  • Service discovery and load balancing are your problem. gRPC connections are long-lived and multiplexed, which means L4 load balancing (round-robining connections) does not distribute requests evenly. You need L7 load balancing (service mesh, client-side LB, or a gRPC-aware proxy), and that is another piece of infrastructure.
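The load-balancing point above comes down to where the pick happens. With L4 balancing, the backend is chosen once at connect time and every multiplexed request rides that one connection; L7 (client-side or proxy) picks per request. A minimal sketch of the client-side version, with hypothetical names and addresses:

```python
import itertools

class PerRequestPicker:
    """Client-side L7 balancing: choose a backend per *request*, so one
    long-lived connection per backend still spreads load evenly."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

picker = PerRequestPicker(["10.0.0.1:50051", "10.0.0.2:50051"])

# Four requests alternate across backends...
l7_requests = [picker.pick() for _ in range(4)]

# ...whereas an L4 balancer would have pinned all four to whichever
# backend the single connection happened to land on.
```

Real gRPC clients get this from a resolver/balancer plugin or a service mesh sidecar; the sketch only shows why the per-request choice is the part you cannot skip.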

Pick gRPC for internal service-to-service traffic, especially where latency and throughput matter, where teams share a build system and can regenerate stubs, and where the streaming primitives fit the problem. Avoid it for public APIs (your consumers do not want to learn protobuf), for browser-direct APIs (without a translation layer), and for small-scale CRUD where the contract overhead outweighs the benefit.

The event stream as an API

The fourth style, which almost nobody calls an API even though it is one, is the durable event stream. It was covered in the event-driven post, but it is worth repeating in this frame.

A Kafka topic (or equivalent) with a published schema is an API surface. Consumers subscribe to it, process events on their own schedule, and build their own projections. The producer does not know who the consumers are; the consumers do not coordinate with each other. The “API” is the event schema plus the topic’s contract about ordering and retention.

The axis this wins on is decoupling over time. Consumers can be added, removed, or changed without the producer being told. A new consumer can replay from the beginning of retention and catch up. The producer commits to producing; the consumers commit to consuming; the log mediates.
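The replay contract described above fits in a few lines. This is an in-memory sketch of the idea, not Kafka: `Topic` is a hypothetical stand-in for a durable log, and offsets stand in for consumer group positions.

```python
class Topic:
    """A durable, append-only log: producers append, each consumer tracks
    its own offset, and a new consumer can replay from the beginning."""
    def __init__(self):
        self.log = []

    def append(self, event):
        self.log.append(event)

    def read_from(self, offset):
        return self.log[offset:]

orders = Topic()
orders.append({"type": "OrderPlaced", "id": 1})
orders.append({"type": "OrderShipped", "id": 1})

# An existing consumer that had processed up to offset 1 sees only the tail;
# a brand-new consumer replays the whole retention window and catches up.
tail = orders.read_from(1)
replay = orders.read_from(0)
```

Note what the producer never does: it never learns how many consumers exist or where their offsets are. That asymmetry is the decoupling the section describes, and it is also why the event schema is a contract the producer can never casually break.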

The axis it loses on is interactivity. A request-response API answers questions. An event stream announces changes. Forcing queries through an event stream (“publish BalanceRequested, await BalanceResponded”) is a bad idea for the same reasons RPC over a message bus is a bad idea. Different shapes fit different needs.

Most serious systems end up with both: a request-response API (REST/GraphQL/gRPC) for synchronous operations, and an event stream for propagating state changes to whoever cares. The event schema is as much of a contract as the HTTP schema, and it evolves under real constraints — old events persist, old consumers exist, and schema registries become infrastructure someone has to own.

The axes, compared

Putting the four styles side by side on the axes that matter most:

  • Coupling between producer and consumer. REST: medium (URL shapes, media types). GraphQL: low (the client picks). gRPC: high (shared schema, tight contract). Events: lowest (the producer does not know consumers exist), with the caveat that the schema is still a contract.
  • Who pays the cost of change. REST: convention pushes cost onto the server (add fields, keep old ones). GraphQL: server absorbs it by maintaining the graph, clients pick what they need. gRPC: protobuf’s compatibility rules spread it (additive changes are cheap; breaking changes require version bumps). Events: the producer bears it — the log holds old schemas forever, and breaking the schema is breaking every downstream.
  • Caching. REST: excellent (HTTP native). GraphQL: poor by default (single endpoint, POST-only), specialized tooling required. gRPC: application-level only. Events: not applicable; the log is the cache.
  • Tooling breadth. REST: the most. gRPC: narrower but mature. GraphQL: mature in its own ecosystem, narrow outside it. Events: mature for big players (Kafka ecosystem), narrower elsewhere.
  • Streaming. gRPC: native. GraphQL: via subscriptions, usually over WebSockets, a separate transport. REST: SSE and long-polling, awkward. Events: native, but a different shape.
  • Fit for public APIs. REST is still the answer. GraphQL is a distant second for graph-shaped data. gRPC is rare. Events are not typically public (webhooks are the adjacent pattern — a degenerate form of event delivery over HTTP).
  • Fit for internal APIs. gRPC wins on hot paths, REST wins on mixed-client, GraphQL wins on heterogeneous clients over a graph, events win on state propagation.

None of these dimensions has a universally right answer. A microservice architecture with any real size typically uses three of the four: gRPC between services on the hot path, REST at the edge for external clients, events for state propagation. GraphQL appears when client diversity justifies it — a mobile app, a web app, and a partner portal reading from the same service estate with different field needs.

The decision in one paragraph

Ask two questions first: who are the consumers, and how fast does the schema change?

If consumers are unknown and many (public API, partners, third-party developers), pick REST. The ecosystem is the product.

If consumers are a small number of known clients with diverse, evolving needs over the same graph-shaped data, and the client team moves faster than the server team, pick GraphQL. You are buying client autonomy at the cost of operational complexity.

If consumers are other services in the same organization, sharing a build system, and the traffic is hot, pick gRPC. You are buying contract strictness and performance at the cost of tooling breadth.

If the information is “here is a thing that happened” and the producer does not need to know who cares, pick events. You are buying decoupling over time at the cost of eventual consistency.

Most real systems end up with several of these. That is fine. What matters is that each API surface was chosen on purpose, not inherited by accident from the last project the tech lead worked on. The style is not a preference. It is a bet about who changes faster, who owns the schema, and how the cost of change flows between them. Place the bet deliberately.