API security: most of it is not authentication
The authn/authz post covers identity and permission. This one is about everything else — the surface that stays hostile after a caller is authenticated and authorized. A logged-in, properly scoped user can still send data that breaks the server, reference objects that don’t belong to them, flood the system into a denial-of-service, or trick the server into making requests on their behalf. None of that is an authentication problem; all of it is an API security problem.
The useful framing comes from the OWASP API Security Top 10, which is worth reading in full and is more actionable than the older generic web OWASP list. The API version foregrounds the vulnerabilities that actually break production APIs in 2026 — Broken Object Level Authorization (BOLA), Broken Object Property Level Authorization, unrestricted resource consumption, mass assignment, SSRF, unsafe consumption of third-party APIs. Most of them are not subtle cryptographic failures; they are the API trusting the client to be well-behaved.
The organizing principle across all of them is the same: the client is hostile. Every byte that crosses the boundary is potentially attacker-controlled. Any assumption the server makes about the shape, content, meaning, or authorization of that data is a potential vulnerability. The discipline is the set of habits that make those assumptions explicit, checked, and fail-closed.
The mental model
The server’s trust boundary is the network socket. Everything that comes in from the other side is untrusted. “Authenticated” does not mean “trusted”; it means “we know who is sending hostile data.” The authorization layer narrows what any given authenticated caller is allowed to do, and it is the most important defense by impact, but it does not on its own make the request content safe to process.
Two kinds of concerns flow out of this:
- Content concerns. What the request body, query parameters, and headers contain. Is the JSON parseable? Are the fields the right types? Is the user_id parameter allowed to reference the object it’s asking about? Is the URL in image_url one we should fetch on the user’s behalf?
- Volume concerns. How much of anything the client can send. How many requests per second? How big a request body? How deep a JSON document? How many results per page? Volume without limits is a denial-of-service waiting to be discovered.
The techniques — validation, parameterization, rate limiting, and the rest — are the specific defenses for specific classes of these concerns. Treat the list as a menu of habits to apply everywhere, not as a per-endpoint checklist to apply once.
Validation is not sanitization
The two are routinely conflated, which is why it’s worth pulling them apart.
Validation is: “does this input match the shape I expected? If not, reject it.” Validation produces a yes/no answer and, on no, rejects the request with a 400. Validation is the right move at the trust boundary. It is cheap, it is definitive, and it is the thing most APIs get away with doing in isolation.
Sanitization is: “this input does not match what I expected, but I’ll modify it until it does and then use it.” Sanitization tries to be helpful and is almost always wrong at the API layer. You strip HTML tags from a field and discover the attacker nested them to survive stripping. You escape quotes and discover the attacker used Unicode variants. You normalize the input after checking it and discover the normalization re-created the very shape the check had rejected. The problem with sanitization is that you can’t enumerate every malicious shape the input might take; validation inverts the problem to a shape you do know — the legitimate one.
The rule: validate inputs, encode outputs, parameterize statements. Sanitization is a fallback for cases where validation is genuinely impossible (rich-text editors that must accept HTML, for instance), and even there it should use a battle-tested library (DOMPurify for HTML) rather than a bespoke str.replace chain.
Schema-first validation
The right place to do validation is at the schema level, against a declared specification of what the endpoint accepts. Modern APIs converge on one of a few patterns:
- OpenAPI / JSON Schema for REST. The schema is the source of truth. Middleware (express-openapi-validator, FastAPI’s pydantic integration, Go’s go-playground/validator) enforces it at request-time. Invalid requests are rejected before controller code runs.
- Protobuf / gRPC. The schema is the wire format. Type validation is automatic; business-rule validation is an explicit next step.
- GraphQL. The schema validates types and required fields. Business-rule validation still needs explicit work in resolvers.
- Pydantic / Zod / Joi / io-ts. Code-level validators when the API doesn’t have an external schema, or as a belt to the schema’s suspenders.
The properties that matter:
- Type checking. user_id is an integer. email is a valid email. created_at is an ISO-8601 timestamp. Reject anything that doesn’t parse as the declared type.
- Bounds. String lengths capped. Numeric ranges enforced. Array sizes limited. An endpoint that accepts a tags array with no size limit will eventually see a request with a million tags and spend all its CPU processing it.
- Enum enforcement. status is one of "pending" | "shipped" | "delivered". Anything else is rejected.
- Format validation. Emails, URLs, UUIDs, phone numbers. Don’t roll your own regex for these; use the libraries that handle the thousand edge cases.
- Required vs optional. Required fields missing: 400. Optional fields: default values specified.
- Unknown fields: reject by default. If the schema says name and email, an extra is_admin in the request body should fail validation, not be silently dropped. This prevents a class of mass-assignment bugs (see below).
Validation at the boundary is one of the highest-ROI investments an API can make. Most injection and data-corruption vulnerabilities collapse to “the server accepted a request it should have rejected before processing it.”
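As a concrete illustration of these properties (field names and limits are invented for the example), here is what the checks look like written out by hand; schema libraries like Pydantic or Zod declare the same thing in far less code, and are what production code should reach for:

```python
# Hypothetical schema for an order-update body: allowed fields, an enum,
# and bounds on collection and string sizes. All names are illustrative.
ALLOWED_STATUSES = {"pending", "shipped", "delivered"}
ALLOWED_FIELDS = {"status", "tags", "note"}

def validate_order(body: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    # Unknown fields: reject by default, never silently drop.
    for extra in sorted(set(body) - ALLOWED_FIELDS):
        errors.append(f"unknown field: {extra}")
    # Enum enforcement: anything outside the declared set is rejected.
    if body.get("status") not in ALLOWED_STATUSES:
        errors.append("status must be one of pending|shipped|delivered")
    # Bounds: array sizes and string lengths are capped.
    tags = body.get("tags", [])
    if not isinstance(tags, list) or len(tags) > 20:
        errors.append("tags must be a list of at most 20 items")
    note = body.get("note", "")
    if not isinstance(note, str) or len(note) > 1000:
        errors.append("note must be a string of at most 1000 characters")
    return errors
```

A middleware that runs this before the controller and maps a non-empty error list to a 400 gives you the reject-at-the-boundary behavior described above.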
Parameterization: data is not code
SQL injection is the oldest and still the most common example of the data-as-code confusion, but the pattern is general. Any time user input is concatenated into a string that will be interpreted by another system — SQL, shell, LDAP, XPath, NoSQL queries, templating engines, eval — you have the same vulnerability class.
The defense is the same across all of them: don’t concatenate; parameterize.
A SQL example:
# Broken
db.execute(f"SELECT * FROM users WHERE email = '{email}'")
# Correct
db.execute("SELECT * FROM users WHERE email = ?", (email,))
The second form does not substitute email into the SQL string. It sends the SQL and the parameter separately to the database; the database parses the SQL without the parameter, then binds the parameter as a value into the already-parsed query. There is no path by which email can be interpreted as SQL. The data and the code are separated at the protocol level.
This generalizes:
- SQL: prepared statements / parameterized queries, or a query builder / ORM that does this under the hood. Every mainstream database driver supports it; there is no performance reason to avoid it.
- Shell: never sh -c "rm -rf /tmp/" + userInput. Use argument arrays passed to subprocess.run([...], shell=False) (Python), execFile (Node), or equivalent. The argument list is not interpreted by a shell; no special characters are special.
- NoSQL: MongoDB $where with string-concatenated JavaScript is the same trap in a different color. Use the query-document form exclusively; never pass user-controlled strings into $where or eval operators.
- Templating: Jinja2 / ERB / Handlebars called with user-controlled template strings is SSTI (server-side template injection). Templates should be authored, not supplied by users.
- Regex: user-supplied regex patterns are a ReDoS risk ((a+)+$ against aaaaa...x is exponential). If the user can supply the pattern, use a regex engine with a linear-time guarantee (RE2, Hyperscan) or reject user-supplied patterns.
The pattern is: the data and the code travel on separate channels. Any interface that accepts a single concatenated string combines them; any interface that accepts arguments separately from the command separates them. Prefer the latter everywhere it is available.
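The shell case can be demonstrated in a few lines (a POSIX shell is assumed here; the payload is a deliberately harmless stand-in for an injected command):

```python
import subprocess

payload = "hello; echo INJECTED"

# BROKEN: the whole string is handed to a shell, which parses ";" as a
# command separator and runs the attacker's second command.
broken = subprocess.run(f"echo {payload}", shell=True,
                        capture_output=True, text=True)

# Correct: argument-array form. No shell parses anything, so ";" is just
# a byte inside a single argument and nothing extra executes.
safe = subprocess.run(["echo", payload], shell=False,
                      capture_output=True, text=True)

print(broken.stdout)  # two lines: the injected command ran
print(safe.stdout)    # one line: the literal payload, unmodified
```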
Output encoding
The counterpart to input validation is output encoding. Data that came in safely still has to be emitted safely, because each output context — HTML, JSON, shell, URL — has its own set of special characters.
The canonical case: cross-site scripting (XSS). A comment field accepts arbitrary text. The server stores the text. A page renders the text into HTML. If the rendering concatenates the text into the HTML without encoding <, >, &, ", the attacker stores <script>stealCookies()</script> and every viewer runs it.
The fix is context-aware encoding at render time. HTML contexts need HTML entity encoding; attribute contexts need attribute encoding; URL contexts need URL encoding; JavaScript contexts need JavaScript string encoding. Template engines (Jinja2, React’s JSX, Vue’s templates, Liquid) do this automatically when you use their interpolation syntax; they stop doing it when you use {{ foo | safe }} or dangerouslySetInnerHTML or the equivalent escape hatch.
For a pure JSON API, XSS is less direct — the API returns JSON, not HTML — but the data still flows into clients that render it. The client has the same encoding obligations. The API’s defense is: don’t return anything you didn’t explicitly decide to return. Specifically, don’t return data the requesting user doesn’t have permission to see (see “excessive data exposure” below), and don’t return error messages that reflect user input back without encoding (some payloads exploit the error path, not the success path).
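The stored-comment scenario above, in miniature, using the standard library's entity encoder (the div markup is illustrative; template engines perform this step for you):

```python
from html import escape

comment = "<script>stealCookies()</script>"

# BROKEN: raw concatenation turns stored user text into live markup.
broken = f"<div class='comment'>{comment}</div>"

# Correct: entity-encode at render time; the browser displays the text
# instead of executing it.
safe = f"<div class='comment'>{escape(comment)}</div>"
```

In the safe version the angle brackets arrive at the browser as &lt; and &gt;, so there is no script element to execute.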
BOLA: the #1 API vulnerability
Broken Object Level Authorization is the OWASP API Top 10 #1 in every edition since the list existed. It’s the one that keeps happening because it is easy to miss in code review: the endpoint is authenticated, the caller is the right role, the request is well-formed — and the caller is reading or modifying an object that belongs to someone else.
The canonical example: GET /api/orders/42. The user sending the request is authenticated as user #7. The endpoint returns order #42. The endpoint did not check whether user #7 owns order #42. If order #42 belongs to user #99, user #7 just read user #99’s order.
The fix is authorization at the object level, on every endpoint that reads, writes, or references an object by id:
def get_order(order_id, current_user):
    order = db.orders.get(order_id)
    if order is None:
        raise NotFound
    if order.user_id != current_user.id:
        raise NotFound  # not Forbidden — don't leak existence
    return order
Two subtleties:
- Return NotFound, not Forbidden, for objects the caller can’t see. Returning 403 tells the attacker the object exists and they don’t have access; returning 404 for both “doesn’t exist” and “not yours” is the safer leak budget.
- Filter by ownership in list endpoints too. GET /api/orders should return only the caller’s orders. “List all orders and filter client-side” is not authorization; it is a data-exposure bug with a UI.
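The list-endpoint counterpart to the get_order check looks like this (an in-memory store stands in for the database; in SQL the same predicate belongs in the WHERE clause):

```python
# Illustrative in-memory store; field names are invented for the example.
ORDERS = [
    {"id": 1, "user_id": 7, "total": 30},
    {"id": 2, "user_id": 99, "total": 45},
]

def list_orders(current_user_id: int) -> list:
    # Ownership is part of the query itself, equivalent to:
    #   SELECT * FROM orders WHERE user_id = ?
    # The endpoint can never serialize another user's rows, because it
    # never sees them.
    return [o for o in ORDERS if o["user_id"] == current_user_id]
```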
Frameworks and libraries help partially. Rails’ Pundit, Laravel’s Gates, FastAPI’s dependencies, Go’s middleware patterns — all give you a place to put “does this caller have access to this object?” The responsibility is still the application’s; no framework can know what “owning an order” means for your domain. BOLA is a pattern you have to check for on every endpoint, not a box a library ticks for you.
Mass assignment
The server accepts a JSON body. The application deserializes the JSON into a user record and saves it. If the deserialization binds every field in the JSON to the record’s fields, an attacker can submit {"name": "Alice", "email": "alice@..", "is_admin": true} and promote themselves.
This is mass assignment, and it’s the OWASP API Top 10 #6. Two patterns avoid it:
- Allowlist the fields the endpoint accepts. The JSON is parsed into a typed request object with only the fields the endpoint is supposed to change. is_admin isn’t in the UpdateProfileRequest type, so an attacker’s is_admin field is silently dropped (or, with strict schemas, rejected outright).
- Separate internal from external representations. The database record has fields the API doesn’t expose. The API DTO has the subset users can send. The mapping is explicit.
Rails made “mass assignment” a famous bug in 2012 because the framework’s ActiveRecord bound every submitted attribute by default. The fix was strong parameters — explicit allowlists. Other frameworks have the same pattern under other names. Pydantic’s model_config = ConfigDict(extra="forbid") is the schema-level answer in Python; TypeScript’s strict schema validators do the same. The pattern is: the fields an endpoint accepts are declared, not inferred.
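Stripped of any framework, the allowlist pattern is a few lines (UPDATABLE_FIELDS and the field names are illustrative; strict=True mirrors the extra="forbid" behavior, strict=False the silent-drop behavior):

```python
# Only declared fields may cross from the request body into the record.
UPDATABLE_FIELDS = {"name", "email"}

def apply_profile_update(record: dict, body: dict, strict: bool = True) -> dict:
    """Copy only allowlisted fields onto a copy of the record."""
    extras = set(body) - UPDATABLE_FIELDS
    if strict and extras:
        # Strict schemas reject unknown fields outright.
        raise ValueError(f"unknown fields: {sorted(extras)}")
    updated = dict(record)
    for field in UPDATABLE_FIELDS & set(body):
        updated[field] = body[field]
    return updated
```

Either way, an attacker-supplied is_admin never reaches the stored record.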
Broken object property level authorization
A newer OWASP entry, and a refinement of BOLA. The caller has access to the object — they own it — but the object has fields they shouldn’t see or modify. A user can read their own profile (owns it) but the profile includes internal fields like risk_score or internal_notes. A user can update their own order (owns it) but the update endpoint lets them change the status field, which should be server-controlled.
The defenses are the same shape as BOLA and mass assignment, narrowed to field-level granularity: separate read schemas from internal representations, separate write schemas from read schemas, and check at the field level for anything the client is not allowed to control.
Excessive data exposure
An endpoint returns more data than the client needs. The API returns the whole user record — including the password hash, the internal ID, the risk scoring metadata — and assumes the client UI will only display the fields it wants. Attackers aren’t using the UI; they’re reading the raw JSON response.
The fix is explicit response shaping. Don’t serialize the full object; serialize a response DTO that contains only what this endpoint is supposed to return. Frameworks that default to “serialize everything” (toJSON() on the ORM model) are shipping excessive data exposure by default; the discipline is to turn that off and declare the shape.
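A minimal sketch of the declared-shape idea (field names are invented; serializer frameworks express the same thing declaratively):

```python
# The read schema is an explicit subset of the stored record.
USER_READ_FIELDS = {"id", "name", "email"}

def user_to_response(record: dict) -> dict:
    # Internal fields (password_hash, risk_score, internal_notes, ...)
    # are absent by construction, not filtered out case by case.
    return {k: v for k, v in record.items() if k in USER_READ_FIELDS}
```

The same shape, narrowed further, serves as the write schema for the property-level-authorization case above: what the client may read and what it may change are two different declared sets.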
GraphQL is interesting here: it forces the client to specify what it wants, which limits over-fetching, but it also means any authorized caller can ask for fields that exist on the schema. GraphQL without per-field authorization has the same class of bug as REST-that-returns-everything. The framing is different; the defense is the same.
Rate limiting and resource consumption
Unrestricted resource consumption is OWASP API Top 10 #4. Any endpoint without a rate limit, a size limit, a depth limit, or a cost limit is a DoS vector. The defenses are multiple and complementary:
- Request rate limits. Token bucket or leaky bucket algorithms, applied per-IP, per-user, per-API-key, per-endpoint. Nginx, Envoy, Cloudflare, AWS API Gateway, Kong — all do this. Per-user is more useful than per-IP for authenticated APIs (one user behind a corporate NAT should not throttle the rest of the company); per-IP is essential for unauthenticated endpoints (login, signup, password reset) where the attacker is trying to find a valid account.
- Request size limits. A 1GB JSON upload should be rejected at the proxy layer, not after the application has buffered it all. Set a sane default (a few MB) at the gateway; endpoints that legitimately need more can be configured as exceptions.
- Pagination with caps. Any list endpoint has a maximum page size. ?limit=1000000 is either capped or rejected; the cap is enforced on the server, not trusted from the query parameter.
- Query cost limits (GraphQL specifically). A query that asks for users { orders { items { product { reviews { author { orders { ... } } } } } } } is exponential in work. Query depth limits, query complexity scoring, and per-field cost budgets are the defenses. graphql-depth-limit, graphql-cost-analysis, and the newer GraphQL Armor do the standard work.
- Regex and parser limits. Timeouts on regex matching. Limits on parser recursion and entity expansion (YAML and XML especially — billion laughs attacks against naive parsers are still real).
- Per-endpoint concurrency limits. One slow-query endpoint should not exhaust the database connection pool for every other endpoint. Connection pooling with per-tenant or per-endpoint limits, plus circuit breakers on the slow-queries, isolates the blast radius.
The overarching principle: every resource consumed by a request has a bound, and the bound is enforced at the earliest possible layer. Time, memory, connections, CPU, database rows returned, downstream API calls — each has a budget. Requests that exceed the budget are rejected, not processed slowly.
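For reference, the token bucket mentioned above fits in a dozen lines. This sketch is in-process and per-instance; production limiters live at the gateway or share state in something like Redis, keyed per user or API key:

```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/second, holds at
    most `capacity` tokens (the permitted burst). One token per request."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket with capacity 2 admits a burst of two requests, then rejects until the refill catches up.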
Abuse-prevention: beyond rate limits
Rate limits protect the server; abuse prevention protects the product. They overlap but are not the same.
- Signup abuse. Throwaway emails, botnets creating fake accounts. Defenses: email verification, CAPTCHAs at threshold, per-IP and per-fingerprint velocity limits, disposable-email-domain blocklists.
- Login abuse. Credential stuffing (testing leaked credentials from other sites against yours). Defenses: rate limit per username, per IP, per device; detect low-success-rate patterns; check passwords against HIBP (Have I Been Pwned); push 2FA by default.
- Content abuse. Posting the same message to 10,000 users; spamming comments. Defenses: posting velocity limits, content hashing to detect duplicates, trust scores on accounts.
- Scraping. A logged-in user reading every record they have access to, to build a competitor’s database. Defenses: access-pattern detection, per-user query budgets, and comparing the observed access pattern against plausible human-use patterns.
This is where API security meets product security; the tools are the same (rate limits, scoring, throttling), but the thresholds and responses depend on the domain.
SSRF: the server makes requests too
Server-Side Request Forgery is when an API accepts a URL and fetches it on the user’s behalf. The user supplies image_url; the server fetches it and returns the bytes, or generates a thumbnail, or scrapes the page title. Sounds innocuous; isn’t.
The attack: the user supplies http://169.254.169.254/latest/meta-data/ (AWS instance metadata), http://localhost:6379/ (internal Redis), http://internal-admin-api/. The server fetches it. The server has network access to things the user’s laptop doesn’t; the user just borrowed that access.
Defenses, in order of increasing paranoia:
- Don’t let users supply URLs at all, if possible. Upload the file directly instead of fetching it from a URL. This removes the class entirely.
- Allowlist schemes and hosts. If URLs must be accepted, reject anything that isn’t https://; resolve the hostname and reject if the resolved IP is private (10.*, 172.16-31.*, 192.168.*, 127.*, 169.254.*, IPv6 equivalents).
- Resolve once, connect once. A classic SSRF bypass is DNS rebinding: the domain resolves to a public IP during validation and a private IP during the actual fetch. Defense: resolve once, connect to the resolved IP explicitly, pass the hostname in the Host header.
- Isolate the fetcher. Run the URL-fetching service in a separate VPC or network that doesn’t have access to internal resources. The fetcher can only reach the public internet, by design.
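The scheme-and-range checks above can be sketched with the standard library (function names are illustrative; a production fetcher must also pin the connection to the returned IP, handle IPv6, and follow redirects through the same checks):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_internal(ip_str: str) -> bool:
    """True for addresses an SSRF fetcher must never touch."""
    ip = ipaddress.ip_address(ip_str)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved

def resolve_for_fetch(url: str) -> str:
    """Validate the scheme, resolve the host ONCE, reject internal ranges,
    and return the resolved IP so the fetch connects to it directly.
    (A second DNS lookup at fetch time is what rebinding attacks exploit.)"""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("only https URLs with a hostname are fetched")
    ip_str = socket.gethostbyname(parsed.hostname)
    if is_internal(ip_str):
        raise ValueError("refusing to fetch an internal address")
    return ip_str
```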
AWS has specifically addressed instance-metadata SSRF with IMDSv2, which requires a session token obtained via a PUT request with a custom header (exactly the kind of request most SSRF primitives cannot construct), but the general class is bigger than the AWS metadata endpoint.
CORS and CSRF: the browser cases
Two browser-specific concerns, often conflated.
CSRF (Cross-Site Request Forgery) is an attack on cookie-authenticated APIs. The user is logged into your site; they visit an attacker’s site; the attacker’s site makes a request to your site; the browser attaches the session cookie automatically; your site processes the request as if the user intended it.
Defenses for cookie-authenticated APIs:
- SameSite cookies. SameSite=Lax (the modern default) prevents most cross-site requests from including the cookie. SameSite=Strict is safer but breaks some legitimate cross-origin flows.
- CSRF tokens. A random token sent in a header or hidden form field, stored in a cookie, verified server-side. The attacker’s site can’t read the token (same-origin policy), so can’t construct a valid request.
- Require non-cookie auth (bearer tokens in headers). Browsers don’t attach Authorization headers cross-origin automatically; the CSRF class evaporates. Most SPAs/PWAs that use token auth are immune to CSRF, for this reason.
CORS (Cross-Origin Resource Sharing) is a relaxation of the same-origin policy, not a security feature. By default, the browser prevents a page on one origin from making requests to a different origin. CORS headers let the server say “I accept cross-origin requests from these origins for these methods.” A misconfigured CORS (Access-Control-Allow-Origin: * on an authenticated API) opens the door the same-origin policy was closing.
The common CORS mistakes:
- Reflecting the Origin header. The server echoes whatever Origin the request sent into Access-Control-Allow-Origin. This is * with extra steps and allows any origin.
- Enabling credentials with wildcard origins. Access-Control-Allow-Origin: * combined with Access-Control-Allow-Credentials: true is specifically blocked by the spec for good reason; attempts to bypass it by reflecting the Origin header re-introduce the vulnerability.
- Forgetting that CORS is the browser’s enforcement, not the server’s. A non-browser client (curl, any server code) ignores CORS entirely. CORS does not replace authentication or authorization; it constrains what browsers let scripts do.
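The correct shape is an exact-match allowlist (the origins here are illustrative), echoing the origin back only when it is on the list and never blindly reflecting it:

```python
from typing import Optional

ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

def cors_headers(request_origin: Optional[str]) -> dict:
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            # Tell caches not to serve one origin's answer to another.
            "Vary": "Origin",
            "Access-Control-Allow-Credentials": "true",
        }
    # Unknown origin: send no CORS headers at all; the browser then
    # refuses to expose the response to the cross-origin script.
    return {}
```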
TLS and the boring security headers
Every API, every environment, TLS. This is the baseline; no API running unencrypted in 2026 has a defensible reason. Let’s Encrypt removed the last remaining excuse. The public cloud load balancers do it for free. Use HTTPS everywhere; redirect HTTP to HTTPS with HSTS (Strict-Transport-Security header) to prevent downgrade attacks.
A short list of headers worth setting on any HTTP-serving endpoint, whether it’s an API or a page:
- Strict-Transport-Security. Force HTTPS for subsequent requests.
- Content-Security-Policy. For anything that serves HTML, constrain what scripts/fonts/images can load. Narrow CSP is one of the most effective XSS mitigations; broad CSP is security theater.
- X-Content-Type-Options: nosniff. Prevent browsers from guessing content types.
- X-Frame-Options: DENY (or CSP frame-ancestors). Prevents clickjacking by disallowing iframing.
- Referrer-Policy. Controls what Referer header is sent on outbound links.
- Remove Server / X-Powered-By. These leak stack info; not catastrophic, but free to remove.
For pure JSON APIs, most of the above don’t apply. What does apply: TLS, HSTS, and the API-specific equivalents of “don’t advertise unnecessarily” (generic 404 for authn failures, non-descriptive error messages, no stack traces to clients).
Error messages and logs
Two places where security bugs like to live quietly.
Error messages. A 500 response that includes a stack trace leaks the framework version, the file paths, sometimes the SQL query that failed (including the data in it). In production, error responses should be generic ({"error": "internal error", "request_id": "abc123"}) and the details should go to logs. The request id lets support correlate the failure with the logs without leaking internals to the client.
Logs. The other direction: logs should contain enough to debug, but not secrets. Three types of leaks to watch:
- Secrets logged directly. An engineer adds logger.info(f"making request with token {token}") for debugging, forgets to remove it, and tokens now live in every log aggregator and every backup. Detect: lint rules that flag logging variables whose names look like secrets. Mitigate: if it happens, rotate the secret and clean the logs.
- Secrets in request URLs. GET /api/users?token=abc shows up in access logs, proxy logs, browser history, Referer headers. Never put secrets in URLs; they belong in headers or bodies.
- PII in logs beyond what’s authorized. Full user emails, phone numbers, payment details flowing into application logs that have a broader read audience than the production database. Redact before logging or use structured logging with explicit field-level classification.
The discipline that scales: logs have a sensitivity level. Auth tokens, PII, and payment details get redacted by the logger itself — a shared utility that masks anything matching known patterns or anything tagged as sensitive. The discipline fails unless the tool enforces it.
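One way to make the logger enforce it is a logging filter (the pattern here is deliberately narrow and illustrative; real redaction needs a broader, curated rule set plus structured-field tagging):

```python
import logging
import re

# Catches fragments like "token=...", "password: ...", "api_key ...".
SECRET_PATTERN = re.compile(r"(?i)(token|secret|password|api[_-]?key)[\s=:]\s*\S+")

class RedactingFilter(logging.Filter):
    """Masks secret-looking fragments before the record reaches any handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()          # fold %-style args into the string
        record.msg = SECRET_PATTERN.sub(r"\1=[REDACTED]", message)
        record.args = None                     # args are already folded in
        return True                            # keep the (now redacted) record
```

Attached once to the application's root logger, the redaction runs on every record regardless of which engineer wrote the log line.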
Dependency and supply chain security
The API’s attack surface includes its dependencies. A transitive dependency with a vulnerability is your vulnerability. Log4Shell (2021) and the event-stream / left-pad incidents are the canonical reminders.
Minimum hygiene:
- Automated dependency updates. Dependabot, Renovate, GitHub security advisories. Dependencies at the current patch level, automatically.
- SBOM generation. A software bill of materials for each release, so that when a new CVE drops you can tell within minutes whether you’re affected.
- Lock files, pinned versions. package-lock.json, poetry.lock, go.sum, Gemfile.lock. Reproducible installs; no surprise transitive bumps.
- Verify signatures on dependencies where signing exists. Sigstore, Cosign, PGP. This matters increasingly as supply-chain attacks grow.
- Limit the blast radius. The application runs with the minimum privileges it needs; the CI system cannot publish packages without review; the runtime doesn’t give dependency code access to the whole filesystem.
Deeper supply-chain security (SLSA levels, attestations, reproducible builds) is its own topic. The short version: know what’s in your artifact, know where it came from, and have a plan for when something in it turns out to be bad.
Third-party integrations and webhooks
Any time your API integrates with an external one — calling it or receiving webhooks from it — there is a trust boundary that deserves explicit design.
For webhooks you receive:
- Signature verification. The sender signs the payload with a shared secret (HMAC-SHA256); you verify the signature. Without this, anyone who guesses the webhook URL can forge events.
- Idempotency. The sender may retry; the same event may arrive multiple times. Your handler should be idempotent (dedupe on event id).
- Replay protection. Include a timestamp in the signed payload; reject old ones.
- Don’t trust the body’s claim about what happened. The webhook saying “payment succeeded for user X” is not authority; the authoritative answer is fetching the payment state from the sender’s API.
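The first three points above combine into one small verifier. This sketch follows a Stripe-style scheme of signing timestamp-dot-body with HMAC-SHA256; the exact header layout and signed string vary per provider:

```python
import hashlib
import hmac
import time

def verify_webhook(secret: bytes, body: bytes, timestamp: str,
                   signature_hex: str, max_age_seconds: int = 300) -> bool:
    """Check an HMAC-SHA256 signature over `timestamp.body` and reject
    stale deliveries (replay protection)."""
    if abs(time.time() - int(timestamp)) > max_age_seconds:
        return False   # too old or too far in the future
    expected = hmac.new(secret, f"{timestamp}.".encode() + body,
                        hashlib.sha256).hexdigest()
    # Constant-time comparison: avoid a byte-by-byte timing oracle.
    return hmac.compare_digest(expected, signature_hex)
```

Deduplication on the event id (the idempotency point) happens after this check passes, before the handler acts on the event.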
For APIs you call:
- Handle their failures. See resilience patterns — timeouts, retries with idempotency, circuit breakers.
- Don’t echo their responses unverified. If you display data you fetched from a third-party API, the third party is now in your trust boundary for that display context. Encode appropriately.
- Keep secrets rotated. API keys expire, get compromised, get checked into the wrong repo. Rotation is operational discipline.
The OWASP API Top 10 has added “Unsafe Consumption of APIs” as a specific category for this — the reminder that the API surface going out of your system needs the same care as the one coming in.
Observability as a security tool
Most security incidents are visible in telemetry before they are visible in impact. A login endpoint that normally sees 1% failure rate suddenly seeing 40% is either a deploy bug or credential stuffing; either way, you want to know within minutes. A rare endpoint suddenly getting a hundred calls a second from one API key is either a customer’s bug or an abuse event; either way, you want to know.
The things worth alerting on:
- Elevated auth failure rates, per endpoint and per source.
- Rate-limit triggers, when they go from occasional to sustained.
- Anomalous volume, per API key, per user, per endpoint.
- 4xx rate, which sometimes indicates enumeration or probing.
- 5xx rate, which sometimes indicates exploitation causing backend errors.
- Known bad patterns: attempts to access /admin, /.env, /.git/, common vulnerability payloads.
These are observability with a security lens, not separate systems. The SIEM and the metrics platform can be the same tool for a small-to-mid team; the rules are the specialization.
The rule
API security, after authn/authz, is a set of habits applied at every boundary. Validate input. Parameterize database calls. Encode output. Check object-level authorization on every endpoint that touches an object, not just at the front door. Allowlist writable fields. Rate-limit everything. Bound every resource. Treat URLs the user supplies as hostile. Accept cookies carefully or not at all. TLS everywhere. Logs without secrets. Errors without internals. Dependencies patched.
None of this is clever. The OWASP API Top 10 is not a list of subtle cryptographic pitfalls; it is a list of habits not consistently applied. The clever bugs — timing attacks on comparison functions, padding oracles on crypto modes, exotic parser differentials — are real and worth studying, but they account for a small fraction of breaches. The overwhelming majority are one of: a missing object-level authz check, an unsanitized input concatenated into a query, an endpoint without a rate limit, a misconfigured CORS, a secret in a log, a URL fetched without an allowlist.
The discipline is the defense. Each endpoint, each request path, each integration is an instance of the same small set of rules. Apply them everywhere and the residual risk shrinks to the genuinely novel. Skip them and you are in the bulk of the OWASP list, where the attackers are.