Most edge DLP is broken for one of two reasons. The heavyweight implementations parse every request body into a tree, walk the tree, and run regex matches at every leaf node. They're thorough, and they're slow enough that you can't afford to run them inline. The loose implementations skip the parsing and just regex-match the raw bytes. They're fast enough to run inline, and they flag every 16-digit number in a JWT payload as a stolen credit card.

Neither is good.

Synapse's DLP runs sub-millisecond even on multi-megabyte payloads, produces zero false positives in GoTestWAF validation, and runs in the same inline request path as the WAF itself. This article is about the architecture choices that make that work: a two-phase literal-first scanner, zero-allocation validators, a soft 8KB cap, parallel execution with upstream TCP setup, and a response-phase hook that catches outbound leaks.

None of those choices are novel in isolation. The combination is.

The two failure modes

The traditional DLP approach is "parse deeply, match everywhere." A request body arrives, serde_json (or equivalent) parses it into a recursive tree structure, and the scanner walks every node, running the full regex library at every leaf value. If it finds a credit card pattern, it can tell you exactly where: /user/profile/payment_methods[0]/card_number.

That level of detail is real. It's also expensive. Every nested object adds CPU cycles and allocations. Every leaf value runs through the full pattern set. A 10KB nested JSON body can generate thousands of leaf evaluations. A 1MB body is a small disaster. The scanner doesn't just get slow at scale, it gets unpredictable, because the cost depends on the input shape and the input shape is the one thing you don't control.

The alternative is "match everywhere, don't parse." Run the pattern set against the raw byte stream and accept that you'll lose structural context. This is faster, but the regex-only approach has a worse problem: false positives. A random 16-digit sequence in a base64-encoded JWT looks exactly like a credit card. A 9-digit user ID looks exactly like an SSN. An IPv4 address in a log entry looks exactly like an IPv4 address, because it is one, even though it doesn't matter.

Every false positive costs operator trust, and operator trust is the only currency a security product has. A 1% false positive rate on a 10K RPS API is 100 false alerts per second, which is 8.64 million per day, which is enough to bury any SOC team and destroy faith in the product within a week.

A traditional DLP is either slow (the recursive tree) or noisy (raw regex). The honest answer is that it's usually both in the same implementation: you pick one tradeoff explicitly, and the implementation quietly absorbs the other.

Synapse does neither.

The literal-first split

The observation that makes fast DLP possible is that most sensitive data has a distinctive literal prefix. AWS access keys start with AKIA. JWTs start with eyJ. Stripe secret keys start with sk_live_. GitHub tokens start with ghp_. Slack tokens start with xoxb- or xoxp-. Credit cards from the major networks start with known BIN ranges. Once you enumerate the patterns that matter, you realize that most of them have a recognizable prefix that can be matched without a full regex engine.

Synapse exploits this with a two-phase scanner.

Phase one is an Aho-Corasick multi-pattern matcher scanning for literal prefixes. Aho-Corasick is a classical algorithm that finds all occurrences of a set of fixed strings in a single pass over the input, in linear time, regardless of how many strings are in the set. It's the same algorithm that powers grep -F and YARA's literal mode. For Synapse's DLP, the set is the literal prefixes of all 22 built-in patterns, compiled into a single automaton at load time. A scan over an 8MB body runs in constant time per byte: no backtracking, no allocations.
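To make phase one concrete, here's a deliberately naive, stdlib-only sketch of the prefix scan with a hypothetical subset of prefixes. The production scanner compiles the full prefix set into a single Aho-Corasick automaton, so the cost stays linear in the body length regardless of how many prefixes are in the set; the naive version below pays per prefix but produces the same candidates:

```rust
// Hypothetical subset of the literal prefixes; the real set covers all 22
// built-in patterns and is compiled into one Aho-Corasick automaton at load.
const PREFIXES: &[&str] = &["AKIA", "eyJ", "sk_live_", "ghp_", "xoxb-", "xoxp-"];

/// Naive multi-prefix scan, for illustration only: returns (pattern index,
/// byte offset) for every candidate. An Aho-Corasick automaton finds the
/// same hits in a single O(n) pass over the buffer, no backtracking.
fn prefix_candidates(body: &[u8]) -> Vec<(usize, usize)> {
    let mut hits = Vec::new();
    for pos in 0..body.len() {
        for (i, p) in PREFIXES.iter().enumerate() {
            if body[pos..].starts_with(p.as_bytes()) {
                hits.push((i, pos));
            }
        }
    }
    hits
}
```

Each candidate then goes to the validator for that pattern; a prefix hit alone never raises a signal.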

Phase two is a RegexSet for patterns without distinctive literals. SSNs, IPv4 addresses, phone numbers: these don't have a prefix you can anchor on, so they need a real regex. But RegexSet is still a single-pass scanner. It takes a set of regexes, compiles them into a single NFA, and finds all matches in one traversal. You don't iterate over each pattern individually. You scan the body once, and the engine reports which patterns matched at which positions.

Phase one finds the matches with literal prefixes. Phase two finds the regex-only matches. Both phases share the same byte buffer. Both phases are single-pass. The combined scan runs in O(n) over the body length, independent of how many patterns you care about.

If you're keeping score, that's 22 patterns scanned in a single linear pass over the request body, without parsing the body, without allocating per-pattern state, and without backtracking.

Zero-allocation validators

Finding a potential match is only half the problem. A potential match isn't a real match until you've validated it, and the validator is where most DLP implementations concede ground.

Traditional DLP is regex-only: the regex matches a pattern, the scanner reports a hit, done. The problem is that for most sensitive-data patterns, the regex is necessary but not sufficient. A 16-digit number that matches the credit card pattern is a credit card if and only if it also passes the Luhn checksum. A 9-digit number that matches the SSN pattern is an SSN if and only if it also has valid area and group numbers and isn't in the SSA's advertising reservation range. A candidate IBAN is real only if it matches the structural pattern, passes the Mod-97 checksum, and conforms to the country-specific length.

Synapse pairs every literal or regex match with a validator that confirms the candidate is real before emitting a signal:

  • Luhn for credit cards. Runs directly on the matched byte slice. No allocation, no string conversion, no BigInteger. The Luhn algorithm is about 16 arithmetic operations for a 16-digit number.
  • Mod-97 for IBANs. The modulo is computed in a streaming fashion over the byte slice, with country-specific length validation as a prefilter.
  • Area code plus service code for US phones. The 555 exchange is a reserved-for-advertising pool and gets filtered. N11 service codes (911, 411) get filtered. The rest pass.
  • Advertising-number filtering for SSNs. The SSA publishes a list of numbers reserved for public use (most famously 078-05-1120, which appears on millions of training materials). Synapse filters those out.
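
The Luhn check is small enough to show whole. A sketch of a zero-allocation validator over the matched byte slice (illustrative, not Synapse's actual code):

```rust
/// Luhn checksum over the matched byte slice: no allocation, no string
/// conversion. Walks the digits right-to-left, doubling every second one
/// and folding doubles > 9 back into a single digit.
fn luhn_valid(digits: &[u8]) -> bool {
    if digits.is_empty() {
        return false;
    }
    let mut sum = 0u32;
    let mut double = false;
    for &b in digits.iter().rev() {
        if !b.is_ascii_digit() {
            return false; // reject rather than allocate a cleaned copy
        }
        let mut d = (b - b'0') as u32;
        if double {
            d *= 2;
            if d > 9 {
                d -= 9;
            }
        }
        sum += d;
        double = !double;
    }
    sum % 10 == 0
}
```

The whole validator is a handful of arithmetic operations on bytes already sitting in the scan buffer.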

Every validator operates on the matched byte slice directly. Every validator is zero-allocation. Every validator either returns a boolean (match is real) or a specific reason code (why it was rejected), which gets logged for debugging but never turns into an allocation on the hot path.
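Mod-97 can likewise be streamed over the byte slice without building the rearranged number. A sketch, with the country-specific length table reduced to the overall 15-34 bound for brevity:

```rust
/// Streaming Mod-97 over an IBAN candidate, zero-allocation: rearrange
/// (first four chars to the end), map A-Z to 10-35, and require the
/// resulting number mod 97 == 1. The real validator also applies the
/// per-country length table; here only the overall bound is checked.
fn iban_mod97_valid(iban: &[u8]) -> bool {
    if iban.len() < 15 || iban.len() > 34 {
        return false;
    }
    let rearranged = iban[4..].iter().chain(iban[..4].iter());
    let mut rem: u32 = 0;
    for &b in rearranged {
        let v = match b {
            b'0'..=b'9' => (b - b'0') as u32,
            b'A'..=b'Z' => (b - b'A') as u32 + 10,
            _ => return false,
        };
        // Letters contribute two digits, so shift by 100 instead of 10.
        rem = if v < 10 { (rem * 10 + v) % 97 } else { (rem * 100 + v) % 97 };
    }
    rem == 1
}
```

Because the remainder is folded in per character, the intermediate value never exceeds a few thousand and the whole check fits in a u32.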

The practical result is that Synapse's DLP achieves 0% false positive rate in GoTestWAF validation. Most vendors quote a low single-digit false positive rate and hope you don't ask what that means at scale. Zero is what zero-allocation validators buy you.

Scan once. Validate cheaply. Cap early. Parallelize with I/O.

The soft 8KB cap

The other architectural decision that keeps DLP fast is that Synapse doesn't scan the entire request body. It scans the first 8KB and stops.

This is the part that most people react to first. Wait, doesn't that mean you miss things? And the answer is: no, and here's why.

Sensitive data in HTTP requests almost always appears in one of three places: the query string, a well-known header (Authorization, Cookie, X-API-Key), or the top of the request body where the primary fields live. If a request is exfiltrating an API key, the key is in the first 200 bytes. If it's exfiltrating a credit card, the card is a top-level field or close to it. Data buried in the 5000th nested object of a 10MB payload isn't what attackers are moving, because attackers want their exfiltration to work, and working exfiltration uses the structure the server expects.

The 8KB cap is soft, not hard. It limits the sync scan window, not the request itself. A request with a body larger than 8KB is still served. The DLP scan just stops at the 8KB mark, and anything past that is handed off to Signal Horizon's deeper structural analysis, which runs asynchronously on the telemetry copy of the request. Signal Horizon isn't bound by the latency budget. It can do the recursive parse, walk the tree, and match at every leaf without anyone waiting on it.
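The split itself is a few lines. A hypothetical sketch, with the deferred remainder reduced to a flag (the real handoff ships the telemetry copy of the request to Signal Horizon rather than returning anything):

```rust
const SYNC_SCAN_CAP: usize = 8 * 1024;

/// Split a body into the synchronous scan window and a deferred flag.
/// The inline scanner sees at most 8KB; anything past the cap is marked
/// for the async deep-analysis path instead of being dropped.
fn scan_window(body: &[u8]) -> (&[u8], bool) {
    let end = body.len().min(SYNC_SCAN_CAP);
    (&body[..end], body.len() > SYNC_SCAN_CAP)
}
```

The key property is that the cap bounds the inline scan cost, not the coverage: the bytes past 8KB are still inspected, just not on the latency budget.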

The hotpath gets fast scanning. The analyst workbench gets thorough scanning. Both audiences get what they need.

Parallel execution with upstream I/O

Here's the architectural decision that reduces the perceived cost of DLP to effectively zero.

Synapse is built on Pingora, which gives you async hooks at each phase of the request lifecycle. The DLP scan runs as a Tokio task spawned from the request_body_filter hook. Critically, spawning the scan doesn't block the proxy. While the DLP scanner is reading the body and running pattern matching, the proxy is doing other work that would have to happen anyway: selecting an upstream peer, establishing a TCP connection, doing the TLS handshake if needed, writing the request line and headers.

Those operations are network-bound. They take somewhere between hundreds of microseconds and low single-digit milliseconds on healthy infrastructure. The DLP scan is CPU-bound and finishes in tens of microseconds. The scan is already done by the time the proxy is ready to write the body to the upstream.

The proxy only awaits the DLP result at one specific point: the upstream_request_filter hook, right before it commits to sending the final headers. At that point the scan either finished (the usual case, awaiting the result is a no-op), or it hasn't (the task yields, the proxy yields, everybody waits a few microseconds), or the scan timed out and the circuit breaker fires. Failure modes fail open for DLP, not closed, because blocking legitimate traffic on a scanner timeout is a worse outcome than missing a pattern on one request that gets logged for forensics anyway.
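The overlap pattern can be sketched dependency-free with std::thread standing in for the Tokio task, a toy scan, and a sleep standing in for TCP and TLS setup (all hypothetical stand-ins, not Synapse's code):

```rust
use std::thread;
use std::time::Duration;

// Toy stand-in for the CPU-bound scan: fast, finishes in microseconds.
fn dlp_scan(body: Vec<u8>) -> bool {
    body.windows(4).any(|w| w == b"AKIA")
}

// Stand-in for the network-bound work the proxy must do anyway:
// peer selection, TCP connect, TLS handshake, header write.
fn connect_upstream() {
    thread::sleep(Duration::from_millis(1));
}

/// Spawn the scan, do the upstream setup concurrently, then collect the
/// verdict. In the common case the scan finished long before the join,
/// so the join costs effectively nothing.
fn handle_request(body: Vec<u8>) -> bool {
    let scan = thread::spawn(move || dlp_scan(body));
    connect_upstream();
    scan.join().unwrap()
}
```

The structure is the same under Tokio: spawn the scan from the body filter, await its handle in the upstream request filter, and the network I/O between those two points hides the scan's cost.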

The net observable impact of DLP on request latency is effectively zero for the common case, because the scan finished during work the proxy had to do anyway. This isn't optimization. It's refusing to serialize work that doesn't need to be serialized.

Response-phase inspection

Everything above is about inbound DLP: detecting sensitive data in the request. The more important half of the problem is outbound DLP: detecting sensitive data in the response. The "L" in DLP is Loss, and loss happens when the server accidentally or maliciously leaks something the client should never have received.

Synapse runs a response_filter hook that applies the same literal-first scanner to the outbound response body, with the same 8KB soft cap, the same validators, and the same async overlap strategy (the scan runs during downstream write operations). If the response contains an API key that shouldn't be there, Synapse can block the response before it goes out, even if the originating request was completely benign.

This is the scenario that matters most in practice and that most WAFs ignore. An attacker who has gotten past authentication (via a stolen session, an IDOR, a broken-access-control bug) is going to exfiltrate data through legitimate-looking requests. The request itself is clean. The response is where the leak happens. A WAF that only inspects requests will never catch it.

Response-phase DLP is the difference between "we prevent attacks from getting in" and "we prevent sensitive data from getting out." Those are different problems, and most vendors only solve the first one.

Fast Mode

For extremely high-RPS APIs, Synapse supports a Fast Mode that skips low-priority patterns. Email addresses and IPv4 addresses are common in normal traffic and have a high base rate of non-sensitive matches. Skipping them cuts scan time further at the cost of losing coverage on two patterns that are usually not the ones you care about blocking.

Fast Mode is a deployment-time toggle, not a per-request decision. If you're protecting a public API doing 50K RPS where every request includes an email in the payload, Fast Mode is the right call. If you're protecting a backoffice tool where an email appearing in a response body is genuinely a leak (customer lists, user enumeration), leave it off.

The numbers

From the benchmarks, on production traffic:

  • ~21μs: DLP scan (4KB clean body)
  • ~42μs: DLP scan (8KB clean body)
  • 22: built-in patterns
  • 0%: false positives
  • Zero: added latency

These numbers include all four layers: the Aho-Corasick literal scan, the RegexSet regex scan, the validators, and the deferred-scan overhead for payloads past the soft cap. The "zero added latency" claim refers to perceived impact on typical requests, where the DLP scan completes during the upstream TCP connection setup and the proxy never waits for it.

The generalization

The literal-first split isn't specific to DLP. Any pattern-matching problem with a mix of fixed-string patterns and regex patterns benefits from the same two-phase architecture:

  • Threat signature matching. Most malware signatures have distinctive byte sequences. Literal-first scanning catches the easy 90%.
  • Log scanning. Authentication tokens, internal URLs, and path traversal patterns are mostly literal. Regexes are for the edge cases.
  • Content moderation. Profanity filters, banned phrases, and PII detection all share the same structure.
  • Secret detection in source code. Literally the DLP problem applied to git commits and CI logs.

The broader lesson is that exact-match and regex aren't mutually exclusive. They're complementary phases of the same scan. Running them in a single pass over the byte stream is the architectural choice that makes pattern matching at line rate possible. Skipping the parse layer means you lose structural context, but structural context can be rebuilt asynchronously by a slower, thorough scanner that doesn't live on the request hotpath.

Synapse's DLP is sub-millisecond because it scans once, validates cheaply, caps early, and parallelizes with I/O. Every one of those is a choice someone could make, and most don't, because the default is to keep patching the recursive tree parser until it's just barely fast enough not to fail.

Keep Reading

The DLP at the Edge infographic visualizes the scanner architecture across request and response phases. The WAF Rule Pipeline shows how DLP fits into the broader detection loop. If you haven't read it yet, Every sensor is a brain covers the full autonomous-sensor thesis that this article is a zoom-in on. Or browse all writing.