Pattern
Difficulty Advanced

PII Redaction

Remove or mask personally identifiable information systematically from logs, analytics, and error reports.

By Den Odell

Problem

A form submission fails, your error reporter fires, and the captured event includes the full request body: the user’s email, their plaintext password, a credit card number, a session token. None of that was supposed to leave the browser, but now it sits in a third-party dashboard that dozens of people can search, indefinitely.

This happens quietly and everywhere. Analytics events carry user identifiers and form payloads to Mixpanel or Google Analytics. Console logs that someone added for a quick debugging session get shipped to a log aggregator and never removed, slowly accumulating emails and phone numbers. Exception monitoring captures stack traces alongside request parameters that contain auth tokens. A support engineer copies a log line containing a customer’s address into a Slack thread. Each path looks harmless in isolation, but together they scatter personally identifiable information across systems you don’t fully control.

The consequences are not theoretical. When a logging platform gets breached, attackers inherit a pre-assembled dataset of user PII harvested from your own telemetry. Compliance audits surface sensitive data spread through infrastructure with no access controls or retention policy, exposing the company to GDPR fines and CCPA penalties. The data you never meant to send is exactly the data that leaks, because nobody was watching the exits.

Solution

Redact PII at the boundary, the moment data is about to leave the application, rather than trying to clean it up downstream. Every sink (your logger, your analytics wrapper, your error reporter) should pass payloads through a single redactor before transmission. Centralizing this matters: if each integration rolls its own scrubbing, the one you forget becomes the leak.

Prefer an allow-list over a deny-list. Listing the fields that are safe to emit (userId, route, statusCode) and dropping everything else fails closed: a new field someone adds next quarter is redacted by default instead of silently shipped. Deny-listing known-bad keys is useful as a second layer, but on its own it always lags behind reality, because you can only block the sensitive fields you’ve already thought of.
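
The difference between failing open and failing closed is easiest to see when a field appears that neither list anticipated. The following is a minimal sketch; the field names and the `denyListFilter` and `allowListFilter` helpers are hypothetical, and both helpers remove keys outright rather than masking them.

```javascript
// Hypothetical lists for illustration only.
const DENY = new Set(['password', 'email']);
const ALLOW = new Set(['userId', 'route', 'statusCode']);

// Deny-list: remove known-bad keys, keep everything else (fails open).
function denyListFilter(payload) {
  return Object.fromEntries(
    Object.entries(payload).filter(([key]) => !DENY.has(key)),
  );
}

// Allow-list: keep known-safe keys, remove everything else (fails closed).
function allowListFilter(payload) {
  return Object.fromEntries(
    Object.entries(payload).filter(([key]) => ALLOW.has(key)),
  );
}

// dateOfBirth stands in for a field added later that nobody listed anywhere.
const payload = {
  userId: 'u_123',
  route: '/signup',
  password: 'hunter2',
  dateOfBirth: '1990-01-01',
};

const denied = denyListFilter(payload);
const allowed = allowListFilter(payload);

console.log(Object.keys(denied));  // userId, route, dateOfBirth
console.log(Object.keys(allowed)); // userId, route
```

The deny-list still ships `dateOfBirth` because nobody thought to block it, while the allow-list removes it without any rule change.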

For free-text where structure isn’t available, fall back to pattern matching. Regular expressions can mask emails, phone numbers, credit card numbers, and token-shaped strings inside log messages and stack traces, replacing them with placeholders like [EMAIL] or [CARD]. Patterns are imperfect, so treat them as a safety net behind structured redaction rather than the primary defense.

Then wire the redactor into the places data escapes. Hook your error reporter’s outbound payload, for example Sentry’s beforeSend, and scrub the request body, headers, and extra context before the event is sent. Wrap analytics calls so properties are filtered the same way. In structured logging, apply field-level redaction driven by the allow-list so a password or creditCard key never reaches the transport. Keep real values available only in local development, and enforce redaction in staging and production. The result preserves the shape and flow of your data, which is what debugging actually needs, without preserving the sensitive values.

Example

These examples build a single reusable redactor, then route an error reporter and a structured logger through it. The redactor is the only thing that knows what’s sensitive; everything else just calls it.

A reusable redactor

The core redactor builds a redacted deep copy of the input, so the original object is never mutated, masks any key on the sensitive list, and runs regex patterns over string values to catch PII embedded in free text. An optional allow-list lets you flip to fail-closed behavior, where only explicitly safe keys keep their values and every other key is replaced with [REDACTED].

// Keys are stored lowercase and compared case-insensitively, so header-style
// names such as "Authorization" or "Cookie" are caught too.
const SENSITIVE_KEYS = new Set([
  'password', 'pwd', 'token', 'accesstoken', 'refreshtoken',
  'authorization', 'cookie', 'ssn', 'creditcard', 'cardnumber',
  'cvv', 'email', 'phone',
]);

// Order matters: each pattern runs over the output of the previous one.
const PATTERNS = [
  [/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, '[EMAIL]'],
  [/\b(?:\d[ -]*?){13,16}\b/g, '[CARD]'], // 13-16 digits, optional separators
  [/\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g, '[PHONE]'], // 10-digit phone numbers
  [/\b[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\b/g, '[TOKEN]'], // JWT-shaped
];

// Replace every pattern match in a string with its placeholder.
function redactString(value) {
  return PATTERNS.reduce((acc, [pattern, label]) => acc.replace(pattern, label), value);
}

// allowList: if provided, only these keys keep their values (fail-closed).
// It is applied at every nesting level, not just the top one.
function redact(input, { allowList } = {}) {
  if (typeof input === 'string') return redactString(input);
  if (Array.isArray(input)) return input.map((item) => redact(item, { allowList }));
  if (input && typeof input === 'object') {
    const output = {};
    for (const [key, value] of Object.entries(input)) {
      if (allowList && !allowList.has(key)) {
        output[key] = '[REDACTED]';
      } else if (SENSITIVE_KEYS.has(key.toLowerCase())) {
        output[key] = '[REDACTED]';
      } else {
        output[key] = redact(value, { allowList });
      }
    }
    return output;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return input;
}

export { redact };
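
A recursive walk built on Object.entries has two gaps: a circular reference would recurse until the stack overflows, and Error objects come out empty because message and stack are non-enumerable. The sketch below shows one way to handle both, under stated assumptions: `redactSafe`, the trimmed `SENSITIVE` list, and the `[CIRCULAR]` placeholder are hypothetical names, and pattern matching over strings is left out for brevity.

```javascript
// Hypothetical, trimmed-down key list for illustration.
const SENSITIVE = new Set(['password', 'token', 'email']);

function redactSafe(input, seen = new WeakSet()) {
  if (input === null || typeof input !== 'object') return input;

  // Cycle guard: an object already on the current path is not walked again.
  if (seen.has(input)) return '[CIRCULAR]';
  seen.add(input);

  let output;
  if (input instanceof Date) {
    // Dates have no enumerable keys, so serialize them explicitly.
    output = Number.isNaN(input.getTime()) ? null : input.toISOString();
  } else if (input instanceof Error) {
    // message and stack are non-enumerable, so copy them by name.
    output = { name: input.name, message: input.message, stack: input.stack };
  } else if (Array.isArray(input)) {
    output = input.map((item) => redactSafe(item, seen));
  } else {
    output = {};
    for (const [key, value] of Object.entries(input)) {
      output[key] = SENSITIVE.has(key) ? '[REDACTED]' : redactSafe(value, seen);
    }
  }

  // Leave the path so the same object may appear again elsewhere (non-circular).
  seen.delete(input);
  return output;
}

const payload = { userId: 'u_123', password: 'hunter2', error: new Error('boom') };
payload.self = payload; // circular reference

const result = redactSafe(payload);
console.log(result.password);      // [REDACTED]
console.log(result.error.message); // boom
console.log(result.self);          // [CIRCULAR]
```

In a full redactor the copied message and stack strings would still need to pass through the pattern matcher, since exception messages often embed user input.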

Scrubbing error reports

Error reporters capture far more than the error itself: request bodies, headers, breadcrumbs, and arbitrary context you attach. Sentry’s beforeSend hook runs on every error event right before it leaves the browser (transactions go through a separate beforeSendTransaction hook), which makes it the natural place to push the whole payload through the redactor.

import * as Sentry from '@sentry/browser';
import { redact } from './redact';

Sentry.init({
  dsn: process.env.SENTRY_DSN, // assumes the bundler inlines this value at build time
  beforeSend(event) {
    // Scrub the request the user was making when it failed.
    if (event.request) {
      event.request.data = redact(event.request.data);
      event.request.headers = redact(event.request.headers);
      event.request.cookies = '[REDACTED]';
    }

    // Scrub any extra context attached by the app.
    if (event.extra) {
      event.extra = redact(event.extra);
    }

    // Scrub breadcrumb data, which often mirrors network payloads.
    if (event.breadcrumbs) {
      event.breadcrumbs = event.breadcrumbs.map((crumb) => ({
        ...crumb,
        data: crumb.data ? redact(crumb.data) : crumb.data,
      }));
    }

    return event;
  },
});
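
The hook above covers request data, extra context, and breadcrumbs, but PII can also ride along in the event message, in exception messages, in the user object, and in the request URL's query string. The following sketch shows one way to cover those fields. `scrubEventText` is a hypothetical helper that could be called from inside beforeSend, it assumes the event is a plain object with Sentry's message, exception.values, user, and request fields, and the single email pattern stands in for a full pattern list.

```javascript
// Minimal email pattern for the sketch; a real redactor would apply its full pattern list.
const EMAIL = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const scrubText = (text) => (typeof text === 'string' ? text.replace(EMAIL, '[EMAIL]') : text);

// Hypothetical helper: returns a scrubbed shallow copy of a Sentry-shaped event.
function scrubEventText(event) {
  const scrubbed = { ...event };

  if (scrubbed.message) scrubbed.message = scrubText(scrubbed.message);

  // Exception messages frequently interpolate user input.
  if (scrubbed.exception && Array.isArray(scrubbed.exception.values)) {
    scrubbed.exception = {
      ...scrubbed.exception,
      values: scrubbed.exception.values.map((ex) => ({ ...ex, value: scrubText(ex.value) })),
    };
  }

  // Keep only an opaque identifier; drop email, username, and IP address.
  if (scrubbed.user) scrubbed.user = { id: scrubbed.user.id };

  // Drop the query string, where tokens and emails often travel.
  if (scrubbed.request && typeof scrubbed.request.url === 'string') {
    scrubbed.request = { ...scrubbed.request, url: scrubbed.request.url.split('?')[0] };
    delete scrubbed.request.query_string;
  }

  return scrubbed;
}

const event = {
  message: 'Signup failed for jane@example.com',
  exception: { values: [{ type: 'Error', value: 'Duplicate account: jane@example.com' }] },
  user: { id: 'u_123', email: 'jane@example.com', ip_address: '203.0.113.7' },
  request: { url: 'https://example.com/reset?token=abc123', query_string: 'token=abc123' },
};

const clean = scrubEventText(event);
console.log(clean.message);     // Signup failed for [EMAIL]
console.log(clean.request.url); // https://example.com/reset
```

Dropping the whole query string is deliberately blunt; if specific parameters are needed for debugging, an allow-list of parameter names would be the fail-closed alternative.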

Field-level redaction in logs

Structured logging is where the allow-list pays off. Instead of hoping nobody logs a sensitive field, the wrapper decides which keys are safe to emit and redacts the rest before the record is written, so a stray password in a context object never reaches the transport.

import { redact } from './redact';

// Keys that are always safe to emit. Every other key's value is replaced with '[REDACTED]'.
const SAFE_KEYS = new Set([
  'level', 'message', 'timestamp', 'requestId', 'userId',
  'route', 'method', 'statusCode', 'durationMs',
]);

function write(level, message, context = {}) {
  const record = {
    level,
    message: redact(message),
    timestamp: new Date().toISOString(),
    // Fail closed: only allow-listed keys keep their values, and their
    // string values still pass through pattern matching.
    ...redact(context, { allowList: SAFE_KEYS }),
  };

  // Single exit point: the only place log records leave the app.
  process.stdout.write(JSON.stringify(record) + '\n');
}

const logger = {
  info: (message, context) => write('info', message, context),
  warn: (message, context) => write('warn', message, context),
  error: (message, context) => write('error', message, context),
};

logger.error('Checkout failed', {
  userId: 'u_123',
  route: '/checkout',
  email: 'user@example.com', // redacted: not in SAFE_KEYS
  cardNumber: '4111 1111 1111 1111', // redacted: not in SAFE_KEYS
});
// {"level":"error","message":"Checkout failed","timestamp":"...",
//  "userId":"u_123","route":"/checkout","email":"[REDACTED]","cardNumber":"[REDACTED]"}

export { logger };
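
The Solution also calls for wrapping analytics calls and for keeping real values only in local development. A minimal sketch of both ideas follows. The `createTracker` wrapper, the `SAFE_PROPS` list, the `enforce` option, and the `track(eventName, props)` client shape are all assumptions made for illustration, not any particular vendor's API.

```javascript
// Hypothetical allow-list of analytics properties.
const SAFE_PROPS = new Set(['plan', 'route', 'durationMs', 'statusCode']);

// Keep only allow-listed properties. Unlike the logger above, unknown keys
// are removed entirely rather than replaced with a placeholder.
function filterProps(props = {}) {
  return Object.fromEntries(
    Object.entries(props).filter(([key]) => SAFE_PROPS.has(key)),
  );
}

// client: any object exposing track(eventName, props); an assumption here.
// enforce: defaults to true; pass false only in local development.
function createTracker(client, { enforce = true } = {}) {
  return {
    track(eventName, props) {
      client.track(eventName, enforce ? filterProps(props) : { ...props });
    },
  };
}

// Stand-in client that records calls instead of sending them anywhere.
const sent = [];
const fakeClient = { track: (name, props) => sent.push({ name, props }) };

const analytics = createTracker(fakeClient);
analytics.track('checkout_completed', {
  plan: 'pro',
  route: '/checkout',
  email: 'user@example.com', // never reaches the client
});

console.log(Object.keys(sent[0].props)); // plan, route
```

Defaulting `enforce` to true means a missing or misread environment setting results in redaction rather than a leak.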

Benefits

  • Sensitive values are stripped at the exit, so PII never reaches logs, analytics, or error dashboards where it would persist out of your control.
  • An allow-list fails closed: new fields are redacted by default instead of silently shipped, so privacy doesn’t depend on remembering to block every bad key.
  • Centralizing the redactor means every sink shares one definition of “sensitive,” eliminating the forgotten integration that becomes the leak.
  • It directly supports GDPR and CCPA compliance by keeping regulated data out of systems with no access controls or retention policy.
  • A breach of your logging or monitoring platform yields far less, because there’s no pre-assembled dataset of user PII to harvest.
  • Debugging signal is preserved: redacted records still show data structure, field names, and flow, which is usually what you actually need.
  • Pattern matching catches PII embedded in free-text messages and stack traces that field-level rules can’t see.
  • Local development can keep real values while staging and production enforce redaction, so you don’t trade away diagnosability.

Tradeoffs

  • Over-redaction destroys debugging signal: mask too aggressively and every log line becomes [REDACTED], leaving you blind during an incident.
  • Regex patterns both miss and over-match. International phone formats and unusual emails slip through, while innocuous numeric strings get flagged as cards.
  • Deep-cloning and scanning large payloads on every log call and error event adds CPU cost and latency in hot paths.
  • The sensitive-key list and allow-list need active maintenance; a renamed or newly added field can re-open a leak until someone updates the rules.
  • Allow-listing requires discipline up front, because every genuinely useful field must be explicitly named or it silently disappears from your telemetry.
  • Centralizing redaction creates a single point of failure: a bug in the redactor affects every sink at once, so it needs tests.
  • Nested or deeply structured payloads can hide PII in places the redactor doesn’t traverse, such as serialized JSON strings or base64 blobs.
  • Redaction is not a substitute for not collecting PII in the first place; the safest sensitive value is the one you never put into a payload.
  • Third-party SDKs may capture and transmit data through code paths your wrapper doesn’t intercept, so coverage is only as complete as the hooks you control.
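
The over-matching tradeoff for card numbers can be narrowed by validating each candidate with the Luhn checksum before masking it. This is a sketch under stated assumptions: `maskCards`, `passesLuhn`, and the candidate pattern are hypothetical, and the check reduces false positives without eliminating them, since roughly one in ten arbitrary digit strings still passes.

```javascript
// Candidate: 13-16 digits, optionally separated by single spaces or hyphens.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,15}\b/g;

// Luhn checksum: double every second digit from the right, subtract 9 from
// any result above 9, and require the total to be a multiple of 10.
function passesLuhn(digits) {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i -= 1) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Mask only candidates that pass the checksum; leave other digit runs alone.
function maskCards(text) {
  return text.replace(CARD_CANDIDATE, (match) => {
    const digits = match.replace(/[ -]/g, '');
    return passesLuhn(digits) ? '[CARD]' : match;
  });
}

console.log(maskCards('card 4111 1111 1111 1111 at ts 1700000000000'));
// card [CARD] at ts 1700000000000
```

The 13-digit millisecond timestamp matches the candidate pattern but fails the checksum, so it stays readable in the log line.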

Summary

PII redaction scrubs sensitive data at the boundary, before it leaves the app for any log, analytics event, or error report, using an allow-list for structured fields and pattern matching for free text. Centralize the redactor so every sink shares it and hook it into error reporters like Sentry’s beforeSend and your structured logger. Treat it as a safety net rather than a license to collect PII you never needed.
