Tokenization Engines and PAN Security Architecture
A Primary Account Number (PAN), typically the 16-digit number on a payment card (lengths of up to 19 digits are permitted), is the most valuable piece of data in a payment transaction. It identifies the account a charge is made against, so anyone who obtains it can attempt fraudulent charges on the cardholder's account. Every system that stores, processes, or transmits a PAN falls under PCI DSS scope, requiring expensive annual assessments, network segmentation, encryption, access controls, and logging.
Tokenization replaces the PAN with a surrogate value (a token) that is useless outside the specific context it was issued for. The token looks like a card number, passes Luhn validation, and flows through existing payment infrastructure, but it cannot be used to initiate a payment unless the Token Service Provider (TSP) de-tokenizes it back to the PAN.
Token Vault Architecture
The token vault is the core of any tokenization system: a database that maps PANs to tokens. Its design determines the security, performance, and reliability of the entire tokenization layer.
Design Requirements
- Bijective mapping: Each PAN maps to exactly one token per domain (merchant + channel). Same PAN at the same merchant always returns the same token (for subscription billing).
- Format preservation: Tokens must be the same length as PANs, pass Luhn check, and have the correct BIN prefix (so routing infrastructure still works).
- Non-reversible without vault access: Given a token, you cannot derive the PAN mathematically. This distinguishes tokenization from encryption.
- High availability: The vault is in the critical payment path. If it’s down, no transactions process. Target: 99.999% uptime.
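The format-preservation requirement can be checked mechanically. The following is a minimal sketch, not part of any particular TSP's API: `luhn_valid` and `looks_format_preserving` are illustrative names, and the 6-digit BIN length is an assumption carried over from the requirement above.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not number.isdigit():
        return False
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately to the left of the check digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def looks_format_preserving(pan: str, token: str) -> bool:
    """Check a candidate token against the format-preservation requirement."""
    return (
        len(token) == len(pan)       # same length as the PAN
        and token[:6] == pan[:6]     # BIN prefix kept for routing (assumed 6 digits)
        and luhn_valid(token)        # passes the Luhn check
        and token != pan             # a token must never equal the PAN itself
    )
```

A check like this is only a sanity test on the output of a token generator; it says nothing about whether the token is unpredictable or unique.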
import hashlib
import secrets
from typing import Optional


class TokenVault:
    """
    Payment token vault with domain-restricted token generation.
    Production implementations use HSM-backed encryption for PAN
    storage and deterministic token generation for idempotency.
    """

    def __init__(self, encryption_key: bytes, db_connection):
        self._key = encryption_key
        self._db = db_connection

    def tokenize(
        self,
        pan: str,
        merchant_id: str,
        channel: str = "ecommerce"
    ) -> str:
        """
        Generate or retrieve a token for a PAN within a specific domain.
        Domain = merchant_id + channel. The same PAN tokenized for
        different merchants produces different tokens, preventing
        cross-merchant correlation.
        """
        domain = f"{merchant_id}:{channel}"
        # Check if token already exists for this PAN + domain
        existing = self._lookup_by_pan(pan, domain)
        if existing:
            return existing
        # Generate a new format-preserving token
        token = self._generate_format_preserving_token(pan)
        # Store the mapping (PAN is encrypted at rest)
        encrypted_pan = self._encrypt_pan(pan)
        self._store_mapping(encrypted_pan, token, domain)
        return token

    def detokenize(self, token: str, merchant_id: str, channel: str) -> str:
        """
        Retrieve the original PAN from a token.
        Only the TSP can perform this operation. The merchant never
        sees the PAN after initial tokenization.
        Domain restriction: a token issued for Merchant A cannot be
        detokenized by Merchant B, even if both call the TSP.
        """
        domain = f"{merchant_id}:{channel}"
        encrypted_pan = self._lookup_by_token(token, domain)
        if not encrypted_pan:
            raise ValueError("Token not found or domain mismatch")
        return self._decrypt_pan(encrypted_pan)

    def _generate_format_preserving_token(self, pan: str) -> str:
        """
        Generate a token that:
        1. Preserves the BIN (first 6 digits) for routing
        2. Preserves the last 4 digits (for display: **** **** **** 0234)
        3. Replaces middle digits with random values
        4. Passes Luhn validation
        """
        bin_prefix = pan[:6]
        last_four = pan[-4:]
        middle_length = len(pan) - 10  # Subtract BIN (6) + last 4
        while True:
            # All middle digits except the last one are random
            free = ''.join(str(secrets.randbelow(10)) for _ in range(middle_length - 1))
            # The last middle digit is chosen so the full token passes Luhn
            # while the last four digits stay untouched; exactly one of the
            # ten candidates satisfies the checksum
            for digit in "0123456789":
                token = bin_prefix + free + digit + last_four
                if self._luhn_check_digit(token[:-1]) == int(token[-1]):
                    break
            # Retry on a collision (extremely unlikely but must be handled)
            # or if the token happens to equal the PAN itself
            if token != pan and not self._token_exists(token):
                return token

    @staticmethod
    def _luhn_check_digit(partial: str) -> int:
        """Calculate the Luhn check digit for a partial card number."""
        digits = [int(d) for d in partial]
        # The check digit is appended on the right, so the rightmost digit
        # of `partial` is the first one to be doubled
        doubled_sum = sum(sum(divmod(2 * d, 10)) for d in digits[-1::-2])
        plain_sum = sum(digits[-2::-2])
        total = doubled_sum + plain_sum
        return (10 - (total % 10)) % 10

    def _encrypt_pan(self, pan: str) -> bytes:
        """Encrypt PAN for storage using AES-256-GCM."""
        from cryptography.hazmat.primitives.ciphers.aead import AESGCM
        import os
        nonce = os.urandom(12)
        aesgcm = AESGCM(self._key)
        ct = aesgcm.encrypt(nonce, pan.encode(), b"pan-storage")
        return nonce + ct

    def _decrypt_pan(self, encrypted: bytes) -> str:
        from cryptography.hazmat.primitives.ciphers.aead import AESGCM
        nonce, ct = encrypted[:12], encrypted[12:]
        aesgcm = AESGCM(self._key)
        return aesgcm.decrypt(nonce, ct, b"pan-storage").decode()

    def _lookup_by_pan(self, pan: str, domain: str) -> Optional[str]:
        """Look up an existing token by keyed PAN hash + domain."""
        import hmac
        # An unkeyed SHA-256 of a PAN can be brute-forced because the PAN
        # space is small, so the index uses HMAC. Production systems should
        # use a dedicated index key rather than reusing the encryption key.
        pan_hash = hmac.new(
            self._key, f"{pan}|{domain}".encode(), hashlib.sha256
        ).hexdigest()
        # Query DB by pan_hash (not the PAN itself)
        return None  # Placeholder

    def _lookup_by_token(self, token: str, domain: str) -> Optional[bytes]:
        """Look up encrypted PAN by token + domain."""
        return None  # Placeholder

    def _store_mapping(self, encrypted_pan: bytes, token: str, domain: str):
        """Store the PAN-token mapping."""
        pass  # Placeholder

    def _token_exists(self, token: str) -> bool:
        """Check if a token is already in use."""
        return False  # Placeholder
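The vault needs a way to find an existing token for a PAN without storing the PAN in the clear as an index. One common approach is a keyed fingerprint scoped to the domain. The sketch below assumes a separate index key and an in-memory dictionary standing in for the database table; `pan_fingerprint`, `index`, and the key value are all illustrative.

```python
import hashlib
import hmac


def pan_fingerprint(index_key: bytes, pan: str, domain: str) -> str:
    """Keyed, domain-scoped fingerprint used as the vault's lookup index."""
    # The separator keeps (pan, domain) pairs unambiguous after concatenation.
    message = f"{pan}|{domain}".encode()
    return hmac.new(index_key, message, hashlib.sha256).hexdigest()


# Hypothetical in-memory stand-in for the vault's index table.
index: dict[str, str] = {}
key = b"\x00" * 32  # placeholder only; a real key would be held in an HSM

fp_a = pan_fingerprint(key, "4111111111111111", "merchant-a:ecommerce")
fp_b = pan_fingerprint(key, "4111111111111111", "merchant-b:ecommerce")
index[fp_a] = "token-for-merchant-a"
```

Because the domain is part of the HMAC input, the same PAN yields unrelated fingerprints for different merchants, so the index itself does not allow cross-merchant correlation. Without the key, an attacker who steals the index cannot enumerate candidate PANs and compare hashes.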
Format-Preserving Encryption (FPE)
An alternative to vault-based tokenization is Format-Preserving Encryption (FPE), which uses a cryptographic algorithm to transform a PAN into a token of the same format. Unlike vault-based tokens, FPE tokens can be de-tokenized using the encryption key alone — no database lookup required.
The NIST-specified FPE modes are FF1 and FF3-1. FF1 is defined in NIST SP 800-38G, and FF3-1 is the revised form of FF3 introduced in the draft Revision 1 of that publication:
# FF1 mode conceptual implementation
# In production, use a vetted FF1 library or HSM-native FPE
# (pyffx implements the related FFX construction, not NIST FF1)
def ff1_encrypt(key: bytes, tweak: bytes, plaintext: str, radix: int = 10) -> str:
    """
    FF1 Format-Preserving Encryption (NIST SP 800-38G).
    Encrypts a string of digits into another string of the same
    length using the same alphabet (digits 0-9).
    This is a Feistel network where the round function uses AES:
    - Split input into left (A) and right (B) halves
    - 10 rounds of: A, B = B, A + F(B, round, tweak)
    - F uses AES-CBC-MAC as a PRF to generate pseudorandom values in the right range
    Parameters:
        key: AES key (128, 192, or 256 bits)
        tweak: Additional input that restricts the token domain
            (use merchant_id + channel as tweak for domain restriction)
        plaintext: The PAN digits to encrypt
        radix: The alphabet size (10 for decimal digits)
    """
    n = len(plaintext)
    u = n // 2
    v = n - u
    # Convert each half to its integer representation
    # (int() and str() make this sketch valid for radix 10 only)
    A = int(plaintext[:u])
    B = int(plaintext[u:])
    for round_num in range(10):
        # Build the round input
        # P = [version || method || round || ... || tweak || B]
        # Q = derived from tweak + round + B
        # Round function F uses AES-CBC-MAC
        # F output is a number in [0, radix^m) where m = len(half)
        # NOTE: _ff1_round_function is not defined in this sketch; it
        # stands for the round function specified in SP 800-38G
        F_output = _ff1_round_function(key, tweak, round_num, B, radix, v)
        # Feistel step
        C = (A + F_output) % (radix ** u)
        A = B
        B = C
        # Swap u and v every round so that u always tracks the length of A
        u, v = v, u
    # Reconstruct the ciphertext (after an even number of rounds the
    # halves are back to their original lengths)
    return str(A).zfill(n // 2) + str(B).zfill(n - n // 2)
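To see why a Feistel construction gives a reversible, length-preserving mapping over digit strings, here is a toy sketch that can actually be run. It is not FF1: the round function is HMAC-SHA256 rather than the AES-based PRF from SP 800-38G, the names `toy_fpe_encrypt` and `toy_fpe_decrypt` are made up for illustration, and nothing here should be used to protect real PANs.

```python
import hashlib
import hmac

ROUNDS = 10  # even, so both halves return to their original lengths


def _round_value(key: bytes, tweak: bytes, round_num: int, half: int, modulus: int) -> int:
    """Pseudorandom value in [0, modulus); HMAC stands in for FF1's AES PRF."""
    message = tweak + b"|" + bytes([round_num]) + b"|" + str(half).encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def toy_fpe_encrypt(key: bytes, tweak: bytes, digits: str) -> str:
    n = len(digits)
    if n < 2 or not digits.isdigit():
        raise ValueError("need at least two decimal digits")
    u, v = n // 2, n - n // 2
    a, b = int(digits[:u]), int(digits[u:])
    for i in range(ROUNDS):
        modulus = 10 ** u
        c = (a + _round_value(key, tweak, i, b, modulus)) % modulus
        a, b = b, c
        u, v = v, u  # u keeps tracking the length of a
    return str(a).zfill(u) + str(b).zfill(v)


def toy_fpe_decrypt(key: bytes, tweak: bytes, digits: str) -> str:
    n = len(digits)
    if n < 2 or not digits.isdigit():
        raise ValueError("need at least two decimal digits")
    u, v = n // 2, n - n // 2
    a, b = int(digits[:u]), int(digits[u:])
    for i in reversed(range(ROUNDS)):
        u, v = v, u  # undo the length swap first
        modulus = 10 ** u
        previous_a = (b - _round_value(key, tweak, i, a, modulus)) % modulus
        a, b = previous_a, a
    return str(a).zfill(u) + str(b).zfill(v)
```

Decryption never inverts the round function; it recomputes the same value and subtracts it. That is the property that lets FF1 build a permutation over digit strings out of a one-way PRF, and the tweak plays the same domain-separating role as the merchant and channel in the vault design.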
When to Use FPE vs Vault
| Aspect | Vault-Based | FPE |
|---|---|---|
| De-tokenization | Requires DB lookup | Key-based (no DB) |
| Token uniqueness | Guaranteed by DB constraint | Guaranteed by encryption bijectivity |
| Scalability | Limited by DB throughput | Limited by crypto throughput |
| Key compromise impact | Attacker needs both key AND DB | Attacker with key can de-tokenize all |
| PCI DSS | Vault is in scope | Key management is in scope |
| Offline de-tokenization | Not possible | Possible with key |
Most production payment systems use vault-based tokenization because key compromise in FPE is catastrophic — one key exposes every token ever generated. Vault-based systems offer defense in depth: even with the encryption key, the attacker still needs access to the vault database.
Network Tokenization
Visa Token Service (VTS) and Mastercard Digital Enablement Service (MDES) operate at the network level. They replace the PAN before it reaches the merchant, so the merchant never handles real card numbers:
Traditional flow:
Cardholder → PAN → Merchant → PAN → Acquirer → PAN → Network → PAN → Issuer
Network-tokenized flow:
Cardholder → PAN → [TSP] → Token → Merchant → Token → Acquirer → Token → Network
Network de-tokenizes internally → PAN → Issuer
Network tokens include domain restrictions: a token issued for Amazon’s e-commerce channel cannot be used at a physical terminal, and vice versa. The TSP enforces these restrictions during de-tokenization.
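A domain check of this kind can be sketched in a few lines. The structure below is hypothetical: `TokenDomain` and `is_within_domain` are illustrative names, and real TSPs apply richer controls than a merchant and channel allow-list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenDomain:
    """Hypothetical domain controls attached to a token at issuance."""
    merchant_ids: frozenset[str]
    channels: frozenset[str]


def is_within_domain(domain: TokenDomain, merchant_id: str, channel: str) -> bool:
    """Return True only if both the merchant and the channel are allowed."""
    return merchant_id in domain.merchant_ids and channel in domain.channels


# A token provisioned for one merchant's e-commerce channel only.
ecommerce_only = TokenDomain(
    merchant_ids=frozenset({"merchant-123"}),
    channels=frozenset({"ecommerce"}),
)
```

With this token, a request from `merchant-123` over `ecommerce` passes, while the same token presented at a contactless terminal or by another merchant is rejected before any de-tokenization happens.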
The shift toward network tokenization is driven by a compelling economic incentive: Visa and Mastercard report that network-tokenized transactions have 26% lower fraud rates than non-tokenized transactions. Where a network prices tokenized transactions at a lower interchange rate, merchants also receive a direct financial reward for adopting tokenization.
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class NetworkToken:
    """
    Representation of a network-issued payment token.
    """
    token_pan: str  # The token value (looks like a PAN)
    token_expiry: str  # Token-specific expiry (may differ from card)
    token_requestor_id: str  # Identifies who requested the token
    token_reference_id: str  # TSP's internal reference
    # Domain restrictions
    allowed_merchant_ids: list[str]
    allowed_channels: list[str]  # "ecommerce", "contactless", "in_app"
    allowed_countries: list[str]  # ISO 3166-1 alpha-2
    # Lifecycle
    status: str  # ACTIVE, SUSPENDED, DELETED
    # Cryptogram support
    supports_dynamic_cryptogram: bool  # True for mobile wallets

    def generate_payment_cryptogram(self, transaction_data: bytes) -> bytes:
        """
        Generate a per-transaction cryptogram for this token.
        Mobile wallets (Apple Pay, Google Pay) generate a unique
        cryptogram for each transaction, making the token useless
        even if intercepted: the cryptogram can't be replayed.
        """
        if not self.supports_dynamic_cryptogram:
            raise ValueError("This token does not support dynamic cryptograms")
        # In production, this is computed inside the device's
        # Secure Element using keys provisioned by the TSP
        # Simplified: the actual cryptogram uses EMV session key derivation
        return hmac.new(
            b"token-session-key",  # Would be derived per-transaction
            transaction_data,
            hashlib.sha256
        ).digest()[:8]
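The claim that a cryptogram cannot be replayed depends on the verifier rejecting anything it has already seen. The sketch below shows one way to get that property, using a transaction counter that is mixed into the MAC. It is a simplification under stated assumptions: `make_cryptogram` and `CryptogramVerifier` are illustrative names, and real EMV cryptograms use their own counter handling and session key derivation.

```python
import hashlib
import hmac


def make_cryptogram(session_key: bytes, counter: int, transaction_data: bytes) -> bytes:
    """Bind the cryptogram to a monotonically increasing transaction counter."""
    message = counter.to_bytes(4, "big") + transaction_data
    return hmac.new(session_key, message, hashlib.sha256).digest()[:8]


class CryptogramVerifier:
    """Hypothetical TSP-side check: a valid MAC and a counter not seen before."""

    def __init__(self, session_key: bytes):
        self._key = session_key
        self._last_counter = -1

    def verify(self, counter: int, transaction_data: bytes, cryptogram: bytes) -> bool:
        expected = make_cryptogram(self._key, counter, transaction_data)
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, cryptogram):
            return False
        if counter <= self._last_counter:
            return False  # replayed or stale cryptogram
        self._last_counter = counter
        return True
```

An intercepted cryptogram verifies once at most. Presenting it again fails the counter check, and changing the counter to get past that check invalidates the MAC.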
Tokenization doesn’t eliminate risk — it moves and concentrates it. The token vault becomes the highest-value target in the system. But concentrating risk is the point: you’d rather defend one hardened vault with HSMs, FIPS 140-2 Level 3 hardware, and 24/7 monitoring than try to secure PANs across every merchant’s system worldwide.