Go's Regexp is Slow. So I Built My Own - up to 3000x Faster

These articles are AI-generated summaries. Please check the original sources for full details.

Andrey Kolkov built coregex, a drop-in replacement for Go’s regexp package that runs up to 3000x faster on patterns such as .*error.*, using SIMD acceleration and multi-engine strategy selection.

Why This Matters

According to the article, Go’s regexp package relies on Thompson NFA simulation, which guarantees matching time linear in the input length, O(n), but sacrifices raw speed. It has no SIMD acceleration, no literal prefilters, and no per-pattern strategy selection, and the author reports 70-80% CPU usage even on simple patterns. For example, matching .*error.* against a 250KB file takes 12.6ms with the standard library but about 4µs with coregex, a reported 3,154x speedup.
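
To make the comparison concrete, here is a minimal timing sketch that uses only the standard library. It relies on one observation: an unanchored .*error.* matches exactly when the bytes "error" occur somewhere in the input, so a plain substring search answers the same yes/no question. The input is synthetic, the helper names (buildInput, matchRegexp, matchLiteral) are made up for illustration, and this is not coregex's implementation; timings will vary by machine.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"time"
)

var errorRe = regexp.MustCompile(`.*error.*`)

// buildInput returns roughly size bytes of log-like lines followed by a
// single line containing "error" (synthetic data, not the article's file).
func buildInput(size int) []byte {
	var buf bytes.Buffer
	for buf.Len() < size {
		buf.WriteString("2025-01-01 INFO request handled in 12ms\n")
	}
	buf.WriteString("2025-01-01 WARN disk error on /dev/sda\n")
	return buf.Bytes()
}

// matchRegexp asks the standard library engine whether the pattern matches.
func matchRegexp(b []byte) bool { return errorRe.Match(b) }

// matchLiteral answers the same question with a substring search: an
// unanchored .*error.* matches exactly when "error" occurs in the input.
func matchLiteral(b []byte) bool { return bytes.Contains(b, []byte("error")) }

func main() {
	input := buildInput(250 * 1024)

	start := time.Now()
	viaRegexp := matchRegexp(input)
	regexpTime := time.Since(start)

	start = time.Now()
	viaLiteral := matchLiteral(input)
	literalTime := time.Since(start)

	fmt.Printf("regexp:  match=%v in %v\n", viaRegexp, regexpTime)
	fmt.Printf("literal: match=%v in %v\n", viaLiteral, literalTime)
}
```

A single timed call is only a rough indicator; for numbers worth quoting, a testing.B benchmark would be the more reliable tool.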

Key Insights

  • “Up to 3000x speedup on .*error.* patterns” (2025)
  • “SIMD prefiltering reduces CPU usage by 12x for byte searches”
  • “Multi-engine architecture selects optimal strategy per pattern”
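
The prefiltering idea can be sketched with the standard library alone: scan for a byte the match must start with, then run the regex only at those candidate positions. The sketch below uses bytes.IndexByte for the scan, which Go implements with vectorized assembly on platforms such as amd64. The pattern err[0-9]+ and the function name findWithPrefilter are chosen for illustration and are not taken from coregex.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

var (
	// full is the pattern as a user would write it.
	full = regexp.MustCompile(`err[0-9]+`)
	// anchored only matches at the start of the slice it is given.
	anchored = regexp.MustCompile(`^err[0-9]+`)
)

// findWithPrefilter returns the [start, end) span of the first match of
// err[0-9]+ in hay, or nil. Every match must begin with 'e', so the regex
// engine is only invoked at positions where that byte occurs.
func findWithPrefilter(hay []byte) []int {
	pos := 0
	for pos < len(hay) {
		i := bytes.IndexByte(hay[pos:], 'e')
		if i < 0 {
			return nil // no candidate left
		}
		pos += i
		if loc := anchored.FindIndex(hay[pos:]); loc != nil {
			return []int{pos, pos + loc[1]}
		}
		pos++ // false candidate, keep scanning
	}
	return nil
}

func main() {
	hay := []byte("see e, er, err42 here")
	fmt.Println("prefilter:", findWithPrefilter(hay))
	fmt.Println("regexp:   ", full.FindIndex(hay))
}
```

A single-byte prefilter pays off when the byte is rare in the input; with a common byte like 'e' in English text there are many false candidates, which is one reason longer literals make better prefilters.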

Working Example

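// func memchr(haystack []byte, needle byte) int
// Returns the index of the first occurrence of needle, or -1.
// Sketch: compares 32 bytes per iteration with AVX2 and finishes with a
// scalar loop. Assumes AVX2 and BMI1 are available (no CPU feature check).
#include "textflag.h"

TEXT ·memchr(SB), NOSPLIT, $0-40
	MOVQ haystack_base+0(FP), DI  // DI = current read pointer
	MOVQ haystack_len+8(FP), CX   // CX = bytes remaining
	MOVBQZX needle+24(FP), DX     // DX = needle byte
	VMOVD DX, X0
	VPBROADCASTB X0, Y0           // Y0 = needle in all 32 lanes
loop:
	CMPQ CX, $32
	JLT tail                      // fewer than 32 bytes left
	VMOVDQU (DI), Y1
	VPCMPEQB Y0, Y1, Y2           // 0xFF in each lane equal to needle
	VPMOVMSKB Y2, AX              // one mask bit per lane
	TESTL AX, AX
	JNZ found
	ADDQ $32, DI
	SUBQ $32, CX
	JMP loop
found:
	TZCNTL AX, AX                 // offset of first match in this block
	ADDQ AX, DI
hit:
	SUBQ haystack_base+0(FP), DI  // pointer -> index
	MOVQ DI, ret+32(FP)
	VZEROUPPER
	RET
tail:
	TESTQ CX, CX
	JZ notfound
	CMPB (DI), DL
	JEQ hit
	INCQ DI
	DECQ CX
	JMP tail
notfound:
	MOVQ $-1, ret+32(FP)
	VZEROUPPER
	RET

// selectStrategy picks a search strategy from the literals found in the
// pattern. The extract* helpers and Strategy constants are defined elsewhere.
func selectStrategy(pattern *syntax.Regexp) Strategy {
	prefix := extractPrefix(pattern)
	suffix := extractSuffix(pattern)
	inner := extractInner(pattern)
	if len(suffix) >= 3 {
		return UseReverseSuffix // 1000x for .*\.txt
	}
	if len(inner) >= 3 {
		return UseReverseInner // 3000x for .*error.*
	}
	if len(prefix) >= 3 {
		return UseDFA // 5-10x for prefix.*
	}
	return UseAdaptive
}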
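
The strategy selector above depends on helpers the summary does not show. As a rough, runnable approximation, the sketch below fills them in with a single hypothetical function, literalParts, that inspects the top-level concatenation produced by the standard regexp/syntax parser. The Strategy constants, literalParts, and strategyFor are assumptions made for illustration; coregex's real literal extraction is likely more thorough.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// Strategy names mirror the snippet above; the values are illustrative.
type Strategy int

const (
	UseAdaptive Strategy = iota
	UseDFA
	UseReverseSuffix
	UseReverseInner
)

var strategyNames = [...]string{"UseAdaptive", "UseDFA", "UseReverseSuffix", "UseReverseInner"}

func (s Strategy) String() string { return strategyNames[s] }

// literalParts (hypothetical helper) returns the case-sensitive literal found
// at the start, at the end, and in the middle of the top-level concatenation.
func literalParts(re *syntax.Regexp) (prefix, suffix, inner string) {
	subs := []*syntax.Regexp{re}
	if re.Op == syntax.OpConcat {
		subs = re.Sub
	}
	for i, sub := range subs {
		if sub.Op != syntax.OpLiteral || sub.Flags&syntax.FoldCase != 0 {
			continue
		}
		lit := string(sub.Rune)
		switch {
		case i == 0:
			prefix = lit
		case i == len(subs)-1:
			suffix = lit
		case len(lit) > len(inner):
			inner = lit // keep the longest inner literal
		}
	}
	return prefix, suffix, inner
}

// selectStrategy applies the same thresholds and order as the snippet above.
func selectStrategy(re *syntax.Regexp) Strategy {
	prefix, suffix, inner := literalParts(re)
	switch {
	case len(suffix) >= 3:
		return UseReverseSuffix
	case len(inner) >= 3:
		return UseReverseInner
	case len(prefix) >= 3:
		return UseDFA
	}
	return UseAdaptive
}

// strategyFor parses a pattern with Perl-style flags and selects a strategy.
func strategyFor(pattern string) (Strategy, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return UseAdaptive, err
	}
	return selectStrategy(re), nil
}

func main() {
	for _, p := range []string{`.*\.txt`, `.*error.*`, `prefix.*`, `[a-z]+`} {
		s, err := strategyFor(p)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%-12s -> %v\n", p, s)
	}
}
```

Only literals at the top level of the pattern are considered here, so a pattern such as (foo|bar)baz.* falls through to a weaker strategy than a fuller analysis might choose.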

Practical Applications

  • Use Case: Log parsing with .*error.* patterns using coregex instead of regexp
  • Pitfall: Relying on backreferences. coregex does not support them (neither does Go’s standard regexp), because matching them requires a backtracking engine whose running time can be exponential.
