Chilkat Uses PCRE2 (Perl Compatible Regular Expressions, version 2)

A useful way to understand PCRE2's features is to compare it with C#'s Regex.

PCRE2 and C#'s System.Text.RegularExpressions.Regex engine both implement regular expressions, but they differ in features, behavior, and syntax compatibility. Here's a comparison:


Summary Comparison

Feature/Behavior | PCRE2 | C# Regex
Syntax Compatibility | Perl-compatible (very close to Perl) | Similar to Perl but not fully compatible
Engine Design | Native C library | .NET-managed engine
Unicode Support | Chilkat uses PCRE2 in UTF mode | Built-in full Unicode support
Lookbehind | Fixed length per alternative; bounded variable length in newer releases (10.43 and later) | Fixed- and variable-width supported
Atomic Groups ("(?>...)") | ✔ Supported | ✔ Supported
Possessive Quantifiers ("++") | ✔ Supported | ✗ Not supported (use atomic groups instead)
Recursive Patterns | ✔ Supported ("(?R)", "(?1)", etc.) | ✗ Not supported (balancing groups cover some cases)
Callouts / Hooks | ✔ (advanced feature) | ✗ Not available
Regex Compilation | Compiled at runtime; optional JIT | Can be compiled to MSIL (with "RegexOptions.Compiled")
Named Group Syntax | "(?<name>...)", "(?'name'...)", or "(?P<name>...)" | "(?<name>...)" or "(?'name'...)"
Conditional Expressions | ✔ Supported | ✔ Limited support
Backtracking Control | Rich control ("(*SKIP)(*FAIL)", etc.) | ✗ Not available

Examples of Differences

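Unbounded Lookbehind (C# only)
(?<=\w+)\d   # Valid in C#

PCRE2 rejects this pattern because the lookbehind has no maximum length. PCRE2 has traditionally required each lookbehind alternative to match a fixed length; releases from 10.43 onward also accept variable-length lookbehind, but only when the length is bounded.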
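
When a pattern has to work in every engine, an unbounded lookbehind like the one above can often be rewritten with a fixed-width one. Here (?<=\w)\d finds the same digits as (?<=\w+)\d, because only the character immediately before the digit decides the match. The sketch below uses Python's re module purely as a stand-in engine that accepts only fixed-width lookbehind. It is not PCRE2, Chilkat, or .NET, and the helper name compiles is made up for this illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the stand-in engine (Python's re) accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

variable_width = r"(?<=\w+)\d"   # unbounded lookbehind
fixed_width = r"(?<=\w)\d"       # fixed-width rewrite

# Digits that directly follow a word character (letter, digit, or underscore).
text = "a1 22 _3 -4"
digits = re.findall(fixed_width, text)

print(compiles(variable_width), compiles(fixed_width), digits)
```

The rewrite works only because the lookbehind content is a simple repetition. A lookbehind that has to see a longer, variable-length prefix usually needs a capture group or a restructured pattern instead.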


Recursive Matching (PCRE2 only)
\((?:[^()]++|(?R))*\)  # Match balanced parentheses

C# does not support (?R) or any other recursion syntax. Balanced constructs are handled there with balancing groups instead.
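
To make the recursive pattern easier to read, here is a hand-written Python sketch of what it matches. It is not how PCRE2 implements recursion. The function names match_balanced and find_balanced are hypothetical, and the recursive call stands in for (?R).

```python
from typing import Optional

def match_balanced(text: str, start: int = 0) -> Optional[int]:
    """Return the index just past the balanced group that opens at
    text[start], or None if there is no balanced group at that position."""
    if start >= len(text) or text[start] != "(":
        return None
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == ")":
            return i + 1                      # the closing \)
        if ch == "(":
            end = match_balanced(text, i)     # plays the role of (?R)
            if end is None:
                return None
            i = end
        else:
            i += 1                            # plays the role of [^()]
    return None                               # ran out of input: unbalanced

def find_balanced(text: str) -> Optional[str]:
    """Return the first balanced parenthesized substring, like a regex search."""
    for start, ch in enumerate(text):
        if ch == "(":
            end = match_balanced(text, start)
            if end is not None:
                return text[start:end]
    return None

print(find_balanced("f(a(b)c) + g(x)"))
```

With unbalanced input such as "(a(b)", the scan gives up on the outer opening parenthesis and returns the inner "(b)", which is the substring a regex search with the recursive pattern would also settle on.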


Unicode Behavior
  • PCRE2: Chilkat uses PCRE2 in UTF mode (specifically UTF-8). In UTF mode the engine interprets the pattern and the subject text as sequences of Unicode characters rather than raw bytes or code units, so a multi-byte character is matched as a single unit. This is essential for correctly handling internationalized text and matching characters beyond the basic ASCII range.
  • C# Regex: Always Unicode-aware, because it operates on .NET strings (UTF-16).
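
The difference between matching characters and matching raw bytes can be seen with a small sketch. It uses Python's re module only as an analogy: a str pattern stands in for character-oriented (UTF-mode-like) matching and a bytes pattern for byte-oriented matching. No PCRE2 or Chilkat API is called, and the sample text is invented.

```python
import re

text = "na\u00efve caf\u00e9"        # "naïve café": 10 characters
data = text.encode("utf-8")          # the same text as 12 UTF-8 bytes

# Character-oriented: '.' consumes one whole character.
char_units = re.findall(r".", text, flags=re.DOTALL)
char_match = re.search(r"caf.", text).group()

# Byte-oriented: '.' consumes one byte, so a two-byte character is split.
byte_units = re.findall(rb".", data, flags=re.DOTALL)
byte_match = re.search(rb"caf.", data).group()

print(len(char_units), repr(char_match))
print(len(byte_units), repr(byte_match))
```

In byte-oriented matching the pattern caf. stops after the first byte of the accented character and returns an incomplete UTF-8 sequence. This is the kind of result that character-aware matching avoids.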

When to Use Each

Use Case | Choose PCRE2 if... | Choose C# Regex if...
Need advanced parsing features | You require recursion, callouts, backtracking control | You work entirely within .NET and need good Unicode support
Performance-sensitive matching | You're embedding regex in native C apps | You want compiled regex in managed .NET
Regex portability with Perl | Regex patterns must match Perl's behavior | Regex patterns are .NET-specific

Conclusion

  • PCRE2 is more powerful and closer to Perl in terms of features.
  • C# Regex is good for most tasks but lacks support for recursion, possessive quantifiers, callouts, and backtracking control verbs.
  • Porting complex PCRE2 regex to C# often requires simplification or rewriting. With Chilkat, you can use PCRE2 from C# or any other programming language supported by Chilkat.