Chilkat Uses PCRE2 (Perl Compatible Regular Expressions, version 2)
To understand its features, it helps to compare PCRE2 with C#'s Regex.
PCRE2 and C#'s System.Text.RegularExpressions.Regex engine both implement regular expressions,
but they differ in features, behavior, and syntax compatibility. Here's a comparison:
Summary Comparison
Feature/Behavior | PCRE2 | C# Regex |
---|---|---|
Syntax Compatibility | Perl-compatible (very close to Perl) | Similar to Perl but not fully compatible |
Engine Design | Native C library | .NET-managed engine |
Unicode Support | Chilkat uses PCRE2 in UTF mode | Built-in full Unicode support |
Lookbehind | Bounded length only (fixed-width per alternative before PCRE2 10.43) | Fixed- and variable-width supported |
Atomic Groups ("(?>...)") | ✔ Supported | ✔ Supported |
Possessive Quantifiers ("++") | ✔ Supported | ✗ Not supported (use atomic groups instead) |
Recursive Patterns | ✔ Supported ("(?R)", "(?1)", etc.) | ✗ Not supported |
Callouts / Hooks | ✔ (advanced feature) | ✗ Not available |
Regex Compilation | Compiled at runtime; optional JIT | Can be compiled to MSIL (with "RegexOptions.Compiled") |
Named Group Syntax | "(?<name>...)", "(?'name'...)", or "(?P<name>...)" | "(?<name>...)" or "(?'name'...)" |
Conditional Expressions | ✔ Supported | ✔ Limited support |
Backtracking Control | Rich control ("(*SKIP)(*FAIL)", etc.) | ✗ Not available |
Examples of Differences
Unbounded Lookbehind (C# only)
(?<=\w+)\d # Valid in C#
PCRE2 rejects this pattern at compile time because the lookbehind has no maximum length. The fixed-width form (?<=\w)\d matches the same digits in both engines.
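An engine that only accepts fixed-width lookbehind rejects (?<=\w+)\d, but (?<=\w)\d finds the same digits, because a single preceding word character is enough to satisfy the assertion. The sketch below uses Python's standard re module purely as a stand-in for such an engine (it is neither PCRE2 nor .NET). The helper name and the sample text are made up for illustration.

```python
import re

UNBOUNDED = r"(?<=\w+)\d"   # lookbehind with no fixed width
FIXED = r"(?<=\w)\d"        # equivalent fixed-width form


def compiles(pattern):
    """Return True if this engine accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Python's re insists on fixed-width lookbehind, so it rejects the first form.
unbounded_ok = compiles(UNBOUNDED)
fixed_ok = compiles(FIXED)

# The fixed-width form still finds every digit that follows a word character.
digits = re.findall(FIXED, "a1 22 _3 4")
print(unbounded_ok, fixed_ok, digits)
```

Rewriting the assertion this way keeps a pattern portable across engines with different lookbehind rules.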
Recursive Matching (PCRE2 only)
\((?:[^()]++|(?R))*\) # Match balanced parentheses
C# does not support (?R) or any other form of pattern recursion; balancing groups are the usual .NET substitute.
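When a pattern like this has to run on an engine without recursion, one option is to replace it with a small hand-written scanner. The following is a minimal Python sketch, assuming the goal is the same leftmost balanced group the pattern would find. The function names are hypothetical, and the technique is a plain depth counter rather than a regular expression.

```python
def match_balanced(text, start):
    """Return the end index (exclusive) of the balanced group that opens
    at text[start], or None if that parenthesis is never closed."""
    if start >= len(text) or text[start] != "(":
        return None
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def find_balanced(text):
    """Return the leftmost balanced parenthesized substring, or None."""
    for start, ch in enumerate(text):
        if ch == "(":
            end = match_balanced(text, start)
            if end is not None:
                return text[start:end]
    return None


print(find_balanced("f(a(b)c) + g(d)"))
print(find_balanced("((a)"))
```

As with a regex search, an opening parenthesis that is never closed is skipped and the scan resumes at the next candidate position.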
Unicode Behavior
- PCRE2: Chilkat uses PCRE2 in UTF mode (specifically UTF-8). In UTF mode the engine interprets input text as Unicode characters rather than as raw bytes, which is essential for correctly handling internationalized text and matching characters beyond the basic ASCII range.
- C# Regex: Always Unicode-aware, because it operates on .NET strings (UTF-16).
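The difference between matching characters and matching raw bytes can be sketched in a few lines. This example uses Python's re module only as an analogy: a str pattern behaves like a character-oriented (UTF) mode and a bytes pattern like a byte-oriented mode. It does not call PCRE2 or Chilkat, and the sample word is an arbitrary choice.

```python
import re

text = "caf\u00e9"            # 4 characters; the last one is U+00E9
raw = text.encode("utf-8")    # 5 bytes; U+00E9 becomes 0xC3 0xA9

# Character-oriented matching (the analogue of UTF mode): '.' is one character.
char_match = re.search(r"caf.$", text)

# Byte-oriented matching (the analogue of non-UTF mode): '.' is one byte,
# so a single '.' cannot cover the two-byte character before the end.
byte_match = re.search(rb"caf.$", raw)

# Counting what '.' consumes makes the unit of matching visible.
char_units = len(re.findall(r".", text))
byte_units = len(re.findall(rb".", raw))

print(char_match is not None, byte_match is not None, char_units, byte_units)
```

The same pattern succeeds or fails depending solely on whether the engine treats the input as characters or as bytes, which is why UTF mode matters for non-ASCII text.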
When to Use Each
Use Case | Choose PCRE2 if... | Choose C# Regex if... |
---|---|---|
Need advanced parsing features | You require recursion, callouts, backtracking control | You work entirely within .NET and need good Unicode support |
Performance-sensitive matching | You're embedding regex in native C apps | You want compiled regex in managed .NET |
Regex portability with Perl | Regex patterns must match Perl's behavior | Regex patterns are .NET-specific |
Conclusion
- PCRE2 is more powerful and closer to Perl in terms of features.
- C# Regex is good for most tasks and supports variable-width lookbehind, but it lacks recursion, possessive quantifiers, callouts, and backtracking control verbs.
- Porting complex PCRE2 regex to C# often requires simplification or rewriting. With Chilkat, you can use PCRE2 from C# or any other programming language supported by Chilkat.