RegexLib — Regular Expression Library
Comprehensive regular expression support for Plan9Basic: pattern matching, searching, replacing, splitting, and capture group extraction. Built on Delphi’s System.RegularExpressions unit. The library provides 16 base functions (28 registered signatures once option overloads are counted).
| Category | Count | Description |
|---|---|---|
| Validation | 2 | regex_isvalid, regex_error$ |
| Matching | 2 | regex_match, regex_matchfull (+ option overloads) |
| Find / Search | 3 | regex_find$, regex_findpos, regex_findlen (+ option overloads) |
| Find All | 2 | regex_findall#, regex_count (+ option overloads) |
| Replace | 2 | regex_replace$, regex_replacefirst$ (+ option overloads) |
| Split | 1 | regex_split# (+ option overload) |
| Groups / Capture | 3 | regex_groups#, regex_group$, regex_groupcount |
| Utility | 1 | regex_escape$ |

| Feature | Description |
|---|---|
| Engine | Delphi System.RegularExpressions (PCRE-compatible) |
| Results | TStringList pointers (from StrListLib) with 0-based indexing |
| Positions | regex_findpos() returns 1-based positions |
| Backreferences | $1, $2, etc. in replacement strings |
| Error Handling | Invalid patterns return 0 or empty strings — no exceptions |
| Memory | All TStringList results are garbage-collected |
Regex Options
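Options are numeric values combined with +. Pass the combined value as the last parameter to any function that accepts an options argument. Because the values are distinct powers of two, add each option at most once. Plan9Basic does not have predefined constants for these options, so define your own or use the numeric values directly.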
| Option | Value | Description |
|---|---|---|
| REGEX_IGNORECASE | 1 | Case-insensitive matching |
| REGEX_MULTILINE | 2 | ^ and $ match line boundaries (not just start/end of string) |
| REGEX_SINGLELINE | 4 | . matches newlines (normally it doesn’t) |
| REGEX_EXPLICITCAPTURE | 8 | Only named or explicitly numbered groups capture |

    ' Define your own constants
    let REGEX_IGNORECASE = 1
    let REGEX_MULTILINE = 2
    let REGEX_SINGLELINE = 4
    let REGEX_EXPLICITCAPTURE = 8

    ' Combine with +
    opts = REGEX_IGNORECASE + REGEX_MULTILINE
    found = regex_match("[a-z]+", text$, opts)

    ' Or use numeric values directly
    found = regex_match("[a-z]+", text$, 3) ' 1 + 2 = 3

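The effect of an option is easiest to see by running the same pattern with and without it. The following is a minimal sketch that uses only regex_count as documented in this section; the sample text and variable names are illustrative.

```basic
' Same pattern, with and without REGEX_IGNORECASE (value 1)
text$ = "abc DEF ghi"
plain = regex_count("[a-z]+", text$)
nocase = regex_count("[a-z]+", text$, 1)
println "Without options: "; plain  ' 2 (abc, ghi)
println "Ignoring case: "; nocase   ' 3 (abc, DEF, ghi)
```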
Validation Functions
| Function | Signature | Description |
|---|---|---|
| regex_isvalid(pattern$) | regex_isvalid@$ | Check if pattern is valid regex (1=valid, 0=invalid) |
| regex_error$(pattern$) | regex_error$@$ | Error message for invalid pattern (empty string if valid) |

    ' Validate before using a pattern
    if regex_isvalid("[a-z]+") = 1 then
      println "Valid pattern"
    endif

    ' Get error description
    err$ = regex_error$("[invalid")
    if err$ <> "" then
      println "Error: "; err$
    endif

ⓘ Note: Always validate user-supplied patterns with regex_isvalid() before passing them to other regex functions.
Matching Functions
| Function | Signature | Description |
|---|---|---|
| regex_match(pat$, text$) | regex_match@$$ | Match pattern anywhere in text (1/0) |
| regex_match(pat$, text$, opts) | regex_match@$$n | Match with options |
| regex_matchfull(pat$, text$) | regex_matchfull@$$ | Match pattern against entire text (1/0) |
| regex_matchfull(pat$, text$, opts) | regex_matchfull@$$n | Full match with options |

    ' Partial match - pattern found anywhere in string
    if regex_match("\d+", "abc123def") = 1 then
      println "Contains numbers"
    endif

    ' Case-insensitive (option 1)
    if regex_match("hello", "HELLO WORLD", 1) = 1 then
      println "Found hello"
    endif

    ' Full match - entire string must match
    email$ = "user@example.com"
    if regex_matchfull("^[\w.-]+@[\w.-]+\.\w+$", email$) = 1 then
      println "Valid email format"
    endif

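To make the difference between partial and full matching concrete, the sketch below runs one unanchored pattern through both functions. The expected values follow from the function descriptions in the table above; the variable names are illustrative.

```basic
' Same pattern, partial vs. full matching
partial = regex_match("\d+", "abc123")
whole = regex_matchfull("\d+", "abc123")
digits = regex_matchfull("\d+", "123")
println "Partial on abc123: "; partial  ' 1, digits occur somewhere in the text
println "Full on abc123: "; whole       ' 0, the text also contains letters
println "Full on 123: "; digits         ' 1, the whole text is digits
```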
ⓘ Note: regex_match() succeeds if the pattern is found anywhere in the text. regex_matchfull() requires the entire text to match the pattern.
Find / Search Functions
| Function | Signature | Description |
|---|---|---|
| regex_find$(pat$, text$) | regex_find$@$$ | First match as string (empty if none) |
| regex_find$(pat$, text$, opts) | regex_find$@$$n | First match with options |
| regex_findpos(pat$, text$) | regex_findpos@$$ | 1-based position of first match (0 if none) |
| regex_findpos(pat$, text$, opts) | regex_findpos@$$n | Find position with options |
| regex_findlen(pat$, text$) | regex_findlen@$$ | Length of first match (0 if none) |
| regex_findlen(pat$, text$, opts) | regex_findlen@$$n | Find length with options |

    ' Find first number in text
    num$ = regex_find$("\d+", "Price: $42.99")
    println "Found: "; num$ ' 42

    ' Position of first digit (1-based!)
    pos = regex_findpos("\d", "abc123")
    println "First digit at: "; pos ' 4

    ' Length of first match
    ln = regex_findlen("\d+", "abc123def")
    println "Match length: "; ln ' 3

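Because regex_findpos() is 1-based, its result can be passed straight to the string functions together with regex_findlen(). The sketch below splits a string around its first match. It assumes mid$(text$, start, count) and left$(text$, count) behave as they are used in the Phone Number Formatter example; head$, hit$, and tail$ are illustrative names.

```basic
' Split a string around its first run of digits
text$ = "abc123def"
pos = regex_findpos("\d+", text$)
if pos > 0 then
  ln = regex_findlen("\d+", text$)
  head$ = left$(text$, pos - 1)
  hit$ = mid$(text$, pos, ln)
  tail$ = mid$(text$, pos + ln, len(text$) - pos - ln + 1)
  println "Before: "; head$  ' abc
  println "Match: "; hit$    ' 123
  println "After: "; tail$   ' def
else
  println "No match"
endif
```

Checking pos > 0 first matters, since 0 is the "no match" result and is not a valid string position.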
⚠ Warning: regex_findpos() returns 1-based positions (consistent with other Plan9Basic string functions). Returns 0 when no match is found.
Find All Functions
| Function | Signature | Description |
|---|---|---|
| regex_findall#(pat$, text$) | regex_findall#@$$ | All matches as TStringList (0-based) |
| regex_findall#(pat$, text$, opts) | regex_findall#@$$n | Find all with options |
| regex_count(pat$, text$) | regex_count@$$ | Number of matches |
| regex_count(pat$, text$, opts) | regex_count@$$n | Count with options |

    ' Find all words
    matches# = regex_findall#("\w+", "Hello World Test")
    n = strings_count(matches#)
    for i = 0 to n - 1
      println strings_strings$(matches#, i)
    next
    ' Hello / World / Test

    ' Count vowels (case-insensitive)
    cnt = regex_count("[aeiou]", "Hello World", 1)
    println "Vowel count: "; cnt ' 3

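As a further illustration of walking a result list, the sketch below collects every six-digit hex color from a string, using the hex color pattern listed under Common Regex Patterns. The input string and the names css$ and colors# are made up for the example.

```basic
' Collect every six-digit hex color in a style string
css$ = "color: #FF5733; background: #00ff00; border: #123"
colors# = regex_findall#("#[0-9a-fA-F]{6}", css$)
for i = 0 to strings_count(colors#) - 1
  println i; ": "; strings_strings$(colors#, i)
next
' #FF5733 and #00ff00 match; #123 is too short for {6}
```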
ⓘ Note: regex_findall#() returns a TStringList pointer. Use strings_count() and strings_strings$() from StrListLib to access individual results (0-based indexing).
Replace Functions
| Function | Signature | Description |
|---|---|---|
| regex_replace$(pat$, text$, repl$) | regex_replace$@$$$ | Replace all matches |
| regex_replace$(pat$, text$, repl$, opts) | regex_replace$@$$$n | Replace all with options |
| regex_replacefirst$(pat$, text$, repl$) | regex_replacefirst$@$$$ | Replace first match only |
| regex_replacefirst$(pat$, text$, repl$, opts) | regex_replacefirst$@$$$n | Replace first with options |

    ' Replace all digits with X
    result$ = regex_replace$("\d", "abc123def", "X")
    println result$ ' abcXXXdef

    ' Swap first and last name using capture groups
    name$ = "John Smith"
    result$ = regex_replace$("(\w+) (\w+)", name$, "$2, $1")
    println result$ ' Smith, John

    ' Replace only the first match
    result$ = regex_replacefirst$("\d", "a1b2c3", "X")
    println result$ ' aXb2c3

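Backreferences may appear in any order in the replacement string, which makes reordering fields straightforward. The sketch below also shows the options overload; the sample strings and variable names are illustrative.

```basic
' Reorder a date from YYYY-MM-DD to DD/MM/YYYY
iso$ = "2025-01-03"
eu$ = regex_replace$("(\d{4})-(\d{2})-(\d{2})", iso$, "$3/$2/$1")
println eu$ ' 03/01/2025

' Replace every spelling of hello, ignoring case (option 1)
greet$ = regex_replace$("hello", "Hello HELLO hello", "hi", 1)
println greet$ ' hi hi hi
```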
ⓘ Note: Use $1, $2, etc. in the replacement string to reference capture groups from the pattern.
Split Functions
| Function | Signature | Description |
|---|---|---|
| regex_split#(pat$, text$) | regex_split#@$$ | Split string by pattern (returns TStringList, 0-based) |
| regex_split#(pat$, text$, opts) | regex_split#@$$n | Split with options |

    ' Split by multiple delimiters (comma, semicolon, or whitespace)
    parts# = regex_split#("[,;\s]+", "apple,banana;cherry orange")
    for i = 0 to strings_count(parts#) - 1
      println strings_strings$(parts#, i)
    next
    ' apple / banana / cherry / orange

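A common use of a pattern delimiter is trimming whitespace as part of the split. In the sketch below the delimiter pattern absorbs any spaces around each comma, so the pieces come back already trimmed. The input and the name fields# are illustrative.

```basic
' Split a comma-separated list, discarding whitespace around the commas
fields# = regex_split#("\s*,\s*", "red , green,blue ,  yellow")
for i = 0 to strings_count(fields#) - 1
  println "["; strings_strings$(fields#, i); "]"
next
' [red] / [green] / [blue] / [yellow]
```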
Group / Capture Functions
| Function | Signature | Description |
|---|---|---|
| regex_groups#(pat$, text$) | regex_groups#@$$ | All capture groups from first match (TStringList, 0=full match) |
| regex_groups#(pat$, text$, opts) | regex_groups#@$$n | Groups with options |
| regex_group$(pat$, text$, index) | regex_group$@$$n | Specific capture group (0=full match, 1+=groups) |
| regex_group$(pat$, text$, index, opts) | regex_group$@$$nn | Group with options |
| regex_groupcount(pat$, text$) | regex_groupcount@$$ | Number of groups including full match |

    ' Parse date: capture year, month, day
    groups# = regex_groups#("(\d{4})-(\d{2})-(\d{2})", "Date: 2025-01-03")
    if strings_count(groups#) > 0 then
      println "Full: "; strings_strings$(groups#, 0)  ' 2025-01-03
      println "Year: "; strings_strings$(groups#, 1)  ' 2025
      println "Month: "; strings_strings$(groups#, 2) ' 01
      println "Day: "; strings_strings$(groups#, 3)   ' 03
    endif

    ' Extract domain from email
    domain$ = regex_group$("@([\w.-]+)", "user@example.com", 1)
    println "Domain: "; domain$ ' example.com

    ' Count groups
    cnt = regex_groupcount("(\w+)-(\w+)-(\w+)", "abc-def-ghi")
    println "Groups: "; cnt ' 4 (full match + 3 captures)

ⓘ Note: Group index 0 is always the full match. Numbered capture groups start at index 1.
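When only one or two captures are needed, regex_group$() avoids handling a result list. The sketch below splits a setting of the form name=number into its parts, checking for a match first so that a missing match is not confused with an empty capture. The setting format and variable names are illustrative.

```basic
' Split a name=number setting into its two parts
setting$ = "width=640"
pattern$ = "(\w+)=(\d+)"
if regex_match(pattern$, setting$) = 1 then
  name$ = regex_group$(pattern$, setting$, 1)
  amount$ = regex_group$(pattern$, setting$, 2)
  println "Name: "; name$      ' width
  println "Amount: "; amount$  ' 640
else
  println "Not a name=number setting"
endif
```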
Utility Functions
| Function | Signature | Description |
|---|---|---|
| regex_escape$(text$) | regex_escape$@$ | Escape special regex characters for literal matching |

    ' Make user input safe for regex
    userInput$ = "What is (2+2)?"
    safePattern$ = regex_escape$(userInput$)
    ' safePattern$ = "What is \(2\+2\)\?"
    if regex_match(safePattern$, userInput$) = 1 then
      println "Found exact text"
    endif

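The difference escaping makes shows up as soon as the search text contains a metacharacter. In the sketch below the unescaped dot matches any character, so the count is too high; the escaped version counts only the literal text. The names needle$, haystack$, and safe$ are illustrative.

```basic
' Count occurrences of the literal text a.b
needle$ = "a.b"
haystack$ = "a.b axb a.b"
raw = regex_count(needle$, haystack$)
safe$ = regex_escape$(needle$)
literal = regex_count(safe$, haystack$)
println "Unescaped: "; raw    ' 3, because . also matches the x in axb
println "Escaped: "; literal  ' 2, only the literal a.b
```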
Complete Examples
Email Validator

    ' Email validation program
    function validateEmail$(email$)
      local pattern$, result$
      pattern$ = "^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$"
      if regex_matchfull(pattern$, email$) = 1 then
        result$ = "Valid"
      else
        result$ = "Invalid"
      endif
      return result$
    endfunction

    emails$ = "user@example.com,invalid@,test@site.org,bad email"
    parts# = regex_split#(",", emails$)
    for i = 0 to strings_count(parts#) - 1
      email$ = strings_strings$(parts#, i)
      println email$; " -> "; validateEmail$(email$)
    next

Log Parser

    ' Parse Apache-style log entries
    logLine$ = "192.168.1.1 - - [03/Jan/2025:10:15:30 +0000] \"GET /index.html HTTP/1.1\" 200 1234"
    pattern$ = "^(\d+\.\d+\.\d+\.\d+).*\[([^\]]+)\].*\"(\w+) ([^ ]+).*\" (\d+) (\d+)"
    groups# = regex_groups#(pattern$, logLine$)
    if strings_count(groups#) >= 7 then
      println "IP: "; strings_strings$(groups#, 1)
      println "Date: "; strings_strings$(groups#, 2)
      println "Method: "; strings_strings$(groups#, 3)
      println "Path: "; strings_strings$(groups#, 4)
      println "Status: "; strings_strings$(groups#, 5)
      println "Size: "; strings_strings$(groups#, 6)
    else
      println "Failed to parse"
    endif

Text Sanitizer

    ' Sanitize user input: strip HTML, normalize whitespace
    function sanitize$(text$)
      local result$
      ' Remove tags; text between tags (such as a script body) is kept
      result$ = regex_replace$("<[^>]+>", text$, "")
      ' Collapse each run of whitespace into a single space
      result$ = regex_replace$("\s+", result$, " ")
      ' Trim leading and trailing whitespace
      result$ = regex_replace$("^\s+|\s+$", result$, "")
      return result$
    endfunction

    dirty$ = " <b>Hello</b> <script>alert('xss')</script> World! "
    clean$ = sanitize$(dirty$)
    println "Original: ["; dirty$; "]"
    println "Cleaned: ["; clean$; "]"

Phone Number Formatter

    ' Format phone numbers consistently
    function formatPhone$(phone$)
      local digits$, area$, prefix$, line$, result$
      ' Keep digits only
      digits$ = regex_replace$("[^\d]", phone$, "")
      if len(digits$) = 10 then
        area$ = mid$(digits$, 1, 3)
        prefix$ = mid$(digits$, 4, 3)
        line$ = mid$(digits$, 7, 4)
        result$ = "(" + area$ + ") " + prefix$ + "-" + line$
      else
        if len(digits$) = 11 and left$(digits$, 1) = "1" then
          area$ = mid$(digits$, 2, 3)
          prefix$ = mid$(digits$, 5, 3)
          line$ = mid$(digits$, 8, 4)
          result$ = "+1 (" + area$ + ") " + prefix$ + "-" + line$
        else
          ' Unrecognized length: return the input unchanged
          result$ = phone$
        endif
      endif
      return result$
    endfunction

    println formatPhone$("5551234567")
    println formatPhone$("(555) 123-4567")
    println formatPhone$("555.123.4567")
    println formatPhone$("1-555-123-4567")

Common Regex Patterns
| Pattern | Description | Example Match |
|---|---|---|
| \d+ | One or more digits | 123 |
| \w+ | One or more word characters | hello_123 |
| [a-zA-Z]+ | One or more letters | Hello |
| ^\s*$ | Empty or whitespace-only line | (blank line) |
| \b\w+\b | Whole word | word |
| [^,]+ | Anything except comma | abc def |
| \S+@\S+\.\S+ | Simple email pattern | a@b.com |
| \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} | IPv4 address | 192.168.1.1 |
| #[0-9a-fA-F]{6} | Hex color code | #FF5733 |
| (\w+)\s(\w+) | Two words (capture each) | John Smith |
| ^https?:// | URL protocol prefix | https:// |
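The patterns in the table can be passed unchanged to any of the functions in this library. The sketch below tries two of them; the sample inputs are illustrative.

```basic
' Try two patterns from the table above
ip$ = regex_find$("\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "Server 192.168.1.1 responded")
println "IP: "; ip$ ' 192.168.1.1

if regex_match("^https?://", "https://example.com/docs") = 1 then
  println "Looks like a web URL"
endif
```

Note that the IPv4 pattern checks only the shape of the address: it also accepts out-of-range values such as 999.999.999.999, so validate the numeric ranges separately if that matters.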
Notes & Error Handling
| Topic | Details |
|---|---|
| 0-based results | TStringList results (regex_findall#, regex_split#, regex_groups#) use 0-based indexing |
| 1-based positions | regex_findpos() returns 1-based positions (like other Plan9Basic string functions) |
| Backreferences | Use $1, $2, etc. in replacement strings to reference capture groups |
| Group index 0 | Group 0 is always the full match; numbered captures start at 1 |
| Memory | All TStringList results are managed by the garbage collector automatically |
| Error handling | Invalid patterns return 0 or empty strings — no exceptions raised to BASIC |
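| Performance | Every function takes the pattern as a string, and no precompile function is listed in this library; for repeated matching, keep complex patterns in a variable and validate them once up front |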
| StrListLib | Results use StrListLib functions: strings_count(), strings_strings$() |
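One consequence of the error-handling rule above is that a result of 0 is ambiguous: it can mean either "no match" or "invalid pattern". The sketch below validates first to tell the two apart. The pattern is deliberately malformed (an unclosed parenthesis), and the variable names are illustrative.

```basic
' A result of 0 can mean no match or invalid pattern; validate to tell them apart
pat$ = "(\d+"
text$ = "order 42"
if regex_isvalid(pat$) = 0 then
  println "Invalid pattern: "; regex_error$(pat$)
else
  if regex_match(pat$, text$) = 1 then
    println "Match"
  else
    println "No match"
  endif
endif
```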
Quick Reference
All 28 Registered Signatures
| Function | Signature | Description |
|---|---|---|
| regex_isvalid(pat$) | regex_isvalid@$ | Pattern valid? (1/0) |
| regex_error$(pat$) | regex_error$@$ | Error message for invalid pattern |
| regex_match(pat$, text$) | regex_match@$$ | Match anywhere (1/0) |
| regex_match(pat$, text$, opts) | regex_match@$$n | Match with options |
| regex_matchfull(pat$, text$) | regex_matchfull@$$ | Match entire text (1/0) |
| regex_matchfull(pat$, text$, opts) | regex_matchfull@$$n | Full match with options |
| regex_find$(pat$, text$) | regex_find$@$$ | First match string |
| regex_find$(pat$, text$, opts) | regex_find$@$$n | First match with options |
| regex_findpos(pat$, text$) | regex_findpos@$$ | Position of first match (1-based) |
| regex_findpos(pat$, text$, opts) | regex_findpos@$$n | Find position with options |
| regex_findlen(pat$, text$) | regex_findlen@$$ | Length of first match |
| regex_findlen(pat$, text$, opts) | regex_findlen@$$n | Find length with options |
| regex_findall#(pat$, text$) | regex_findall#@$$ | All matches (TStringList) |
| regex_findall#(pat$, text$, opts) | regex_findall#@$$n | Find all with options |
| regex_count(pat$, text$) | regex_count@$$ | Number of matches |
| regex_count(pat$, text$, opts) | regex_count@$$n | Count with options |
| regex_replace$(pat$, text$, repl$) | regex_replace$@$$$ | Replace all matches |
| regex_replace$(pat$, text$, repl$, opts) | regex_replace$@$$$n | Replace all with options |
| regex_replacefirst$(pat$, text$, repl$) | regex_replacefirst$@$$$ | Replace first match |
| regex_replacefirst$(pat$, text$, repl$, opts) | regex_replacefirst$@$$$n | Replace first with options |
| regex_split#(pat$, text$) | regex_split#@$$ | Split by pattern (TStringList) |
| regex_split#(pat$, text$, opts) | regex_split#@$$n | Split with options |
| regex_groups#(pat$, text$) | regex_groups#@$$ | All capture groups (TStringList) |
| regex_groups#(pat$, text$, opts) | regex_groups#@$$n | Groups with options |
| regex_group$(pat$, text$, idx) | regex_group$@$$n | Specific capture group |
| regex_group$(pat$, text$, idx, opts) | regex_group$@$$nn | Group with options |
| regex_groupcount(pat$, text$) | regex_groupcount@$$ | Number of groups |
| regex_escape$(text$) | regex_escape$@$ | Escape special regex chars |
16 base functions — 28 registered signatures (with option overloads) across 8 categories.