I want to gather up many areas of near-future work that we've been clarifying through the proposal reviews.
Loose categorization:
Language and integration
- Ability to use a `String`-backed, `CaseIterable` enum as a regex component
- Define error types for compilation and type mismatches
- Callouts from literals
- A `Regex`-backed enum that will construct a `ChoiceOf` of all cases, in order
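
As a rough illustration of the enum idea, here is a minimal sketch that uses the existing `CustomConsumingRegexComponent` protocol. The `EnumChoice` type and the `Color` enum are hypothetical names invented for this example, not proposed API, and the sketch assumes Swift 5.7 or later.

```swift
import RegexBuilder

// Hypothetical component (not proposed API): matches any case of a
// String-backed, CaseIterable enum and produces the case as its output.
struct EnumChoice<E>: CustomConsumingRegexComponent
where E: CaseIterable & RawRepresentable, E.RawValue == String {
    typealias RegexOutput = E

    func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) throws -> (upperBound: String.Index, output: E)? {
        let rest = input[index..<bounds.upperBound]
        // Try the cases in declaration order; the first matching prefix wins.
        for candidate in E.allCases where rest.starts(with: candidate.rawValue) {
            let end = rest.index(rest.startIndex, offsetBy: candidate.rawValue.count)
            return (end, candidate)
        }
        return nil
    }
}

// Hypothetical enum used only for this example.
enum Color: String, CaseIterable {
    case red, green, blue
}

let colorSetting = Regex {
    "color: "
    Capture { EnumChoice<Color>() }
}

let colorMatch = "color: green".firstMatch(of: colorSetting)
```

Because cases are tried in declaration order, a case whose raw value is a prefix of a later one would shadow it, which is one reason the ordering is part of the design question.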
API
- Ability to `map` over a regex, perhaps per-capture, to supply post-processing transforms at regex declaration time
- A modifier on a regex to convert it to matches-anywhere semantics
  - E.g. `regex.matchingAnywhere => Regex { /.*?/ ; regex ; /.*/ }`.
  - But we'd preserve the matched range, i.e. reset start/end position
- Character alignment queries
  - API for whether start/end is `Character`-aligned for whole match and each capture
- API to query options (e.g. is this case insensitive?)
- API for `(?n)`, could be nice to strip out captures you don't care about, especially for type-erased regexes.
  - Compilation error if there are back-references or if it changes the semantics of the program
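
To make the matched-range concern concrete, here is a small sketch that writes the matches-anywhere expansion by hand for one specific regex. The names `word` and `anywhere` are made up for this example, and the extended `#/.../#` delimiters are used so that no compiler flag is needed.

```swift
import RegexBuilder

let word = #/[a-z]+/#

// Hand-written version of the expansion, with the inner regex captured
// so that its own range stays observable.
let anywhere = Regex {
    #/.*?/#
    Capture { word }
    #/.*/#
}

let sample = "123 abc 456"
let wholeResult = sample.wholeMatch(of: anywhere)!
let searchResult = sample.firstMatch(of: word)!

// The overall match of the expansion spans the whole input...
let expandedRange = wholeResult.range
// ...while a plain search reports only the part that `word` matched.
let searchedRange = searchResult.range
```

A real modifier would presumably report `searchedRange` rather than `expandedRange`, which is what "reset start/end position" refers to.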
Algorithms
- Add a `replace(_:withTemplate:)` method that recognizes `$1` or `\1` placeholders
- A separator-preserving split variant
- Suffix / from-the-end operations (trim etc)
- Customize search
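
The template-based replacement could be approximated today on top of the closure-based `replacing(_:with:)` and type-erased output. The sketch below is only an illustration: `expandTemplate` and `replacingMatches(of:withTemplate:)` are hypothetical names, and it handles single-digit placeholders only, with no escaping rules.

```swift
// Hypothetical helper: expands single-digit `$n` and `\n` placeholders
// using the captures of a type-erased match output.
func expandTemplate(_ template: String, with output: AnyRegexOutput) -> String {
    var result = ""
    var i = template.startIndex
    while i < template.endIndex {
        let ch = template[i]
        let next = template.index(after: i)
        if ch == "$" || ch == "\\",
           next < template.endIndex,
           let n = template[next].wholeNumberValue,
           n < output.count {
            // Captures that did not participate expand to the empty string.
            result.append(contentsOf: output[n].substring ?? "")
            i = template.index(after: next)
        } else {
            result.append(ch)
            i = next
        }
    }
    return result
}

extension String {
    // Hypothetical method name; not part of the standard library.
    func replacingMatches(
        of regex: Regex<AnyRegexOutput>,
        withTemplate template: String
    ) -> String {
        replacing(regex) { match in expandTemplate(template, with: match.output) }
    }
}

let address = try! Regex(#"(\w+)@(\w+)"#)
let swapped = "mail alice@example now"
    .replacingMatches(of: address, withTemplate: "$2 at \\1")
```

A real design would also need to decide how to treat multi-digit and named references and how to escape a literal `$`.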
String and Unicode
- Add unsupported Unicode properties to `Unicode.Properties` and support in regexes
- Add `Unicode.AllScalars` as a public type (semi-tangential)
- Add `var Substring.range: Range<String.Index>` to simplify getting the range of a capture group
- Inits for making an NFC string from UTF-8
- `String.lines()` and `String.words()`
- Add option for canonical equivalence in scalar-semantic mode
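
The `Substring.range` idea can be sketched as a local extension, since a substring already shares indices with its base string. The extension below is an illustration only, not existing standard library API.

```swift
extension Substring {
    // Sketch of the suggested property: the slice's bounds, expressed
    // in the index space of the base string.
    var range: Range<String.Index> { startIndex..<endIndex }
}

let record = "user id=42;"
let idMatch = record.firstMatch(of: #/id=(\d+)/#)!

// The capture is a Substring, so its range can be read off directly.
let idRange = idMatch.output.1.range
```

The resulting range can be used to subscript the original string, e.g. `record[idRange]`.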
Dynamic Regex API
- Add a capture-description API to all regexes
  - some RAC of capture, which has a type and optionality
- Missing match conversions
  - `Regex<T>.Match.init?(_:ARO)`
  - `Regex<T>.Match.init?(_:Regex<ARO>.Match)`
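
For comparison, part of this information is already reachable per match through type-erased output. The sketch below assumes a match is available and only shows names and participation; it is not the capture-description API itself, which would presumably work on the regex without matching.

```swift
let pair = try! Regex(#"(?<key>\w+)=(?<value>\d+)?"#)
let erased = "a=".firstMatch(of: pair)!.output

// One entry per capture (index 0 is the whole match): its name, if any,
// and whether it participated in this particular match.
let captureInfo = erased.map { (name: $0.name, matched: $0.substring != nil) }
```

Here the optional `value` group does not participate, so its entry reports `matched: false`; the static type of each capture is not available this way.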
Builders
- A high-level helper for separated/quoted repetitions, e.g. `Repeat(separator: \.whitespace) { ... }`
- A helper for repeated matching lookahead and negative lookahead, e.g.
  - `Repeat(while:)`
  - `Repeat(whileNot:)`
  - `Until(negLookaheadCondition) { ... }`
- A `func compile() throws` to explicitly trigger compilation and get errors, such as quantifying the unquantifiable
  - This is useful when composing regexes together to check the final result instead of trapping at run time.
- Default `Reference` capture type to `Substring.self`
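
For context, this is roughly what a separated repetition looks like when spelled out with the current builder types. It is a sketch of the pattern that a `Repeat(separator:)` helper would presumably replace, with `number` and `numberList` as example names.

```swift
import RegexBuilder

let number = OneOrMore(.digit)

// "element, then zero or more (separator, element)" written by hand.
let numberList = Regex {
    number
    ZeroOrMore {
        OneOrMore(.whitespace)
        number
    }
}

let listMatch = "x 1 22  333 y".firstMatch(of: numberList)!

// Recover the individual elements by searching inside the matched slice.
let values = listMatch.output.matches(of: number).map { String($0.output) }
```

The element has to be written twice and the individual values need a second pass, which is the boilerplate a dedicated helper could remove.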
Engine
- Engine limiters, low-level backtracking control and timeouts
- Provide a way to access all values of a repeated capture (e.g. subscribe)
- Conditionals `(?(x)...)` (requires updated parsing)
- Quoted string inside custom character classes (e.g. `[a-z\q{ch}]`)
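
The repeated-capture item is easiest to see with a small example. In the sketch below, a capture inside a quantified group keeps only the value from its final iteration, so collecting every value currently needs a second search; the names are chosen for this example.

```swift
let digitList = #/(?:(\d),?)+/#
let csv = "1,2,3"

// The capture reports only the last iteration of the repetition.
let lastDigit = csv.wholeMatch(of: digitList)!.output.1

// One workaround today: search for the element on its own.
let allDigits = csv.matches(of: #/\d/#).map { String($0.output) }
```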
Parser
- Support for duplicate group names through `(?J)` (requires figuring out typed captures)
- Support for branch reset alternations `(?|)` (parsing is implemented, but requires figuring out typed captures)
- Parsing of conditionals `(?(x)...)` in accordance with what is in the syntax proposal (we currently parse the condition differently)
  - Including interpolation conditions `(?(?{...}))`
  - Conditional conditions don't capture on their own, only for child nodes, e.g. `(?((x))x)`. .NET also forbids named capture conditions; we should ban that.
  - Stop parsing named reference conditions for `(?(x)...)`
  - Don't allow `(?(DEFINE))` to have a false branch
- Support for regex property values `\p{key=/regex/}`
- Support for transform matching, e.g. `\p{toNFKC_Casefold=@toNFKC@}`
- Support for alternative character property separators?
  - UTS#18 suggests `key≠value`, `key!=value`
  - Perl allows `key:value`
- Support `a**` syntax as explicitly eager quantification
  - I.e. it's not affected by API to change default quantification kind, (probably) not affected by `(?U)`
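
To illustrate what "affected by API to change default quantification kind" means, the sketch below assumes the API in question is the existing `repetitionBehavior(_:)` modifier. A plain `+` follows the default, so its behavior changes; an explicitly eager `a**` would presumably keep the eager result in both cases.

```swift
let tag = #/<.+>/#
let html = "<a><b>"

// With the default (eager) behavior, `.+` takes as much as it can.
let eagerMatch = html.firstMatch(of: tag)!.output

// Switching the default to reluctant changes what the same `+` matches.
let reluctantMatch = html.firstMatch(of: tag.repetitionBehavior(.reluctant))!.output
```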