Skip to content

RegExp memory efficiency: built-in negated character classes could use NegatedSet #5633

@dilijev

Description

@dilijev

We are currently manually negating the input set and encoding the full negated set, rather than checking that a set does NOT include a character (for which we already have logic, except for the built-in character sets).

This has a minor to moderate unnecessary memory cost. We should be using whichever set (negated or natural) which contains the fewest characters, which minimizes memory cost -- which also happens to include selecting an ASCII-only set, if possible, which saves lots of memory by avoiding the creation of the non-ASCII external data structure for a character set.

This means pushing the set negation check into the bytecode (for which we already have opcodes).

See #5592 (comment)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions