Skip to content

Conversation

@khwilliamson
Copy link
Contributor

People have avoided looking at this routine for a very long time. As a result it hasn't known until recently about lexical variables, for example. This p.r. rectifies that. The new enhancements committed today to scan_ident are now used by this function to examine potential identifiers better.

If 'strict vars' is in effect, and a variable is not found, it now assumes this is an error, or a character class was meant. This was done for the variable in effect at the time of the call to this function (external to it) recently via 3450d19. But is here extended to potential variables it encounters in its parse.

  • This set of changes does not require a perldelta entry.

This swaps the order of handling a conditional
The previous commit allows this one to outdent things.
Future commits will want to use this space.
Using \w here missed the possibility of lots of other syntaxes of
variables, like $::foo or ${foo}, that scan_ident looks for.  So call
scan_ident without first filtering what it looks at.

It is more convenient to then swap the code block that handles
punctuation variables with the code block that handles other length 1
identifiers
Outdent, as the previous commit removed a block; and one block we indent
for a future new block
I don't know why the code here uses a buffer 4 times bigger than what is
the maximum an identifier can be.  One might think this is because
Unicode characters can be 4 bytes long, but this number has been in
effect since before Perl knew about Unicode; or maybe it was to avoid
a potential buffer overflow.

But in any event, I'm pretty certain there is no need for that now.  The
buffer is now allocated to handle the maximum Unicode-needed size, and
buffer overflow is checked for and avoided.
It returns more than a boolean; save the actual return for use in future
commits
If scan_ident indicates that $ @ & are followed by nothing that looks
like an identifier, then this isn't an expression.  It has to be a
character class or an error.

Almost anything is an identifier when 'use utf8' isn't in effect;  when
it is, non ASCII has to be an Identifier Start character following these
This function was totally unaware of the possibility of these.
The code attempted to do this, but was written before lexical variables
existed, and had never been updated.
@khwilliamson khwilliamson added the Use merge commit Don't merge this p.r. from github It contains multiple related commits. Instructions in perlgit label Oct 28, 2025
case '\\':
if (s[1]) {
if (memCHRs("wds]", s[1])) {
if (s[1] == '\0') {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally: great change, maybe even add break into then branches and get rid of else if branches

Curious question: common pattern is condition if (s[1]), so why not use if (! s[1] as well

return FALSE;
}
return TRUE;
return ! regcurly(s, e, NULL);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change. Former pattern appears to be used in other places, marking such cleanup as my next TODO once my current PR is merged.

* next character after it is a closing bracket, it makes it
* quite likely to be that, and hence a subscript. If it is
* something else, more mildly a subscript */
if (/*{*/ memCHRs("])} =", tmpbuf[2]))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accidentally deleted previous comment :(

little bit confused: this happens in branch when strlen of tmpbuf is already checked to be 1, so tmpbuf[2] should \0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Use merge commit Don't merge this p.r. from github It contains multiple related commits. Instructions in perlgit

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants