- 
                Notifications
    You must be signed in to change notification settings 
- Fork 602
toke.c: S_intuit_more: More effectively use S_scan_ident #23880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: blead
Are you sure you want to change the base?
Conversation
This swaps the order of handling a conditional
The previous commit allows this one to outdent things.
Future commits will want to use this space.
Using \w here missed the possibility of lots of other syntaxes of
variables, like $::foo or ${foo}, that scan_ident looks for.  So call
scan_ident without first filtering what it looks at.
It is more convenient to then swap the code block that handles
punctuation variables with the code block that handles other length 1
identifiers
    Outdent, as the previous commit removed a block; and one block we indent for a future new block
I don't know why the code here uses a buffer 4 times bigger than what is the maximum an identifier can be. One might think this is because Unicode characters can be 4 bytes long, but this number has been in effect since before Perl knew about Unicode; or maybe it was to avoid a potential buffer overflow. But in any event, I'm pretty certain there is no need for that now. The buffer is now allocated to handle the maximum Unicode-needed size, and buffer overflow is checked for and avoided.
It returns more than a boolean; save the actual return for use in future commits
If scan_ident indicates that $ @ & are followed by nothing that looks like an identifier, then this isn't an expression. It has to be a character class or an error. Almost anything is an identifier when 'use utf8' isn't in effect; when it is, non ASCII has to be an Identifier Start character following these
This function was totally unaware of the possibility of these.
The code attempted to do this, but was written before lexical variables existed, and had never been updated.
| case '\\': | ||
| if (s[1]) { | ||
| if (memCHRs("wds]", s[1])) { | ||
| if (s[1] == '\0') { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally: great change, maybe even add break into then branches and get rid of else if branches
Curious question: common pattern is condition if (s[1]), so why not use if (! s[1] as well
| return FALSE; | ||
| } | ||
| return TRUE; | ||
| return ! regcurly(s, e, NULL); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great change. Former pattern appears to be used in other places, marking such cleanup as my next TODO once my current PR is merged.
| * next character after it is a closing bracket, it makes it | ||
| * quite likely to be that, and hence a subscript. If it is | ||
| * something else, more mildly a subscript */ | ||
| if (/*{*/ memCHRs("])} =", tmpbuf[2])) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accidentally deleted previous comment :(
little bit confused: this happens in branch when strlen of tmpbuf is already checked to be 1, so tmpbuf[2] should \0
People have avoided looking at this routine for a very long time. As a result it hasn't known until recently about lexical variables, for example. This p.r. rectifies that. The new enhancements committed today to
scan_identare now used by this function to examine potential identifiers better.If 'strict vars' is in effect, and a variable is not found, it now assumes this is an error, or a character class was meant. This was done for the variable in effect at the time of the call to this function (external to it) recently via 3450d19. But is here extended to potential variables it encounters in its parse.