@ndossche ndossche commented Nov 11, 2025

We would need to escape the attributes, but libxml2 has no built-in function we can call that escapes them consistently with both the attribute escaping rules and expat. The two escape functions it exposes, xmlEncodeEntitiesReentrant and xmlEncodeSpecialChars, both use the internal xmlEscapeText function, and neither that function nor the flag we would need is accessible from outside libxml2.

In fact, expat just repeats the input, whereas we reconstruct it. To fix the issue and restore consistency with expat, we now repeat the input as well: we seek to the start and end of the tag and pass that span to the default handler. This is fine for the parser because the parser used in ext/xml is always in non-progressive mode, so we have access to the entire input buffer. Since the grammar of XML does not allow '<' and '>' in start tags or inside self-closing tags, seeking works fine.

For a self-closing tag, the event ends at the solidus. Expat emits a single event for such a tag, a start tag default, while the compat layer emits two. We keep BC by continuing to emit two events and replacing the solidus with a '>'.
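To illustrate the seeking idea, here is a minimal standalone sketch in C. It is not the code from this PR: `raw_start_tag`, its parameters, and the output-buffer convention are hypothetical names chosen for the example. It assumes the whole document is in memory (as in non-progressive mode) and that the cursor points somewhere inside a start tag or self-closing tag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of ext/xml or libxml2).
 *
 * [base, end) is the complete input buffer and cur points inside a start
 * tag or self-closing tag. The raw tag text is copied to out verbatim, so
 * attribute values keep whatever escaping the input used. For a
 * self-closing tag the trailing solidus is dropped, so the copy ends in
 * '>' like an ordinary start tag.
 *
 * Returns the number of bytes written (excluding the NUL terminator),
 * or 0 if the tag bounds cannot be found or out is too small.
 */
static size_t raw_start_tag(const char *base, const char *end,
                            const char *cur, char *out, size_t cap)
{
    const char *s = cur;
    const char *e = cur;
    size_t len;

    assert(out != NULL);
    if (base == NULL || cur < base || cur >= end) {
        return 0;
    }

    /* Seek backwards to the '<' that opens the tag. */
    while (s > base && *s != '<') {
        s--;
    }
    if (*s != '<') {
        return 0;
    }

    /* Seek forwards to the '>' that closes the tag. */
    while (e < end && *e != '>') {
        e++;
    }
    if (e == end) {
        return 0;
    }

    len = (size_t)(e - s);          /* bytes before the closing '>' */
    if (len > 1 && e[-1] == '/') {
        len--;                      /* self-closing: drop the solidus */
    }
    if (len + 2 > cap) {            /* room for '>' and the NUL byte */
        return 0;
    }

    memcpy(out, s, len);
    out[len] = '>';
    out[len + 1] = '\0';
    return len + 1;
}
```

With the input `<r><a title="x &amp; y"/></r>` and the cursor on `title`, the sketch writes `<a title="x &amp; y">`: the entity stays exactly as it appeared in the input, and the solidus is replaced by `>`. The second event for the self-closing tag (the end tag default) is not modeled here.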

A nice side effect is that this PR reduces the amount of code in the compatibility layer.

…pecial characters in attributes when passing data to callback

We would need to escape the attributes, but there's no builtin method
that we can call in libxml2 to do so in a way consistent with the
attribute escape rules and expat.
In fact, expat just repeats the input, while we reconstruct it.
To fix the issue, and fix consistency with expat, we repeat the input as
well. This works by seeking to the start and end of the tag and passing
it to the default handler. This is fine for the parser because the
parser used in ext/xml is always in non-progressive mode, so we have
access to the entire input buffer.
@ndossche ndossche merged commit 3cc36b0 into php:PHP-8.3 Nov 11, 2025
1 check passed
ndossche added a commit that referenced this pull request Nov 11, 2025
* PHP-8.3:
  Fix GH-20439: xml_set_default_handler() does not properly handle special characters in attributes when passing data to callback (#20453)
ndossche added a commit that referenced this pull request Nov 11, 2025
* PHP-8.4:
  Fix GH-20439: xml_set_default_handler() does not properly handle special characters in attributes when passing data to callback (#20453)
ndossche added a commit that referenced this pull request Nov 11, 2025
* PHP-8.5:
  Fix GH-20439: xml_set_default_handler() does not properly handle special characters in attributes when passing data to callback (#20453)
Development

Successfully merging this pull request may close these issues.

xml_set_default_handler() does not properly handle special characters in attributes when passing data to callback