ENH: Enhance XMP metadata handling with creation and setter methods #3410

Arya-A-Nair · 2025-07-30T10:25:41Z

- Fix missing Dict and List imports in typing
- All XMP setter methods are now working correctly
- All tests pass except one unrelated remote file test
- XmpInformation.create() and all setter methods fully functional
- Code follows existing codebase patterns and style

codecov · 2025-07-31T11:32:58Z

Codecov Report

❌ Patch coverage is 99.42363% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.04%. Comparing base (bc318d7) to head (8cabff0).

Files with missing lines	Patch %	Lines
pypdf/xmp.py	99.42%	0 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3410      +/-   ##
==========================================
+ Coverage   96.97%   97.04%   +0.06%     
==========================================
  Files          54       54              
  Lines        9337     9567     +230     
  Branches     1711     1739      +28     
==========================================
+ Hits         9055     9284     +229     
  Misses        168      168              
- Partials      114      115       +1

- Add tests for ownerDocument None error conditions (lines 459-465, 484, 506, 535, 564, 597)
- Add tests for description element creation paths
- Add tests for attribute handling and edge cases
- Improve test coverage from 95% to 97%
- All XMP functionality thoroughly tested with error conditions and edge cases

stefan6419846 · 2025-07-31T11:56:44Z

Thanks for the PR. Besides the coverage: Could we please have proper properties instead of a read-only property and a setter? This is not really pythonic.
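For readers following along, the request above can be illustrated with a minimal sketch of a read/write property. The class `MetadataSketch` and its `title` field are hypothetical names chosen for illustration; this is not pypdf's `XmpInformation` and not the code from this PR.

```python
from typing import Optional


class MetadataSketch:
    """Hypothetical stand-in for a metadata container (not pypdf's XmpInformation)."""

    def __init__(self) -> None:
        self._title: Optional[str] = None

    @property
    def title(self) -> Optional[str]:
        # Reading goes through the same name that is used for writing.
        return self._title

    @title.setter
    def title(self, value: Optional[str]) -> None:
        # Assignment replaces a separate set_title() style method.
        self._title = value


meta = MetadataSketch()
meta.title = "Annual report"  # plain attribute assignment
print(meta.title)
```

The point of the pattern is that callers use one attribute name for both reading and writing, instead of pairing a read-only property with a separate setter method.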

Arya-A-Nair · 2025-07-31T12:03:02Z

@stefan6419846 Sure, I understand my mistake. I'll refactor it and ask for a review once it's done.

Arya-A-Nair · 2025-07-31T12:56:23Z

@stefan6419846 you can take a look at it now

stefan6419846 · 2025-08-06T19:45:32Z

Sorry for the delays - I have not yet had the time to properly review these larger changes again. I hope that I manage to look into this in the next few days.

Arya-A-Nair · 2025-08-10T07:25:59Z

Hey @stefan6419846, can I contribute to another issue in the meantime?

stefan6419846 · 2025-08-11T06:18:32Z

You can of course look into another issue if you start the corresponding changes from the main code again.

stefan6419846 · 2025-08-13T12:06:37Z

pypdf/xmp.py

+            del self.cache[namespace][name]
+        desc = self._get_or_create_description()
+
+        existing_elements = list(desc.getElementsByTagNameNS(namespace, name))


Why do we need the list conversion and a separate variable?

I didn't quite understand this. Can you clarify?

Will there be any issues if we use the following direct code?

for elem in desc.getElementsByTagNameNS(namespace, name): desc.removeChild(elem)

Yes, there would be issues with the direct approach. getElementsByTagNameNS() returns a live NodeList that automatically updates when the DOM changes. If we iterate directly over it while calling removeChild(), the NodeList shrinks and re-indexes during iteration, causing elements to be skipped. The list() conversion creates a static snapshot of the elements to remove, avoiding this iteration-while-modifying problem.

If you think we are fine with this, then I will go ahead and change it.
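The snapshot-then-remove pattern discussed in this thread can be sketched as follows. This is a standalone illustration, not the code from `pypdf/xmp.py`: the helper name `remove_elements` and the sample XML are assumptions made for the example. Whether the returned NodeList is live depends on the DOM implementation; as far as I can tell, CPython's `xml.dom.minidom` builds a plain `list` subclass at call time, so there the `list()` copy is a defensive measure rather than a strict requirement. The sketch also checks `parentNode`, because `getElementsByTagNameNS()` searches all descendants while `removeChild()` only accepts direct children.

```python
from xml.dom.minidom import parseString

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Sample data invented for this sketch.
SAMPLE = (
    f'<rdf:Description xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">'
    "<dc:title>first</dc:title>"
    "<dc:title>second</dc:title>"
    "<dc:creator>someone</dc:creator>"
    "</rdf:Description>"
)


def remove_elements(desc, namespace, name):
    """Remove every matching direct child of desc; return how many were removed."""
    # Copy first so the loop never iterates a collection that may change underneath it.
    snapshot = list(desc.getElementsByTagNameNS(namespace, name))
    removed = 0
    for elem in snapshot:
        # The search is recursive, but removeChild() only works on direct children.
        if elem.parentNode is desc:
            desc.removeChild(elem)
            removed += 1
    return removed


desc = parseString(SAMPLE).documentElement
removed_count = remove_elements(desc, DC_NS, "title")
remaining_titles = len(desc.getElementsByTagNameNS(DC_NS, "title"))
remaining_creators = len(desc.getElementsByTagNameNS(DC_NS, "creator"))
print(removed_count, remaining_titles, remaining_creators)
```

Keeping the copy costs one small list allocation and makes the loop safe regardless of how a given DOM implementation treats the NodeList.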

…adata fields

…edicated methods for improved clarity and maintainability

…ocessing

…pInformation class

… improved maintainability

…te for consistency and clarity; update assertions in the test for improved accuracy

…treamline cache management for metadata values

…ith PDF 2.0 XMP recommendations and improve XML serialization

…MP values while preserving existing data

…erify incremental updates of XMP values without overwriting

… NAMESPACE_PREFIX_MAP to _NAMESPACE_PREFIX_MAP and MINIMAL_XMP to _MINIMAL_XMP

…ate documentation to reflect new approach for incrementally updating XMP metadata fields directly using standard data structures

…pdf into feature/xmp-create-method

Arya-A-Nair · 2025-08-20T18:13:23Z

Hey @stefan6419846 can you take a look at this PR?

stefan6419846 · 2025-08-20T20:53:09Z

I am currently on tour and will not be able to look into larger PRs until September.

ENH: Enhance XMP metadata handling with creation and setter methods

7129e69

Arya-A-Nair force-pushed the feature/xmp-create-method branch from 9183c12 to 7129e69 July 30, 2025 10:26

Arya-A-Nair added 2 commits July 31, 2025 16:26

Merge branch 'main' into feature/xmp-create-method

cfef29e

Arya-A-Nair marked this pull request as draft July 31, 2025 12:20

Add XmpInformation.create() method with Pythonic properties

1abdf04

Arya-A-Nair force-pushed the feature/xmp-create-method branch from d66f82b to 1abdf04 July 31, 2025 12:32

Add tests for XMP error handling and attribute removal

fc0386d

Arya-A-Nair marked this pull request as ready for review July 31, 2025 12:55

Arya-A-Nair added 2 commits August 1, 2025 09:31

Merge branch 'main' into feature/xmp-create-method

ec70685

Merge branch 'main' into feature/xmp-create-method

574b45b

Merge branch 'main' into feature/xmp-create-method

2247b45