
khink commented Aug 22, 2025

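There are cases where the IndirectObject `d` is not None, but `d[0]` still fails with `TypeError: 'NoneType' object is not subscriptable` in generic/_base.py, at `return self._get_object_with_check()[key]`. The reference itself exists, but it resolves to None, so subscripting the resolved object fails.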
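The mechanism can be sketched with two minimal stand-in classes. `FakeReader` and `FakeIndirectObject` are hypothetical, not pypdf's reader or `IndirectObject`. The point is that the reference object is an ordinary non-None Python object, and only resolving it yields None. So an `is not None` check passes while subscripting fails. The stand-in `__getitem__` has the same shape as the failing line: resolve, then index.

```python
from typing import Any, Dict, Tuple


class FakeReader:
    """Hypothetical stand-in for a PDF reader's object table."""

    def __init__(self, objects: Dict[Tuple[int, int], Any]) -> None:
        self.objects = objects  # (idnum, generation) -> resolved object

    def get_object(self, idnum: int, generation: int) -> Any:
        # A dangling reference resolves to None instead of raising.
        return self.objects.get((idnum, generation))


class FakeIndirectObject:
    """Hypothetical stand-in for an indirect reference such as 6 0 R."""

    def __init__(self, idnum: int, generation: int, reader: FakeReader) -> None:
        self.idnum = idnum
        self.generation = generation
        self.reader = reader

    def get_object(self) -> Any:
        return self.reader.get_object(self.idnum, self.generation)

    def __getitem__(self, key: Any) -> Any:
        # Same shape as the failing line: resolve, then subscript the result.
        return self.get_object()[key]


d = FakeIndirectObject(6, 0, FakeReader({}))
print(d is not None)  # True: the reference object itself exists
try:
    d[0]
except TypeError as exc:
    print(exc)  # 'NoneType' object is not subscriptable
```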

I previously reported this issue in #3211, but the fix for that issue didn't fix this one.

Unfortunately I still can't attach the PDF file that triggered this issue, because it contains personal information.
If someone has an idea how to create a PDF to test this with, I'd be happy to try.
I fully understand that a maintainer would not be keen on including fixes for rare bugs without a test.
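Here is one possible way to build such a test file. It rests on an unverified assumption: that the trigger is a /Link annotation whose /Dest is an indirect reference to an object the file never defines. The sketch uses 6 0 R to match the `Object 6 0 not defined.` message in the log. The function name `build_pdf_with_dangling_dest` is hypothetical. The sketch only writes raw PDF syntax with the standard library, and it has not been checked against pypdf.

```python
def build_pdf_with_dangling_dest(missing_obj: int = 6) -> bytes:
    """Hypothetical generator: one page with a /Link whose /Dest points nowhere.

    Assumption (unverified): a /Dest that is an indirect reference to an
    undefined object reproduces the reported TypeError.
    """
    # Objects 1-4 form a minimal one-page document; `missing_obj` is never written.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] /Annots [4 0 R] >>",
        b"<< /Type /Annot /Subtype /Link /Rect [10 10 50 50] /Dest %d 0 R >>" % missing_obj,
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed by the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # head of the free list, always object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

For example, writing the result with `open("dangling_dest.pdf", "wb").write(build_pdf_with_dangling_dest())` and running the merge script in that directory would show whether it hits the same `TypeError` or a different code path.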

More details on the error

The script I used to reproduce the error:

import glob

from pypdf import PdfWriter


def merge_pdfs():
    with PdfWriter() as merger:
        for pdf in glob.glob("*.pdf"):
            merger.append(pdf)
        merger.write("merged.pdf")

if __name__ == '__main__':

    merge_pdfs()

In the script directory I have a broken.pdf (the offending file I can't share here) and an empty.pdf.
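To narrow down which input triggers the error, each file can be tried on its own. This is a minimal sketch that assumes the failure does not depend on the order or combination of inputs. `find_failing_inputs` is a hypothetical helper. It takes the append step as a callable, so the helper itself runs without pypdf. The paths are sorted because `glob.glob` does not guarantee any particular order.

```python
from typing import Callable, Dict, Iterable


def find_failing_inputs(paths: Iterable[str], append: Callable[[str], None]) -> Dict[str, str]:
    """Hypothetical helper: try each input separately and record the ones that raise."""
    failures: Dict[str, str] = {}
    for path in sorted(paths):  # glob order depends on the file system; sort for repeatability
        try:
            append(path)
        except Exception as exc:  # deliberately broad: this is only for reporting
            failures[path] = f"{type(exc).__name__}: {exc}"
    return failures
```

With pypdf installed, something like `find_failing_inputs(glob.glob("*.pdf"), lambda p: PdfWriter().append(p))` would append each file to a fresh writer and report only the inputs that fail.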

This fails as below:

Object 6 0 not defined.
Object 6 0 not defined.
Overwriting cache for 0 6
Traceback (most recent call last):
  File "/home/kees/Projects/pypdf/check_merge.py", line 21, in <module>
    merge_pdfs()
    ~~~~~~~~~~^^
  File "/home/kees/Projects/pypdf/check_merge.py", line 16, in merge_pdfs
    merger.append(pdf)
    ~~~~~~~~~~~~~^^^^^
  File "/home/kees/Projects/pypdf/pypdf/_writer.py", line 2693, in append
    self.merge(
    ~~~~~~~~~~^
        None,
        ^^^^^
    ...<4 lines>...
        excluded_fields,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/kees/Projects/pypdf/pypdf/_writer.py", line 2851, in merge
    lst = self._insert_filtered_annotations(
        pag.original_page.get("/Annots", []), pag, srcpages, reader
    )
  File "/home/kees/Projects/pypdf/pypdf/_writer.py", line 3055, in _insert_filtered_annotations
    p = self._get_cloned_page(d[0], pages, reader)
                              ~^^^
  File "/home/kees/Projects/pypdf/pypdf/generic/_base.py", line 402, in __getitem__
    return self._get_object_with_check()[key]  # type: ignore
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable

Setting a breakpoint before the error occurs gives:

(Pdb) d
IndirectObject(6, 0, 127177489623296)
(Pdb) d[0]
Object 6 0 not defined.
Overwriting cache for 0 6
*** TypeError: 'NoneType' object is not subscriptable

(To be fair, I cheated a bit in this last example: the variable name d clashes with a pdb command, so I renamed it.)

…ltered_annotations

There are cases where the IndirectObject is not None, but d[0] will fail with
"TypeError: 'NoneType' object is not subscriptable" in generic/_base.py,
in __getitem__ at `return self._get_object_with_check()[key]`

codecov bot commented Aug 22, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.95%. Comparing base (bc318d7) to head (9a27b3f).

Files with missing lines   Patch %   Lines
pypdf/_writer.py           50.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3444      +/-   ##
==========================================
- Coverage   96.97%   96.95%   -0.03%     
==========================================
  Files          54       54              
  Lines        9337     9340       +3     
  Branches     1711     1711              
==========================================
+ Hits         9055     9056       +1     
- Misses        168      170       +2     
  Partials      114      114              


try:
    p = self._get_cloned_page(d[0], pages, reader)
except TypeError:
    # There are cases where the IndirectObject is not None, but d[0] will fail:

stefan6419846 (Collaborator) commented Aug 22, 2025


You basically want to check for not is_null_or_none(d[0].get_object())? Having this explicitly is preferred over more costly exception handling.
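The two approaches can be contrasted with simplified stand-ins. `Ref`, `NullObject` and this `is_null_or_none` are hypothetical minimal versions written for illustration, not pypdf's classes. pypdf's own helper is the `is_null_or_none` named in the review. Returning None on failure is only a placeholder, since the excerpt does not show the body of the except branch. The traceback shows the `TypeError` being raised by evaluating `d[0]` itself. For that reason, the checked variant resolves `d` and tests that result before indexing.

```python
from typing import Any, Optional


class NullObject:
    """Hypothetical stand-in for a PDF null object."""


def is_null_or_none(obj: Any) -> bool:
    # Simplified stand-in: treats Python None and a PDF null alike.
    return obj is None or isinstance(obj, NullObject)


class Ref:
    """Hypothetical stand-in for an indirect reference that resolves lazily."""

    def __init__(self, target: Any) -> None:
        self.target = target

    def get_object(self) -> Any:
        return self.target

    def __getitem__(self, key: Any) -> Any:
        return self.get_object()[key]  # raises TypeError if target is None


def first_dest_entry_checked(d: Ref) -> Optional[Any]:
    # Explicit check: resolve once and skip dangling or null destinations.
    resolved = d.get_object()
    if is_null_or_none(resolved):
        return None
    return resolved[0]


def first_dest_entry_try(d: Ref) -> Optional[Any]:
    # Exception-based variant, as in the diff excerpt (placeholder except body).
    try:
        return d[0]
    except TypeError:
        return None
```

Besides avoiding exception handling, the explicit check also has a narrower scope. In the excerpt, the `try` wraps the whole `_get_cloned_page(...)` call, so an unrelated `TypeError` raised inside that call would be silently swallowed as well.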

stefan6419846 (Collaborator) commented:

Thanks for the PR. Please have a look at the failing checks.

stefan6419846 added the needs-test label ("A test should be added before this PR is merged.") on Aug 22, 2025