Two Catalog objects after cloning #1923

@KanorUbu

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.19.0-45-generic-x86_64-with-glibc2.35

$ python -c "import pypdf;print(pypdf.__version__)"
3.11.1

Code + PDF

Create a PdfReader and PdfWriter

from pypdf import PdfReader, PdfWriter

reader = PdfReader("ial_inscriptible.pdf")
writer = PdfWriter()
print(writer._objects)

A freshly created, empty writer already contains three objects:

[{'/Type': '/Pages', '/Count': 0, '/Kids': []}, {'/Producer': 'pypdf'}, {'/Type': '/Catalog', '/Pages': IndirectObject(1, 0, 139758986133584)}]    
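To make the starting state easier to read, here is a small sketch that counts objects by their `/Type` entry. The helper name `count_types` is hypothetical, and the list below is a plain-dict stand-in that mirrors the printed output; with pypdf installed, passing `writer._objects` should work the same way, assuming its dictionary objects behave like `dict`.

```python
from collections import Counter


def count_types(objects):
    """Count objects by /Type; non-dicts and dicts without /Type count as None."""
    return Counter(
        obj.get("/Type") if isinstance(obj, dict) else None
        for obj in objects
    )


# Stand-in data mirroring the output above (plain dicts, not real pypdf objects).
empty_writer_objects = [
    {"/Type": "/Pages", "/Count": 0, "/Kids": []},
    {"/Producer": "pypdf"},
    {"/Type": "/Catalog", "/Pages": "IndirectObject(1, 0)"},
]

type_counts = count_types(empty_writer_objects)
print(type_counts)
```

So even before cloning, the writer already holds one `/Catalog`.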

Clone the PDF

import pprint

writer.clone_document_from_reader(reader)
pprint.pprint([obj for obj in writer._objects if isinstance(obj, dict) and obj.get('/Type') == '/Catalog'])

After cloning, the writer holds two objects of type Catalog: the one it created by default and the one copied from the source PDF.

[{'/Pages': IndirectObject(1, 0, 139758986133584), '/Type': '/Catalog'},
 {'/AcroForm': IndirectObject(5, 0, 139758986133584),
  '/Lang': 'fr-FR',
  '/MarkInfo': {'/Marked': True},
  '/Metadata': IndirectObject(817, 0, 139758986133584),
  '/PageLabels': IndirectObject(818, 0, 139758986133584),
  '/Pages': IndirectObject(194, 0, 139758986133584),
  '/StructTreeRoot': IndirectObject(819, 0, 139758986133584),
  '/Type': '/Catalog',
  '/ViewerPreferences': IndirectObject(1484, 0, 139758986133584)}]
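To pinpoint the duplicates, a sketch like the following lists every catalog together with its object number. The helper `find_catalogs` is hypothetical, and it assumes object numbers equal the list index plus one, which matches the `IndirectObject(1, 0, ...)` reference to the first object in the empty writer. The sample data is illustrative only and does not reproduce the real object numbers of the cloned document.

```python
def find_catalogs(objects):
    """Return (object_number, obj) pairs for every dict whose /Type is /Catalog.

    Assumption: object numbers are the list index + 1.
    """
    return [
        (number, obj)
        for number, obj in enumerate(objects, start=1)
        if isinstance(obj, dict) and obj.get("/Type") == "/Catalog"
    ]


# Illustrative stand-in for writer._objects after cloning (plain dicts).
sample_objects = [
    {"/Type": "/Pages", "/Count": 0, "/Kids": []},
    {"/Producer": "pypdf"},
    {"/Type": "/Catalog", "/Pages": "reference to the empty page tree"},
    {"/Type": "/Catalog", "/Lang": "fr-FR", "/Pages": "reference to the cloned page tree"},
]

catalogs = find_catalogs(sample_objects)
catalog_numbers = [number for number, _ in catalogs]
print(catalog_numbers)
```

With pypdf installed, calling `find_catalogs(writer._objects)` after cloning should report more than one entry if the problem is present.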

PDF used: https://www.georisques.gouv.fr/sites/default/files/ial/ial_inscriptible.pdf

Quick workaround

from pypdf import PdfReader, PdfWriter

reader = PdfReader("ial_inscriptible.pdf")
writer = PdfWriter()
# Workaround: discard the default objects (private attribute) before cloning
writer._objects = []
print(writer._objects)
writer.clone_document_from_reader(reader)
