-
What you've seen is the text layer above the PDF document. The document itself is rendered as a canvas per page below the text layer. The text layer, in turn, is invisible. It's only there to allow users to select text and to highlight the find results. When you make it visible - as this demo does - you'll see that the text layer is slightly off. It's not an exact representation of the PDF file. Instead, it's a best-effort approximation based on a lot of heuristics. In earlier versions, almost every word was an individual
-
When we refactored the text handling, our main goal was to fix almost all the bugs around copy/paste and search without degrading text selection itself.
We do rely on some heuristics to guess when a space is really a space. They're imperfect, of course, but they helped us fix a ton of bugs, and the price to pay is that we sometimes end up with too many spans. As far as I can tell, TeX/LaTeX tends to produce a few more spans than a basic PDF producer, mainly because of (or thanks to) Knuth's line-breaking algorithm and the kerning adjustments made to get nice line breaks. If you can share the PDFs with a lot of spans, you can file a bug in the pdf.js repo and I'll have a look at them and see (when I have some time) whether there is something we can do to improve the situation.
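
To make the space-versus-kerning trade-off concrete, here is a toy model of a gap heuristic. It is only a sketch and not the actual pdf.js code: the `GlyphRun` shape, the two thresholds, and `groupIntoSpans` are all assumptions made up for illustration. The idea is that a run that follows the previous one exactly joins the same span, a wide gap is read as a real space, and a small offset that is neither (such as a kerning adjustment) forces a new span so that its position stays accurate.

```typescript
// Hypothetical shape for this sketch: a positioned run of glyphs on one baseline.
interface GlyphRun {
  text: string;
  x: number; // left edge, in page units
  width: number; // advance width, in page units
}

// Illustrative thresholds, as fractions of the font size.
// These are assumptions for the example, not values taken from pdf.js.
const KERNING_TOLERANCE = 0.02;
const SPACE_THRESHOLD = 0.25;

function groupIntoSpans(runs: GlyphRun[], fontSize: number): string[] {
  const spans: string[] = [];
  let current = "";
  let previousEnd: number | null = null;

  for (const run of runs) {
    if (previousEnd === null) {
      current = run.text; // first run on the line
    } else {
      const gap = (run.x - previousEnd) / fontSize;
      if (Math.abs(gap) <= KERNING_TOLERANCE) {
        current += run.text; // contiguous: stays in the same span
      } else if (gap >= SPACE_THRESHOLD) {
        current += " " + run.text; // wide gap: guessed to be a real space
      } else {
        spans.push(current); // small offset (e.g. kerning): start a new span
        current = run.text;
      }
    }
    previousEnd = run.x + run.width;
  }
  if (previousEnd !== null) {
    spans.push(current);
  }
  return spans;
}
```

With a model like this, a line whose glyphs are all placed back to back collapses into a single span, while the same line with one kerned pair splits in two. A producer that adjusts positions often would therefore yield many short spans.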
-
I resolved my issue by extracting the page content, searching for occurrences of the words I need without the built-in search function, and then highlighting them myself.
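
For anyone taking the same route, a minimal sketch of that kind of manual search could look like the following. It assumes the page text is available as a list of items with a `str` field, which is loosely the shape pdf.js returns from `page.getTextContent()`. `TextItem`, `Match`, and `findOccurrences` are names invented for this example, the match is exact and case-sensitive, and the highlighting step itself is left out.

```typescript
// Loosely modeled on pdf.js text content items; only `str` is used here.
interface TextItem {
  str: string;
}

interface Match {
  start: number; // offset into the joined page text
  end: number; // exclusive end offset
  itemIndexes: number[]; // indexes of the items (spans) the match overlaps
}

function findOccurrences(items: TextItem[], query: string): Match[] {
  const matches: Match[] = [];
  if (query.length === 0) {
    return matches;
  }

  // Join all item strings and remember the range each item covers,
  // so a word split across several spans can still be found.
  const ranges: { start: number; end: number }[] = [];
  let text = "";
  for (const item of items) {
    ranges.push({ start: text.length, end: text.length + item.str.length });
    text += item.str;
  }

  let from = text.indexOf(query);
  while (from !== -1) {
    const start = from;
    const end = from + query.length;
    const itemIndexes: number[] = [];
    ranges.forEach((range, index) => {
      const nonEmpty = range.end > range.start;
      if (nonEmpty && range.start < end && range.end > start) {
        itemIndexes.push(index);
      }
    });
    matches.push({ start, end, itemIndexes });
    from = text.indexOf(query, from + 1);
  }
  return matches;
}
```

Because the search runs over the joined text, it does not matter whether the document was split into one span per line or one span per letter. The `itemIndexes` of each match tell you which spans to highlight.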
-
I've been getting some weird results.
Some documents are rendered as a DOM tree of spans where each span is a whole line of the PDF,
but other documents are rendered with a span for every letter or two.
Why is this happening? And is there a way to tell when it will happen?