@datalab-to

Datalab

Developing state of the art document intelligence models.

Pinned

  1. marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python · 28.3k stars · 1.8k forks

  2. surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python · 18.4k stars · 1.2k forks

  3. pdftext Public

    Extract structured text from pdfs quickly

    Python · 586 stars · 53 forks

Repositories

Showing 7 of 7 repositories
  • marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python · 28,265 stars · 1,840 forks · 270 issues · 38 pull requests · Updated Aug 29, 2025
  • datalab-on-prem Public

    Scripts to run Datalab's self-service on-prem container

    Shell · 0 stars · 0 forks · 0 issues · 0 pull requests · Updated Aug 29, 2025
  • surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python · 18,406 stars · 1,246 forks · 124 issues · 10 pull requests · Updated Aug 28, 2025
  • sdk Public
    HTML · 3 stars · MIT license · 2 forks · 2 issues · 1 pull request · Updated Aug 21, 2025
  • inference-mirror
    Python · 2 stars · 1 fork · 0 issues · 1 pull request · Updated Aug 13, 2025
  • docext Public

    An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

    Python · 4 stars · Apache-2.0 license · 1 fork · 0 issues · 0 pull requests · Updated Jun 18, 2025
  • pdftext Public

    Extract structured text from pdfs quickly

    Python · 586 stars · Apache-2.0 license · 53 forks · 8 issues · 3 pull requests · Updated Jun 11, 2025
