feat: upgrade PaddlePaddle and PaddleOCR to 3.x

- Update paddlepaddle from >=2.5.0 to >=3.0.0,<3.3.0
- Update paddleocr from >=2.7.0 to >=3.0.0
- Update paddlepaddle-gpu from >=2.5.0 to >=3.0.0,<3.3.0

Note: PaddlePaddle 3.3.0 has a oneDNN bug that breaks CPU inference
(ConvertPirAttribute2RuntimeAttribute not implemented). The upper bound
<3.3.0 stays in place until the bug is fixed upstream.
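The pin only takes effect when dependencies are (re)installed, so a long-lived environment or a manually installed wheel could still carry 3.3.0. A minimal sketch of a startup guard, assuming plain numeric version strings such as "3.2.1". `parse_release`, `in_pinned_range` and `check_installed_paddle` are hypothetical helpers, not part of this commit; for full PEP 440 semantics (pre-releases, post-releases, local tags) `packaging.specifiers.SpecifierSet` is the usual tool.

```python
from __future__ import annotations

from importlib import metadata

# Bounds copied from the pin ">=3.0.0,<3.3.0" in this commit.
LOWER = (3, 0, 0)  # inclusive
UPPER = (3, 3, 0)  # exclusive: 3.3.0 breaks CPU inference (oneDNN)


def parse_release(version: str) -> tuple[int, int, int]:
    """Turn "3.2" or "3.2.1" into (major, minor, patch), padding with zeros.

    Assumption: only plain numeric releases; tags such as "3.0.0rc1" or
    "3.2.1.post1" raise ValueError instead of being interpreted.
    """
    parts = [int(p) for p in version.split(".")]
    major, minor, patch = (parts + [0, 0, 0])[:3]
    return (major, minor, patch)


def in_pinned_range(version: str) -> bool:
    """Return True if LOWER <= version < UPPER."""
    return LOWER <= parse_release(version) < UPPER


def check_installed_paddle(dist_name: str = "paddlepaddle") -> str | None:
    """Hypothetical guard: raise if the installed distribution is out of range.

    Returns the installed version string, or None if dist_name is not installed.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    if not in_pinned_range(installed):
        raise RuntimeError(
            f"{dist_name} {installed} is outside >=3.0.0,<3.3.0; "
            "3.3.0 breaks CPU inference (oneDNN)"
        )
    return installed
```

For installs made through the `gpu` extra, the distribution to check would be `paddlepaddle-gpu`.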

This upgrade enables PP-StructureV3 for table extraction and uses
PP-OCRv5 for improved text recognition accuracy. The existing codebase
is already compatible with the 3.x API (predict() method and new
response format).
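For reference, a minimal sketch of consuming the 3.x `predict()` output. It assumes each result behaves like a mapping with parallel `rec_texts` and `rec_scores` lists, as in the PaddleOCR 3.x OCR pipeline output; the key names should be checked against the installed version. `extract_lines` is a hypothetical helper, not code from this repository.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def extract_lines(
    results: Iterable[Mapping], min_score: float = 0.0
) -> list[tuple[str, float]]:
    """Collect (text, score) pairs whose confidence is at least min_score.

    Assumption: each result exposes "rec_texts" and "rec_scores" as parallel
    sequences; a result missing either key contributes nothing.
    """
    lines: list[tuple[str, float]] = []
    for page in results:
        texts = page.get("rec_texts", [])
        scores = page.get("rec_scores", [])
        for text, score in zip(texts, scores):
            if score >= min_score:
                lines.append((text, float(score)))
    return lines


# Typical call site (requires paddleocr>=3.0.0; not executed here):
#   from paddleocr import PaddleOCR
#   ocr = PaddleOCR()
#   lines = extract_lines(ocr.predict("page.png"), min_score=0.5)
```

Keeping the parsing in one small function means a future change to the response format touches a single place.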

Verified:
- PaddleOCR import works
- PPStructureV3 is available
- OCREngine initializes correctly
- Inference API returns correct field extractions
- 2117 unit tests pass
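The first two "Verified" items can be re-run as an import-only smoke check. A minimal sketch; `check_paddleocr_3x` is a hypothetical helper that only checks that `paddleocr` imports and exposes `PPStructureV3`, without downloading models or running inference.

```python
from __future__ import annotations

import importlib


def check_paddleocr_3x() -> dict[str, bool]:
    """Report whether paddleocr imports and whether it exposes PPStructureV3."""
    status = {"paddleocr_import": False, "ppstructurev3_available": False}
    try:
        module = importlib.import_module("paddleocr")
    except ImportError:
        # paddleocr (or one of its dependencies) is not installed.
        return status
    status["paddleocr_import"] = True
    # PPStructureV3 is only expected on paddleocr>=3.0.0.
    status["ppstructurev3_available"] = hasattr(module, "PPStructureV3")
    return status
```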

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Author: Yaojia Wang
Date:   2026-02-02 11:49:21 +01:00
Parent: 883fab5c4a
Commit: c4e3773df1
2 changed files with 5 additions and 5 deletions

@@ -25,8 +25,8 @@ classifiers = [
 
 dependencies = [
 "PyMuPDF>=1.23.0",
-"paddlepaddle>=2.5.0",
-"paddleocr>=2.7.0",
+"paddlepaddle>=3.0.0,<3.3.0",
+"paddleocr>=3.0.0",
 "ultralytics>=8.1.0",
 "Pillow>=10.0.0",
 "numpy>=1.24.0",
@@ -45,7 +45,7 @@ dev = [
 "testcontainers[postgres]>=4.0.0",
 ]
 gpu = [
-"paddlepaddle-gpu>=2.5.0",
+"paddlepaddle-gpu>=3.0.0,<3.3.0",
 ]
 
 [project.scripts]

@@ -4,8 +4,8 @@
 PyMuPDF>=1.23.0 # PDF rendering and text extraction
 
 # OCR
-paddlepaddle>=2.5.0 # PaddlePaddle framework
-paddleocr>=2.7.0 # PaddleOCR
+paddlepaddle>=3.0.0,<3.3.0 # PaddlePaddle framework (3.3.0 has OneDNN bug)
+paddleocr>=3.0.0 # PaddleOCR (PP-OCRv5)
 
 # YOLO
 ultralytics>=8.1.0 # YOLOv8/v11