OCR Text Recognition

iClick provides built-in OCR (Optical Character Recognition) that covers the text recognition needs of most automation scenarios. This article explains the advantages of the built-in OCR and why iClick does not bundle third-party OCR libraries such as PaddleOCR.

How It Works

iClick's built-in OCR performs recognition on the iPhone device, not on the computer host.

Core Advantages:

✅ Multi-language Support: Built on iOS system capabilities, it supports 18+ major languages with no additional configuration
✅ Zero Host Resource Usage: Recognition runs entirely on the iOS device and does not consume the computer's CPU, memory, or GPU

Applicable Scenarios

Built-in OCR can meet most automation needs:

✅ App interface text recognition / button and menu text extraction
✅ Message notification content recognition / in-game text recognition
✅ Simple data collection tasks

Recommendation

If your needs do not involve large-scale document scanning, the built-in OCR is sufficient and there is no need to consider third-party OCR libraries.

Why Not Integrate Third-party OCR (Using Baidu PaddleOCR as Example)

1. Model Files Too Large, Integrating Single Language Makes No Sense

Model files for PaddleOCR and other deep learning OCR libraries are very large:

Single language model: typically 100MB - 500MB
Multi-language support: requires downloading multiple models, easily exceeding 1GB - 3GB
High-precision models: can reach 5GB or larger
Each language requires its own separate model files

2. GPU Version Requirements Are Demanding

Hardware Limitations

⚠️ Only supports NVIDIA graphics cards (AMD cards and integrated graphics cannot be used)
⚠️ Requires a mid-range to high-end card: RTX 2060 or higher
⚠️ An official CPU version exists, but it is too slow to be practical

Performance Bottleneck

Even with a dedicated graphics card, performance is very limited. For example, here is a test that used one RTX 3060 to recognize button text for 15 iPhones:

Configuration: NVIDIA GeForce RTX 3060 (12GB)
Devices: 15 iPhones
Optimization: Extreme concurrency control and performance optimization
Result:
  - GPU already running at full capacity (100% usage)
  - OCR recognition speed unbearably slow
  - Overall performance severely degraded
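
The exact optimizations used in this test are not listed here. As an illustration only, one common form of concurrency control is to cap how many OCR requests reach the GPU at the same time, so that requests from all devices wait in a queue. The sketch below assumes a hypothetical `fake_gpu_ocr` coroutine standing in for a real inference call; the limit of 2 and the 10 ms delay are arbitrary values chosen for the example, not measurements.

```python
import asyncio


async def run_with_limit(jobs, limit, ocr):
    """Run ocr(job) for every job, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with semaphore:  # wait here until a slot is free
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await ocr(job)
            finally:
                in_flight -= 1

    # gather() preserves the order of the input jobs in its results
    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_gpu_ocr(device_id):
    # Hypothetical stand-in for a real GPU inference call.
    await asyncio.sleep(0.01)
    return f"device-{device_id}: ok"
```

Calling `asyncio.run(run_with_limit(range(15), limit=2, ocr=fake_gpu_ocr))` simulates 15 devices sharing two slots. A cap like this protects the GPU from overload, but it cannot add capacity: once the card is saturated, every extra device only lengthens the queue.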

You Can Still Use Third-party OCR

If you still need a third-party OCR library such as PaddleOCR (for example, for document scanning or recognition of a specific language) and you have a high-performance NVIDIA graphics card, you can integrate it yourself.

Advantages of Self-integration

In fact, wrapping PaddleOCR or another OCR library in a standalone service is straightforward:

🚀 Minimal Code - Building a basic OCR HTTP service requires only a few dozen lines of code
🔧 Flexible Control - You can choose the language models you want and customize features and parameters
🎯 Specialized Optimization - You can optimize models for specific scenarios (such as captchas)
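
To give a rough idea of the scale involved, here is a minimal sketch of such a service using only the Python standard library. It is not an official iClick or PaddleOCR example. The `/ocr` endpoint, the JSON response shape, the port, and the `Recognizer` interface are all assumptions made for illustration, and `placeholder_recognizer` is a stand-in that you would replace with a call into the OCR library you choose.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, List, Tuple

# Hypothetical engine interface: raw image bytes in, recognized text lines out.
Recognizer = Callable[[bytes], List[str]]


def build_response(recognize: Recognizer, path: str, image_bytes: bytes) -> Tuple[int, bytes]:
    """Return (HTTP status, JSON body) for one request; no network involved."""
    if path != "/ocr":
        return 404, json.dumps({"error": "unknown endpoint"}).encode("utf-8")
    if not image_bytes:
        return 400, json.dumps({"error": "empty request body"}).encode("utf-8")
    try:
        lines = recognize(image_bytes)
    except Exception:
        return 500, json.dumps({"error": "OCR failed"}).encode("utf-8")
    return 200, json.dumps({"lines": lines}, ensure_ascii=False).encode("utf-8")


def make_handler(recognize: Recognizer):
    """Create a request handler class bound to the chosen OCR engine."""

    class OcrHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # The request body is the raw image; read exactly Content-Length bytes.
            length = int(self.headers.get("Content-Length", 0))
            status, body = build_response(recognize, self.path, self.rfile.read(length))
            self.send_response(status)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return OcrHandler


def placeholder_recognizer(image_bytes: bytes) -> List[str]:
    # Stand-in engine for this sketch; swap in a real OCR library call here.
    return [f"received {len(image_bytes)} bytes"]


def serve(recognize: Recognizer, host: str = "127.0.0.1", port: int = 8000) -> None:
    """Block and serve OCR requests; host and port are arbitrary defaults."""
    ThreadingHTTPServer((host, port), make_handler(recognize)).serve_forever()
```

Calling `serve(placeholder_recognizer)` starts the service, and a screenshot can then be sent as the raw body of a POST request to `http://127.0.0.1:8000/ocr`. Keeping the engine behind a single function means the language model or library can be changed without touching the HTTP layer.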

Applicable Scenarios

Third-party OCR is suitable for the following special scenarios:

Scenario	Recommended Solution	Description
Captcha Recognition	Specialized captcha model	Targeted training, higher recognition rate
Specific Languages	Corresponding language model	Such as minority languages, dialects
Handwritten Text	Handwriting recognition model	Built-in OCR has limited handwriting support
Table Recognition	Table-specific model	Structured data extraction
Document Scanning	High-precision model	Bulk document processing
