OCR Text Recognition
iClick provides built-in OCR (Optical Character Recognition) functionality that can meet the text recognition needs of most automation scenarios. This article explains the advantages of built-in OCR and why we don't integrate third-party OCR libraries like PaddleOCR.
How It Works
iClick's built-in OCR performs recognition on the iPhone device, not on the computer host.
Core Advantages:
- ✅ Multi-language Support: Based on iOS system capabilities, supports 18+ major languages worldwide without additional configuration
- ✅ Zero Host Resource Usage: Recognition runs entirely on the iOS device and does not consume the computer's CPU, memory, or GPU
Applicable Scenarios
Built-in OCR can meet most automation needs:
- ✅ App interface text recognition / button and menu text extraction
- ✅ Message notification content recognition / in-game text recognition
- ✅ Simple data collection tasks
Recommendation
If your needs do not involve large-scale document scanning, built-in OCR is fully sufficient and there is no need to consider third-party OCR libraries.
Why Not Integrate Third-party OCR (Using Baidu PaddleOCR as Example)
1. Model Files Are Too Large, and Integrating a Single Language Makes No Sense
Model files for PaddleOCR and other deep-learning OCR libraries are very large:
- Single language model: typically 100MB - 500MB
- Multi-language support: requires downloading multiple models, easily 1GB - 3GB in total
- High-precision models: can reach 5GB or larger
- Each language requires its own separate model files
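As a rough back-of-the-envelope check, the sketch below multiplies the per-language range quoted above by the number of languages. The function name `estimate_model_footprint_mb` is hypothetical, and the calculation assumes one separate model per language with no shared components, which is a simplification.

```python
from typing import Tuple

# Per-language model size range in MB, taken from the figures quoted above.
SINGLE_MODEL_MB = (100, 500)


def estimate_model_footprint_mb(num_languages: int) -> Tuple[int, int]:
    """Return a (low, high) disk estimate in MB for the given language count.

    Assumes one separate model per language and no sharing between models.
    """
    if num_languages < 0:
        raise ValueError("num_languages must be non-negative")
    low, high = SINGLE_MODEL_MB
    return num_languages * low, num_languages * high


# Print the estimate for a few language counts, including the 18 languages
# that the built-in OCR is stated to cover.
for n in (1, 5, 18):
    low, high = estimate_model_footprint_mb(n)
    print(f"{n:>2} languages: {low / 1000:.1f} GB to {high / 1000:.1f} GB")
```

Under these assumptions, matching the language coverage of the built-in OCR would mean shipping and loading several gigabytes of models on the host.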
2. GPU Version Requirements Are Demanding
Hardware Limitations
- ⚠️ Only supports NVIDIA graphics cards (AMD cards and integrated graphics cannot be used)
- ⚠️ Requires a mid-range to high-end card: RTX 2060 or higher
- ⚠️ An official CPU version exists, but it is too slow to be practical
Performance Bottleneck
Even with a dedicated graphics card, performance is very limited. For example, button recognition with one RTX 3060 serving 15 iPhones:
Configuration: NVIDIA GeForce RTX 3060 (12GB)
Devices: 15 iPhones
Optimization: Aggressive concurrency control and performance tuning
Result:
- GPU already running at full capacity (100% usage)
- OCR recognition speed unbearably slow
- Overall performance severely degraded
You Can Still Use Third-party OCR
If you still need a third-party OCR library such as PaddleOCR (for document scanning, recognition of a specific language, and so on) and you have a high-performance NVIDIA graphics card, you can integrate it yourself.
Advantages of Self-integration
In fact, wrapping PaddleOCR or another OCR library in a standalone service is very simple:
- 🚀 Minimal Code - Building a basic OCR HTTP service requires only dozens of lines of code
- 🔧 Flexible Control - Can choose desired language models, customize features and parameters
- 🎯 Specialized Optimization - Optimize models for specific scenarios (such as captchas)
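To give a sense of how small such a service can be, here is a minimal sketch that uses only the Python standard library. It is not an iClick or PaddleOCR API: the `/ocr` endpoint, the default port 8000, and the names `make_server` and `placeholder_recognizer` are all hypothetical. The recognizer is a stand-in that you would replace with a call into your OCR engine; the PaddleOCR Python API differs between versions, so check its documentation for the call that matches your installation. A lock serializes recognition, on the assumption that a single GPU is shared by all connected devices.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, List

# Hypothetical recognizer signature: raw image bytes in, text lines out.
Recognizer = Callable[[bytes], List[str]]


def placeholder_recognizer(image: bytes) -> List[str]:
    """Stand-in for a real OCR engine; replace with your own engine call."""
    return [f"<{len(image)} bytes received, no OCR engine attached>"]


def make_handler(recognize: Recognizer, gpu_lock: threading.Lock):
    """Build a request handler class bound to one recognizer and one lock."""

    class OCRHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/ocr":
                self.send_error(404, "unknown endpoint")
                return
            length = int(self.headers.get("Content-Length", 0))
            image = self.rfile.read(length)
            # Assumption: one shared GPU, so run one recognition at a time.
            with gpu_lock:
                lines = recognize(image)
            body = json.dumps({"lines": lines}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            # Silence the default per-request logging to stderr.
            pass

    return OCRHandler


def make_server(recognize: Recognizer = placeholder_recognizer,
                host: str = "127.0.0.1",
                port: int = 8000) -> ThreadingHTTPServer:
    """Create the HTTP server; the caller decides when to start it."""
    handler = make_handler(recognize, threading.Lock())
    return ThreadingHTTPServer((host, port), handler)


# To run the service: make_server().serve_forever()
```

A client would POST the raw bytes of a screenshot to `/ocr` and receive a JSON object with a `lines` array. Keeping the recognizer as a plain function makes it easy to swap in a different language model or a specialized captcha model without touching the HTTP layer.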
Applicable Scenarios
Third-party OCR is suitable for the following special scenarios:
| Scenario | Recommended Solution | Description |
|---|---|---|
| Captcha Recognition | Specialized captcha model | Targeted training, higher recognition rate |
| Specific Languages | Corresponding language model | Such as minority languages, dialects |
| Handwritten Text | Handwriting recognition model | Built-in OCR has limited handwriting support |
| Table Recognition | Table-specific model | Structured data extraction |
| Document Scanning | High-precision model | Bulk document processing |