OCR Text Recognition

iClick provides built-in OCR (Optical Character Recognition) that covers the text recognition needs of most automation scenarios. This article explains the advantages of the built-in OCR and why we do not integrate third-party OCR libraries such as PaddleOCR.

How It Works

iClick's built-in OCR runs recognition on the iPhone itself, not on the host computer.

Core Advantages:

  • Multi-language Support: Built on iOS system capabilities, it supports 18+ major languages without additional configuration
  • Zero Host Resource Usage: Recognition runs entirely on the iOS device and does not consume the host computer's CPU, memory, or GPU

Applicable Scenarios

Built-in OCR can meet most automation needs:

  • ✅ App interface text recognition / button and menu text extraction
  • ✅ Message notification content recognition / in-game text recognition
  • ✅ Simple data collection tasks

Recommendation

If your needs do not involve large-scale document scanning, the built-in OCR is sufficient and there is no need to consider third-party OCR libraries.

Why We Do Not Integrate Third-party OCR (Using Baidu PaddleOCR as an Example)

1. Model Files Are Too Large, and Bundling a Single Language Makes Little Sense

The model files for PaddleOCR and other deep-learning OCR libraries are very large:

  • Single language model: typically 100MB - 500MB
  • Multi-language support: requires downloading multiple models, easily exceeding 1GB - 3GB
  • High-precision models: can reach 5GB or larger
  • Each language requires its own separate model files
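
To make the size argument concrete, the following is a rough back-of-the-envelope sketch. It takes the 100MB - 500MB per-language range quoted above as given and multiplies it by the number of languages; the function name `footprint_gb` is hypothetical and used only for illustration.

```python
from typing import Tuple

# Per-language model size range in MB, taken from the figures in this article.
PER_LANGUAGE_MB = (100, 500)


def footprint_gb(num_languages: int) -> Tuple[float, float]:
    """Return the (low, high) disk footprint in GB for the given number of language models."""
    low_mb, high_mb = PER_LANGUAGE_MB
    return (num_languages * low_mb / 1000, num_languages * high_mb / 1000)


# Matching the 18 languages that the built-in OCR covers would need one model per language.
low, high = footprint_gb(18)
print(f"18 languages: {low} GB to {high} GB")
```

Under these assumptions, covering the same 18 languages as the built-in OCR would take somewhere between 1.8 GB and 9 GB of model files, which is why bundling them is not practical.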

2. GPU Version Requirements Are Demanding

Hardware Limitations

  • ⚠️ The GPU version only supports NVIDIA graphics cards (AMD cards and integrated graphics cannot be used)
  • ⚠️ It requires a mid-range to high-end card: RTX 2060 or higher
  • ⚠️ An official CPU version exists, but it is too slow to be practical

Performance Bottleneck

Even with a dedicated graphics card, throughput is very limited. For example, running button recognition for 15 iPhones on a single RTX 3060:

Configuration: NVIDIA GeForce RTX 3060 (12GB)
Devices: 15 iPhones
Optimization: Aggressive concurrency control and performance tuning
Result:
  - GPU already running at full capacity (100% usage)
  - OCR speed unbearably slow
  - Overall performance severely degraded
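
As an illustration of what "concurrency control" can look like when many devices share one GPU, here is a minimal sketch using an `asyncio.Semaphore`. This is not iClick's or PaddleOCR's implementation: `MAX_CONCURRENT`, `recognize`, and the `asyncio.sleep` call standing in for the real GPU OCR request are all assumptions made for the example.

```python
import asyncio

MAX_CONCURRENT = 3  # hypothetical limit; tune to what the GPU can actually sustain


async def recognize(device_id: int, gate: asyncio.Semaphore, stats: dict) -> str:
    """Run one OCR request, waiting for a free slot on the shared GPU first."""
    async with gate:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the real GPU OCR call
        stats["active"] -= 1
    return f"device-{device_id}: done"


async def run(num_devices: int = 15):
    """Submit one request per device and report the peak number running at once."""
    gate = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(recognize(i, gate, stats) for i in range(num_devices))
    )
    return results, stats["peak"]


results, peak = asyncio.run(run())
print(len(results), "requests finished, peak concurrency:", peak)
```

The semaphore keeps the GPU from being oversubscribed, but it does not add capacity: with 15 devices and 3 slots, requests are served in roughly five waves, so each device still waits longer as more devices are added.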

You Can Still Use Third-party OCR

If you still need a third-party OCR library such as PaddleOCR (for example, for document scanning or recognition of a specific language) and you have a high-performance NVIDIA graphics card, you can integrate it yourself.

Advantages of Self-integration

In fact, wrapping PaddleOCR or another OCR library in a standalone service is straightforward:

  • 🚀 Minimal Code - A basic OCR HTTP service takes only a few dozen lines of code
  • 🔧 Flexible Control - You choose the language models you need and customize features and parameters
  • 🎯 Specialized Optimization - You can optimize models for specific scenarios (such as captchas)
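
To give a sense of scale, here is a minimal sketch of such a service built on Python's standard `http.server`. The `/ocr` path, the JSON response shape, and the helper names `make_handler`, `paddle_recognizer`, and `serve` are assumptions for this example. The PaddleOCR adapter follows the 2.x-style `PaddleOCR(...).ocr(...)` call and result layout; newer releases may differ, so check the documentation for your installed version.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(recognize):
    """Build a request handler around any callable mapping image bytes to a list of strings."""

    class OCRHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/ocr":  # hypothetical endpoint name
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            image_bytes = self.rfile.read(length)  # raw image in the request body
            body = json.dumps({"texts": recognize(image_bytes)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the console quiet

    return OCRHandler


def paddle_recognizer(lang="en"):
    """Assumed adapter for PaddleOCR 2.x; requires the `paddleocr` package to be installed."""
    from paddleocr import PaddleOCR  # imported lazily so the server code runs without it

    engine = PaddleOCR(lang=lang)

    def recognize(image_bytes):
        result = engine.ocr(image_bytes)  # 2.x: one entry per image
        lines = result[0] or []           # None when no text was found
        return [text for _box, (text, _score) in lines]

    return recognize


def serve(recognize, host="127.0.0.1", port=8000):
    """Block forever, answering POST /ocr requests."""
    HTTPServer((host, port), make_handler(recognize)).serve_forever()


# To start the service with PaddleOCR installed:
# serve(paddle_recognizer("en"))
```

Because the handler only depends on a `recognize` callable, the same skeleton can wrap a captcha model, a handwriting model, or any other engine without changing the HTTP layer.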

Applicable Scenarios

Third-party OCR is suitable for the following special scenarios:

| Scenario | Recommended Solution | Description |
| --- | --- | --- |
| Captcha Recognition | Specialized captcha model | Targeted training, higher recognition rate |
| Specific Languages | Corresponding language model | Such as minority languages, dialects |
| Handwritten Text | Handwriting recognition model | Built-in OCR has limited handwriting support |
| Table Recognition | Table-specific model | Structured data extraction |
| Document Scanning | High-precision model | Bulk document processing |

Cooperation: try.catch@foxmail.com