JACIII Vol.8 No.2 pp. 200-207
doi: 10.20965/jaciii.2004.p0200


Printed Japanese Character Recognition Using Multiple Commercial OCRs

Hidetoshi Miyao*, Yasuaki Nakano**, Atsuhiko Tani***, Hirosato Tabaru****, and Toshihiro Hananoi**

*Shinshu University, 4-17-1, Wakasato, Nagano 380-8553, Japan

**Kyushu Sangyo University, 2-3-1 Matsukadai, Higashi-ku, Fukuoka 813-8503, Japan

***Hitachi Software Engineering Co., Ltd., 5030 Totsuka-cho, Totsuka-ku, Yokohama 244-8555, Japan

****Fuji Xerox Co., Ltd., 2-3-1 Matsukadai, Higashi-ku, Fukuoka 813-8503, Japan

Received: July 31, 2003
Accepted: December 1, 2003
Published: March 20, 2004
Keywords: OCR, DP matching, printed Japanese character recognition, character extraction, document image analysis
This paper proposes two algorithms for establishing correspondence between the lines and characters in text output by multiple commercial optical character readers (OCRs): (1) a line matching algorithm using dynamic programming (DP) matching and (2) a character matching algorithm using character string division and standard character strings. It also proposes a character recognition method that introduces majority logic and reject processing. To demonstrate the feasibility of the method, we conducted line matching experiments on 127 document images using five commercial OCRs. The results showed that the method matched lines appropriately and extracted character areas more accurately than a single OCR. In experiments using the character matching algorithm and character recognition, the proposed method raised the recognition rate from 97.61% for a single OCR to 98.83%. The method is expected to be highly useful for correcting locations where unwanted lines or characters appear or where required lines or characters are missing.
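The two ideas named in the abstract, DP matching of lines and majority logic with rejection, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the cost function or thresholds, so the similarity measure (`difflib.SequenceMatcher`), the `gap_penalty` value, the `min_votes` parameter, and the function names `align_lines` and `vote` are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

GAP = None  # placeholder for a line that one OCR output lacks


def line_similarity(a, b):
    """Similarity in [0, 1] between two text lines (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def align_lines(ref, other, gap_penalty=0.3):
    """Align two lists of text lines by dynamic programming.

    Returns (ref_line, other_line) pairs; either side may be GAP when a
    line is present in only one of the two outputs.
    """
    n, m = len(ref), len(other)
    # score[i][j]: best total score for aligning ref[:i] with other[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty
        back[i][0] = "up"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_penalty
        back[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = line_similarity(ref[i - 1], other[j - 1])
            options = [
                (score[i - 1][j - 1] + sim, "diag"),      # match the two lines
                (score[i - 1][j] - gap_penalty, "up"),    # ref line unmatched
                (score[i][j - 1] - gap_penalty, "left"),  # other line unmatched
            ]
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
    # Follow the back-pointers from the end to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((ref[i - 1], other[j - 1]))
            i, j = i - 1, j - 1
        elif move == "up":
            pairs.append((ref[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, other[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def vote(candidates, min_votes):
    """Return the most frequent character, or None (reject) if it has
    fewer than min_votes supporters. None entries are ignored."""
    counts = Counter(c for c in candidates if c is not None)
    if not counts:
        return None
    char, n = counts.most_common(1)[0]
    return char if n >= min_votes else None
```

With five OCR outputs, one output could serve as the reference for `align_lines`, and `vote(candidates, min_votes=3)` would then accept a character only when at least three engines agree on it, rejecting the position otherwise.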
Cite this article as:
H. Miyao, Y. Nakano, A. Tani, H. Tabaru, and T. Hananoi, “Printed Japanese Character Recognition Using Multiple Commercial OCRs,” J. Adv. Comput. Intell. Intell. Inform., Vol.8 No.2, pp. 200-207, 2004.
