JACIII Vol.8 No.2 pp. 200-207
doi: 10.20965/jaciii.2004.p0200
(2004)

Paper:

Printed Japanese Character Recognition Using Multiple Commercial OCRs

Hidetoshi Miyao*, Yasuaki Nakano**, Atsuhiko Tani***, Hirosato Tabaru****, and Toshihiro Hananoi**

*Shinshu University, 4-17-1, Wakasato, Nagano 380-8553, Japan

**Kyushu Sangyo University, 2-3-1 Matsukadai, Higashi-ku, Fukuoka 813-8503, Japan

***Hitachi Software Engineering Co., Ltd., 5030 Totsuka-cho, Totsuka-ku, Yokohama 244-8555, Japan

****Fuji Xerox Co., Ltd., 2-3-1 Matsukadai, Higashi-ku, Fukuoka 813-8503, Japan

Received: July 31, 2003
Accepted: December 1, 2003
Published: March 20, 2004
Keywords: OCR, DP matching, printed Japanese character recognition, character extraction, document image analysis
Abstract

This paper proposes two algorithms for establishing correspondences between lines and between characters in the text output by multiple commercial optical character readers (OCRs): (1) a line matching algorithm based on dynamic programming (DP) matching and (2) a character matching algorithm based on character string division and standard character strings. It also proposes a character recognition method that introduces majority logic and reject processing. To demonstrate the feasibility of the method, we conducted line matching and recognition experiments on 127 document images using five commercial OCRs. The results showed that the method matched lines appropriately and extracted character areas more accurately than a single OCR. In experiments using the character matching algorithm and the recognition method, the proposed method raised the recognition rate from 97.61% for a single OCR to 98.83%. The method is expected to be particularly useful for correcting locations where an OCR outputs spurious lines or characters, or drops required ones.
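The abstract does not give the matching cost, the gap penalty, or the voting threshold, so the following Python sketch only illustrates the two general techniques it names: DP matching of text lines from two OCR outputs, and majority voting with a reject option. It is not the authors' implementation. Assumptions of our own: line dissimilarity is taken as 1 minus the `difflib.SequenceMatcher` ratio, the gap penalty `gap=0.6` and the threshold `min_agree` are arbitrary hypothetical values, and the characters passed to the vote are assumed to be already aligned position by position (the paper's character string division step is not modeled). The function names `line_cost`, `dp_align_lines`, and `vote` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def line_cost(a, b):
    """Hypothetical dissimilarity of two OCR text lines, in [0, 1]."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def dp_align_lines(ref, other, gap=0.6):
    """Align two OCR outputs (lists of text lines) by DP matching.

    Returns (i, j) index pairs in reading order; None on one side marks a
    line that the other OCR dropped or inserted. `gap` is an assumed penalty.
    """
    n, m = len(ref), len(other)
    # cost[i][j]: minimum cost of aligning ref[:i] with other[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = cost[i - 1][0] + gap, "up"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = cost[0][j - 1] + gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j], back[i][j] = min(
                (cost[i - 1][j - 1] + line_cost(ref[i - 1], other[j - 1]), "diag"),
                (cost[i - 1][j] + gap, "up"),    # ref line has no partner
                (cost[i][j - 1] + gap, "left"),  # other line has no partner
            )
    # Follow back-pointers from the bottom-right corner to recover the path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = back[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def vote(candidates, min_agree):
    """Majority logic with rejection for one aligned character position.

    Returns the most frequent character if at least `min_agree` OCRs output it;
    otherwise returns None, i.e. the position is rejected.
    """
    counts = Counter(c for c in candidates if c is not None)
    if not counts:
        return None
    char, votes = counts.most_common(1)[0]
    return char if votes >= min_agree else None


if __name__ == "__main__":
    ref = ["日本語の文書", "第二行", "第三行"]
    other = ["日本語の文書", "第三行"]  # second OCR dropped a line
    print(dp_align_lines(ref, other))  # [(0, 0), (1, None), (2, 1)]
    print(vote(["本", "本", "木", "本", "大"], min_agree=3))  # 本
    print(vote(["本", "木", "大", None, "本"], min_agree=3))  # None
```

In the usage example the DP path skips the second reference line rather than pairing "第二行" with "第三行", because one gap penalty (0.6) is cheaper than a partial mismatch plus a gap at the end (about 0.93). With five OCRs, as in the reported experiments, one OCR's output could serve as the reference and the others be aligned to it line by line before voting; how the paper actually combines them is not stated in the abstract.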

Cite this article as:
Hidetoshi Miyao, Yasuaki Nakano, Atsuhiko Tani, Hirosato Tabaru, and Toshihiro Hananoi, “Printed Japanese Character Recognition Using Multiple Commercial OCRs,” J. Adv. Comput. Intell. Intell. Inform., Vol.8, No.2, pp. 200-207, 2004.