HANDWRITTEN TEXT RECOGNITION OF HISTORICAL DOCUMENTS USING DEEP NEURAL NETWORK TECHNOLOGIES
Abstract
This paper considers the application of deep neural network technologies to handwriting recognition in pre-reform Russian. The input data are scanned JPG images of 19th-century historical documents, many of which contain noise and other interference that complicates the work of the recognition algorithm. Text recognition is performed in three stages. First, noise is removed from the image. Second, the text lines in the image are segmented, because the deep neural network takes individual lines as its input. Third, the text of each segmented line is recognized using the pre-trained Tesseract OCR model, which converts images of handwritten or printed text into machine-readable text. The model is a convolutional recurrent neural network: it combines a convolutional neural network, which extracts local features from the image, with a recurrent neural network made up of two bidirectional LSTM layers, which processes the resulting sequence. With this model, handwritten text is recognized reliably.
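The first two stages can be illustrated with a minimal sketch. The abstract does not say which denoising or line segmentation methods are used, so the choices below are assumptions made only for illustration: a 3x3 median filter for noise removal and a horizontal projection profile (counting dark pixels per row) for line segmentation. The function names `median_filter` and `segment_lines` and the parameters `radius`, `ink_threshold` and `min_ink` are hypothetical. The page is represented as a plain list of rows of grayscale values (0 is black, 255 is white), and the recognition stage itself is left out.

```python
from statistics import median


def median_filter(img, radius=1):
    """Assumed denoising step: median filter over a (2*radius+1)^2 window.

    img is a 2D list of grayscale values; coordinates outside the image
    are clamped to the nearest border pixel.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            out[y][x] = median(window)
    return out


def segment_lines(img, ink_threshold=128, min_ink=1):
    """Assumed segmentation step: horizontal projection profile.

    A row counts as text if it has at least min_ink pixels darker than
    ink_threshold. Returns (top, bottom) row ranges, bottom exclusive.
    """
    lines = []
    start = None
    for y, row in enumerate(img):
        ink = sum(1 for v in row if v < ink_threshold)
        if ink >= min_ink and start is None:
            start = y                  # first row of a new text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))   # blank row closes the current line
            start = None
    if start is not None:              # text runs to the bottom edge
        lines.append((start, len(img)))
    return lines
```

On a synthetic page with two dark bands and one isolated dark pixel between them, `segment_lines` reports the stray pixel as a spurious third line, while `segment_lines(median_filter(page))` returns only the two bands. Each returned row range could then be cropped and passed to the line recognizer.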
Articles in current issue
- IMPROVEMENT OF THE HUMAN EMOTIONAL STATE IDENTIFICATION ALGORITHM USING MFCC
- METHOD OF DYNAMIC UPDATING OF THE INTERACTION MODEL OF PARALLEL PROCESSES IN EMBEDDED SYSTEMS
- OPTIMAL AGGREGATION OF CLUSTERED SAMPLE INTERVALS FOR APPLYING THE χ2 TEST
- SPEED ANALYSIS OF ALOHA-BASED RANDOM-ACCESS ALGORITHM WITH VARIOUS SLOT DURATION
- IMPROVEMENT OF THE CALCULATION AND ANALYTICAL METHOD OF TV-CAMERA EFFICIENCY ASSESSMENT IN OBJECTS DETECTION AND RECOGNITION
- STUDY OF A LOW-COHERENCE INTERFEROMETRIC PROBE OPERATING IN THE SCANNING MEASUREMENT MODE
- CONTRAST AGENT DISTRIBUTION IN THE LUMEN AND WALL OF THE ABDOMINAL AORTA ACCORDING TO CT-ANGIOGRAPHIC STUDY DATA