Handwritten Gujarati Character Recognition using Machine Learning Approach by Ankit Sharma
Call number: TT000066 SHA
Current library | Collection | Call number | Status | Date due | Barcode |
---|---|---|---|---|---|
NIMA Knowledge Centre | Reference | TT000066 SHA | Not For Loan | | TT000066 |
NIMA Knowledge Centre | Reference | TT000066 SHA | Not For Loan | | TT000066-1 |
NIMA Knowledge Centre | Reference | TT000066 SHA | Not For Loan | | TT000066-2 |
Guided by: Dr. Dipak Adhyaru and Dr. Tanish Zaveri. With Synopsis and CD. 12EXTPHDE93
ABSTRACT:
Handwritten character recognition is an active area of research. Over the past three
decades, there has been increasing interest among researchers in problems related to
the machine simulation of the human reading process. Optical Character Recognition
(OCR) is the tool used to convert printed or handwritten scanned documents
into machine-readable text. Handwritten character recognition is a challenging
task, and people are striving to convert handwritten literature to a computer-readable
format. Recognising handwritten characters is difficult compared to printed characters
because handwritten characters vary from person to person with respect to
individual writing style, size, curves, strokes and thickness of characters.
Languages have played a major role in Indian history and they continue to
influence the lives of Indians to this day. Plentiful research on OCR techniques for
Indian languages such as Hindi, Tamil, Bangla, Kannada, Gurumukhi and Malayalam
has already been carried out. Development of OCR systems for Gujarati script is
still in its infancy and hence there exist many unaddressed, challenging problems for
the research community in this domain. This clearly calls for attention to the
task of handwritten Gujarati character recognition. This thesis addresses the issues
of handwritten Gujarati character recognition.
Gujarati is the mother tongue of people belonging to Gujarat state in India. All over
the world, more than 65 million people use the Gujarati language for communication.
As Gujarat is one of the eminent states of India, Gujarati is a well-known
and culturally rich language. Gujarati character recognition, like that of most
other Indian languages, offers more difficulties than recognition of western languages, for these
reasons: (a) the number of classes is higher, (b) the structure of characters in Gujarati script
contains curves, holes and strokes, which result in significant variations in writing style
between different persons, (c) the presence of similar-looking characters, and
(d) the unavailability of a standard dataset for experimentation and validation.
One of the significant contributions of the proposed work is the development
of large and representative datasets for the task of recognising handwritten Gujarati
characters and numerals. Benchmark datasets containing 88,000 handwritten Gujarati
character images and 14,000 handwritten Gujarati numeral images are developed.
Special forms are used for dataset collection and isolated characters are extracted
from these forms. Preprocessing steps including noise removal, size normalization,
binarization and thinning are applied to each segmented numeral/character image.
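As a rough illustration of two of the preprocessing steps named above (binarization and size normalization), a minimal sketch follows. It is not the thesis's implementation: the function names, the fixed threshold of 128, the 32 x 32 target size and the nearest-neighbour resampling are all assumptions made for this example, and noise removal and thinning are left out.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1; dark ink becomes 1.
    The fixed threshold is an assumption of this sketch."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop_to_ink(binary):
    """Crop to the bounding box of foreground pixels (unchanged if there are none)."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return binary
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def resize_nearest(binary, size=32):
    """Size-normalize to size x size with nearest-neighbour sampling."""
    h, w = len(binary), len(binary[0])
    return [[binary[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def preprocess(gray, size=32, threshold=128):
    """Binarize, crop to the character, then normalize its size."""
    return resize_nearest(crop_to_ink(binarize(gray, threshold)), size)
```

Calling `preprocess(gray)` on one segmented character image returns a 32 x 32 grid of 0/1 values that later feature-extraction steps could consume.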
Systematic and exhaustive experiments are carried out on these developed datasets
using different kinds of features and their fusion. Zone-based, projection-profile-based
and chain-code-based features are employed as individual features. It is also proposed
to use the fusion of these features. A few novel features are also proposed to represent
handwritten Gujarati characters. These include features extracted based on
structural decomposition, zone pattern matching and normalized cross-correlation.
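To make the idea of zone-based features, projection profiles and their fusion more concrete, here is a small sketch under stated assumptions. The abstract does not give the exact feature definitions, so the zone density measure, the 4 x 4 grid, the normalization and the fusion by simple concatenation are illustrative choices, and all function names are hypothetical. Chain-code features and the proposed novel features are not shown.

```python
def zone_features(img, zones=4):
    """Fraction of foreground pixels in each cell of a zones x zones grid.
    Assumes a binary image (rows of 0/1) at least zones pixels on each side."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            ink = sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))
            feats.append(ink / ((r1 - r0) * (c1 - c0)))
    return feats


def projection_profile_features(img):
    """Normalized row sums followed by normalized column sums."""
    h, w = len(img), len(img[0])
    horizontal = [sum(row) / w for row in img]
    vertical = [sum(img[r][c] for r in range(h)) / h for c in range(w)]
    return horizontal + vertical


def fused_features(img, zones=4):
    """Feature fusion by concatenation (an assumption of this sketch)."""
    return zone_features(img, zones) + projection_profile_features(img)
```

For a 32 x 32 binary image and a 4 x 4 grid, `fused_features` yields 16 zone values plus 64 profile values, one flat vector per character.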
Methods based on artificial neural network (ANN), support vector machine (SVM)
and naive Bayes (NB) classifiers are used for handwritten Gujarati character and
numeral recognition. Among individual features, chain-code-based features provided
the highest recognition accuracy: 99.25% and
99.47% with polynomial SVM for the numeral and character datasets respectively. Among
fusion-based features, the fusion of chain-code-based and zoning-based features
provided the best results. The proposed structural-decomposition-based
features provided the highest accuracy of 99.48% with polynomial
SVM for handwritten characters. Experimental results show significant improvement
over the state of the art and validate our proposals.
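For readers unfamiliar with the term, "polynomial SVM" refers to an SVM that compares feature vectors through a polynomial kernel. The sketch below shows the standard form of that kernel only; the degree, `gamma` and `coef0` values are placeholder assumptions, since the abstract does not report the settings used in the thesis.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Standard polynomial kernel: (gamma * <x, y> + coef0) ** degree.
    Parameter defaults are illustrative, not taken from the thesis."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree
```

An SVM trained with this kernel scores a new feature vector by a weighted sum of kernel values against its support vectors.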