Computer Chapter

The Computer Chapter's field of work covers all aspects of the theory, design, practice, and application of methods and systems related to computing and information processing. The chapter's activities have scientific, professional, educational, and social components. Through the exchange of technical information and scientific knowledge, the chapter strives to advance the profession and to maintain a high professional standing among its members. In addition, by organizing scientific and professional lectures and discussions and by publishing technical journals, it promotes multidisciplinary cooperation with other professions and with the wider community.
Chapter leadership
Term until 31 December 2024

Lucija Petricioli
Chair

Hana Ivandić
Vice Chair

The Computer Chapter of the IEEE Croatia Section, the Center for Computer Vision, and ZEMRIS invite you to the lecture

Indi Scripts Corpus Development and Handwritten Text Recognition

to be given by

Assistant Professor Dr. Neeta Nain

Department of Computer Engineering, Malaviya National Institute of Technology, Jaipur, India

on Wednesday, 28 May, at 10:00 in the Grey Hall (Siva vijećnica), Faculty of Electrical Engineering and Computing, Unska 3, Zagreb

The lecture summary and the speaker's biography follow below.

Biography

Neeta Nain has been teaching Computer Graphics and Image Processing for over 15 years. Her research areas are image processing, pattern recognition, multimedia techniques, and computer graphics. She currently supervises research in handwritten text recognition and image vectorization. She has written three Springer book chapters and has authored over 30 research papers for various international journals and conferences. She has also authored the book Computer Architecture and Organization.

 

Summary

Knowledge is frequently stored in handwritten and printed form so that it can be passed on to later generations across different fields. With advances in printing technology and its widespread adoption, the volume of printed material has skyrocketed. With the progress of optical character recognition technology, it is now possible to scan documents as images and make them editable and searchable for further processing. However, for most languages no robust character recognition system is available, so there is no way to search through this printed and handwritten material quickly and efficiently. Indian scripts such as those used for Hindi and Urdu are among those that need a robust character recognition system to convert large volumes of handwritten and printed data into editable form. The talk will focus on handwritten text recognition (HTR) systems that use Neural Network (NN) and Support Vector Machine (SVM) models to recognize unconstrained off-line handwritten text in general. Hindi and Urdu will be showcased as example cases. At present no such tool exists for handwritten document recognition in Indian regional languages. A standard method of representing normalized feature vectors for Indian scripts will be illustrated in detail.

Compared with other languages, very little research has been done on character recognition for Indi scripts such as Hindi and Urdu, owing to word and character segmentation problems and to variations in a character's shape depending on its position. Some work has been done on printed isolated character recognition for specific fonts and sizes, but no complete work exists on Hindi/Urdu word recognition. To the best of our knowledge, handwritten Hindi and Urdu text recognition is still at a nascent stage. The CIIL (Central Institute of Indian Languages, Hyderabad, India) and CRULP (Centre for Research in Urdu Language Processing, Lahore, Pakistan) carry out research and development on the linguistic and computational aspects of Hindi and Urdu, respectively.

The talk will address issues such as the development of a handwritten database for an Indian script corpus and the difficulties of Indi script grammar and calligraphic styles, so that a standard corpus can be defined and mandated to serve as a guide for font developers. It will also explore how printed and digital material should be prepared so that a single standard can be developed for representing a given language, and it will present a corpus benchmarking dataset for Indi scripts that gives researchers a common platform for objectively comparing different systems on the same dataset. Indi scripts are used by more than one fourth of the world's population, in many countries and in the form of many languages, such as Arabic, Persian, Urdu, Punjabi, and Pashto. Presently Hindi is the official language of India, while Urdu is the official language of the Middle East and Pakistan and is one of the sixteen major languages constitutionally recognized in India. Within India, Urdu is also the second language after Hindi used for writing and speaking across states.

 

A brief outline of the talk

1. Use of Indian linguistic resources for technology development. Difficulties in automating Indian scripts, and handwritten Indian script recognition as a solution.

2. Use of handwritten Indi scripts such as Hindi and Urdu as the language for various schemes, such as government census data collection forms, passports, and visas.

3. Corpus Development, data collection, annotation and validation.

4. Corpus standardization: a platform for the comparison and evaluation of different algorithms and techniques on the same grid.

5. Benchmarking for experimentation, verification and validation.

6. Issues and challenges in HTR.

7. Segmentation techniques for line, word and character segmentation.

8. Feature vectors and the development of a normalized feature vector.

9. PCA, LDA and decision tree classifiers.

10. HTR using Neural Networks and Support Vector Machines.

11. Handwritten text recognition as a signature (document) verification tool for authenticating a person's identity.

12. Usage in forensic science for forgery detection and prevention.

 

Social impact: education is the key to knowledge.

1. Understanding of Indian scripts for technology use.

2. A job sector for the Hindi- and Urdu-speaking population, particularly women, who often drop out of their studies due to social circumstances.

3. Presently 50% of the rural and Muslim population from the East is engaged in labour work in Middle Eastern countries, where the crucial work is carried out in Urdu. With literacy in Hindi and Urdu and its use in daily work, the same workforce could provide the Middle East with IT solutions instead of masonry work.

4. Documentation and preservation of old historical manuscripts in art, music, literature and architecture. Mughal and Iranian architecture is among the finest contributions to architecture. Preservation of ancient medicinal manuscripts such as Al-Kanoon (Islamic Medical Manuscripts at the National Library of Medicine), which is the basis of present allopathic medicine, and of other excellent manuscripts stored in a library in Spain, which was a centre of learning in medical science until the 15th century. These manuscripts need to be brought into the public domain. They were the reference for medical procedures in their time. Prophet Mohammed's wife Aeysha was herself a medical expert, and foreign delegates used to come and discuss their medical practices with her. We want to make such manuscripts available in the public domain for knowledge dissemination.

5. Collaboration within the document analysis, linguistics/computational, and pattern recognition communities, as well as with government agencies. Being part of a vital force in the research and development of linguistic technologies that will ultimately make information access faster, easier and more precise. Collaboration with other institutes and universities that have the necessary inclination, skills and resources.

Author: Dejan Škvorc