Handwriting Transcription in Arabic, Syriac and Thai
My current system, written in Java, transcribes Syriac with high accuracy from old printed sources in all three scripts (Estrangelo, Serto, and East Syriac). It recognises individual characters and diacritical marks by matching contours using a dynamic programming approach. Match scores are recorded in a trellis, and a knapsack decoder finds the optimal character sequence from that trellis. No lexicon is required. I plan to use this system to convert a 4,800-page Syriac document into machine-readable form. The system can also be used to transcribe Arabic.
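To make the trellis idea concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not the system's actual code: the class TrellisSketch, the Entry type, and both methods are hypothetical names. Contours are assumed to be simple one-dimensional height profiles compared by dynamic time warping, and the decoder is a generic best-path dynamic programme over column boundaries standing in for the knapsack decoder, whose details are not given here. Latin letters are used as labels for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; all names and data layouts here are assumptions. */
final class TrellisSketch {

    /** One trellis entry: a candidate label covering columns [start, end) with a match score. */
    static final class Entry {
        final int start;
        final int end;
        final String label;
        final double score;

        Entry(int start, int end, String label, double score) {
            this.start = start;
            this.end = end;
            this.label = label;
            this.score = score;
        }
    }

    /** Dynamic time warping distance between two contours given as height profiles. */
    static double contourDistance(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of: diagonal match, stretch a, stretch b.
                d[i][j] = cost + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
            }
        }
        return d[n][m];
    }

    /**
     * Returns the label sequence with the highest total score that tiles columns
     * 0..width exactly, or an empty list if no tiling exists.
     * Assumes 0 <= start < end <= width for every entry.
     */
    static List<String> decode(List<Entry> trellis, int width) {
        double[] best = new double[width + 1];
        Entry[] back = new Entry[width + 1];
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= width; end++) {
            for (Entry e : trellis) {
                if (e.end == end && best[e.start] > Double.NEGATIVE_INFINITY) {
                    double total = best[e.start] + e.score;
                    if (total > best[end]) {
                        best[end] = total;
                        back[end] = e;
                    }
                }
            }
        }
        List<String> labels = new ArrayList<>();
        if (back[width] == null) {
            return labels; // the trellis does not cover the whole line
        }
        // Walk the back-pointers from the right edge to column 0.
        for (int pos = width; pos > 0; pos = back[pos].start) {
            labels.add(0, back[pos].label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Entry> trellis = Arrays.asList(
                new Entry(0, 2, "a", 0.9),
                new Entry(2, 4, "b", 0.8),
                new Entry(0, 4, "m", 1.5),
                new Entry(0, 1, "i", 0.3),
                new Entry(1, 2, "l", 0.4));
        System.out.println(decode(trellis, 4)); // prints [a, b]
    }
}
```

In this sketch a trellis score could be derived from the contour distance, for example by negating it so that closer matches score higher. No lexicon is consulted: the decoder chooses purely on match scores and on how the candidate spans tile the line.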
With collaborators, I have previously investigated methods based on hidden Markov models, support vector machines, and Bayesian approaches. Collaborators have included Mohammad Khorsheed (Arabic), Prem Fernando (Estrangelo) and Roonroj Nopsuwanchai (Thai).