In Part I, we learned that there are 7,099 living languages in the world, counting both written and spoken-only languages. According to Ethnologue (20th edition), 3,866 of those 7,099 living languages have a developed writing system.
That leads us to this second part of our series: searching for text inside a video. After speech, text is probably the most important element from which we can extract data.
Take, for example, a presentation or talk. Besides speaking, the speaker usually augments the session with a set of slides, so the text on those slides is another set of data worth capturing. This matters because what the speaker says and what the slides show can be vastly different.
The technology for capturing this text inside a video is called Video OCR (Optical Character Recognition). Video OCR is derived from OCR, a technology that has been around for a long time.
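To make the idea concrete, here is a minimal sketch of one common ingredient of Video OCR: frame sampling. Running an OCR engine on every frame of a 30 fps video is wasteful, because consecutive frames usually show the same slide, so a typical approach is to OCR only one frame every few seconds. The helper name `frames_to_ocr` and the sampling interval are my own illustrative choices, not part of any particular product.

```python
def frames_to_ocr(total_frames: int, fps: float, interval_s: float = 2.0) -> list:
    """Return the frame indices worth passing to an OCR engine,
    sampling one frame roughly every `interval_s` seconds."""
    # Number of frames between samples; clamp to at least 1
    # so very short intervals still advance frame by frame.
    step = max(1, int(round(fps * interval_s)))
    return list(range(0, total_frames, step))

# Example: a 10-second clip at 30 fps, sampled every 2 seconds.
print(frames_to_ocr(total_frames=300, fps=30.0, interval_s=2.0))
# → [0, 60, 120, 180, 240]
```

Each sampled frame would then be handed to an ordinary OCR engine (Tesseract is a well-known open-source option); in practice, systems also deduplicate near-identical frames so the same slide is not OCRed repeatedly.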
By strict definition, Optical Character Recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo or from subtitle text superimposed on an image (source: Wikipedia). The first OCR machine that read characters and converted them into standard telegraph code was invented by Emanuel Goldberg in 1914!
Unfortunately, one hundred years on, OCR technology still has some way to go, especially in supporting more languages and recognizing handwriting. However, with advances in A.I. and machine learning, the hope is that researchers can extend what OCR can do today.
Meanwhile, Video OCR is giving OCR a new lease of life by adding another dimension: moving images. Given the number of videos that have never been OCRed and the volume of video being generated every day, the potential for Video OCR is immense.