What is a Video Search Engine? Part III – Detecting Motion

In Parts I and II, we examined how to search Speech and Text inside videos. In Part III, we look at the element behind one of the first names given to videos: the “Motion” Picture.

So, do all videos have motion? Not all the time. Many videos contain long stretches with no movement at all, especially security and surveillance footage.

video search as a service motion detection

Detecting motion in videos lets you efficiently identify sections of interest within an otherwise long and uneventful video. That might sound simple with a single video, but what if you have 10,000 hours of video to review every night? Eyeballing every minute of it is a near-impossible task.

Motion detection can be used on static camera footage to identify sections of the video where motion occurs.

  • Detect when motion has occurred in videos with stationary backgrounds
  • Eliminate false positives caused by light changes, shadows, small insects, and similar disturbances
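
As a rough illustration of the two points above, here is a minimal frame-differencing sketch in plain Python. It is not how VideoSpace or any particular product works; the function names, the grayscale list-of-lists frame format, and both thresholds are assumptions made for the example. A global lighting change is cancelled out by subtracting the shift in median brightness, and tiny changes (such as an insect) are ignored by requiring a minimum fraction of changed pixels.

```python
from statistics import median
from typing import List, Sequence

# Assumed frame format: a grayscale image as rows of 0-255 pixel values.
Frame = Sequence[Sequence[int]]


def median_brightness(frame: Frame) -> float:
    """Median pixel value; unlike the mean, a small moving object barely affects it."""
    return median(pixel for row in frame for pixel in row)


def changed_fraction(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> float:
    """Fraction of pixels that changed by more than pixel_threshold,
    after removing the global brightness shift between the two frames."""
    shift = median_brightness(curr) - median_brightness(prev)
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev, curr):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs((c - shift) - p) > pixel_threshold:
                changed += 1
    return changed / total


def detect_motion(frames: List[Frame], min_fraction: float = 0.02,
                  pixel_threshold: int = 25) -> List[int]:
    """Return the indices of frames that differ noticeably from the previous frame."""
    hits = []
    for i in range(1, len(frames)):
        if changed_fraction(frames[i - 1], frames[i], pixel_threshold) >= min_fraction:
            hits.append(i)
    return hits


# Tiny synthetic example: an empty scene, the same scene with brighter
# lighting, and then a 4x4 object appearing in the brighter scene.
background = [[10] * 10 for _ in range(10)]
brighter = [[40] * 10 for _ in range(10)]
with_object = [row[:] for row in brighter]
for y in range(3, 7):
    for x in range(3, 7):
        with_object[y][x] = 200

print(detect_motion([background, brighter, with_object]))  # prints [2]
```

Only the frame where the object appears is reported; the lighting change on its own is not. Real systems typically use more robust background models, but the idea of comparing each frame against a stationary background is the same.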

While there are motion sensors that can detect motion in real time, these systems tend to be expensive, which is why most CCTV surveillance systems only record at best. Fortunately, many scenarios do not require real-time motion detection because the recorded footage can be analyzed afterwards, such as detecting a car entering a bus lane during peak hours.

video search engine - bus lane detection

Current technology can differentiate between real motion (such as a person walking into a room) and false positives (such as leaves in the wind, shadows or light changes). This allows you to generate security alerts from camera feeds without being spammed with endless irrelevant events, and to extract moments of interest from extremely long surveillance videos.

To find out more about how you can detect motion inside your videos, visit VideoSpace Video Search Engine or our Video-Search-as-a-Service.

What is a Video Search Engine? Part II – Searching Text

In Part I, we found out that there are 7,099 living languages in the world, counting both written and spoken-only languages. According to Ethnologue (20th edition), 3,866 of those 7,099 living languages have a developed writing system.

This leads us to the second part of our series: searching Text inside a video. After Speech, Text is probably the second most important element from which we can extract data.

Take, for example, a presentation or talk. The speaker will usually augment the session with a set of slides, so besides the speaker’s voice, the text in the slides is another set of data that can be captured. This is important because what speakers say and what they present in their slides can be vastly different.

Text that can be OCRed during a presentation

The technology used to capture text inside a video is called Video OCR (Optical Character Recognition). Video OCR is derived from OCR, a technology that has been around for a long time.

By strict definition, Optical Character Recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image (source: Wikipedia). An early OCR machine, which read characters and converted them into standard telegraph code, was developed by Emanuel Goldberg in 1914!

Unfortunately, more than one hundred years on, OCR technology still has some way to go, especially in supporting more languages and recognizing handwriting. With advances in A.I. and Machine Learning, the hope is that researchers can extend what OCR can do today.

Meanwhile, Video OCR is giving OCR a new lease of life by adding another dimension: moving images. Given the number of videos that have never been OCRed and the number of videos being generated every day, the potential for Video OCR is immense.
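
To make the “moving images” dimension concrete: a slide usually stays on screen for many frames, so running OCR on sampled frames yields the same text over and over. The sketch below shows one possible way to collapse such repeats into time spans. It assumes the OCR step has already been run by some engine and has produced (timestamp in seconds, recognized text) pairs; the function name and data shapes are hypothetical and chosen only for this illustration.

```python
from typing import List, Tuple


def merge_frame_text(samples: List[Tuple[float, str]]) -> List[Tuple[float, float, str]]:
    """Collapse consecutive frames showing the same text into (start, end, text) spans.

    `samples` is assumed to be the output of an OCR engine run on sampled
    frames, ordered by timestamp. Frames with no recognized text are skipped
    and also break a run of identical text.
    """
    spans: List[Tuple[float, float, str]] = []
    previous = ""
    for timestamp, text in samples:
        text = " ".join(text.split())  # normalise whitespace differences between frames
        if text and text == previous:
            start, _, _ = spans[-1]
            spans[-1] = (start, timestamp, text)  # extend the current span
        elif text:
            spans.append((timestamp, timestamp, text))  # new text appears on screen
        previous = text
    return spans


# Hypothetical OCR output for five sampled frames of a presentation.
samples = [
    (0.0, "Agenda"),
    (1.0, "Agenda "),
    (2.0, ""),
    (3.0, "Q3  Results"),
    (4.0, "Q3 Results"),
]
for start, end, text in merge_frame_text(samples):
    print(f"{start:>5.1f}s to {end:>5.1f}s: {text}")
```

Each span can then be indexed like any other text, with the start time telling a viewer where in the video that text appears.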

To find out more about how you can search TEXT inside your videos, visit VideoSpace Video Search Engine or our Video-Search-as-a-Service.

What is a Video Search Engine? Part I - Searching Speech

Of all formats, videos are the most difficult to search. Typically, current search engines can only search the "Title" and "Metadata" of videos, which are manually keyed in by a human. There is no way to search the content inside the video. For example, how do you find a specific piece of news in a news clip? Or specific words that appear inside a video? How can you find them without actually watching the videos yourself?

Before we even get into the question of what a video search engine is, we need to understand what we can search inside a video. Elements can include Speech, Words (or Text), Motion, Emotions, Faces and Objects.

Video Search as a Service

To kick off this “What is a Video Search Engine?” series, let’s tackle the most obvious of the elements – Speech.

In an hour, a person can say up to 9,000 words. Given the rate at which videos are being produced today, that’s a lot of words. According to the Ethnologue catalogue of world languages, there are currently 7,099 living languages. Obviously, Speech Recognition technology has not been able to keep up with this vast number of languages. However, the good news is (depending on how you see things) that just 23 languages account for more than half of the world’s population.

Languages in the World (Source: www.ethnologue.com)

On the technical aspect of searching speech in videos, the following process is required: 

  1. Transcribe (Speech-to-Text) – transcribe the speech in the video
  2. Index – make the transcribed speech searchable
  3. Search – bring users to exactly where the search terms occur in the video
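
The Index and Search steps can be sketched with a small inverted index. This is only an illustration under stated assumptions: transcription is taken as already done, producing (start time in seconds, text) segments, and the function names and the rule that all query words must occur in the same segment are choices made for this example, not a description of any specific product.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed transcript format: (start time in seconds, transcribed text).
Segment = Tuple[float, str]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments: List[Segment]) -> Dict[str, Set[float]]:
    """Index step: map each word to the start times of the segments containing it."""
    index: Dict[str, Set[float]] = defaultdict(set)
    for start, text in segments:
        for word in tokenize(text):
            index[word].add(start)
    return index


def search(index: Dict[str, Set[float]], query: str) -> List[float]:
    """Search step: start times of segments that contain every word in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


# Hypothetical transcript of a short news clip.
segments = [
    (0.0, "Good evening, here is the news."),
    (12.5, "The bus lane rules change on Monday."),
    (30.0, "Weather: rain on Monday."),
]
index = build_index(segments)
print(search(index, "Monday"))    # prints [12.5, 30.0]
print(search(index, "bus lane"))  # prints [12.5]
```

The returned timestamps are what lets a player jump straight to the moment the words are spoken instead of making the viewer watch the whole clip.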

These steps might sound simple, but transcribing speech is filled with problems. Several factors can affect the accuracy of speech recognition, for example:

  • heavy localized accent
  • low speech volume
  • bad diction
  • heavy background noise
  • multiple voices speaking at the same time

With the above in mind, many videos are “not suitable” for machine transcription: movies, TV shows, anything with mixed audio and sound effects, and poorly recorded content with background noise (hiss).

To find out more about how you can search speech inside your videos, visit VideoSpace Video Search Engine or our Video-Search-as-a-Service.