4

What kind of semantic information can be extracted from such media? Anything would be fine, be it differentiation between music and spoken text, detection of distinct sounds (like gunshots or birds or cars), detecting indoor/outdoor takes or intensity of camera motion.

I know that there are many, many, many, manymanymany research topics in this category, but I didn't find any applications of any of these. Does anybody have links to applications / libraries / working prototypes / news about upcoming products on these topics?

soulmerge
  • You mean something like the EXIF information in JPEGs? – Will Morgan Jul 29 '09 at 11:11
  • No, not technical information about the media, but something like: 'This picture was taken outdoors' or 'There are people in this picture' or 'This audio track contains spoken text' or 'This audio track contains music' – soulmerge Jul 29 '09 at 11:26

3 Answers

1

Have a look at MP4REG, which is the registration authority for code-points in "MP4 Family" files.

Short primer: Within the MPEG4 & QuickTime world, the basic physical building block of media is called an "Atom". Atoms can contain not only the actual audio and video, but also technical and non-technical metadata. The latter sounds like what you are interested in.

E.g.:

  • albm: Album title and track number (user-data)
  • jp2i: intellectual property information

I've only looked closely at this stuff once, with respect to metadata, and my impression was that it is a fast and loose world. You might want to look at some low-level MP4 parsing tools that will let you inspect the individual atoms of real-world media files. I think there are even unofficial (unregistered), custom atoms for use within specific systems.
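To give an idea of what inspecting atoms at a low level involves, here is a minimal Python sketch. It is my own illustration, not one of the tools mentioned above, and `iter_atoms` is a hypothetical helper name. It relies only on the basic header layout: a 4-byte big-endian size, a 4-character type code, a size of 1 meaning a 64-bit size follows, and a size of 0 meaning the atom runs to the end of the file. It does not interpret any atom's contents.

```python
import io
import struct

def iter_atoms(stream, end=None):
    """Yield (type, payload_offset, payload_size) for each atom at one nesting level.

    `end` is the absolute offset where the enclosing container stops
    (None means "until the end of the stream").
    """
    while True:
        start = stream.tell()
        if end is not None and start >= end:
            return
        header = stream.read(8)
        if len(header) < 8:
            return  # no complete header left
        size, kind = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A 64-bit size follows the type field.
            extended = stream.read(8)
            if len(extended) < 8:
                return
            size = struct.unpack(">Q", extended)[0]
            header_len = 16
        elif size == 0:
            # Atom extends to the end of the file (or, as an assumption
            # of this sketch, to the end of the enclosing container).
            if end is None:
                stream.seek(0, io.SEEK_END)
                limit = stream.tell()
            else:
                limit = end
            size = limit - start
        if size < header_len:
            return  # malformed atom, stop rather than loop forever
        yield kind.decode("latin-1"), start + header_len, size - header_len
        stream.seek(start + size)  # jump to the next sibling atom

# Synthetic bytes for demonstration only (not a playable file).
demo = (
    struct.pack(">I4s", 12, b"ftyp") + b"isom"
    + struct.pack(">I4s", 8, b"free")
    + struct.pack(">I4s", 0, b"mdat") + b"abc"
)
for kind, offset, size in iter_atoms(io.BytesIO(demo)):
    print(kind, offset, size)
```

For a real file you would open it in binary mode and pass the file object instead of the `BytesIO`. To descend into a container atom, seek to its payload offset and call `iter_atoms` again with `end=offset + size`.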

Stu Thompson
  • The library itself looks very interesting. But if I understood it correctly, it only provides a.) technical data and b.) data that was entered by the user. I'm rather looking for information that is extracted through analysis of the media. – soulmerge Nov 02 '09 at 09:14
  • It can provide more than just technical data. But, yes, it is just data that is specifically entered in by the creating/managing system. – Stu Thompson Nov 02 '09 at 11:44
0

To find applications of this, you might want to look at the research topic of "Content Based Video Retrieval and Indexing".

Other than that:

  • You can use learning techniques to classify the information received (video, single frames, or audio)
  • You can use clustering techniques to find similar sections of audio or video

One application of this is commercial removal. Commercial removers typically use a clustering approach to identify and eliminate commercial segments in TV video.
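As a rough illustration of the clustering idea (a toy sketch, not how any particular commercial remover works), the Python below splits an audio signal into frames, computes two simple features per frame (short-time energy and zero-crossing rate), and groups the frames with plain k-means. The function names, the choice of features, the farthest-first initialisation and the synthetic tone-plus-noise signal are all assumptions made for this example.

```python
import math
import random

def frame_features(samples, frame_len):
    """Return one (energy, zero-crossing rate) pair per non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))
        features.append((energy, crossings / (frame_len - 1)))
    return features

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_labels(points, k=2, iterations=10):
    """Cluster points with plain k-means; returns one cluster index per point."""
    # Deterministic farthest-first initialisation (a simplification for this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Synthetic signal (not real media): 0.5 s of a 220 Hz tone, then 0.5 s of noise.
RATE = 8000
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 220 * n / RATE) for n in range(RATE // 2)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(RATE // 2)]
labels = kmeans_labels(frame_features(tone + noise, frame_len=400))
print(labels)  # one cluster index per 50 ms frame
```

Frames with the same label form a "similar section"; a change of label between neighbouring frames marks a candidate segment boundary. Real material would need richer features (spectral or visual) and feature scaling, which this sketch leaves out.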

monksy
0

Music feature analysis is a huge topic these days. Imagine the possibilities! http://en.wikipedia.org/wiki/Music_information_retrieval

Also, check out the Conet Project: http://www.archive.org/details/ird059

just_wes