Deep Video Audio Match Network

A Deep Network for Human Video and Audio Matching

Abstract

Teaching a model to understand dancing and music is an attractive goal. The task involves not only a dancer's motion beats but also the emotion conveyed by the body. Music itself is rich in connotation and high-dimensional, which makes it complicated to model. We propose DVAMN, a model that serves as a music recommendation system for soundless dance videos. It can comprehend human motion and emotion, and it also understands music well. Thanks to the YouTube-8M dataset, we train and validate mainly on dance videos from YouTube. After careful parameter tuning, we build a music library to replace the original audio of the videos, let our model select tracks from it, and visualize the results. We will open-source our code on GitHub for discussion and keep updating it as new findings emerge.
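The abstract describes selecting music from a library for a soundless dance video, but does not specify the matching mechanism. A common way to implement such a recommender is embedding-based retrieval: embed the video and every library track into a shared space, then rank tracks by cosine similarity. The sketch below illustrates that idea only; the function name `recommend_music`, the embedding dimension, and the library size are all hypothetical, not taken from the paper.

```python
import numpy as np

def recommend_music(video_emb, music_embs, top_k=3):
    """Rank music clips by cosine similarity to a video embedding.

    video_emb:  (d,) embedding of the soundless dance video.
    music_embs: (n, d) embeddings of the n tracks in the music library.
    Returns the indices of the top_k tracks and their similarity scores.
    """
    v = video_emb / np.linalg.norm(video_emb)
    m = music_embs / np.linalg.norm(music_embs, axis=1, keepdims=True)
    scores = m @ v                       # cosine similarity per track
    order = np.argsort(-scores)[:top_k]  # best matches first
    return order, scores[order]

# Toy example: a 4-D embedding space with a 5-track library.
rng = np.random.default_rng(0)
video = rng.normal(size=4)
library = rng.normal(size=(5, 4))
idx, sims = recommend_music(video, library)
```

In a real pipeline the embeddings would come from the trained video and audio towers of the network rather than random vectors.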

Author

Junqi Liu, Peiming Yang, Lesheng Jin

Code

Our source code is available on Github.

Paper

Our paper is available here.

Match Demo