Learning a Convolutional Bilinear Sparse Code for Natural Videos

In contrast to the monolithic architectures used in deep learning for computer vision today, the visual cortex processes retinal images via two functionally distinct but interconnected networks: the ventral pathway for processing object-related information and the dorsal pathway for processing motion and transformations. Inspired by this cortical division of labor and by properties of the magno- and parvocellular systems, we explore an unsupervised approach to feature learning that jointly learns object features and their transformations from natural videos. We propose a new convolutional bilinear sparse coding model that (1) allows independent feature transformations and (2) is capable of processing large images. Our learning procedure leverages smooth motion in natural videos. Our results show that the model can learn groups of features and their transformations directly from natural videos in a completely unsupervised manner. The learned "dynamic filters" exhibit certain equivariance properties, resemble cortical spatiotemporal filters, and capture the statistics of transitions between video frames. Our model can be viewed as one of the first approaches to demonstrate unsupervised learning of primary "capsules" (proposed by Hinton and colleagues for supervised learning), and it has strong connections to the Lie group approach to visual perception.
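To make the bilinear factorization concrete, here is a minimal toy sketch of a (non-convolutional, patch-based) bilinear sparse coding model in NumPy. All dimensions, the alternating-gradient inference procedure, and the regularization constants are illustrative assumptions, not the paper's actual convolutional model or learning algorithm: a patch is reconstructed as a sum over feature groups, where each group combines "what" coefficients `a` with per-group "how" (transformation) coefficients `b`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
D = 64   # pixels per patch
K = 8    # object feature groups ("what")
M = 4    # transformation components per group ("how")

# Bilinear dictionary: each group k has M basis elements Psi[k, m] in R^D.
Psi = rng.standard_normal((K, M, D))
Psi /= np.linalg.norm(Psi, axis=-1, keepdims=True)

def reconstruct(a, b):
    """Bilinear generative model: x_hat = sum_k a_k * sum_m b_{k,m} Psi[k, m].
    a: (K,) sparse feature coefficients; b: (K, M) transformation coefficients."""
    return np.einsum('k,km,kmd->d', a, b, Psi)

def infer(x, n_steps=300, lr=0.05, lam=0.05):
    """Proximal gradient descent on 0.5*||x - x_hat||^2 + lam*||a||_1 --
    a crude illustrative stand-in for the paper's inference procedure."""
    a = np.zeros(K)
    b = rng.standard_normal((K, M)) * 0.5
    for _ in range(n_steps):
        r = x - reconstruct(a, b)  # residual
        grad_a = -np.einsum('d,km,kmd->k', r, b, Psi)
        grad_b = -np.einsum('d,k,kmd->km', r, a, Psi)
        a = a - lr * grad_a
        a = np.sign(a) * np.maximum(np.abs(a) - lr * lam, 0.0)  # soft threshold
        b = b - lr * grad_b
    return a, b

# Synthesize a patch from two active feature groups, then recover a sparse code.
a_true = np.zeros(K)
a_true[[1, 5]] = [1.0, -0.7]
b_true = rng.standard_normal((K, M))
x = reconstruct(a_true, b_true)
a_hat, b_hat = infer(x)
```

The key design point the sketch illustrates is the factorization itself: the same feature (indexed by `k`) can appear under different transformations by varying `b[k]` while `a[k]` stays fixed, which is the "capsule"-like separation of identity from pose described in the abstract.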
