Weakly Supervised Learning for Big Multimedia Analysis

With the explosive growth of visual and acoustic signal data in local and cloud data centers, and the growing number of social-networking sites, big data has become pervasive in many multimedia applications, e.g., large-scale image and video retrieval. Semantically understanding the content of these multimedia data can substantially enhance the applications built on them. The major limitation of many existing models in multimedia and computer vision is that they are built upon low-level visual features and have limited power to predict regional semantics. This problem is known as the “semantic gap” between human perception and low-level visual features. For example, conventional image/video annotators cannot efficiently and effectively label the semantics of these large-scale visual/acoustic data. Many of them are designed heuristically and can only detect a few semantic categories. To bridge the semantic gap of visual data in large-scale applications, weakly supervised learning paradigms have recently been developed. They focus on intelligent mechanisms that transfer image-, video-, or social-level semantics to finer levels, e.g., image regions. Compared with the labor-intensive labeling required in the fully supervised setting, this transfer mechanism can greatly reduce human effort. Extensive efforts have been dedicated to designing weakly supervised learning models that enhance conventional multimedia tasks, while effective tools to manipulate these data are still in their infancy. This special session targets the most recent progress in visual/acoustic semantic understanding with weak supervision. Possible topics include weakly supervised image segmentation/annotation, photo aesthetic ranking/cropping/retargeting, object localization/tracking, and video summarization/recommendation.
This special session also targets the application of new types of weak supervision in semantic modeling, e.g., interactive image rendering and socially aware image search. Its primary objective is to foster focused attention on the latest research progress in this area.

Scope and Topics

  • Visual recognition, video summary, and annotation with weak supervision
  • Integration and ensemble of multimedia classifiers with weak supervision
  • Large-scale database systems and their applications
  • Image/video quality evaluation based on weakly supervised learning
  • Learning weak attributes for multimedia data analysis and modeling
  • Learning weak visual attributes by exploring socially aware cues
  • Learning visual semantics for intelligent traffic systems
  • Human interactive learning for image recognition and processing
  • Visual feature extraction with weak and social supervision
  • Discovering new types of weak supervision for computer vision tasks
  • Weakly-supervised indexing/hashing/ranking techniques for large-scale image and video retrieval
  • Vision models that learn the spatio-temporal context in social media
  • Applying location-aware social media to enhance visual search
  • Different social media visualization techniques
  • Visual semantic understanding in 3D/stereo data
  • Statistical models for weakly-supervised learning
  • Robust model learning with little or no supervision
  • Other related applications of weakly supervised learning


Luming Zhang, National University of Singapore

Yi Yang, The University of Queensland

Liqiang Nie, National University of Singapore


Deadline: January 25, 2015, 11:59 PM PST

Please submit your work using the EasyChair conference website.