Keynote-1: Sep 24 (Thursday), 08:30 AM IST – 09:30 AM IST

Dr. Jingdong Wang 王井东
Senior Principal Research Manager
Microsoft Research
Beijing, China

Title: High-Resolution Network for Visual Recognition

Abstract: Since AlexNet was invented in 2012, there has been rapid development in convolutional neural network architectures for visual recognition. Most milestone architectures, e.g. GoogLeNet, VGGNet, ResNet, and DenseNet, were initially developed for image classification. It’s a golden rule that a classification architecture serves as the backbone for other computer vision tasks.

What’s next for a new architecture that is broadly applicable to general computer vision tasks? Can we design a universal architecture from general computer vision tasks rather than from classification tasks?

We pursued these questions and developed the High-Resolution Network (HRNet), a network that comes from general vision tasks and wins on many fronts of computer vision, including semantic segmentation, human pose estimation, face alignment, and object detection. It is conceptually different from the classification architecture. HRNet is designed from scratch, rather than from the classification architecture, and it breaks the dominant design rule of connecting convolutions in series from high resolution to low resolution, a rule that goes back to LeNet-5.

Profile: Jingdong Wang is a Senior Principal Research Manager with the Visual Computing Group, Microsoft Research, Beijing, China. He received the B.Eng. and M.Eng. degrees from the Department of Automation, Tsinghua University, Beijing, China, in 2001 and 2004, respectively, and the PhD degree from the Department of Computer Science and Engineering, the Hong Kong University of Science and Technology, Hong Kong, in 2007. His areas of interest include deep learning, large-scale indexing, human understanding, and person re-identification. He is an Associate Editor of IEEE TPAMI, IEEE TMM and IEEE TCSVT, and is an area chair (or SPC) of some prestigious AI conferences, such as CVPR, ICCV, ECCV, ACM MM, IJCAI, and AAAI. He is a Fellow of IAPR and an ACM Distinguished Member.

His representative works include the deep high-resolution network (HRNet), interleaved group convolutions, discriminative regional feature integration (DRFI) for supervised saliency detection, neighborhood graph search (NGS) for large-scale similarity search, and composite quantization for compact coding. He has shipped a dozen technologies to Microsoft products, including Bing search, Bing Ads, Cognitive Services, and the XiaoIce chatbot. His NGS algorithm is a fundamental element of many Microsoft products. He developed the Bing image search color filter using his efficient salient object algorithm, as well as the first commercial color-sketch image search system.

Keynote-2: Sep 25 (Friday), 08:30 AM IST – 09:30 AM IST

Prof. Ramesh Jain
University of California, Irvine

Title: Multimodal Augmented Homeostasis

Abstract: Homeostasis is nature’s engineering behind the most complex autonomic system that exists: the human body. Homeostasis is a self-regulating process by which biological systems tend to maintain stability while adjusting to conditions that are optimal for survival. Disruption of homeostasis results in malfunctioning of the natural autonomic system, causing chronic diseases. Chronic diseases have been the leading cause of death and human suffering in the last 50 years. They have also resulted in the highest financial burden for individuals and countries. This disruption can be corrected using external augmentation of the homeostasis loop. Recent progress on the artificial pancreas for Type 1 diabetes is a compelling example of such augmentation. In this paper we discuss emerging multimodal approaches for such augmentation in the context of chronic diseases. We show that multimodal sensing and fundamental technology developed by the multimedia computing community may offer powerful augmentation of natural homeostasis to assist in the management of chronic diseases.

Profile: Ramesh Jain is an entrepreneur, researcher, and educator. He is a Donald Bren Professor in Information & Computer Sciences at the University of California, Irvine. His research interests have covered Control Systems (cybernetics), Computer Vision, Artificial Intelligence, and Multimedia Computing. His current research passion is addressing health issues using cybernetic principles, building on progress in sensors, mobile, processing, artificial intelligence, computer vision, and storage technologies. He is the founding director of the Institute for Future Health at UCI. He is a Fellow of AAAS, ACM, IEEE, AAAI, IAPR, and SPIE. Ramesh co-founded several companies, managed them in their initial stages, and then turned them over to professional management. He enjoys new challenges and likes to use technology to solve them. He is participating in addressing the biggest challenge for us all: how to enjoy a long life in good health.

Keynote-3: Sep 26 (Saturday), 08:30 AM IST – 09:30 AM IST

Debdoot Mukherjee
Vice President, AI, ShareChat

Title: Multimodal learning in practice

Abstract: Multimodal learning aims to develop models that process information from multiple heterogeneous inputs, viz. visual, audio, and text. This multidisciplinary field has been growing in importance recently, since it has shown great prowess in improving machine understanding of real-world content. This talk provides an overview of key applications of multimodal learning in social media platforms such as ShareChat. We take a deeper dive into the problem of holistic understanding of videos. Specifically, we discuss different approaches for creating multimodal representations of videos from noisy, user-defined tags as well as from data on video watches. We present several examples of topics where current techniques can achieve human-level cognition, and other examples where gaps remain and hence offer opportunities for future research.

Profile: Debdoot is Vice President, AI at ShareChat, where he leads a team in the areas of feed relevance, recommender systems, and multimodal intelligence. Debdoot has over 12 years of experience in building innovative AI products in the social, mobile, and e-commerce domains. Prior to ShareChat, Debdoot led the AI team at Hike Messenger, where he created several novel applications around conversation modelling in Indic languages, image and video understanding, massive-scale social graph mining, etc. Previously, he led AI programs at Myntra for applications such as personalized search, product discovery, marketing, and merchandising intelligence. Prior to Myntra, Debdoot worked at IBM Research, where he investigated problems related to enterprise search and information extraction. Debdoot is a gold medallist from IIT Delhi, where he graduated with a Master’s degree in Computer Science & Engineering.