<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><span style="font-family: Tahoma; font-size: 13.333333015441895px;" class="">The 6th Workshop on Vision and Language (VL’17)</span><br style="font-family: Tahoma; font-size: 13.333333015441895px;" class=""><div style="font-family: 'Times New Roman'; font-size: 16px;" class=""><div style="direction: ltr; font-family: Tahoma; font-size: 10pt;" class=""><div style="font-family: 'Times New Roman'; font-size: 16px;" class=""><div style="direction: ltr; font-family: Tahoma; font-size: 10pt;" class="">At EACL’17 in Valencia, Spain<br class=""><br class="">Computational vision-language integration is commonly taken to mean the process of associating visual and corresponding linguistic pieces of information. Fragments of natural language, in the form of tags, captions, subtitles, surrounding text or audio, can aid the interpretation of image and video data by adding context or disambiguating visual appearance. Labelled images are essential for training object or activity classifiers. Visual data can help resolve challenges in language processing such as word sense disambiguation, language understanding, machine translation and speech recognition. Sign languages and gestures are languages that require visual interpretation. Studying language and vision together can also provide new insight into cognition and universal representations of knowledge and meaning. The focus of researchers in these areas is increasingly turning towards models for grounding language in action and perception.
There is growing interest in models that are capable of learning from, and exploiting, multi-modal data, which involves constructing semantic representations from both linguistic and visual or perceptual input.<br class=""><br class="">The 6th Workshop on Vision and Language (VL’17) aims to address all of the above, with a particular focus on the integrated modelling of vision and language. We welcome papers describing original research combining language and vision. To encourage the sharing of novel and emerging ideas, we also welcome papers describing new datasets, grand challenges, open problems, benchmarks and work in progress, as well as survey papers.<br class="">Topics of interest include, but are not limited to (in alphabetical order):<br class=""><br class="">* Computational modelling of human vision and language<br class="">* Computer graphics generation from text<br class="">* Cross-lingual image captioning<br class="">* Detection/segmentation by referring expressions<br class="">* Human-computer interaction in virtual worlds<br class="">* Human-robot interaction<br class="">* Image and video description and summarisation<br class="">* Image and video labelling and annotation<br class="">* Image and video retrieval<br class="">* Language-driven animation<br class="">* Machine translation with visual enhancement<br class="">* Medical image processing<br class="">* Models of distributional semantics involving vision and language<br class="">* Multi-modal discourse analysis<br class="">* Multi-modal human-computer communication<br class="">* Multi-modal machine translation<br class="">* Multi-modal temporal and spatial semantics recognition and resolution<br class="">* Recognition of narratives in text and video<br class="">* Recognition of semantic roles and frames in text, images and video<br class="">* Retrieval models across different modalities<br class="">* Text-to-image generation<br class="">* Visual question 
answering / visual Turing challenge<br class="">* Visually grounded language understanding<br class="">* Visual storytelling<br class=""><br class="">Accepted poster submissions will be presented in the form of brief ’teaser’ presentations, followed by a poster presentation during the workshop poster session, and will be published in the VL'17 proceedings.<br class=""><br class=""><br class="">Poster Abstract Submission<br class=""><br class="">Abstracts for posters should be up to 2 pages long, plus references. Submissions should adhere to the EACL 2017 format (style files available at <a href="http://eacl2017.org/index.php/calls/call-for-papers" class="">http://eacl2017.org/index.php/calls/call-for-papers</a>) and should be in PDF format.<br class=""><br class="">Please make your submission via the workshop submission page: <font size="2" class=""><span style="font-size: 10pt;" class=""><a href="https://www.softconf.com/eacl2017/VL2017" target="_blank" class="">https://www.softconf.com/eacl2017/VL2017</a></span></font><br class=""><br class=""><br class="">Important Dates<br class=""><br class="">Feb 28, 2017: Workshop Poster Abstracts Due<br class="">Mar 5, 2017: Notification of Acceptance<br class="">Mar 10, 2017: Camera-ready Abstracts Due<br class="">April 4, 2017: VL'17 Workshop<br class=""><br class="">Programme Committee<br class=""><br class="">Raffaella Bernardi, University of Trento, Italy<br class="">Darren Cosker, University of Bath, UK<br class="">Aykut Erdem, Hacettepe University, Turkey<br class="">Jacob Goldberger, Bar Ilan University, Israel<br class="">Jordi Gonzalez, Autonomous University of Barcelona, Spain<br class="">Frank Keller, University of Edinburgh, UK<br class="">Douwe Kiela, University of Cambridge, UK<br class="">Adrian Muscat, University of Malta, Malta<br class="">Arnau Ramisa, IRI UPC Barcelona, Spain<br class="">Carina Silberer, University of Edinburgh, UK<br class="">Caroline Sporleder, Germany<br 
class="">Josiah Wang, University of Sheffield, UK<br class="">Further members t.b.c.<br class=""><br class=""><br class="">Organisers<br class=""><br class="">Anya Belz, University of Brighton, UK<br class="">Katerina Pastra, Cognitive Systems Research Institute (CSRI), Athens, Greece<br class="">Erkut Erdem, Hacettepe University, Turkey<br class="">Krystian Mikolajczyk, Imperial College London, UK<br class=""><br class="">Contact<br class=""><br class=""><a href="mailto:a.s.belz@brighton.ac.uk" class="">a.s.belz@brighton.ac.uk</a><br class=""><a href="http://vision.cs.hacettepe.edu.tr/vl2017/" class="">http://vision.cs.hacettepe.edu.tr/vl2017/</a><br class=""><br class=""><br class="">This workshop is organised by European COST Action IC1307: The European Network on Integrating Vision and Language (iV&L Net).<br class=""><br class="">The explosive growth of visual and textual data (both on the World Wide Web and held in private repositories by diverse institutions and companies) has led to urgent requirements in terms of search, processing and management of digital content. Solutions for providing access to or mining such data depend on bridging the semantic gap between vision and language, which in turn calls for expertise from two hitherto largely unconnected fields: Computer Vision (CV) and Natural Language Processing (NLP). The central goal of iV&L Net is to build a European CV/NLP research community, targeting four focus themes: (i) Integrated Modelling of Vision and Language for CV and NLP Tasks; (ii) Applications of Integrated Models; (iii) Automatic Generation of Image & Video Descriptions; and (iv) Semantic Image & Video Search. iV&L Net will organise annual conferences, technical meetings, partner visits, data/task benchmarking, and industry/end-user liaison. Europe has many of the world's leading CV and NLP researchers. 
Tapping into this expertise, and bringing to bear the collaboration, networking and community building enabled by COST Actions, iV&L Net will have substantial impact, in terms of advances in both theory/methodology and real-world technologies.<br class=""></div></div></div></div></body></html>