Computer Chapter

The Computer Chapter's field of interest covers all aspects of the theory, design, practice and application of methods and systems related to computing and information processing. The chapter's activities have scientific, professional, educational and social components. By exchanging technical information and scientific findings, the chapter seeks to advance the profession and maintain a high professional standing among its members. In addition, by organizing scientific and professional lectures and discussions and by publishing technical journals, it promotes multidisciplinary cooperation with other professions and the wider community.

Chapter leadership

Term ends 31 December 2024.

Lucija Petricioli

Chair


Hana Ivandić

Vice Chair


Invitation to a lecture: A Probabilistic Framework for Modeling Cross-Lingual Semantic Similarity Based on Latent Cross-Lingual Concepts (out of and in Context)

The Computer Chapter and the Computational Intelligence Chapter of the IEEE Croatia Section invite you to the lecture

"A Probabilistic Framework for Modeling Cross-Lingual Semantic Similarity Based on Latent Cross-Lingual Concepts (out of and in Context)"

to be given by Ivan Vulić, a doctoral student in the Language Intelligence & Information Retrieval Group, Department of Computer Science, KU Leuven, Belgium. The lecture will take place on Tuesday, 10 December 2013, at 16:00 in the Grey Hall (Siva vijećnica), Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3.

The lecture is open to everyone interested.

The abstract of the lecture and the speaker's biography are given below.

Abstract

Following the ongoing growth of the World Wide Web and its omnipresence in today's increasingly connected world, users tend to abandon English as the lingua franca of the global network, as more and more content becomes available in their native languages. In addition, given the rapid development of online encyclopedias such as Wikipedia, the blogosphere, and online news portals, users have simultaneously generated a huge volume of multilingual text resources. There is a pressing need for tools that can induce knowledge from these user-generated multilingual text resources and accomplish cross-lingual text processing automatically or with minimal human intervention.

In this talk we address cross-lingual semantic similarity, the task of detecting words (or more generally, text units) that address similar semantic concepts and convey similar meanings across languages. Models of cross-lingual similarity are typically used to automatically induce bilingual lexicons and have found numerous applications in information retrieval (IR), statistical machine translation (SMT) and other natural language processing (NLP) tasks.

Research into corpus-based cross-lingual models of distributional similarity has focused on building context-insensitive models of cross-lingual similarity that typically rely on external resources such as readily available bilingual lexicons or parallel data to bridge the lexical chasm between two languages. In this talk we follow a completely new research path and present a new probabilistic approach to modeling cross-lingual semantic similarity (out of and in context) that is fully data-driven, as it does not rely on any resources besides a (non-parallel) multilingual corpus. The framework relies on the idea of projecting words and sets of words into a shared latent semantic space spanned by language-pair independent latent cross-lingual semantic concepts (e.g., cross-lingual topics obtained by a multilingual topic model). These latent concepts are induced from a comparable corpus without any additional lexical resources. Word meaning is represented as a probability distribution over the latent cross-lingual concepts, and a change in meaning is represented as a change in the distribution over these latent concepts.

The first part of this talk provides a crash course on multilingual text mining models with an emphasis on the multilingual topic modeling approach. These models are used to induce the latent cross-lingual concepts from multilingual data. In the second part of the talk, we present a systematic overview of the context-insensitive models of cross-lingual similarity that are built upon the paradigm of latent cross-lingual concepts. We compare these models on the task of bilingual lexicon extraction (BLE). The final part of this talk reports on novel work, as we complement the current state-of-the-art research by presenting an extension of the probabilistic framework towards context-aware models of cross-lingual similarity. We describe new models of similarity that modulate the isolated out-of-context word representations with contextual knowledge and report our findings on the task of word translation in context.
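The out-of-context part of this idea can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the speaker's implementation: the per-concept word probabilities below are made-up toy numbers rather than the output of a trained multilingual topic model, a uniform prior over concepts is assumed when computing P(concept | word), and the Bhattacharyya coefficient is one possible similarity measure chosen here for simplicity. All function and variable names are hypothetical.

```python
import math

# Toy per-concept word distributions P(word | concept), one list per language.
# Index k in both lists refers to the same latent cross-lingual concept.
# The numbers are invented for illustration only.
EN_TOPICS = [
    {"goal": 0.5, "match": 0.4, "bank": 0.1},  # concept 0 (roughly: sport)
    {"bank": 0.6, "loan": 0.4},                # concept 1 (roughly: finance)
]
HR_TOPICS = [
    {"gol": 0.6, "utakmica": 0.4},             # concept 0
    {"banka": 0.5, "kredit": 0.5},             # concept 1
]


def word_to_concept_distribution(word, topics):
    """Return P(concept | word) for each concept, assuming a uniform concept prior."""
    scores = [topic.get(word, 0.0) for topic in topics]
    total = sum(scores)
    if total == 0.0:
        raise ValueError(f"word {word!r} has no probability under any concept")
    return [score / total for score in scores]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two discrete distributions (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def rank_translations(source_word, source_topics, target_topics):
    """Rank all target-language words by similarity to the source word."""
    p = word_to_concept_distribution(source_word, source_topics)
    target_vocab = sorted({w for topic in target_topics for w in topic})
    scored = [
        (w, bhattacharyya(p, word_to_concept_distribution(w, target_topics)))
        for w in target_vocab
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for word in ("goal", "bank"):
        print(word, rank_translations(word, EN_TOPICS, HR_TOPICS))
```

Because both languages share the same concept indices, words from different languages become directly comparable without a bilingual lexicon; the top-ranked target words can then serve as candidate entries for bilingual lexicon extraction.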

Biography

Ivan Vulić was born in Zadar, Republic of Croatia, on May 7th, 1986. He received the degree of Master in Computer Science from the Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia, in September 2009. He was awarded the bronze plaque "Josip Lončar" as the best graduating student in his class. In November 2009, he joined the LIIR (Language Intelligence & Information Retrieval) research group at the Department of Computer Science, KU Leuven, Belgium, as a predoctoral student. In April 2010, he started his Ph.D. program with an emphasis on models of multilingual text mining and their applications in natural language processing (NLP) and information retrieval (IR), and he has contributed to more than 15 research papers. His main research interests are information retrieval, natural language processing, and machine learning theory and applications, mostly in multilingual settings, including models of semantic similarity and bilingual lexicon extraction, cross-lingual information retrieval, multilingual topic modeling, terminology mining and alignment, machine translation, document categorization and classification, unsupervised techniques for languages with scarce resources, text mining and information extraction.

He has been a member of the Association for Computational Linguistics (ACL) since 2011. Since 2011 he has also served as an elected member of the student board of the European Chapter of the Association for Computational Linguistics (EACL).

3 December 2013

Edited: 5 December 2013 at 16:59

Author: Dejan Škvorc
