The HyTra-4 workshop (Hybrid Approaches to Translation, 4th edition) aims to provide a communication platform and to inform the research agenda on theoretical and practical issues of hybrid MT. It focuses specifically on the problems, methodologies, resources and theoretical ideas that originate outside the mainstream MT paradigm but have the potential to enhance the quality of state-of-the-art MT systems by bringing a diverse range of technologies, methods and tools into the MT domain.
HyTra-4 intends to build on the findings of the three previous editions: the third HyTra workshop, held at EACL 2014; the second, held at ACL 2013; and the first, held together with the ESIRMT workshop as a joint two-day workshop at EACL 2012. The first editions of HyTra brought together researchers working on diverse aspects of hybrid machine translation. Full paper sessions allowed for fruitful, creative and interdisciplinary discussions, comparing and contrasting diverse ways to integrate MT paradigms. Project sessions led to discussions and interactions between European project leaders and researchers, identifying challenges to progress in MT. The HyTra proceedings collected high-quality papers on current topics, including statistical approaches that integrate morphological, syntactic, semantic and rule-based information.
Machine Translation (MT) is a highly interdisciplinary and multidisciplinary field, since it is approached from the perspectives of human translators, engineers, computer scientists, mathematicians and linguists. This workshop aims to encourage cooperation and interaction among them and to foster innovative combinations of the two main MT paradigms: statistical and rule-based.
The advantages of statistical MT are fast development cycles, low cost, robustness, superior lexical selection and relative fluency due to the use of language models. However, pure statistical MT also has disadvantages: it needs large amounts of data, which are not available for many language pairs and are unlikely to become available in the foreseeable future. This problem is especially relevant for under-resourced languages. Recent advances in factored morphological models and syntax-based models in SMT indicate that non-statistical symbolic representations and processing models need to have their proper place in MT research and development; more research is needed to understand how to develop and integrate these non-statistical models most efficiently.
The advantages of rule-based MT are that its rules and representations are geared towards human understanding and can be more easily checked, corrected and exploited for applications outside machine translation, such as dictionaries, text understanding and dialog systems. However, pure rule-based MT also has severe disadvantages, among them slow development cycles, high cost, a lack of robustness in the case of incorrect input, and difficulty in making correct choices with respect to ambiguous words, structures and transfer equivalents.
The translations of statistical systems are often surprisingly good with respect to phrases and short-distance collocations, but they often fail when selectional preferences depend on more distant words. In contrast, the output of rule-based systems is often surprisingly good if the parser assigns the correct analysis to a sentence. However, it usually leaves something to be desired if the correct analysis cannot be computed, or if there is not enough information to select the correct target words when translating ambiguous words and structures. Given the complementarity of statistical and rule-based MT, it is natural that the boundaries between them have narrowed. The question is what the combined architecture should look like.
In the past few years, interest in hybridization and system combination has increased significantly in the MT research community, and a large number of approaches to constructing hybrid MT systems have already been proposed, offering considerable potential for improving MT quality and efficiency. The following main types of hybrid MT systems can be identified: (1) SMT models augmented with morphological, syntactic or semantic information; (2) rule-based MT systems that use parallel and comparable corpora to improve their results, by enriching their lexicons and grammars and by applying new methods for disambiguation; (3) MT system combination across different MT paradigms (including voting systems); and (4) various post-editing approaches. There is also great potential in extending hybrid MT systems with techniques, tools and processing resources from other areas of NLP, such as information extraction, information retrieval, question answering, the Semantic Web and automatic semantic inferencing.
The aim of the workshop is to bring together researchers who develop statistical, example-based or rule-based translation systems and who enhance MT systems with elements from the other approaches, so that they can share ideas.
A particular focus will be on effectively combining linguistic and data-driven approaches (rule-based and statistical MT).