Seminar on three Human Language Technology research projects here at DSV

Hercules Dalianis, Sumithra Velupillai and Martin Hassel held a lunch seminar on three Human Language Technology research projects here at DSV.

Hercules presented results from the Tvärsök project, which aims to bridge the language gap between the Nordic countries. More specifically, if you search for documents in Swedish, you should also be able to find corresponding information in Danish or Norwegian when none is available in Swedish. The idea is that you may have active knowledge of Swedish, in the sense that you can formulate search queries in it, while having only passive knowledge of closely related languages: you can read and comprehend them even though you cannot effectively formulate queries in them. This work has resulted in new lexical resources in the mobility domain as well as new methods for identifying parallel text. [Hercules’ slides]

Sumithra presented ongoing work in the KEA project, which concerns the medical domain. In this project we have access to a very large set of medical records, to which we aim to apply an array of text mining techniques. However, these patient records need to be de-identified before they can be made available to a broader group of researchers. A major part of the work so far has been developing de-identification methods and porting them to Swedish. Parallel tracks are uncovering relations between loosely defined features hidden in the vast amount of clinical data, and identifying (un)certainty in diagnoses. [Sumithra’s slides]
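To give a rough feel for what de-identification involves, here is a minimal Python sketch that masks two kinds of identifiers with placeholder labels. It is only an illustration: the patterns, the labels and the function name `deidentify` are our own assumptions, not the methods used in KEA, and real clinical text also requires handling names, addresses, dates and much more.

```python
import re

# Hypothetical patterns for illustration only; not the KEA project's rules.
PATTERNS = [
    # Swedish personal identity number, e.g. 19121212-1212 or 121212-1212.
    ("PERSONAL_ID", re.compile(r"\b(?:\d{2})?\d{6}[-+]\d{4}\b")),
    # Phone number written like 08-123 45 67 (one common format among many).
    ("PHONE", re.compile(r"\b0\d{1,3}-\d{2,3}(?: ?\d{2}){2,3}\b")),
]


def deidentify(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(deidentify("Patient 121212-1212 called from 08-123 45 67."))
```

Pattern matching of this kind only catches identifiers with a predictable shape, which is one reason why dedicated methods have to be developed and adapted for each language.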

Martin presented two related projects on language technology based services for eGovernment. The first one, IMAIL, which was recently approved, concerns the automation of responses to requests sent by e-mail to the Swedish Social Insurance Agency (Försäkringskassan). These requests will be classified either as recognised, in which case a predefined response can be sent to the customer automatically, or as unrecognised, in which case they are forwarded to a government official. In the latter case Infomat, a tool already employed in the KEA project, can be adapted to present the official with an overview of previous transactions related to the incoming request. The second project, “Intelligent cross-language services for eGovernment”, is still awaiting approval. It is a quite ambitious extension of IMAIL that also aims to deal with translation, multilingual authoring and cross-lingual information extraction for the Swedish minority languages. [Martin’s slides]
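The recognised/unrecognised routing in IMAIL described above can be sketched in a few lines of Python. This is a toy illustration under our own assumptions: the categories, keywords, reply texts and the names `Routing` and `route` are all hypothetical, and simple keyword matching stands in for whatever classifier the project ends up using.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical categories and canned replies; the project has not published any.
TEMPLATES = {
    "parental_benefit": "(predefined reply about parental benefit)",
    "sickness_benefit": "(predefined reply about sickness benefit)",
}

# Hypothetical trigger words per category (a stand-in for a real classifier).
KEYWORDS = {
    "parental_benefit": {"parental", "child"},
    "sickness_benefit": {"sick", "sickness", "illness"},
}


@dataclass
class Routing:
    recognised: bool
    action: str                  # "auto_reply" or "forward_to_official"
    category: Optional[str] = None
    reply: Optional[str] = None


def route(message: str) -> Routing:
    """Auto-reply only when exactly one category matches; otherwise forward."""
    words = set(re.findall(r"\w+", message.lower()))
    matches = [cat for cat, kws in KEYWORDS.items() if words & kws]
    if len(matches) == 1:
        cat = matches[0]
        return Routing(True, "auto_reply", cat, TEMPLATES[cat])
    # No match or an ambiguous match: leave the decision to a human official.
    return Routing(False, "forward_to_official")


if __name__ == "__main__":
    print(route("How do I apply for parental leave?"))
    print(route("My address has changed."))
```

Treating ambiguous requests as unrecognised is a deliberately cautious choice in this sketch: a wrong automatic answer is likely worse for the customer than a short wait for an official.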