The paper “Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike”, written by Bart Jongejan (CST, University of Copenhagen) and Hercules Dalianis, was accepted to ACL-IJCNLP 2009 in Singapore. Out of 571 valid submissions only 121 were accepted, which gives an acceptance rate of 21%.
Project proposal submitted to VR: VESPTEC
Today I submitted a project proposal to the Swedish Research Council (Vetenskapsrådet) with the title “VESPTEC – Vector space representations of textual content”. Collaborating with me on this proposal are Magnus Rosell and Viggo Kann at KTH CSC as well as Jussi Karlgren at SICS and Hercules Dalianis here at DSV.
Abstract:
Since the 1960s vector space models have been used extensively for representation of semantics, especially in information-retrieval systems such as Google. These vector spaces are usually multi-dimensional and the terms and documents are represented by very large matrices. There is no greater regard to context. For instance, how a term occurs in a document is almost completely disregarded. Texts are thus viewed as mere bags-of-words. Much of the research so far has either focused on the application of these representations on specific tasks, or on the efficiency of this application by reducing the dimensionality of the original space in some way. This project proposes the study of vector space representations of textual content in a more systematic manner.
We have identified two main tasks. One is to explore the notion of intrinsic dimensionality and the spatial metaphor often used in describing “likeness” between documents. The other, and perhaps more intriguing, task is that of moving from a bag-of-words representation to a more informed document space, modeling more than just the co-occurrence of lexical items within documents. These models will be systematically validated on a diverse array of text processing tasks and well-established test sets with built-in success criteria. A better representation of textual content is interesting in itself, but will also lead to better underlying models that will improve applications such as search engines and text summarization.
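As a small illustration of the bag-of-words view described in the abstract, the sketch below represents each text as a vector of term counts and compares two texts by the cosine of the angle between their vectors. The function names and the example sentences are made up for illustration, and plain whitespace tokenization is an assumption; this is not the representation the project itself proposes.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a text to term counts, discarding word order and context."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example texts: the first two contain the same words
# in a different order, so their bag-of-words vectors are identical.
doc1 = bag_of_words("the dog bit the man")
doc2 = bag_of_words("the man bit the dog")
doc3 = bag_of_words("vector spaces represent documents")

print(cosine_similarity(doc1, doc2))
print(cosine_similarity(doc1, doc3))
```

The first two example sentences mean different things but get the same vector, which is exactly the loss of context the abstract points at.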
Presentation on eHealth by Monica Winge
Monica Winge from Vinnova talked today about challenges and opportunities in e-health. One of the main challenges in today’s health care is to realise patient-centered work processes. This is a complex task, as many different care providers need to cooperate. IT systems and services could be a major enabler for this purpose, but they are still insufficient, as they are often based on the care provider’s organisational views and not the needs of the patient.
Monica Winge has worked for a number of years at Karolinska Institutet, and she is presently at Vinnova. Monica has been the project leader of several e-health projects. She has been elected as one of “the 25 most powerful people in e-health in Sweden“.
MobiSams
MobiSams Paul Johannesson
AAL
AAL Paul Johannesson
Business Process Management with Social Software: An Integrated Technology for Work Organisation
Today, Paul, Birger, Martin and I submitted a project application to the Swedish Research Council.
Abstract:
Software support for well-structured business processes is today provided through workflow technology and process management tools. Tailored to support well-structured processes, these tools do not provide adequate support for loosely structured work activities such as knowledge-intensive processes. This type of work is heavily reliant on professional knowledge, deals with large amounts of data and tasks that can be redone several times. The purpose of the project is to bring together state-of-the-art research in business process management systems and social software to design services and methodology for supporting loosely structured processes. This architecture will enable flexible process enactment, configurable and context-aware user interfaces, and service-based task support.
Journal article accepted to the International Journal of Medical Informatics special issue on Mining of Clinical and Biomedical Text and Data
The paper “Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial”, which I’ve written together with Hercules Dalianis, Martin Hassel (both here at DSV) and Gunnar H Nilsson (at the Department of Neurobiology, Care Sciences and Society, Center for Family Medicine, Stockholm), has been accepted to the International Journal of Medical Informatics special issue on Mining of Clinical and Biomedical Text and Data. The work described in the paper has been part of the KEA project.
Abstract:
Background: Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive, since the free text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI)-annotation trial for EPRs written in Swedish.
Methods: This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting and evaluation of an existing de-identification software written for American English to Swedish in a preliminary automatic de-identification trial. Results are measured with precision, recall and F-measure.
Results: This study reports fairly high Inter-Annotator Agreement (IAA) results on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure highest pairwise agreement). For name tags the average IAA was 0.80 F-measure (0.91 F-measure highest pairwise agreement). Porting a de-identification software written for American English to Swedish directly was unfortunately non-trivial, yielding poor results.
Conclusion: Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions on identifiable information are needed, as well as further developments both on the tag sets and the annotation guidelines, in order to get a reliable gold standard. A completely new de-identification software needs to be developed.
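To make the measures in the abstract concrete, here is a minimal sketch of precision, recall and F-measure computed between two sets of annotations, with one annotator treated as the reference. The same calculation gives a pairwise agreement score between two annotators. The tuple layout, the tag names and the exact-match criterion are assumptions for illustration; the abstract does not state how matches were counted in the study.

```python
def precision_recall_f(reference, candidate):
    """Precision, recall and balanced F-measure of candidate vs. reference.

    Both arguments are collections of hashable annotations; two
    annotations match only if they are exactly equal (an assumption).
    """
    reference, candidate = set(reference), set(candidate)
    true_positives = len(reference & candidate)
    precision = true_positives / len(candidate) if candidate else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical annotations as (start_offset, end_offset, tag) tuples.
annotator_a = {
    (0, 5, "First_Name"),
    (10, 18, "Last_Name"),
    (30, 40, "Location"),
    (50, 60, "Date"),
}
annotator_b = {
    (0, 5, "First_Name"),
    (10, 18, "Last_Name"),
    (30, 40, "Health_Care_Unit"),  # same span, different tag: no match
}

p, r, f = precision_recall_f(annotator_a, annotator_b)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Swapping the two arguments swaps precision and recall but leaves the balanced F-measure unchanged, which is why it is convenient as a symmetric agreement score.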
REVVA 2 presentation
Today, as a member of the technical team representing the Swedish Defense Research Agency (FOI), I had the opportunity to present the outcome of the work done within the REVVA 2 consortium. Under the auspices of the Simulation Interoperability Standards Organization (SISO), the project focused on developing a generic and comprehensive Verification, Validation & Acceptance (VV&A) methodology for modeling and simulation (M&S) products. The outcome is meant to be submitted as an internationally recognized standard and recommended practice for the application of VV&A, the Generic Methodology for Verification, Validation and Acceptance (GM-VV).
The GM-VV provides a generic framework to efficiently develop an argument to justify why identified models, simulations, underlying data, outcomes and capabilities are believed to be acceptable for deployment in the target (intended) operational (use) context. This argument is intended to support stakeholders in their acceptance decision-making process on the utilization of the aforementioned M&S artifacts to satisfy their business goals. The methodology provides this support throughout the whole life-cycle of these M&S artifacts (development, employment and (re)use). More importantly, the GM-VV defines the required information and argumentation mechanisms that allow well-balanced and risk-informed arguments for acceptance decision making with various levels of formality.
The presentation can be found here: 2009.04.08 REVVA2 for SYSLAB.pdf, while the full document set (the GM-VV Handbook, the GM-VV RPG and the GM-VV Reference Manual) can be found at SISO’s product development group webpage.
Presentation on Mendix
Staffan Qvist at Mendix Sweden gave a presentation on Mendix for universities. Mendix delivers a powerful, model-driven application platform providing tools and architecture to rapidly design, build, test, integrate, deploy, manage and optimize dynamic business applications in any existing business and IT environment. Mendix technology uses graphical models – instead of code – to build dynamic applications.
Mendix Academy
Mendix Academy Paul Johannesson Presentation of Mendix
Agreement with Mendix
The Department of Computer and Systems Sciences (DSV) at Stockholm University and the Royal Institute of Technology (KTH) has signed a contract with Mendix about the use of their products for both education and research. Mendix delivers a powerful, model-driven application platform providing tools and architecture to rapidly design, build, test, integrate, deploy, manage and optimize dynamic business applications in any existing business and IT environment. Mendix technology uses graphical models – instead of code – to build dynamic applications.