Deep learning, based on a variety of neural network architectures, has seen tremendous success in recent years, substantially outperforming alternative learning approaches in fields such as natural language processing. By learning abstract representations through multiple processing layers, deep networks simplify the learning task and remove the need for carefully handcrafted features. Deep learning hence provides an effective paradigm for obtaining end-to-end learning models from complex data, such as the vast amounts of longitudinal and heterogeneous data stored in electronic health records. Learning general-purpose representations of patients can be useful for modeling patient trajectories and disease progression, and for supporting early prediction and detection of adverse events, such as healthcare-associated infections or adverse drug effects. There are numerous open research questions regarding deep learning from healthcare data, including (i) learning effectively from small amounts of (labeled) data through, e.g., unsupervised pre-training, (ii) modeling the temporality of clinical events, and (iii) creating interpretable models that can be understood by clinical decision makers.
The PhD project involves designing novel deep learning architectures that address these challenges in order to make better use of heterogeneous healthcare data, in particular free-text clinical notes, with the ultimate aim of supporting healthcare by improving patient safety and reducing healthcare costs. In HEALTH BANK, we have access to eight years of specialized healthcare data from Karolinska University Hospital and are currently in the process of obtaining primary care data as well, which will allow patients to be followed throughout the healthcare system.