Professor Hercules Dalianis and I had a paper about the privacy-preserving qualities of BERT accepted to the AAAI Fall Symposium on Human Partnership with Medical Artificial Intelligence! The paper is titled Are Clinical BERT Models Privacy Preserving? The Difficulty of Extracting Patient-Condition Associations. Our results strongly suggest that BERT’s poor generative capabilities make it resistant to training data extraction attacks. Other models, such as GPT-2, have been shown to be susceptible to these attacks. From a privacy perspective, being a poor generator may be a feature!
Later that same week, I flew from Stockholm to Punta Cana in the Dominican Republic to participate in EMNLP 2021¹. Almost 500 participants attended in person, while the total number of participants exceeded 4,000. There were many interesting presentations regarding NLP in general, but also some that were specifically about the privacy aspects of NLP. It was a great experience to learn where the field is headed and to get to know many talented researchers. I have written a summary of some of the interesting papers – reach out if you are interested in it.
1: EMNLP stands for Empirical Methods in Natural Language Processing.