NLP Summit Healthcare 2023

This week I attended the NLP Summit Healthcare 2023, a free virtual event organized by John Snow Labs. It was a great event with a lot of interesting talks. I’ll share some of my key takeaways.

1. Best practices when developing NLP models

In the opening keynote, Dr. David Gondek, Chief Data Scientist at John Snow Labs, summarized some best practices that I found interesting as I’m beginning my NLP journey:

2. Mitigating bias in healthcare language models

Gaurav Kaushik from ScienceIO gave a talk on mitigating bias in healthcare language models and the importance of evaluating it. Performance benchmarks lack statistical power, are not well validated, and do not discourage the use of biased systems.

Check this article for further information: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
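
To make the idea of behavioral testing more concrete, here is a minimal sketch of an invariance test in the spirit of CheckList. It does not use the CheckList library or anything shown in the talk: `toy_triage_model`, the template, and the group terms are all hypothetical placeholders I made up for illustration. The idea is to change only a demographic term in a clinical sentence and check that the prediction stays the same.

```python
from typing import Callable, List

# Hypothetical stand-in for a real model: flags a note as "urgent"
# when it mentions chest pain. A real clinical model would go here.
def toy_triage_model(note: str) -> str:
    return "urgent" if "chest pain" in note.lower() else "routine"

def invariance_failures(
    model: Callable[[str], str],
    template: str,
    slot_values: List[str],
) -> List[str]:
    """Fill the template with each value and return the values whose
    prediction differs from the prediction for the first value."""
    predictions = {v: model(template.format(patient=v)) for v in slot_values}
    baseline = predictions[slot_values[0]]
    return [v for v, p in predictions.items() if p != baseline]

# Hypothetical template and group terms, for illustration only.
template = "The patient is a 54-year-old {patient} presenting with chest pain."
groups = ["man", "woman", "Black man", "Hispanic woman"]

failures = invariance_failures(toy_triage_model, template, groups)
print(failures)  # an empty list means the prediction never changed
```

A model that passes an accuracy benchmark can still fail a test like this, which is why behavioral tests are a useful complement to a single aggregate score.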

3. Prototypical Networks for Interpretable Diagnosis Prediction

Betty van Aken from the DATEXIS Research Group presented a language model that makes predictions based on parts of the text that are similar to prototypical patients, providing justifications that doctors can understand. It uses a prototypical network with label-wise attention to find the most similar patients to the input text and then uses a transformer to predict the diagnosis. This is a great example of how NLP can be used in the healthcare space, and it opens the door to a lot of interesting applications.

Check their demo and the paper here: This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Clinical Text
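
To get an intuition for how prototypes and label-wise attention fit together, I wrote a tiny sketch in plain Python. This is not the authors' implementation, just my simplified reading of the idea: every diagnosis label has its own attention vector and its own prototype vector, the note is summarized once per label using that label's attention weights, and the label whose prototype is closest wins. The two-dimensional "embeddings", the attention vectors, and the prototypes below are made-up numbers; in a real system they would be learned, and the token vectors would come from a transformer encoder.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score_label(tokens: List[Vector], attention: Vector,
                prototype: Vector) -> Tuple[float, List[float]]:
    """Return (negative distance to the prototype, attention weights)."""
    weights = softmax([dot(t, attention) for t in tokens])
    # Label-specific patient representation: attention-weighted token sum.
    patient = [sum(w * t[i] for w, t in zip(weights, tokens))
               for i in range(len(prototype))]
    return -math.dist(patient, prototype), weights

# Made-up toy data (assumption): one short note with three tokens.
words = ["fever", "cough", "fatigue"]
tokens = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.5]]
labels = {
    "pneumonia": {"attention": [4.0, 0.0], "prototype": [0.9, 0.1]},
    "broken arm": {"attention": [0.0, 4.0], "prototype": [0.0, 1.0]},
}

results = {}
for label, params in labels.items():
    score, weights = score_label(tokens, params["attention"], params["prototype"])
    top_word = words[weights.index(max(weights))]  # most attended token
    results[label] = (score, top_word)

best_label = max(results, key=lambda name: results[name][0])
evidence = results[best_label][1]
print(best_label, evidence)
```

In this toy example the most attended token plays the role of the justification: it is the part of the note that pulled the patient representation toward the winning prototype.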

4. EHR-Safe: Generating High-Fidelity and Privacy-Preserving Synthetic Electronic Health Records

AI in healthcare raises important privacy concerns, especially when dealing with sensitive data like Electronic Health Records (EHR). The Google Cloud AI team suggested that one way to overcome this challenge is to generate high-fidelity, privacy-preserving synthetic EHR data. They proposed a generative modeling framework, EHR-Safe, that can generate highly realistic synthetic EHR data that is robust against privacy attacks.

Check their paper here: EHR-Safe: Generating high-fidelity and privacy-preserving synthetic electronic health records

5. Some organizations that are doing interesting work in NLP in healthcare:


This is a list of some interesting projects that were presented at the summit:


Finally, I’ll share some of the articles that were cited in the talks:


There were many more interesting sessions that I didn’t describe here. Visit https://www.nlpsummit.org/ if you want more information. I’ll be there next year for sure! Thanks for reading!

P.S. Copilot helped write this blog post. LLMs are amazing and I’m excited to see what the future holds for NLP!