diogo carapito

Family Medicine, M.D. | Data Science
Postgraduate in Information Management and Business Intelligence in Healthcare @NOVA IMS
Family Medicine Residency @USF Mactamã, ULS Amadora-Sintra, Portugal
Medical Degree @NOVA Medical School, Portugal

Summer is over! Time to MLOps

This summer was rough. The beginning went well. I finished my project for Predictive Methods of Data Mining at NOVA IMS. I’m pretty happy with the results. We ended up using Streamlit, and we got a pretty good score on Kaggle with our rather simple neural network. =D I also watched the LLM Bootcamp 2023, which gave me a lot of insight into the current state of LLMs. But my side projects were slowing down. I guess I was able to picture a structure in my head that might work (front end, backend’s data flow, data storage, RAG and LLM interface) but I was having trouble implementing it. ...

Medical Large Language Models

Last week I attended the Medical Large Language Models for Clinical Text Summarization, Information Extraction, and Question Answering from John Snow Labs. I’m sharing my notes here. LLMs and NLP in general are providing new tools to solve existing problems in healthcare. Here is a list of some new tools that are available today: question answering, text summarization, text generation, information extraction (e.g. from clinical notes), relation extraction (e.g. symptoms related to a disease), entity recognition (like ICD-10 code extraction), and chatbots. Many open source LLMs available today perform close to the best commercial state-of-the-art models, like GPT-4 and GPT-3.5-turbo from OpenAI or Claude from Anthropic. The latter tend to be general purpose and powerful, but extremely expensive to train. The open source models tend to lack performance in a broad sense but can be fine-tuned to specific tasks. This field is moving fast, which means that there is much potential for innovation, but it’s also a challenge to keep up with the state-of-the-art. ...

LLM Bootcamp 2023

I discovered the LLM Bootcamp 2023 on YouTube. It’s a conference about MLOps that was recorded in April 2023, and it answers many of my questions.

hugo

Last time I posted a blog post, I almost went nuts trying to make it work. I couldn’t remember how to publish a post. I don’t know what it was, maybe GitHub Pages, maybe the Quarto framework, but definitely my dumbness was a big part of it. I just can’t grasp yet how all these unintuitive git shenanigans work. So I’ve been postponing my new blog post, knowing what awaited me. ...

mgfhub.com is now live!

I’m excited to announce that mgfhub.com is now live! It’s sort of a search tool with data visualization components for the KPIs (“indicadores”) that exist in Portuguese Primary Care. I have been imagining it for more than a year, and it’s trying to answer questions that I have in my daily life as a Family Medicine Resident when I’m working with KPIs: How many KPIs exist? How do KPIs work? How to quickly find specific KPIs (e.g. which KPIs are related to diabetes?) ...

NLP Summit Healthcare 2023

This week I attended the NLP Summit Healthcare 2023, a free virtual event organized by John Snow Labs. It was a great event with a lot of interesting talks. I’ll share some of my key takeaways. 1. Best practices when developing NLP models. In the opening keynote, Dr. David Gondek, Chief Data Scientist at John Snow Labs, summarized some best practices that I found interesting as I’m beginning my NLP journey: ...

webapps and tutorials

This week has been exciting. I have this NLP project cooking inside my head (for some time now) and I’ve been speeding through many YouTube tutorials on both backend and frontend structure. For the backend, I’ve been cruising through Pinecone, LangChain and the OpenAI API, thanks to tutorials from James Briggs and Data Independent. Google Colab has been my best friend. Even though the backend is the new exciting stuff, I have a soft spot for the way it’s presented. The frontend must: ...

hello, world!

I’ve been charging in different directions on my journey to build a bridge between health and tech (NLP and LLMs, I’m looking at you). There is so much potential and I have so many ideas! So, lately I’ve been: grinding through fast.ai Practical Deep Learning for Coders (just finished the 4th lesson this week); exploring website domain name stuff and setting up a website for my previous project (Primary care KPIs exploration tool, mgfhub.com); reconnecting to social media (I’ve been away for a while now), setting up LinkedIn and Twitter and joining relevant Discord servers; after completing the Postgraduate Program in Information Management and Business Intelligence in Healthcare @NOVA IMS, I felt the need to keep learning about data mining, so I’m doing a single course on Predictive Methods of Data Mining; exploring DL, ML, NLP and LLM main concepts and recent developments (looking at you chatbase.co, ChatPDF and Quarto Help Bot); trying and failing to implement LLMs with Jupyter notebooks; exploring how to start a blog (thanks fast.ai for the push!). Initially I thought about implementing a blog with Dash and publishing it on pythonanywhere.com, since I’ve worked with Flask in the past months. But that would take more time to set up something that might be already solved. ...