5th Nordic RWE and AI conference – 28-29 January 2025, Helsinki
The 5th Nordic RWE and AI Conference is organized by the University of Helsinki and Åbo Akademi University.
The machine learning algorithm was originally developed to extract smoking status from patient texts in order to analyze the effects of smoking on postoperative complications. Today, it is also being used in lung cancer research.
A well-designed machine learning tool can be adapted for various research purposes. This is precisely how Medaffcon’s smoking status algorithm functions. Initially, it was used to analyze how smoking affects postoperative complications in a retrospective study. Medaffcon’s Data Scientist, Olivia Hölsä, presented the model at the 5th Nordic RWE and AI Conference at the end of January.
Medaffcon first explored this issue in a study of half a million patients who underwent surgery at Helsinki University Hospital (HUS). Identifying smoking status from patient records may sound straightforward, but in practice it is not. Smoking-related data is not structured; it is recorded as free text within broader patient records. The challenge is: how can smoking status be efficiently extracted from a vast amount of text?
Medaffcon and the research group developed a machine learning-based classifier to assist in data analysis. To train the model, clinical experts manually classified a total of 20,000 smoking-related sentences. This task was completed in a single day by two clinical experts, supported by Medaffcon’s pre-processing techniques and specialized tools. Following this, a total of half a million smoking-related sentences were analyzed and categorized using the machine learning algorithm.
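The train-then-classify workflow described above can be sketched roughly as follows. The article does not say which model type Medaffcon used, so this sketch assumes a simple bag-of-words naive Bayes classifier. The class name, the three labels, and the English example sentences are all invented for illustration and are not taken from the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-zåäö]+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Multinomial naive Bayes over word counts with add-one smoothing.

    Hypothetical stand-in for the article's classifier, not the real model.
    """

    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, sentence):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for token in tokenize(sentence):
                if token not in self.vocabulary:
                    continue  # ignore words never seen during training
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented, manually "annotated" training sentences (illustration only).
TRAIN = [
    ("Patient smokes a pack a day", "current"),
    ("Smokes ten cigarettes daily", "current"),
    ("Current smoker, smokes daily", "current"),
    ("Quit smoking five years ago", "former"),
    ("Former smoker, stopped in 2010", "former"),
    ("Stopped smoking ten years ago", "former"),
    ("Has never smoked", "never"),
    ("Non-smoker, never used tobacco", "never"),
    ("Never smoked, no tobacco use", "never"),
]

classifier = NaiveBayesSentenceClassifier().fit(
    [sentence for sentence, _ in TRAIN],
    [label for _, label in TRAIN],
)

for text in ["Patient quit smoking last year", "He has never smoked"]:
    print(text, "->", classifier.predict(text))
```

In a real setting the annotated set would be far larger (20,000 sentences in the study), and the trained model would then be applied to the remaining unlabeled sentences.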
Since then, the algorithm has been applied not only to studies of surgical patients but also to studies of healthcare resource utilization, survival outcomes, and prognostic factors in lung cancer in the HUS region.
According to Olivia Hölsä, scalability requires that the algorithm is trained on a sufficiently large and diverse population.
“The algorithm we developed is based on a large and representative patient cohort, which includes a wide variety of patients.”
Hölsä explains that for this reason Medaffcon’s machine learning model is robust enough to analyze both large patient populations and more specific patient subgroups.
“In machine learning, it is crucial to ensure that the training data is consistently annotated and comparable to the real-world data for which the model is intended, allowing the model to correctly interpret and extract relevant information from clinical documentation.”
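One common way to check whether annotation is consistent is to have two annotators label the same sentences and compute Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below assumes such a doubly annotated subset exists; the article does not say whether this was done, and the function name and label lists are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set.")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels from two hypothetical annotators for six sentences.
annotator_1 = ["current", "current", "former", "never", "never", "former"]
annotator_2 = ["current", "former", "former", "never", "never", "former"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

A kappa close to 1 indicates strong agreement; values well below that suggest the labeling guidelines need tightening before the data is used for training.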
Hölsä says that it would also be interesting to compare machine learning models developed for a specific patient group across different university hospitals in Finland to assess how scalable the models are across the regions.
When scaling the machine learning model to various research needs, it is essential to ensure flexibility and avoid overfitting it to a single specific use case.
“For example, in the case of smoking, we must account for the fact that smoking status can change over time. A person may be a smoker but later quit smoking. Therefore, the model should be able to incorporate time-based restrictions.”
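A time-based restriction of this kind could look roughly like the sketch below, which picks the most recent smoking status recorded within a lookback window before a study index date (for example, the date of surgery). The record structure, the function name, the 365-day default, and the example data are assumptions made for illustration, not details from the study.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class SmokingRecord:
    """One classified smoking mention (hypothetical structure)."""
    patient_id: str
    recorded_on: date
    status: str  # e.g. "current", "former", "never"


def status_at_index(
    records: List[SmokingRecord],
    patient_id: str,
    index_date: date,
    lookback_days: int = 365,
) -> Optional[str]:
    """Return the latest status recorded within the lookback window.

    Only records dated on or before index_date and no earlier than
    lookback_days before it are considered; returns None if there are none.
    """
    window_start = index_date - timedelta(days=lookback_days)
    eligible = [
        r for r in records
        if r.patient_id == patient_id
        and window_start <= r.recorded_on <= index_date
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.recorded_on).status


# Invented example: a patient who smoked in 2018 and had quit by 2021.
records = [
    SmokingRecord("p1", date(2018, 3, 1), "current"),
    SmokingRecord("p1", date(2021, 6, 15), "former"),
    SmokingRecord("p1", date(2023, 1, 10), "former"),
]

print(status_at_index(records, "p1", date(2019, 1, 1)))
print(status_at_index(records, "p1", date(2021, 12, 1)))
print(status_at_index(records, "p1", date(2020, 6, 1)))
```

Returning `None` when no record falls inside the window keeps "unknown" distinct from "never smoked", which matters when the status is used as a covariate.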
High-quality data is essential for developing an effective machine learning model. Medaffcon has extensive experience conducting Real-World Evidence (RWE) studies. As a result, its experts have a deep understanding of reliable data sources and know how to account for critical factors during data collection.
“We understand where the data is documented, and what information is available. We know the right questions to ask clinicians regarding data entry and can define the necessary specifications for data collection.”
Hölsä also emphasizes the importance of recognizing data limitations. For instance, the data of interest may be recorded across different healthcare systems: specific laboratory tests, for example, may be conducted in primary care rather than in specialized healthcare. This must be addressed early in the study design, including in the specifications for data collection, and considered further during model development.
Beyond smoking status, other critical treatment-related factors, such as cancer progression and metastases, are still documented in unstructured formats. According to Hölsä, machine learning could be effectively used to analyze these records and extract valuable insights.
Medaffcon, founded in 2009, is a Nordic research and consulting company specializing in Real-World Evidence, Medical Affairs, and Market Access. With offices in Stockholm, Sweden, and Espoo, Finland, we provide expert services across the Nordic region. Our services combine strong medical and health economic expertise with modern data science.
The company employs some 30 experts. Since 2017, Medaffcon has been a subsidiary of Tamro Oyj and is part of the PHOENIX group, which is a leading provider of healthcare services in Europe.
Data Scientist
MSc (Tech.)
Olivia joined Medaffcon in 2021 as a trainee to work on her Master’s thesis. She is finishing her studies in bioinformatics and digital health at Aalto University and has also worked there as a teaching assistant alongside her studies for three years. Her main strengths are analytical thinking, problem-solving skills, and a proactive mindset.
She is interested in data analysis in the field of social and healthcare to support decision making in improving social and healthcare services and allocating their resources more effectively.
Biostatistician
Data Analysis Lead
MSc (Tech.)
+358 44 314 1597
iiro.toppila@medaffcon.com
Iiro joined Medaffcon in March 2017 as a Biostatistician. For the preceding four years, he had worked as a research assistant in an academic study group, analyzing clinical and genetic patient data. Iiro holds a Master of Science degree in Technology in Bioinformation Technology.
Iiro’s strengths include his strong expertise in statistics and data analysis, hands-on experience in working with sensitive patient data, and strong interdisciplinary communication skills with experts from various fields. In the field, he is particularly interested in the large amounts of data made available by the technological revolution and in how the information obtained from such data can be used to draw concrete conclusions, both to understand the nature of diseases and to advance the goals of the pharmaceutical industry and patient treatment.
“Machine learning and AI-based solutions will have a major impact on the healthcare sector now and in the future. However, effectively utilizing the already collected and available health data will have a higher importance in order to improve healthcare.”