Medaffcon’s Machine Learning Algorithm Adapts to Various Research Needs

25.3.2025

The machine learning algorithm was originally developed to extract smoking status from patient texts in order to analyze the effects of smoking on postoperative complications. Today, it is also being used in lung cancer research.

A well-designed machine learning tool can be adapted for various research purposes. This is precisely how Medaffcon’s smoking status algorithm functions. Initially, it was used to analyze how smoking affects postoperative complications in a retrospective study. Medaffcon’s Data Scientist, Olivia Hölsä, presented the model at the 5th Nordic RWE and AI Conference at the end of January.

Medaffcon first explored this issue in a study of half a million patients who underwent surgery at Helsinki University Hospital (HUS). Identifying smoking status from patient records may sound straightforward, but in practice, it is not. Smoking-related data is not structured but rather recorded in free text within broader patient records. The challenge is: how can smoking status be efficiently extracted from a vast amount of text?

Medaffcon and the research group developed a machine learning-based classifier to assist in data analysis. To train the model, clinical experts manually classified a total of 20,000 smoking-related sentences. This task was completed in a single day by two clinical experts, supported by Medaffcon’s pre-processing techniques and specialized tools. Following this, a total of half a million smoking-related sentences were analyzed and categorized using the machine learning algorithm.
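The article does not describe how the classifier is built, so the following is only a minimal sketch of the general workflow: expert-labelled sentences train a text classifier, which then labels the remaining sentences. It assumes a bag-of-words multinomial Naive Bayes model, English example sentences, and three hypothetical labels (current, former, never). None of these details come from Medaffcon's actual implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, sentence):
        # Words never seen during training carry no evidence and are skipped.
        tokens = [t for t in tokenize(sentence) if t in self.vocabulary]
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            words = self.word_counts[label]
            denominator = sum(words.values()) + len(self.vocabulary)
            score = math.log(count / total)  # log prior
            for token in tokens:
                score += math.log((words[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written stand-ins for expert-annotated sentences.
TRAINING_DATA = [
    ("Patient smokes one pack per day", "current"),
    ("Smokes cigarettes daily", "current"),
    ("Quit smoking ten years ago", "former"),
    ("Former smoker, stopped in 2015", "former"),
    ("Has never smoked", "never"),
    ("Non-smoker, never used tobacco", "never"),
]

sentences, labels = zip(*TRAINING_DATA)
model = NaiveBayesSentenceClassifier().fit(sentences, labels)
print(model.predict("Patient quit smoking two years ago"))
```

In a real project the training set would be the expert-annotated sentences rather than six toy examples, and the model would be evaluated on held-out annotated data before being applied to the remaining text.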

Since then, the algorithm has been applied not only to studies of surgical patients but also to studies of healthcare resource utilization, survival outcomes, and prognostic factors in lung cancer in the HUS region.

The Potential of Scalability in Machine Learning Models

According to Olivia Hölsä, scalability requires that the algorithm is trained on a sufficiently large and diverse population.

“The algorithm we developed is based on a large and representative patient cohort, which includes a wide variety of patients.”

Hölsä explains that for this reason Medaffcon’s machine learning model is robust enough to analyze both large patient populations and more specific patient subgroups.

“In machine learning, it is crucial to ensure that the training data is consistently annotated and comparable to the real-world data for which the model is intended, allowing the model to correctly interpret and extract relevant information from clinical documentation.”
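One common way to check whether two annotators label consistently is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The article does not say how annotation consistency was assessed in this project, so the sketch below is only an illustration; the label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Share of items on which the annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labelled independently at random
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for the same six sentences.
annotator_a = ["current", "current", "former", "never", "never", "former"]
annotator_b = ["current", "former", "former", "never", "never", "former"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 2))  # 0.75
```

Here the annotators agree on five of six sentences (observed agreement 5/6) against a chance agreement of 1/3, which gives a kappa of 0.75.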

Hölsä says that it would also be interesting to compare machine learning models developed for a specific patient group across different university hospitals in Finland to assess how well the models scale across regions.

When scaling the machine learning model to various research needs, it is essential to ensure flexibility and avoid overfitting it to a single specific use case.

“For example, in the case of smoking, we must account for the fact that smoking status can change over time. A person may be a smoker but later quit smoking. Therefore, the model should be able to incorporate time-based restrictions.”
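A time-based restriction of the kind described in the quote can be sketched as a lookup that only considers smoking mentions recorded on or before a study's index date, optionally within a lookback window. The function name, the record layout, and the "unknown" fallback are assumptions made for illustration, not part of the published model.

```python
from datetime import date


def status_at(mentions, index_date, lookback_days=None):
    """Return the most recent smoking status recorded on or before index_date.

    mentions: iterable of (record_date, status) pairs for one patient.
    lookback_days: if given, ignore mentions older than this many days.
    """
    eligible = [
        (record_date, status)
        for record_date, status in mentions
        if record_date <= index_date
        and (lookback_days is None
             or (index_date - record_date).days <= lookback_days)
    ]
    if not eligible:
        return "unknown"
    # The latest eligible mention wins.
    return max(eligible, key=lambda mention: mention[0])[1]


# Hypothetical classified mentions for a single patient.
mentions = [
    (date(2015, 3, 1), "current"),
    (date(2019, 6, 10), "former"),
    (date(2022, 1, 5), "former"),
]

print(status_at(mentions, date(2016, 1, 1)))                    # current
print(status_at(mentions, date(2020, 1, 1)))                    # former
print(status_at(mentions, date(2020, 1, 1), lookback_days=90))  # unknown
```

The same patient is classified as a current smoker for a 2016 index date and as a former smoker for a 2020 index date, which is the behaviour the quote calls for. A narrow lookback window trades coverage for recency.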

Extensive Expertise in Real-World Evidence (RWE) Research

High-quality data is essential for developing an effective machine learning model. Medaffcon has extensive experience conducting Real-World Evidence (RWE) studies. As a result, its experts have a deep understanding of reliable data sources and know how to account for critical factors during data collection.

“We understand where the data is documented, and what information is available. We know the right questions to ask clinicians regarding data entry and can define the necessary specifications for data collection.”

Hölsä also emphasizes the importance of recognizing data limitations. The data of interest may be spread across different healthcare systems; for instance, specific laboratory tests may be conducted in primary care rather than in specialized healthcare. This must be addressed early in the study design, including in the specifications for data collection, and considered further during model development.

Beyond smoking status, other critical treatment-related factors, such as cancer progression and metastases, are still documented in unstructured formats. According to Hölsä, machine learning could be effectively used to analyze these records and extract valuable insights.

Contact us!

Olivia Hölsä
Data Scientist

Iiro Toppila
Biostatistician
Data Analysis Lead

Smoking is a predictor of complications in all types of surgery: a machine learning-based big data study

Real-World Evidence Study of Patients With NSCLC in Finland: Use of Machine Learning Algorithm to Extract Smoking Status From Patient Texts and Analysis of Resource Use and Survival by Smoking Status

Machine Learning-Based Prediction of Survival Prognosis for Treated NSCLC Patients: Insights from Finnish Real-World Data (PDF)

Medaffcon, founded in 2009, is a Nordic research and consulting company specializing in Real-World Evidence, Medical Affairs, and Market Access. With offices in Stockholm, Sweden, and Espoo, Finland, we provide expert services across the Nordic region. Our services combine strong medical and health economic expertise with modern data science.

The company employs some 30 experts. Since 2017, Medaffcon has been a subsidiary of Tamro Oyj and is part of the PHOENIX group, which is a leading provider of healthcare services in Europe.

Related articles

5th Nordic RWE and AI conference – 28-29 January 2025, Helsinki

AI Brings New Opportunities to RWE Research – Its Use in Health Data Analysis is Limited

Finland: An Ideal Environment for the Pharmaceutical Industry’s Real-World Evidence (RWE) Studies