In an RSNA 2018 e-poster that received a certificate of merit from the judges, Dr. Sebastian Roehrich and colleagues at the Computational Imaging Research Lab in the department of biomedical imaging and image-guided therapy at the Medical University of Vienna presented the following checklist for radiologists:
Is a research question clinically relevant?
What are possible clinical use cases for a machine-learning solution?
What is the quality and availability of the data?
What is the validity and reliability of the data?
Is there some inherent bias to the data?
Does the clinical workflow allow collection of the data in the intended way?
What is the clinical gold standard against which an algorithm needs to be compared?
How can you correctly interpret the results in a clinical context?
If these points are ignored, a machine-learning project may fail even if the methodological approach is sound, they pointed out.
Supervised versus unsupervised learning
For both supervised and unsupervised learning, the data scientist conducting the automated analysis needs to know whether the data are structured and whether certain parameters introduce variability into the images. For example, the fatty tissue of obese patients leads to increased attenuation, and thus to more noise over the whole CT scanning volume. This will influence the imaging features extracted by machine learning, noted the authors, who added that other sources of variability include acquisition parameters, reconstruction kernel, slice thickness, and motion artifacts.
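The effect of noise on extracted features can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method: the region of interest, the tissue value of 40 HU, and the two noise levels are hypothetical, and the two "features" are simply the mean and standard deviation of the voxel values.

```python
import random
import statistics

def simulate_roi(true_hu, noise_sd, n_voxels=2000, seed=0):
    """Return simulated voxel values (in HU) for a homogeneous region.

    All values are hypothetical; Gaussian noise is an assumption.
    """
    rng = random.Random(seed)
    return [rng.gauss(true_hu, noise_sd) for _ in range(n_voxels)]

def first_order_features(voxels):
    """Two simple intensity features: mean and standard deviation."""
    return {"mean": statistics.fmean(voxels), "sd": statistics.stdev(voxels)}

# Same hypothetical tissue (40 HU) scanned at two hypothetical noise levels.
low_noise = first_order_features(simulate_roi(40.0, noise_sd=10.0))
high_noise = first_order_features(simulate_roi(40.0, noise_sd=25.0))

print("low noise: ", low_noise)
print("high noise:", high_noise)
```

In this toy setting the underlying tissue is identical, yet the standard-deviation feature differs markedly between the two scans, which is the kind of variability a model could mistake for a real difference between patients.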
In machine learning, the input data may comprise different clinical parameters, histology images or results, lab findings, radiological images, or basically any medical data, but the outputs are different for supervised and unsupervised learning. For supervised learning, the output may be data (or a label) that a computer can predict, such as a diagnosis. For unsupervised learning, there are no labels, and the computer finds meaningful results on its own.
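The difference in outputs can be sketched with a toy example. The data, the labels "benign" and "malignant", and the function names below are all hypothetical, and the two algorithms (a nearest-centroid classifier and a two-cluster k-means) were chosen for brevity rather than taken from the poster.

```python
import statistics

# Hypothetical toy data: one numeric feature per case.
features = [1.0, 1.2, 0.8, 1.1, 4.9, 5.2, 5.0, 4.8]
# Labels are available only in the supervised setting.
labels = ["benign"] * 4 + ["malignant"] * 4

def fit_supervised(xs, ys):
    """Learn one centroid per given label (nearest-centroid classifier)."""
    return {
        lab: statistics.fmean(x for x, y in zip(xs, ys) if y == lab)
        for lab in set(ys)
    }

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda lab: abs(centroids[lab] - x))

def fit_unsupervised(xs, n_iter=10):
    """Two-cluster k-means on unlabeled data; returns the cluster centers."""
    centers = [min(xs), max(xs)]
    for _ in range(n_iter):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [statistics.fmean(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers

model = fit_supervised(features, labels)
print(predict(model, 4.5))         # one of the labels supplied during training
print(fit_unsupervised(features))  # two unnamed group centers
```

The supervised model can only answer with a label it was given, whereas the unsupervised routine returns groups without names; attaching clinical meaning to those groups remains a human task.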
“What is supervised learning?” Roehrich and colleagues asked. “We present both input and output to the computer. The goal is to learn a predictive model from input to output (= label). Therefore, the supervised machine-learning algorithm does not find a new output. Neither can it confirm nor reject a hypothesis. It learns the way to the given label.”
To get a reliable result, both the input data and labels must be known beforehand, and after training, a successful model will be able to predict the label in new data that was not part of the training data, they explained. However, if data quality is low, an accurate prediction might not be possible.[…]
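Both points, testing on data held out from training and the cost of low data quality, can be illustrated with a short simulation. This is a minimal sketch under assumptions of our own: the simulated cases, the class means of 0 and 3, the noise levels, the 70/30 split, and the nearest-centroid model are all hypothetical choices for illustration.

```python
import random
import statistics

def make_cases(n_per_class, noise_sd, rng):
    """Simulate (feature, label) pairs; class means 0 and 3 are arbitrary."""
    cases = [(rng.gauss(0.0, noise_sd), 0) for _ in range(n_per_class)]
    cases += [(rng.gauss(3.0, noise_sd), 1) for _ in range(n_per_class)]
    rng.shuffle(cases)
    return cases

def holdout_accuracy(cases, train_fraction=0.7):
    """Fit a nearest-centroid model on the training part, score the rest."""
    split = int(len(cases) * train_fraction)
    train, test = cases[:split], cases[split:]
    centroids = {
        label: statistics.fmean(x for x, y in train if y == label)
        for label in (0, 1)
    }
    hits = sum(
        min(centroids, key=lambda lab: abs(centroids[lab] - x)) == y
        for x, y in test
    )
    return hits / len(test)

rng = random.Random(42)
clean_acc = holdout_accuracy(make_cases(300, noise_sd=0.5, rng=rng))
noisy_acc = holdout_accuracy(make_cases(300, noise_sd=3.0, rng=rng))
print(f"held-out accuracy, low noise:  {clean_acc:.2f}")
print(f"held-out accuracy, high noise: {noisy_acc:.2f}")
```

The model and the amount of data are the same in both runs; only the noise in the measurements changes, and the accuracy on unseen cases falls with it.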