Researchers have developed a multitask deep learning model that can effectively assess signs of hip osteoarthritis in x-rays, sharing their findings in Radiology.
“Radiography of the pelvis is the most commonly used primary imaging technique in patients suspected of having hip osteoarthritis,” wrote lead author Claudio E. von Schacky, MD, University of California, San Francisco, and colleagues. “Radiographic features of hip osteoarthritis include joint-space narrowing (JSN), osteophytes, subchondral sclerosis (SS), subchondral cysts (SCs), and flattening of the femoral head … however, accurate assessment of these features is time consuming and requires expertise, and reproducibility in the hands of inexperienced or untrained readers is limited.”
The team explored imaging data from more than 4,000 patients who participated in the Osteoarthritis Initiative observational study. The patients were recruited from February 2004 to May 2006, and a majority underwent follow-up imaging after four years. Participants were divided into training, validation, and testing datasets.
Overall, using findings from the internal test set, the team’s multitask deep learning model achieved accuracies of 87% for assessing femoral osteophytes (FOs), 69.9% for acetabular osteophytes (AOs), 81.7% for JSN, 95.8% for SS, and 97.6% for SCs. For the external set, the model achieved accuracies of 82.7% for FOs, 65.4% for AOs, 80.8% for JSN, 88.5% for SS, and 91.3% for SCs.
This performance was similar to that of experienced radiologists, suggesting the researchers’ model could make a significant impact on patient care by assisting with the evaluation of such findings.
“Our study demonstrated the feasibility of a multitask deep learning approach to grading hip osteoarthritis features on radiographs and showed that its performance was similar to that of expert radiologists,” the authors wrote. “This model may be useful in large epidemiologic studies for structural assessment of hip osteoarthritis features.”
The study did have certain limitations, however. For example, the AI model used only a single view when making its assessments, even though additional views are typically required for such evaluations. Also, low-quality imaging studies were excluded altogether from the dataset, which could have “possibly affected the diagnostic performance of the model.”