Background
Clinical neurophysiology laboratories are encouraged to establish their own normative data because nerve conduction study (NCS) reference ranges depend on several factors, including technique, equipment and digital processing. Despite this, many laboratories lack robust local normative data. There are two main approaches to this problem: collecting data from ‘healthy’ subjects, or applying statistical methods to infer normal ranges from mixed-population data. The former is resource intensive and may be biased by subclinical pathology. For the latter, several statistical methods have been described, each with limitations, including E-Norms (Jabre et al., 2015), E-refs (Nandedkar et al., 2018) and mixture model clustering (Reijntjes et al., 2021).
Aim
We explored a Gaussian Mixture Model (GMM) method for distinguishing ‘physiological’ from ‘pathological’ measurements (Reijntjes et al., 2021). We developed the method further to estimate operator-specific positive and negative predictive values across a range of diagnostic cutoffs.
Methods
Data from 18,995 NCSs (134,752 individual nerves) performed over the last 5 years in the Department of Clinical Neurophysiology at King’s College Hospital were analysed. Density-based spatial clustering and robust linear regression were used to estimate the effects of age and environmental temperature for each investigated nerve-specific parameter and operator. A GMM is an unsupervised machine learning method that models mixed data as a mixture of Gaussian distributions and assigns each data point a probability of belonging to each cluster. Applying GMM clustering to Box-Cox transformed data enabled us to derive diagnostic cutoffs, such as the cutoff that maximises Youden’s J, and to calculate posterior probabilities across the range of observed values.
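The clustering step can be illustrated with a minimal sketch. It is not the implementation used in this work: it uses synthetic amplitudes, fixes the Box-Cox parameter at 0 (a log transform) instead of estimating it, fits exactly two components with a hand-written EM loop, and assumes that the lower-mean component is the ‘pathological’ one. All function names are hypothetical. In practice a library such as scikit-learn (GaussianMixture) or SciPy (scipy.stats.boxcox, which can estimate lambda by maximum likelihood) would normally be used.

```python
import math
import random


def box_cox(x, lam):
    """Box-Cox transform of a positive value x for a given lambda."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (weights, means, sigmas) with components ordered by mean.
    """
    s = sorted(values)
    n = len(s)
    # Crude initialisation: quartiles as starting means, pooled SD for both.
    mu = [s[n // 4], s[(3 * n) // 4]]
    mean_all = sum(s) / n
    sd_all = math.sqrt(sum((v - mean_all) ** 2 for v in s) / n)
    sigma = [sd_all, sd_all]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for v in values:
            p = [w[k] * normal_pdf(v, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and SD of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, values)) / nk
            sigma[k] = math.sqrt(max(var, 1e-9))
    order = sorted(range(2), key=lambda k: mu[k])
    return ([w[k] for k in order], [mu[k] for k in order],
            [sigma[k] for k in order])


def posterior_lower(v, w, mu, sigma):
    """Posterior probability that v belongs to the lower-mean component."""
    p = [w[k] * normal_pdf(v, mu[k], sigma[k]) for k in range(2)]
    total = p[0] + p[1]
    return p[0] / total if total > 0 else float("nan")


# Synthetic mixed-population amplitudes (arbitrary units), for illustration only.
random.seed(0)
LAMBDA = 0  # assumed; in practice lambda would be estimated from the data
raw = ([math.exp(random.gauss(3.0, 0.3)) for _ in range(700)]    # 'physiological'
       + [math.exp(random.gauss(1.5, 0.5)) for _ in range(300)])  # 'pathological'
transformed = [box_cox(v, LAMBDA) for v in raw]
weights, means, sigmas = fit_two_component_gmm(transformed)

for amplitude in (3.0, 8.0, 12.0, 25.0):
    p = posterior_lower(box_cox(amplitude, LAMBDA), weights, means, sigmas)
    print(f"amplitude {amplitude:5.1f}: P(pathological) = {p:.3f}")
```

For parameters where higher values are abnormal (for example, latencies), the roles of the two components would be reversed.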
Results
Patient age and seasonal (temperature-related) effects were identified as covariates of NCS measurements. In addition, inter-operator differences in reference values were revealed, underlining the importance of determining operator-specific, not just laboratory-wide, normative data.
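How a covariate such as age can be estimated robustly and then removed can be sketched as follows. The robust regression method used in this work is not specified here, so the sketch swaps in the Theil-Sen estimator (median of pairwise slopes) as one common robust choice. The density-based clustering step is omitted, the data are synthetic, and the function names are hypothetical.

```python
import random
import statistics


def theil_sen(x, y):
    """Theil-Sen fit: median of pairwise slopes, median-based intercept."""
    n = len(x)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(n) for j in range(i + 1, n)
        if x[j] != x[i]
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def adjust_to_reference(value, covariate, slope, reference):
    """Remove the fitted linear covariate effect relative to a reference level."""
    return value - slope * (covariate - reference)


# Synthetic conduction velocities (m/s) that decline with age, with about 10%
# 'pathological' outliers that would distort an ordinary least-squares fit.
random.seed(0)
ages = [random.uniform(20, 80) for _ in range(200)]
velocities = []
for age in ages:
    v = 60.0 - 0.15 * age + random.gauss(0.0, 3.0)
    if random.random() < 0.10:
        v -= 15.0
    velocities.append(v)

slope, intercept = theil_sen(ages, velocities)
print(f"estimated age effect: {slope:.3f} m/s per year")

# Express a 70-year-old's measurement as if it had been recorded at age 40.
print(adjust_to_reference(48.0, 70.0, slope, 40.0))
```

The same adjustment could be applied to temperature, and fitted separately for each operator, before the adjusted values are clustered.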
Conclusion
The described statistical method can provide operator-specific local reference ranges by clustering ‘physiological’ and ‘pathological’ values from mixed-population data. This approach is advantageous as it enables the determination of diagnostic cutoffs and post-test probabilities. Furthermore, when ruling a condition in or out is clinically important, cutoffs corresponding to high positive or negative predictive values can be used to increase and quantify diagnostic confidence. Rather than being a one-off investment in establishing normative data, this process can provide continuous updates to keep pace with changing technology and techniques.
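The link between a fitted two-component mixture and cutoff-specific predictive values can be sketched as follows. The mixture parameters below are hypothetical placeholders, not results from this study. The sketch assumes that values at or below the cutoff are called abnormal, that the lower-mean component is ‘pathological’, and that the mixture weight of that component serves as the prevalence.

```python
import math


def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def metrics_at_cutoff(c, w_path, mu_path, sd_path, mu_phys, sd_phys):
    """Diagnostic metrics when values <= c are called abnormal."""
    w_phys = 1.0 - w_path
    sens = norm_cdf(c, mu_path, sd_path)        # pathological values flagged
    spec = 1.0 - norm_cdf(c, mu_phys, sd_phys)  # physiological values passed
    pos = w_path * sens + w_phys * (1.0 - spec)  # P(test abnormal)
    neg = w_phys * spec + w_path * (1.0 - sens)  # P(test normal)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "youden_j": sens + spec - 1.0,
        "ppv": w_path * sens / pos if pos > 0 else float("nan"),
        "npv": w_phys * spec / neg if neg > 0 else float("nan"),
    }


# Hypothetical fitted parameters on the transformed scale (illustration only).
params = dict(w_path=0.3, mu_path=1.5, sd_path=0.5, mu_phys=3.0, sd_phys=0.3)

# Candidate cutoffs spanning both components.
lo = params["mu_path"] - 4 * params["sd_path"]
hi = params["mu_phys"] + 4 * params["sd_phys"]
grid = [lo + i * (hi - lo) / 2000 for i in range(2001)]

# Cutoff that maximises Youden's J.
youden_cutoff = max(grid, key=lambda c: metrics_at_cutoff(c, **params)["youden_j"])

# Most lenient cutoff that still reaches a target PPV ("rule in").
TARGET_PPV = 0.99
rule_in = [c for c in grid if metrics_at_cutoff(c, **params)["ppv"] >= TARGET_PPV]
rule_in_cutoff = max(rule_in) if rule_in else None

print("Youden cutoff:", round(youden_cutoff, 3),
      metrics_at_cutoff(youden_cutoff, **params))
if rule_in_cutoff is not None:
    print("Rule-in cutoff:", round(rule_in_cutoff, 3),
          metrics_at_cutoff(rule_in_cutoff, **params))
```

Fitting the mixture separately for each operator and repeating this calculation would give operator-specific predictive values. A ‘rule out’ cutoff can be found in the same way by thresholding the negative predictive value.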