Image_1_Generalizability of High Frequency Oscillation Evaluations in the Ripple Band.TIFF
Objective: We examined the interrater reliability and generalizability of visual evaluations of high-frequency oscillations (HFOs) in the ripple band (80–250 Hz), and established a framework for the transition of HFO analysis to routine clinical care. We used the interrater reliability, or epoch generalizability, to describe how similar the evaluations were between reviewers, and the reviewer generalizability to represent the consistency of each individual reviewer's internal threshold.
Methods: We studied 41 adult epilepsy patients (mean age: 35.6 years) who underwent intracranial electroencephalography. A morphology detector was designed and used to detect candidate HFO events, lower-threshold events, and distractor events. These events were subsequently presented to six expert reviewers, who visually evaluated events for the presence of HFOs. Generalizability theory was used to characterize the epoch generalizability (interrater reliability) and reviewer generalizability (internal threshold consistency) of visual evaluations, as well as to project the numbers of epochs, reviewers, and datasets required to achieve strong generalizability (threshold of 0.8).
Results: The reviewer generalizability was almost perfect (0.983), indicating there were sufficient evaluations to determine the internal threshold of each reviewer. However, the interrater reliability for 6 reviewers (0.588) and pairwise interrater reliability (0.322) were both poor, indicating that the agreement of 6 reviewers is insufficient to reliably establish the presence or absence of individual HFOs. Strong interrater reliability (≥0.8) was projected as requiring a minimum of 17 reviewers, while strong reviewer generalizability could be achieved with <30 epoch evaluations per reviewer.
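The reviewer projection reported above can be illustrated with a simplified calculation. The sketch below is not the study's analysis: it assumes a single-facet design in which the coefficient for n reviewers follows the Spearman-Brown relation, and it uses only the reported six-reviewer coefficient (0.588) and the 0.8 target. The function names are hypothetical, chosen for illustration.

```python
import math


def single_rater_coefficient(coef_n, n):
    """Back out the one-reviewer coefficient from the coefficient for n reviewers.

    Assumes the Spearman-Brown relation coef_n = n*c / (1 + (n - 1)*c),
    which is a simplification of a full generalizability analysis.
    """
    return coef_n / (n - (n - 1) * coef_n)


def projected_coefficient(coef_1, n):
    """Project the coefficient for n reviewers from the one-reviewer value."""
    return n * coef_1 / (1 + (n - 1) * coef_1)


def required_raters(coef_n, n, target):
    """Smallest number of reviewers whose projected coefficient reaches target."""
    coef_1 = single_rater_coefficient(coef_n, n)
    return math.ceil(target * (1 - coef_1) / (coef_1 * (1 - target)))


if __name__ == "__main__":
    # Values taken from the Results: 0.588 for 6 reviewers, target of 0.8.
    print(required_raters(0.588, 6, 0.8))  # 17
```

Under these assumptions the sketch gives 17 reviewers, in line with the reported minimum; the study's own projection may rest on a richer variance-component model than the one assumed here.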
Significance: This study reaffirms the poor reliability of using small numbers of reviewers to identify HFOs, and projects the number of reviewers required to overcome this limitation. It also provides a set of tools that may be used for training reviewers, tracking changes in interrater reliability, and constructing a benchmark set of epochs that can serve as a generalizable gold standard against which other HFO detection algorithms may be compared. This study represents a step toward reconciling important but discordant findings from HFO studies undertaken with different sets of HFOs, and ultimately toward transitioning HFO analysis into a meaningful part of the clinical epilepsy workup.