Consensus error grid

From Wikipedia, the free encyclopedia

The consensus error grid (also known as the Parkes error grid) was developed as a new tool for evaluating the accuracy of blood glucose meters. In recent years, blood glucose meter manufacturers have increasingly used the consensus error grid in their clinical studies. It was published in August 2000 by Joan L. Parkes, Stephen L. Slatin, Scott Pardo, and Barry H. Ginsberg.^[1] The ISO 15197:2013 guidelines specify the use of the consensus error grid for evaluating blood glucose monitoring systems.^[2]

Sources

http://care.diabetesjournals.org/cgi/reprint/23/8/1143

References

[1] Parkes, J.L.; Slatin, S.L.; Pardo, S.; Ginsberg, B.H. (2000). "A new consensus error grid to evaluate the clinical significance of inaccuracies in the measurement of blood glucose". Diabetes Care. 23 (8): 1143–48. doi:10.2337/diacare.23.8.1143. PMID 10937512.
[2] Pfützner, A.; Klonoff, D.C.; Pardo, S.; Parkes, J.L. (2013). "Technical aspects of the Parkes error grid". J Diabetes Sci Technol. 7 (5): 1275–81. doi:10.1177/193229681300700517. PMC 3876371. PMID 24124954.

Consensus error grid in clinical studies

[Figures: OneTouch Ultra; WaveSense Presto]

Please note: the bias error is NOT EQUAL TO the precision of the glucose test result.

Compare with Zone A of the Clarke Error Grid, which considers a 20% error tolerable.

Citing paper: The Surveillance Error Grid. D. Klonoff, Courtney Lias, B. Kovatchev. Journal of Diabetes Science and Technology, 2014.

A new error grid, called the surveillance error grid (SEG), was developed as a tool to assess the degree of clinical risk from inaccurate blood glucose (BG) monitors, allowing regulators and manufacturers to monitor and evaluate glucose monitor performance in their surveillance programs.

Advances in medicine have made it possible to bring many serious systemic diseases under control, in particular diabetes mellitus (DM). Besides the development of drug therapy, the creation of methods for monitoring glucose levels in biological fluids has been of fundamental importance. Today it is impossible to optimize DM treatment and achieve satisfactory compensation without tools for glycemic monitoring and self-monitoring [1, 2].

Several stages can be distinguished in the development of glycemic monitoring methods: (I) organoleptic analysis, (II) chemical analysis, (III) electrochemical analysis, and a clearly emerging stage (IV), nanotechnological analysis (Fig. 1).

Fig. 1. Stages in the development of glycemic monitoring technologies.

The methodology of the first stage is long in the past. The most advanced developments of stage II (Clinistix test strips) are still in use, but truly high-quality DM control became possible only at the stage of electrochemical analysis (stage III).

The creation of effective glycemic monitoring tools as early as the 1980s raised medical care for patients with DM to a qualitatively new level. Since then, biosensors have been continuously improved, becoming smaller while gaining accuracy and ease of use. Modern biosensors (glucose meters) are highly accurate because they have three electrodes. The reference and base electrodes monitor the flow of electrons taken up by the mediator ferrocene during the oxidation of glucose to gluconolactone (the electron flow is proportional to the blood glucose level), and the trigger electrode eliminates the effect of high concentrations of "interfering factors" (uric acid, ascorbic acid, and paracetamol) by "subtracting" the electrons from the metabolites of these substances.

Modern clinical medicine is moving toward improved biosensors. To this end, the Russian Federation adopted and approved, for the first time, the national standard GOST R ISO 15197-2015, prepared by the International Organization for Standardization [3]. This has renewed interest in assessing the analytical and clinical accuracy of the glycemic monitoring tools most widely used in clinical practice.

Biosensors are divided into glucose meters for professional use (hospital, multi-user) and for individual use (self-monitoring). The measurement quality of self-monitoring glucose meters is affected by a number of physical factors (temperature, humidity, altitude, etc.) and biological factors (high or low hematocrit, increased blood viscosity, hyperlipidemia, acidosis, alkalosis, etc.). However, patients do not always recognize the severity of their condition and continue to use self-monitoring devices. It is therefore of interest to determine how much the risk of error increases when self-monitoring devices are used under altered acid-base balance, that is, to assess the so-called safety margin of modern glucose meters. This became possible during a clinical study (АААА-А16-116020410106-0, 2015-11-01 to 2016-01-25) assessing the quality of glycemic control in patients with DM using PKG-03 Satellit Express portable blood glucose meters (ELTA Company LLC, Russia). The study was conducted in a hospital setting, which made it possible to observe patients admitted with emergency conditions (ketoacidosis).

Aim: to assess the clinical accuracy of the PKG-03 Satellit Express in measuring glycemia in patients with type 1 and type 2 DM receiving insulin therapy when the course of the disease is complicated by ketosis or ketoacidosis.

Materials and methods

This article presents a previously unpublished part of the above-mentioned phase IV (post-registration) clinical study. A cross-sectional epidemiological study was conducted in a group of patients hospitalized with ketosis or ketoacidosis.

Glycemia was measured in the same drop of the patients' capillary blood with the Satellit Express glucose meter and with the SUPER GL glucose and lactate laboratory analyzer. In accordance with GOST R ISO 14155-2014, clause 6.2.3.1 "Study design", measurement precision (the closeness of agreement between indications or measured quantity values obtained by replicate measurements on the same or similar objects) was ensured by a series of measurements performed over a short time interval by the same person using the same glucose meters and reagents [3].

Eligibility criteria

Patients with type 1 DM admitted with ketosis or ketoacidosis.
Patients with type 2 DM receiving insulin therapy admitted with ketosis or ketoacidosis.
Patients with type 1 or type 2 DM receiving insulin therapy and without metabolic imbalance at the time of the study.

The patient's informed consent was also a prerequisite for inclusion in the study.

All patients were examined in hospital; in addition to history taking and physical examination, the basic diagnostic work-up included 24-hour glucose monitoring and HbA1c measurement. To diagnose ketosis and ketoacidosis, blood pH, HCO3, pCO2, serum and urine ketone bodies, anion gap, effective plasma osmolality, potassium, sodium, and hematocrit were determined. Standard and emergency care was provided in accordance with the algorithms of specialized medical care for patients with DM [2].

Insulin therapy was a mandatory condition for inclusion. The main criterion for assigning patients to groups was the presence of acid-base disturbances (ketosis, ketoacidosis). Group 1 included patients without metabolic imbalance; group 2 included patients whose DM was accompanied by ketosis or ketoacidosis, which was the reason for their hospitalization.

Assessment of the clinical accuracy of the Satellit Express glucose meter under metabolic imbalance

GOST R ISO 15197-2015 does not set criteria for acceptable risk (an error in the measured value low enough that it cannot lead to a wrong decision), but such criteria are widely used in assessing the clinical accuracy of glucose meters. To establish acceptable device accuracy, samples were collected from elective (non-emergency) patients, and the difference between the values obtained on the glucose meter and on the reference analyzer was assessed with the Clarke error grid. This method makes it possible to study the clinical accuracy of glucose measuring devices in patients with DM in terms of five risk zones. Evaluation with the Clarke error grid is recommended by DIN EN ISO 15197-2:2013 and described in the international protocol CLSI EP27-P:2009 [3, 4].

During the study, glycemic values obtained with the personal PKG-03 Satellit Express glucose meter were compared with reference values for compliance with GOST R ISO 15197-2015. Values from the SUPER GL glucose and lactate analyzer (CE marking, Conformité Européenne; Germany) were taken as the reference capillary blood glucose concentrations. This automated analyzer is designed for serial testing and can also measure urgent samples. Its measurement principle is the electrochemical enzymatic (glucose oxidase) method. The analyzer is calibrated for capillary blood. Three glucose meters and three test strips from different lots were used for each test batch. A reference measurement procedure makes it possible to assess the trueness of values measured by other procedures (ISO/IEC Guide 99:2007, definition 2.7).

The Clarke error grid assigns paired values to five zones (A, B, C, D, and E) according to the risk associated with inaccurate glucose measurement: zone A, clinically accurate results that allow therapy to be adjusted; zone B, deviations that would not be dangerous given the therapy adjustment decided on their basis; zone C, errors that would lead to overcorrection of normal glucose levels but would not pose a significant danger to the patient's health; zone D, errors that prevent adequate detection of glucose levels requiring correction, which is already dangerous; zone E, erroneous data opposite to the true values. Any glucose sensor must pass this test. The Clarke error grid is the "gold standard" for assessing the accuracy of such devices. Sensor results must closely match standard laboratory results, with 99.0% of data falling in zones A and B [3, 4].
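The zone definitions above can be expressed as a classifier. The Python sketch below is a minimal illustration only: its boundary lines (in mg/dL, reference value on the x axis, meter value on the y axis) follow a commonly circulated open-source formulation of the Clarke grid and are an assumption of this sketch, not taken from this study, so they should be checked against the original Clarke et al. publication before real use. Because this study reports glucose in mmol/L, the hypothetical helper `mmol_to_mgdl` converts with an approximate factor of 18.

```python
def mmol_to_mgdl(value_mmol: float) -> float:
    """Convert glucose from mmol/L to mg/dL (approximate factor of 18)."""
    return value_mmol * 18.0


def clarke_zone(ref: float, meas: float) -> str:
    """Return the Clarke error grid zone ('A'-'E') for one paired reading.

    ref  -- reference (laboratory) glucose in mg/dL, x axis
    meas -- meter glucose in mg/dL, y axis
    Boundaries follow a commonly used open-source formulation (assumption).
    """
    # Zone A: within 20% of the reference, or both readings below 70 mg/dL
    if (ref < 70 and meas < 70) or abs(meas - ref) < 0.2 * ref:
        return "A"
    # Zone E: hypoglycemia read as hyperglycemia, or the reverse
    if (ref <= 70 and meas >= 180) or (ref >= 180 and meas <= 70):
        return "E"
    # Zone D: failure to detect hyperglycemia (right) or hypoglycemia (left)
    if (ref >= 240 and 70 <= meas <= 180) or (ref <= 70 and 70 <= meas <= 180):
        return "D"
    # Zone C: readings that would prompt overcorrection of acceptable levels
    if (70 <= ref <= 290 and meas >= ref + 110) or (
        130 <= ref <= 180 and meas <= (7 / 5) * ref - 182
    ):
        return "C"
    # Remaining deviations are not expected to lead to dangerous decisions
    return "B"
```

For example, a reference of 10.0 mmol/L and a meter reading of 11.0 mmol/L convert to 180 and 198 mg/dL; the 18 mg/dL difference is within 20% of the reference, so `clarke_zone` returns "A".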

Compliance of the study with the standards of biomedical ethics was confirmed by the Ethics Committee of the Medical Institute of the Peoples' Friendship University of Russia (protocol No. 8 of February 18, 2016).

Sample size: the sample size was not calculated in advance.

Statistical analysis. Data were processed with Statistica (StatSoft Inc., version 8.0, USA) and SPSS 11.0. Quantitative variables in two independent groups with normal distribution were compared by analysis of variance. Precision was calculated with ANOVA. The significance level was set at p < 0.05. The analysis used the Clarke consensus error grid (Consensus Error Grid, CEG).
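The methods state that precision was calculated with ANOVA but do not give the formula. One common approach, assumed here rather than taken from the article, is a one-way ANOVA on replicate readings of the same blood samples, in which the within-sample mean square estimates the repeatability variance. The function name `repeatability` and the example readings are hypothetical.

```python
import math
from statistics import mean


def repeatability(groups: list[list[float]]) -> tuple[float, float]:
    """Repeatability SD and CV% from replicate readings (one-way ANOVA).

    groups -- one list of replicate meter readings per blood sample.
    The within-group mean square SS_within / (N - k) is used as the
    repeatability variance (an assumed, commonly used definition).
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    if n_total <= k:
        raise ValueError("need at least one sample with 2 or more replicates")
    ss_within = 0.0
    for g in groups:
        m = mean(g)  # mean of the replicates for this sample
        ss_within += sum((x - m) ** 2 for x in g)
    ms_within = ss_within / (n_total - k)  # degrees of freedom: N - k
    sd_r = math.sqrt(ms_within)
    grand_mean = mean(x for g in groups for x in g)
    return sd_r, 100.0 * sd_r / grand_mean
```

For two hypothetical samples measured in triplicate, [5.0, 5.2, 4.8] and [10.0, 10.4, 9.6] mmol/L, the within-sample mean square is 0.40 / 4 = 0.1, giving a repeatability SD of about 0.32 mmol/L and a CV of about 4.2%.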

Results

The study included 77 patients aged 20 to 65 years: 26 with type 1 DM and 51 with type 2 DM on insulin therapy; 28 of them were hospitalized with ketosis or ketoacidosis. Group 1 (patients without acid-base disturbances) comprised 49 people, from whom 705 blood samples were obtained during serial testing. Group 2 (patients admitted with ketosis or ketoacidosis) comprised 28 patients, from whom capillary blood was sampled once, before the start of intensive therapy (28 samples).

When glycemic values in 705 capillary blood samples from patients without ketosis or ketoacidosis (group 1) were assessed with the Clarke error grid (Fig. 2), 98% (692 pairs) of the samples obtained directly from the patients' blood fell in zone A and 2% (13 pairs) in zone B. None of the values fell in zones C, D, or E.

Fig. 2. Clinical accuracy of the PKG-03 Satellit Express assessed with the Consensus Error Grid (CEG) for capillary blood values. Here and in Fig. 3:

Zone A, clinically accurate results; zone B, deviations that pose no danger; zone C, errors that pose no significant danger to the patient's health; zone D, dangerous deviations from the true values; zone E, erroneous data opposite to the true values.

The clinical and diagnostic accuracy of results in the group of patients admitted with ketosis or ketoacidosis (group 2) was lower (Fig. 3).

Fig. 3. Clinical accuracy of the PKG-03 Satellit Express assessed with the Consensus Error Grid (CEG) for native blood samples from patients with ketoacidosis.

In standard studies assessing the analytical characteristics (accuracy, variability) of glucose meters, ketoacidosis and DM decompensation are exclusion criteria. In this part of the study, the reliability of glucose measurements was assessed from deviations from the reference values specifically in patients with confirmed ketoacidosis (pH < 7.2) or ketosis (ketonuria). To evaluate the quality of glycemic monitoring with the PKG-03 under ketoacidosis, a Clarke error grid plot was also constructed (see Fig. 3).

Of the deviations of glucose meter values from the reference values in group 2, only one (3%) fell on the boundary between zones B and C, which indicates that all pairs of values fell within the zone of safe deviations, even during acute complications of DM (ketosis, ketoacidosis).

The effectiveness of the glucose meter in patients with diabetic ketoacidosis, that is, under conditions that could lead to erroneous self-monitoring results, is presented in the table.

Table. Accuracy of glucose concentration measurements with the PKG-03 Satellit Express on the Clarke Error Grid (CEG) in hospitalized patients

Clinical group	Samples, n	Zone A, %	Zone B, %	Zone C, %	Zones D and E, %
Patients receiving intensive therapy	243	96.7	3.3	0	0
Patients receiving standard glucose-lowering and antihyperglycemic therapy	462	99	1	0	0
Patients with ketosis or ketoacidosis	28	90	7	3	0
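The percentages in the table and in the group 1 results follow from simple counts of paired readings per zone. A minimal sketch, assuming each pair has already been assigned a zone label; the 99% threshold for zones A and B is the criterion quoted in the methods, and the function names are hypothetical.

```python
from collections import Counter


def zone_summary(zones: list[str]) -> dict[str, float]:
    """Percentage of paired readings falling in each error grid zone A-E."""
    if not zones:
        raise ValueError("no paired readings")
    counts = Counter(zones)
    total = len(zones)
    return {z: 100.0 * counts[z] / total for z in "ABCDE"}


def meets_ab_criterion(zones: list[str], threshold: float = 99.0) -> bool:
    """True if at least `threshold` percent of pairs lie in zones A and B."""
    pct = zone_summary(zones)
    return pct["A"] + pct["B"] >= threshold
```

With the group 1 counts reported in the results (692 pairs in zone A and 13 in zone B out of 705), `zone_summary` gives about 98.2% and 1.8%, which the article rounds to 98% and 2%, and `meets_ab_criterion` returns True.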

No adverse events were observed during the study.

Discussion

The deviations of the glucose meter readings in group 1 fell within zones A and B, that is, zones of clinically accurate results without dangerous deviations.

The study yielded data that make it possible to assess the diagnostic accuracy of the Satellit Express glucose meter in patients with DM and acid-base disturbances. At glycemia above 30 mmol/L, lipolysis is activated, leading to increased levels of free fatty acids and glycerol. Free fatty acids enter the liver, where they are converted into ketone bodies. The result is uncontrolled ketone body production and acidosis, which significantly affects the accuracy of glucose meter readings. Acidosis can lead to falsely low blood glucose values.

In glucose meters with three electrodes, measurement quality depends on the activity of the trigger electrode, which excludes the signals generated during the breakdown of acids. Preservation of the device's diagnostic accuracy under acidosis may indirectly characterize the performance of the trigger electrode of the Satellit Express glucose meter, whose main function is to exclude the influence of electrons from the metabolites of other substances.

The data obtained from the blood of patients with ketoacidosis suggest that the Satellit Express glucose meter is a reliable device for glucose self-monitoring even when biological factors may affect the accuracy of glucose measurements.

Factors such as acidosis, alkalosis, dehydration, and hypoxia are usually grounds for exclusion from any study evaluating the analytical and clinical characteristics of blood glucose measuring devices. The present study, however, deliberately took a non-standard approach, namely assessing the limits of accuracy of the PKG Satellit Express under acidosis, because in real life a patient may find themselves in such conditions.

Conclusion

Using the PKG-03 Satellit Express in clinical practice enables high-quality glycemic monitoring that is convenient and safe for patients. The limits of permissible systematic measurement error meet the requirements of the national and international standards and ensure satisfactory monitoring quality even in decompensated DM (ketosis and ketoacidosis); in these cases, 97% of the deviations of glucose meter readings from the reference values fell within the zones of clinically accurate and safe deviations.

Additional information

Funding. The clinical study was conducted under an agreement between the Peoples' Friendship University of Russia and ELTA Company LLC, Russia. The clinical study and the publication of its materials were funded by ELTA Company LLC.

Conflict of interest. The authors declare no explicit or potential conflicts of interest related to the publication of this article.

Author contributions: study concept and design, I.A. Kurnikova, L.Yu. Morgunov; data collection and processing, A.U. Ualikhanova, E.R. Mavlyalieva, M.A. Surikova; data analysis, I.A. Kurnikova, A.U. Ualikhanova, L.Yu. Morgunov; writing, I.A. Kurnikova, A.U. Ualikhanova; editing, I.A. Kurnikova.

Acknowledgments. The authors thank the Peoples' Friendship University of Russia (Rector: V.M. Filippov, Academician of the Russian Academy of Education) and the A.K. Eramishantsev City Clinical Hospital of the Moscow Department of Health (Chief Physician: A.R. Gabrielyan, Dr. Med. Sci.) for their help in organizing and conducting the study.

The measurement of arterial pressure (AP) is a pivotal part of cardiovascular monitoring and hemodynamic therapy in anesthesiology and critical care. Depending on patient-related factors and the clinical setting, physicians can choose from a variety of techniques for assessing AP.^1,2

Automated intermittent noninvasive AP assessment with the oscillometric method is widely used in clinical practice.^1 However, in high-risk surgical patients and critically ill patients, continuous assessment of AP using an arterial catheter is considered the criterion standard method.^1 In recent years, a variety of innovative technologies that provide continuous AP measurements in a completely noninvasive manner have also become available.^1,3–5

When choosing an AP monitoring technique for an individual patient, it is important to balance the risks and benefits of each technique (considering measurement performance, invasiveness, and continuity of AP readings). Especially with regard to the novel technologies that are regularly introduced to the market, one should be aware of each technique’s practical advantages and limitations.^2,6 This awareness is crucial to avoid misinterpretation of AP readings and subsequent treatment errors. A prerequisite for a physician’s informed decision to use a certain AP monitoring technique in an individual patient or in a certain group of patients is the availability of clinical validation studies that describe measurement performance based on adequate statistical analyses and draw appropriate conclusions.^6 These clinical validation studies are “method comparison studies” comparing a test method with a reference method. Different comparative statistical tests are used in these studies; applying them appropriately and knowing their problems and pitfalls well are essential for drawing meaningful conclusions from any method comparison study.

THE PROBLEM WITHIN AP MEASUREMENT METHOD COMPARISON STUDIES

Method comparison studies comparing a test method with a reference method give information on (1) how accurately and precisely the test technique measures AP (ie, a description of the systematic and random error) and (2) whether the test technique correctly indicates the direction and magnitude of changes in AP (trending).^7–10 Several statistical methods are usually applied in these studies.^7–10 Correlation analysis assesses the relation (not agreement) between 2 measurement methods by providing a correlation coefficient and linear regression. The Bland-Altman analysis provides the mean of the differences (and its standard deviation) between the test method and the reference method and the 95% limits of agreement (LOA)^11 and allows calculation of the percentage error.^12 In addition, analyses based on 4-quadrant or polar plots allow evaluation of whether changes in the “true” value (assessed with the reference method) are depicted adequately and promptly by the measurements made with the test method.^9
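The Bland-Altman quantities named above can be computed in a few lines. This sketch uses the sample SD of the differences and bias ± 1.96 × SD for the 95% limits of agreement; for the percentage error it assumes the common convention of 1.96 × SD divided by the mean of both methods, although some studies divide by the mean of the reference method instead. The function name and data are made up for illustration.

```python
import statistics


def bland_altman(test: list[float], ref: list[float]) -> dict[str, float]:
    """Bias, SD of differences, 95% limits of agreement and percentage error."""
    if len(test) != len(ref) or len(test) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [t - r for t, r in zip(test, ref)]
    bias = statistics.mean(diffs)          # mean of the differences
    sd = statistics.stdev(diffs)           # sample SD (n - 1 denominator)
    # Assumed convention: percentage error relative to the mean of both methods
    mean_both = statistics.mean([(t + r) / 2 for t, r in zip(test, ref)])
    return {
        "bias": bias,
        "sd": sd,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "percentage_error": 100.0 * 1.96 * sd / mean_both,
    }
```

For the made-up pairs test = [82, 95, 71, 110] and ref = [80, 90, 75, 105] mm Hg, the bias is 2 mm Hg, the SD of the differences is √18 ≈ 4.24 mm Hg, the limits of agreement are about −6.3 to 10.3 mm Hg, and the percentage error is about 9.4%.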

Based on the numeric and graphic results of the statistical tests, authors try to conclude whether the test method shows “clinically acceptable agreement” or is “interchangeable” with an established criterion standard method. However, for several reasons it is difficult to define “acceptable agreement” between a test AP measurement technology and the criterion standard method in general terms on the basis of the existing established statistical tests. First, the definition of “acceptable agreement” depends on the patient group and clinical setting (eg, low-risk surgery versus high-risk surgery versus critical illness; emergency department versus intensive care unit versus operating room).

In addition, the clinical relevance of how accurately and precisely a technology provides AP values differs between AP ranges. However, this clinical relevance is not adequately reflected in correlation analysis or Bland-Altman analysis. Correlation analysis describes the linear relation (not agreement) between different sets of measurements and can yield r values (or r² values) close to 1 even though the differences between single measurements, eg, in a certain AP range, can be quite large, indicating poor agreement.^13 Note that for simple linear regression, r is identical to the Pearson correlation coefficient, and the better-known r² describes the proportion of the variance of the outcome (or reference) explained by the predictor. Linear regression may be particularly useful when there is a gold standard as a reference: the estimated systematic relationship between the new method and the gold standard can then be used to predict the reference from the new method.

On the other hand, if the study goal is to assess agreement as opposed to the strength of the linear relationship, several methods are available.

A method that quantifies agreement (not just association) between 2 series of measurements is, for example, Lin’s^14 concordance correlation coefficient. It combines the Pearson correlation coefficient with the bias from the 45° line in a scatter plot. It is obviously more appropriate than r when a measure of agreement is sought, but it still suffers from the same limitation: it does not distinguish the different risks associated with deviations across the range of true values of the variable.
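The difference between association and agreement is easy to demonstrate numerically. The sketch computes the Pearson r and Lin’s concordance correlation coefficient, using Lin’s population-moment form ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²); the function names and the example data, which carry a constant 15 mm Hg offset, are hypothetical.

```python
import statistics


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient: linear association, not agreement."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lin_ccc(x: list[float], y: list[float]) -> float:
    """Lin's concordance correlation coefficient (agreement with the 45° line).

    Uses population (1/n) moments:
    rho_c = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2)
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    # The squared mean difference penalizes systematic bias
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)
```

With ref = [50, 70, 90, 110] and test = ref + 15 mm Hg, r is exactly 1 even though every reading is off by 15 mm Hg, while the concordance coefficient drops to 1000 / 1225 ≈ 0.82. Neither statistic weights an error by the AP level at which it occurs.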

When performing Bland-Altman analysis, the mean of the differences is by definition calculated as the mean of all single differences between corresponding measurement points over the whole range of recorded AP data (eg, mean AP 35–140 mm Hg). In addition, negative and positive differences balance each other out when the mean of the differences is calculated. Visual inspection of the Bland-Altman plot allows conclusions to be drawn about a method’s measurement performance at different levels of AP (especially when regression analysis is additionally performed to identify nonlinear relations). However, relying only on the mean difference across all observations between the test and reference technique may give only a rough estimate of the measurement performance contained in the data set.

Of note, a measurement error of, for instance, 10 mm Hg is much more important in a patient with an actual mean AP of 50 mm Hg than in a patient with an actual mean AP of 100 mm Hg. Similarly, a percentage error calculated over a wide range of AP data ignores the fact that, for instance, a difference of 20% is clinically much more relevant when the mean AP is 50 mm Hg than when it is 100 mm Hg. To provide adequate information on a technique’s measurement performance in comparison with a criterion standard technique, a statistical test should ideally reflect the fact that the clinical relevance of AP differences depends on the level of AP.
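The two effects described here, cancellation of signed differences and the greater weight of a fixed error at low AP, can be made visible by summarizing differences within reference-AP bands instead of across the whole range. The band edges, the function name `error_by_band`, and the data below are hypothetical illustrations, not part of the proposed method.

```python
from statistics import fmean


def error_by_band(ref: list[float], test: list[float],
                  edges: list[float]) -> dict[tuple[float, float], dict[str, float]]:
    """Summarize test - ref differences within reference-AP bands [lo, hi)."""
    summary = {}
    for lo, hi in zip(edges, edges[1:]):
        pairs = [(r, t) for r, t in zip(ref, test) if lo <= r < hi]
        if not pairs:
            continue  # no readings in this band
        diffs = [t - r for r, t in pairs]
        mean_abs = fmean([abs(d) for d in diffs])
        summary[(lo, hi)] = {
            "n": len(pairs),
            "bias": fmean(diffs),        # signed errors can cancel out
            "mean_abs": mean_abs,        # error magnitude, no cancellation
            # the same absolute error relative to the band's mean reference AP
            "rel_pct": 100 * mean_abs / fmean([r for r, _ in pairs]),
        }
    return summary
```

With reference values [50, 50, 100, 100] mm Hg, test values [60, 40, 110, 90], and bands [0, 75) and [75, 200), each band has a bias of 0 mm Hg and a mean absolute difference of 10 mm Hg, yet the relative error is 20% in the low band and 10% in the high band.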

AIM OF THIS STUDY

Established statistical tests provide measures that describe the “statistical agreement” between different AP measurement techniques, but they provide little information on the clinical relevance of the respective statistical findings and leave the definition of clinically “acceptable agreement” to the clinician.

For example, although the Bland-Altman LOA show the range where 95% of differences are expected to fall, it is left up to the study authors or the reader to decide whether those limits fall outside of what would be considered clinically acceptable agreement.

In this study, we propose an error grid analysis to illustrate the agreement of 2 AP measurement methods with regard to the clinical relevance of measurement differences. Differences between 2 measurements are classified into 5 risk levels ranging from “no risk” to “dangerous risk”; the classification depends on both the differences between the measurements and the magnitude of the measurements themselves. In this article, we provide error grids with calibrated risk zones derived from a survey among 25 experts in the field of anesthesiology and intensive care medicine. We provide the coordinates for the error grid zones in Supplemental Digital Content 1, Appendix 1, https://links.lww.com/AA/C96, which researchers may adapt to their study setting based on their own pathophysiological considerations.

ERROR GRID ANALYSIS IN AP MEASUREMENT METHOD COMPARISON STUDIES

Basic Concept of AP Error Grid Analysis

The error grid analysis method has been proposed for method comparison studies of blood glucose monitoring systems.^13,15–17 Because AP is a completely different variable, associated with entirely different boundaries for the risk zones and different considerations in clinical routine, we redesigned the error grid analysis and defined appropriate risk zones for AP measurement differences between 2 measurement systems. In the following, we suggest an approach for error grid analysis in AP method comparison studies. It is important to regard the proposed error grid method as a graphical and quantitative method in which error grids and risk regions are created based on expert opinion (not on measured deviations) and smoothed. Measured deviations are then overlaid on the grid. The resulting graphic and quantitative summaries can finally be used to assess the clinical relevance of deviations. To compute an AP error grid, the AP data are plotted in a scatter plot with the reference method on the horizontal (x) axis and the test method on the vertical (y) axis. The scatter plot is divided into different areas of interest representing different “risk zones,” depending on the clinical consequence of a measurement difference falling into the respective zone. Based on visual inspection of the data and a quantitative analysis giving the absolute number (and percentage) of data points within the different zones, the error grid allows analysis of the clinical relevance of measurement differences.
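The overlay step described above reduces to assigning each (reference, test) pair to a zone and counting. In the sketch below the zone map is passed in as a function; the published zone coordinates are in the article’s supplemental appendix and are not reproduced here, so `toy_zone` is a deliberately simplistic, hypothetical stand-in based on relative deviation and must not be used as the AP error grid.

```python
from collections import Counter
from typing import Callable


def toy_zone(ref: float, test: float) -> str:
    """HYPOTHETICAL zone rule for illustration only, NOT the published AP grid.

    Scores the deviation relative to the reference value, so the same
    absolute error in mm Hg weighs more at low AP than at high AP.
    """
    rel = abs(test - ref) / ref
    for limit, zone in ((0.10, "A"), (0.20, "B"), (0.30, "C"), (0.40, "D")):
        if rel <= limit:
            return zone
    return "E"


def error_grid_summary(
    ref: list[float],
    test: list[float],
    classify: Callable[[float, float], str],
) -> dict[str, tuple[int, float]]:
    """Overlay paired readings on a zone map: count and percentage per zone A-E."""
    if len(ref) != len(test) or not ref:
        raise ValueError("need two non-empty series of equal length")
    # x = reference method, y = test method, as in the scatter-plot convention
    counts = Counter(classify(x, y) for x, y in zip(ref, test))
    n = len(ref)
    return {zone: (counts[zone], 100.0 * counts[zone] / n) for zone in "ABCDE"}
```

For example, with ref = [50, 100, 80, 60] and test = [52, 115, 60, 90] mm Hg, the toy rule puts one pair each in zones A, B, C, and E (25% each). Passing the classifier as a parameter keeps the counting logic independent of how the zone boundaries were derived and smoothed.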

Our AP error grid approach is based on the following assumptions:

There is a target AP range for systolic AP and mean AP.
Physicians will try to correct AP values outside the target range by therapeutic interventions.
Treatment of AP values inside the target range is inappropriate and potentially dangerous.
Failure to treat AP values outside the target range is also inappropriate and potentially dangerous.
AP values measured with the test method (in relation to the “true” AP value measured by the reference method) can trigger therapeutic interventions that are beneficial or dangerous for the patient.

These assumptions formed the basis for defining 5 levels of risk to the patient arising from potential treatment errors caused by measurement differences between the test and the reference method. The risk levels are therefore derived by comparing the reaction to a wrong measurement with the clinically appropriate reaction to the true value.

We defined the 5 levels of risk A to E:

A: No risk (ie, no difference in clinical action between the reference and test method)
B: Low risk (ie, test method values that deviate from the reference but would probably lead to benign or no treatment)
C: Moderate risk (ie, test method values that deviate from the reference and would eventually lead to unnecessary treatment with moderate non–life-threatening consequences for the patient)
D: Significant risk (ie, test method values that deviate from the reference and would lead to unnecessary treatment with severe non–life-threatening consequences for the patient)
E: Dangerous risk (ie, test method values that deviate from the reference and would lead to unnecessary treatment with life-threatening consequences for the patient)

The error grid therefore consists of 5 zones corresponding to these 5 risk levels.

Construction of an Error Grid Based on Expert Opinion

To obtain unbiased definitions of the AP ranges and the limits of the 5 risk zones of the error grid, we conducted a survey among 25 international specialists in anesthesiology and intensive care medicine. We used the survey methodology of the surveillance error grid previously described by Klonoff et al.^¹⁸ All of the specialists are highly experienced clinicians who treat critically ill patients in the perioperative phase or in the intensive care unit.

Thirty specialists were asked to complete a 2-step questionnaire (Supplemental Digital Content 2, Document 1, https://links.lww.com/AA/C97). Each specialist was contacted personally, either by phone or by email. The specialists received instructions as required and were invited to ask questions for clarification; only a minority needed to do so. Twenty-five specialists completed the questionnaire, a response rate of 83%. The specialists had a median age of 44 years and a median of 14.5 years of working experience. Seventeen worked mainly as anesthesiologists in the operating room, and 8 worked mainly in the intensive care unit.

First, we asked the specialists to consider AP measurements in patients treated in daily clinical practice and to define 5 ranges of AP values both for mean AP and systolic AP associated with the following therapeutic actions:

Code 1: Emergency treatment for low AP
Code 2: Treatment of low AP appropriate
Code 3: No action needed
Code 4: Treatment of high AP appropriate
Code 5: Emergency treatment for high AP

We deliberately excluded the diastolic AP from the error grid analysis because of its minor role as an isolated value in the perioperative and critical care setting.

Second, we asked the specialists to imagine that a patient’s AP is measured simultaneously with a clinical reference method (invasive arterial catheter) and a “novel blood pressure measurement technology.” They then assigned each deviation (=error) of the new technology’s measurement from the true AP value to a degree of risk, using the 5 risk levels A–E described above. For every combination of “measurement lies in interval i” and “true value lies in interval j,” the experts assessed the risk resulting from adequate or inadequate therapeutic interventions. To compute the error grids, we aggregated the experts’ opinions from the questionnaires. For every combination of true value x and measurement y (with x and y each ranging from 1 to 250 mm Hg), we summed the 25 risk assessments. The weights used were [0, 5, 10, 30, 50] for the 5 subjective risk levels from “no risk” to “dangerous risk.”

There is no natural, self-evident weighting scheme for this task. One has to decide, in numerical terms, how much more serious “dangerous risk” is than “low risk.” In their error grid for blood glucose monitors, for example, Klonoff et al^¹⁸ use the weights [0, 1, 2, 3, 4]. This implies that the highest risk is 4 times as serious as the low-risk case and one-third more serious than the second highest risk. For AP, however, the weighting scheme should be more conservative and give more weight to serious risk. The scheme proposed here therefore weights dangerous risk 10 times higher than the low-risk case and 5 times higher than the moderate-risk case. Comparing the risk levels with their clinical definitions in the previous section (eg, life-threatening versus non–life-threatening consequences) may further support the plausibility of the scheme.
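The weighted aggregation can be written down directly. The Python sketch below is a hedged illustration. Its nested-list layout `ratings[e][i][j]` (expert e, true-value index i, measured-value index j, level 0 = A to 4 = E) is an assumption made for the example, and a tiny 2 × 2 grid stands in for the 250 × 250 grid. The sketch also normalizes the maximum score (25 × 50 = 1250) to 100%, as is done for the continuous error grid.

```python
from __future__ import annotations

# Weights for risk levels A (no risk) to E (dangerous risk), as in the text.
WEIGHTS = (0, 5, 10, 30, 50)
N_EXPERTS = 25
MAX_SCORE = N_EXPERTS * WEIGHTS[-1]  # 1250: all experts choose "dangerous risk"


def aggregate_scores(ratings: list[list[list[int]]]) -> list[list[int]]:
    """Sum the weighted risk levels of all experts for every grid cell."""
    n_rows, n_cols = len(ratings[0]), len(ratings[0][0])
    scores = [[0] * n_cols for _ in range(n_rows)]
    for expert in ratings:
        for i in range(n_rows):
            for j in range(n_cols):
                scores[i][j] += WEIGHTS[expert[i][j]]
    return scores


def to_percent(score: int) -> float:
    """Normalize an aggregated score so that MAX_SCORE corresponds to 100%."""
    return 100.0 * score / MAX_SCORE


# Toy 2 x 2 grid rated by 25 hypothetical experts: cell (1, 0) is rated
# "low risk" by 20 experts and "moderate risk" by 5.
ratings = [[[0, 4], [1 if e < 20 else 2, 0]] for e in range(N_EXPERTS)]
scores = aggregate_scores(ratings)
print(scores)  # [[0, 1250], [150, 0]]
print(to_percent(scores[1][0]))  # 12.0
```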

The aggregated 250 × 250 grid contains a score for every measurement–true value combination, ie, 62,500 numbers in total. The scores range from 0 (all 25 experts agree on “no risk”) to 1250 (all 25 experts agree on “dangerous risk”). This grid of aggregated numbers is similar to the continuous surveillance error grid of Klonoff et al.^¹⁸ Figure 1A, B shows the resulting continuous error grid. For convenience, we normalized the maximum risk score (1250) to 100% and used the same color scheme as Klonoff et al.^¹⁸ Furthermore, we applied the outlier procedure suggested in Appendix B of Klonoff et al^¹⁸ to prevent scores from decreasing as risk increases. For systolic AP, 0.2% of all entries of the 250 × 250 grid were affected by the outlier procedure, and the replacement led to no visible changes. For mean AP, no values were affected.

Figure 1.

Continuous version of the error grid for systolic (A) and mean (B) arterial pressure. A risk level between 0% and 100% is assigned to each combination of measurement and true value (test device versus gold standard). The actual numbers are calculated by adding up the weighted risk assessments of the 25 experts.

We provide the numerical values of the continuous plot in an Excel spreadsheet in Supplemental Digital Content 3, Document 2, https://links.lww.com/AA/C98.

Figure 2.

Error grids for systolic arterial pressure (A) and mean arterial pressure (B) are shown. The error grids are based on a survey of 25 specialists in anesthesiology and intensive care. The horizontal axis represents arterial pressure values measured by a gold standard method, and the vertical axis represents arterial pressure values measured by a test device. The grid is divided into zones showing the degree of risk posed by the test device’s incorrect measurement: zone A represents no risk for the patient; zone B represents low risk; zone C represents moderate risk; zone D represents significant risk; and zone E represents dangerous risk. The colors are based on the survey together with the derived limits defining the risk zones to illustrate the fit of the zones.

A further goal of our study is to provide risk zones for the resulting plot, ranging from “no risk” to “dangerous risk.” To this end, we classify risk scores as follows: 0–199 as “A: no risk,” 200–499 as “B: low risk,” 500–799 as “C: moderate risk,” 800–1099 as “D: significant risk,” and 1100–1250 as “E: dangerous risk.” Figure 2A, B shows the resulting consensus error grids for systolic AP and mean AP, with different colors corresponding to the different risk levels. Note that the figures also contain the smoothed boundaries of the risk levels, which are introduced in the next section.
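A minimal Python sketch of this thresholding, assuming integer scores on the aggregated 0–1250 scale (the function name `score_to_zone` is hypothetical):

```python
from __future__ import annotations

# Lower bound of each zone on the aggregated score scale, highest zone first.
ZONE_LOWER_BOUNDS = (("E", 1100), ("D", 800), ("C", 500), ("B", 200), ("A", 0))


def score_to_zone(score: int) -> str:
    """Map an aggregated risk score (0-1250) to risk zone A-E."""
    if not 0 <= score <= 1250:
        raise ValueError(f"score outside 0-1250: {score}")
    for zone, lower in ZONE_LOWER_BOUNDS:
        if score >= lower:
            return zone
    raise AssertionError("unreachable: score >= 0 always matches zone A")


print([score_to_zone(s) for s in (0, 199, 200, 650, 1099, 1250)])
# ['A', 'A', 'B', 'C', 'D', 'E']
```

Because every weight is a multiple of 5, aggregated scores are integers, so the gaps between the published ranges (eg, 199 and 200) never need to be resolved.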

Smoothing of the Consensus Error Grids

In the next step, we specify polygons delineating the risk regions within the error grids constructed from the cumulative specialist opinions. When drawing the polygons, we aim for simplicity (a small number of knots) and clinical relevance. The resulting error grid polygons are shown in Figure 2A (systolic AP) and Figure 2B (mean AP), and their coordinates are listed in Supplemental Digital Content 1, Appendix 1, https://links.lww.com/AA/C96. Figure 2A, B shows colors based on the survey together with the derived limits defining the risk zones, to illustrate the fit of the zones. In subsequent figures, only the defined risk zones are colored, and the survey data are no longer shown. In 2 areas, our lines deviate systematically from the aggregated experts’ opinions. First, in the lower left corner (representing very low and therefore dangerous true AP values), the “no risk” zone borders directly on the “high risk” zone, omitting the intermediate risk levels. The reason is that at such low AP values, even minor measurement inaccuracies, and the resulting delayed or opposite action, may have fatal consequences for the patient. Second, for the sake of simplicity, we mainly used straight lines in the center of the plots instead of mimicking the curves of the experts’ opinions.
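Once the zones are polygons, assigning a data point to a zone is a point-in-polygon test. The Python sketch below uses the standard even-odd ray-casting rule. It is an illustration under stated assumptions, not the authors’ code. The polygon coordinates are made up for the example; the real ones are in Supplemental Digital Content 1. Each zone is assumed to be given as one or more polygons. Zones are checked from the highest risk downward, so any overlap resolves to the higher risk; this ordering is a choice made for the sketch, not a detail from the article.

```python
from __future__ import annotations

Point = tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: list[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray pointing in +x."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zone_from_polygons(
    x: float, y: float, zone_polygons: dict[str, list[list[Point]]]
) -> str | None:
    """Return the highest-risk zone whose polygon contains (x, y), else None."""
    for zone in "EDCBA":
        for polygon in zone_polygons.get(zone, []):
            if point_in_polygon(x, y, polygon):
                return zone
    return None


# Made-up polygons (reference on x, test on y, in mm Hg): a "dangerous" box
# for very low true values read as normal, inside a catch-all "no risk" box.
toy_zones = {
    "E": [[(0, 50), (60, 50), (60, 100), (0, 100)]],
    "A": [[(0, 0), (250, 0), (250, 250), (0, 250)]],
}
print(zone_from_polygons(40, 90, toy_zones))    # E
print(zone_from_polygons(120, 125, toy_zones))  # A
print(zone_from_polygons(300, 300, toy_zones))  # None
```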

We further analyzed the variability of the 25 expert opinions with respect to their assessment of particular regions. Figure 3A, B therefore provides a variation plot showing the color-coded interquartile range of the 25 experts’ responses. As expected, opinions vary most in the transition areas from “no risk” to “dangerous risk,” reflecting the gradual increase in the average risk assessment. The experts were questioned independently, and their assessments were aggregated afterward. Their opinions therefore do not converge as they might in an open discussion between the experts, and some variability between experts is to be expected.
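The variation plot rests on a per-cell interquartile range of the weighted responses. A minimal Python sketch using the standard library is shown below. It assumes linear-interpolation quartiles (`method="inclusive"`). The article does not say which quantile definition was used, and different definitions can give slightly different IQRs for 25 responses.

```python
from __future__ import annotations

import statistics

WEIGHTS = (0, 5, 10, 30, 50)  # risk levels A to E, as used for the error grid


def weighted_iqr(levels: list[int]) -> float:
    """IQR of the experts' weighted responses (levels 0 = A ... 4 = E) at one cell."""
    values = [WEIGHTS[level] for level in levels]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical cell in a transition area: 10 experts say "no risk",
# 10 say "low risk", and 5 say "dangerous risk".
print(weighted_iqr([0] * 10 + [1] * 10 + [4] * 5))  # 5.0
print(weighted_iqr([4] * 25))  # 0.0 (unanimous "dangerous risk")
```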

Figure 3.

Variation between the responses of the 25 specialists in anesthesiology and intensive care medicine derived from a questionnaire for systolic (A) and mean (B) arterial pressure. For every point of the grid, the interquartile range between the responses of the 25 experts multiplied by the weights chosen for construction of the error grid is shown together with the risk regions of the aggregated error grid. The variation is largest in the area of risk level C, reflecting the transition between low-risk regions and high-risk regions.

Once the risk regions have been specified, the resulting error grids allow AP measurement techniques to be assessed against the true AP value (reference technique) with regard to the clinical relevance of the measurements. For example, suppose the test method measures a systolic AP of 90 mm Hg (ie, no immediate therapeutic intervention needed) while the true systolic AP is around 55 mm Hg. This is a higher risk situation (risk level D or E). A lower risk situation (risk level A or B) would occur when the test method gives a systolic AP of 105 mm Hg while the true value is around 140 mm Hg. In both cases, the absolute difference between the test method and the reference method is 35 mm Hg, yet the risk to the patient is completely different. The error grid–derived information on the clinical relevance of the agreement can be combined with the information on measurement agreement given by Bland-Altman analysis, as illustrated in the following.

Worked Examples Using the Consensus Error Grids: Information in Addition to the Bland-Altman Plot About Clinical Relevance of Measurement Differences

Based on 2 artificial scenarios (worked examples), Figures 4 and 5 compare AP measurements obtained with a test method and a reference method, using error grids and Bland-Altman plots. The Association for the Advancement of Medical Instrumentation standard for noninvasive AP measurement (ANSI/AAMI SP10) defines clinically acceptable agreement as a bias within ±5 mm Hg and a standard deviation of at most 8 mm Hg.^¹⁹ In the first scenario (Figure 4), the Bland-Altman analysis (Figure 4B) indicates moderate accuracy (mean of the differences) and poor precision (standard deviation and 95% LOA of the mean of the differences) of the test method in comparison with the reference method. However, error grid analysis (Figure 4A) shows that the agreement of the test method with the reference method is good in terms of clinically relevant measurement differences, because there are no measurements in the high-risk zones of the error grid. The high standard deviation (and therefore wide 95% LOA) of the differences results from lower precision in AP ranges in which measurement differences are clinically less relevant (ie, they neither trigger dangerous interventions nor prevent necessary ones).
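For comparison with the error grid, the Bland-Altman quantities and the AAMI criterion quoted above can be computed as follows. This is a hedged Python sketch that relies on common conventions rather than details stated in this section. Differences are taken as test minus reference, and the sample standard deviation is used. The 95% limits of agreement are approximated as bias ± 1.96 SD. The data pairs are made up.

```python
from __future__ import annotations

import statistics


def bland_altman(reference: list[float], test: list[float]) -> dict[str, float]:
    """Bias, SD of the differences, and 95% limits of agreement (test - reference)."""
    if len(reference) != len(test) or len(reference) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [t - r for r, t in zip(reference, test)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return {
        "bias": bias,
        "sd": sd,
        "loa_low": bias - 1.96 * sd,
        "loa_high": bias + 1.96 * sd,
    }


def meets_aami(bias: float, sd: float) -> bool:
    """AAMI-style acceptance as quoted in the text: |bias| <= 5 and SD <= 8 mm Hg."""
    return abs(bias) <= 5 and sd <= 8


# Made-up mean AP pairs (mm Hg): reference (invasive) vs test device.
stats = bland_altman([60, 70, 80, 90], [62, 74, 78, 94])
print(stats, meets_aami(stats["bias"], stats["sd"]))
```

As the worked examples show, passing this check says nothing about where in the AP range the larger differences occur. That is the information the error grid adds.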

Figure 4.

Worked example for the comparison between mean arterial pressure measurements with a test method and a gold standard using error grid (A) and Bland-Altman plot (B). In this scenario, the Bland-Altman methodology indicates moderate accuracy (mean of the differences; continuous horizontal line) and poor precision (standard deviation and 95% limits of agreement of the mean of the differences; dashed horizontal lines) of the test method in comparison with the gold standard. However, error grid analysis shows that the agreement of the test method with the gold standard is good in terms of clinically relevant measurement differences because there are no measurements in the high-risk zones of the error grid. Zone A represents no risk for the patient; zone B represents low risk; zone C represents moderate risk; zone D represents significant risk; and zone E represents dangerous risk. ABP indicates invasive arterial blood pressure; NBP, noninvasive blood pressure.

Figure 5.

Worked example for the comparison between mean arterial pressure measurements with a test method and a gold standard using error grid (A) and Bland-Altman plot (B). In this scenario, the Bland-Altman methodology indicates relatively high accuracy (mean of the differences; continuous horizontal line) and higher precision (standard deviation and 95% limits of agreement of the mean of the differences; dashed horizontal lines) of the test method. However, error grid analysis illustrates that the agreement of the test method with the reference method is poor in terms of clinically dangerous measurement differences with several data pairs lying in high-risk zones of the error grid. Zone A represents no risk for the patient; zone B represents low risk; zone C represents moderate risk; zone D represents significant risk; and zone E represents dangerous risk. The test method has the tendency to overestimate the true arterial pressure for lower values. The error grid analysis reveals that differences between arterial pressure values obtained with the test method and the reference method may erroneously induce or prevent therapeutic interventions. ABP indicates invasive arterial blood pressure; NBP, noninvasive blood pressure.

The opposite scenario is illustrated in Figure 5. Here, in comparison with the first scenario, the Bland-Altman plot (Figure 5B) indicates higher accuracy (mean of the differences) and higher precision (standard deviation and 95% LOA of the mean of the differences) of the test method. However, error grid analysis (Figure 5A) illustrates that the agreement of the test method with the reference method is poor in terms of clinically dangerous measurement differences with several points lying in high-risk zones of the error grid. In this scenario, the test method has the tendency to overestimate the true AP for lower values. For mean AP, error grid analysis yields 76, 20, 3, 0, and 1 measurements for the risk levels A–E, respectively. Thus, the error grid analysis reveals that differences between AP values obtained with the test method and the reference method may erroneously induce or prevent therapeutic interventions.

Examples Using the Consensus Error Grids With AP Data From the MIMIC II Database

Figure 6.

Illustration of the error grid analysis using simultaneously measured noninvasive (oscillometry) and invasive (arterial catheter) systolic arterial pressure (AP) data from the Multiparameter Intelligent Monitoring in Intensive Care II database in 1035 patients. The Bland-Altman plot (B) shows marked discrepancies between arterial catheter–derived systolic AP values and AP values assessed using noninvasive oscillometric measurements. Error grid analysis (A) provides information on clinical relevance and shows that the proportions of measurements in risk levels A–E are 78%, 14%, 6%, 1%, and 1% for systolic AP. Zone A represents no risk for the patient; zone B represents low risk; zone C represents moderate risk; zone D represents significant risk; and zone E represents dangerous risk. This allows conclusions about the incidence of dangerous situations due to measurement differences between methods. ABP indicates invasive arterial blood pressure; NBP, noninvasive blood pressure.

Figure 7.

Illustration of the error grid analysis using simultaneously measured noninvasive (oscillometry) and invasive (arterial catheter) mean arterial pressure (AP) data from the Multiparameter Intelligent Monitoring in Intensive Care II database in 1035 patients. The Bland-Altman plot (B) shows marked discrepancies between arterial catheter–derived mean AP values and AP values assessed using noninvasive oscillometric measurements. Error grid analysis (A) provides information on clinical relevance and shows that the proportions of measurements in risk levels A–E are 18%, 54%, 20%, 7%, and 8% for mean AP, respectively. Zone A represents no risk for the patient; zone B represents low risk; zone C represents moderate risk; zone D represents significant risk; and zone E represents dangerous risk. This allows conclusions about the incidence of dangerous situations due to measurement differences between methods. ABP indicates invasive arterial blood pressure; NBP, noninvasive blood pressure.

We further illustrate the error grid analysis using AP data from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II database (MIMIC II v2 Numerics Database; mimic2db/numerics on the PhysioNet server^²⁰–²²). We excluded all patients for whom no concurrent pair of invasive (arterial catheter) and noninvasive AP measurements was available, with systolic, mean, and diastolic AP present at the same time. After this step, 1035 patients remained. The most common case (18% of patients) was a single available pair of measurements; 10% of patients had 2 pairs. The number of consecutive measurements per patient had a median of 8, a mean of 113, and a maximum of around 60,000. To circumvent problems with repeated measurements, we used only 1 pair of measurements per patient. For a patient with n > 1 consecutive measurements, we used the n/2th measurement, ie, the middle one (rounded up to the (n + 1)/2th if necessary). The 1035 resulting measurements are shown in Figure 6A, B for systolic AP and Figure 7A, B for mean AP. In both cases, the error grid and the Bland-Altman plot rate the agreement of invasive and noninvasive measurements as not acceptable. The Bland-Altman standard deviation (bias) is 25.5 (6.1) for systolic AP and 23.1 (28.3) for mean AP. As previously described,^²³ the Bland-Altman plot shows marked discrepancies in the ICU patients of the MIMIC II database between arterial catheter–derived systolic and mean AP values and AP values assessed using noninvasive oscillometric measurements. This assessment applies the Association for the Advancement of Medical Instrumentation [AAMI] criteria, which define clinically acceptable agreement as a bias within ±5 mm Hg and a standard deviation of at most 8 mm Hg.^¹⁹ However, because it is based only on the mean and the standard deviation of the differences, the Bland-Altman plot does not allow definite conclusions about the clinical relevance of the AP measurement differences.
Error grid analysis provides this information on clinical relevance. It shows that the proportions of measurements in risk levels A–E are 78%, 14%, 6%, 1%, and 1% for systolic AP and 18%, 54%, 20%, 7%, and 8% for mean AP, respectively. This allows conclusions about the incidence of dangerous situations due to measurement differences between methods. Furthermore, these numbers, derived from a large medical database, might serve for comparison when evaluating a novel AP measurement technology. The error grid analysis also shows a clear difference in reliability between the 2 variables, systolic and mean AP. One possible explanation is that larger differences in systolic AP may still be associated with similar, stable mean AP values. In addition, mean AP is the most reliable of the 3 AP values as a parameter of organ perfusion, so the specialists tolerated less variation between measurement techniques for it.
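The per-patient selection rule described above (use the n/2th pair, rounded up to the (n + 1)/2th when n is odd) corresponds to the 0-based index (n - 1) // 2. A minimal Python sketch is shown below. The dictionary layout `records` is a hypothetical stand-in, not the MIMIC II file format.

```python
from __future__ import annotations

Pair = tuple[float, float]  # (invasive reference, noninvasive test), mm Hg


def middle_pair(pairs: list[Pair]) -> Pair:
    """Return the ceil(n/2)-th pair (1-based), ie, 0-based index (n - 1) // 2."""
    return pairs[(len(pairs) - 1) // 2]


def one_pair_per_patient(records: dict[str, list[Pair]]) -> dict[str, Pair]:
    """Drop patients without a concurrent pair; keep the middle pair of the rest."""
    return {pid: middle_pair(pairs) for pid, pairs in records.items() if pairs}


records = {
    "p1": [(100, 98)],
    "p2": [(80, 85), (82, 90)],
    "p3": [(70, 75), (72, 71), (74, 80)],
    "p4": [],
}
print(one_pair_per_patient(records))
# {'p1': (100, 98), 'p2': (80, 85), 'p3': (72, 71)}
```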

Limitations and Further Developments

One limitation of the error grid analysis proposed in this study arises from the geometric shape and placement of the zones within the error grid. Because of them, marginally different AP measurements may still be assigned to completely different zones (“zone skipping”). Parkes et al^¹⁷ made a similar proposal for blood glucose measurement studies. In the same way, our AP error grid might be adapted and modified for specific patient populations and for reference methods other than invasive measurement.

In this article, we provide error grids with risk zones derived from a survey among 25 experts in the field of anesthesiology and intensive care medicine. The lack of response validation is a strong limitation of our study. The presented error grid risk zones are not based on evidence from studies showing that the designated risk regions would actually lead to the designated actions. Instead, they rest on clinical experience and on ranges accepted as reasonable within the medical community.

In a previous approach suggested by Morey et al,^²⁴ error grid analysis was defined for hemoglobin measurement methods using 3 instead of 5 risk zones.

One might argue that 3 risk zones instead of 5 would be more convenient. They would tell the user whether the action taken with the new method would differ from the action taken with the reference. For AP, however, there is a wide continuous transition area between no risk and extreme risk, and a more gradual classification of a measurement’s risk provides a deeper understanding of the situation. This is underlined by the variation plots (Figure 3A, B) together with the continuous risk plots (Figure 1A, B). The continuous plot allows researchers and practitioners to divide the continuous risk levels into a 3-level system if they wish. However, the 5-zone approach has an advantage from a clinical perspective. One important point must be highlighted when comparing our method with Morey et al’s 3-zone error grid for hemoglobin measurements. In our method, distinguishing between measurement differences in the (very) low and (very) high value ranges is highly important. In Morey et al’s^²⁴ article on measurement comparison for hemoglobin, the high value ranges (concentrations >10 g/dL) are of minimal interest to the anesthesiologist. This is different for AP, where high values are highly relevant. It is therefore important that our proposed method allows measurement performance to be evaluated with regard to clinical relevance in both the low and the high value ranges. This information cannot be obtained from the LOA alone (visual inspection of the Bland-Altman plot adds further information here). In addition, in contrast to hemoglobin, the values between the low and high ranges are also of higher clinical relevance for AP. Moderate or even small AP differences may have important consequences for the patient’s hemodynamic therapy, whereas a blood transfusion is still considered only in patients with a hemoglobin of 7 g/dL or less.

How the Error Grid Analysis Can Be Used in AP Method Comparison Studies

The risk zones for the construction of specific error grids can be defined using different approaches. First, the risk zones can be determined subjectively based on pathophysiological considerations. We, however, recommend a more objective approach based on aggregated opinions, resulting in a consensus error grid for the particular research question (as described in our study). The consensus error grid is then constructed mathematically from the aggregated experts’ opinions, as described above. The AP data are plotted in the error grid with the reference method on the horizontal (x) axis and the test method on the vertical (y) axis. Visual inspection and quantitative analysis (the absolute number and percentage) of the data points within the different risk zones then allow the measurement performance of the test method to be analyzed with regard to clinical relevance.

We suggest defining “acceptable agreement” a priori considering (1) the reference and the test method used in the study, (2) the patient group (eg, adult versus pediatric), and (3) the clinical setting (surgical versus critically ill).

For example, one might define zone A as the acceptable error region, in which at least 90% of the data points should lie. Zone E is the erroneous region, in which no data should lie. Zones B, C, and D are the zones of low, moderate, and significant risk, in which no more than 5%, 4%, and 2% of the data, respectively, should lie. Importantly, the interpretation of the results should always refer to the data at hand rather than to the underlying population. For example, suppose a method comparison study yielded only 20 measured deviations, and 5 of them lie in the “dangerous” risk region. The limits provided by the clinician should then take the form “5 of 20 is too many” rather than “can I, based on 5 of 20 observations, reject that the true proportion is above 25%?” Furthermore, a result of “5 of 20” should be interpreted differently from a result of “5000 of 20,000,” although both describe a proportion of 25% falling into the dangerous zone. A nonrejection of a method should therefore always be based on numbers with small standard deviations. Under a binomial model, the count of 5 in “5 of 20” has a standard deviation of approximately 2, ie, about 10 percentage points. The count of 5000 in the second case has a standard deviation of approximately 61, ie, only about 0.3 percentage points.
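Such a priori limits are straightforward to check once the zone counts are known. The Python sketch below encodes the example limits from this paragraph; they are illustrations, not recommended thresholds. As an assumption added here, it also uses a simple binomial model, with the observed proportion plugged in, to show how the uncertainty of a count shrinks relative to the sample size.

```python
from __future__ import annotations

import math

# Example limits from the text, in percent of all data points.
MIN_PERCENT = {"A": 90.0}
MAX_PERCENT = {"B": 5.0, "C": 4.0, "D": 2.0, "E": 0.0}


def meets_example_limits(counts: dict[str, int]) -> bool:
    """Check zone counts against the example acceptance limits."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no data points")
    percent = {zone: 100.0 * counts.get(zone, 0) / total for zone in "ABCDE"}
    return all(percent[z] >= lim for z, lim in MIN_PERCENT.items()) and all(
        percent[z] <= lim for z, lim in MAX_PERCENT.items()
    )


def binomial_count_sd(k: int, n: int) -> float:
    """SD of a count k out of n under a binomial model with p = k / n."""
    p = k / n
    return math.sqrt(n * p * (1 - p))


print(meets_example_limits({"A": 92, "B": 4, "C": 3, "D": 1, "E": 0}))  # True
print(meets_example_limits({"A": 95, "B": 4, "E": 1}))  # False: zone E not empty
for k, n in ((5, 20), (5000, 20000)):
    sd = binomial_count_sd(k, n)
    print(f"{k} of {n}: SD {sd:.1f} counts = {100 * sd / n:.2f} percentage points")
```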

Further, error grid analysis could be used to define a statistical measure of agreement between the test and a reference method, for example to rank different test devices during the development of novel AP measurement technologies. This requires a clinically relevant loss function based on the corresponding error grid analysis.
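One possible loss function is the mean continuous risk (0–100%) of a device’s data points, looked up in the continuous error grid. It is offered here only as a hypothetical example, since the article does not specify a loss function. The Python sketch below ranks devices by that mean. `risk_percent` stands in for a lookup into the supplemental spreadsheet, and the toy lookup used in the example is invented.

```python
from __future__ import annotations

from typing import Callable, Iterable

RiskLookup = Callable[[float, float], float]  # (reference, test) -> risk in %


def mean_risk_loss(points: Iterable[tuple[float, float]], risk_percent: RiskLookup) -> float:
    """Hypothetical loss: average continuous risk over all (reference, test) points."""
    values = [risk_percent(x, y) for x, y in points]
    if not values:
        raise ValueError("no data points")
    return sum(values) / len(values)


def rank_devices(
    devices: dict[str, list[tuple[float, float]]], risk_percent: RiskLookup
) -> list[str]:
    """Order device names from lowest to highest mean risk."""
    return sorted(devices, key=lambda name: mean_risk_loss(devices[name], risk_percent))


# Invented lookup for illustration only; a real one would index the 250 x 250 grid.
def toy_risk(x: float, y: float) -> float:
    return min(100.0, 2.0 * abs(y - x))


devices = {
    "device_2": [(80, 120), (100, 100)],  # one large error: mean risk 40%
    "device_1": [(80, 82), (100, 104)],   # small errors: mean risk 6%
}
print(rank_devices(devices, toy_risk))  # ['device_1', 'device_2']
```

A mean lets many small errors offset a few dangerous ones. A loss based on the highest-risk points, or one built on the zone weights, would be an equally defensible alternative.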

Error Grid Analysis for Hemodynamic Variables Other Than AP

While this study primarily discusses error grid analysis for AP method comparison studies, it must be stressed that error grid analysis could be applied to a wide range of applications in hemodynamic monitoring. In fact, the error grid methodology might be applied to method comparison studies of all kinds of continuous hemodynamic variables. Nevertheless, every hemodynamic variable has its own unit, scale, normal range, and clinical relevance (with regard to triggering treatment decisions). Error grids with specific risk zones therefore need to be defined for each hemodynamic variable. Consensus definitions of risk zones for cardiac output error grids, for example, should be part of future research aimed at improving statistics in method comparison studies.

CONCLUSIONS

We propose error grid analysis for AP method comparison studies because it allows illustrating the clinical relevance of AP measurement differences between a test method and a reference method. Error grid analysis, therefore, expands and improves the critical evaluation of techniques for AP measurement.

DISCLOSURES

Name: Bernd Saugel, MD.

Contribution: This author helped to conceive and design the study, was responsible for acquisition of data, was responsible for data analysis and interpretation, and drafted the manuscript. This author read and approved the final version of the manuscript and agreed to be accountable for all aspects of the study.

Name: Oliver Grothe, PhD.

Contribution: This author helped to conceive and design the study, was responsible for acquisition of data, was responsible for data analysis and interpretation, performed statistical analyses, and drafted the manuscript. This author read and approved the final version of the manuscript and agreed to be accountable for all aspects of the study.

Name: Julia Y. Nicklas, MD.

This manuscript was handled by: Maxime Cannesson, MD, PhD.

REFERENCES

1. Saugel B, Dueck R, Wagner JY. Measurement of blood pressure. Best Pract Res Clin Anaesthesiol. 2014;28:309–322.

2. Wagner JY, Saugel B. When should we adopt continuous noninvasive hemodynamic monitoring technologies into clinical routine? J Clin Monit Comput. 2015;29:1–3.

3. Meidert AS, Huber W, Müller JN, et al. Radial artery applanation tonometry for continuous non-invasive arterial pressure monitoring in intensive care unit patients: comparison with invasively assessed radial arterial pressure. Br J Anaesth. 2014;112:521–528.

4. Wagner JY, Negulescu I, Schöfthaler M, et al. Continuous noninvasive arterial pressure measurement using the volume clamp method: an evaluation of the CNAP device in intensive care unit patients. J Clin Monit Comput. 2015;29:807–813.

5. Broch O, Bein B, Gruenewald M, et al. A comparison of continuous non-invasive arterial pressure with invasive radial and femoral pressure in patients undergoing cardiac surgery. Minerva Anestesiol. 2013;79:248–256.


6. Saugel B, Reuter DA. Are we ready for the age of non-invasive haemodynamic monitoring? Br J Anaesth. 2014;113:340–343.

7. Hapfelmeier A, Cecconi M, Saugel B. Cardiac output method comparison studies: the relation of the precision of agreement and the precision of method. J Clin Monit Comput. 2016;30:149–155.

8. Squara P, Imhoff M, Cecconi M. Metrology in medicine: from measurements to decision, with specific reference to anesthesia and intensive care. Anesth Analg. 2015;120:66–75.

9. Saugel B, Grothe O, Wagner JY. Tracking changes in cardiac output: statistical considerations on the 4-quadrant plot and the polar plot methodology. Anesth Analg. 2015;121:514–524.

10. Thiele RH, McMurry TL. Data agnosticism and implications on method comparison studies. Anesth Analg. 2015;121:264–266.

11. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–310.

12. Critchley LA, Critchley JA. A meta-analysis of studies using bias and precision statistics to compare cardiac output measurement techniques. J Clin Monit Comput. 1999;15:85–91.

13. Clarke WL, Cox D, Gonder-Frederick LA, Carter W, Pohl SL. Evaluating clinical accuracy of systems for self-monitoring of blood glucose. Diabetes Care. 1987;10:622–628.

14. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45:255–268.

15. Cox DJ, Clarke WL, Gonder-Frederick L, et al. Accuracy of perceiving blood glucose in IDDM. Diabetes Care. 1985;8:529–536.

16. Cox DJ, Gonder-Frederick LA, Kovatchev BP, Julian DM, Clarke WL. Understanding error grid analysis. Diabetes Care. 1997;20:911–912.

17. Parkes JL, Slatin SL, Pardo S, Ginsberg BH. A new consensus error grid to evaluate the clinical significance of inaccuracies in the measurement of blood glucose. Diabetes Care. 2000;23:1143–1148.

18. Klonoff DC, Lias C, Vigersky R, et al. Error Grid Panel. The surveillance error grid. J Diabetes Sci Technol. 2014;8:658–672.

20. Lee J, Scott DJ, Villarroel M, Clifford GD, Saeed M, Mark RG. Open-access MIMIC-II database for intensive care research. Conf Proc IEEE Eng Med Biol Soc. 2011;2011:8315–8318.

21. Saeed M, Villarroel M, Reisner AT, et al. Multiparameter Intelligent Monitoring in Intensive Care II: a public-access intensive care unit database. Crit Care Med. 2011;39:952–960.

22. Goldberger AL, Amaral LA, Glass L, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101:E215–E220.

23. Lehman LW, Saeed M, Talmor D, Mark R, Malhotra A. Methods of blood pressure measurement in the ICU. Crit Care Med. 2013;41:34–40.

24. Morey TE, Gravenstein N, Rice MJ. Let’s think clinically instead of mathematically about device accuracy. Anesth Analg. 2011;113:89–91.
