Machine Learning in Acute Stroke Neuroimaging. A Systematic Literature Review

[...] sensitivity and specificity were 0.746 and 0.862, respectively. Human operators were used in 27 out of 60 studies, with the average number of human operators per study being 3.7±2.9. Conclusions. AI solutions can be widely applied in computation of infarct volumes. Using AI in stroke diagnosis still requires further research with more prospective studies, more expert human operators, and more focus on evaluating secondary outcomes.

Some advocates of AI even suggest that current medical imaging workflows might be transformed to an extent resulting in staff relay [6]. However, other evidence suggests that AI technology is still in its early phase and that most of the research related to it tends to be flawed in study design [7]. Some studies even suggest that many computer aided diagnosis (CAD) systems result in additional work-ups for radiologists because of false-positive cases [8,9]. Therefore, most AI research must be assessed extremely critically.

STUDY AIM
This systematic review aims to give a contemporary overview of studies that evaluate the diagnostic performance of AI in stroke detection and in segmentation of stroke lesions (e.g., hemorrhage or large vessel occlusion), with or without comparison to human clinicians, and to appraise the AI models, study designs, and metrics used.

METHODS
This systematic review was performed using the PubMed search engine to find peer-reviewed articles with the search terms: "deep learning" or "machine learning" or "artificial intelligence" or "computer aided diagnosis" and "computed tomography" or "CT", and "stroke" or "ischemic stroke" or "hemorrhagic stroke". Literature published between January 1, 2015 and July 23, 2021 was included in order to review the newest improvements made in AI-aided stroke diagnostics.
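The search terms above can be assembled into a boolean query string. The sketch below is illustrative only: the source does not give the exact query syntax, so the grouping of the terms into three OR-groups joined by AND, and the helper names `or_group` and `build_query`, are assumptions.

```python
# Sketch of the boolean search string described in the Methods.
# Assumption: terms within a group are joined by OR, groups are joined by AND.
ai_terms = [
    "deep learning",
    "machine learning",
    "artificial intelligence",
    "computer aided diagnosis",
]
ct_terms = ["computed tomography", "CT"]
stroke_terms = ["stroke", "ischemic stroke", "hemorrhagic stroke"]


def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Join the OR-groups with AND (hypothetical helper, not a PubMed API)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(ai_terms, ct_terms, stroke_terms)
date_range = ("2015/01/01", "2021/07/23")  # time frame stated in the text

print(query)
print("Published between", date_range[0], "and", date_range[1])
```

Writing the grouping out explicitly makes the search easier to reproduce, because the prose alone leaves the precedence of "and" and "or" open.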

Inclusion criteria
Studies were included if they were published in English, had full-text articles available, used deep learning in brain imaging for segmentation and detection only, used computed tomography as the only imaging modality, and addressed hemorrhagic or ischemic stroke detection.

Exclusion criteria
Studies were excluded if they did not meet the inclusion criteria, compared different software without comparison with human clinicians, were abstracts only, were review articles, involved animals or children, used machine learning to alter pixel values for quality improvement, discussed only technical network architecture matters, or predicted patient stroke outcome.

Literature analysis
A total of 438 studies were found using the PubMed search engine. Upon removal of studies that did not fit the inclusion criteria, 60 journal articles were chosen for the present review. Identified articles were reviewed independently by two authors. Relevant articles were analyzed to determine the dataset size used for training and validation, if any, the type of stroke discussed in the study (large vessel occlusion, ischemic, intracranial, and intracerebral hemorrhage), the imaging modality used (CTA, CTP, NCCT), the name of the software (if commercially available) or the type of model used, the metrics used to evaluate the algorithm, and whether the study was conducted prospectively or retrospectively. The data analysis workflow is presented in Fig. Only the best results are reported in this article (e.g., when a paper examines several machine learning techniques, we report the technique that showed the best results).

Data analysis
Data were analyzed using MS Excel (2021).

RESULTS
Table 1 shows general characteristics of the 60 studies covered in this article. Only 2 out of 60 (3.3%) included studies were prospective. Out of the 60 articles identified, 17 were on computed tomography angiography, 7 were on computed tomography perfusion, 44 were on non-contrast enhanced computed tomography, 6 were on computed tomography angiography and non-contrast enhanced computed tomography, 1 was on computed tomography angiography and computed tomography perfusion, and 1 was on computed tomography perfusion and non-contrast enhanced computed tomography. 21 studies out of 60 assessed large vessel occlusion, 32 assessed ischemic core volume, 4 assessed intracranial hemorrhage, 7 assessed [...]

Table 1 (excerpt)
Lead author | Publication year | Study type | Software used | Validation dataset size | Training set size
[...] [35] | 2020 | Retrospective | Support vector machine | 1832 | -
Ko H. [36] | 2020 | Retrospective | CNN-LSTM; Xception | 727392 | 4516842
Ironside N. [37] | 2019 | Retrospective | Analyze 12.0 | 40 | 260
Amukotuwa SA. [38] | 2019 | Retrospective | RAPID 4.9.1 | 969 | -
Kuang H. [39] | 2019 | Retrospective | Random Forest | 100 | 157
Seker F. [40] | 2019 | Retrospective | Brainomix e-ASPECTS | 43 | -
Vargas J. [41] | 2018 | Retrospective | CNN | 40 | 356
Albers GW. [42] | 2019 | Retrospective | RAPID ASPECTS | 65 | -
Austein F. [43] | 2019 | [...]

When assessing the study dataset size, the study by Ko H et al. (2020) stood out, as it had a substantially larger retrospective database than the others (a total of 4516842 images used for training and 727392 for validation). Therefore, we excluded it from the statistics as an outlier. After this correction, training set sizes ranged from a minimum of 28 CT scans to a maximum of 24214 (mean 1279, median 153, standard deviation ±5006.7). Test and validation dataset sizes reported in the studies were as follows: the minimum number of unique CT scans included for validation was 10 and the maximum 21586 (mean 599, median 100, standard deviation ±2801.1). The final statistics are presented in Table 2. The most popular commercially available software used in the studies were Brainomix (n=12, 20% of studies) and RAPID (n=12, 20%); 6 studies (10%) specified using convolutional neural networks, and 6 studies identified neither the software, the model, nor the neural network architecture used. The other types of machine learning techniques used are shown in Table 1.
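The dataset-size summary described above (excluding an outlier, then reporting minimum, maximum, mean, median, and standard deviation) can be sketched as follows. The review itself used MS Excel; this Python version is only an illustration. The list of sizes is hypothetical apart from the outlier value quoted in the text, the numeric threshold is an assumed stand-in for the authors' judgment, and the use of the sample standard deviation (n - 1) is also an assumption.

```python
import statistics


def summarize(sizes, outlier_threshold=None):
    """Summarize dataset sizes, optionally dropping values above a threshold."""
    kept = [s for s in sizes if outlier_threshold is None or s <= outlier_threshold]
    return {
        "n": len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        # Sample standard deviation (n - 1 denominator); an assumption,
        # since the source does not say which formula was used.
        "sd": statistics.stdev(kept),
    }


# Hypothetical training-set sizes, NOT the data of the reviewed studies.
# The last value mimics the outlier mentioned in the text.
training_sizes = [28, 90, 150, 400, 1200, 4_516_842]

with_outlier = summarize(training_sizes)
without_outlier = summarize(training_sizes, outlier_threshold=1_000_000)

print("mean with outlier:   ", round(with_outlier["mean"], 1))
print("mean without outlier:", round(without_outlier["mean"], 1))
print("median without outlier:", without_outlier["median"])
```

A single very large study dominates the mean and standard deviation but barely moves the median, which is one reason to report both.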
A wide spectrum of statistical methods was used to evaluate model performance, making it difficult to compare results accurately. The most popular methods used in the studies are presented in Table 3.
We compared the results of the five most often reported metrics: sensitivity, specificity, accuracy, receiver operating characteristic area under the curve (ROC AUC), and Dice similarity coefficient. Bland-Altman plots were not included in the comparison as the results are mostly visual rather than numeric.
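For reference, four of the five metrics named above have simple closed-form definitions, sketched below with their standard textbook formulas. The function names and the example counts are hypothetical and do not come from any reviewed study. ROC AUC is omitted because it requires continuous model scores rather than binary decisions.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def dice(pred, truth):
    """Dice similarity coefficient of two binary masks (flat 0/1 sequences)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks are treated as a perfect match by convention here.
    return 2 * intersection / total if total else 1.0


# Hypothetical detection counts: 50 stroke cases, 50 controls.
example = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
print(example)

# Hypothetical 4-voxel segmentation masks.
print(dice([1, 1, 0, 0], [1, 0, 1, 0]))
```

Sensitivity, specificity, and accuracy describe case-level detection, whereas the Dice coefficient describes voxel-level overlap in segmentation, so the two groups of metrics are not directly comparable.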
The average ROC AUC is 0.884 and the average accuracy is 0.857, both of which can be considered excellent. The average sensitivity and specificity of the studies are estimated at 0.746 and 0.862, respectively. The largest spread between study results was found for the sensitivity metric, with a standard deviation of 0.198 and a variance of 0.039.
Such findings suggest that the reported results show a low variance between different studies, indicating that the different software used perform at a rather similar level. The results are summarized in Table 4.
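As a quick consistency check, the variance reported for sensitivity is the square of its standard deviation:

```latex
\sigma^2 = (0.198)^2 = 0.039204 \approx 0.039
```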
Human comparator groups were used in 27 out of 60 studies, and these groups were relatively small. The minimum number of people (experts or non-experts) involved in a study was 1, the maximum was 16, with the average number of human operators per study being 3.7±2.9. In most studies, the data were rated independently by most of the human comparators.

DISCUSSION
We have established several findings from our review. First, out of 60 studies reviewed, 58 were retrospective and only 2 were prospective. This is an important limitation in studies aimed at testing AI in clinical practice, as prospective studies better represent the real clinical environment. AI performance is likely to be less accurate when facing new, real-world data rather than the data used in algorithm training [10]. Success in in silico studies and excellent performance metrics do not always translate into clinical functionality, as metrics such as ROC AUC, which is universally used in AI studies, are argued not to be the best metrics to represent clinical success [10].
There is little to no consensus on which metrics should be used to report the performance of AI applications in stroke diagnostics. Our findings show a wide spectrum of statistical measures used to demonstrate model performance; however, only one study attempted to show clinical benefit. The study by Yahav-Dovrat et al. (2021) measured how the AI application VIZ LVO reduced the time to groin puncture [11]. Other studies introduced metrics that do not necessarily reflect any benefit in clinical practice, and their comparison with human operators reflects an in silico environment in which clinicians do not typically work.
Second, the studies in our review used widely varying dataset sizes with different sample distributions and characteristics. The average number of unique CT scans used for validation of neural networks per study was 599; however, the validation dataset sizes ranged from 10 to 21586 CT scans. The use of small datasets with fewer than 100 scans for training and/or validation raises the question of whether the results are reliable, whether they can be replicated by others, whether they can be replicated in real-life clinical scenarios, and whether they can be generalized to different populations and regions.
Third, there were few human operators across all studies compared to AI applications, with the average number of human operators per study being 3.7±2.9. Inter-rater as well as intra-rater variability among human operators can also be high; therefore, future research needs to use larger samples of human operators to ensure reliability. In addition, including non-experts as comparators can make the algorithm appear better by comparison and puts the human side at a disadvantage, so comparison with experts would be preferable [12]. Most importantly, studies with human comparators should attempt to move past opposing AI and clinicians and move towards observing collaboration between clinicians and machine learning, as the combination of both tends to outperform either alone [13,14].
However, AI seems to be a promising tool for ridding people of manual and repetitive tasks. For example, AI solutions can be widely applied in computation of infarct volumes, since manual delineation tends to be a tedious task. It is widely recognized that medical reporting is a huge responsibility for medical practitioners and takes a significant amount of time for each examination.
Finally, the legal status of any AI application operating in a clinical setting should be established. AI as a standalone decision maker or working on a black box principle does not seem to be a safe and sustainable solution. For example, AI applications in stroke diagnosis still carry a relatively high probability of false negative cases that may result in late or missed diagnosis. Vendors of AI systems must make their applications transparent and focus on application design that empowers the end-user rather than replaces them. Further research should state the intended position of the AI system under study in the clinical pathway. In summary, further studies should focus on including larger patient samples from different geographical regions, including more prospective cases, validating AI models across different care centers, including more expert human operators to account for inter-operator variability among humans, evaluating secondary outcomes (e.g., time savings per procedure and long-term patient outcomes), avoiding in silico testing environments when evaluating the performance of human operators compared to AI systems, and ultimately making datasets and source code available to other researchers to accelerate AI research globally.

CONCLUSIONS
1. Artificial intelligence is a rapidly developing field that promises significant impact in time savings, quality improvement of medical procedures, and even better patient outcomes.
2. However, the use of AI in routine stroke diagnosis still requires further research with more prospective studies, more expert human operators, and more focus on evaluating secondary outcomes.
3. The performance of AI systems in clinical settings should be demonstrated more in tandem with human operators rather than as stand-alone systems.

Results. Only 2 of the 60 studies (3.3%) were prospective. The smallest number of unique computed tomography images used for validation of an AI system was 10, the largest was 21586, the mean was 599, the median was 100, and the standard deviation was ±2801.1. The smallest amount of data used for training neural networks was 28 studies, the largest was 24214, the mean was 1279, the median was 153, and the standard deviation was ±5006.7. The most popular software used was Brainomix (n = 12, 20% of all publications) and RAPID (n = 12, 20% of all publications), 6 studies (10%) used convolutional neural networks, and 6 publications specified neither the model nor the name of the software. The mean area under the curve was 0.884 and the mean accuracy was 0.857. The mean sensitivity and specificity were 0.746 and 0.862. Human raters were used in 27 of the 60 studies, and the mean number of human raters in these studies was 3.7 ± 2.9.
Conclusions. Artificial intelligence solutions can be widely applied to automated computation of stroke volume in computed tomography images. The use of AI in stroke diagnosis still requires additional research, prospective studies, broader comparison with human raters, and more attention to the evaluation of secondary outcomes.
Keywords: artificial intelligence, stroke, machine learning, neuroradiology.

D. Matuliauskas, I. Stražnickaitė, A. Samuilis, D. Jatužis

Fig. Flowchart of literature screening and review protocol

Table 1. Main study characteristics
Lead author | Publication year | Study type | Software used | Validation dataset size | Training set size
[...] | 2020 | Retrospective | Multilayer perceptron, Decision tree, Random Forest, Adaboost, Gradient boosting, Bagging, Bernoulli naive Bayes, Gaussian naive Bayes, Support vector machine, K-nearest neighbor | [...] | [...]

Table 2. Training and validation dataset sizes
Columns: Validation | Training

Table 4. Most popular metrics used and their results
Columns: Sensitivity | Specificity | Accuracy | ROC AUC | Dice similarity coefficient
Artificial intelligence (AI) is a rapidly developing technology that can bring many positive changes to stroke diagnostics. The aim of the study is to review publications in which the ability of AI to diagnose acute stroke from radiological images and to segment radiological signs of stroke is compared with human raters or assessed without them. The models used, the study designs, and the statistical methods applied are reviewed. Subjects and methods. The systematic review was performed using the PubMed database, including publications from January 1, 2015 to July 23, 2021. In total, 438 publications were found, of which 60 were selected for the review.