Predictive Value of Alvarado, Acute Inflammatory Response, Tzanakis and RIPASA Scores in the Diagnosis of Acute Appendicitis.

Introduction. The diagnosis of acute appendicitis (AA), the most common cause of acute abdominal pain, has changed in the past decade with the introduction of scoring systems alongside clinical and laboratory parameters and radiological examinations. This study aimed to assess the significance of four scoring systems (Alvarado, Appendicitis Inflammatory Response (AIR), Raja Isteri Pengiran Anak Saleha Appendicitis (RIPASA) and Tzanakis) in the prediction of delayed appendectomy. Materials and methods. The study included 100 respondents who were diagnosed with AA and operated on in the period from January 2018 to February 2019. In addition to the clinical, laboratory, and ultrasonographic examinations, four scoring systems (Alvarado, AIR, RIPASA, and Tzanakis) were used to diagnose AA. According to the histopathological (HP) findings, the patients were divided into 3 groups: timely appendectomy, delayed appendectomy and unnecessary appendectomy. Using the sensitivity and specificity of all 4 scoring systems, ROC analysis was performed to predict delayed appendectomy. Results. Of the 100 patients (58% men, 42% women), HP findings after appendectomy showed that 74% had a timely, 16% a delayed and 10% an unnecessary appendectomy. For the prediction of delayed appendectomy, the area under the ROC curve was 0.577 for the Alvarado score, 0.504 for AIR, 0.651 for RIPASA, and 0.696 for Tzanakis. Sensitivity and specificity were 54% and 62% for the Alvarado score, 62.5% and 63.5% for RIPASA, and 69% and 60.8% for Tzanakis, respectively. Combining the three scoring systems (Alvarado, RIPASA, and Tzanakis), the area under the ROC curve was 0.762 (95% CI 0.521–0.783), with a sensitivity of 85% and a specificity of 62%. Conclusion. 
In our study, the diagnostic accuracy of RIPASA and Tzanakis was better than that of Alvarado, while AIR cannot be used to predict delayed appendectomy in our population. However, the simultaneous application of the three scoring systems, RIPASA, Tzanakis and Alvarado, showed much better discriminatory ability, with higher sensitivity and comparable specificity, than their use alone. Combining scoring systems should help in proper diagnosis to avoid negative appendectomy, but additional studies with a larger number of patients are needed to support these results.


Introduction
Acute appendicitis (AA) is the most common cause of acute abdominal pain, occurring in 7 to 10% of the general population. It is an inflammation of the vermiform appendix in the right lower quadrant of the abdomen, and the pain often increases and migrates as the inflammation progresses [1,2]. The therapy of AA can be conservative or surgical [3]. Delayed treatment of these patients is associated with an increased percentage of complications, so timely surgery is the gold standard for preventing blind perforations [4]. Despite advances in the diagnosis of other diseases, the diagnosis of AA remains a challenge and a dilemma for any surgeon [5].
The accuracy of the diagnosis can be improved, and negative appendectomies reduced, with diagnostic tools such as clinical signs and symptoms, laboratory findings, and radiological examinations (ultrasonography (US) and magnetic resonance imaging (MRI)). Computed tomography (CT), on the other hand, has high sensitivity (94%) and specificity (95%), but the cost of the procedure is high, and an additional problem is its availability in routine diagnostics [6,7]. For these reasons, scoring systems with their clinical diagnostic scoring have recently become an important part of AA diagnostic tools. These scoring systems should be non-invasive, understandable, easy to use and effective in diagnosing AA, to reduce the rate of negative appendectomy [8]. Today, multiple scoring systems are used to diagnose AA, the most common of which are Alvarado, Tzanakis, RIPASA, and AIR, but new scores are constantly appearing. Scoring systems incorporate a variety of parameters: demographic data, clinical symptoms and signs, laboratory values, and radiological examinations.
The Alvarado score, which includes 8 clinical and laboratory parameters, is the best known and most widely used [9], while the newer AIR score also includes C-reactive protein (CRP) [10]. The RIPASA score, which includes 18 clinical and laboratory parameters, has higher specificity in the Asian population [11], while the Tzanakis score also uses the US examination, which can provide a better and faster differential diagnosis of AA [12].
This study aimed to assess the diagnostic accuracy (specificity and sensitivity) of the Alvarado, AIR, Tzanakis and RIPASA scores for the assessment of timely, delayed or unnecessary appendectomy, and to use ROC analysis for the prediction of delayed appendectomy.

Materials and methods
In this prospective cohort study, conducted at the University Clinic for Surgical Diseases "St. Naum Ohridski" in Skopje, N. Macedonia, 100 respondents were included. Before the study, ethical approval was obtained in accordance with the Declaration of Helsinki. All respondents signed an informed consent. Patients between the ages of 15 and 77 who presented with pain in the right lower quadrant (RLQ) on suspicion of AA were included in the period from January 2018 to February 2019. The study excluded pregnant patients, patients with a history of urolithiasis, patients with suspected pelvic disease, patients with tumor formation in the right inguinal region, immunocompromised patients, and patients with elective appendectomy. In all patients, in addition to clinical and laboratory tests, ultrasonography was performed preoperatively. All patients were screened with the four scoring systems: Alvarado, Tzanakis, AIR and RIPASA. Table 1 shows the clinical, laboratory, and diagnostic parameters incorporated in all 4 scoring systems examined. The AIR score was interpreted as follows: 0-4, low probability of AA; 5-8, moderate probability of AA; 9-12, high probability of AA. All patients included in the study had an appendectomy, regardless of the values of the scoring systems. Patients were followed from the time of admission until discharge from the hospital, including the necessary postoperative controls and the histopathological (HP) finding from the surgical specimen. According to the HP results, the patients were divided into three groups: patients with timely, delayed and unnecessary appendectomy.
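The AIR probability bands quoted above (0-4 low, 5-8 moderate, 9-12 high) can be expressed as a small lookup. The following is only an illustrative sketch; the function name `air_category` is hypothetical and is not part of the study's methods.

```python
def air_category(score: int) -> str:
    """Map an AIR score (0-12) to the probability band quoted in the text.

    Bands: 0-4 low, 5-8 moderate, 9-12 high probability of AA.
    """
    if not 0 <= score <= 12:
        raise ValueError("AIR score must be between 0 and 12")
    if score <= 4:
        return "low"
    if score <= 8:
        return "moderate"
    return "high"


if __name__ == "__main__":
    for s in (3, 7, 10):
        print(s, air_category(s))
```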
The statistical program SPSS for Windows 23.0 was used for the statistical analysis of the results. A bivariate analysis was used for comparison of the three groups with timely, delayed, and unnecessary appendectomy. The Pearson Chi-square test and the Fisher exact test were used to compare the groups in terms of qualitative parameters, while Analysis of Variance and the Kruskal-Wallis test were used for quantitative parameters. ROC analysis was used to determine the discriminatory ability of the Alvarado, RIPASA, AIR, and Tzanakis scores and their combinations in patients with appendectomy.
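To make the ROC step concrete, the area under the ROC curve can be read as the probability that a randomly chosen case from the target group receives a higher score than a randomly chosen case from the comparison group, with ties counted as one half. The sketch below computes it that way in plain Python. It is not the SPSS procedure used in the study; the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive case scores higher
    than a random negative case; ties contribute 0.5.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical scores, for illustration only (not study data).
    target_group = [3, 5, 7]
    comparison_group = [2, 5, 6]
    print(round(roc_auc(target_group, comparison_group), 3))
```

A value near 0.5 means the score separates the two groups no better than chance. If lower scores mark the target group, the two arguments can be swapped so that the result is read in the usual direction.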

Results
In our study of 100 operated patients, 58% were men and 42% were women, with a mean age of 34.7±14.9 years; the mean age of patients in the group with delayed appendectomy was 42.87±16.3 years (p = 0.017). The most common symptom was nausea (97%), while the most common clinical sign was rebound tenderness (85%), which was present in up to 93.75% of patients with delayed appendectomy, but there was no significant difference in symptoms and signs among the groups. Among the laboratory tests, leukocyte and neutrophil levels were significantly higher in patients with timely appendectomy (p = 0.002; p < 0.01), while CRP was highest in patients with delayed appendectomy (p = 0.032). A positive US finding was recorded in 76% of respondents; the percentage of positive findings was highest in the timely group and the percentage of negative findings was highest in the delayed group, a significant difference (p < 0.001) (Table 2).
Based on HP, 74% of patients had a timely appendectomy and 16% were in the delayed appendectomy group. With the Alvarado, RIPASA, and Tzanakis scoring systems, the most common score was higher than 8, and this range also had the highest percentage of timely appendectomies. For the AIR score the most common values were 7-8. With the Alvarado score, the majority of the delayed and unnecessary appendectomy groups scored 7-8, whereas with RIPASA and Tzanakis the majority scored higher than 8 (Table 3).
Table 4 shows that for all 4 scores there was a statistically significant difference between the three groups, timely, delayed and unnecessary appendectomies (Alvarado, p = 0.027; AIR, p = 0.003; RIPASA, p < 0.001; Tzanakis, p < 0.001). In terms of intergroup comparisons, the Alvarado score differed significantly between timely and unnecessary appendectomies (p = 0.035), the AIR score between timely and unnecessary (p = 0.003) and between delayed and unnecessary appendectomies (p = 0.019), the RIPASA score between timely and unnecessary appendectomies (p < 0.001), and the Tzanakis score between timely and delayed (p = 0.032) and between timely and unnecessary appendectomies (p = 0.014).
ROC analysis for prediction of delayed appendectomy showed that the area under the ROC curve for the Alvarado score was 0.577, with a sensitivity of 54% and a specificity of 62%; for the RIPASA score the value was 0.651, with a sensitivity of 62.5% and a specificity of 63.5%; and for the Tzanakis score the value was 0.696, with a sensitivity of 69% and a specificity of 60.8%. For the AIR score, the area under the ROC curve was 0.504, so this scoring system was not useful in distinguishing patients with delayed appendectomy from those with timely appendectomy (Table 5, Figures 1 and 2).
Combining the three scoring systems (Alvarado, RIPASA and Tzanakis), the area under the ROC curve was 0.762 (95% CI 0.521-0.783), with a sensitivity of 85% and a specificity of 62% (Table 5, Figure 3).
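The sensitivity and specificity figures reported above follow from a 2x2 classification table. The sketch below shows the arithmetic. The counts are hypothetical: they were chosen only because, assuming 16 delayed and 74 timely cases, they reproduce the reported RIPASA values of 62.5% and 63.5%. The study does not report the underlying counts, nor whether the unnecessary group entered this comparison, so this is an assumption rather than a reconstruction of the authors' table.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity) from 2x2 table counts.

    sensitivity = TP / (TP + FN): share of target cases correctly flagged.
    specificity = TN / (TN + FP): share of comparison cases correctly cleared.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each group must contain at least one case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts (not reported in the study): 16 delayed, 74 timely.
    sens, spec = sensitivity_specificity(tp=10, fn=6, tn=47, fp=27)
    print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```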

Discussion
Acute appendicitis is one of the most common causes of emergency surgery, and despite its high prevalence, its diagnosis is still a challenge [1,2]. In practice, surgeons often face the dilemma of whether to risk an unnecessary appendectomy or to delay surgery, and hence the decision requires diagnostic accuracy. The rate of removal of normal appendixes, which ranges between 15 and 30%, is still high [13]. According to the literature, the criteria for diagnostic quality are considered to be: a 15% rate for negative appendectomies, 10% for negative laparotomies, 35% for threatening perforations, and 15% for perforated appendectomies [3-6, 14, 15]. In our study, according to the HP findings, 90% were positive for AA, of which 74% had a timely and 16% had a delayed appendectomy, while 10% had an unnecessary appendectomy. Similar results were presented in the study of Kalliakmanis et al., which included a representative number of respondents (717 operated patients), and in which HP findings showed that 11% of appendectomies were unnecessary [16]. Of the total number of patients operated on in our study, 58% were men and 42% were women, and the mean age of patients in the group with delayed appendectomy was 42.87±16.3 years. Men did not differ significantly from women in the outcome after the HP finding was obtained. For the known clinical signs and symptoms present in the examined population, no significant difference was found between the three groups. Among the laboratory tests, leukocyte and neutrophil levels were significantly higher in patients with timely appendectomy (p = 0.002; p < 0.01), while CRP was highest in patients with delayed appendectomy (p = 0.032). Several studies in the literature report similar values [17], and the differences in demographic characteristics are considered to be due in part to different exclusion criteria.
Recently, however, scoring systems with their clinical diagnostic scoring have become an important part of AA diagnostic tools. These scoring systems should be non-invasive, understandable, easy to use, and effective in diagnosing AA to reduce the rate of negative appendectomy [8]. The importance of this study lies in examining the four most used scores in our population. The Alvarado scoring system is the first and most widespread score based on clinical signs and laboratory values, with precision and clinical approval. In our study, 49% of respondents had a score of >8, accompanied by the highest percentage of timely appendectomies (54.0%). The mean value of patients with timely, delayed, and unnecessary appendectomy was 8. between timely and unnecessary appendectomies (p = 0.035). With an area under the ROC curve of 0.577, a sensitivity of 54% and a specificity of 62%, the Alvarado score proved to be a weak discriminator between patients with delayed and timely appendectomy. However, the systematic review of Ohle et al. [18], in which 42 studies were analyzed, showed that the Alvarado score was well calibrated in men, inconsistent in children, and useful for excluding the diagnosis of AA at a threshold value of 5 in all patient groups; for scores of 9 and 10, appendectomy without additional examinations was recommended, whereas intermediate scores called for additional examinations (CRP, US), and in older patients CT diagnostics was also necessary [19-21].
Regarding the AIR score, which includes CRP as an important inflammatory parameter [10], in our study 45% of respondents had a score of 7-8, while only 35% had a score of >8. The mean values for patients with timely, delayed, and unnecessary appendectomy were 8.08±1.5, 8.0±1.2, and 6.20±1.4, respectively (p = 0.003), with a significant difference between timely and unnecessary (p = 0.003) and between delayed and unnecessary appendectomies (p = 0.01). With an AUC of 0.50 in our patients, this scoring system did not show validity for use in appendectomy decisions. Our result does not agree with studies in which the AIR score had great discriminatory power and outperformed the Alvarado score. The result in our respondents may be due to differing subjective assessment of clinical signs, primarily tenderness on palpation, because patients were examined by different surgeons, but also to the fact that, using only inflammatory parameters without urinalysis, patients with other differential diagnoses, such as urinary tract infection or pelvic disease, were excluded [22,23].
The newer RIPASA score includes 18 clinical and laboratory parameters and has greater sensitivity and specificity than the Alvarado score, especially in the Asian population [11,24]. In this study, 92% of patients had a score higher than 8. The mean values for timely (11.03±1.9), delayed (10.09±1.4) and unnecessary (8.60±0.9) appendectomy were significantly different (p < 0.001), with a significant difference between timely and unnecessary appendectomies (p < 0.001). ROC analysis of the RIPASA score for the prediction of delayed appendectomy showed sufficient discriminatory ability (AUC of 0.65), with a sensitivity of 62.5% and a specificity of 63.5%. There are publications in the literature reporting high specificity and sensitivity, such as the studies of Chong et al. and of Chee Fui Chong [11,24-26].
In addition to the clinical and laboratory parameters, the Tzanakis score uses ultrasonography (US) to provide better diagnostics, especially in regions where computed tomography is not available 24 hours a day. A score of ≥8 is considered to indicate AA and the need for surgery [12]. Our study found that 95% of respondents had a score higher than 8. The Tzanakis score had the highest value in the group with timely (mean 15), a lower value in the group with unnecessary (mean 11), and the lowest value in the group with delayed appendectomy (mean 9). With p = 0.0007, an overall statistically significant difference between the three groups in the value of the Tzanakis score was confirmed, which was due to a significantly higher score in the group with timely versus delayed and unnecessary intervention (p = 0.032 and p = 0.014, respectively). ROC analysis for prediction of delayed appendectomy showed that, with an AUC of 0.696, a sensitivity of 69% and a specificity of 60.8%, this score has sufficient discriminatory ability to distinguish patients with delayed appendectomy from those with timely appendectomy. This agrees with data from the literature, where this test shows the greatest sensitivity and specificity, even up to 99% sensitivity, 91% specificity, and an average diagnostic accuracy of 95% [27-29].
Several studies in the literature suggest that some of these systems were developed for Western countries, and that their sensitivity and specificity change when they are applied in different environments [11]. Some studies show an improvement in diagnostic accuracy when the threshold is reduced to 6.0, with a sensitivity of 88.3% and a specificity of 94.5% [29-33].
In our study, for each of the scoring systems except AIR, we obtained values of sensitivity and specificity that were lower than those presented in the literature, and we concluded that, used individually, they were not sufficient for complete diagnosis and surgical treatment. However, when the three scoring systems (Alvarado, RIPASA, and Tzanakis) were combined, the area under the ROC curve was 0.762, with a sensitivity of 85% and a specificity of 62%. This suggests that simultaneous application of the three scoring systems, Alvarado, RIPASA, and Tzanakis, has a good discriminatory ability in distinguishing patients with delayed appendectomy from those with timely appendectomy.
Limiting factors in our study are the following: (a) the relatively small size of the sample despite the prospective nature of the study and (b) the different physicians who decided on an appendectomy for different cases.

Conclusion
In our study, the diagnostic accuracy of RIPASA and Tzanakis was better than that of Alvarado, while AIR cannot be used to predict delayed appendectomy in our population. However, the simultaneous application of the three scoring systems, RIPASA, Tzanakis and Alvarado, showed much better discriminatory ability, with higher sensitivity and comparable specificity, than their use alone. Combining scoring systems should help in proper diagnosis to avoid negative appendectomy, but additional studies with a larger number of patients are needed to support these results.