Mnemonic Techniques and Lie Detection: Accuracy of Truth and Deception Judgments in Repeated Accounts.

This study examined whether the use of memory-enhancing techniques (mnemonics) in interviews can help to distinguish truth tellers from liars. In a previous study (Izotovas et al., 2018), it was found that when mnemonic techniques were used in an interview immediately after the event, truth-tellers reported more details than liars in that immediate interview and again after a delay. Moreover, truth-tellers, but not liars, showed patterns of reporting indicative of genuine memory decay. In the current experiment, participants (n = 92) were asked to read the repeated statements reported by participants in the Izotovas et al. (2018) study and decide whether the statements they read were truthful or deceptive. One group of participants (informed condition) received information about the findings of the previous study before reading the statements. The other group received no information before reading the statements (uninformed condition). After participants made veracity judgements, they were asked an open-ended question about which factors influenced their credibility decision. Although truthful statements were judged more accurately in the informed condition (65.2%) than in the uninformed condition (47.8%), this difference was not significant. In both conditions deceptive statements were detected at chance level (52.2%). Participants who relied on the self-reported diagnostic verbal cues to deceit were not more accurate than participants who self-reported unreliable cues. This may be because only a minority of participants (27.4%) in both conditions based their decisions on diagnostic cues to truth/deceit.

In investigative interviews, questions about the credibility of witnesses or suspects frequently arise (Granhag & Strömwall, 2004; Vrij, 2008, 2015). However, studies have shown that both laypeople and professionals are in general poor at detecting lies, with accuracy typically not much better than chance level when assessing speech or behaviour (Bond & DePaulo, 2006). A meta-analysis examined possible reasons for this low accuracy rate (Hartwig & Bond, 2011). The most compelling reason was that cues to deception are typically weak. That is, liars and truth tellers often display similar (non)verbal responses.
However, judgements can be more accurate than chance level if observers rely on specific cues. In general, deception detection research has shown that verbal cues tend to be more diagnostic than non-verbal cues in discriminating between truth tellers and liars (Amado, Arce, Fariña, & Vilarino, 2016; DePaulo et al., 2003; Vrij, 2008). Therefore, it is reasonable to expect an increase in accuracy when veracity decisions are based on verbal cues. In an earlier study, in which police officers attempted to detect truths and lies in videotaped interviews with suspects, accuracy rates correlated positively with the use of cues related to the suspect's story (Mann, Vrij, & Bull, 2004). Another study showed that high familiarity with a situation and use of verbal (and less use of nonverbal) content cues were associated with higher classification accuracy (Reinhard, Sporer, Scharmach, & Marksteiner, 2011). Similarly, a recent study showed that undergraduate students and police officers performed at a higher level when they had better insight into verbal cues (Bogaard & Meijer, 2018). Furthermore, a meta-analysis on training to detect deception revealed larger effects if the training was based on verbal content cues (Hauch, Sporer, Michael, & Meissner, 2016). The aforementioned studies also indicated that reliance on stereotypical non-verbal cues (such as gaze aversion, nervousness, or fidgeting) had no positive effect on deception detection performance.
In addition, recent research has shown that higher accuracy rates can be achieved when specific interview techniques are used (Hartwig, Granhag, & Luke, 2014; Vrij, Fisher, & Blank, 2017; Vrij & Granhag, 2012), because these techniques elicit or enhance speech differences between liars and truth tellers. One such approach is Cognitive Credibility Assessment (CCA; Vrij, 2018). With this approach, accuracy rates just above 70% can be obtained (Vrij et al., 2017). One of the elements of CCA is encouraging interviewees to provide more information (Colwell, Hiscock, & Memon, 2002; Geiselman, 2012; Vrij et al., 2017). This can, amongst other ways, be achieved by using memory-enhancement techniques called 'mnemonics' (Fisher & Geiselman, 1992). Previous studies have shown that the use of mnemonics may increase the verbal differences between truthful and deceptive statements (Bembibre & Higueras, 2011; Hernández & Alonso-Quequty, 1997; Vrij et al., 2009), because truth tellers, who are recalling genuinely remembered events, benefit more from such memory-enhancement techniques than liars, who are fabricating. Liars may lack the imagination or cognitive resources to report as many (plausible) details as truth tellers, or may be unwilling to do so out of fear that these additional details would give investigators leads that they can check (Vrij et al., 2017).
These differences between truth-tellers and liars are often obtained using objective measures, e.g. by counting the details in a statement with statistical software. However, a practical problem arises when an observer (typically a practitioner such as a police officer) needs to infer on an individual basis whether someone is lying or telling the truth. In the current study, we examined whether observers would be able to spot the enlarged objective differences between truth tellers and liars.
Although it is known from previous research that the use of CCA interviewing techniques can improve veracity judgements in single accounts (e.g. Vrij, Leal, Mann, Vernham, & Brankaert, 2015), to our knowledge few studies have examined the accuracy of these judgements in repeated interview settings. This study builds on the findings of Izotovas et al. (2018). In that experiment, the effects of different mnemonic techniques on immediate and delayed statements reported by truth tellers or liars were examined. It was found that truth tellers provided significantly more information than liars, both in the immediate interview and after a two-week delay. Amongst the three mnemonics tested (context reinstatement, sketching, and event-line), the event-line was the most effective mnemonic in discriminating between truthful and deceptive statements, achieving large effect sizes in terms of the amount of different types of detail (visual, spatial, temporal, and action) reported in the immediate (Cohen's d ranging from 1.08 to 1.47) and delayed statements (Cohen's d ranging from 0.78 to 1.40) (Izotovas et al., 2018). The event-line mnemonic is based on the Timeline interviewing format developed by Hope, Mullis, and Gabbert (2013), which focuses on reproducing the temporal context and the sequence of actions in an event.
In addition, truth tellers showed a larger decline than liars in the number of details reported between the immediate and delayed interviews (Izotovas et al., 2018). In other words, truth tellers showed patterns of reporting details indicative of genuine memory decay/forgetting, whereas liars showed patterns of a 'stability bias', defined as a metacognitive failure to correctly understand the nature of memory decline over time (Kornell & Bjork, 2009).
As noted, previous research has shown that accuracy in detecting deception improves when people rely on the correct verbal cues (Bogaard & Meijer, 2018; Hauch et al., 2016; Mann et al., 2004). In the current study, we were interested in whether observers' understanding of the Izotovas et al. (2018) findings was related to their lie detection performance. We therefore informed one group of participants about the previous findings and asked them to take these findings into account when making their veracity judgements in the subsequent lie detection task.
We tested two hypotheses. First, it was predicted that the accuracy rates in identifying truth tellers and liars would be higher in the informed group than in the uninformed group. Second, we predicted that accurate participants would rely more on the diagnostic verbal cues to deceit than inaccurate participants.

Participants.
A total of 92 volunteers participated in the study. The mean age of participants was M = 21.97 years (SD = 6.43) and 82.6% were female. Participants were recruited via posters, flyers, and the University's volunteer database. Participants were required to be fluent English speakers because their task was to evaluate the verbal content of the statements. Participants received £5 after they completed the experiment. The experiment was approved by the Science Faculty Ethics Committee of the University.
Design. A 2 (Veracity: Truthful interviewee vs deceptive interviewee) X 2 (Instruction: Informed group vs uninformed group) experimental design was used with Veracity and Instruction as between-subjects factors. Dependent variables were participants' veracity judgments and the answers given to questions in a questionnaire: self-reported level of confidence, and perceived cues that affected their decisions. Participants were randomly assigned to the Informed (n = 46) and Uninformed (n = 46) groups. Within each group, they were asked to read the statements reported either by truthful (n = 23) or by deceptive interviewees (n = 23). The allocation to the Veracity condition also occurred randomly. All participants completed the experiment individually; none of the data were gathered in groups.
Stimulus material. Forty-six verbatim transcripts (23 truthful, 23 deceptive) obtained from a previous study (Izotovas et al., 2018) were used in the current experiment. In that study, participants (n = 143) watched a video-recorded staged break-in to an apartment. They were instructed to tell the truth or lie about the event in the video. Each participant was interviewed twice about the event: immediately and after a two-week delay. At the beginning of the immediate interview, participants were asked to report everything they could remember about the event (free recall phase). After this, they were given one of three mnemonics (context reinstatement, sketch, or event-line) and asked to describe the event again (mnemonic phase). In the delayed interview of the previous experiment, participants were asked to provide only a free recall. Only the transcripts of the 46 interviews using the event-line mnemonic were used in the current experiment.
Procedure. Each participant was randomly given one of the 46 sets of transcripts. They were informed that they would read two statements made by one person who might be lying or telling the truth about an incident, a break-in to an apartment. Participants were also notified that the first interview was conducted immediately after the alleged event, and the second interview two weeks later.
Informed group. Participants in the informed group were instructed that i) the amount of detail (e.g., descriptions of people and objects, spatial arrangements, events and activities) in the statement may be considered an indicator of truthfulness (that is, truth-tellers commonly report more details than liars), and ii) although the statements of truth tellers are usually richer, they tend to show a natural memory decline over time, whereas liars tend to report a similar amount of detail, no matter how much time has passed since an event. Participants were instructed to take this into account when making their veracity judgments. The information was provided in written format to ensure that all participants received identical instructions. No additional instruction or training about detecting deception was provided to the informed group.
Uninformed group. The uninformed group was only asked to read two interview transcripts from one interviewee, and no instructions about credibility cues were given.
After reading the two statements, all participants were asked to make a veracity judgment (whether the statements were provided by a truth teller or liar). They were also asked to what extent they thought the statements were truthful/deceptive (1 = totally deceptive, 7 = totally truthful), and how confident they felt about their decision (1 = not at all, 7 = totally).
The informed participants only were also asked to rate: i) the extent to which their decision about the credibility of the statements was based on the amount of details in the immediate and the delayed statements (1 = not at all, 7 = totally), and ii) the extent to which their decision about the credibility of the statements was based on the difference in the amount of information provided in the immediate and the delayed statement (for truth-tellers: decline in details; for liars: similar amount of details) (1 = not at all, 7 = totally). These two items were used as manipulation checks. Finally, the informed participants were
Coding of perceived cues. The cues that participants in the informed and uninformed groups reported as having affected their veracity decisions were classified into categories. One coder, blind to veracity condition, made the following classification of the reported cues (some typical examples are provided in brackets): Richness of detail ("Detailed describing and remember colours and places and sequence of rooms"), Lack of detail ("The story was not very detailed with aspects of the area and rooms"), Change of details, contradictions ("He said in the first one there was two phones a Samsung, but this changed to an iPhone"), Coherent order ("Making sure the order was roughly the same"), Incoherent order ("The fact that the events were not in the exact order"), Consistency ("The details were the same the whole way through which made it more convincing"), Omissions ("Admitted not remembering certain things after the time period, such as the card number"), Reminiscences ("Explained seeing notice boards, phones and laptops which were not previously mentioned"), Plausibility ("The statements seemed to be realistic"), Confidence ("The second interview seemed more confident"), Speech errors, hesitations ("His grammar and his stuttering makes him out to be not fully honest about the events"), and Other, for responses that did not match any of the categories ("Personal experience as a witness, having to describe details in a stressful situation"). To measure inter-rater reliability, a second coder was given the list of categories and asked to allocate each response to a category. In total, 77.4% of the responses were classified into the same categories by both coders, showing satisfactory inter-rater reliability. Discrepancies in coding were identified and resolved between the two coders.
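The agreement figure reported above is a simple percentage agreement between two coders. A minimal sketch of that calculation follows; the function name `percent_agreement` and the five example labels are hypothetical and serve only as illustration, they are not the study's data.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Share of responses (in %) that both coders placed in the same category."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same responses")
    if not coder_a:
        raise ValueError("At least one response is required")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical labels for five responses (illustrative only, not study data).
coder_1 = ["Richness of detail", "Consistency", "Omissions", "Plausibility", "Other"]
coder_2 = ["Richness of detail", "Consistency", "Reminiscences", "Plausibility", "Other"]

agreement = percent_agreement(coder_1, coder_2)
print(f"{agreement:.1f}%")
```

Percentage agreement does not correct for agreement expected by chance, so it tends to be more lenient than chance-corrected indices.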
Based on meta-analyses and reviews of deception detection research (DePaulo et al., 2003; Vredevelt, van Koppen, & Granhag, 2014; Vrij, 2008), we further classified the perceived cue categories into reliable cues, unreliable cues, and unknown cues to truth/deceit, see Table 1. Note that some cues were classified as either reliable or unreliable depending on participants' veracity decisions. For example, the cue 'richness of detail' was classified as reliable if the statement was judged to be truthful. However, this cue was treated as unreliable if the statement was judged to be deceptive, because a large amount of detail in a statement is considered an indication of truthfulness rather than deception (DePaulo et al., 2003; Vrij, 2008).

Results
Manipulation checks. Participants in the informed condition reported relying on the amount of details (M = 5.52, SD = 1.01, 95% CI [5.24, 5.80]) and on the decline (for truth-tellers) or stability (for liars) of details between the immediate and delayed accounts (M = 5.54, SD = 1.11, 95% CI [5.24, 5.85]) when making their veracity judgements (both measured on 7-point Likert scales). These results indicate that participants in the informed group followed the instructions given to them about the verbal cues to deceit. Self-reported confidence levels about veracity judgements did not differ between the informed (M = 4.67, SD = 1.25, 95% CI
Accuracy of veracity judgements. We compared the accuracy rates obtained by the informed and uninformed groups. In the informed group, 65.2% of truthful statements were correctly classified compared to 47.8% in the uninformed group. These percentages did not differ significantly from each other, χ²(1) = 1.42, p = .234. The accuracy rate for deceptive statements was identical in the informed and uninformed groups: 52.2%, χ²(1) = 0.00, p = 1.00.
In Hypothesis 1, we expected the informed group to show higher accuracy than the uninformed group. The results showed overall accuracy rates of 58.7% (informed) and 50.0% (uninformed). Hence, the difference was in the predicted direction, but not statistically significant, χ²(1, N = 92) = 0.70, p = .40.
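The chi-square comparisons of accuracy rates can be checked with a short calculation. The sketch below computes a Pearson chi-square for a 2 x 2 table without a continuity correction, which is an assumption about the analysis. The cell counts are also an assumption: they are reconstructed from the reported percentages and group sizes (for example, 65.2% of 23 is taken to be 15 correct judgements). The function name `chi_square_2x2` is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) for df = 1.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("Every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For df = 1 the chi-square upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: informed, uninformed. Columns: correct, incorrect.
# Counts are reconstructed from the reported percentages (assumption).
truthful = chi_square_2x2(15, 8, 11, 12)    # 65.2% vs 47.8% of 23 statements
deceptive = chi_square_2x2(12, 11, 12, 11)  # 52.2% vs 52.2% of 23 statements
overall = chi_square_2x2(27, 19, 23, 23)    # 58.7% vs 50.0% of 46 judgements

for label, (chi2, p) in [("truthful", truthful), ("deceptive", deceptive), ("overall", overall)]:
    print(f"{label}: chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

Under these assumptions the three tests round to the values reported in the text (1.42, p = .234; 0.00, p = 1.00; 0.70, p = .40).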
To further examine the accuracy of judgements in the informed and uninformed groups, we analysed the 7-point veracity ratings (the extent to which the participants rated the statements to be deceptive/truthful). For this purpose, the ratings given to deceptive statements were reverse-scored. For example, if a participant rated a deceptive statement as 7 (totally truthful), the answer was converted into a score of 1 (totally incorrect), and if a participant rated a deceptive statement as 1 (totally deceptive), the answer was converted into a score of 7 (totally correct). In other words, the higher the score, the more correct the participants were in their responses. An independent samples t-test showed that the accuracy scores of the informed group (M = 4.15, SD = 1.53, 95% CI [3.68, 4.59]) and the uninformed group (M = 3.98, SD = 1.59, 95% CI [3.53, 4.41]) did not differ significantly, t(90) = 0.54, p = .594, d = 0.11. These results provide no support for Hypothesis 1, although the means were in the predicted direction.
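The score conversion and the effect size can be illustrated with a short sketch. It assumes that ratings of truthful statements were left unchanged and that ratings of deceptive statements were mirrored (8 minus the rating), which is how the worked example in the text reads. The function names `accuracy_score` and `cohens_d` are hypothetical, and Cohen's d is computed from the rounded summary statistics with a pooled standard deviation, which may differ slightly from a calculation on the raw data.

```python
import math


def accuracy_score(rating: int, statement_is_truthful: bool) -> int:
    """Convert a 1-7 veracity rating (7 = totally truthful) into a 1-7 accuracy score."""
    if not 1 <= rating <= 7:
        raise ValueError("rating must lie between 1 and 7")
    # Truthful statements: a high rating is already the correct answer.
    # Deceptive statements: mirror the rating so that 1 becomes 7 and 7 becomes 1.
    return rating if statement_is_truthful else 8 - rating


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Worked example from the text: a deceptive statement rated 7 scores 1, and vice versa.
print(accuracy_score(7, statement_is_truthful=False))
print(accuracy_score(1, statement_is_truthful=False))

# Effect size from the reported means and standard deviations (n = 46 per group).
d = cohens_d(4.15, 1.53, 46, 3.98, 1.59, 46)
print(f"d = {d:.2f}")
```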
Judgements based on perceived deception cues. The frequencies of reported cues and their classification into reliable, unreliable, and unknown cues to truth/deceit in the informed and uninformed groups are shown in Table 1.
There was a significant difference in the frequencies of reported cues between the groups, χ²(11) = 25.65, p = .007. In the informed group, the most frequently reported cue was speech errors, hesitations (20.0%; truth and lie decisions combined). In the uninformed group, the most frequently reported cue was consistency (26.6%; truth and lie decisions combined). When the rates of both groups and veracity decisions were combined, the distribution of the cues differed from chance, χ²(11) = 70.13, p < .001, with consistency (21.0%), speech errors, hesitations (16.9%), change of details, contradictions (14.5%), and richness of detail (14.5%) being the most prevalent reported cues.
We then examined the extent to which participants based their decisions on reliable cues. After creating three categories (reliable cues, unreliable cues, and unknown cues), the frequencies of these categories were compared, see Table 2. Although the distribution between the informed and uninformed groups differed significantly, χ²(2) = 8.05, p = .018, the majority in both groups reported unreliable cues. When frequencies of both groups were combined, the distribution differed from chance, χ²(2) = 90.79, p < .001, with unreliable cues being the most frequently reported cues across participants. We finally examined whether decisions based on reliable or unreliable cues were related to accuracy in the binary veracity judgements. For this we merged the answers for the uninformed and informed groups and disregarded the 'unknown cues' category. The accuracy rates for the two remaining categories, reliable and unreliable cues, were compared. The results are presented in Table 3. Results showed that participants who mentioned reliable cues were not more accurate than those who mentioned unreliable cues, χ²(1) = 1.40, p = .237. Thus, Hypothesis 2 was not supported.

Discussion
In the current study, we found that the informed and uninformed participants did not differ significantly in the accuracy of their veracity judgements. One possible explanation is that the instruction we gave to the informed group was not sufficient to improve deception detection accuracy. Previous training in interviewing to detect deception resulted in enhanced accuracy, but it involved at least a few hours of training (including theoretical information about reliable and unreliable cues to deception, practical examples, exercises, and feedback on trainees' performance; Hartwig, Granhag, Strömwall, & Kronkvist, 2006; Luke et al., 2016; Vrij et al., 2015), considerably longer than the brief instruction that participants in the current study received.
Although participants in the informed group indicated that they relied on the information provided in the instruction, their accuracy was not higher than that of participants in the uninformed group. Moreover, the self-reported cues showed that the majority of participants, including the informed ones, relied on unreliable cues. This finding is consistent with previous research showing that lay people and practitioners tend to hold incorrect beliefs about deception (Global Deception Research Team, 2006; Strömwall, Granhag, & Granhag, 2004; Vrij, 2008). The results for the informed group suggest that such views are difficult to change.
Different explanations have been proposed for the origin of incorrect beliefs about deception (Hartwig & Granhag, 2015; Vrij, 2008). For example, the moral explanation refers to the stereotypical view that lying is bad. If lying is bad, then people should feel ashamed and/or nervous about it and, therefore, display signs of nervousness (e.g., commit speech errors) (DePaulo et al., 2003). The current study showed that signs related to nervousness (speech errors, hesitations) were amongst the most prevalent cues mentioned by participants. In addition, the exposure explanation suggests that stereotypical behaviours associated with deception (one example is 'consistency', and many participants reported having relied on this cue) are prominent in the popular media (Vrij & Granhag, 2007). For example, the popular crime drama TV series 'Lie to Me' depicted its main character as a security officer who was highly skilled at detecting deception. However, many of the interviewing tactics and 'signs of deception' shown in this series were not consistent with scientific evidence (DePaulo et al., 2003; Hartwig & Bond, 2011; Vrij & Granhag, 2012).
Reliability of the reported cues was also not related to the accuracy of judgements. That is, participants who reported reliable cues were as inaccurate as participants who reported unreliable cues. This result could perhaps best be explained by the finding that the number of reported reliable cues was very low in general.
One limitation of this study was that lay people, mostly students, took part in it. It is unknown how professionals (e.g., police officers) would perform in this study. In addition, participants were given only a brief instruction about the veracity cues on which to base their judgements. Beyond this short guidance, it is unknown how the instruction in the informed group was perceived. For example, it is unclear how the observers decided whether a report contained a small or large amount of detail, what kind of details they emphasised while reading the statements, and whether they read the entire interviews attentively. These considerations should be addressed in future lie detection studies of a similar nature.
In conclusion, the current study showed that even when observers are given information about reliable cues to deception, they still use unreliable cues when making veracity judgements. Future studies could focus on examining ways to prevent people from making veracity decisions based on unreliable cues. For example, training could involve not only informing trainees about reliable cues but also informing them about unreliable cues. Such training could also include information about the reasons why some cues are reliable and other cues are unreliable. This study adds to the existing deception detection literature by showing that veracity judgements can remain challenging, even with the use of effective interviewing techniques (such as mnemonics) that elicit large verbal differences between truth tellers and liars.