Is the N-Back Task a Valid Neuropsychological Measure for Assessing Working Memory?


Arch Clin Neuropsychol. 2009 Nov; 24(7): 711–717.

Published online 2009 Sep 18. doi:10.1093/arclin/acp063

PMCID: PMC2770861

PMID: 19767297

K.M. Miller, C.C. Price, M.S. Okun, H. Montijo, and D. Bowers



The n-back is a putative working memory task frequently used in neuroimaging research; however, literature addressing n-back use in clinical neuropsychological evaluation is sparse. We examined convergent validity of the n-back with an established measure of working memory, digit span backward. The relationship between n-back performance and scores on measures of processing speed was also examined, as was the ability of the n-back to detect potential between-groups differences in control and Parkinson's disease (PD) groups. Results revealed no correlation between n-back performance and digit span backward. N-back accuracy significantly correlated with a measure of processing speed (Trail Making Test Part A) at the 2-back load. Relative to controls, PD patients performed less accurately on the n-back and showed a trend toward slower reaction times, but did not differ on any of the neuropsychological measures. Results suggest the n-back is not a pure measure of working memory, but may be able to detect subtle differences in cognitive functioning between PD patients and controls.

Keywords: Working memory, Executive function, Information processing speed, Parkinson's disease, Neuropsychology


Introduction

The n-back is a sequential letter memory task frequently used in neuroimaging research (Braver et al., 1997; Manoach et al., 1997; Ragland et al., 2002). It parametrically varies working memory load, and thus task difficulty, while keeping overall task procedures constant across conditions. One promising aspect of the n-back is that there appear to be distinct neural substrates associated with task performance. Neuroimaging studies have demonstrated that increased working memory load on the n-back is associated with poorer performance in healthy participants and increased activation of the dorsolateral and inferior frontal regions of the prefrontal cortex (Braver et al., 1997; Manoach et al., 1997; Ragland et al., 2002). Behavioral performance on the n-back has been shown to discriminate between patients with dorsolateral prefrontal cortex dysfunction (e.g., schizophrenic patients) and healthy controls. This suggests that n-back performance may be sensitive to the integrity of the frontal lobes, with greater working memory loads placing greater demand upon frontally mediated cognitive functions. If so, the n-back may be a useful task for assessment of working memory ability within the context of clinical neuropsychological evaluation.

Despite its widespread use in neuroimaging studies, examination of the n-back as a clinical measure has received little attention. The primary aim of the current report was to determine whether performance on the n-back shows convergent validity with a commonly used clinical measure of working memory, digit span backward. A secondary aim was to determine the relative influence of processing speed upon n-back performance by examining the relationship between n-back performance and clinical measures requiring a speeded processing component (Stroop word reading, Stroop color naming, and Trail Making Test Part A [TMT A]). A final aim was to compare n-back performance in control and patient groups (individuals with Parkinson's disease [PD]) to examine the ability of the n-back to detect between-groups differences, as prior studies have shown that deficits in working memory are commonly found in PD patients early in the disease course (Costa et al., 2003; Owen et al., 1992, 1995).

Materials and Methods


Participants

Participants included 21 patients with idiopathic PD and 17 normal controls. The PD patients were candidates for deep brain stimulation surgery and had been diagnosed by a fellowship-trained movement disorders neurologist at the University of Florida Movement Disorders Center applying U.K. Brain Bank criteria (Hughes, Daniel, Kilford, and Lees, 1992). Healthy older controls were recruited from the community or were part of another ongoing study in the laboratory (principal investigator CCP). Informed consent to participate in research was obtained following the University of Florida IRB guidelines. Exclusion criteria for both groups included history of head injury, neurological disease (other than PD in the patient group), learning disability, substance abuse, or major psychiatric disorder; current major medical illness (e.g., HIV or cancer); and possible dementia (defined by a score <26 on the Mini-Mental State Examination [MMSE]). Additional exclusion criteria for PD patients included evidence of secondary or atypical Parkinsonism, co-morbid movement disorders, and prior neurosurgical treatments including deep brain stimulation or lesion surgery. All PD patients were on dopa replacement medication. Six PD patients were on antidepressant medications. No controls were on antidepressants. No participants were on anticholinergic medications.

With respect to demographic characteristics, the sample was predominantly men (controls: 10 men, 7 women; PD: 17 men, 4 women), had a mean age of 60 (control age range: 44–77; PD: 43–75), and was well-educated (mean of 15.8 years in each group). The groups did not significantly differ with respect to age, education, or MMSE scores (control mean MMSE: 29.2 [0.8]; PD: 28.4 [1.4]). There was a trend toward higher Geriatric Depression Scale (GDS) scores in the PD group (t(36) = 2.0, p < .06), although the mean GDS score for the PD group was still in the nondepressed range (control GDS mean: 2.5 [2.8]; PD: 5.0 [4.5]). Three PD patients scored in the depressed range (total score ≥10). None of the controls scored in the depressed range. PD patients had a mean disease duration of 10.5 years (SD = 5.7), a mean Hoehn–Yahr stage of 2 (range: 1–3), and a mean UPDRS motor score of 23.2 (SD = 7.1).


Participants were administered the n-back task and neuropsychological measures during two separate testing sessions. The neuropsychological measures were part of a larger test battery and were chosen for inclusion in this study because they assess working memory or processing speed, two constructs thought to underlie n-back performance. For the PD group, neuropsychological measures were administered as part of their routine clinical assessment. For the control group, measures were administered solely for research purposes. Each measure is described in detail below.

N-back task.

Participants were administered the n-back task (version used by Perlstein et al., 2003) on an Apple iMac laptop connected to a button box on which they made their responses. All participants used the index and middle fingers of their dominant hand to press one of two buttons denoting “target” and “nontarget.” In the 0-back condition, the target was any letter that matched a pre-specified letter (i.e., “c”). Thus, this condition required sustained attention but placed no demand on working memory. In the 1-back condition, the target was any letter identical to the letter immediately preceding it (i.e., the letter presented one trial back). In the 2-back condition, the target was any letter that was identical to the one presented two trials back. In the 3-back condition, the target was any letter that was identical to the one presented three trials back.
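The target rule for each load can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the task software used in the study. The function name `is_target` and the case-insensitive comparison are assumptions; the latter is suggested by the fact that stimuli varied in case, but the original matching rule is not stated here.

```python
def is_target(letters, i, n, prespecified="c"):
    """Return True if the letter at index i is a target under the n-back rule.

    n == 0: target is any occurrence of the pre-specified letter.
    n >= 1: target is any letter identical to the one shown n trials earlier.
    Comparison ignores case (an assumption, see the lead-in).
    """
    current = letters[i].lower()
    if n == 0:
        return current == prespecified.lower()
    if i < n:
        # There is no letter n trials back yet, so this cannot be a target.
        return False
    return current == letters[i - n].lower()


# Hypothetical letter stream, for illustration only.
sequence = list("kBbcTcRt")
two_back_targets = [i for i in range(len(sequence)) if is_target(sequence, i, 2)]
print(two_back_targets)
```

In this hypothetical stream, the second "c" (index 5) is the only 2-back target, because it matches the letter shown two trials earlier.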

Stimuli were pseudorandom sequences of consonants randomly varying in case and presented in a fixed central location on a computer screen using the PsyScope software (Cohen, MacWhinney, Flatt, and Provost, 1993) for a 500-ms duration with a 2500-ms interstimulus interval. Participants completed 12 blocks of trials (three blocks of each of the four conditions), with each block consisting of 25 trials. The first three trials of each block were never targets and, of the remaining trials, 30% were targets. Condition order was randomized across blocks and across participants, with the constraint that all four conditions were sampled in every set of four blocks. A short break (5–20 s) between blocks was provided to allow participants to rest. Prior to the start of the actual task, participants were trained on each of the four conditions. Participants were given up to three practice blocks (of 25 trials each) per condition with feedback on their performance, until they demonstrated that they understood the task and their performance stabilized. Reaction times (RTs) and accuracy measures were obtained for each trial.
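One way to satisfy the block-order constraint is to shuffle the four conditions independently within each consecutive set of four blocks. The sketch below assumes that "every set of four blocks" means non-overlapping sets (blocks 1–4, 5–8, 9–12); the function name `block_order` and its parameters are hypothetical and do not come from the original PsyScope script.

```python
import random


def block_order(n_sets=3, conditions=(0, 1, 2, 3), rng=None):
    """Return a randomized condition order for n_sets * len(conditions) blocks.

    Each consecutive set of len(conditions) blocks contains every condition
    exactly once, in a freshly shuffled order.
    """
    rng = rng or random.Random()
    order = []
    for _ in range(n_sets):
        block_set = list(conditions)
        rng.shuffle(block_set)  # randomize order within this set of four
        order.extend(block_set)
    return order


# One participant's order: 12 blocks, three of each n-back load.
print(block_order(rng=random.Random(2009)))
```

Passing a seeded `random.Random` makes a participant's order reproducible, while omitting it gives a fresh order on each call.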

Mini-Mental State Examination.

The MMSE (Folstein, Folstein, and McHugh, 1975) is a brief measure of cognitive status typically used to screen for dementia. A total of 30 points are possible. A cut-off score of ≥26 was used as the criterion for study inclusion.

Digit Span Subtest of the WAIS-III.

Digit Span (Wechsler, 1997) requires the participant to immediately recall and repeat a string of digits presented orally. On the forward trials, the participant must repeat the digits in the exact sequence as they were presented; on the backward trials, the digits must be repeated in the reverse order of presentation. The data reported refer to the longest digit span the participant was able to obtain in the forward and backward directions (not WAIS-III raw digit span scores).

Stroop Interference Task.

The Stroop task (Stroop, 1935) involves cognitive inhibition of overlearned reading responses. The Golden version (1978) was administered and consisted of three subtasks. In the first, participants were instructed to read as many words as possible in 45 s on an 8 × 11 card (Card 1). The words were “red,” “green,” and “blue” written in black ink. In the second subtask, participants were asked to name the color of ink in which a series of “X's” were written, again naming as many as possible in 45 s (Card 2). Finally, in the third subtask, participants were asked to name the color of ink in which a word was written (e.g., the word “blue” written in red ink would require the response of “red”; Card 3). The procedure for determining the interference score was followed from the administration and norms manual.

Trail Making Test Part A.

TMT A (Army Individual Test Battery, 1944) requires participants to use a pencil to connect numbers on a page in sequential order as quickly as they can.

Geriatric Depression Scale.

The Geriatric Depression Scale (GDS; Yesavage et al., 1982) is a 30-item self-report questionnaire in which participants must answer “yes” or “no” to each question. The GDS was specifically designed for older adults and avoids questions based on somatic symptoms of depression that may actually be attributable to medical conditions or changes associated with aging. The GDS was chosen because nonsomatic (i.e., cognitive) symptoms of depression have been shown to discriminate depressive disorders in PD (Leentjens, Marinus, Van Hilten, Lousberg, and Verhey, 2003). A cut score of ≥10 was used to classify participants as “depressed.” This cut score has been found to have the greatest sensitivity and specificity in a PD sample (McDonald et al., 2006).

Statistical Analyses

For each participant, means and standard deviations of RTs for correct responses were computed for each n-back condition. Extreme RTs (defined as more than three standard deviations above or below the mean, calculated per participant, per condition) were excluded from further analyses. Excluded trials accounted for only 2.6% of the total number of trials, and thus it is unlikely that their exclusion would alter the overall pattern of the data. N-back accuracy was calculated with the following formula: [1 − ((number of commissions + number of omissions)/total possible correct)] × 100. Both RT and accuracy data met assumptions of univariate normality. Repeated-measures ANOVAs were used to examine group and load differences on these measures. Significant effects were decomposed through post-hoc t-tests. Performance of the PD and control groups on digit span forward and backward and the Stroop task was analyzed by t-tests. Completion times for TMT A were analyzed by Mann–Whitney tests due to the non-normal distribution of scores. Pearson's r or Spearman's rho (for TMT A) was used to examine the correlation between n-back and neuropsychological test performance. For the ANOVAs and t-tests, p was set at <.05 for significance. For correlation analyses, p was set at <.001 to control for the large number of comparisons.
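The RT exclusion rule and the accuracy formula above can be written out as a short Python sketch. The function names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say whether the sample or population form was used.

```python
from statistics import mean, stdev


def trim_rts(rts, n_sd=3.0):
    """Drop RTs more than n_sd standard deviations from the mean.

    `rts` holds one participant's correct-response RTs (ms) for one condition.
    Uses the sample SD (an assumption, see the lead-in).
    """
    if len(rts) < 2:
        return list(rts)  # SD is undefined for fewer than two values
    m, sd = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * sd]


def nback_accuracy(commissions, omissions, total_possible_correct):
    """Percent accuracy: [1 - (commissions + omissions) / total possible] * 100."""
    return (1 - (commissions + omissions) / total_possible_correct) * 100


# Made-up values for illustration only.
rts = [500.0] * 20 + [5000.0]
print(len(trim_rts(rts)))         # the single extreme RT is excluded
print(nback_accuracy(2, 3, 75))   # 5 errors out of 75 possible correct
```

With the made-up inputs, five errors out of 75 possible correct responses give an accuracy of about 93.3%.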


Results

Relationship Between N-Back Accuracy, RTs, and Neuropsychological Measures of Working Memory and Processing Speed

A series of correlation analyses were performed to examine the relationship between n-back accuracy and RT at each of the four loads and scores on the neuropsychological tests of interest (i.e., digit span forward and backward, TMT A, Stroop word reading, color naming). Correlation analyses were first computed for the PD and control groups separately. None of the correlations were statistically significant. The two groups were then combined into one larger group to increase the sample size as well as the range.

As shown in Table 1, the primary findings were as follows: None of the n-back conditions showed a significant correlation with digit span forward or digit span backward. TMT A correlated significantly with 2-back accuracy. Neither Stroop word reading nor Stroop color naming showed a significant correlation with either accuracy or RT for any of the n-back conditions.


Table 1. Correlation coefficients between n-back accuracy and neuropsychological measures

Measure                 Coefficients by n-back condition (column labels not recoverable)
Digit span forward      −0.12   0.12  −0.12  −0.13  −0.30   0.06  −0.20   0.02
Digit span backward     −  −0.20  −0.16   0.06  −0.01  −0.07   [row incomplete]
Trail Making Test A+     0.38   0.*   −   [row incomplete]
Stroop word reading      0.28  −0.33   0.08  −0.35   0.17   0.07   0.31   0.11
Stroop color naming      0.43  −0.12   0.38  −0.13   0.43   0.26   0.46   0.31


*p < .001.

+Completion times converted to t-scores (thus higher values indicate faster completion times).

Performance on Neuropsychological Measures

Parkinson and control groups did not significantly differ with respect to performance on digit span forward, digit span backward, TMT A, or Stroop word or color naming. Overall, the PD group displayed a pattern of greater variability in scores than the control group.

N-Back Accuracy

Accuracy data were analyzed through a 2 × 4 repeated-measures ANOVA with group as the between-subjects factor and load (0-, 1-, 2-, and 3-back) as the within-subjects factor. Accuracy means and standard deviations by group and condition are shown in Table 2. Results revealed a significant main effect for load, F(3, 108) = 84.7, p < .001. Post-hoc t-tests indicated that accuracy for each load significantly differed from accuracy for each of the other loads, with a pattern of decreased accuracy as working memory load increased (means: 0-back = 95.85 [4.48], 1-back = 88.27 [8.39], 2-back = 80.10 [8.30], 3-back = 73.41 [8.20]). There was also a main effect of group, F(1, 36) = 11.3, p < .001. The PD patients were significantly less accurate than controls (means: PD = 82.07 [1.03], control = 87.30 [1.15]). The load by group interaction was not significant (p > .1).
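The degrees of freedom reported for this design follow from the sample size and the number of factor levels. As a quick check, the sketch below computes them for a two-factor mixed design (one between-subjects factor, one within-subjects factor), assuming no sphericity correction was applied; the function name `mixed_anova_df` is hypothetical.

```python
def mixed_anova_df(n_participants, n_groups, n_levels):
    """Degrees of freedom for a mixed ANOVA with one between-subjects factor
    (n_groups) and one within-subjects factor (n_levels), with no sphericity
    correction."""
    df_group = n_groups - 1
    df_group_error = n_participants - n_groups
    df_within = n_levels - 1
    df_within_error = df_group_error * df_within
    return {
        "group": (df_group, df_group_error),
        "load": (df_within, df_within_error),
        "group_x_load": (df_group * df_within, df_within_error),
    }


# 21 PD patients + 17 controls, 2 groups, 4 loads (0- to 3-back).
print(mixed_anova_df(21 + 17, 2, 4))
```

For 38 participants, two groups, and four loads, this gives (1, 36) for group and (3, 108) for load and for the interaction, matching the values reported in the text.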


Table 2. N-back performance by group

Condition   Measure         PD               Control
0-back      Reaction time   635.6 (144.0)    551.8 (89.4)
0-back      Accuracy        94.7 (5.4)       97.2 (2.6)
1-back      Reaction time   795.0 (178.3)    697.1 (146.0)
1-back      Accuracy        86.0 (8.8)       91.1 (7.2)
2-back      Reaction time   951.0 (286.1)    827.2 (206.0)
2-back      Accuracy        77.1 (7.7)       83.9 (7.6)
3-back      Reaction time   961.1 (296.6)    858.9 (242.3)
3-back      Accuracy        70.5 (8.9)       77.0 (5.7)


Notes: Values are expressed as mean (SD). Reaction times are in ms and accuracy scores are percentage correct. Values in bold indicate a significant group difference (p < .05). Accuracy data showed a main effect of load and group. Reaction time data showed a main effect of load.

N-Back RTs

Analysis of RT data revealed a significant effect of load, F(3, 108) = 54.7, p < .001. Post-hoc t-tests indicated that RTs for each load differed significantly from RTs for all other loads, with the exception of 2- and 3-back RTs, which did not significantly differ from one another. In general, there was a pattern of increased RTs as working memory load increased (means in milliseconds: 0-back = 598.08 [128.23], 1-back = 751.21 [169.83], 2-back = 895.6 [257.82], 3-back = 915.40 [274.92]). There was a trend toward a main effect of group, although it did not reach significance, F(1, 36) = 3.2, p = .1, with the PD group showing slower RTs than the controls (means: PD = 835.69 [40.40], controls = 733.34 [44.90]). The group by load interaction was nonsignificant, F(3, 108) = 0.18, p > .9. Means and standard deviations of RT data by group are shown in Table 2.

Influence of Depression, Antidepressant Medication, and Gender on N-Back Performance

The above accuracy and RT analyses were repeated with GDS scores and gender as covariates to determine whether performance was influenced by these variables. The pattern of findings remained the same for both accuracy and RTs. Similarly, removing the six PD patients on antidepressants from the sample did not affect the pattern of results.


Discussion

The primary aim of the present report was to determine whether the n-back task is a valid clinical neuropsychological measure of working memory. To answer this question, we examined the convergent validity between n-back accuracy and digit span backward, an established working memory measure. Owing to the task's time-pressured design, we also examined the relationship between n-back accuracy and measures of information processing speed. Correlational analyses revealed that n-back accuracy (at each of the 0-, 1-, 2-, and 3-back loads) did not significantly correlate with digit span backward. A series of correlational analyses between n-back accuracy at each level, TMT A, Stroop word reading, and Stroop color naming found a significant correlation between 2-back accuracy and TMT A only.

An additional study aim was to compare n-back performance and performance on standard measures of working memory and processing speed in control and PD groups to examine the n-back's ability to detect between-groups differences. We found that the two groups did not significantly differ with respect to digit span forward or backward, TMT A, Stroop word reading, or Stroop color naming. In contrast, the PD group showed significantly poorer accuracy on the n-back as well as a trend toward slower RTs, suggesting the n-back may be able to detect subtle group differences in performance.

One potential explanation for the surprising lack of relationship between n-back accuracy and digit span backward may be that the n-back is a visually presented, as opposed to aurally presented, working memory task. This may prime participants to use a mental imagery strategy rather than a verbally mediated strategy. One way to examine this hypothesis would be to include a measure of visual working memory, such as a spatial span task or self-ordered pointing task (Petrides and Milner, 1982), to see if significant correlations emerge between the n-back and these tasks.

The present results are somewhat consistent with n-back findings from Parmenter, Shucard, Benedict, and Shucard (2006). These authors conducted a principal components analysis of performance on the n-back, Paced Auditory Serial Addition Task (PASAT), and other measures of executive function in a combined sample of patients with multiple sclerosis and healthy controls. They found that n-back accuracy and RTs, PASAT accuracy, and TMT A and B all loaded on a common factor, which they conceptualized as speeded information processing. Digit span forward and backward loaded together on a nonspeeded working memory factor. This is similar to our finding that n-back performance showed a stronger relationship with a test of speeded information processing (TMT A) than with a nonspeeded test of working memory (digit span backward).

Our findings are consistent with past reports of a weak association between n-back performance and performance on other working memory tasks, including reading span, complex span tasks, and mathematics-based tasks (Kane, Conway, Miura, and Colflesh, 2007; Oberauer, 2005; Roberts and Gibson, 2002). On the basis of these findings, Kane and colleagues (2007) concluded that “the n-back has too long been used by cognitive neuroscientists without serious efforts to assess its construct validity, and now we may have to reappraise past findings” (p. 621). Indeed, the results of the current study do not support the validity of the n-back as a pure measure of working memory, at least in our limited combined sample of controls and PD patients.

The present study has several limitations, including a small sample size (thus it may be underpowered), a predominantly male sample, and a highly educated PD group consisting of candidates for DBS surgery, which may not represent the typical PD patient. The PD group was also tested while on dopaminergic medication, which may have improved task performance given prior findings that dopamine modulates working memory (Costa et al., 2003).

In conclusion, our study did not demonstrate convergent validity between the n-back and a known working memory measure, digit span backward. Instead, our results suggest that n-back accuracy may rely more on information processing speed or motor speed than on working memory in a PD sample, as evidenced by a correlational relationship with TMT A (albeit at the 2-back load only). Our study argues against using the n-back as a measure of working memory in a PD population; however, our results suggest that n-back accuracy scores may be useful in detecting subtle differences in cognitive functioning between control and PD groups. Until further validation studies are conducted that clearly elucidate the constructs underlying n-back performance, we recommend clinicians continue to assess working memory in PD through established tests that do not rely upon speeded processing, such as digit span and letter–number sequencing.


We would like to acknowledge the support of NIH R01-NS050633 (DB), NIH K23 NS060660-01 (CCP), NIH F32 AG021363-01 (CCP), NIH K23 NS044997 (MSO), the National Parkinson Foundation Center of Excellence, the Michael J. Fox Foundation, the McKnight Brain Institute, UF and Shands, and the University of Florida Colleges of Medicine and Public Health and Health Professions.

Conflict of Interest

None declared.


References

  • Army individual test battery. Washington, DC: War Department, Adjutant General's Office; 1944.
  • Braver T. S., Cohen J. D., Nystrom L. E., Jonides J., Smith E. E., Noll D. C. A parametric study of prefrontal cortex involvement in human working memory. Neuroimage. 1997;5(1):49–62.
  • Cohen J. D., MacWhinney B., Flatt M. R., Provost J. PsyScope: A new graphic interactive environment for designing psychology experiments. Behavior Research Methods, Instruments, and Computers. 1993;25:257–271.
  • Costa A., Peppe A., Dell'Agnello G., Carlesimo G. A., Murri L., Bonuccelli U., et al. Dopaminergic modulation of visual-spatial working memory in Parkinson's disease. Dementia and Geriatric Cognitive Disorders. 2003;15(2):55–66.
  • Folstein M. F., Folstein S. E., McHugh P. R. Mini-mental State: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research. 1975;12:189–198.
  • Hughes A. J., Daniel S. E., Kilford L., Lees A. J. Accuracy of clinical diagnosis of idiopathic Parkinson's disease: A clinico-pathological study of 100 cases. Journal of Neurology, Neurosurgery, and Psychiatry. 1992;55(3):181–184.
  • Kane M. J., Conway A. R., Miura T. K., Colflesh G. J. Working memory, attention control, and the N-back task: A question of construct validity. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2007;33(3):615–622.
  • Leentjens A. F., Marinus J., Van Hilten J. J., Lousberg R., Verhey F. R. The contribution of somatic symptoms to the diagnosis of depressive disorder in Parkinson's disease: A discriminant analytic approach. The Journal of Neuropsychiatry and Clinical Neurosciences. 2003;15(1):74–77.
  • Manoach D. S., Schlaug G., Siewert B., Darby D. G., Bly B. M., Benfield A., et al. Prefrontal cortex fMRI signal changes are correlated with working memory load. Neuroreport. 1997;8(2):545–549.
  • McDonald W. M., Holtzheimer P. E., Haber M., Vitek J. L., McWhorter K., Delong M. Validity of the 30-item geriatric depression scale in patients with Parkinson's disease. Movement Disorders: Official Journal of the Movement Disorder Society. 2006;21(10):1618–1622.
  • Oberauer K. Binding and inhibition in working memory: Individual and age differences in short-term recognition. Journal of Experimental Psychology: General. 2005;134(3):368–387.
  • Owen A. M., James M., Leigh P. N., Summers B. A., Marsden C. D., Quinn N. P., et al. Fronto-striatal cognitive deficits at different stages of Parkinson's disease. Brain. 1992;115(Pt. 6):1727–1751.
  • Owen A. M., Sahakian B. J., Hodges J. R., Summers B. A., Polkey C. E., Robbins T. W. Dopamine-dependent frontostriatal planning deficits in early Parkinson's disease. Neuropsychology. 1995;9:126–140.
  • Parmenter B. A., Shucard J. L., Benedict R. H., Shucard D. W. Working memory deficits in multiple sclerosis: Comparison between the n-back task and the Paced Auditory Serial Addition Test. Journal of the International Neuropsychological Society. 2006;12(5):677–687.
  • Perlstein W. M., Carter C. S., Noll D. C., Cohen J. D. Relation of prefrontal cortex dysfunction to working memory and symptoms in schizophrenia. The American Journal of Psychiatry. 2001;158(7):1105–1113.
  • Perlstein W. M., Dixit N. K., Carter C. S., Noll D. C., Cohen J. D. Prefrontal cortex dysfunction mediates deficits in working memory and prepotent responding in schizophrenia. Biological Psychiatry. 2003;53(1):25–38.
  • Petrides M., Milner B. Deficits on subject-ordered tasks after frontal- and temporal-lobe lesions in man. Neuropsychologia. 1982;20:249–262.
  • Ragland J. D., Turetsky B. I., Gur R. C., Gunning-Dixon F., Turner T., Schroeder L., et al. Working memory for complex figures: An fMRI comparison of letter and fractal n-back tasks. Neuropsychology. 2002;16(3):370–379.
  • Roberts R., Gibson E. Individual differences in sentence memory. Journal of Psycholinguistic Research. 2002;31(6):573–598.
  • Stroop J. R. Studies of interference in serial verbal reactions. Journal of Experimental Psychology. 1935;18:643–662.
  • Wechsler D. Wechsler Adult Intelligence Scale—Third edition. San Antonio, TX: The Psychological Corporation; 1997.
  • Yesavage J. A., Brink T. L., Rose T. L., Lum O., Huang V., Adey M., et al. Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research. 1982;17(1):37–49.

Articles from Archives of Clinical Neuropsychology are provided here courtesy of Oxford University Press
