Reproducibility of Single-Subject Functional Connectivity Measurements

BACKGROUND AND PURPOSE: Measurements of resting-state functional connectivity have increasingly been used for characterization of neuropathologic and neurodevelopmental populations. We collected data to characterize how much imaging time is necessary to obtain reproducible quantitative functional connectivity measurements needed for a reliable single-subject diagnostic test. MATERIALS AND METHODS: We obtained 100 five-minute BOLD scans on a single subject, divided into 10 sessions of 10 scans each, with the subject at rest or while watching video clips of cartoons. These data were compared with resting-state BOLD scans from 36 healthy control subjects by evaluating the correlation between each pair of 64 small spheric regions of interest obtained from a published functional brain parcellation. RESULTS: Single-subject and group data converged to reliable estimates of individual and population connectivity values proportional to 1 / sqrt(n). Dramatic improvements in reliability were seen by using ≤25 minutes of imaging time, with smaller improvements for additional time. Functional connectivity “fingerprints” for the individual and population began diverging at approximately 15 minutes of imaging time, with increasing reliability even at 4 hours of imaging time. Twenty-five minutes of BOLD imaging time was required before any individual connections could reliably discriminate an individual from a group of healthy control subjects. A classifier discriminating scans during which our subject was resting or watching cartoons was 95% accurate at 10 minutes and 100% accurate at 15 minutes of imaging time. CONCLUSIONS: An individual subject and control population converged to reliable different functional connectivity profiles that were task-modulated and could be discriminated with sufficient imaging time.

Since the discovery that functionally related brain regions show synchronized fluctuations in BOLD signal intensity, 1 fcMRI has emerged as a useful tool in identifying the organization of network-level brain architecture. [2][3][4] There is increasing use of fcMRI to characterize neurodevelopmental and neuropathologic conditions on the basis of differences in brain network anatomy and functional network connectivity. Differences in group means in specific or aggregate metrics of functional connectivity between brain regions have been reported in dementia, [5][6][7] autism, [8][9][10][11][12][13][14][15] Tourette syndrome, 16 schizophrenia, [17][18][19][20][21] obsessive compulsive disorder, 22 and multiple sclerosis, 23,24 among other conditions. Identifying such abnormalities in disease populations helps characterize the pathophysiology of the disease but is of little use in diagnosis, endophenotype development, genome-wide association studies, prognosis, and treatment monitoring unless such metrics can be reliably obtained from individual subjects. Basic science studies investigating changes in brain connectivity in development and aging would also greatly benefit from robust single-subject metrics. Classification of pathologic or cognitive states also requires reliability at the level of individual subjects. Yet the extent to which functional connectivity measurements are reproducible and key questions related to experimental design, such as necessary scanning duration and choice of task, have only recently been subjects of investigation. Core network anatomy of resting-state networks is preserved across subjects, with reproducible qualitative identification of key anatomic relationships, such as in the default mode and attentional networks. 25
A study by using independent-component analysis to define resting-state networks in 14 patients each with 5 scans showed similar boundaries of key resting-state networks across sessions, with a voxelwise analysis showing approximately 20% of voxels exhibiting a main effect of session. 26 Test-retest studies have been performed on a small number of scans from many subjects. A study in which 3 scans were obtained on 26 subjects found moderate reproducibility, depending on the strength of the correlation between measurements and the network in which the correlated brain regions were located. 27 In a study assessing reliability as a function of imaging time, measurement error decreased as 1 over the square root of imaging time, with intersession correlation improving from 0.7 to 0.85 when 40 minutes of imaging time was used instead of 5 minutes. 4 Yet the same study showed that average correlation strengths over an entire network stabilized after approximately 5 minutes of imaging time, reaching asymptotic values. 4 These results led the authors to conclude that surprisingly reliable estimates could be obtained from a single 5-minute run and that increasing imaging time resulted in marginally small improvements in reliability. 4 Although these results are encouraging, the use of functional connectivity as a specific diagnostic test or for use in single-subject classifications will likely require identifying subtle quantitative differences in a subset of "connections" between brain regions. We investigated the reliability of individual functional connectivity measurements within a single subject between small regions of interest to better characterize the incremental improvement in reliability with increased imaging time.

Subject Characteristics
One hundred 5-minute scans were obtained during 10 imaging sessions (10 scans per session) on a single male subject (age, 39 years) during a 3-week period. Five sessions were obtained while the subject was instructed to keep his eyes open and remain awake, and 5 sessions were obtained while the subject was watching 10 five-minute clips from Bugs Bunny cartoons (Looney Tunes Golden Collection, Volume 1, Warner Home Video). The same 10 clips were used for each of the 5 cartoon sessions in the same order, with the clips synchronized to the onset of the BOLD acquisition by a fiber-optic trigger pulse.
Additionally, BOLD fMRI data were obtained from 36 healthy adolescent and adult volunteers examined after informed consent in accordance with procedures approved by the institutional review board. A subset of these data has been previously reported. 28 Subjects were between the ages of 17 and 53 years: 16 male, 20 female. All subjects had no Diagnostic and Statistical Manual of Mental Disorders-IV Axis I diagnoses based on diagnostic semistructured psychiatric interviews and screening surveys as previously described. 28

Data Acquisition
Images were acquired on a 3T Magnetom Trio scanner (Siemens, Erlangen, Germany) with a 12-channel head coil. The scanning protocol consisted of an initial 1-mm isotropic MPRAGE acquisition for an anatomic template. BOLD echo-planar images (TR = 2.0 seconds, TE = 28 ms, generalized autocalibrating partially parallel acquisition with acceleration factor = 2, 40 sections at 3-mm section thickness, 64 × 64 matrix) were obtained during the resting state, in which subjects were instructed to keep their eyes open and remain awake and try to let thoughts pass through their minds without focusing on any particular mental activity. Prospective motion correction was performed during BOLD imaging with a prospective acquisition-correction technique sequence. An 8-minute resting scan (240 volumes) was obtained for each of the group subjects. One hundred 5-minute scans (155 volumes) were obtained for the individual subject. An additional field map scan was obtained for each subject for distortion correction. For all BOLD sequences, simultaneous plethysmograph (pulse oximeter) and chest excursion (respiratory belt) waveforms were recorded for off-line analysis.

Region-of-Interest Correlation
Sixty regions of interest were adapted from published peak MNI coordinates from a study by using high-model-order independent component analysis to parcellate the brain into 42 independent components based on functional connectivity. 32 Coordinates were selected so as to represent 30 pairs of interhemispheric homologues. Four additional coordinates were chosen along the midline. Five-millimeter-radius spheric regions of interest were selected for each of these 64 coordinates (On-line Table). The coordinates were selected such that each coordinate was at least 12 mm from every other coordinate to avoid any overlap in the regions of interest. Mean time series were extracted from each 5-minute scan from each of the 64 regions of interest. The Pearson correlation coefficient between each pair of regions of interest was measured, and Fisher z-transform was performed to obtain a 64 × 64 matrix of correlation between the regions. The correlation procedure was performed separately for each of the group subjects by using either the first 5 minutes or the full 8 minutes of data.
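The correlation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names `pearson` and `fisher_z_matrix` are hypothetical, the input is synthetic Gaussian noise standing in for 64 region-of-interest time series of 155 volumes, and only the Python standard library is used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_matrix(roi_series):
    """Return a symmetric matrix of Fisher z-transformed correlations.

    roi_series: one list of floats (mean time series) per region of interest.
    The diagonal is left at 0.0 because atanh(1) is undefined.
    """
    n_roi = len(roi_series)
    z = [[0.0] * n_roi for _ in range(n_roi)]
    for i in range(n_roi):
        for j in range(i):
            r = pearson(roi_series[i], roi_series[j])
            z[i][j] = z[j][i] = math.atanh(r)  # Fisher z-transform
    return z


# Synthetic stand-in for one scan (assumption): 64 regions x 155 volumes of noise.
rng = random.Random(0)
scan = [[rng.gauss(0.0, 1.0) for _ in range(155)] for _ in range(64)]
z_matrix = fisher_z_matrix(scan)

# Number of unique region pairs in the 64 x 64 matrix.
n_pairs = sum(1 for i in range(64) for j in range(i))
```

Only the lower triangle is computed and then mirrored, which matches the (64 × 63) / 2 unique pairs used elsewhere in the analysis.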
A similar approach was performed to extract the mean time series from each of 116 regions in an anatomic parcellation of the brain by using an MNI-normalized version of the AAL atlas, 33 packaged with the WFU PickAtlas toolbox software (http://www.fmri.wfubmc.edu/cms/software). 34 This approach yielded a 116 × 116 matrix of z-transformed correlation values for one 8-minute scan from each of the 36 group subjects and from 100 five-minute scans from the individual subject.

Reproducibility Calculations
As a measurement of reproducibility, we calculated the mean difference in correlation that would be obtained if a set of measurements was repeated. For intrasession measurements, we selected, at random, a group of k scans and compared results with those from another group of k scans from the same scanning session. This process was repeated 100 times for each number of scans k (1-5). In each case, we took, for each connection between region i and region j, the mean Fisher-transformed correlation r_1(i,j) from the first group of k scans and the mean Fisher-transformed correlation r_2(i,j) from the second group of k scans to calculate the mean difference in correlation:
$$\text{Mean difference in correlation for } k \text{ scans} = \sqrt{\frac{\sum_{i=1}^{64}\sum_{j=1}^{i-1}\left(r_1(i,j)-r_2(i,j)\right)^{2}}{(64\times 63)/2}}$$
Over the 100 samples for each number of scans, the means and SDs of the mean differences in correlation were used as estimates of reproducibility. Analogous measurements were obtained for intersession reproducibility, by using 100 samples each of between 1 and 10 scans from different sessions within the same subject, and for interindividual reproducibility, by using 100 samples each of between 1 and 18 unique subjects of the 36 subjects in the study.
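The resampling procedure can be illustrated with a short sketch. Several things here are assumptions rather than details from the study: the two groups of k scans are drawn without overlap, the mean difference is computed as a root-mean-square over unique region pairs, and the "scans" are synthetic matrices (8 regions, a fixed true value per connection plus Gaussian noise). All function names are hypothetical.

```python
import math
import random


def mean_matrix(matrices):
    """Element-wise mean of a list of equally sized square matrices."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)] for i in range(n)]


def mean_difference(z1, z2):
    """Root-mean-square difference over the unique region pairs (i > j)."""
    n = len(z1)
    total = sum((z1[i][j] - z2[i][j]) ** 2 for i in range(n) for j in range(i))
    return math.sqrt(total / (n * (n - 1) / 2))


def resample_reproducibility(scans, k, n_samples=100, seed=0):
    """Mean and SD of the mean difference over random pairs of k-scan groups."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_samples):
        picked = rng.sample(range(len(scans)), 2 * k)  # disjoint groups (assumed)
        g1 = mean_matrix([scans[s] for s in picked[:k]])
        g2 = mean_matrix([scans[s] for s in picked[k:]])
        diffs.append(mean_difference(g1, g2))
    mean = sum(diffs) / len(diffs)
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1))
    return mean, sd


# Synthetic session (assumption): 10 scans, 8 regions, noise SD of 0.2 per connection.
rng = random.Random(0)
n_regions, noise_sd = 8, 0.2
truth = [[rng.uniform(-0.5, 1.0) for _ in range(n_regions)] for _ in range(n_regions)]


def synthetic_scan():
    """One simulated Fisher z matrix: true values plus independent noise."""
    z = [[0.0] * n_regions for _ in range(n_regions)]
    for i in range(n_regions):
        for j in range(i):
            z[i][j] = z[j][i] = truth[i][j] + rng.gauss(0.0, noise_sd)
    return z


session = [synthetic_scan() for _ in range(10)]
results = {k: resample_reproducibility(session, k) for k in range(1, 6)}
```

In this simulation `results[k]` holds the (mean, SD) pair for each group size, and the mean shrinks as k grows because averaging more scans suppresses the independent noise.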

Statistical and Classifier Analysis
To calculate significant differences between individual and group correlation "fingerprints," we evaluated, by using a 2-tailed t test, whether a given connection between regions i and j was significantly different between k scans from the same individual and the 36 scans from different individuals. Significance was taken at P = .05, Bonferroni-corrected for the number of connections considered. For example, when evaluating the full 64 × 64 matrix of connections between ICA-based coordinates, we divided the significance threshold by (64 × 63 / 2) = 2016. When evaluating just the 30 interhemispheric homolog pairs, we divided the threshold by 30. A connection was considered significant if it met this Bonferroni-corrected threshold for 95 of 100 random samples of k scans from the 50 resting scans in the individual.
Classification between rest and cartoon scans was performed by obtaining rest and cartoon "standards" from the first 30 rest scans and first 30 cartoon scans. Mean correlation between each pair of regions was calculated across the 30 scans of each type. For testing, 100 samples of 1, 2, or 3 randomly selected rest and cartoon scans from the 20 remaining scans of each type were chosen, and the mean correlation values for each connection were compared with the rest and cartoon standards by evaluating the mean difference in correlation as described above. Classification was performed on the basis of whether the test sample showed a smaller mean difference in correlation from the rest standard or from the cartoon standard.
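The nearest-standard classification rule can be sketched as follows. This is an illustrative sketch on synthetic data, not the study's implementation: connection values are stored as flat vectors, the distance is a root-mean-square difference, and the simulated task effect and noise level are arbitrary assumptions, so the resulting accuracies say nothing about the real data. All names are hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of equal-length lists of connection values."""
    k = len(vectors)
    return [sum(v[c] for v in vectors) / k for c in range(len(vectors[0]))]


def rms_difference(a, b):
    """Root-mean-square difference between two connection vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def classify(test_scans, rest_standard, cartoon_standard):
    """Label a group of test scans by whichever standard it lies closer to."""
    test_mean = mean_vector(test_scans)
    d_rest = rms_difference(test_mean, rest_standard)
    d_cartoon = rms_difference(test_mean, cartoon_standard)
    return "rest" if d_rest < d_cartoon else "cartoon"


# Synthetic data (assumption): 200 connections, 50 scans per condition.
rng = random.Random(1)
n_conn, noise_sd = 200, 0.3
rest_truth = [rng.uniform(-0.3, 0.9) for _ in range(n_conn)]
cartoon_truth = [z + rng.gauss(0.0, 0.15) for z in rest_truth]


def simulate(truth, n_scans):
    """Simulated scans: true connection values plus independent noise."""
    return [[z + rng.gauss(0.0, noise_sd) for z in truth] for _ in range(n_scans)]


scans = {"rest": simulate(rest_truth, 50), "cartoon": simulate(cartoon_truth, 50)}

# Standards from the first 30 scans of each type; the remaining 20 are held out.
rest_standard = mean_vector(scans["rest"][:30])
cartoon_standard = mean_vector(scans["cartoon"][:30])

accuracy = {}
for k in (1, 2, 3):
    correct = 0
    for _ in range(100):
        for label in ("rest", "cartoon"):
            test = rng.sample(scans[label][30:], k)
            correct += classify(test, rest_standard, cartoon_standard) == label
    accuracy[k] = correct / 200
```

Keeping the standards and the test scans disjoint, as in the 30/20 split described above, avoids scoring the classifier on scans that helped define the standards.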

Results
We measured the reproducibility of quantitative measures of functional connectivity within a single subject. For this subject, we obtained 10 five-minute BOLD scans in each of 10 imaging sessions, half obtained while the subject was resting with eyes open and half obtained while the subject was watching cartoons. Intrasession reproducibility is shown in Fig 1. The mean difference in correlation between the 2016 pairs of 64 regions of interest from measurements obtained from different scans in the same session showed consistent improvement in reproducibility as more scans were averaged.
The reproducibility of measurements while the subject was watching cartoons was slightly better, though within 1 SD of measurements obtained in the no-stimulus condition. This difference was true despite the fact that different stimuli were shown in each scan within a session. Figure 1B shows the mean difference in correlation between groups of 5 scans within the same session, which indicates that this difference is primarily due to outliers among groups of resting scans.
Similar measurements were obtained for differences in correlation between groups of scans obtained in different imaging sessions, shown in Fig 2. In this case, the groups of cartoon scans consisted of the same stimuli in each of the 2 sessions compared for each group of scans. Surprisingly, the difference in correlation between scans in the same session, some of which were within minutes of each other, was nearly as great as that between scans obtained on separate days.
We also compared scans from 36 different individuals, all obtained in an eyes-open resting state, shown in Fig 3A. Because these scans were 8 minutes in length, we evaluated difference in correlation by using the entire scan or just the first 5 minutes, for better comparison with individual results shown in the first 2 figures.
Again, the differences in correlation between groups of subjects were only slightly higher than those obtained within a single subject and showed a nearly identical rate of decrease as more scans were averaged as was seen in the individual subject results. This rate of decrease was closely approximated by a curve proportional to 1 / sqrt(n), where n was the number of scans averaged, equivalent to the relationship observed by Van Dijk et al. 4 We additionally compared the mean correlation of 20 resting scans within 1 subject with that of another 20 resting scans from the same subject (Fig 3B) to show the reproducibility in each of the 2016 pairs of regions used in the analysis. Comparison of 50 scans from 1 subject with 36 scans all from different subjects (Fig 3C) does not show the reproducibility obtained within individual or group results alone, suggesting that individual and group measurements converge to different values.
The scatterplot in Fig 3B shows similar reproducibility throughout the entire set of "connections," even though pairs of regions differ widely in correlation and likely differ in the extent of underlying anatomic connectivity. These results would indicate that functional connectivity measurements do not require direct connections to achieve consistent absolute individual or population correlation between 2 cortical regions, and functional connectivity information may be useful even when the 2 regions considered are neither anatomically related nor within the same resting-state network.
This possibility is further evaluated by considering reproducibility within different subsets of region-of-interest pairs. The regions of interest we selected were generated from a study by using relatively high-model-order independent-component analysis to parcellate the brain. Therefore, we have additional information about which regions of interest were within the same independent component. Moreover, we computed separately, for each pair of regions, whether the correlation was significantly different from zero in the group results and individual resting-state results, by using 2-tailed t tests with Bonferroni corrections for multiple comparisons. The regions of interest we selected included 30 pairs of interhemispheric homologues, which are among the most robust functional connections in the brain. 35 Finally, we used a completely separate parcellation of the brain, the AAL atlas, which did not include functional connectivity information in generation of regional boundaries. 33,34 If functional connectivity measurements are more or less reliable on the basis of the degree of underlying anatomic connectivity or boundaries of resting-state networks, then we would predict that the reliability of the correlation would be higher for more anatomically connected pairs of regions.
Rather, we saw a trend toward the opposite relationship, illustrated in Fig 4. The best reproducibility was seen within the set of all pairs of the 64 regions selected. When we restricted measurements to only those pairs of regions that were significantly connected in the individual or group data or to pairs of regions within the same independent component or to interhemispheric homologues, the mean differences in correlation were higher. Error in reproducibility between 116 regions in the AAL atlas (which were spatially larger than the 5-mm-radius regions defined by coordinates chosen from peak independent-component goodness-of-fit measurements) was slightly higher than that using the ICA method. All subsets of correlation measurements showed improved reproducibility with an increasing number of scans averaged, following a relationship approximating α / sqrt(n), where α is the SD of the correlation measurements within the sample.
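One way to see why a 1 / sqrt(n) dependence would be expected is the following idealized sketch. It assumes, purely for illustration, that each scan's Fisher-transformed value for a given connection is an independent draw with true mean \mu and per-scan SD \sigma; these symbols and the independence assumption are not taken from the study, and the constant of proportionality depends on how the spread is defined.

```latex
% Mean of n independent scans for one connection
\bar{z}_n = \frac{1}{n}\sum_{s=1}^{n} z_s ,
\qquad
\operatorname{Var}\left(\bar{z}_n\right) = \frac{\sigma^{2}}{n}

% Difference between two independent n-scan means
\operatorname{Var}\left(\bar{z}^{(1)}_n - \bar{z}^{(2)}_n\right) = \frac{2\sigma^{2}}{n}
\quad\Longrightarrow\quad
\sqrt{\operatorname{E}\left[\left(\bar{z}^{(1)}_n - \bar{z}^{(2)}_n\right)^{2}\right]}
  = \sigma\sqrt{\frac{2}{n}} \;\propto\; \frac{1}{\sqrt{n}}
```

Under these assumptions, quadrupling the number of scans averaged halves the expected difference between repeated measurements, which is why early gains are large and later gains are progressively smaller.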
The similarity of reproducibility of functional connectivity measurements within a session, across sessions but within an individual, and across individuals is compared directly in Fig 5A. These individual and population means are different, as indicated in Fig 5B, where differences between the individual and group diverge with increased imaging time of the individual. Increased imaging time allows identification of more pairs of regions that can significantly distinguish an individual from the group, even with up to 4 hours of imaging time. Moreover, more such significant connections establishing an individual's functional connectivity fingerprint from the group mean are identified within the entire sample of pair-wise connections than within subsets restricted to pairs of regions with expected higher anatomic connectivity, even with the more stringent multiple comparison correction required in the larger sample of correlation pairs. Thus, information is present distinguishing the individual from the group that is not restricted to pairs of connections with the strongest underlying anatomic or functional connectivity.
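The difference in stringency between the full set of connections and the homolog subset can be made concrete with a small sketch. The function names are hypothetical, the P values are invented placeholders rather than study data, and reading "95 of 100" as "at least 95" is an assumption.

```python
def bonferroni_threshold(alpha, n_connections):
    """Per-connection significance threshold after Bonferroni correction."""
    return alpha / n_connections


def reliably_significant(p_values, threshold, min_hits=95):
    """True when at least min_hits of the resampled P values fall below threshold."""
    return sum(p < threshold for p in p_values) >= min_hits


n_regions = 64
n_all = n_regions * (n_regions - 1) // 2  # unique pairs among 64 regions

thr_all = bonferroni_threshold(0.05, n_all)      # full set of connections
thr_homologues = bonferroni_threshold(0.05, 30)  # 30 interhemispheric pairs

# Placeholder P values for one connection across 100 random samples (assumption).
example_p = [1e-6] * 96 + [0.2] * 4
passes_all = reliably_significant(example_p, thr_all)
```

The threshold for the full set is 67.2 times smaller than for the 30 homolog pairs (2016 / 30), so a connection has to clear a much higher bar to count in the larger sample.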
Functional connectivity values are modulated by the underlying task a subject is performing. This is demonstrated by evaluating the performance of a simple task classifier (Fig 6). The first 30 resting scans and the first 30 cartoon scans were each averaged for each of the 2016 pairs of regions of interest studied to form resting and cartoon "standards." Then, groups of 1, 2, or 3 resting or cartoon scans from the remaining 20 scans were compared with the standards to obtain a mean difference in correlation from each standard. Classification was performed by assessing whether each group of test scans showed a smaller mean difference in correlation from the resting standard or from the cartoon standard. Classification by using 1, 2, and 3 five-minute scans showed 50%, 95%, and 100% classification accuracy, respectively.

Discussion
We show that within a single subject, each correlation measurement between any 2 small regions of interest in the brain converged to an absolute number as a function of 1 over the square root of imaging time. Similarly, within a population of healthy control subjects, any given connection converged to a population mean at a similar rate as a function of imaging time and number of subjects. These individual and population connectivity fingerprints could be discerned reliably beginning at approximately 15 minutes of imaging time, with continued significant improvements in reliability with up to 4 hours of imaging time. Functional connectivity measurements were slightly more reliable during a constrained task (watching video clips) than during a traditional resting-state paradigm.
Our results indicate that though the core architecture of resting-state networks may be discerned with brief imaging times, investigators and clinicians, nevertheless, should consider the advantages of a much longer imaging time if single-subject results are desired rather than population means. Our data, similar to those of Shehzad et al 27 and Van Dijk et al, 4 indicate that only moderate reliability is present for individual correlation measurements from a 5-minute scan. Because we observe small differences in such individual measurements between intrasession, intersession, and intersubject reproducibility, it is very likely that these measurements have intrinsic noise due to technical factors or moment-to-moment changes in brain activity that is much larger than the effects of interest between individuals or cognitive states.
We use the distinction of the functional connectivity profile of a healthy individual from a healthy population as a benchmark for the reliability of individual connectivity measurements. No single connection was found to reliably identify the individual from the group with <25 minutes of scanning time for the individual. At 4 hours of imaging time, almost 100 of the 2016 pair-wise correlation measurements were significantly different between the individual and group. Although identifying individuals with pathologically altered functional connectivity may require less imaging time, the large variability in measurements obtained from brief scans is likely to limit accurate classification unless ensembles of connections are used. Using ensembles of connections in turn may limit the ability to make more subtle distinctions between disease subpopulations, to develop connectivity-based endophenotypes, or to perform in a robust manner despite differences in task performance.
Task modulation of functional connectivity has been observed in several studies. [36][37][38][39] A study specifically evaluating resting-state scans obtained with the subject's eyes open or closed showed quantitative differences in functional connectivity in these 2 states. 40 Given that BOLD fluctuations have been shown to be related to behavioral measures of task performance, [41][42][43] it seems likely that constraints on the task performed during acquisition will affect functional connectivity results and reliability.
Our results show that within a single subject, a classifier shows increasingly robust ability to discriminate differences in functional connectivity attributable to task with increased imaging time. With 15 minutes of BOLD imaging, our results suggest that even the simple classifier we used was able to distinguish a resting state from when the subject was watching cartoons. Although the results distinguishing the individual from the group in our study may be attributable merely to differences in anatomy, such as the percentage of gray matter within a given region of interest, the distinction based on task can only be attributed to actual differences in functional connectivity.
If longer imaging times are used for functional connectivity measurements, a significant problem is likely to be the subject's ability to tolerate longer scanning times while maintaining wakefulness. Given the significant differences in connectivity we and others have observed related to task, with some significant differences found in connectivity with sleep, 44 light sedation, 45 or even simply eyes closed versus eyes open, 40 vigilance related to subject wakefulness is warranted. If a more constrained task such as watching video clips results in improved reliability of functional connectivity measurements, it may be preferable to acquire connectivity measurements during a task. Although differences in task performance between individuals or populations may be a confounding variable in functional connectivity studies, this is not necessarily different from data acquired in the resting state. It is also possible that the resting-state task can be performed in very different ways in groups that relate to cognitive content or other factors. It remains to be determined whether particular tasks (including the conventional resting task) show improved ability to discriminate functional connectivity differences in pathologic or developmental subjects.
Our approach has several limitations. Restricting the extended analysis to a single subject limits generalizability. Nevertheless, although reproducibility may vary from subject to subject, our results showing an extended characterization of 1 subject as well as a group sample indicate that both single-subject and group results consistently show improvement in reliability with 1 / sqrt(n) for imaging time n, identical to that seen by Van Dijk et al. 4 Thus, it is reasonable to anticipate that other subjects would show scaled reliability curves that may differ in quantitative values but would be otherwise similar. It is possible that it may require less imaging time to characterize subjects with pathologic connectivity values in a diagnostic test or classifier. Because our results were all obtained on the same scanner, we do not have data on the effects of different scanners on the reliability of functional connectivity measurements.

Conclusions
In a characterization of reproducibility of functional connectivity measurements within a single subject, we demonstrated that an individual and a population of subjects each converge to different functional connectivity profiles with increasing imaging time. Given only moderate reproducibility of quantitative functional connectivity measurements in brief scans, we suggest that 15-25 minutes or greater of BOLD imaging be performed when possible for studies used in single-subject diagnosis or classification.