Open Access
27 July 2024 Adjusting effective multiplicity (Meff) for family-wise error rate in functional near-infrared spectroscopy data with a small sample size
Yuki Yamamoto, Wakana Kawai, Tatsuya Hayashi, Minako Uga, Yasushi Kyutoku, Ippeita Dan
Abstract

Significance

The advancement of multichannel functional near-infrared spectroscopy (fNIRS) has enabled measurements across a wide range of brain regions. This increase in multiplicity necessitates the control of family-wise errors in statistical hypothesis testing. To address this issue, the effective multiplicity (Meff) method designed for channel-wise analysis, which considers the correlation between fNIRS channels, was developed. However, this method loses reliability when the sample size is smaller than the number of channels, leading to a rank deficiency in the eigenvalues of the correlation matrix and hindering the accuracy of Meff calculations.

Aim

We aimed to reevaluate the effectiveness of the Meff method for fNIRS data with a small sample size.

Approach

In experiment 1, we used resampling simulations to explore the relationship between sample size and Meff values. Based on these results, experiment 2 employed a typical exponential model to investigate whether valid Meff could be predicted from a small sample size.

Results

Experiment 1 revealed that the Meff values were underestimated when the sample size was smaller than the number of channels. However, an exponential pattern was observed. Subsequently, in experiment 2, we found that valid Meff values can be derived from sample sizes of 30 to 40 in datasets with 44 and 52 channels using a typical exponential model.

Conclusions

The findings from these two experiments indicate the potential for the effective application of Meff correction in fNIRS studies with sample sizes smaller than the number of channels.

1.

Introduction

1.1.

fNIRS

Functional near-infrared spectroscopy (fNIRS) is a noninvasive and convenient neuroimaging tool that has gained popularity over the last few decades.1–4 It measures cerebral hemoglobin concentration changes following neuronal activation by shining NIR light (650 to 950 nm) onto the head and detecting the reflected light that propagates through biological tissue. Jöbsis5 reported the first noninvasive measurement of living tissue in humans using this technique. Several research groups reported that fNIRS was effective in capturing cerebral hemodynamic responses associated with brain activity.6–9 Since then, fNIRS has evolved as an established tool in functional neuroimaging and has been applied in various studies beyond the realm of conventional neuroimaging including early developing brains,10 universal everyday brain activities,11 interpersonal social interactions,12 and others.

fNIRS is compact, highly portable, and tolerant to body motion, allowing measurement of brain activity in everyday environments without disturbing the subject’s natural behavior, especially because of its wearability.13 Such flexibility makes fNIRS suitable for a broad range of experimental designs and age groups, extending its use beyond basic research in cognitive neuroscience to diverse fields.

1.2.

Multiple Comparison Problem Due to Increased Number of Channels

Advancements in fNIRS technology have led to a significant expansion in the number of channels and measurement points, necessitating multiple comparison corrections to effectively control type I errors. Originally, fNIRS measurement began with single-channel systems, comprising just one transmitter and one receiver. Later, due to light interference between adjacent channels, wide spacing on the scalp was necessary, limiting the number of channels to only a few. To overcome this limitation, Maki et al.14 introduced multichannel fNIRS that included a frequency encoding system, enabling simultaneous monitoring of multiple brain regions. Since then, with the progression of multichannelization, the number of channels used for standard fNIRS studies has gradually increased to several dozen. Also, whole-head measurements using more than 100 channels have been implemented.15 Furthermore, diffuse optical tomography (DOT), an advanced form of fNIRS, estimates the signal source by integrating both short- and long-distance channels and makes it possible to reconstruct a three-dimensional (3D) image of the functional hemodynamic response.16–20 With this ability to reconstruct continuous image data, a significantly large number of channels compared to the number of measurement points can be handled.

Advancements in the multichannelization of fNIRS have enabled measurements of a wide range of brain regions. However, also due to this advancement, addressing the issue of multiple comparisons in statistical hypothesis testing in fNIRS analysis has become necessary. In standard fNIRS analysis, statistical hypothesis tests, such as the t-test and the analysis of variance (ANOVA), are conducted based on summary statistics obtained from first-level analysis to determine whether the activation level in a particular cognitive state is significant. With multichannel fNIRS, the multiplicity is equal to the number of channels, as a null hypothesis is set for each channel. Thus, as the number of channels increases, so does the risk of type I errors (false positives), in which at least one correct null hypothesis among all hypotheses is rejected. In other words, there is a risk of erroneously treating one or more nonactivated channels as activated channels. Therefore, the risk of type I errors must be controlled as family-wise errors (FWEs) in multichannel fNIRS analyses.

1.3.

Multiple Comparison Problem in fNIRS Analysis

fNIRS data are often represented as channel-wise data, where the multiplicity is equal to the number of channels. When conducting multiple comparisons across multiple channels, family-wise error rate (FWER) can be calculated using the significance level (α) and the number of channels (M), as shown in the following equation:

Eq. (1)

$$\alpha_{\mathrm{FWE}} = 1 - (1 - \alpha)^{M}.$$

Here, assuming α=0.05 and M=52 (a typical setting for 52 channels in a single-factor fNIRS analysis), the risk of false positives increases, resulting in αFWE≈0.93.

The most typical control for FWE is the Bonferroni correction, which adjusts the significance level α to αBonf by dividing it by M to achieve αFWE≤0.05:

Eq. (2)

$$\alpha_{\mathrm{Bonf}} = \frac{\alpha}{M}.$$

In the example above, this results in αBonf≈0.00096 (M=52, αFWE≈0.0499), effectively suppressing type I errors. However, as M increases, the Bonferroni method can be too stringent, thereby increasing the risk of type II errors or false negatives.
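
The arithmetic behind Eqs. (1) and (2) can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `fwer` and `bonferroni_alpha` are hypothetical, and Eq. (1) assumes the M tests are independent. Under that assumption the uncorrected rate for 52 channels is about 0.93, and the rate after Bonferroni correction falls slightly below 0.05.

```python
def fwer(alpha: float, m: int) -> float:
    """Family-wise error rate for m independent tests, as in Eq. (1)."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test significance level after Bonferroni correction, as in Eq. (2)."""
    return alpha / m


alpha, m = 0.05, 52                      # values used in the text
alpha_bonf = bonferroni_alpha(alpha, m)  # corrected per-channel threshold

print(f"uncorrected FWER:        {fwer(alpha, m):.4f}")
print(f"Bonferroni alpha:        {alpha_bonf:.5f}")
print(f"FWER after correction:   {fwer(alpha_bonf, m):.4f}")
print(f"upper bound M * alpha_B: {m * alpha_bonf:.4f}")
```

The last line prints the simple upper bound M × αBonf, which never exceeds the nominal α regardless of how the channels are correlated.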

One alternative to the conservative Bonferroni correction is Holm’s correction, which utilizes a step-down approach to increase statistical power.21,22 Another alternative is the false discovery rate (FDR) method that targets the proportion of false positives among all significant findings.23 The FDR-based procedure can yield more statistical power than the Bonferroni method and be more robust against variations in the number of channels within regions of interest (ROIs).24
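
To make the difference between these procedures concrete, the following is a minimal sketch of Holm’s step-down rule and of the Benjamini-Hochberg step-up rule, which is one common FDR procedure and is assumed here for illustration. The function names and the example p-values are hypothetical and are not taken from the cited studies.

```python
import numpy as np


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare sorted p-values with alpha/m, alpha/(m-1), ..."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha / (m - np.arange(m))
    passed = p[order] <= thresholds
    # Stop at the first sorted p-value that fails its threshold.
    k = m if passed.all() else int(np.argmin(passed))
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject


def bh_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: find the largest rank i with p_(i) <= alpha*i/m."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = int(np.nonzero(passed)[0].max()) + 1
        reject[order[:k]] = True
    return reject


p_demo = [0.001, 0.011, 0.02, 0.035, 0.5]  # hypothetical channel-wise p-values
print("Bonferroni:", int(np.sum(np.asarray(p_demo) <= 0.05 / len(p_demo))))
print("Holm:      ", int(holm_reject(p_demo).sum()))
print("BH (FDR):  ", int(bh_reject(p_demo).sum()))
```

For these example values the three procedures reject progressively more hypotheses, reflecting their increasing statistical power.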

Although these methods yield greater statistical power compared to the Bonferroni method, they all begin with the same multiplicity, which is equal to the number of channels, in order to control type I errors for the most active channels.25 In other words, Bonferroni correction, Holm’s correction, and FDR methods require at least one test to exhibit a probability of significance lower than α/M. It is crucial to recognize that in multichannel fNIRS data, channels are not completely independent due to the correlations between them. Treating each channel as independent can lead to an overestimation of FWEs. Therefore, applying these methods without consideration of channel correlation might result in overcorrection.

1.4.

Effective Multiplicity (Meff)

Uga et al.25 demonstrated that effective multiplicity (Meff) can be an effective approach to fNIRS analysis, accounting for correlations between channels. The Meff correction method was originally developed by Cheverud26 for multiple-testing corrections in genetic studies. In the Meff approach, eigenvalue decomposition is applied to a correlation matrix derived from a dataset with inherent correlations. This process yields eigenvalues that reflect the magnitude of correlations between each data point. These eigenvalues are used to estimate Meff, which represents the number of independent tests. Consequently, α is corrected by Meff instead of M:

Eq. (3)

$$\alpha_{M_{\mathrm{eff}}} = \frac{\alpha}{M_{\mathrm{eff}}}.$$

Here we will describe the theoretical framework of the Meff method for a typical fNIRS data structure. It is crucial to recognize that the Meff method, which was originally invented for genetic data, has been modified to fit the fNIRS data structure. For multichannel fNIRS data obtained from M channels across N subjects, summary data for group analysis is represented as βM×N. Then from the βM×N correlation matrix (M×M), the eigenvalue vector (λi) is derived as follows:

Eq. (4)

$$\lambda_1, \lambda_2, \ldots, \lambda_M.$$

Previous studies have shown that the total correlation among a dataset can be quantified by the variance of the eigenvalues (Vλ) derived from a correlation matrix. Utilizing this property, Cheverud26 proposed estimating Meff as

Eq. (5)

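$$M_{\mathrm{eff}} = 1 + (M - 1)\left(1 - \frac{V_\lambda}{M}\right), \qquad V_\lambda = \frac{\sum_{i=1}^{M} (\lambda_i - 1)^2}{M - 1}.$$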

This equation accounts for two extreme situations. When tests are completely independent, each eigenvalue equals 1, resulting in the equation Meff=M. Conversely, when tests are completely identical, the primary eigenvalue is M, and all subsequent eigenvalues are 0, leading to Meff=1. Although Cheverud’s26 equation accurately estimates Meff at these two extremes, it tends to overestimate Meff in intermediate situations, leading to excessively conservative results.
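
As a quick check of these two limiting cases, one can substitute the corresponding eigenvalues into Cheverud’s expression, taking Vλ as the variance of the eigenvalues about their mean of 1 (the eigenvalues of an M×M correlation matrix sum to M):

```latex
\begin{align*}
\text{Independent tests } (\lambda_i = 1 \text{ for all } i):\quad
  & V_\lambda = \frac{\sum_{i=1}^{M} (1 - 1)^2}{M - 1} = 0,
  && M_{\mathrm{eff}} = 1 + (M - 1)\left(1 - \frac{0}{M}\right) = M, \\
\text{Identical tests } (\lambda_1 = M,\ \lambda_{i>1} = 0):\quad
  & V_\lambda = \frac{(M - 1)^2 + (M - 1)(0 - 1)^2}{M - 1} = M,
  && M_{\mathrm{eff}} = 1 + (M - 1)\left(1 - \frac{M}{M}\right) = 1.
\end{align*}
```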

Following a modification by Li and Ji,27 however, Galwey28 proposed a generalized function that overcomes this objection:

Eq. (6)

$$M_{\mathrm{eff}} = \frac{\left(\sum_{i=1}^{M} \sqrt{\lambda_i}\right)^{2}}{\sum_{i=1}^{M} \lambda_i}.$$

Uga et al.25 adopted Galwey’s function because this method was optimized for multiple signals with strong correlations and could be applied continuously. Applying the Meff correction to three kinds of experimental data with different activation patterns and performing resampling simulations resulted in the Meff values being 10 to 15 out of 44 channels of data.25 This indicates that the Meff approach provides an effective correction for multichannel fNIRS data.
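
The Meff calculation described in this section can be sketched in Python as follows. This is an illustrative sketch rather than the authors’ code: the function names are hypothetical, and the data are synthetic (independent noise plus a shared component that induces interchannel correlation) rather than fNIRS measurements.

```python
import numpy as np


def correlation_eigenvalues(beta: np.ndarray) -> np.ndarray:
    """Eigenvalues of the M x M channel correlation matrix.

    beta has shape (M channels, N subjects), as in the text.
    """
    corr = np.corrcoef(beta)               # rows are treated as variables
    eigvals = np.linalg.eigvalsh(corr)     # symmetric matrix, real eigenvalues
    return np.clip(eigvals, 0.0, None)     # remove tiny negative round-off


def meff_cheverud(eigvals: np.ndarray) -> float:
    """Cheverud's estimate based on the variance of the eigenvalues."""
    m = eigvals.size
    v_lambda = np.sum((eigvals - 1.0) ** 2) / (m - 1)
    return float(1.0 + (m - 1) * (1.0 - v_lambda / m))


def meff_galwey(eigvals: np.ndarray) -> float:
    """Galwey's estimate: (sum of sqrt(lambda))^2 / sum of lambda."""
    return float(np.sum(np.sqrt(eigvals)) ** 2 / np.sum(eigvals))


# Synthetic example: 44 channels, 200 subjects, moderate shared signal.
rng = np.random.default_rng(0)
n_channels, n_subjects = 44, 200
shared = rng.standard_normal(n_subjects)
beta = 0.6 * shared + rng.standard_normal((n_channels, n_subjects))

eigvals = correlation_eigenvalues(beta)
meff = meff_galwey(eigvals)
print(f"Meff (Galwey):   {meff:.2f} of {n_channels} channels")
print(f"Meff (Cheverud): {meff_cheverud(eigvals):.2f}")
print(f"corrected alpha: {0.05 / meff:.5f}")
```

Both estimators return M for uncorrelated tests and 1 for perfectly correlated tests, so the corrected threshold α/Meff always lies between the Bonferroni threshold and the uncorrected α.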

1.5.

Rank Deficiency Problem in Meff

Although the Meff approach is beneficial for channel-wise fNIRS statistical analysis, there is a nonnegligible concern about the impact of sample size (N) on its reliability. Typically, fNIRS studies involve relatively small N. This often results in situations where N is smaller than M. Specifically, when N is smaller than the number of variables (i.e., N<M), the correlation matrix can have a maximum of N−1 nonzero eigenvalues, with the remaining eigenvalues being zero. Therefore, in the context of fNIRS statistical analysis, applying the Meff method to data where N is smaller than M can lead to an underestimation of Meff due to the rank deficiency of the correlation matrix. When α is corrected for an underestimated Meff, the correction becomes less stringent and the risk of type I errors increases. This issue emphasizes the need for a reevaluation of the application of Meff correction to multichannel fNIRS analysis.
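
The rank deficiency can be illustrated numerically. The sketch below counts the nonzero eigenvalues of the channel correlation matrix for synthetic Gaussian data; the function name, the tolerance, and the data are assumptions made for illustration only.

```python
import numpy as np


def nonzero_eigenvalue_count(n_subjects, n_channels, rng, tol=1e-10):
    """Count eigenvalues of the channel correlation matrix that exceed tol."""
    beta = rng.standard_normal((n_channels, n_subjects))  # synthetic summary data
    eigvals = np.linalg.eigvalsh(np.corrcoef(beta))
    return int(np.sum(eigvals > tol))


rng = np.random.default_rng(0)
M = 44
for n in (5, 20, 40, M + 1, 100):
    count = nonzero_eigenvalue_count(n, M, rng)
    print(f"N={n:3d}: {count} nonzero eigenvalues (upper limit {min(M, n - 1)})")
```

Centering each channel removes one degree of freedom, which is why at most N−1 eigenvalues can be nonzero and why N=M+1 is the smallest sample size at which all M eigenvalues can be obtained.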

1.6.

Objective

Inspired by the challenges of rank deficiency, our study focused on evaluating the effectiveness of the Meff method in multichannel fNIRS data with a small N. We prepared four different sets of experimental data, each with different activation profiles, and conducted a two-step verification process comprising experiment 1 and experiment 2. For all datasets, N was greater than M, which enabled us to evaluate the validity of Meff for a wide range of N/M ratios. In experiment 1, we explored the relationship between N and Meff through random resampling simulations. In experiment 2, we applied a model to this relationship and performed simulations to examine the feasibility of estimating valid Meff from a small N. Based on these results, we discuss whether the Meff approach can be applied for FWE correction in fNIRS data with a small N.

2.

Experiment 1

2.1.

Methods

2.1.1.

Experimental data

In this study, we used four sets of experimental data with different activation profiles and N exceeding M (Table 1). Each dataset was obtained from our previous fNIRS experiments and they collectively provide data for a variety of participants performing a variety of cognitive tasks. They were also used in experiment 2. Below, we will describe the respective experimental procedures and participants. All participants, or their guardians in cases where participants were minors, provided informed consent, and each experiment was approved by the ethics committees of Chuo University (all tasks) and/or Jichi Medical University Hospital and the International University of Health and Welfare (placebo) and complied with the latest version of the Declaration of Helsinki.

Table 1

Summary of experimental data.

Experimental data | N | M | Summary data | Participant profile
Go/No-go task | 66 | 44 | Average values of oxy-Hb signal | Typically developing children
Verification of the placebo effect | 116 | 44 | Average values of oxy-Hb signal | Children with ADHD
Word translation task | 88 | 52 | β-values | Healthy adults
Stroop task | 59 | 52 | β-values | Healthy adults

2.1.2.

Go/No-go task

The participant sample for the Go/No-go task was 66 right-handed, typically developing children (38 boys and 28 girls, average age=8.3±1.8, age range 6 to 14 years). Inhibition-related cortical activation was measured during a Go/No-go task. In this study, fNIRS measurements were conducted using 44 channels. The experimental design, preprocessing, and calculation of summary data were consistent with previous studies.29–31 The procedure consisted of 6 block sets, containing alternating go (baseline) and Go/No-go (target) blocks, each block lasting 24 s. In the go block, participants were presented with a random sequence of two pictures and were asked to press a button for both pictures. In the Go/No-go block, participants were presented with a no-go picture 50% of the time, requiring them to respond to half of the trials (go trials) and inhibit their response to the other half (no-go trials). From the preprocessed time series data, channel-wise and participant-wise contrasts were computed as the summary data by calculating the intertrial mean of differences between the oxygenated hemoglobin (oxy-Hb) signals for target periods (4 to 24 s after the Go/No-go block onset) and baseline periods (14 to 24 s after the go-block onset).

2.1.3.

Verification of placebo effect

The participants in the verification of the placebo effect were 116 right-handed children with attention deficit hyperactivity disorder (ADHD) (92 boys and 24 girls, average age=8.1±1.9, age range 6 to 14 years). Data were extracted from published studies,29,30,32,33 and a detailed description will be published elsewhere (in preparation). Data obtained from randomized, double-blind, crossover, placebo-controlled design trials using methylphenidate (MPH) or atomoxetine (ATX) were analyzed. Participants were examined twice, with an interval of at least 4 days but within 30 days. On each examination day, participants completed two sessions: one before medication (active drug or placebo) administration and the other at 90 min after medication. Those who were administered an active drug on the first day were administered a placebo on the second day, whereas those who were administered a placebo on the first day were administered an active drug on the second day. Placebo effects were assessed by examining brain activation during the Go/No-go task. In this study, fNIRS measurements were conducted using 44 channels. The experimental design, preprocessing, and calculation of summary data were consistent with those described above. To assess the placebo effect, the intraplacebo contrast, which is the difference between post- and preadministration contrasts for placebo participants, was calculated.

2.1.4.

Word translation task

The participants who performed the word translation task were 88 healthy, right-handed Japanese young adults (15 participants were excluded; 42 males and 46 females, average age=20.0±1.4, age range 18 to 23 years). In this study (submitted), fNIRS measurements were conducted using 52 channels. The experimental design, preprocessing, and calculation of summary data were consistent with a previous study.34 The stimuli were divided into nontranslation baseline blocks and task blocks. There were four task conditions in the task blocks: translation direction (English-into-Japanese/Japanese-into-English) × familiarity (high/low familiarity). In Japanese-into-English task blocks, participants were asked to translate Japanese words written in red into the corresponding English words and to type them. In the English-into-Japanese task blocks, they were asked to translate English words written in red in the Roman alphabet into corresponding Japanese words and to type their translation in the Roman alphabet. Individual timeline data for the oxy-Hb signal of each channel were preprocessed. General linear model (GLM) analysis35 was conducted, and β-values, indicating the degree of activation, for each individual on each channel were used as summary data. For the present experiment, we chose to use data from the “English-into-Japanese/High Familiarity” condition for our analysis because the activation patterns were most similar between groups.

2.1.5.

Stroop task

Participants for the Stroop task were 59 healthy, right-handed, Japanese young adults (2 participants were excluded; 30 males and 29 females, average age=21.8±0.96, age range 20 to 24 years). In this study (in preparation), fNIRS measurements were conducted using 52 channels. Participants were presented with a word stimulus indicating a color that was printed in the same or in a different color. The task had three conditions: congruent, incongruent, and neutral. In the congruent condition, the ink color was consistent with the meaning of the word (e.g., “red” written in red). In the incongruent condition, the ink color was not consistent with the meaning of the word (e.g., “green” written in red). In the neutral condition, participants were only required to name the ink color (e.g., “XXX” written in red) without judging the meaning of a word. There are two types of Stroop tasks in neuroimaging experiments: identifying a color name (task I) and judging the correspondence of an ink color and the meaning of a word (task II). Participants were divided into two groups: one group first engaged in task I and then in task II, whereas the other group started with task II, followed by task I. For task II, a different brain activity was observed between the two groups, suggesting the occurrence of a sequential effect. Consequently, only task I, where no order effect was observed, was utilized for analysis in this study. For the first-level analysis, the individual timeline data for the oxy-Hb signal were analyzed. Channels with a signal variation of 10% or less due to defective measurements were excluded from the analysis. After the exclusion, wavelet minimum description length (Wavelet-MDL) was applied to remove the effect of measurement noise, such as breathing and cardiac movement, from the remaining channels.36 GLM analysis35 was conducted and the β-values for each individual on each channel were used as summary data. Specifically, the contrast between the incongruent and the neutral conditions was calculated as Stroop interference, where a larger contrast indicates greater cognitive interference.

2.1.6.

Resampling simulation

We reanalyzed the four kinds of experimental data described above. The β-values of a GLM or average values of oxy-Hb signals were used as summary data. To elucidate the relationship between N and Meff, we randomly resampled N from the minimum to the maximum and calculated the Meff for each dataset. For each N, resampling was performed 1000 times, and the average value and standard deviation (SD) were calculated. To calculate Meff, we utilized Galwey’s function, as was done in an earlier study.25 The M was fixed based on the actual number used for measurements (44 or 52 channels). We set the minimum N for simulations at 3, due to the requirement of having at least three data points to compute SD, which is essential in calculating the correlation coefficient. These simulations were conducted using MATLAB R2023a (MathWorks, Inc., Natick, Massachusetts, United States).
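
The resampling procedure can be sketched as follows. The authors ran their simulations in MATLAB; this Python version is only an illustration under stated assumptions: subjects are drawn without replacement (the text does not specify the sampling scheme), the function names are hypothetical, and a small synthetic dataset with fewer iterations is used so that the example runs quickly.

```python
import numpy as np


def meff_galwey(beta: np.ndarray) -> float:
    """Galwey's Meff from summary data of shape (M channels, n subjects)."""
    eigvals = np.clip(np.linalg.eigvalsh(np.corrcoef(beta)), 0.0, None)
    return float(np.sum(np.sqrt(eigvals)) ** 2 / np.sum(eigvals))


def resample_meff(beta, n_min=3, n_iter=1000, seed=0):
    """Mean and SD of Meff for every subsample size from n_min to N."""
    rng = np.random.default_rng(seed)
    n_total = beta.shape[1]
    sizes = np.arange(n_min, n_total + 1)
    mean = np.empty(sizes.size)
    sd = np.empty(sizes.size)
    for k, n in enumerate(sizes):
        values = np.empty(n_iter)
        for it in range(n_iter):
            # Assumption: subjects are sampled without replacement.
            cols = rng.choice(n_total, size=n, replace=False)
            values[it] = meff_galwey(beta[:, cols])
        mean[k] = values.mean()
        sd[k] = values.std(ddof=1)
    return sizes, mean, sd


# Small synthetic dataset (10 channels, 30 subjects) to keep the demo fast.
rng = np.random.default_rng(42)
beta_demo = 0.6 * rng.standard_normal(30) + rng.standard_normal((10, 30))
sizes, mean, sd = resample_meff(beta_demo, n_iter=50)
for n in (3, 10, 11, 30):
    i = n - sizes[0]
    print(f"N={n:2d}: Meff = {mean[i]:.2f} (SD {sd[i]:.2f})")
```

Under this sampling scheme the SD shrinks to zero at the full sample size because every draw then contains the same subjects.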

2.2.

Results

We plotted the average Meff values along with the SD for each N for each dataset (Fig. 1). For each dataset, Meff values displayed a monotonic exponential increase when N was smaller than M. On the other hand, as N surpassed M, the rate of increase gradually decreased and finally converged to a constant value. Specifically, in the datasets derived from the Go/No-go task and the verification of the placebo effect (M=44), the Meff values exhibited a monotonic increase up to approximately N=44. Beyond this point, the rate of increase began to slow down. For the Stroop and word translation tasks (M=52), similar results were observed at around N=52.

Fig. 1

Average Meff values for the total number of N for each type of experimental data: (a) Go/No-go task (M=44), (b) verification of the placebo effect (M=44), (c) word translation task (M=52), and (d) Stroop task (M=52). Error bars indicate SD.


2.3.

Discussion

Random resampling simulations in this study revealed a consistent pattern of Meff values. We observed that when N is smaller than M, Meff values tend to increase exponentially and monotonically. However, once N exceeds a certain threshold, these increases hit a ceiling, and Meff values begin to converge to a constant value, showing only slight fluctuations even as N continues to increase. Thus when N is smaller than M, Meff is likely to be underestimated due to rank deficiency in calculating eigenvalues, potentially leading to less stringent FWE corrections. On the other hand, when N surpasses M, valid Meff can be obtained as all eigenvalues are included in the calculation.

This pattern, characterized by an initial sharp increase in the dependent variable followed by a convergence at a particular level, corresponds to the behavior of a typical exponential growth model. Based on such a model, it may be feasible to estimate Meff values even when N is smaller than M. In experiment 2, we aimed to estimate valid Meff values from a small N by modeling this observed relationship.

3.

Experiment 2

3.1.

Methods

3.1.1.

Making predictions using a typical exponential model

To predict a valid Meff from a small N, the plots obtained in experiment 1 were modeled. In this experiment, we assumed the following typical exponential model to describe the relationship between N and Meff:

Eq. (7)

$$y = -a e^{-bx} + c.$$

This model demonstrates that as x increases, y grows exponentially, and eventually hits a ceiling. In this model, each parameter, a, b, and c, has specific roles: a controls the magnitude of growth, b determines the growth rate, and c sets the upper limit that y approaches as x increases.

Our objective was to estimate a valid Meff by fitting this model to the results of experiment 1 and identifying the parameters a, b, and c. The Meff values at N=M+1, where all eigenvalues were obtained and the rate of increase in Meff values began to decrease in experiment 1, were considered practical upper limits and set as the target values for prediction.

3.1.2.

Assessing model validity

The exponential model proposed above to explain the relationship between N and Meff was evaluated for its validity. The model was fitted to the results of experiment 1 (3≤N≤M+1) using the nonlinear least squares method. We assessed the model’s goodness of fit through the root-mean-squared error (RMSE). This index indicates the extent to which predicted values from the model deviate from the actual observed values. A lower RMSE, closer to 0, signifies a more accurate model.

In addition, when employing an exponential model, a logarithmic transformation can be applied to facilitate the handling of nonlinearity within a linear model framework. If a phenomenon follows the proposed model, its relationship can be transformed into a linear form through the following logarithmic transformations:

Eq. (8)

$$y = -a e^{-bx} + c \;\Leftrightarrow\; c - y = a e^{-bx} \;\Leftrightarrow\; \ln(c - y) = -bx + \ln a,$$
where a, b, and c stand for constants, and y and x correspond to Meff and N, respectively. If a plot of ln(c−Meff) against N demonstrates a linear relationship, it indicates that N and Meff conform to the exponential model. Based on this relationship, we examined ln(c−Meff) against N, using the parameter c derived from fitting the exponential model to the results of experiment 1. The linearity of this relationship was evaluated by fitting a linear model, y=px+q, where p and q stand for constants, and y and x correspond to ln(c−Meff) and N, respectively. Model fitting was conducted using the “curve_fit” function from Python’s “scipy.optimize” module and the “polyfit” function from the “NumPy” library. Furthermore, the RMSE was calculated using the “mean_squared_error” function from the “sklearn.metrics” module.
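
The two fitting steps can be sketched with the same SciPy and NumPy functions. The sketch assumes that the saturating form of the model is y = c − a·exp(−b·x) with a, b > 0, and it uses a synthetic noisy curve in place of the resampled Meff values; the parameter values, the initial guess `p0`, and the use of NumPy (rather than scikit-learn) for the RMSE are choices made only for this illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def saturating_exp(x, a, b, c):
    """Assumed model form: rises from c - a toward the ceiling c."""
    return -a * np.exp(-b * x) + c


# Synthetic stand-in for the mean Meff curve of experiment 1 (M = 44).
rng = np.random.default_rng(0)
n = np.arange(3, 46, dtype=float)                      # 3 <= N <= M + 1
meff = saturating_exp(n, 25.0, 0.08, 24.0) + rng.normal(0.0, 0.02, n.size)

# Step 1: nonlinear least squares fit of the exponential model.
p0 = (meff.max(), 0.1, meff.max() + 1.0)               # rough initial guess
(a, b, c), _ = curve_fit(saturating_exp, n, meff, p0=p0, maxfev=10000)
rmse_exp = np.sqrt(np.mean((meff - saturating_exp(n, a, b, c)) ** 2))

# Step 2: linearity check on ln(c - Meff) against N using the fitted c.
mask = (c - meff) > 0                                  # the log needs positive values
log_gap = np.log(c - meff[mask])
p, q = np.polyfit(n[mask], log_gap, 1)
rmse_lin = np.sqrt(np.mean((log_gap - (p * n[mask] + q)) ** 2))

print(f"a={a:.2f}, b={b:.3f}, c={c:.2f}, RMSE={rmse_exp:.3f}")
print(f"slope p={p:.3f} (compare with -b), intercept q={q:.2f} (compare with ln a)")
print(f"RMSE of the linear fit={rmse_lin:.3f}")
```

If the data follow the assumed model, the slope of the linear fit should be close to −b and the intercept close to ln a.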

3.1.3.

Random resampling and predictive simulations

Using simulations, we tested whether the valid Meff could be predicted from fNIRS datasets when N is smaller than M (Fig. 2). For each dataset, we performed random resampling from N=3 and up (3≤N≤M+1), and the Meff values for each N were calculated. Subsequently, an exponential model was fitted to the obtained Meff values to predict the Meff values at N=M+1. This process was replicated 1000 times for each N (3≤N≤M+1). The average and SD of the predicted Meff values for each N were plotted for comparison with target values. In addition, we computed the average and SD of the difference between the target and predicted values to represent prediction errors. The ratio of prediction errors to the target value was examined. As in experiment 1, MATLAB R2023a (MathWorks, Inc., Natick, Massachusetts, United States) was used for the simulation.
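
The prediction step can be illustrated with a simplified sketch: the model is fitted only to the part of the curve available up to a small N and then extrapolated to N=M+1. As before, this is not the authors’ MATLAB code. The model form y = c − a·exp(−b·x), the synthetic curve that stands in for the resampled Meff values, and the helper name `predict_meff` are assumptions, and a single curve is used instead of 1000 replications.

```python
import numpy as np
from scipy.optimize import curve_fit


def saturating_exp(x, a, b, c):
    """Assumed model form: rises from c - a toward the ceiling c."""
    return -a * np.exp(-b * x) + c


def predict_meff(n_values, meff_values, n_target):
    """Fit the model to the available points and extrapolate to n_target."""
    p0 = (meff_values.max(), 0.1, meff_values.max() + 1.0)
    params, _ = curve_fit(saturating_exp, n_values, meff_values,
                          p0=p0, maxfev=10000)
    return float(saturating_exp(n_target, *params))


M = 44
rng = np.random.default_rng(1)
n_all = np.arange(3, M + 2, dtype=float)               # 3 <= N <= M + 1
curve = saturating_exp(n_all, 25.0, 0.08, 24.0) + rng.normal(0.0, 0.02, n_all.size)
target = curve[-1]                                     # Meff at N = M + 1

errors = {}
for n_small in (25, 32, 40):
    keep = n_all <= n_small                            # only data up to n_small
    predicted = predict_meff(n_all[keep], curve[keep], M + 1)
    errors[n_small] = 100.0 * abs(predicted - target) / target
    print(f"N={n_small} (N/M={n_small / M:.2f}): predicted {predicted:.2f}, "
          f"target {target:.2f}, error {errors[n_small]:.2f}%")
```

With real data, the fitted points would themselves come from resampling the available N subjects, so the prediction error also reflects sampling variability that this idealized curve does not capture.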

Fig. 2

Resampling and predictive simulation.


3.2.

Results

3.2.1.

Assessing model validity

For each dataset, the exponential model was fitted to the graph from experiment 1 in the range 3≤N≤M+1, and the goodness of fit was calculated (Fig. 3). Within each dataset, the RMSE was found to be notably small: <0.1. Subsequently, utilizing the parameter c obtained from this fitting, the relationship between N and ln(c−Meff) was plotted, followed by the fitting of a linear model to this graph (Fig. 4). The RMSE for this linear model was also found to be low: <0.01.

Fig. 3

Exponential model fitting for Meff for each type of experimental data: (a) Go/No-go task (M=44), (b) verification of the placebo effect (M=44), (c) word translation task (M=52), and (d) Stroop task (M=52). The blue dots represent the Meff for each N (3≤N≤M+1). The red line indicates the curve resulting from the regression of the exponential model on Meff for each N.


Fig. 4

Linear model fitting for ln(c−Meff) for each type of experimental data: (a) Go/No-go task (M=44), (b) verification of the placebo effect (M=44), (c) word translation task (M=52), and (d) Stroop task (M=52). The blue dots show the values of ln(c−Meff) for each N (3≤N≤M+1). The red line represents the straight line obtained by regressing a linear model on ln(c−Meff) for each N.


3.2.2.

Resampling and predictive modeling simulations

We plotted the average and SD of the predicted Meff values for each N used in the prediction, along with the Meff values at N=M+1 in experiment 1 (Fig. 5). The average of the difference between predicted and target values was also calculated. The percentage of prediction error against the target values is indicated for each N/M (Fig. 6). As N increased, the average of the predicted values tended to converge toward the target value, and the SD decreased (Fig. 5). For the Go/No-go task data, the percentage decreased to less than 5% at N/M=0.57 (N=25). Similarly, for the verification of the placebo effect data, this percentage decreased to less than 5% at N/M=0.64 (N=28). For the word translation task data, this percentage decreased to less than 5% at N/M=0.62 (N=32). For the Stroop task data, this percentage decreased to less than 5% at N/M=0.69 (N=36).

Fig. 5

Comparison of predicted values and target values for each type of experimental data: (a) Go/No-go task (M=44), (b) verification of the placebo effect (M=44), (c) word translation task (M=52), and (d) Stroop task (M=52). The blue lines represent average predicted Meff values for each N/M. Error bars indicate SD. The orange lines indicate the true values of Meff at N=M+1.


Fig. 6

Average and SD of percentage of the prediction error against target values for each type of experimental data: (a) Go/No-go task (M=44), (b) verification of the placebo effect (M=44), (c) word translation task (M=52), and (d) Stroop task (M=52). Error bars indicate SD.


3.3.

Discussion

The relationship between N and Meff identified in experiment 1 was approximated with a typical exponential model. In this model, the Meff at N=M+1 was treated as the upper limit of the increasing N, where all eigenvalues were calculated and valid Meff were obtained. The small RMSEs indicate high goodness of fit, and the relationship between N and Meff is well explained by the exponential model. Simulation results using 44 or 52 channel datasets revealed that the average of predicted values converged to the target value of Meff when N ranged from 30 to 40, accompanied by a corresponding decrease in SD. This implies that for multichannel fNIRS data, a 60% to 70% N to M ratio is sufficient to correct for α using reasonable Meff values.

In this experiment, numerous predicted values were generated by repeatedly resampling from the datasets with a large N and then conducting curve fitting based on the plotted relationship between N and Meff. In actual analysis, predicted values of Meff can be derived by fitting the model to curves from a resampling simulation similar to those in experiment 1.

4.

General Discussion

4.1.

Overview

In this study, we investigated the effectiveness of the Meff method for fNIRS data with small N. Experiment 1 exploited resampling simulations using several sets of experimental data with different neural activation profiles to examine the relationship between N and Meff. We found that the Meff values monotonically increase when N is smaller than M. Conversely, Meff values tend to converge when N exceeds M. In datasets with a small N, the impact of rank deficiency in eigenvalues can lead to calculations that underestimate Meff values, posing a risk of insufficient correction. However, in datasets where N exceeds M, all eigenvalues are obtained, allowing for appropriate correction. Experiment 2 attempted to estimate valid Meff by assuming a typical exponential model based on the relationships revealed in experiment 1. The application of the model to the graph produced in experiment 1 resulted in RMSEs of <0.01. The small RMSEs indicated that the model successfully explained the relationship between N and Meff. The simulation involving resampling and prediction showed that even when N is smaller than M, the predicted values are distributed near the target values. Using respective datasets with 44 and 52 channels, the applicability of the Meff correction was demonstrated for N of 60 to 70% relative to M. These results suggest that appropriate Meff correction can be achieved using the typical exponential model even when N is smaller than M.

4.2.

Reevaluation of Meff

The random resampling simulations conducted in experiment 1 indicate potential risks of false positives in previous studies with small N, suggesting a need for a more stringent application of the Meff correction method. However, Meff correction maintains a better balance between type I and type II errors than the Bonferroni correction, which becomes more stringent as M increases. In datasets with 44 and 52 channels, the Meff values typically ranged between 20 and 30 when N was sufficiently large. This implies that the Meff correction preserves power, probably due to interchannel correlations. These patterns were observed in all datasets, supporting the view that the Meff approach is a robust correction method regardless of the experimental task, as described in a previous study.25
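The practical difference between the two corrections can be illustrated with simple arithmetic. The sketch below assumes a Bonferroni-type rule in which the significance level is divided by the multiplicity (M for Bonferroni, Meff for the Meff correction); the Meff value of 25 is an illustrative choice within the 20 to 30 range mentioned above, not a reported result, and the function name is hypothetical.

```python
def corrected_alpha(alpha, multiplicity):
    """Per-channel threshold for a Bonferroni-type correction (assumed rule)."""
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    return alpha / multiplicity

alpha = 0.05
n_channels = 52   # M, as in the larger dataset
meff = 25.0       # illustrative value within the 20 to 30 range

bonferroni = corrected_alpha(alpha, n_channels)
meff_based = corrected_alpha(alpha, meff)
ratio = meff_based / bonferroni

print(f"Bonferroni threshold: {bonferroni:.5f}")
print(f"Meff threshold:       {meff_based:.5f}")
print(f"Ratio (Meff / Bonferroni): {ratio:.2f}")
```

Under these assumptions the Meff-based threshold is about twice as lenient as the Bonferroni threshold, which is the sense in which the correction preserves power.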

In this study, we aimed to predict the Meff that would be obtained when N exceeds M, using datasets with 44 and 52 fNIRS channels. Given that typical channel-wise measurements involve 40 to 50 channels, our approach is applicable to current practices. However, since prediction errors are inevitable, we recommend ensuring that N is greater than M when possible. Even in cases where N is smaller than M, defining an ROI based on previous studies or pilot experiments can ensure the effectiveness of Meff correction. In confirmatory studies with predefined ROIs, the correction for FWE may not be a serious issue because fewer hypotheses are tested. Conversely, exploratory studies, which require a broader definition of ROI for identifying active channels, might benefit more from our findings, especially in estimating the optimal Meff for datasets where N is smaller than M.

4.3.

Limitations and Future Prospects

In this study, which used datasets with 44 and 52 channels, the exponential model demonstrated the feasibility of applying Meff correction even for N/M ranging from 60% to 70% (N ranging from 30 to 40 participants). The use of 40 to 50 channel settings is common in current channel-wise measurements, suggesting the applicability of our approach. In recent years, however, fNIRS measurements utilizing around 100 channels have been conducted. In these cases, it is anticipated that an N larger than 30 to 40 is required for the simulations described in the current study. In scenarios where the percentage of N to M is notably low, further adjustment is required to prevent underestimation (see Supplementary Material). The current study primarily focused on channel-wise analysis. However, the DOT approach, which reconstructs continuous imaging data, is becoming a mainstream method in fNIRS studies. With DOT data, over 1000 channels are typically defined, far exceeding the number used in conventional fNIRS. Moreover, these channels generate a continuous reconstructed image with thousands to millions of voxels,16–20 for which distinct statistical considerations are necessary.37 Hence, further multichannelization imposes a limitation on the exponential model used in this study, suggesting that alternative approaches may be required for future validation.

Moreover, the application of Meff correction explored in this study, as in a previous study,25 is for one-sample t-tests. This is frequently used to test for significant activation in each channel. This approach is also applicable in paired designs where brain activities under different conditions are compared by taking the differences and applying one-sample t-tests. However, in fNIRS studies, two-sample t-tests or ANOVAs may sometimes be more suitable. In unpaired designs, it is impossible to consider differences of summary data. This makes it difficult to provide a sufficiently large N to ensure sufficient Meff correction. In genetic studies, it has been stated that the Meff correction can be applied to both single- and multiple-subject analyses and to multivariate analyses, depending on the approach to the correlation matrix.27,38 The effectiveness of the Meff approach for these statistical tests needs to be verified with fNIRS data. Therefore, this study alone cannot definitively state the effectiveness of the Meff approach in the face of increasing multichannelization and diversification of experimental designs in fNIRS studies.

Furthermore, the Meff method, like the Bonferroni correction, the Holm correction, and the FDR method, cannot distinguish between functional brain activity and physiological interference. fNIRS signals contain physiological interference from sources such as respiration, heartbeat, and blood pressure, which are unrelated to neurovascular coupling.39–41 Neglecting physiological interference leads to false positives, where the detection of a hemodynamic response is incorrectly attributed to functional brain activity, or false negatives, where brain activity is masked.41 Thus, fNIRS signals should be appropriately preprocessed before calculating summary data such as β-values and average values of Hb signals. Although the fNIRS data in this study were preprocessed with Wavelet-MDL,36 methods such as short-channel regression, PCA, and ICA have, in recent years, been employed to remove extraneous scalp hemodynamics.4,41 Further studies are required to determine how the Meff values are affected by these different preprocessing procedures.

Finally, the Meff method is compatible with parametric tests commonly used to analyze fNIRS data, as it employs Pearson’s correlation coefficient. However, due to uncertainties in the distribution of fNIRS signals and variations in responses between participants, parametric model assumptions may not always be satisfied. When applying a nonparametric model, a resampling-based approach, such as permutation and bootstrap tests, or the Max-T correction,42 is generally used to control the FWER. In such cases, there is no need to use the Meff method. On the other hand, the Meff method may be applicable when conducting nonparametric tests utilizing rank orders, such as the Wilcoxon rank sum test and Mann–Whitney U test, for each channel. In such cases, the use of Spearman’s rank correlation coefficient is more appropriate. Further studies are needed to validate the Meff method with these nonparametric tests.
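A rank-based variant of the calculation could look like the sketch below, which builds a Spearman correlation matrix by ranking each channel and then applying Pearson's formula to the ranks. This is an untested idea for fNIRS rather than a validated procedure: Galwey's estimator is again assumed, ties are ignored (reasonable only for continuous summary data), and the simulated data and function names are hypothetical. The example also shows one property that motivates the rank-based matrix: it is unchanged by a monotone transformation of the data.

```python
import numpy as np

def spearman_matrix(data):
    """Spearman correlation between columns of an (N x M) array.

    Ranks are obtained with a double argsort, which assumes no tied values.
    """
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def meff_from_corr(corr):
    """Galwey-type effective multiplicity from a correlation matrix (assumed estimator)."""
    eig = np.clip(np.linalg.eigvalsh(corr), 0.0, None)
    return np.sum(np.sqrt(eig)) ** 2 / np.sum(eig)

rng = np.random.default_rng(1)
n, m = 60, 8
shared = rng.standard_normal((n, 1))
data = 0.7 * shared + rng.standard_normal((n, m))  # correlated channels
skewed = np.exp(data)                              # monotone transform: ranks unchanged

meff_pearson_raw = meff_from_corr(np.corrcoef(data, rowvar=False))
meff_pearson_skewed = meff_from_corr(np.corrcoef(skewed, rowvar=False))
meff_spearman_raw = meff_from_corr(spearman_matrix(data))
meff_spearman_skewed = meff_from_corr(spearman_matrix(skewed))

print(f"Pearson  Meff: raw={meff_pearson_raw:.3f}, skewed={meff_pearson_skewed:.3f}")
print(f"Spearman Meff: raw={meff_spearman_raw:.3f}, skewed={meff_spearman_skewed:.3f}")
```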

5.

Conclusion

Multichannelization and increasing diversity in experimental designs of fNIRS studies inevitably lead to FWE correction issues in exploratory analyses. Typically, Bonferroni correction and its derivatives are too stringent because the first round of correction always begins by dividing α by the number of channels. Although Meff has been introduced to provide moderate solutions for FWE correction issues, it is not sufficient for small sample size studies due to rank deficiency in eigenvalues. We used simulations with a typical exponential model to explore the possibility of predicting, from a small N, the Meff values that would be obtained in the region unaffected by rank deficiency. We found that predicted values close to the target values could be obtained in these simulations. This demonstrates the potential applicability of the Meff correction method for fNIRS data with a small N. Thus, the Meff correction, which takes interchannel correlation into consideration, could serve as a promising alternative to Bonferroni correction and its derivatives even for data from small sample sizes.

Disclosures

There are no conflicts of interest to be disclosed.

Code and Data Availability

The code and sample data needed to conduct this analysis are accessible in our GitHub repository at https://github.com/Dan-brain-Lab

Acknowledgments

This study was partly supported by Grant-in-Aid for Scientific Research (Grant Nos. 22H00681 and 22K18653 to I.D.) from the Japan Society for the Promotion of Science. We would like to thank Ms. Melissa Noguchi (English Language Consultation Services) for proofreading the article. Also, we would like to express our appreciation to Ms. Hiroko Ishida for her administrative support.

References

1. M. Ferrari and V. Quaresima, “A brief review on the history of human functional near-infrared spectroscopy (fNIRS) development and fields of application,” NeuroImage, 63 (2), 921–935 (2012). https://doi.org/10.1016/j.neuroimage.2012.03.049

2. D. A. Boas et al., “Twenty years of functional near-infrared spectroscopy: introduction for the special issue,” NeuroImage, 85, 1–5 (2014). https://doi.org/10.1016/j.neuroimage.2013.11.033

3. F. Scholkmann et al., “A review on continuous wave functional near-infrared spectroscopy and imaging instrumentation and methodology,” NeuroImage, 85, 6–27 (2014). https://doi.org/10.1016/j.neuroimage.2013.05.004

4. M. A. Yücel et al., “Best practices for fNIRS publications,” Neurophotonics, 8 (1), 012101 (2021). https://doi.org/10.1117/1.NPh.8.1.012101

5. F. F. Jöbsis, “Noninvasive, infrared monitoring of cerebral and myocardial oxygen sufficiency and circulatory parameters,” Science, 198 (4323), 1264–1267 (1977). https://doi.org/10.1126/science.929199

6. B. Chance et al., “Cognition-activated low-frequency modulation of light absorption in human brain,” Proc. Natl. Acad. Sci. U. S. A., 90 (8), 3770–3774 (1993). https://doi.org/10.1073/pnas.90.8.3770

7. Y. Hoshi and M. Tamura, “Detection of dynamic changes in cerebral oxygenation coupled to neuronal function during mental work in man,” Neurosci. Lett., 150 (1), 5–8 (1993). https://doi.org/10.1016/0304-3940(93)90094-2

8. T. Kato et al., “Human visual cortical function during photic stimulation monitoring by means of near-infrared spectroscopy,” J. Cereb. Blood Flow Metab., 13 (3), 516–520 (1993). https://doi.org/10.1038/jcbfm.1993.66

9. A. Villringer et al., “Near infrared spectroscopy (NIRS): a new tool to study hemodynamic changes during activation of brain function in human adults,” Neurosci. Lett., 154 (1–2), 101–104 (1993). https://doi.org/10.1016/0304-3940(93)90181-J

10. J. Gervain et al., “Using functional near-infrared spectroscopy to study the early developing brain: future directions and new challenges,” Neurophotonics, 10 (2), 023519 (2023). https://doi.org/10.1117/1.NPh.10.2.023519

11. A. Curtin and H. Ayaz, “The age of neuroergonomics: towards ubiquitous and continuous measurement of brain function with fNIRS,” Jpn. Psychol. Res., 60 (4), 374–386 (2018). https://doi.org/10.1111/jpr.12227

12. Y. Minagawa, M. Xu and S. Morimoto, “Toward interactive social neuroscience: neuroimaging real-world interactions in various populations,” Jpn. Psychol. Res., 60 (4), 196–224 (2018). https://doi.org/10.1111/jpr.12207

13. P. Pinti et al., “A review on the use of wearable functional near-infrared spectroscopy in naturalistic environments,” Jpn. Psychol. Res., 60 (4), 347–373 (2018). https://doi.org/10.1111/jpr.12206

14. A. Maki et al., “Spatial and temporal analysis of human motor activity using noninvasive NIR topography,” Med. Phys., 22 (12), 1997–2005 (1995). https://doi.org/10.1118/1.597496

15. A. von Lühmann et al., “Towards neuroscience in the everyday world: progress in wearable fNIRS instrumentation and applications,” in Opt. and the Brain, BM3C-2 (2020).

16. D. Chitnis et al., “Functional imaging of the human brain using a modular, fibre-less, high-density diffuse optical tomography system,” Biomed. Opt. Express, 7 (10), 4275–4288 (2016). https://doi.org/10.1364/BOE.7.004275

17. X. Dai et al., “Fast noninvasive functional diffuse optical tomography for brain imaging,” J. Biophotonics, 11 (3), e201600267 (2018). https://doi.org/10.1002/jbio.201600267

18. M. D. Wheelock, J. P. Culver and A. T. Eggebrecht, “High-density diffuse optical tomography for imaging human brain function,” Rev. Sci. Instrum., 90 (5) (2019). https://doi.org/10.1063/1.5086809

19. E. E. Vidal-Rosas et al., “Wearable, high-density fNIRS and diffuse optical tomography technologies: a perspective,” Neurophotonics, 10 (2), 023513 (2023). https://doi.org/10.1117/1.NPh.10.2.023513

20. Z. E. Markow et al., “Ultra-high density imaging arrays for diffuse optical tomography of human brain improve resolution, signal-to-noise, and information decoding,” (2023).

21. S. Holm, “A simple sequentially rejective multiple test procedure,” Scand. J. Stat., 6 (2), 65–70 (1979).

22. Y. Minagawa-Kawai et al., “Neural attunement processes in infants during the acquisition of a language-specific phonemic contrast,” J. Neurosci., 27 (2), 315–321 (2007). https://doi.org/10.1523/JNEUROSCI.1984-06.2007

23. Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” J. R. Stat. Soc.: Ser. B (Methodol.), 57 (1), 289–300 (1995). https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

24. A. K. Singh and I. Dan, “Exploring the false discovery rate in multichannel NIRS,” NeuroImage, 33 (2), 542–549 (2006). https://doi.org/10.1016/j.neuroimage.2006.06.047

25. M. Uga et al., “Exploring effective multiplicity in multichannel functional near-infrared spectroscopy using eigenvalues of correlation matrices,” Neurophotonics, 2 (1), 015002 (2015). https://doi.org/10.1117/1.NPh.2.1.015002

26. J. M. Cheverud, “A simple correction for multiple comparisons in interval mapping genome scans,” Heredity, 87 (1), 52–58 (2001). https://doi.org/10.1046/j.1365-2540.2001.00901.x

27. J. Li and L. Ji, “Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix,” Heredity, 95 (3), 221–227 (2005). https://doi.org/10.1038/sj.hdy.6800717

28. N. W. Galwey, “A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests,” Genet. Epidemiol., 33 (7), 559–568 (2009). https://doi.org/10.1002/gepi.20408

29. Y. Monden et al., “Clinically-oriented monitoring of acute effects of methylphenidate on cerebral hemodynamics in ADHD children using fNIRS,” Clin. Neurophysiol., 123 (6), 1147–1157 (2012). https://doi.org/10.1016/j.clinph.2011.10.006

30. Y. Monden et al., “Right prefrontal activation as a neuro-functional biomarker for monitoring acute effects of methylphenidate in ADHD children: an fNIRS study,” NeuroImage: Clin., 1 (1), 131–140 (2012). https://doi.org/10.1016/j.nicl.2012.10.001

31. M. Nagashima et al., “Acute neuropharmacological effects of atomoxetine on inhibitory control in ADHD children: a fNIRS study,” NeuroImage: Clin., 6, 192–201 (2014). https://doi.org/10.1016/j.nicl.2014.09.001

32. T. Tokuda et al., “Methylphenidate-elicited distinct neuropharmacological activation patterns between medication-naive attention deficit hyperactivity disorder children with and without comorbid autism spectrum disorder: a functional near-infrared spectroscopy study,” Neuropsychiatry, 8 (3), 917–929 (2018). https://doi.org/10.4172/Neuropsychiatry.1000418

33. S. Sutoko et al., “Distinct methylphenidate-evoked response measured using functional near-infrared spectroscopy during Go/No-Go Task as a supporting differential diagnostic tool between attention-deficit/hyperactivity disorder and autism spectrum disorder comorbid children,” Front. Hum. Neurosci., 13, 7 (2019). https://doi.org/10.3389/fnhum.2019.00007

34. K. Shinozuka et al., “Language familiarity and proficiency leads to differential cortical processing during translation between distantly related languages,” Front. Hum. Neurosci., 15 (2021). https://doi.org/10.3389/fnhum.2021.593108

35. M. Uga et al., “Optimizing the general linear model for functional near-infrared spectroscopy: an adaptive hemodynamic response function approach,” Neurophotonics, 1 (1), 015004 (2014). https://doi.org/10.1117/1.NPh.1.1.015004

36. K.-E. Jang et al., “Wavelet minimum description length detrending for near-infrared spectroscopy,” J. Biomed. Opt., 14 (3), 034004 (2009). https://doi.org/10.1117/1.3127204

37. S. Tak and J. C. Ye, “Statistical analysis of fNIRS data: a comprehensive review,” NeuroImage, 85, 72–91 (2014). https://doi.org/10.1016/j.neuroimage.2013.06.016

38. D. R. Nyholt, “A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other,” Am. J. Hum. Genet., 74 (4), 765–769 (2004). https://doi.org/10.1086/383251

39. E. Kirilina et al., “The physiological origin of task-evoked systemic artefacts in functional near infrared spectroscopy,” NeuroImage, 61 (1), 70–81 (2012). https://doi.org/10.1016/j.neuroimage.2012.02.074

40. F. Scholkmann et al., “End-tidal CO2: an important parameter for a correct interpretation in functional brain studies using speech tasks,” NeuroImage, 66, 71–79 (2013). https://doi.org/10.1016/j.neuroimage.2012.10.025

41. I. Tachtsidis and F. Scholkmann, “False positives and false negatives in functional near-infrared spectroscopy: issues, challenges, and the way forward,” Neurophotonics, 3 (3), 031405 (2016). https://doi.org/10.1117/1.NPh.3.3.031405

42. A. K. Singh et al., “Scope of resampling-based tests in fNIRS neuroimaging data analysis,” Stat. Sin., 18, 1519–1534 (2008).

Biography

Yuki Yamamoto is currently a master’s student at the Applied Cognitive Neuroscience Laboratory, Chuo University, Tokyo, Japan. He received his BSc degree from Chuo University in 2024. His main research interest is the development of functional near-infrared spectroscopy (fNIRS) methodology.

Wakana Kawai earned her master’s degree in engineering from Chuo University, where she is currently pursuing her PhD. Her research interests include processing methods for fNIRS and linguistic processing among second-language learners.

Tatsuya Hayashi earned his PhD in mathematical sciences in 2017 from the University of Tokyo. He worked in the Faculty of Information Science and Technology at Hokkaido University as a specially appointed assistant professor until 2022. He is currently a junior associate professor in the Faculty of Science and Engineering at Yamato University and a visiting associate professor with the Research and Development Initiative at Chuo University.

Minako Uga earned her PhD from Tsukuba University in 2010. She is a professor at the Health Science University, Japan. Her research interests focus on the methodological development of fNIRS data analysis and neuroplasticity.

Yasushi Kyutoku earned his PhD in experimental psychology and is an institute professor with the Research and Development Initiative, Chuo University, Japan. He is currently working on sense of security from an affective engineering perspective, psychological adjustment after natural disasters, food cognition, and consumer psychology with regard to hospitality.

Ippeita Dan graduated from the International Christian University in 1993 and earned his PhD from the University of Tokyo, Japan, in 2002. He was a senior research fellow at the National Food Research Institute and an associate professor at Jichi Medical University. Now, he is a professor at Chuo University, Japan. He is pursuing translational neuroscience mainly utilizing fNIRS with an emphasis on neuromarketing applications. He has been on the SfNIRS Board of Directors since 2016.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Yuki Yamamoto, Wakana Kawai, Tatsuya Hayashi, Minako Uga, Yasushi Kyutoku, and Ippeita Dan "Adjusting effective multiplicity (Meff) for family-wise error rate in functional near-infrared spectroscopy data with a small sample size," Neurophotonics 11(3), 035004 (27 July 2024). https://doi.org/10.1117/1.NPh.11.3.035004
Received: 31 March 2024; Accepted: 19 June 2024; Published: 27 July 2024
KEYWORDS
Simulations

Error analysis

Data modeling

Brain

Neurophotonics

Statistical analysis

Near infrared spectroscopy
