computefeats2 does not calculate z-statistics accurately
Description
I believe that computefeats2 tries to calculate correlation coefficients between the input data and the mixing matrix (see below) by variance normalizing the data and using np.linalg.lstsq. It then crops extreme values (< -0.999 and > 0.999) and converts the correlation coefficients to z-values. I assumed that the cropping was there to prevent large z-values as correlation coefficients approach 1 and -1. However, it doesn't look like the normalization is doing what we want, and the "correlation coefficients" can in fact end up quite large. I added a couple of lines to print out the max and min values of data_R before cropping, and got values as high as +/- 12 with our five-echo test dataset.
If I'm right, then this is a bug, although I don't recall if computefeats2 is used for anything beyond generating component weight maps.
NOTE: It is used to compute WTS in fitmodels_direct, which can impact computed metrics. The mixing matrix is normalized in tedica, so there is no meaningful impact on ICA component maps or metrics, but this isn't done for tedpca, so there are small differences between metrics and more noticeable differences between component maps. The tedpca component maps almost look binarized because so many voxels are cropped.
Lines 310 to 318 in 1bc32e4
# get betas of `data`~`mmix` and limit to range [-0.999, 0.999]
data_R = get_coeffs(data_vn, mmix, mask=None)
data_R[data_R < -0.999] = -0.999
data_R[data_R > 0.999] = 0.999
# R-to-Z transform
data_Z = np.arctanh(data_R)
if data_Z.ndim == 1:
    data_Z = np.atleast_2d(data_Z).T
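Note that every cropped voxel collapses to the same z-value, which is what produces the binarized look described above:

import numpy as np
# all values cropped to +/- 0.999 map to the same z-value
print(np.arctanh(0.999))  # ~3.8002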
However, before I try fixing this, I want to make sure that I'm interpreting the function correctly. Is anyone familiar enough with this function to take a look?
BTW, to fix the issue (and get betas equivalent to correlation coefficients), I would do the following:
from scipy import stats

# z-score both the data and the mixing matrix so that the
# least-squares betas are equivalent to correlation coefficients
data_z = stats.zscore(data[mask], axis=-1)
mmix_z = stats.zscore(mmix, axis=0)
# get betas of `data`~`mmix` and limit to range [-0.999, 0.999]
data_R = get_coeffs(data_z, mmix_z, mask=None)
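As a quick sanity check that single-regressor betas on z-scored variables equal correlation coefficients (a standalone toy example, not tedana code):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = 2 * x + rng.standard_normal(100)
x_z = (x - x.mean()) / x.std()
y_z = (y - y.mean()) / y.std()
# single-regressor least squares on z-scored variables recovers r
beta = np.linalg.lstsq(x_z[:, None], y_z, rcond=None)[0][0]
print(np.isclose(beta, np.corrcoef(x, y)[0, 1]))  # True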
Activity
tsalo commented on Feb 18, 2019
If we're interested in getting z-scores for the betas, then I think the following is an appropriate replacement for computefeats2:
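In sketch form (untested; assuming data is voxels x time and mmix is time x components, as above), that means computing t-statistics for the betas and converting them to z-statistics:

import numpy as np
from scipy import stats

n_trs, n_components = mmix.shape
# fit `data` ~ `mmix` by least squares
betas, ssr, _, _ = np.linalg.lstsq(mmix, data.T, rcond=None)
dof = n_trs - n_components
sigma_squared = ssr / dof  # residual variance per voxel
# standard errors of the betas: sqrt(diag((X'X)^-1) * sigma^2)
xtx_inv_diag = np.diag(np.linalg.inv(mmix.T @ mmix))
se = np.sqrt(np.outer(xtx_inv_diag, sigma_squared))
t_stats = betas / se
# convert t-values to z-values through matched tail probabilities
z_stats = np.sign(t_stats) * stats.norm.isf(stats.t.sf(np.abs(t_stats), dof))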
tsalo commented on Mar 3, 2019
The problem with my proposed approach above is that test statistics for parameter estimates seem to be meaningless when the degrees of freedom are low. The t-statistics end up being grossly inflated (often in the millions to billions with dof = 1), and the z-statistics then end up being binarized at +/- 8 (the max value).
Here are a couple of possible solutions:
1. Build a null distribution for the coefficients with a permutation test and derive the statistics from that (probably computationally intensive).
2. Scale the Fisher-transformed coefficients properly. np.arctanh(r) just makes the correlation coefficient normally distributed, but we later compare those values to z-value thresholds, so we definitely need the test statistics. We need to do np.arctanh(r) * np.sqrt(n_trs - 3) for it to make any sense (see the sketch below).
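A minimal sketch of option 2, reusing data_R from the snippet above and assuming n_trs is the number of time points:

import numpy as np

# arctanh(r) has standard error ~1 / sqrt(n - 3) under the null,
# so scaling by sqrt(n_trs - 3) yields an approximate z-statistic
r = np.clip(data_R, -0.999, 0.999)  # guard against |r| = 1
data_Z = np.arctanh(r) * np.sqrt(n_trs - 3)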
jbteves commented on May 23, 2019
Could you elaborate on point 1? If it's good but computationally intense, we can see if we're clever enough to find a way around that problem.
tsalo commented on May 26, 2019
I believe that it would look something like this [completely untested code]:
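# rough sketch of the idea: build a max-statistic null distribution by
# permuting the mixing matrix over time and refitting; reuses get_coeffs,
# data_vn, mmix, and data_R from the snippet above
import numpy as np

n_iters = 1000
null_max_r = np.zeros(n_iters)
for i_iter in range(n_iters):
    # shuffle time points to break the data/component coupling
    perm_mmix = mmix[np.random.permutation(mmix.shape[0]), :]
    perm_R = get_coeffs(data_vn, perm_mmix, mask=None)
    null_max_r[i_iter] = np.abs(perm_R).max()
# voxelwise p-values against the max-statistic null
p_values = (np.abs(data_R)[..., None] <= null_max_r).mean(axis=-1)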
UPDATE: This takes about 4 minutes per iteration with 160 components. To build a null distribution, I think we'd want at least 1000 iterations.
tsalo commented on Jul 11, 2019
@smoia You've been digging into MELODIC's code a bit. Do you know how they convert their component maps into z-statistics?
tsalo commented on Jul 18, 2019
If we just use more aggressive dimensionality reduction in the PCA step prior to metric calculation, then the properly calculated statistics should be valid. We can do this by setting n_components to be a proportion of the variance. I think somewhere between 0.95 and 0.99 should remove enough low-variance components to give us reasonable degrees of freedom.
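For illustration, scikit-learn's PCA treats a float n_components as a variance fraction and keeps the smallest number of components that explains it (the 0.95 here is just an example):

from sklearn.decomposition import PCA

# keep the fewest components explaining >= 95% of the variance
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(data)  # data: (n_samples, n_features)
print(pca.n_components_, pca.explained_variance_ratio_.sum())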
tsalo commented on Jul 27, 2019
If we want to go with the permutation approach, it looks like nilearn.mass_univariate.permuted_ols does what I was thinking, so it's an issue that's been dealt with already I guess. It would add a lot of time to the workflow, but we could maybe support it as an option.
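A sketch of how that might be wired up (the 1000-permutation count and array shapes are assumptions; permuted_ols reports negative log10 p-values):

from nilearn.mass_univariate import permuted_ols

# tested_vars: (n_samples, n_regressors) -> the mixing matrix
# target_vars: (n_samples, n_targets)   -> voxel time series, samples = TRs
neg_log_pvals, t_scores, h0 = permuted_ols(
    mmix, data.T, n_perm=1000, two_sided_test=True, n_jobs=1
)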
stale commented on Jan 2, 2020
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana! :tada:
jbteves commented on Jan 2, 2020
This issue isn't as stale as you think, bot.
tsalo commented on Jul 13, 2020
This issue should be somewhat resolved by our shift to MA-PCA methods (see #178 (comment)), and we decided to refactor the regression function instead of using a nonparametric approach like permuted_ols.
We still need to do the refactor, which includes converting z-values to z-statistics.
tsalo commented on Nov 9, 2020
This is coming up as well in the BrainHack Donostia project ica-aroma-reorg, where we need a solid version of computefeats2 to reproduce MELODIC's outputs.
smoia commented on Nov 11, 2020
Still need the contribution? I can look into it!
tsalo commented on Dec 30, 2020
I think that the current solution is available here. @CesarCaballeroGaudes @smoia do you know what, if anything, still needs to be done to get it working?
EDIT: I'm happy to handle merge conflicts and any documentation tuning that needs to happen if the core code is working.
Issue retitled from "Calculating coefficients in computefeats2" to "computefeats2 does not calculate z-statistics accurately"
smoia commented on Jan 13, 2021
Hello, sorry for the delay!
@tsalo I think the correct implementation is here.
I can open a PR and see what's left to do to merge, what do you think?
tsalo commented on Jan 13, 2021
That would be amazing. Thank you!