computefeats2 does not calculate z-statistics accurately

I believe that `computefeats2` tries to calculate correlation coefficients between the input data and the mixing matrix (see below) by variance normalizing the data and using `np.linalg.lstsq`. It then crops extreme values (<-0.999 and >0.999) and converts the correlation coefficients to z-values. I assumed that the cropping was to prevent large z-values with correlation coefficients approaching 1 and -1. However, it doesn't look like normalization is doing what we want, and the "correlation coefficients" can in fact end up quite large. I added in a couple of lines to print out the max and min values of `data_R` before cropping, and got values as high as +/- 12 with our five-echo test dataset.

If I'm right, then this is a bug, although I don't recall if `computefeats2` is used for anything beyond generating component weight maps. 

**NOTE**: It is used to compute WTS in `fitmodels_direct`, which can impact computed metrics. The mixing matrix is normalized in `tedica`, so there is no meaningful impact on ICA component maps or metrics, but this isn't done for `tedpca`, so there are small differences between metrics and more noticeable differences between component maps. The `tedpca` component maps almost look binarized because so many voxels are cropped.

https://github.com/ME-ICA/tedana/blob/1bc32e46445f1d8c5b69ba2000280e2f8f591321/tedana/model/fit.py#L310-L318

However, before I try fixing this, I want to make sure that I'm interpreting the function correctly. Is anyone familiar enough with this function to take a look?

BTW, to fix the issue (and get betas equivalent to correlation coefficients), I would do the following:
```python
data_z = stats.zscore(data[mask], axis=-1)
mmix_z = stats.zscore(mmix, axis=0)

# get betas of `data`~`mmix` and limit to range [-0.999, 0.999]
data_R = get_coeffs(data_z, mmix_z, mask=None)
```

	# get betas of `data`~`mmix` and limit to range [-0.999, 0.999]
	data_R = get_coeffs(data_vn, mmix, mask=None)
	data_R[data_R < -0.999] = -0.999
	data_R[data_R > 0.999] = 0.999

	# R-to-Z transform
	data_Z = np.arctanh(data_R)
	if data_Z.ndim == 1:
	data_Z = np.atleast_2d(data_Z).T

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

computefeats2 does not calculate z-statistics accurately #178

1 remaining item

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

computefeats2 does not calculate z-statistics accurately #178

Description

Activity

tsalo commented on Feb 18, 2019

tsalo commented on Mar 3, 2019

jbteves commented on May 23, 2019

tsalo commented on May 26, 2019

tsalo commented on Jul 11, 2019

tsalo commented on Jul 18, 2019

tsalo commented on Jul 27, 2019

1 remaining item

stale commented on Jan 2, 2020

jbteves commented on Jan 2, 2020

tsalo commented on Jul 13, 2020

tsalo commented on Nov 9, 2020

smoia commented on Nov 11, 2020

tsalo commented on Dec 30, 2020

smoia commented on Jan 13, 2021

tsalo commented on Jan 13, 2021

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions