Investment Studio > Expressions > Functions > Statistical > CORREL
float correl(float array[rows][cols] a, float array[rows][cols] b, boolean nonbiased = FALSE)
Returns the correlation coefficient of the data in arrays a and b.
Arrays with different dimensions are allowed as long as they all have the same number of elements (e.g. 2 x 3 and 3 x 2 arrays). Corresponding elements are determined by order of appearance (reading from left to right, top to bottom).
All array elements are converted to float and excluded if conversion fails.
If nonbiased = TRUE, arrays a and b are assumed to contain samples of larger populations, causing their sample covariance and sample standard deviations to be used for the computation of the correlation function (see covar, stdev). If nonbiased = FALSE, arrays a and b are assumed to contain entire populations, causing their population covariance and population standard deviations to be used (see covar, stdevp).
If nonbiased is omitted, it defaults to FALSE.
The correlation coefficient ra,b of a and b is a number Î [-1, 1] reflecting the extent of a linear relationship between the two data sets. It's defined as
ra,b = Cov(a, b) / (sa x sb)
where Cov(a, b) is the covariance and sa, sb are the standard deviations. A correlation of 1 means that a and b track each other exactly; a correlation of 0 means that there is no discernible linear relation between a and b (the evolution of b can't be predicted using a linear function of a, and vice versa); a correlation of -1 means that every move in a is exactly matched by an opposite move in b.
Examples
=correl({1, 2, 3, 4}, {10, 11, 12, 13})
= 1;
=correl({1, 2, 3, 4}, {10, 9, 8, 7})
= -1;
=correl({1, 2, 3, 4}, {3, 1, 10, 0})
= 0.
Given two assets identified by asset index symbols _asset_1 and _asset_2, the correlation of their daily closes over the date range [from_date, to_date] can be obtained by first synchronizing the quotes for the two assets,
_synched = sync(asset_quotes(_asset_1, "c", from_date, to_date), asset_quotes(_asset_2, "c", from_date, to_date), false)
and then applying correl to the resulting quote table:
=correl(index(array(_synched), 0, 2), index(array(_synched), 0, 3))
A particularly interesting variation on this theme involves looking for correlations between time-shifted quotes (the ideal being a strong correlation between _asset_1 on day T and _asset_2 on day T + lead, where the lead is sufficiently long to be tradeable). For instance,
_lead_days = 5
_synched_rows = rows(array(_synched))
=correl(subarray(array(_synched), 1, _synched_rows - _lead_days, 2, 2), subarray(array(_synched), _lead_days + 1, _synched_rows, 3, 3))
will return the correlation between _asset_1 on day T and _asset_2 on day T + 5.
Note that the leading quotes could also be produced by an indicator function.
Correlations between time-shifted quotes of the same asset (autocorrelation) imply periodicity, and are best studied using spectral analysis (see fftp, mem).
See also covar, intercept, linest, pearson, rsq, slope, stdev, stdevp, fisher, fisherinv.