LOGEST

float array[*][*] logest(float array known_ys, float array known_xs = {1, 2, 3 [, ...]}, boolean fit_constant = TRUE, boolean statistics = FALSE)

Returns the least-squares fit of the exponential surface in N dimensions (independent variables)

$$y(x_1, x_2, x_3, \dots, x_N) = b \prod_{k=1}^{N} m_k^{x_k}$$

to a set of known y(x) pairs.
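
To make the model concrete, the following minimal Python sketch evaluates the exponential surface for given coefficients and shows that its natural logarithm is linear in the x_k, which is what allows a linear least-squares fit to be used. The function names are illustrative only and are not part of Investment Studio.

```python
import math

def exp_surface(b, ms, xs):
    """Evaluate y = b * m_1**x_1 * ... * m_N**x_N."""
    y = b
    for m, x in zip(ms, xs):
        y *= m ** x
    return y

def log_exp_surface(b, ms, xs):
    """Same model after taking natural logarithms: linear in the x_k."""
    return math.log(b) + sum(x * math.log(m) for m, x in zip(ms, xs))

y = exp_surface(0.5, [2.0, 3.0], [3, 2])          # 0.5 * 2**3 * 3**2 = 36.0
ln_y = log_exp_surface(0.5, [2.0, 3.0], [3, 2])   # equals ln(36) up to rounding
```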

Input

known_ys is the set of known y values. If it contains a single column (row), each column (row) in known_xs is interpreted as containing the values of a separate independent variable. Note that one possible application of this feature is to put different powers of the same independent variable in different columns (rows).

If known_ys contains multiple columns and rows, known_xs is interpreted as containing the values of a single independent variable. Corresponding elements in the two arrays are then determined by order of appearance (reading from left to right, top to bottom).

known_xs is the set of known x values. It can contain one or more independent variables. If only one independent variable is used, known_xs and known_ys can have any shape(s) as long as they contain the same number of elements; corresponding elements in the two arrays are then determined by order of appearance (reading from left to right, top to bottom). If more than one independent variable is used, known_ys must be a column (row) vector, and known_xs must contain a column (row) for each independent variable.

known_xs may be omitted, in which case it defaults to the array {1, 2, 3, ...} with the same number of elements as known_ys.
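
The pairing rules for a single independent variable can be sketched in Python as follows. This is an illustration of the behavior described above, not the product's implementation; the helper names are hypothetical, and arrays are assumed to be plain lists or lists of rows.

```python
def flatten(array):
    """Read a 2-D array left to right, top to bottom; pass 1-D input through."""
    flat = []
    for row in array:
        if isinstance(row, (list, tuple)):
            flat.extend(row)
        else:
            flat.append(row)
    return flat

def pair_single_variable(known_ys, known_xs=None):
    """Pair x and y values by order of appearance for one independent variable."""
    ys = flatten(known_ys)
    # An omitted known_xs defaults to 1, 2, 3, ... with as many elements as known_ys.
    xs = flatten(known_xs) if known_xs is not None else list(range(1, len(ys) + 1))
    if len(xs) != len(ys):
        raise ValueError("known_xs and known_ys must contain the same number of elements")
    return list(zip(xs, ys))

# A 2x2 known_ys paired with a 1x4 known_xs: shapes differ, element counts match.
pairs = pair_single_variable([[1, 2], [4, 16]], [2, 3, 4, 5])
default_pairs = pair_single_variable([[1, 2], [4, 16]])
```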

fit_constant specifies how the factor b of the exponential surface fit is computed. If fit_constant = TRUE, the value of b returned by the normal least-squares fit is used. If fit_constant = FALSE, b is forced to 1 and the m_k parameters are adjusted accordingly to still fit the known y values.

If fit_constant is omitted, it defaults to TRUE.
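
The effect of fit_constant can be illustrated for a single independent variable with the Python sketch below. It assumes that forcing b to 1 corresponds to a least-squares line through the origin in ln(y), which is the usual interpretation but is not confirmed by this page; logest_1d is a hypothetical helper name.

```python
import math

def logest_1d(ys, xs, fit_constant=True):
    """Return (m, b) for y = b * m**x via least squares on ln(y)."""
    ln_ys = [math.log(y) for y in ys]
    n = len(xs)
    if fit_constant:
        x_bar = sum(xs) / n
        ly_bar = sum(ln_ys) / n
        sxy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, ln_ys))
        sxx = sum((x - x_bar) ** 2 for x in xs)
        slope = sxy / sxx
        intercept = ly_bar - slope * x_bar
    else:
        # Assumption: b = 1 means ln(b) = 0, i.e. regression through the origin.
        slope = sum(x * ly for x, ly in zip(xs, ln_ys)) / sum(x * x for x in xs)
        intercept = 0.0
    return math.exp(slope), math.exp(intercept)

ys, xs = [1, 2, 4, 16, 256], [2, 3, 4, 5, 6]
m_free, b_free = logest_1d(ys, xs, fit_constant=True)     # b estimated from the data
m_fixed, b_fixed = logest_1d(ys, xs, fit_constant=False)  # b forced to 1, m re-estimated
```

With b fixed, the estimate of m changes because the single remaining parameter has to absorb the whole fit.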

statistics is used to specify if extra statistics are to be appended to the coefficients of the fit returned by the function.

If statistics is omitted, it defaults to FALSE.

All array elements are converted to float; elements that cannot be converted are excluded.

Output

The result array is headed by a single row containing the N + 1 coefficients of the least-squares fit in "reverse" order: the coefficient of the last variable (m_N) is in the first column, the factor b in the last column.
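
As an illustration of the coefficient row for more than one independent variable, the following Python sketch fits ln(y) by solving the normal equations and returns the coefficients in the reverse order described above. It is a simplified stand-in, not the product's algorithm; solve and logest_row are hypothetical helper names, and the input data are synthetic.

```python
import math

def solve(a, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def logest_row(ys, x_columns):
    """Return [m_N, ..., m_1, b] for y = b * m_1**x_1 * ... * m_N**x_N."""
    ln_ys = [math.log(y) for y in ys]
    # Design matrix columns: x_1 ... x_N, then a column of ones for ln(b).
    cols = [list(col) for col in x_columns] + [[1.0] * len(ys)]
    # Normal equations: (X^T X) beta = X^T ln(y).
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * ly for p, ly in zip(ci, ln_ys)) for ci in cols]
    beta = solve(xtx, xty)
    ms = [math.exp(v) for v in beta[:-1]]
    b = math.exp(beta[-1])
    return list(reversed(ms)) + [b]

# Synthetic data generated from y = 0.5 * 2**x1 * 3**x2.
x1 = [0, 1, 2, 3, 1]
x2 = [1, 0, 2, 1, 3]
ys = [0.5 * 2 ** a * 3 ** c for a, c in zip(x1, x2)]
row = logest_row(ys, [x1, x2])   # approximately [3.0, 2.0, 0.5]
```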

If statistics = TRUE, this section is followed by extra statistics about the equivalent linear fit ln(y) = x_N ln(m_N) + ... + x_1 ln(m_1) + ln(b):

A row containing the standard errors of the natural logarithms of the coefficients of the fit (listed in the same order as the coefficients). The standard error of the factor b is included only if fit_constant = TRUE.

These values (along with the number of degrees of freedom, listed two rows below them) can be used to estimate how likely each coefficient is to encode useful information rather than just noise:

significance level = tdist(abs(ln(coefficient) / standard error), degrees of freedom, 1)

As usual in statistics, the smaller the significance level, the better (confidence level = 1 - significance level), with 5% being a common rejection threshold.
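
For a single independent variable fitted with a constant, the t statistic that goes into tdist can be sketched in Python as below. The sketch uses the textbook standard error of a regression slope, where the slope of the linear fit is ln(m); it stops at the t statistic and leaves the tail probability to tdist. The function name is hypothetical.

```python
import math

def slope_t_statistic(ys, xs):
    """Return (t, df) for the single coefficient m in y = b * m**x."""
    ln_ys = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    ly_bar = sum(ln_ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, ln_ys)) / sxx
    intercept = ly_bar - slope * x_bar
    ss_resid = sum((ly - (intercept + slope * x)) ** 2 for x, ly in zip(xs, ln_ys))
    df = n - 2  # two fitted parameters: ln(m) and ln(b)
    se_slope = math.sqrt(ss_resid / df / sxx)  # standard error of ln(m)
    return abs(slope / se_slope), df

t, df = slope_t_statistic([1, 2, 4, 16, 256], [2, 3, 4, 5, 6])
```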

A row containing the coefficient of determination (first column) and the standard error (estimated standard deviation) of the known ln(y) values (second column).

Important: the standard error reported here is NOT a valid estimate of the error in a ln(y) value computed from the fit! To compute the latter, add up the variances of all terms contributing to ln(y), take the square root of the result and optionally multiply by confidence(significance level, 1, 1). See linest for an example.

The coefficient of determination is a real number in [0, 1] obtained by dividing the regression sum of squares (see below) by the total sum of squares (the sum of the squares of the differences between the known ln(y) values and their arithmetic average). It can be interpreted as the proportion of the variance in the dependent variable attributable to the independent variables. A value of 0 means that the linear fit has no predictive power; a value of 1 means that there is perfect agreement between actual and predicted ln(y) values.

For a single independent variable, the coefficient of determination reduces to the square of the Pearson product moment correlation coefficient (see function rsq).
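
The equivalence with the squared Pearson coefficient can be checked numerically. The Python sketch below computes the coefficient of determination from the sums of squares and, separately, the squared correlation between x and ln(y); the function name is hypothetical and the code is only an illustration.

```python
import math

def r_squared_and_pearson_squared(ys, xs):
    """Return (R^2 from sums of squares, squared Pearson r of x and ln(y))."""
    ln_ys = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    ly_bar = sum(ln_ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((ly - ly_bar) ** 2 for ly in ln_ys)  # total sum of squares
    sxy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, ln_ys))
    slope = sxy / sxx
    intercept = ly_bar - slope * x_bar
    ss_resid = sum((ly - (intercept + slope * x)) ** 2 for x, ly in zip(xs, ln_ys))
    r_squared = (syy - ss_resid) / syy
    pearson_squared = sxy ** 2 / (sxx * syy)
    return r_squared, pearson_squared

r2, p2 = r_squared_and_pearson_squared([1, 2, 4, 16, 256], [2, 3, 4, 5, 6])
```

Both values agree up to floating-point rounding.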

A row containing the F statistic (first column) and the number of degrees of freedom (second column).

These values can be used to estimate the overall significance level of the observed relationship between ln(y) and the independent variable(s), i.e. the probability that it is occurring by chance:

significance level = fdist(F statistic, degrees of freedom, number of independent variables)

Generally speaking, the larger the F statistic, the smaller the probability of the observed relationship being due to chance.

A row containing the regression sum of squares (first column) and the residual sum of squares (second column).

The regression sum of squares is computed as the difference between the total sum of squares (the sum of the squares of the differences between the known ln(y) values and their arithmetic average) and the residual sum of squares.

The residual sum of squares is the sum of the squares of the differences between actual and estimated ln(y) values at each known point.
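
The sums of squares and the F statistic can be sketched in Python for a single independent variable fitted with a constant. The F statistic is computed here with the standard formula (regression sum of squares per independent variable, divided by residual sum of squares per degree of freedom); the function name is hypothetical and this is not the product's implementation.

```python
import math

def fit_sums_of_squares(ys, xs):
    """Return (ss_reg, ss_resid, f_stat, df) for the fit of ln(y) against one x."""
    ln_ys = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    ly_bar = sum(ln_ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, ln_ys)) / sxx
    intercept = ly_bar - slope * x_bar
    ss_total = sum((ly - ly_bar) ** 2 for ly in ln_ys)
    ss_resid = sum((ly - (intercept + slope * x)) ** 2 for x, ly in zip(xs, ln_ys))
    ss_reg = ss_total - ss_resid
    k = 1        # number of independent variables
    df = n - 2   # observations minus fitted parameters (ln(m) and ln(b))
    f_stat = (ss_reg / k) / (ss_resid / df)
    return ss_reg, ss_resid, f_stat, df

ss_reg, ss_resid, f_stat, df = fit_sums_of_squares([1, 2, 4, 16, 256], [2, 3, 4, 5, 6])
```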

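The layout of the output array is as follows (rows 2 to 5 are present only if statistics = TRUE):

  Row 1: m_N, ..., m_1, b (coefficients of the fit)
  Row 2: standard errors of ln(m_N), ..., ln(m_1), ln(b)
  Row 3: coefficient of determination, standard error of ln(y)
  Row 4: F statistic, degrees of freedom
  Row 5: regression sum of squares, residual sum of squares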

Example

Given y values {1, 2, 4, 16, 256} and x values {2, 3, 4, 5, 6},

=logest({1, 2, 4, 16, 256}, {2, 3, 4, 5, 6})

returns (with rounding to two decimals) {3.73, 0.04}, meaning that the exponential least-squares fit to the data is y = 0.04 * 3.73^x.

See also correl, forecast, growth, intercept, linest, slope, steyx, trend.