Estimation error definition - Fixing errors and finding optimal solutions to problems

estimation error

Russian equivalents listed across English-Russian technical, economic, and engineering dictionaries:

1. погрешность оценки (inaccuracy of an estimate)
2. ошибка оценки (error of an estimate)
3. погрешность оценивания; ошибка оценивания (variants used in forecasting and in reliability and quality-control terminology)

The standard error of the estimate is a way to measure the accuracy of the predictions made by a regression model.

Often denoted σ_est, it is calculated as:

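σ_est = √( Σ(y – ŷ)² / n )

where:

y: The observed value
ŷ: The predicted value
n: The total number of observations

Note that regression software such as Excel divides by the residual degrees of freedom (n – 2 for simple linear regression) rather than by n, so the value it reports is slightly larger for small samples.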

The standard error of the estimate gives us an idea of how well a regression model fits a dataset. In particular:

The smaller the value, the better the fit.
The larger the value, the worse the fit.
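The formula above can be sketched in a few lines of Python. This is only an illustration: the function name, the `n_params` argument, and the five data points are hypothetical and do not come from the Excel example. Setting `n_params=0` divides by n as in the formula above, while `n_params=2` divides by n – 2, the convention used by most regression output.

```python
import math

def standard_error_of_estimate(observed, predicted, n_params=0):
    """Square root of the summed squared residuals divided by (n - n_params)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    # Sum of squared differences between observed and predicted values
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sse / (n - n_params))

# Hypothetical data: y observed at x = 1..5, predictions from the
# least-squares line y_hat = 2.2 + 0.6x fitted to these same points.
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

sigma_n = standard_error_of_estimate(observed, predicted)               # divide by n
sigma_df = standard_error_of_estimate(observed, predicted, n_params=2)  # divide by n - 2
print(f"divide by n:     {sigma_n:.4f}")
print(f"divide by n - 2: {sigma_df:.4f}")
```

The two conventions converge as n grows, but for a handful of points the difference is noticeable.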

For a regression model that has a small standard error of the estimate, the data points will be closely packed around the estimated regression line.

Conversely, for a regression model that has a large standard error of the estimate, the data points will be more loosely scattered around the regression line.

The following example shows how to calculate and interpret the standard error of the estimate for a regression model in Excel.

Example: Standard Error of the Estimate in Excel

Use the following steps to calculate the standard error of the estimate for a regression model in Excel.

Step 1: Enter the Data

First, enter the values for the dataset:

Step 2: Perform Linear Regression

Next, click the Data tab along the top ribbon. Then click the Data Analysis option within the Analyze group.

If you don’t see this option, you need to first load the Analysis ToolPak.

In the new window that appears, click Regression and then click OK.

In the new window that appears, fill in the following information:

Once you click OK, the regression output will appear:

We can use the coefficients from the regression table to construct the estimated regression equation:

ŷ = 13.367 + 1.693(x)

And we can see that the standard error of the estimate for this regression model turns out to be 6.006. In simple terms, this tells us that the average data point falls 6.006 units from the regression line.

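We can use the estimated regression equation and the standard error of the estimate to construct an approximate 95% confidence interval for the predicted value of a certain data point. (Strictly speaking, this is an approximate prediction interval, because it ignores the uncertainty in the fitted coefficients.)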

For example, suppose x is equal to 10. Using the estimated regression equation, we would predict that y would be equal to:

ŷ = 13.367 + 1.693*(10) = 30.297

And we can obtain the 95% confidence interval for this estimate by using the following formula:

95% C.I. = [ŷ – 1.96*σ_est, ŷ + 1.96*σ_est]

For our example, the 95% confidence interval would be calculated as:

95% C.I. = [ŷ – 1.96*σ_est, ŷ + 1.96*σ_est]
95% C.I. = [30.297 – 1.96*6.006, 30.297 + 1.96*6.006]
95% C.I. = [18.525, 42.069]
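The arithmetic above can be checked with a short Python sketch. The coefficients, the standard error, and x = 10 are taken from the example; the variable names are illustrative only.

```python
# Values reported in the regression example above
intercept = 13.367
slope = 1.693
sigma_est = 6.006   # standard error of the estimate
z_crit = 1.96       # normal critical value for an approximate 95% interval

x = 10
y_hat = intercept + slope * x            # predicted value at x = 10
lower = y_hat - z_crit * sigma_est
upper = y_hat + z_crit * sigma_est

print(f"predicted y: {y_hat:.3f}")
print(f"approximate 95% interval: [{lower:.3f}, {upper:.3f}]")
```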

To estimate the error in a measurement, we need to know the expected or standard value and compare how far our measured values deviate from the expected value. The absolute error, relative error, and percentage error are different ways to estimate the errors in our measurements.

Error estimation can also use the mean value of all the measurements if there is no expected value or standard value.

The mean value

To calculate the mean, we need to add all measured values of x and divide the sum by the number of values we took. The formula to calculate the mean is:

x̄ = (x1 + x2 + ... + xn) / n

Let’s say we have five measurements, with the values 3.4, 3.3, 3.342, 3.56, and 3.28. If we add all these values and divide by the number of measurements (five), we get 3.3764.

As most of our measurements are given to two decimal places, we round this to 3.38.

Estimation of errors

Here, we are going to distinguish between estimating the absolute error, the relative error, and the percentage error.

Estimating the absolute error

To estimate the absolute error, we need to calculate the difference between the measured value x0 and the expected value or standard xref:

Abs = | x0 – xref |

Imagine you measure the length of a piece of wood. You know it measures 2.0m with a very high precision of ± 0.00001m. The precision of its length is so high that it is taken as 2.0m. If your instrument reads 2.003m, your absolute error is | 2.003m-2.0m | or 0.003m.

Estimating the relative error

To estimate the relative error, we need to calculate the difference between the measured value x0 and the standard value xref and divide it by the magnitude of the standard value xref:

rel = | x0 – xref | / | xref |

Using the figures from the previous example, the relative error in the measurements is | 2.003m-2.0m | / | 2.0m | or 0.0015. As you can see, the relative error is very small and has no units.

Estimating the percentage error

To estimate the percentage error, we need to calculate the relative error and multiply it by one hundred. The percentage error is expressed as ‘error value’%. This error tells us the deviation percentage caused by the error.

Using the figures from the previous example, the percentage error is 0.15%.
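As a quick check of the numbers in this section, the sketch below recomputes the mean of the five measurements and the three error estimates for the piece of wood. The function and variable names are illustrative only; the values are the ones used in the examples above.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def absolute_error(measured, reference):
    return abs(measured - reference)

def relative_error(measured, reference):
    return absolute_error(measured, reference) / abs(reference)

def percentage_error(measured, reference):
    return relative_error(measured, reference) * 100.0

# Five measurements from the mean-value example
measurements = [3.4, 3.3, 3.342, 3.56, 3.28]
mean_value = mean(measurements)

# Piece of wood: reference length 2.0 m, instrument reading 2.003 m
abs_err = absolute_error(2.003, 2.0)
rel_err = relative_error(2.003, 2.0)
pct_err = percentage_error(2.003, 2.0)

print(f"mean = {mean_value:.4f}")
print(f"absolute error = {abs_err:.3f} m")
print(f"relative error = {rel_err:.4f}")
print(f"percentage error = {pct_err:.2f}%")
```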

What is the line of best fit?

The line of best fit is used when plotting data where one variable depends on another one. By its nature, a variable changes value, and we can measure the changes by plotting them on a graph against another variable such as time. The relationship between two variables will often be linear. The line of best fit is the line that is closest to all the plotted values.

Some values might be far away from the line of best fit. These are called outliers. However, the line of best fit is not a useful method for all data, so we need to know how and when to use it.

Obtaining the line of best fit

To obtain the line of best fit, we need to plot the points as in the example below:

Fig. 1 — Data plotted from several measurements showing variation on the y-axis

Here, many of our points are dispersed. However, despite this data dispersion, they appear to follow a linear progression. The line that is closest to all those points is the line of best fit.

When to use the line of best fit

To be able to use the line of best fit, the data need to follow some patterns:

The relationship between the two plotted variables must be linear.
The dispersion of the values can be large, but the trend must be clear.
The line must pass close to all values.

Data outliers

Sometimes in a plot, there are values outside the normal range. These are called outliers. If the outliers are fewer in number than the data points following the line, the outliers can be ignored. However, outliers are often linked to errors in the measurements. In the image below, the pink point is an outlier.

Fig. 2 — Data plotted from several measurements showing variation on the y-axis in green and an outlier in pink

Drawing the line of best fit

To draw the line of best fit, we need to draw a line passing through the points of our measurements. If the line intersects with the y-axis before the x-axis, the value of y will be our minimum value when we measure.

The inclination or slope of the line is the direct relationship between x and y, and the larger the slope, the more vertical it will be. A large slope means that the data changes very fast as x increases. A gentle slope indicates a very slow change of the data.

Figure 3 — The line of best fit is shown in pink, with the slope being shown in light green
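The text describes drawing the line of best fit by eye. A common numerical alternative, which the text itself does not use, is an ordinary least-squares fit. The sketch below is a minimal version of that swapped-in technique; it borrows the central time and temperature values from the worked example later in this section and ignores their uncertainties. All function and variable names are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

times = [20.0, 40.0, 60.0, 80.0]          # seconds
temperatures = [84.5, 87.0, 90.1, 94.9]   # degrees Celsius (central values only)

slope, intercept = least_squares_line(times, temperatures)
print(f"slope = {slope:.4f} degrees C per second, intercept = {intercept:.2f} degrees C")
```

A larger slope corresponds to a steeper line, as described above; the intercept is the fitted value of y at x = 0.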

Calculating uncertainty in a plot

In a plot or a graph with error bars, there can be many lines passing between the bars. We can calculate the uncertainty of the data using the error bars and the lines passing between them. See the following example of three lines passing between values with error bars:

Fig. 4 — Plot showing uncertainty bars and three lines passing between them. The blue and purple lines begin at the extreme values of the uncertainty bars

How to calculate the uncertainty in a plot

To calculate the uncertainty in a plot, we need to know the uncertainty values in the plot.

Calculate two lines of best fit.
The first line (the green one in the image above) goes from the highest value of the first error bar to the lowest value of the last error bar.
The second line (red) goes from the lowest value of the first error bar to the highest value of the last error bar.
Calculate the slope m of each line using m = (y2 – y1) / (x2 – x1).
For the first line, y2 is the value of the last point minus its uncertainty, while y1 is the value of the first point plus its uncertainty. The values x2 and x1 are the corresponding values on the x-axis.
For the second line, y2 is the value of the last point plus its uncertainty, while y1 is the value of the first point minus its uncertainty. The values x2 and x1 are the corresponding values on the x-axis.
Subtract the smaller slope from the larger one and divide the result by two to obtain the uncertainty: Δm = (m_max – m_min) / 2.
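The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the two data points and their error bars are hypothetical, the function names are invented for this sketch, and the uncertainty is taken as half the difference between the steepest and shallowest slopes, following the subtraction used in the worked example below.

```python
def slope(x1, y1, x2, y2):
    """Slope of the straight line through (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)

def slope_with_uncertainty(first, last):
    """first and last are (x, y, dy) tuples: position, value, and error-bar half-height."""
    x1, y1, dy1 = first
    x2, y2, dy2 = last
    # Shallowest line: top of the first error bar to bottom of the last one
    m_shallow = slope(x1, y1 + dy1, x2, y2 - dy2)
    # Steepest line: bottom of the first error bar to top of the last one
    m_steep = slope(x1, y1 - dy1, x2, y2 + dy2)
    best = (m_steep + m_shallow) / 2.0
    uncertainty = abs(m_steep - m_shallow) / 2.0
    return best, uncertainty

# Hypothetical measurements: (x, y, uncertainty in y)
first_point = (0.0, 10.0, 1.0)
last_point = (10.0, 30.0, 2.0)

best_slope, slope_uncertainty = slope_with_uncertainty(first_point, last_point)
print(f"slope = {best_slope:.2f} +/- {slope_uncertainty:.2f}")
```

The average of the two extreme slopes gives a central estimate of the slope, while half their difference gives its uncertainty.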

Let’s look at an example of this, using temperature vs time data.

Calculate the uncertainty of the data in the plot below.

Figure 6. Plot showing uncertainty bars and three lines passing between them. The red and green lines begin at the extreme values of the uncertainty bars. Source: Manuel R. Camacho, StudySmarter.

The plot is used to approximate the uncertainty of the data given in the table below.

Time (s)	20	40	60	80
Temperature in Celsius	84.5 ± 1	87 ± 0.9	90.1 ± 0.7	94.9 ± 1

To calculate the uncertainty, you need to draw the line with the highest slope (in red) and the line with the lowest slope (in green).

In order to do this, you need to consider the steeper and the less steep slopes of a line that passes between the points, taking into account the error bars. This method will give you just an approximate result depending on the lines you choose.

You calculate the slope of the red line as below, taking the points from t=80 and t=60.

You now calculate the slope of the green line, taking the points from t=80 and t=20.

Now you subtract the slope of the green one (m2) from the slope of the red one (m1) and divide by 2.

As our temperature measurements are given to only one decimal place, we round the result to two decimal places: the uncertainty in the slope is approximately 0.06 °C per second.

Estimation of Errors — Key takeaways

You can estimate the errors of a measured value by comparing it to a standard value or reference value.
The error can be estimated as an absolute error, a percentage error, or a relative error.
The absolute error measures the total difference between the obtained (measured) value x0 and the value you expect from the measurement (xref). It is equal to the absolute value of their difference, Abs = | x0 – xref |.
The relative and percentage errors measure the fraction of the difference between the expected value and the measured value. The relative error is the absolute error divided by the expected value, rel = Abs / | xref |, and the percentage error is the relative error expressed as a percentage, per = (Abs / | xref |) * 100. You must add the percentage symbol for percentage errors.
You can approximate the relationship between your measured values using a linear function. This approximation can be made simply by drawing a line, which must be the line that passes closest to all values (the line of best fit).

Correlation and Regression

Andrew F. Siegel, Michael R. Wagner, in Practical Business Statistics (Eighth Edition), 2022

The Standard Error of Estimate: How Large Are the Prediction Errors?

The standard error of estimate, denoted S_e here (but often denoted S in computer printouts), tells you approximately how large the prediction errors (residuals) are for your data set in the same units as Y. How well can you predict Y? The answer is to within about S_e above or below. Because you usually want your forecasts and predictions to be as accurate as possible, you would be glad to find a small value for S_e. You can interpret S_e as a standard deviation in the sense that if you have a normal distribution for the prediction errors, then you will expect about two-thirds of the data points to fall within a distance S_e either above or below the regression line. Also, about 95% of the data values should fall within 2S_e, and so forth. This is illustrated in Fig. 11.2.10 for the production cost example.

Fig. 11.2.10. The standard error of estimate, S_e, indicates approximately how much error you make when you use the predicted value for Y (on the least-squares line) instead of the actual value of Y. You may expect about two-thirds of the data points to be within S_e above or below the least-squares line for a data set with a normal linear relationship, such as this one.

The standard error of estimate may be found using the following formulas:

Standard Error of Estimate

$$S_e = S_Y\sqrt{(1-r^2)\,\frac{n-1}{n-2}} \quad \text{(for computation)}$$

$$S_e = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\left[Y_i-(a+bX_i)\right]^2} \quad \text{(for interpretation)}$$

The first formula shows how S_e is computed by reducing S_Y according to the correlation and sample size. Indeed, S_e will usually be smaller than S_Y because the line a + bX summarizes the relationship and therefore comes closer to the Y values than does the simpler summary, Ȳ. The second formula shows how S_e can be interpreted as the estimated standard deviation of the residuals: The squared prediction errors are averaged by dividing by n − 2 (the appropriate number of degrees of freedom when two numbers, a and b, have been estimated), and the square root undoes the earlier squaring, giving you an answer in the same measurement units as Y.

For the production cost data, the correlation was found to be r = 0.869193, the variability in the individual cost numbers is S_Y = $389.6131, and the sample size is n = 18. The standard error of estimate is therefore

$$S_e = S_Y\sqrt{(1-r^2)\,\frac{n-1}{n-2}} = 389.6131\sqrt{(1-0.869193^2)\,\frac{18-1}{18-2}} = 389.6131\sqrt{(0.244503)\,\frac{17}{16}} = 389.6131\sqrt{0.259785} = \$198.58$$

This tells you that, for a typical week, the actual cost was different from the predicted cost (on the least-squares line) by about $198.58. Although the least-squares prediction line takes full advantage of the relationship between cost and number produced, the predictions are far from perfect.
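The computational formula can be verified with a few lines of Python. The inputs are the values quoted above for the production cost data; the function name is illustrative only.

```python
import math

def standard_error_from_correlation(s_y, r, n):
    """S_e = S_Y * sqrt((1 - r^2) * (n - 1) / (n - 2)), the computational form."""
    return s_y * math.sqrt((1.0 - r ** 2) * (n - 1) / (n - 2))

# Production cost data quoted in the text
s_y = 389.6131   # standard deviation of the weekly costs, in dollars
r = 0.869193     # correlation between cost and number produced
n = 18           # number of weeks

s_e = standard_error_from_correlation(s_y, r, n)
print(f"S_e = ${s_e:.2f}")
```

Because 1 − r² is well below 1 here, S_e comes out at roughly half of S_Y, which is the reduction described in the paragraph above.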

URL:

https://www.sciencedirect.com/science/article/pii/B9780128200254000117

Detection of a Trend in Population Estimates

William L. Thompson, … Charles Gowan, in Monitoring Vertebrate Populations, 1998

5.2 VARIANCE COMPONENTS

In this section, we discuss sources of variation that must be considered to make inferences from data when trying to detect trends. Three sources of variation must be considered: sampling variation, temporal variation in the population dynamics process, and spatial variation in the dynamics of the population across space. The latter two sources often are referred to as process variation, i.e., variation in the population dynamics process associated with environmental variation (such as rainfall, temperature, community succession, fires, or elevation). Methods to separate process variation from sampling variation will be presented.

Detection of a trend in a population’s size requires at least two abundance estimates. For example, if the population size of Mexican spotted owls in Mesa Verde National Park is determined as 50 pairs in 1990, and as only 10 pairs in 1995, we would be concerned that a significant negative trend in the population exists during this time period, and that action must be taken to alleviate the trend. However, if the 1995 estimate was 40 pairs, we might still be concerned, but would be less confident that immediate action is required. Two sources of variation must be assessed before we are confident of our inference from these estimates.

The first source of variation is the uncertainty we have in our population estimates. We want to be sure that the two estimates are different, i.e., the difference between the two estimates is greater than would be expected from chance alone because of the sampling errors associated with each estimate. Typically, we present our uncertainty in our estimate as its variance, and use this variance to generate a confidence interval for our estimate. Suppose that the 1990 estimate of $\hat{N}_{90} = 50$ pairs has a sampling variance of $\widehat{\mathrm{Var}}(\hat{N}_{90}) = 25$. Then, under the assumption of the estimate being normally distributed with a large sample size (i.e., large degrees of freedom), we would compute a 95% confidence interval as $50 \pm 1.96\sqrt{25}$, or 40.2–59.8. If the 1995 estimate was $\hat{N}_{95} = 40$ with a sampling variance of $\widehat{\mathrm{Var}}(\hat{N}_{95}) = 20$, then the 95% confidence interval for this estimate is $40 \pm 1.96\sqrt{20}$, or 31.2–48.8. Based on the overlap of the two confidence intervals (Fig. 5.2), we would conclude that by chance alone, these two estimates are probably not different. We also could compute a simple test as

$$z = \frac{\hat{N}_{90}-\hat{N}_{95}}{\sqrt{\widehat{\mathrm{Var}}(\hat{N}_{90})+\widehat{\mathrm{Var}}(\hat{N}_{95})}}, \tag{5.3}$$

which for this example results in z = 1.491, with a probability of observing a z statistic this large or larger of P = 0.136. Although we might be alarmed, the chances are that 13.6 times out of 100 we would observe this large of a change just by random chance.

Figure 5.2. The 95% confidence intervals plotted with the 1990 and 1995 population estimates.

A variation of the previous test is commonly conducted for several reasons: (1) we often are interested in the ratio of two population estimates (rather than the difference) because a ratio represents the rate of change of the population, (2) the variance of $\hat{N}$ is usually linked to its estimate by $\widehat{\mathrm{Var}}(\hat{N}) = \hat{N}C$ (e.g., Skalski and Robson, 1992, pp. 28–29), and (3) $\ln(\hat{N})$ is more likely to be normally distributed than $\hat{N}$. Fortuitously, a log transformation provides some correction to all three of the above reasons and results in a more efficient statistical procedure. Because

$$\mathrm{Var}[\ln(\hat{N})] = \frac{\mathrm{Var}(\hat{N})}{\hat{N}^2}, \tag{5.4}$$

we construct the z test as

$$z = \frac{\ln(\hat{N}_{90})-\ln(\hat{N}_{95})}{\sqrt{\widehat{\mathrm{Var}}[\ln(\hat{N}_{90})]+\widehat{\mathrm{Var}}[\ln(\hat{N}_{95})]}} \tag{5.5}$$

to provide a more efficient (i.e., more powerful) test.

Suppose we had made a much more intensive effort in sampling the owl population, so that the sampling variances were one-half of the values observed (which would generally take about 4 times the effort). Thus, $\widehat{\mathrm{Var}}(\hat{N}_{90}) = 12.5$ and $\widehat{\mathrm{Var}}(\hat{N}_{95}) = 10$, giving a z statistic of 2.108 with probability value of P = 0.035. Now, we would conclude that the owl population was lower in 1995 than in 1990, and that this difference is unlikely due to variation in our samples, i.e., that an actual reduction in population size has taken place.
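The confidence intervals and z tests in this section can be reproduced with the Python standard library. The sketch below is an illustration only: the function names are invented here, the estimates and variances are the ones quoted in the text, and the P values are computed as two-sided normal tail probabilities, which matches the reported 0.136 and 0.035.

```python
import math

def confidence_interval(estimate, variance, z_crit=1.96):
    """Normal-theory interval: estimate +/- z_crit * sqrt(variance)."""
    half_width = z_crit * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

def z_difference(n1, var1, n2, var2):
    """Eq. (5.3): difference of two estimates over the root of the summed variances."""
    return (n1 - n2) / math.sqrt(var1 + var2)

def z_log_ratio(n1, var1, n2, var2):
    """Eq. (5.5), using Var[ln(N)] = Var(N) / N^2 from Eq. (5.4)."""
    var_ln1 = var1 / n1 ** 2
    var_ln2 = var2 / n2 ** 2
    return (math.log(n1) - math.log(n2)) / math.sqrt(var_ln1 + var_ln2)

def two_sided_p(z):
    """Two-sided tail probability of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

ci_1990 = confidence_interval(50, 25)
ci_1995 = confidence_interval(40, 20)
z_obs = z_difference(50, 25, 40, 20)
z_log = z_log_ratio(50, 25, 40, 20)
z_half = z_difference(50, 12.5, 40, 10)   # sampling variances halved

print(f"1990 CI: {ci_1990[0]:.1f} to {ci_1990[1]:.1f}")
print(f"1995 CI: {ci_1995[0]:.1f} to {ci_1995[1]:.1f}")
print(f"z = {z_obs:.3f}, P = {two_sided_p(z_obs):.3f}")
print(f"log-scale z = {z_log:.3f}, P = {two_sided_p(z_log):.3f}")
print(f"halved variances: z = {z_half:.3f}, P = {two_sided_p(z_half):.3f}")
```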

This leads us to the second variance component associated with determining whether a trend in the population is important. We would expect the size of the owl population (and any other population, for that matter) to fluctuate through time. How can we determine if this reduction is important? The answer lies in determining what the variation in the owl population has been for some period of time in the past, and then if the observed reduction is outside the range expected from this past fluctuation. Consider the example in Fig. 5.3, where the true population size (no sampling variation) is plotted. The population fluctuates around a mean of 50, but values more extreme than the range 40 to 60 are common. Note that a decline from 76 to 29 pairs occurred from 1984 to 1985, and that declines from over 50 pairs to under 40 pairs are fairly common occurrences. Thus, based on our previous example, a decline from 50 to 40 is not at all unreasonable given the past population dynamics of this hypothetical population.

Figure 5.3. Actual number of pairs of owls that exist each year. In reality, we never know these values, and can only estimate them.

To determine the level of change in population size that should receive our attention and suggest management action, we need to know something about the temporal variation in the population. The only way to estimate this variance component is to observe the population across a number of years. The exact number of years will depend on the magnitude of the temporal variation. Thus, if the population does not change much from year to year, a few observations will show this consistency. On the other hand, if the population fluctuates a lot, as in Fig. 5.3, many years of observations are needed to estimate the temporal variance. For the example in Fig. 5.3, we could compute the temporal variance as the variance of the 16 annual values. We find a variance of 265.6, or a standard deviation of 16.3 (Example 5.1). With a SD of 16.3, we would expect roughly 95% of the population values to be in the range of ±2 SD of the mean population size. This inference is based on the population being stable, i.e., not having an upward or downward trend, and being roughly normally distributed. For a normal distribution, 95% of the values lie in the interval ±2 SD of the mean. Therefore, a change of 2 SD, or 32.6, is not a particularly big change given the temporal variation observed over the 15-year period. Such a change should occur with probability greater than 1/20, or 0.05.

A complicating problem with estimating the temporal variance of a population’s size is that we are seldom allowed to observe the true value of the population size. Rather, we are required to sample the population, and hence only obtain an estimate of the population size each year, with its associated sampling variance. Thus, we would need to include the 95% confidence bars on the annual estimates. As a result of this uncertainty from our sampling procedure, we would conclude that many of the year-to-year changes were not really changes because the estimates were not different. This complication leads to a further problem. If we compute the variance with the usual formula when estimates of population size replace the actual population size shown in Fig. 5.3, we obtain a variance estimate larger than the true temporal variance because our sampling uncertainty is included in the variance. For low levels of sampling effort each year, we would have a high sampling variance associated with each estimate, and as a result, we would have a high variance across years. The noise associated with our low sampling intensity would suggest that the population is fluctuating widely, when in fact the population could be constant (i.e., temporal variance is zero), and the estimated changes in the population are just due to sampling variance.

This mixture of sampling and temporal variation becomes particularly important in population viability analysis (PVA). The objective of a PVA is to estimate the probability of extinction for a population, given current size, and some idea of the variation in the population dynamics (i.e., temporal variation). If our estimate of temporal variation includes sampling variation, and the level of effort to obtain the estimates is relatively low, the high sampling variation causes our naive estimate of temporal variation to be much too large. When we apply our PVA analysis with this inflated estimate of temporal variance, we conclude that the population is much more likely to go extinct than it really is, and hence the importance of separating sampling variation from process variation.

Typically, we estimate variance components with analysis of variance (ANOVA) procedures. For the example considered here, we would have to have at least two estimates of population size for a series of years to obtain valid estimates of sampling and temporal variation. Further, typical ANOVA techniques assume that the sampling variation is constant, and so do not account for differences in levels of effort, or the fact that sampling variance is usually a function of population size. For our example, we have an estimate of sampling variance for each of our estimates, obtained from the population estimation methods considered in this manual. That is, capture–recapture, mark–resight, line transects, removal methods, and quadrat counts all produce estimates of sampling variation. Thus, we do not want to estimate sampling variation by obtaining replicate estimates, but want to use the available estimate. Therefore, we present a method of moments estimator developed in Burnham et al. (1987, Part 5). Skalski and Robson (1992, Chapter 2) also present a similar procedure, but do not develop the weighted estimator presented here.

Example 5.1 Population Size, Estimates, Standard Error of the Estimates, and Confidence Intervals for Owl Pairs in Fig. 5.3

Year	Population	Estimate	Standard error	Lower 95% CI	Upper 95% CI
1980	44	40.04	5.926	28.42	51.66
1981	48	50.51	11.004	28.94	72.08
1982	61	61.36	15.278	31.42	91.31
1983	48	47.6	11.062	25.92	69.28
1984	76	95.51	18.988	58.3	132.72
1985	29	33.81	8.803	16.56	51.06
1986	60	34.39	5.804	23.01	45.76
1987	59	38.52	11.168	16.63	60.41
1988	76	84.57	21.312	42.8	126.34
1989	42	30.04	6.918	16.48	43.6
1990	29	20.29	7.529	5.54	35.05
1991	68	68.42	17.969	33.2	103.64
1992	42	45.51	13.225	19.6	71.44
1993	27	27.01	6.137	14.98	39.04
1994	72	71.12	14.511	42.67	99.56
1995	54	51.45	8.054	35.66	67.24

The variance of the n = 16 populations is 265.628, whereas the variance of the 16 estimates is 450.376. Sampling variation causes the estimates to have a larger variance than the actual population. The difference of these two variances is an estimate of the sampling variation, i.e., 450.376 – 265.628 = 184.748. The square root of 184.748 is 13.592, and is the approximate mean of the 16 reported standard errors.
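The variances quoted in this note can be recomputed from the table with Python's `statistics` module. This is a sketch for checking the arithmetic; the variable names are illustrative, and small differences in the last digits are expected because the tabulated estimates are rounded.

```python
import math
import statistics

# Columns of Example 5.1 (1980 to 1995)
population = [44, 48, 61, 48, 76, 29, 60, 59, 76, 42, 29, 68, 42, 27, 72, 54]
estimates = [40.04, 50.51, 61.36, 47.60, 95.51, 33.81, 34.39, 38.52,
             84.57, 30.04, 20.29, 68.42, 45.51, 27.01, 71.12, 51.45]

# Sample variances (divisor n - 1)
var_population = statistics.variance(population)
var_estimates = statistics.variance(estimates)

# The excess variance of the estimates is attributed to sampling variation
sampling_component = var_estimates - var_population

print(f"variance of true population sizes: {var_population:.3f}")
print(f"variance of the estimates:         {var_estimates:.3f}")
print(f"difference (sampling variation):   {sampling_component:.3f}")
print(f"square root of the difference:     {math.sqrt(sampling_component):.3f}")
print(f"SD of true population sizes:       {math.sqrt(var_population):.1f}")
```

Note that this subtraction is only possible in a constructed example, because the true population sizes are never observed in practice.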

To obtain an unbiased estimate of the temporal variance, we must remove the sampling variation from the estimate of the total variance. Define $\sigma^2_{\text{total}}$ as the total variance, estimated for n = 16 estimates of owl pairs ($\hat{N}_i$, i = 1980, …, 1995) as

$$\hat{\sigma}^2_{\text{total}} = \frac{\sum_{i=1980}^{1995}(\hat{N}_i-\bar{N})^2}{n-1} = \frac{\sum_{i=1980}^{1995}\hat{N}_i^2-\left(\sum_{i=1980}^{1995}\hat{N}_i\right)^2/n}{n-1}, \tag{5.6}$$

where the symbol ^ (hat) indicates the estimate of the parameter. Thus, $\hat{N}_i$ are the estimates of the actual populations, $N_i$, and $\hat{\sigma}^2_{\text{total}}$ is an estimate of the total variance $\sigma^2_{\text{total}}$. For each estimate, $\hat{N}_i$, we also have an associated sampling variance, $\hat{\sigma}^2_i$. Then, a simple estimator of the temporal variance, $\sigma^2_{\text{time}}$, is given by

$$\hat{\sigma}^2_{\text{time}} = \hat{\sigma}^2_{\text{total}}-\frac{\sum_{i=1980}^{1995}\hat{\sigma}^2_i}{n}, \tag{5.7}$$

when we can assume that all of the sampling variances, $\hat{\sigma}^2_i$, are equal. The above equation corresponds to Eq. (2.6) of Skalski and Robson (1992). When the $\hat{\sigma}^2_i$ cannot all be assumed to be equal, a more complex calculation is required (Burnham et al., 1987, Section 4.3) because each estimate must be weighted by its sampling variance. We take as the weight of each estimate the reciprocal of the sum of temporal variance plus the sampling variance, $1/(\hat{\sigma}^2_{\text{time}}+\hat{\sigma}^2_i)$. That is, $\mathrm{Var}(\hat{N}_i) = \hat{\sigma}^2_{\text{time}}+\hat{\sigma}^2_i$, so $w_i = 1/\mathrm{Var}(\hat{N}_i) = 1/(\hat{\sigma}^2_{\text{time}}+\hat{\sigma}^2_i)$. Then, the weighted total variance is computed as

$$\hat{\sigma}^2_{\text{total}} = \frac{\sum_{i=1980}^{1995}w_i(\hat{N}_i-\bar{N})^2}{(n-1)\sum_{i=1980}^{1995}w_i} \tag{5.8}$$

with the mean of the estimates now computed as a weighted mean,

$$\bar{N} = \frac{\sum_{i=1980}^{1995}w_i\hat{N}_i}{\sum_{i=1980}^{1995}w_i}. \tag{5.9}$$

We now know that the theoretical variance of $\bar{N}$ is

$$\mathrm{Var}(\bar{N}) = \mathrm{Var}\left(\frac{\sum_{i=1980}^{1995}w_i\hat{N}_i}{\sum_{i=1980}^{1995}w_i}\right) = \frac{1}{\sum_{i=1980}^{1995}w_i} \tag{5.10}$$

and the empirical variance estimator is Eq. (5.8). Setting these two equations equal,

$$\frac{1}{\sum_{i=1980}^{1995}w_i} = \frac{\sum_{i=1980}^{1995}w_i(\hat{N}_i-\bar{N})^2}{(n-1)\sum_{i=1980}^{1995}w_i} \tag{5.11}$$

$$1 = \frac{\sum_{i=1980}^{1995}w_i(\hat{N}_i-\bar{N})^2}{n-1}. \tag{5.12}$$

Because we cannot solve for $\hat{\sigma}^2_{\text{time}}$ directly, we have to use an iterative numerical approach to estimate $\hat{\sigma}^2_{\text{time}}$. This procedure involves substituting values of $\hat{\sigma}^2_{\text{time}}$ into Eq. (5.12) via the $w_i$ until the two sides are equal. When both sides are the same, we have our estimate of $\hat{\sigma}^2_{\text{time}}$. Using this estimate of $\hat{\sigma}^2_{\text{time}}$, we can now decide what level of change in $\hat{N}_i$ to $\hat{N}_{i+1}$ is important and deserves attention. If the change from a series of estimates is greater than $2\sqrt{\hat{\sigma}^2_{\text{time}}}$, we may want to take action.
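The iterative step can be sketched as a simple bisection search on Eq. (5.12). This is one possible numerical approach, not necessarily the one used by Burnham et al. (1987); the function names are invented for illustration. The inputs are the estimates and standard errors from Example 5.1, with each sampling variance taken as the squared standard error, and the search relies on the weighted sum in Eq. (5.12) decreasing as the trial temporal variance increases.

```python
def moment_equation(var_time, estimates, sampling_vars):
    """Right-hand side of Eq. (5.12) for a trial value of the temporal variance."""
    weights = [1.0 / (var_time + v) for v in sampling_vars]
    # Weighted mean of the estimates, Eq. (5.9)
    mean_w = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    n = len(estimates)
    return sum(w * (e - mean_w) ** 2 for w, e in zip(weights, estimates)) / (n - 1)

def solve_temporal_variance(estimates, sampling_vars, iterations=200):
    """Find var_time such that Eq. (5.12) equals 1; return 0 if no positive root exists."""
    if moment_equation(0.0, estimates, sampling_vars) <= 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while moment_equation(hi, estimates, sampling_vars) > 1.0:
        hi *= 2.0   # expand the bracket until the equation drops below 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if moment_equation(mid, estimates, sampling_vars) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

estimates = [40.04, 50.51, 61.36, 47.60, 95.51, 33.81, 34.39, 38.52,
             84.57, 30.04, 20.29, 68.42, 45.51, 27.01, 71.12, 51.45]
std_errors = [5.926, 11.004, 15.278, 11.062, 18.988, 8.803, 5.804, 11.168,
              21.312, 6.918, 7.529, 17.969, 13.225, 6.137, 14.511, 8.054]
sampling_vars = [se ** 2 for se in std_errors]

var_time = solve_temporal_variance(estimates, sampling_vars)
check = moment_equation(var_time, estimates, sampling_vars)
print(f"estimated temporal variance: {var_time:.1f}")
print(f"Eq. (5.12) at the solution:  {check:.6f}")
```

Returning zero when the equation is already at or below 1 for a temporal variance of zero is a design choice of this sketch: it keeps the variance estimate from going negative when sampling noise alone can explain the spread of the estimates.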

Typically, we do not have the luxury of enough background data to estimate σˆtime2, so we end up trying to evaluate whether a series of estimated population sizes is in fact signaling a decline in the population when both sampling and process variance are present. Note that just because we see a decline of the estimates for 3–4 consecutive years, we cannot be sure that the population is actually in a serious decline without knowledge of the mean population size and the temporal variation prior to the decline. Usually, however, we do not have good knowledge of the population size prior to some observed decline, and make a decision to act based on biological perceptions. Keep in mind the kinds of trends displayed in Fig. 5.1. Is the suggested trend part of a cycle, or are we observing a real change in population size? In this discussion, we have only considered temporal variation. A similar procedure can be used to separate spatial variation from sampling variation.

URL:

https://www.sciencedirect.com/science/article/pii/B9780126889604500058

Multiple Regression

Andrew F. Siegel, Michael R. Wagner, in Practical Business Statistics (Eighth Edition), 2022

Typical Prediction Error: Standard Error of Estimate

Just as for simple regression, with only one X, the standard error of estimate indicates the approximate size of the prediction errors. For the magazine ads example, S_e = $106,072. This tells you that actual page costs for these magazines are typically within about $106,072 from the predicted page costs, in the sense of a standard deviation. That is, if the error distribution is normal, then you would expect about two-thirds of the actual page costs to be within S_e of the predicted page costs, about 95% to be within 2S_e, and so forth.

The standard error of estimate, S_e = $106,072, indicates the remaining variation in page costs after you have used the X variables (audience, percent male, and median income) in the regression equation to predict page costs for each magazine. Compare this to the ordinary univariate standard deviation, S_Y = $163,549, for the page costs, computed by ignoring all the other variables. This standard deviation, S_Y, indicates the remaining variation in page costs after you have used only Y¯ to predict the page costs for each magazine. Note that S_e = $106,072 is smaller than S_Y = $163,549; your errors are typically smaller if you use the regression equation instead of just Y¯ to predict page costs. This suggests that the X variables are helpful in explaining page costs.

Think of the situation this way. If you knew nothing of the X variables, you would use the average page costs (Y¯=$187,628) as your best guess, and you would be wrong by about S_Y = $163,549. But if you knew the audience, percent male readership, and median reader income, you could use the regression equation to find a prediction for page costs that would be wrong by only S_e = $106,072. This reduction in prediction error (from $163,549 to $106,072) is one of the helpful payoffs from running a regression analysis.
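The comparison between S_e and S_Y can be reproduced on any data set. The sketch below uses a small made-up data set (not the magazine ads data, which is not reproduced here) and a hypothetical helper `fit_and_compare` that fits a multiple regression by solving the normal equations in plain Python; it is meant only to show how the two quantities are computed, assuming the usual n − (k + 1) degrees of freedom for S_e and n − 1 for S_Y.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_and_compare(rows, y):
    """Return (coefficients, S_e, S_Y) for a least-squares fit with intercept."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])  # k + 1 parameters, including the intercept
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    s_e = math.sqrt(sse / (n - p))  # standard error of estimate
    y_bar = sum(y) / n
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))  # univariate SD
    return coef, s_e, s_y


# Made-up data: two X variables and one Y (illustrative only).
x_rows = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 9), (6, 2), (7, 7), (8, 4)]
y_vals = [14.8, 15.3, 20.1, 18.1, 24.7, 23.1, 27.2, 28.2]
coef, s_e, s_y = fit_and_compare(x_rows, y_vals)
print("S_e =", round(s_e, 3), " S_Y =", round(s_y, 3))
```

When the X variables carry real information about Y, S_e comes out smaller than S_Y, which is the reduction in prediction error described above.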

URL:

https://www.sciencedirect.com/science/article/pii/B9780128200254000129

Multiple Regression

Gary Smith, in Essential Statistics, Regression, and Econometrics, 2012

Confidence Intervals for the Coefficients

If the error term is normally distributed and satisfies the four assumptions detailed in the simple regression chapter, the estimators are normally distributed with expected values equal to the parameters they estimate:

$$a \sim N[\alpha,\ \text{standard deviation of } a]$$

$$b_i \sim N[\beta_i,\ \text{standard deviation of } b_i]$$

To compute the standard errors (the estimated standard deviations) of these estimators, we need to use the standard error of estimate (SEE) to estimate the standard deviation of the error term:

$$\mathrm{SEE} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - (k+1)}} \tag{10.3}$$

Because n observations are used to estimate k + 1 parameters, we have n − (k + 1) degrees of freedom. After choosing a confidence level, such as 95 percent, we use the t distribution with n − (k + 1) degrees of freedom to determine the value t* that corresponds to this probability. The confidence interval for each coefficient is equal to the estimate plus or minus the requisite number of standard errors:

$$a \pm t^*(\text{standard error of } a), \qquad b_i \pm t^*(\text{standard error of } b_i) \tag{10.4}$$

For our consumption function, statistical software calculates SEE = 59.193 and these standard errors:

standard error of a = 27.327
standard error of b₁ = 0.019
standard error of b₂ = 0.003

With 49 observations and 2 explanatory variables, we have 49 − (2 + 1) = 46 degrees of freedom. Table A.2 gives t* = 2.013 for a 95 percent confidence interval, so that 95 percent confidence intervals are

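$$\begin{aligned}
\alpha &:\ a \pm t^*(\text{standard error of } a) = -110.126 \pm 2.013(27.327) = -110.126 \pm 55.010 \\
\beta_1 &:\ b_1 \pm t^*(\text{standard error of } b_1) = 0.798 \pm 2.013(0.019) = 0.798 \pm 0.039 \\
\beta_2 &:\ b_2 \pm t^*(\text{standard error of } b_2) = 0.026 \pm 2.013(0.003) = 0.026 \pm 0.006
\end{aligned}$$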
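The interval arithmetic above is easy to check with a short script. The sketch below takes the estimates, standard errors, and t* = 2.013 quoted in the text; the function name `confidence_interval` is just a label chosen for this illustration. Because the quoted standard errors are rounded to three decimals, a recomputed margin can differ in the last digit from the one printed in the text.

```python
def confidence_interval(estimate, std_error, t_star):
    """Return (lower, upper) for estimate +/- t_star * std_error."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin


T_STAR = 2.013  # t value for 46 degrees of freedom, 95 percent (from the text)

# (estimate, standard error) pairs quoted in the text
coefficients = {
    "a": (-110.126, 27.327),
    "b1": (0.798, 0.019),
    "b2": (0.026, 0.003),
}

for name, (estimate, std_error) in coefficients.items():
    lower, upper = confidence_interval(estimate, std_error, T_STAR)
    margin = T_STAR * std_error
    print(f"{name}: {estimate} +/- {margin:.3f} -> [{lower:.3f}, {upper:.3f}]")
```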

URL:

https://www.sciencedirect.com/science/article/pii/B9780123822215000106

Multiple Regression

Gary Smith, in Essential Statistics, Regression, and Econometrics (Second Edition), 2015

Confidence Intervals for the Coefficients

$$a \sim N[\alpha,\ \text{standard deviation of } a]$$

$$b_i \sim N[\beta_i,\ \text{standard deviation of } b_i]$$

To compute the standard errors (the estimated standard deviations) of these estimators, we need to use the standard error of estimate (SEE) to estimate the standard deviation of the error term:

$$\mathrm{SEE} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - (k+1)}} \tag{10.5}$$

Because n observations are used to estimate k + 1 parameters, we have n − (k + 1) degrees of freedom. After choosing a confidence level, such as 95 percent, we use the t distribution with n − (k + 1) degrees of freedom to determine the value t* that corresponds to this probability. The confidence interval for each coefficient is equal to the estimate plus or minus the requisite number of standard errors:

$$a \pm t^*(\text{standard error of } a), \qquad b_i \pm t^*(\text{standard error of } b_i) \tag{10.6}$$

For our consumption function, statistical software calculates SEE = 59.193 and these standard errors:

standard error of a = 27.327
standard error of b₁ = 0.019
standard error of b₂ = 0.003

With 49 observations and two explanatory variables, we have 49 − (2 + 1) = 46 degrees of freedom. Table A.2 gives t* = 2.013 for a 95 percent confidence interval, so that 95 percent confidence intervals are:

$$\begin{aligned}
\alpha &:\ a \pm t^*(\text{standard error of } a) = -110.126 \pm 2.013(27.327) = -110.126 \pm 55.010 \\
\beta_1 &:\ b_1 \pm t^*(\text{standard error of } b_1) = 0.798 \pm 2.013(0.019) = 0.798 \pm 0.039 \\
\beta_2 &:\ b_2 \pm t^*(\text{standard error of } b_2) = 0.026 \pm 2.013(0.003) = 0.026 \pm 0.006
\end{aligned}$$

URL:

https://www.sciencedirect.com/science/article/pii/B9780128034590000108

Simple Regression

Gary Smith, in Essential Statistics, Regression, and Econometrics (Second Edition), 2015

Abstract

The simple regression model assumes a linear relationship, Y = α + βX + ε, between a dependent variable Y and an explanatory variable X, with the error term ε encompassing omitted factors. The least squares estimates a and b minimize the sum of squared errors when the fitted line is used to predict the observed values of Y. The standard error of estimate (SEE) is our estimate of the standard deviation of the error term. The standard errors of the estimates a and b can be used to construct confidence intervals for α and β and test null hypotheses, most often that the value of β is zero (Y and X are not linearly related). The coefficient of determination R² compares the model’s sum of the squared prediction errors to the sum of the squared deviations of Y about its mean, and can be interpreted as the fraction of the variation in the dependent variable that is explained by the regression model. The correlation coefficient is equal to the square root of R².
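The quantities named in this abstract can be computed directly from their definitions. The following is a minimal sketch in plain Python, using a made-up data set and a hypothetical function name `simple_regression`; it assumes the usual n − 2 degrees of freedom for the SEE in simple regression. The correlation coefficient is computed separately from its own formula so that its square can be compared with R²; its sign matches the sign of the slope b.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = a + b*x with SEE, R^2, and correlation r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope estimate
    a = y_bar - b * x_bar         # intercept estimate
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))    # standard error of estimate
    r_squared = 1.0 - sse / syy       # fraction of variation explained
    r = sxy / math.sqrt(sxx * syy)    # correlation coefficient
    return {"a": a, "b": b, "see": see, "r_squared": r_squared, "r": r}


# Made-up data (illustrative only).
x_vals = [1.0, 2.0, 3.0, 4.0, 5.0]
y_vals = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_regression(x_vals, y_vals)
for key, value in result.items():
    print(key, "=", round(value, 4))
```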

URL:

https://www.sciencedirect.com/science/article/pii/B978012803459000008X

Bootstrap Method

K. Singh, M. Xie, in International Encyclopedia of Education (Third Edition), 2010

Approximating Standard Error of a Sample Estimate

Suppose information is sought about a population parameter $\theta$, and let $\hat{\theta}$ be a sample estimator of $\theta$ based on a random sample of size $n$; that is, $\hat{\theta}$ is a function of the data $(X_1, X_2, \ldots, X_n)$. To estimate the standard error of $\hat{\theta}$ as the sample varies over the class of all possible samples, one has the following simple bootstrap approach.

Compute $\theta_1^*, \theta_2^*, \ldots, \theta_N^*$ using the same computing formula as the one used for $\hat{\theta}$, but now base it on $N$ different bootstrap samples (each of size $n$). A crude recommendation for the size $N$ could be $N = n^2$ (in our judgment), unless $n^2$ is too large. In that case, it could be reduced to an acceptable size, say $n \log_e n$. One defines

$$SE_B(\hat{\theta}) = \left[\frac{1}{N} \sum_{i=1}^{N} (\theta_i^* - \hat{\theta})^2\right]^{1/2}$$

following the philosophy of bootstrap: replace the population by the empirical population.
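As a rough illustration of this recipe, the sketch below resamples the data with replacement and applies the formula for SE_B, centring on the original estimate as in the definition above. The function name, the default of N = n² resamples, the fixed seed, and the sample values are assumptions made for this example; only the Python standard library is used.

```python
import math
import random
import statistics


def bootstrap_standard_error(data, estimator, n_boot=None, seed=0):
    """Bootstrap estimate of the standard error of estimator(data).

    Draws n_boot resamples of size n with replacement and returns
    sqrt((1/N) * sum((theta_i* - theta_hat)^2)).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    if n_boot is None:
        n_boot = n * n  # the crude N = n^2 recommendation from the text
    theta_hat = estimator(data)
    total = 0.0
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # sample of size n with replacement
        total += (estimator(resample) - theta_hat) ** 2
    return math.sqrt(total / n_boot)


# Made-up sample (illustrative only); the median is the estimator of interest.
sample = [3.1, 4.7, 5.0, 5.2, 6.3, 6.8, 7.4, 8.9, 9.5, 12.0]
print("bootstrap SE of the median:",
      bootstrap_standard_error(sample, statistics.median))
print("bootstrap SE of the mean:  ",
      bootstrap_standard_error(sample, statistics.mean))
```

Passing the estimator as a function keeps the resampling loop unchanged whether the statistic is a mean, a median, or something more elaborate.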

An older resampling technique used for this purpose is Jackknife, though bootstrap is more widely applicable. The famous example where Jackknife fails while bootstrap is still useful is that of θˆ = the sample median.

URL:

https://www.sciencedirect.com/science/article/pii/B9780080448947013099

Pearson, Karl

M. Eileen Magnello, in Encyclopedia of Social Measurement, 2005

The Biometric School

Although Pearson’s success in attracting such large audiences in his Gresham lectures may have played a role in encouraging him to further develop his work in biometry, he resigned from the Gresham Lectureship on his doctor’s recommendation. Following the success of his Gresham lectures, Pearson began to teach statistics to students at UCL in October 1894. Not only did Galton’s work on his law of ancestral heredity enable Pearson to devise the mathematical properties of the product-moment correlation coefficient (which measures the relationship between two continuous variables) and simple regression (used for the linear prediction between two continuous variables) but also Galton’s ideas led to Pearson’s introduction of multiple correlation and part correlation coefficients, multiple regression and the standard error of estimate (for regression), and the coefficient of variation. By then, Galton had determined graphically the idea of correlation and regression for the normal distribution only. Because Galton’s procedure for measuring correlation involved measuring the slope of the regression line (which was a measure of regression instead), Pearson kept Galton’s “r” to symbolize correlation. Pearson later used the letter b (from the equation for a straight line) to symbolize regression. After Weldon had seen a copy of Pearson’s 1896 paper on correlation, he suggested to Pearson that he should extend the range for correlation from 0 to +1 (as used by Galton) so that it would include all values from −1 to +1.

Pearson achieved a mathematical resolution of multiple correlation and multiple regression, adumbrated in Galton’s law of ancestral heredity in 1885, in his seminal paper Regression, Heredity, and Panmixia in 1896, when he introduced matrix algebra into statistical theory. (Arthur Cayley, who taught at Cambridge when Pearson was a student, created matrix algebra by his discovery of the theory of invariants during the mid-19th century.) Pearson’s theory of multiple regression became important to his work on Mendel in 1904 when he advocated a synthesis of Mendelism and biometry. In the same paper, Pearson also introduced the following statistical methods: eta (η) as a measure for a curvilinear relationship, the standard error of estimate, multiple regression, and multiple and part correlation. He also devised the coefficient of variation as a measure of the ratio of a standard deviation to the corresponding mean expressed as a percentage.

By the end of the 19th century, he began to consider the relationship between two discrete variables, and from 1896 to 1911 Pearson devised more than 18 methods of correlation. In 1900, he devised the tetrachoric correlation and the phi coefficient for dichotomous variables. The tetrachoric correlation requires that both X and Y represent continuous, normally distributed, and linearly related variables, whereas the phi coefficient was designed for so-called point distributions, which implies that the two classes have two point values or merely represent some qualitative attribute. Nine years later, he devised the biserial correlation, where one variable is continuous and the other is discontinuous. With his son Egon, he devised the polychoric correlation in 1922 (which is very similar to canonical correlation today). Although not all of Pearson’s correlational methods have survived him, a number of these methods are still the principal tools used by psychometricians for test construction. Following the publication of his first three statistical papers in Philosophical Transactions of the Royal Society, Pearson was elected a fellow of the Royal Society in 1896. He was awarded the Darwin Medal from the Royal Society in 1898.

URL:

https://www.sciencedirect.com/science/article/pii/B0123693985002280
