Power alpha error

Four interrelated features of power can be summarized using the mnemonic BEAN:

B Beta Error (Power = 1 – Beta Error): Beta error (or Type II error) is the probability that a test of statistical significance will fail to reject the null hypothesis when it is false (e.g., when there really is an effect of training).

  • As beta error increases, power decreases.
E Effect Size: The effect size is the magnitude of the difference between the actual population mean and the null hypothesized mean (μ1 – μ0) relative to the standard deviation of scores (σ). When the effect size d = (μ1 – μ0) / σ in the population is larger, the null and population sampling distributions overlap less and power is greater.

  • As effect size increases, power increases (assuming no change in alpha or sample size).
A Alpha Error: Alpha error (or Type I error) is the probability that a statistical test will produce a statistically significant finding when the null hypothesis is true (e.g., there is no effect of training). For example, if alpha error = .05 and the null hypothesis is true, then out of 100 statistical tests, false significance would be found on average 5 times; the risk of false significance would be 5%. In practice, alpha is typically set by the researcher at .01 or .05.

  • As alpha error increases, power increases (assuming no change in effect size or sample size).
N Sample Size: As the sample size increases, the variability of sample means decreases. The population and null sampling distributions become narrower, overlapping to a lesser extent and making it easier to detect a difference between these distributions. This results in greater power.

  • As sample size increases, power increases (assuming no change in alpha or effect size).
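
The BEAN relationships above can be illustrated numerically. The following is a minimal sketch (not part of the original tutorial) for a one-tailed one-sample z-test, where power = 1 − Φ(z_α − d·√n); the specific values of d, n, and alpha are illustrative assumptions.

```python
# Minimal power sketch for a one-tailed one-sample z-test (illustrative values).
from scipy.stats import norm

def power_z_test(d, n, alpha=0.05):
    """Power to detect effect size d with n observations at significance level alpha."""
    z_crit = norm.ppf(1 - alpha)                # critical value under H0
    return 1 - norm.cdf(z_crit - d * n ** 0.5)  # power = 1 - beta

print(power_z_test(d=0.5, n=25, alpha=0.05))    # baseline: ~0.80
print(power_z_test(d=0.8, n=25, alpha=0.05))    # larger effect size -> more power
print(power_z_test(d=0.5, n=64, alpha=0.05))    # larger sample size -> more power
print(power_z_test(d=0.5, n=25, alpha=0.10))    # larger alpha -> more power (and more Type I risk)
```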

Select true or false for each scenario:

(Assuming no other changes.)
1. As effect size increases, power decreases.
2. As sample size increases, power increases.
3. As alpha error increases, power decreases.
4. Beta error is unrelated to power.

Check your answers:

1. False; as effect size increases, power increases.

2. True; as sample size increases, power increases.

3. False; as alpha error increases, power increases.

4. False; beta error = 1 – power.

In this tutorial, we will discuss each component of the BEAN mnemonic in greater detail.


INTRODUCTION

VADIM I. SERDOBOLSKII, in Multiparametric Statistics, 2008

The Kolmogorov Asymptotics

In 1967, Andrei Nikolaevich Kolmogorov was interested in the dependence of errors of discrimination on sample size. He solved the following problem. Let x be a normal observation vector, and let x̄_ν be the sample averages calculated over samples from populations ν = 1, 2. Suppose that the covariance matrix is the identity matrix. Consider a simplified discriminant function

g(x) = (\bar{x}_1 - \bar{x}_2)^T \left( x - (\bar{x}_1 + \bar{x}_2)/2 \right)

and the classification rule g(x) > 0 against g(x) ≤ 0. This function leads to the probability of errors α_n = Φ(−G/√D), where G and D are quadratic functions of sample averages having a noncentral χ² distribution. To isolate the principal parts of G and D, Kolmogorov proposed to consider not one statistical problem but a sequence of n-dimensional discriminant problems in which the dimension n increases along with the sample sizes N_ν, so that N_ν → ∞ and n/N_ν → λ_ν > 0, ν = 1, 2. Under these assumptions, he proved that the probability of error α_n converges in probability:

(7) \operatorname{plim}_{n \to \infty} \alpha_n = \Phi\!\left( -\frac{J - \lambda_1 + \lambda_2}{2\sqrt{J + \lambda_1 + \lambda_2}} \right),

where J is the square of the limit Euclidean “Mahalanobis distance” between the centers of the populations. This expression is remarkable in that it explicitly shows the dependence of the error probability on the dimension and the sample sizes. This new asymptotic approach was called the “Kolmogorov asymptotics.”
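
As a numerical illustration (not from the original text), the limit (7) can be evaluated directly; the value of J and the ratios λ_ν below are arbitrary assumptions chosen to show how the error grows with n/N_ν even for a fixed distance between the populations.

```python
# Sketch: evaluating the Kolmogorov limit (7) for illustrative values of J and lambda.
from math import sqrt
from scipy.stats import norm

def kolmogorov_limit_error(J, lam1, lam2):
    """Limiting error probability from formula (7)."""
    return norm.cdf(-(J - lam1 + lam2) / (2.0 * sqrt(J + lam1 + lam2)))

J = 4.0                                   # assumed squared limit Mahalanobis distance
for lam in (0.0, 0.5, 1.0, 2.0):          # lambda_1 = lambda_2 = n/N
    print(lam, kolmogorov_limit_error(J, lam, lam))
# lam = 0 recovers the classical error Phi(-sqrt(J)/2); larger lam inflates the error.
```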

Later, L. D. Meshalkin and the author of this book deduced formula (7) for a wide class of populations under the assumption that the variables are independent and populations approach each other in the parameter space (are contiguous) [45], [46].

In 1970, Yu. N. Blagoveshchenskii and A. D. Deev studied the probability of errors for the standard sample Fisher-Andersen-Wald discriminant function for two populations with unknown common covariance matrix. A. D. Deev used the fact that the probability of error coincides with the distribution function of g(x). He obtained an exact asymptotic expansion for the limit of the error probability α. The leading term of this expansion proved to be especially interesting. The limit probability of an error (of the first kind) proved to be

\alpha = \Phi\!\left( -\Theta\, \frac{J - \lambda_1 + \lambda_2}{2\sqrt{J + \lambda_1 + \lambda_2}} \right),

where the factor Θ = √(1 − λ), with λ = λ1λ2/(λ1 + λ2), accounts for the accumulation of estimation inaccuracies in the process of the covariance matrix inversion. It was called “the Deev formula.” This formula was thoroughly investigated numerically, and good agreement was demonstrated even for moderate n and N.

Note that, starting from Deev’s formulas, the discrimination errors can be reduced if the rule g(x) > θ against g(x) ≤ θ with θ = (λ1 − λ2)/2 ≠ 0 is used. A. D. Deev also noticed [18] that the half-sum of discrimination errors can be further decreased by weighting summands in the discriminant function.

After these investigations, it became obvious that by keeping terms of the order of n/N, one gains the possibility of using specifically multidimensional effects for the construction of improved discriminant and other procedures of multivariate analysis. The most important conclusion was that traditional consistent methods of multivariate statistical analysis should be improvable, and that new progress in theoretical statistics is possible, aimed at obtaining nearly optimal solutions for fixed samples.

The Kolmogorov asymptotics (increasing-dimension asymptotics [3]) may be considered as a calculation tool for isolating leading terms in the case of large dimension. But the principal role of the Kolmogorov asymptotics is that it reveals specific regularities produced by the estimation of a large number of parameters. In a series of further publications, this asymptotics was used as the main tool for the investigation of essentially many-dimensional phenomena characteristic of high-dimensional statistical analysis. The ratio n/N became an acknowledged characteristic in many-dimensional statistics.

In Section 5.1, the Kolmogorov asymptotics is applied to develop a theory that improves the discriminant analysis of vectors of large dimension with independent components. The improvement is achieved by introducing appropriate weights for the contributions of the independent variables in the discriminant function. These weights are used to construct an asymptotically unimprovable discriminant procedure. Then, the problem of selecting variables for discrimination is solved, and the optimum selection threshold is found.

But the main success in the development of multiparametric solutions was achieved by combining the Kolmogorov asymptotics with the spectral theory of random matrices, developed independently at the end of the 20th century in another field.

URL: https://www.sciencedirect.com/science/article/pii/B9780444530493500047

Markov Chains

Mark A. Pinsky, Samuel Karlin, in An Introduction to Stochastic Modeling (Fourth Edition), 2011

Problems

3.1.1

A simplified model for the spread of a disease goes this way: The total population size is N = 5, of which some are diseased and the remainder are healthy. During any single period of time, two people are selected at random from the population and assumed to interact. The selection is such that an encounter between any pair of individuals in the population is just as likely as between any other pair. If one of these persons is diseased and the other not, with probability α = 0.1 the disease is transmitted to the healthy person. Otherwise, no disease transmission takes place. Let Xn denote the number of diseased persons in the population at the end of the nth period. Specify the transition probability matrix.

3.1.2

Consider the problem of sending a binary message, 0 or 1, through a signal channel consisting of several stages, where transmission through each stage is subject to a fixed probability of error α. Suppose that X0 = 0 is the signal that is sent and let Xn be the signal that is received at the nth stage. Assume that {Xn} is a Markov chain with transition probabilities P00 = P11 = 1 − α and P01 = P10 = α, where 0 < α < 1.

(a)

Determine Pr{X0 = 0, X1 = 0, X2 = 0}, the probability that no error occurs up to stage n = 2.

(b)

Determine the probability that a correct signal is received at stage 2.

Hint: This is Pr{X0 = 0, X1 = 0, X2 = 0} + Pr{X0 = 0, X1 = 1, X2 = 0}.
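
A quick numerical sketch of parts (a) and (b), working with the transition matrix directly (the value of α below is an arbitrary illustration, not part of the problem):

```python
# Sketch for Problem 3.1.2 with an illustrative alpha; the paths are (0,0,0) and (0,1,0).
import numpy as np

alpha = 0.2                                # assumed per-stage error probability
P = np.array([[1 - alpha, alpha],
              [alpha, 1 - alpha]])         # P00 = P11 = 1 - alpha, P01 = P10 = alpha

p_no_error = P[0, 0] * P[0, 0]                       # (a) X0 = 0, X1 = 0, X2 = 0
p_correct = P[0, 0] * P[0, 0] + P[0, 1] * P[1, 0]    # (b) correct signal at stage 2
assert abs(p_correct - (P @ P)[0, 0]) < 1e-12        # same as the two-step transition probability
print(p_no_error, p_correct)
```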

3.1.3

Consider a sequence of items from a production process, with each item being graded as good or defective. Suppose that a good item is followed by another good item with probability α and is followed by a defective item with probability 1 − α. Similarly, a defective item is followed by another defective item with probability β and is followed by a good item with probability 1 − β. If the first item is good, what is the probability that the first defective item to appear is the fifth item?

3.1.4

The random variables ξ1, ξ2, … are independent and with the common probability mass function

k            0     1     2     3
Pr{ξ = k}    0.1   0.3   0.2   0.4

Set X0 = 0, and let Xn = max{ξ1, …, ξn} be the largest ξ observed to date. Determine the transition probability matrix for the Markov chain {Xn}.

URL: https://www.sciencedirect.com/science/article/pii/B9780123814166000034

The Finite Element Method for Elliptic Problems

In Studies in Mathematics and Its Applications, 1978

Abstract error estimate

Thus we have another instance of a family of discrete problems for which the associated bilinear forms are uniformly Vh-elliptic. With this property as our main assumption, we first derive an abstract upper bound for the error. As usual, consistency conditions can be derived from inequality (8.2.24) below.

Theorem 8.2.3.

Given a family of discrete problems conforming for the displacements, for which the inequalities (8.2.22) and (8.2.23) hold for all h, there exists a constant C independent of h such that

(8.2.24) \|u - u_h\| \le C \left( \inf_{v_h \in V_h} \|u - v_h\| + \sup_{w_h \in V_h} \frac{|a(u, w_h) - a_h(u, w_h)|}{\|w_h\|} + \sup_{w_h \in V_h} \frac{|f(w_h) - f_h(w_h)|}{\|w_h\|} \right).

Proof. Let v_h be an arbitrary element in the space V_h. We may write

\tilde{\alpha} \|v_h - u_h\|^2 \le a_h(v_h - u_h, v_h - u_h) = a_h(v_h - u, v_h - u_h) + \{ a_h(u, v_h - u_h) - a(u, v_h - u_h) \} + \{ f(v_h - u_h) - f_h(v_h - u_h) \},

from which we deduce

\tilde{\alpha} \|v_h - u_h\| \le \tilde{M} \|u - v_h\| + \frac{|a_h(u, v_h - u_h) - a(u, v_h - u_h)|}{\|v_h - u_h\|} + \frac{|f(v_h - u_h) - f_h(v_h - u_h)|}{\|v_h - u_h\|} \le \tilde{M} \|u - v_h\| + \sup_{w_h \in V_h} \frac{|a(u, w_h) - a_h(u, w_h)|}{\|w_h\|} + \sup_{w_h \in V_h} \frac{|f(w_h) - f_h(w_h)|}{\|w_h\|},

and the conclusion follows by combining the above inequality with the triangle inequality

\|u - u_h\| \le \|u - v_h\| + \|v_h - u_h\|.

Estimate of the error
\left( \sum_{\alpha=1}^{2} \|u_\alpha - u_{\alpha h}\|_{1,\Omega}^{2} + \|u_3 - u_{3h}\|_{2,\Omega}^{2} \right)^{1/2}

We are now in a position to obtain sufficient conditions for convergence (to shorten the statement of the next theorem, it is to be implicitly understood that possible additional hypotheses on the integers k and l may be needed so as to ensure that the Vh-interpolation operator and the Wh-interpolation operator are well defined).

Theorem 8.2.4.

Assume that the discrete problems are conforming for the displacements and that the spaces Φh, Vh and Wh are such that, for all h and all K ∈ 𝒯_h,

(8.2.25) P_m(K) \subset P_K \subset C^3(K) for some integer m ≥ 3,

(8.2.26) P_k(K) \subset P'_K \subset H^1(K) for some integer k ≥ 1,

(8.2.27) P_l(K) \subset P''_K \subset H^2(K) for some integer l ≥ 2, respectively.

Then if the solution u = (u1, u2, u3) belongs to the space

(8.2.28) H^{k+1}(\Omega) \times H^{k+1}(\Omega) \times H^{l+1}(\Omega),

there exists a constant C independent of h such that

(8.2.29) \|u - u_h\| \le C\, h^{\min\{k,\, l-1,\, m-2\}}.

Proof. One has

\inf_{v_h \in V_h} \|u - v_h\| \le \|u - \Pi_h u\| = \left( \|u_1 - \Pi_h u_1\|_{1,\Omega}^{2} + \|u_2 - \Pi_h u_2\|_{1,\Omega}^{2} + \|u_3 - \Lambda_h u_3\|_{2,\Omega}^{2} \right)^{1/2},

where Π_h u = (Π_h u_1, Π_h u_2, Λ_h u_3) is the Vh-interpolant of the solution u. Since Π_h u_α and Λ_h u_3 are the Vh-interpolants of the functions u_α and the Wh-interpolant of the function u_3, respectively, an application of the standard error estimates shows that

\inf_{v_h \in V_h} \|u - v_h\| \le C \left\{ \left( |u_1|_{k+1,\Omega} + |u_2|_{k+1,\Omega} \right) h^{k} + |u_3|_{l+1,\Omega}\, h^{l-1} \right\},

for some constant C independent of h.

From inequalities (8.2.14) and (8.2.15) of Theorem 8.2.1, we derive the consistency error estimates:

\sup_{w_h \in V_h} \frac{|a(u, w_h) - a_h(u, w_h)|}{\|w_h\|} \le C\, |u|\, h^{m-2}, \qquad \sup_{w_h \in V_h} \frac{|f(w_h) - f_h(w_h)|}{\|w_h\|} \le C\, |f|_{0,\Omega}\, h^{m},

and the conclusion follows by combining the last three inequalities and inequality (8.2.24) of Theorem 8.2.3.

For instance, this result shows that the Argyris triangle yields an O(h3) convergence since it corresponds to the values k = l = m = 5. This is to be compared with the O(h4) convergence which it yields for plates: the decrease of one in the order of convergence is due to the approximation of the geometry.
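
As a quick check of the exponent in (8.2.29) for the Argyris triangle, with k = l = m = 5 as stated above,

\min\{k,\ l-1,\ m-2\} = \min\{5,\ 4,\ 3\} = 3,

so the bound is O(h^3); if the geometric term m − 2 were absent, the exponent would be governed by min{k, l − 1} = 4, which corresponds to the O(h^4) convergence obtained for plates.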

Remark 8.2.2.

In some shell models, partial derivatives of orders only 1 and 2 of the mapping φ̃ appear in the functions A_{IJ}. For such models, the analogues of Theorems 8.2.1 and 8.2.4 hold with the exponent (m − 1) instead of (m − 2).

URL: https://www.sciencedirect.com/science/article/pii/S0168202408701875

Sample Design

W. Penn Handwerker, in Encyclopedia of Social Measurement, 2005

Inferences about Variables

When one’s research question calls for an inference about a variable’s parameter, differences between parameters, or the parametric relationship between variables, accurate and precise answers depend on samples of sufficient size that employ random selection criteria. However, the primary types of such samples (SRS, RSS, stratified, cluster, transect, and case–control) vary dramatically in their feasibility, cost, and power for the issue at hand. For example, SRSs cannot be drawn in the absence of a complete list of primary sampling units. Such lists commonly do not exist; it may not be possible or cost-efficient to create one. If cases with a specific experience can be distinguished from controls without that experience, a case-control sample may be selected by SRS. Even when SRSs can be drawn, however, it may not be cost-efficient to search out and contact independently selected primary sampling units.

Primary sampling units almost always can be identified either within a spatially bounded region or by enumeration units. When the population occupies a spatially bounded region, samples based on transects or sets of map coordinates are efficient choices. When a comprehensive list of enumeration units can be assembled efficiently, cluster samples of one kind or another become both feasible and relatively cheap. However, both cluster samples and SRSs exhibit large standard errors compared to stratified samples. Stratification thus makes it possible to achieve the same power with smaller sample sizes.

Power refers to one’s ability to precisely identify a parameter, to detect differences in a parameter over time or space, or to identify the parametric influence of one variable on another, if the effect is real rather than due to chance. Sample design determines the population to which one can validly generalize, but the effort is wasted if the sample is not large enough to estimate parameters with the requisite precision.

Power varies with the risk of a type I or α error that one is willing to accept, sample size, and the size of the effect that one wants to be able to detect. The probability of making a type II or β error—of not detecting a real relationship between variables—is 1 − power. For a fixed sample and effect size, when α is lowered, β simultaneously rises. When one wants to rigorously avoid concluding, for example, that traumatic stress in childhood influences the risk of depression in adulthood when, in fact, it does not, one might set α at 0.01. But, how power goes up and β goes down varies dramatically with the size of the effect.

Figure 1 illustrates the interdependencies between sample size, power, and effect size, when α is 0.01 (assuming the standard errors of SRSs). Figure 2 illustrates the interdependencies between sample size, β, and effect size, when α is 0.01 (assuming the standard errors of SRSs). As sample size increases, the ability to detect a real relationship (power) increases, and the possibility that it will not be detected (β) decreases. However, the way in which power increases and β decreases varies dramatically with the size of the effect. If the real shared variance between variables is approximately 0.06 (a Pearson’s r of 0.25), it would be missed approximately half the time even with a sample of 100 cases. If the real shared variance between variables is approximately 0.25 (a Pearson’s r of 0.50), it would be missed only approximately 9% of the time with a sample of only 50 cases and 1% of the time with a sample of 75 cases. In contrast, if the real shared variance between variables is approximately 0.76 (a Pearson’s r of 0.873), one could expect to miss it only approximately 3% of the time even with a sample of 10 cases and not at all with 15 cases. Decisions about sample size ordinarily seek to be able to detect the smallest important effect 80% of the time or better. Power analyses may be conducted by specialty software or by power routines that come with the major statistical software packages.

Figure 1. Relationship between power and sample size for effects of different sizes.

Figure 2. Relationship between beta and sample size for effects of different sizes.
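
The interdependencies shown in Figures 1 and 2 can be approximated with a short calculation. The sketch below (not the authors’ own computation) uses the Fisher z approximation for testing a Pearson correlation at α = 0.01, two-tailed; exact power routines will differ somewhat for small samples.

```python
# Approximate power to detect a true Pearson correlation r with n cases (alpha = 0.01).
from math import atanh, sqrt
from scipy.stats import norm

def power_pearson_r(r, n, alpha=0.01):
    ncp = atanh(r) * sqrt(n - 3)          # noncentrality under the alternative (Fisher z)
    z_crit = norm.ppf(1 - alpha / 2)      # two-tailed critical value
    return 1 - norm.cdf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

for r, n in [(0.25, 100), (0.50, 50), (0.50, 75)]:
    print(r, n, round(power_pearson_r(r, n), 2))   # beta = 1 - power in each case
```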

A useful balance of feasibility, cost, and power usually comes in the form of a multistage sample design. Table I shows a multistage design appropriate for a study of drug- and sex-related HIV risk behaviors among recent Latino migrants to the United States from Mexico and Central America. The context for such a study illustrates many of the difficulties that sample designs must resolve.

Table I. Multistage Sampling Design for a Cross-Sectional Observational Study of Drug- and Sex-related Risk Behaviors among New Latino Migrants

Stage I: Comprehensive list of enumeration units
Stage Ib: Stratification of enumeration units
Stage II: Random systematic sample of each kind of enumeration unit
Stage IIb: Stratification by ethnicity (Mexican, Central American) and gender (men, women); the number of interviews in each of the four ethnicity × gender cells is contingent on power analysis
Stage III: Random systematic sample of primary sampling units

First, the size of the population is unknown, which eliminates the choice of an SRS. Second, it is unknown who among the population engages in drug- or sex-related HIV risk behavior, which eliminates the choice of a case-control sample. Third, the region in which migrants live is clearly delimited, but the target population of migrants may comprise a tiny proportion of the total number of people living within the region. Migrants frequently live in locations highly dispersed among the vast majority of the region’s population; many may be effectively hidden from conventional enumeration units (e.g., households) because they change residence frequently. When these conditions apply, transect or map coordinate samples would constitute costly sample design choices. Conventional ways of thinking about cluster sampling do not apply.

Fourth, it remains possible to assemble without undue cost a list of the unconventional enumeration units that would include even those migrants who otherwise remain hidden. These units might include street locations, farms, bars, community agencies, churches, and significant time differences for each. If different types of locations and times attract cases with different characteristics, the comprehensive list of enumeration units may be usefully stratified into different kinds of units based on those characteristics.

Fifth, RSS is easy to apply and does not require a comprehensive list of primary sampling units (cases). When cases are distributed randomly, RSSs exhibit the same standard errors as SRSs. Stratification of enumeration units by case characteristics orders the cases with regard to the variables studied. With ordered cases, RSSs exhibit lower standard errors (greater power) than SRSs. Further stratification on the basis of ethnicity and gender may or may not be cost-efficient relative to the gain in power it would yield. In the absence of stratification, the design implements a posttest-only control group research design that substitutes explicit measurement of internal validity confounds for random assignment. RSSs from each kind of enumeration unit and RSSs of cases from each randomly selected enumeration unit complete the multistage design.

An appropriate power analysis focuses on the objective of the proposed study to test hypotheses about circumstances that increase or decrease the likelihood of engaging in specific HIV risk behaviors. Given an alpha level of 0.05, a two-tailed test, and the assumptions of SRS, the analysis would indicate the sample size necessary to detect effects of specific independent variables 80% of the time, or the power of tests based on different sample sizes. Table II shows how power would vary for the study in question with variation in sample size, the ratio of the reference and response groups, and varying effect sizes using binary independent variables. A sample size of up to 1204 cases would be required to detect a 50% increase in the likelihood of a given risk behavior (or a 33% reduction in the likelihood of a given risk behavior) if the ratio of reference and response groups was 20/80. However, a 500-case sample would have good to excellent power to detect an odds ratio ≥1.76 (or ≤0.57) whether the ratio of reference and response groups approximates 60/40 or 80/20. A 600-case sample does not appreciably improve the power of these analyses. An argument that we could both anticipate effects of this size and that smaller effects would not be of clinical or substantive significance at the current time—or not worth the expense of doubling the sample size—warrants a total sample size of approximately 500 cases.

Table II. Power for Logistic Regression Tests with Varying Sample Size, Ratio of Reference to Response Group, and Size of Effect (Odds Ratio) with a Binary Independent Variable

Sample split   Odds ratio        Power, N = 400 (%)   Power, N = 500 (%)   Power, N = 600 (%)
80/20          2.07 or 0.48      82                   89                   94
               1.76 or 0.57      62                   71                   78
               1.50 or 0.67(a)   37                   44                   50
60/40          2.07 or 0.48      94                   97                   99
               1.76 or 0.57      78                   86                   92
               1.50 or 0.67(b)   50                   59                   67

(a) Sample size necessary to detect an odds ratio of 1.50 or 0.67 at a power of 80% is 1204.
(b) Sample size necessary to detect an odds ratio of 1.50 or 0.67 at a power of 80% is 809.

By employing random selection criteria and sample sizes determined by a power analysis, the sample design in Table I allows accurate and reasonably precise estimates of parameters bearing on drug- and sex-related HIV risk behavior for a specific population of Mexican and Central American migrants to the United States. That the total population of cases came to be explicitly known only during the course of case selection and data collection does not bear on the validity of inferences from the sample to the population.

URL: https://www.sciencedirect.com/science/article/pii/B0123693985000761

Measurement errors in statistical process monitoring: A literature review

Mohammad Reza Maleki, … Philippe Castagliola, in Computers & Industrial Engineering, 2017

3 The questions and possible responses for the conceptual classification scheme

Here, the questions and the possible responses for each one are discussed.

3.1 SPM area

Statistical design of control charts: The main goal of SPM is the online assessment of a process to check its consistency over time. The most common tool for online assessment of a given process is the control chart, which was first introduced by Shewhart in 1924. Designing control charts on the basis of their statistical performance, such as the average run length (ARL) criterion (or any run-length-based property), is referred to as statistical design of control charts (Woodall, 1985).

Economic/economic-statistical design of control charts: In statistical design of control charts, the chart parameters, namely the sample size (n), the sampling frequency (h) and the control limit coefficient (L), are determined such that the desired values for the power of the chart to detect a given shift (1 − β) and the probability of Type I error (α) are obtained. However, designing the parameters of a control chart based on statistical criteria alone ignores the economic consequences. Determining the parameters of control charts by considering economic criteria is called economic design. It should be noted that an economic design neglects the statistical properties, such as the probabilities of Type I and Type II errors. To cover both issues (to improve the statistical features as well as to minimize the cost), the economic-statistical design of control charts is used, in which both statistical and economic features of control charts are considered simultaneously.
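
To make the ARL, α and β quantities above concrete, the following is a minimal sketch (illustrative values, not from the review) for a Shewhart X̄ chart with L-sigma limits:

```python
# In-control and out-of-control ARL of a Shewhart X-bar chart with L-sigma limits.
from math import sqrt
from scipy.stats import norm

def arl_xbar(L=3.0, n=5, delta=0.0):
    """ARL when the process mean has shifted by delta process standard deviations."""
    p = norm.cdf(-L - delta * sqrt(n)) + 1 - norm.cdf(L - delta * sqrt(n))
    return 1.0 / p                     # run lengths are geometric, so ARL = 1/p

print(arl_xbar(delta=0.0))             # ARL0 ~ 370, i.e. alpha ~ 0.0027 per sample
print(arl_xbar(delta=1.0))             # ARL1 is much shorter once the mean shifts
```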

Process capability analysis: Determining the statistical ability of a process to achieve measurable results that satisfy established specifications is referred to as process capability analysis. In other words, the process capability indices show how well a process is able to fulfill the customer expectations and to conform to specification limits.

3.2 The type of measurement errors model

As noted, the relationship between the actual and the observed values of the sampled units is mostly expressed by the following models:

Additive model: The most commonly used model in the literature to characterize the relationship between the actual and observed values of quality characteristics under investigation is the additive model defined as:

(1) Y = A + BX + ε,

where X is the actual value of the quality characteristic under investigation, which is assumed to follow a normal distribution with mean μX and variance σX², and A and B > 1 are two fixed constants. In Eq. (1), ε is the measurement errors term, which is assumed to follow a normal distribution with a mean value equal to 0 and a given variance (constant or non-constant) and to be independent of X. The variance of the measurement errors term is discussed in Section 3.3.

Multiplicative model: The relationship between the actual and observed quantities under multiplicative model is:

(2) Y = Xε,

where ε, which multiplies the original variable, is an independent random variable with mean value equal to 1 and a given variance.

Four-component measurement errors model: Li and Huang (2009) proposed this model which contains four types of measurement errors in a multivariate case with p correlated variables {X1,…,Xp}. The formulation of this model for variable Xj is expressed as follows:

(3) Y_j = b_j + s_j X_j + c_j^T V_j + ε_j,

where

bj is the measurement error caused by sensor setup/calibration bias or drift when sensors are used in harsh environments.

sj is the measurement sensitivity.

cj represents the relationship between the observed and actual quantities which also depends on the other variables Vj, where Vj is a subset of {X1,…,Xp} that does not contain Xj.

εj∼N(0,var(εj)) denotes the sensor noise.

Two-component measurement errors (TCME) model: This type of error model is defined as follows:

(4) Y = A + BXe^η + ε,

where A and B are the intercept and slope constants, and ε and η are additive and multiplicative random disturbances, respectively, which are independent, normally distributed variables with mean equal to 0 and a given variance.

3.3 Variance of measurement error term

Constant: In most studies in the literature, the variance of the measurement errors term ε is assumed to be a constant value, namely σε². For example, in an additive covariate model with constant variance for the measurement errors term, Y will be a normally distributed variable as follows:

(5) Y ∼ N(A + BμX, B²σX² + σε²).

Based on Eq. (5), it is obvious that due to the measurement errors term, the variance of Y will be larger than the variance of X. Therefore, in the presence of measurement errors, the process variability increases.
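
A small simulation sketch (with arbitrary illustrative parameter values) confirming the variance inflation implied by Eq. (5):

```python
# Under the additive model Y = A + B*X + eps, Var(Y) = B^2*sigma_X^2 + sigma_eps^2.
import numpy as np

rng = np.random.default_rng(0)
A, B = 2.0, 1.5                                   # assumed intercept and slope
mu_X, sigma_X, sigma_eps = 10.0, 1.0, 0.5
X = rng.normal(mu_X, sigma_X, size=1_000_000)
eps = rng.normal(0.0, sigma_eps, size=X.size)
Y = A + B * X + eps
print(Y.var(), B**2 * sigma_X**2 + sigma_eps**2)  # both close to 2.5
```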

Linearly increasing: In some applications, the variance of the measurement errors depends linearly on the process level, and therefore the constant variance assumption is relaxed. In this case, ε is a normally distributed variable with mean equal to 0 and variance C + DμX. Hence, we have Y ∼ N(A + BμX, B²σX² + C + DμX), where C and D are two other fixed constants.

Constant & linearly increasing: In addition to the studies assuming either constant or linearly increasing variance, the effect of measurement errors with both constant and linearly increasing variance is addressed in some papers, such as Maravelakis, Panaretos, and Psarakis (2004) and Haq, Brown, Moltchanova, and Al-Omari (2015).

3.4 Type of quality characteristics

We classify the literature on the effect of measurement errors on SPM into five groups, namely univariate, multivariate, attribute, profile, and fuzzy.

Univariate: A single quality characteristic which is expressed in a continuous scale such as size, weight, volume, time and so on.

Multivariate: Several correlated quality characteristics which are expressed via a continuous scale.

Attribute: One quality characteristic which is countable and characterized in a discrete scale.

Profile: Sometimes, the quality of a product or a process is summarized by a functional relationship between a response variable and one or more explanatory variables which is referred to as “profile”.

Fuzzy: Quality characteristics which contain some sources of uncertainties due to human judgment, evaluations and decisions and are expressed by fuzzy numbers and/or linguistic variables.

3.5 Process type

Multi-stage process: In multi-stage processes, the manufacturing process includes several stages. In such situations, the quality of the current stage is affected by the outcome of the previous stage(s).

Autocorrelated process: In autocorrelated processes, the independency assumption of consecutive sampled points is violated. For instance, in manufacturing or non-manufacturing environments when the measurements are gathered at short time intervals, it is reasonable that the observations become autocorrelated.

Multi-stage and autocorrelated process: In addition to the mentioned categories, there is a single study in the literature that addresses the effect of imprecise measurements caused by measurement errors on multi-stage processes in which the observations are autocorrelated.

Ordinary process: Other processes which are not classified as multi-stage or autocorrelated processes are considered as ordinary processes.

3.6 Type of remedial approach

No remedial approach: As explained, the performance of control charts is significantly affected by measurement errors. Although it is important to provide remedial approaches to decrease the adverse effect of measurement errors on SPM procedures, many studies ignore remedial approaches and only investigate the effect of measurement errors on the performance of SPM procedures.

Multiple measurements approach: One of the most common remedial approaches to compensate for the effect of contaminated data on SPM procedures is the “multiple measurements” approach which was first introduced by Linna and Woodall (2001). In this approach, several measurements per item of each sample are taken instead of a single measurement and then the average of the measured values for each item is calculated. As a result, the variance of the measurement error component in the multiple measurements approach will be smaller than the one when using a single measurement.
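
A minimal sketch (illustrative values) of why the multiple measurements approach helps: averaging k repeated measurements per item reduces the measurement-error variance from σε² to σε²/k.

```python
# Averaging k noisy measurements per item shrinks the error variance to sigma_eps^2 / k.
import numpy as np

rng = np.random.default_rng(1)
sigma_eps, k, n_items = 0.5, 5, 200_000
true_values = rng.normal(10.0, 1.0, size=n_items)
measurements = true_values[:, None] + rng.normal(0.0, sigma_eps, size=(n_items, k))
item_means = measurements.mean(axis=1)                       # average of the k repeats per item
print(np.var(item_means - true_values), sigma_eps**2 / k)    # both ~0.05
```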

Other approaches: As noted previously, the most commonly used method to improve the performance of SPM procedures in the presence of measurement errors is the multiple measurements approach. However, some other approaches are also used in some papers, namely increasing the sample size (see Abbasi (2016) for example), RSS-based methods (discussed in Ghashghaei, Bashiri, Amiri, and Maleki (in press) and Haq et al. (2015)), adjusting control limit coefficients (see Riaz (2014)), adjusting lower confidence bounds and critical values in the process capability analysis area (such as in Pearn and Liao (2005)), the inverse method (utilized in Villeta, Rubio, Sebastián, and Sanz (2010)), the multivariate analysis of variance (MANOVA)-based method (suggested by Scagliarini (2011)), as well as omitting outliers via IQR-based approaches (see Amiri, Ghashghaei, and Maleki (in press)).

Multiple measurements & other approaches: The fourth possible category in this regard is devoted to studies in which the multiple measurements approach is used along with other remedial approaches.

3.7 Type of statistic/index used to analyze the process

The selected papers in our survey are also evaluated based on the type of the statistic/index which is used to analyze the outcome of the process. To design control charts from a statistical, economic or an economic-statistical point of view, different statistics have been used to monitor the process mean, the process variability as well as the joint process mean and variability under different assumptions and situations. The statistics in this regard are classified as follows:

Shewhart-type statistic: This type of control chart, first proposed by Shewhart in 1924, is used to monitor either variable or attribute quality characteristics. The Shewhart (or memory-less) statistics only use the information of the last sample taken from the process, and they are particularly sensitive to the detection of large process shifts.

Memory-based statistic: In this case the current value of the monitored statistic depends on the current observation as well as on the previous ones. This increases the sensitivity of memory-based charts, such as the exponentially weighted moving average (EWMA) and cumulative sum (CUSUM) charts, to small and moderate process changes.

Shewhart-type & memory-based statistics: In addition to the mentioned categories, there are a few studies in which both Shewhart-type and memory-based statistics are used together to monitor different processes when measurement errors exist.

In the process capability analysis area, some attention is also devoted to process capability analysis using different indices, in either the univariate or multivariate case.

URL: https://www.sciencedirect.com/science/article/pii/S0360835216304119

This article is about erroneous outcomes of statistical tests. For closely related concepts in binary classification and testing generally, see false positives and false negatives.

In statistical hypothesis testing, a type I error is the mistaken rejection of an actually true null hypothesis (also known as a «false positive» finding or conclusion; example: «an innocent person is convicted»), while a type II error is the failure to reject a null hypothesis that is actually false (also known as a «false negative» finding or conclusion; example: «a guilty person is not convicted»).[1] Much of statistical theory revolves around the minimization of one or both of these errors, though the complete elimination of either is a statistical impossibility if the outcome is not determined by a known, observable causal process.
By selecting a low threshold (cut-off) value and modifying the alpha (α) level, the quality of the hypothesis test can be increased.[2] The knowledge of type I errors and type II errors is widely used in medical science, biometrics and computer science.[clarification needed]

Intuitively, type I errors can be thought of as errors of commission, i.e. the researcher wrongly concludes that something is the case. For instance, consider a study where researchers compare a drug with a placebo. If the patients who are given the drug get better than the patients given the placebo merely by chance, it may appear that the drug is effective, but in fact the conclusion is incorrect.
Conversely, type II errors are errors of omission. In the example above, if the patients who got the drug did not get better at a higher rate than the ones who got the placebo, but this was a random fluke, that would be a type II error. The consequence of a type II error depends on the size and direction of the missed determination and the circumstances. An expensive cure for one in a million patients may be inconsequential even if it truly is a cure.

Definition

Statistical background

In statistical test theory, the notion of a statistical error is an integral part of hypothesis testing. The test involves choosing between two competing propositions, called the null hypothesis, denoted by H0, and the alternative hypothesis, denoted by H1. This is conceptually similar to the judgement in a court trial. The null hypothesis corresponds to the position of the defendant: just as he is presumed to be innocent until proven guilty, so is the null hypothesis presumed to be true until the data provide convincing evidence against it. The alternative hypothesis corresponds to the position against the defendant. In particular, the null hypothesis involves the absence of a difference or the absence of an association. Thus, the null hypothesis can never be that there is a difference or an association.

If the result of the test corresponds with reality, then a correct decision has been made. However, if the result of the test does not correspond with reality, then an error has occurred. There are two situations in which the decision is wrong. The null hypothesis may be true, whereas we reject H0. On the other hand, the alternative hypothesis H1 may be true, whereas we do not reject H0. Two types of error are distinguished: type I error and type II error.[3]

Type I error

The first kind of error is the mistaken rejection of a null hypothesis as the result of a test procedure. This kind of error is called a type I error (false positive) and is sometimes called an error of the first kind. In terms of the courtroom example, a type I error corresponds to convicting an innocent defendant.

Type II error

The second kind of error is the mistaken failure to reject the null hypothesis as the result of a test procedure. This sort of error is called a type II error (false negative) and is also referred to as an error of the second kind. In terms of the courtroom example, a type II error corresponds to acquitting a criminal.[4]

Crossover error rate

The crossover error rate (CER) is the point at which type I errors and type II errors are equal. A system with a lower CER value provides more accuracy than a system with a higher CER value.

False positive and false negative

In terms of false positives and false negatives, a positive result corresponds to rejecting the null hypothesis, while a negative result corresponds to failing to reject the null hypothesis; «false» means the conclusion drawn is incorrect. Thus, a type I error is equivalent to a false positive, and a type II error is equivalent to a false negative.

Table of error types

Tabularised relations between truth/falseness of the null hypothesis and outcomes of the test:[5]

Table of error types:

  • H0 is true and H0 is not rejected: correct inference (true negative), probability = 1 − α
  • H0 is true and H0 is rejected: type I error (false positive), probability = α
  • H0 is false and H0 is not rejected: type II error (false negative), probability = β
  • H0 is false and H0 is rejected: correct inference (true positive), probability = 1 − β

Error rate

Figure: The results obtained from negative samples (left curve) overlap with the results obtained from positive samples (right curve). By moving the result cutoff value (vertical bar), the rate of false positives (FP) can be decreased, at the cost of raising the number of false negatives (FN), or vice versa (TP = True Positives, TPR = True Positive Rate, FPR = False Positive Rate, TN = True Negatives).

A perfect test would have zero false positives and zero false negatives. However, statistical methods are probabilistic, and it cannot be known for certain whether statistical conclusions are correct. Whenever there is uncertainty, there is the possibility of making an error. Because of this probabilistic nature, all statistical hypothesis tests have a probability of making type I and type II errors.[6]

  • The type I error rate is the probability of rejecting the null hypothesis given that it is true. The test is designed to keep the type I error rate below a prespecified bound called the significance level, usually denoted by the Greek letter α (alpha) and is also called the alpha level. Usually, the significance level is set to 0.05 (5%), implying that it is acceptable to have a 5% probability of incorrectly rejecting the true null hypothesis.[7]
  • The rate of the type II error is denoted by the Greek letter β (beta) and related to the power of a test, which equals 1−β.[8]

These two types of error rates are traded off against each other: for any given sample set, the effort to reduce one type of error generally results in increasing the other type of error.[9]

The quality of hypothesis test

The same idea can be expressed in terms of the rate of correct results and therefore used to minimize error rates and improve the quality of a hypothesis test. To reduce the probability of committing a type I error, making the alpha value more stringent is quite simple and efficient. To decrease the probability of committing a type II error, which is closely associated with the power of the analysis, either increasing the test’s sample size or relaxing the alpha level could increase the power.[10] A test statistic is robust if the type I error rate is controlled.

Varying the threshold (cut-off) value could also be used to make the test either more specific or more sensitive, which in turn elevates the test quality. For example, imagine a medical test, in which an experimenter might measure the concentration of a certain protein in a blood sample. The experimenter could adjust the threshold (black vertical line in the figure), and people would be diagnosed as having the disease if any number is detected above this threshold. According to the image, changing the threshold would result in changes in false positives and false negatives, corresponding to movement on the curve.[11]

Example

Since in a real experiment it is impossible to avoid all type I and type II errors, it is important to consider the amount of risk one is willing to take to falsely reject H0 or accept H0. The solution to this question would be to report the p-value or significance level α of the statistic. For example, if the p-value of a test statistic result is estimated at 0.0596, then there is a probability of 5.96% that we falsely reject H0 (if H0 is true). Or, if we say the test is performed at level α, like 0.05, then we allow a 5% chance of falsely rejecting H0. A significance level α of 0.05 is relatively common, but there is no general rule that fits all scenarios.

Vehicle speed measuring

The speed limit of a freeway in the United States is 120 kilometers per hour. A device is set to measure the speed of passing vehicles. Suppose that the device will conduct three measurements of the speed of a passing vehicle, recorded as a random sample X1, X2, X3. The traffic police will or will not fine the drivers depending on the average speed X̄. That is to say, the test statistic is

T = \frac{X_1 + X_2 + X_3}{3} = \bar{X}

In addition, we suppose that the measurements X1, X2, X3 are modeled as a normal distribution N(μ, 4). Then, T should follow N(μ, 4/3) and the parameter μ represents the true speed of the passing vehicle. In this experiment, the null hypothesis H0 and the alternative hypothesis H1 should be

H0: μ=120     against      H1: μ>120.

If we perform the test at level α = 0.05, then a critical value c should be calculated to solve

P\left( Z \geq \frac{c - 120}{2/\sqrt{3}} \right) = 0.05

by the change-of-units rule for the normal distribution. Referring to the Z-table, we get

\frac{c - 120}{2/\sqrt{3}} = 1.645 \;\Rightarrow\; c = 121.9

Here, the critical region is X̄ ≥ 121.9. That is to say, if the recorded average speed of a vehicle is greater than the critical value 121.9, the driver will be fined. However, 5% of the drivers will still be falsely fined, since the recorded average speed is greater than 121.9 even though the true speed does not exceed 120; this is a type I error.

The type II error corresponds to the case that the true speed of a vehicle is over 120 kilometers per hour but the driver is not fined. For example, if the true speed of a vehicle μ=125, the probability that the driver is not fined can be calculated as

P(T < 121.9 \mid \mu = 125) = P\left( \frac{T - 125}{2/\sqrt{3}} < \frac{121.9 - 125}{2/\sqrt{3}} \right) = \Phi(-2.68) = 0.0036

which means that, if the true speed of a vehicle is 125, the driver has a probability of 0.36% of avoiding the fine when the test is performed at level α = 0.05, since the recorded average speed would have to be lower than 121.9. If the true speed is closer to 121.9 than to 125, then the probability of avoiding the fine will also be higher.
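
The two calculations above can be reproduced with a few lines (a sketch using scipy; the numbers match the worked values c ≈ 121.9 and β ≈ 0.0036):

```python
# Critical value and type II error for the vehicle-speed example.
from math import sqrt
from scipy.stats import norm

sigma_T = 2 / sqrt(3)                      # sd of T, the mean of three N(mu, 4) readings
c = 120 + norm.ppf(0.95) * sigma_T         # critical value at alpha = 0.05
beta = norm.cdf((c - 125) / sigma_T)       # type II error when the true speed is 125
print(round(c, 1), round(beta, 4))         # 121.9 and 0.0036
```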

The tradeoffs between type I error and type II error should also be considered. That is, in this case, if the traffic police do not want to falsely fine innocent drivers, the level α can be set to a smaller value, like 0.01. However, if that is the case, drivers whose true speed is over 120 kilometers per hour, like 125, would be more likely to avoid the fine.

Etymology

In 1928, Jerzy Neyman (1894–1981) and Egon Pearson (1895–1980), both eminent statisticians, discussed the problems associated with «deciding whether or not a particular sample may be judged as likely to have been randomly drawn from a certain population»:[12] and, as Florence Nightingale David remarked, «it is necessary to remember the adjective ‘random’ [in the term ‘random sample’] should apply to the method of drawing the sample and not to the sample itself».[13]

They identified «two sources of error», namely:

(a) the error of rejecting a hypothesis that should have not been rejected, and
(b) the error of failing to reject a hypothesis that should have been rejected.

In 1930, they elaborated on these two sources of error, remarking that:

…in testing hypotheses two considerations must be kept in view, we must be able to reduce the chance of rejecting a true hypothesis to as low a value as desired; the test must be so devised that it will reject the hypothesis tested when it is likely to be false.

In 1933, they observed that these «problems are rarely presented in such a form that we can discriminate with certainty between the true and false hypothesis» . They also noted that, in deciding whether to fail to reject, or reject a particular hypothesis amongst a «set of alternative hypotheses», H1, H2…, it was easy to make an error:

…[and] these errors will be of two kinds:

(I) we reject H0 [i.e., the hypothesis to be tested] when it is true,[14]
(II) we fail to reject H0 when some alternative hypothesis HA or H1 is true. (There are various notations for the alternative).

In all of the papers co-written by Neyman and Pearson the expression H0 always signifies «the hypothesis to be tested».

In the same paper they call these two sources of error, errors of type I and errors of type II respectively.[15]

Related terms

Null hypothesis

It is standard practice for statisticians to conduct tests in order to determine whether or not a «speculative hypothesis» concerning the observed phenomena of the world (or its inhabitants) can be supported. The results of such testing determine whether a particular set of results agrees reasonably (or does not agree) with the speculated hypothesis.

On the basis that it is always assumed, by statistical convention, that the speculated hypothesis is wrong, and the so-called «null hypothesis» that the observed phenomena simply occur by chance (and that, as a consequence, the speculated agent has no effect) – the test will determine whether this hypothesis is right or wrong. This is why the hypothesis under test is often called the null hypothesis (most likely, coined by Fisher (1935, p. 19)), because it is this hypothesis that is to be either nullified or not nullified by the test. When the null hypothesis is nullified, it is possible to conclude that data support the «alternative hypothesis» (which is the original speculated one).

The consistent application by statisticians of Neyman and Pearson’s convention of representing «the hypothesis to be tested» (or «the hypothesis to be nullified») with the expression H0 has led to circumstances where many understand the term «the null hypothesis» as meaning «the nil hypothesis» – a statement that the results in question have arisen through chance. This is not necessarily the case – the key restriction, as per Fisher (1966), is that «the null hypothesis must be exact, that is free from vagueness and ambiguity, because it must supply the basis of the ‘problem of distribution,’ of which the test of significance is the solution.»[16] As a consequence of this, in experimental science the null hypothesis is generally a statement that a particular treatment has no effect; in observational science, it is that there is no difference between the value of a particular measured variable, and that of an experimental prediction.[citation needed]

Statistical significance

If the probability of obtaining a result as extreme as the one obtained, supposing that the null hypothesis were true, is lower than a pre-specified cut-off probability (for example, 5%), then the result is said to be statistically significant and the null hypothesis is rejected.

British statistician Sir Ronald Aylmer Fisher (1890–1962) stressed that the «null hypothesis»:

… is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.

— Fisher, 1935, p.19

Application domains

Medicine

In the practice of medicine, the differences between the applications of screening and testing are considerable.

Medical screening

Screening involves relatively cheap tests that are given to large populations, none of whom manifest any clinical indication of disease (e.g., Pap smears).

Testing involves far more expensive, often invasive, procedures that are given only to those who manifest some clinical indication of disease, and are most often applied to confirm a suspected diagnosis.

For example, most states in the USA require newborns to be screened for phenylketonuria and hypothyroidism, among other congenital disorders.

Hypothesis: «The newborns have phenylketonuria and hypothyroidism»

Null Hypothesis (H0): «The newborns do not have phenylketonuria and hypothyroidism»,

Type I error (false positive): The true fact is that the newborns do not have phenylketonuria and hypothyroidism but we consider they have the disorders according to the data.

Type II error (false negative): The true fact is that the newborns have phenylketonuria and hypothyroidism but we consider they do not have the disorders according to the data.

Although they display a high rate of false positives, the screening tests are considered valuable because they greatly increase the likelihood of detecting these disorders at a far earlier stage.

The simple blood tests used to screen possible blood donors for HIV and hepatitis have a significant rate of false positives; however, physicians use much more expensive and far more precise tests to determine whether a person is actually infected with either of these viruses.

Perhaps the most widely discussed false positives in medical screening come from the breast cancer screening procedure mammography. The US rate of false positive mammograms is up to 15%, the highest in the world. One consequence of the high false positive rate in the US is that, in any 10-year period, half of the American women screened receive a false positive mammogram. False positive mammograms are costly, with over $100 million spent annually in the U.S. on follow-up testing and treatment. They also cause women unneeded anxiety. As a result of the high false positive rate in the US, as many as 90–95% of women who get a positive mammogram do not have the condition. The lowest rate in the world is in the Netherlands, 1%. The lowest rates are generally in Northern Europe, where mammography films are read twice and a high threshold for additional testing is set (the high threshold decreases the power of the test).

The ideal population screening test would be cheap, easy to administer, and produce zero false-negatives, if possible. Such tests usually produce more false-positives, which can subsequently be sorted out by more sophisticated (and expensive) testing.

Medical testing

False negatives and false positives are significant issues in medical testing.

Hypothesis: «The patients have the specific disease».

Null hypothesis (H0): «The patients do not have the specific disease».

Type I error (false positive): «The true fact is that the patients do not have the specific disease but the physicians judge the patients as ill according to the test reports».

False positives can also produce serious and counter-intuitive problems when the condition being searched for is rare, as in screening. If a test has a false positive rate of one in ten thousand, but only one in a million samples (or people) is a true positive, most of the positives detected by that test will be false. The probability that an observed positive result is a false positive may be calculated using Bayes’ theorem.
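
A short worked Bayes'-theorem sketch for the rates quoted above (the test's sensitivity is assumed to be essentially 100% for simplicity, which the text does not state):

```python
# With a 1-in-10,000 false positive rate and 1-in-a-million prevalence,
# most positives are false positives.
prevalence = 1e-6            # fraction of samples that are true positives
false_positive_rate = 1e-4   # positive results among true negatives
sensitivity = 1.0            # assumed for this illustration

p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_true_given_positive = sensitivity * prevalence / p_positive
print(p_true_given_positive)   # ~0.0099, i.e. roughly 99% of positives are false
```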

Type II error (false negative): «The true fact is that the disease is actually present but the test reports provide a falsely reassuring message to patients and physicians that the disease is absent».

False negatives produce serious and counter-intuitive problems, especially when the condition being searched for is common. If a test with a false negative rate of only 10% is used to test a population with a true occurrence rate of 70%, many of the negatives detected by the test will be false.

This sometimes leads to inappropriate or inadequate treatment of both the patient and their disease. A common example is relying on cardiac stress tests to detect coronary atherosclerosis, even though cardiac stress tests are known to only detect limitations of coronary artery blood flow due to advanced stenosis.

Biometrics

Biometric matching, such as for fingerprint recognition, facial recognition or iris recognition, is susceptible to type I and type II errors.

Hypothesis: «The input does not identify someone in the searched list of people»

Null hypothesis: «The input does identify someone in the searched list of people»

Type I error (false reject rate): «The true fact is that the person is someone in the searched list but the system concludes that the person is not according to the data».

Type II error (false match rate): «The true fact is that the person is not someone in the searched list but the system concludes that the person is someone whom we are looking for according to the data».

The probability of type I errors is called the «false reject rate» (FRR) or false non-match rate (FNMR), while the probability of type II errors is called the «false accept rate» (FAR) or false match rate (FMR).

If the system is designed to rarely match suspects then the probability of type II errors can be called the «false alarm rate». On the other hand, if the system is used for validation (and acceptance is the norm) then the FAR is a measure of system security, while the FRR measures user inconvenience level.

Security screening

False positives are routinely found every day in airport security screening, which are ultimately visual inspection systems. The installed security alarms are intended to prevent weapons being brought onto aircraft; yet they are often set to such high sensitivity that they alarm many times a day for minor items, such as keys, belt buckles, loose change, mobile phones, and tacks in shoes.

Here, the null hypothesis is that the item is not a weapon, while the alternative hypothesis is that the item is a weapon.

Type I error (false positive): «The true fact is that the item is not a weapon, but the system still alarms».

Type II error (false negative): «The true fact is that the item is a weapon, but the system remains silent».

The ratio of false positives (identifying an innocent traveler as a terrorist) to true positives (detecting a would-be terrorist) is, therefore, very high; and because almost every alarm is a false positive, the positive predictive value of these screening tests is very low.

The relative cost of false results determines the likelihood that test creators allow these events to occur. Because the cost of a false negative in this scenario is extremely high (not detecting a bomb being brought onto a plane could result in hundreds of deaths), whilst the cost of a false positive is relatively low (a reasonably simple further inspection), the most appropriate test is one with low statistical specificity but high statistical sensitivity (one that allows a high rate of false positives in return for minimal false negatives).
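
A toy Python calculation can make this trade-off concrete. Every number below (threat prevalence, costs, sensitivities, specificities) is invented purely to show why the high-sensitivity, low-specificity operating point can have the lower expected cost.

    # Toy illustration (all numbers invented): comparing two alarm settings by
    # expected cost per passenger when a missed weapon is far costlier than an
    # unnecessary manual inspection.
    p_weapon = 1e-6            # assumed prevalence of a genuine threat
    cost_false_negative = 1e9  # assumed cost of missing a weapon
    cost_false_positive = 10   # assumed cost of one extra manual inspection

    def expected_cost(sensitivity: float, specificity: float) -> float:
        missed = p_weapon * (1 - sensitivity) * cost_false_negative
        false_alarms = (1 - p_weapon) * (1 - specificity) * cost_false_positive
        return missed + false_alarms

    sensitive_setting = expected_cost(sensitivity=0.999, specificity=0.50)  # alarms constantly
    specific_setting = expected_cost(sensitivity=0.90, specificity=0.99)    # alarms rarely

    print(f"high sensitivity: {sensitive_setting:.2f}, high specificity: {specific_setting:.2f}")
    # The high-sensitivity setting wins despite producing far more false alarms.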

Computers

The notions of false positives and false negatives have a wide currency in the realm of computers and computer applications, including computer security, spam filtering, malware detection, optical character recognition, and many others.

For example, in the case of spam filtering, the hypothesis is that the message is spam.

Thus, the null hypothesis is: «The message is not spam».

Type I error (false positive): «Spam filtering or spam blocking techniques wrongly classify a legitimate email message as spam and, as a result, interfere with its delivery».

While most anti-spam tactics can block or filter a high percentage of unwanted emails, doing so without creating significant false-positive results is a much more demanding task.

Type II error (false negative): «Spam email is not detected as spam, but is classified as non-spam». A low number of false negatives is an indicator of the efficiency of spam filtering.
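
As a rough sketch, the two error types for a filter can be tallied against messages whose true labels are known; the messages and the keyword rule below are purely hypothetical stand-ins for a real filter.

    # Hypothetical sketch: tallying false positives and false negatives for a
    # spam filter against messages whose true labels are known. The messages
    # and the keyword rule are invented stand-ins for a real filter.
    labeled_messages = [
        ("WIN a FREE prize now!!!", "spam"),
        ("Free lunch in the break room on Friday", "ham"),
        ("You have been selected for a reward", "spam"),
        ("Invoice for March attached", "ham"),
    ]

    def classify(text: str) -> str:
        """Crude stand-in for a real filter: flag messages containing spammy keywords."""
        keywords = ("free", "prize", "cheap")
        return "spam" if any(k in text.lower() for k in keywords) else "ham"

    false_positives = sum(1 for text, label in labeled_messages
                          if label == "ham" and classify(text) == "spam")
    false_negatives = sum(1 for text, label in labeled_messages
                          if label == "spam" and classify(text) == "ham")

    print(f"false positives (legitimate mail blocked): {false_positives}")  # 1
    print(f"false negatives (spam delivered): {false_negatives}")           # 1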

See also

  • Binary classification
  • Detection theory
  • Egon Pearson
  • Ethics in mathematics
  • False positive paradox
  • False discovery rate
  • Family-wise error rate
  • Information retrieval performance measures
  • Neyman–Pearson lemma
  • Null hypothesis
  • Probability of a hypothesis for Bayesian inference
  • Precision and recall
  • Prosecutor’s fallacy
  • Prozone phenomenon
  • Receiver operating characteristic
  • Sensitivity and specificity
  • Statisticians’ and engineers’ cross-reference of statistical terms
  • Testing hypotheses suggested by the data
  • Type III error

Bibliography

  • Betz, M.A. & Gabriel, K.R., «Type IV Errors and Analysis of Simple Effects», Journal of Educational Statistics, Vol.3, No.2, (Summer 1978), pp. 121–144.
  • David, F.N., «A Power Function for Tests of Randomness in a Sequence of Alternatives», Biometrika, Vol.34, Nos.3/4, (December 1947), pp. 335–339.
  • Fisher, R.A., The Design of Experiments, Oliver & Boyd (Edinburgh), 1935.
  • Gambrill, W., «False Positives on Newborns’ Disease Tests Worry Parents», Health Day, (5 June 2006).
  • Kaiser, H.F., «Directional Statistical Decisions», Psychological Review, Vol.67, No.3, (May 1960), pp. 160–167.
  • Kimball, A.W., «Errors of the Third Kind in Statistical Consulting», Journal of the American Statistical Association, Vol.52, No.278, (June 1957), pp. 133–142.
  • Lubin, A., «The Interpretation of Significant Interaction», Educational and Psychological Measurement, Vol.21, No.4, (Winter 1961), pp. 807–817.
  • Marascuilo, L.A. & Levin, J.R., «Appropriate Post Hoc Comparisons for Interaction and nested Hypotheses in Analysis of Variance Designs: The Elimination of Type-IV Errors», American Educational Research Journal, Vol.7., No.3, (May 1970), pp. 397–421.
  • Mitroff, I.I. & Featheringham, T.R., «On Systemic Problem Solving and the Error of the Third Kind», Behavioral Science, Vol.19, No.6, (November 1974), pp. 383–393.
  • Mosteller, F., «A k-Sample Slippage Test for an Extreme Population», The Annals of Mathematical Statistics, Vol.19, No.1, (March 1948), pp. 58–65.
  • Moulton, R.T., «Network Security», Datamation, Vol.29, No.7, (July 1983), pp. 121–127.
  • Raiffa, H., Decision Analysis: Introductory Lectures on Choices Under Uncertainty, Addison–Wesley, (Reading), 1968.

External links

  • Bias and Confounding – presentation by Nigel Paneth, Graduate School of Public Health, University of Pittsburgh
