Wednesday 29 October 2008

M ED 1.14 The Use of Discriminant Analysis

Dr See Kin Hai

The use of Discriminant Function Analysis

1. What is the purpose of using this statistical technique?

Discriminant function analysis is used to determine which variables discriminate between two or more naturally occurring groups. For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college, (2) to attend a trade or professional school, or (3) to seek no further training or education. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant Analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.

A medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). A biologist could record different characteristics of similar types (groups) of flowers, and then perform a discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.

Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA). Let us consider a simple example. Suppose we measure height in a random sample of 50 males and 50 females. Females are, on average, not as tall as males, and this difference will be reflected in the difference in means (for the variable Height). Therefore, the variable Height allows us to discriminate between males and females with a better than chance probability: if a person is tall, then he is likely to be male; if a person is short, then she is likely to be female.
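As an illustration only (the data below are simulated, not taken from the example), here is a minimal sketch in Python, assuming NumPy and scikit-learn are available, of fitting a linear discriminant function to the height example: two naturally occurring groups, one discriminating variable.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Simulated heights (cm) for 50 females and 50 males; the group means differ.
rng = np.random.default_rng(0)
female = rng.normal(loc=162, scale=6, size=50)
male = rng.normal(loc=176, scale=6, size=50)

X = np.concatenate([female, male]).reshape(-1, 1)  # the single predictor: Height
y = np.array(["female"] * 50 + ["male"] * 50)      # known group membership

lda = LinearDiscriminantAnalysis().fit(X, y)

# Because the group means differ, classification is better than chance.
print("Classification accuracy on the sample:", lda.score(X, y))
print("Predicted group for a 180 cm person:", lda.predict([[180.0]]))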

Probably the most common application of discriminant function analysis is to include many measures in the study, in order to determine the ones that discriminate between groups. For example, an educational researcher interested in predicting high school graduates' choices for further education would probably include as many measures of personality, achievement motivation, academic performance, etc. as possible in order to learn which one(s) offer the best prediction.

Reliability and Cronbach’s Alpha

To return to the prejudice example (see Basic ideas), if there are several subjects who respond to our items, then we can compute the variance for each item, and the variance for the sum scale. The variance of the sum scale will be larger than the sum of the item variances if the items measure the same variability between subjects, that is, if they measure some true score. Technically, the variance of the sum of two items is equal to the sum of the two variances plus (two times) the covariance, that is, the amount of true score variance common to the two items.

We can estimate the proportion of true score variance that is captured by the items by comparing the sum of item variances with the variance of the sum scale. Specifically, we can compute:

α = (k/(k-1)) * [1 - Σ(s²_i) / s²_sum]

This is the formula for the most common index of reliability, namely, Cronbach's coefficient Alpha (α). In this formula, the s²_i denote the variances for the k individual items, and s²_sum denotes the variance for the sum of all items. If there is no true score but only error in the items (which is esoteric and unique, and, therefore, uncorrelated across subjects), then the variance of the sum will be the same as the sum of the variances of the individual items. Therefore, coefficient Alpha will be equal to zero. If all items are perfectly reliable and measure the same thing (true score), then coefficient Alpha is equal to 1. (Specifically, 1 - Σ(s²_i)/s²_sum will become equal to (k-1)/k; if we multiply this by k/(k-1) we obtain 1.)
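As a sketch of how the formula works in practice (assuming NumPy; the variable names and simulated data are illustrative), coefficient Alpha can be computed directly from a subjects-by-items response matrix:

import numpy as np

def cronbach_alpha(items):
    # items: one row per subject, one column per item of the sum scale.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                        # number of items
    item_vars = items.var(axis=0, ddof=1)     # the s2_i
    sum_var = items.sum(axis=1).var(ddof=1)   # s2_sum, the variance of the sum scale
    return (k / (k - 1)) * (1 - item_vars.sum() / sum_var)

# Example: three items sharing a common true score plus independent error.
rng = np.random.default_rng(1)
true_score = rng.normal(size=200)
items = np.column_stack([true_score + rng.normal(scale=0.8, size=200) for _ in range(3)])
print(round(cronbach_alpha(items), 3))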

Alternative terminology. Cronbach's Alpha, when computed for binary (e.g., true/false) items, is identical to the so-called Kuder-Richardson-20 formula of reliability for sum scales. In either case, because the reliability is actually estimated from the consistency of all items in the sum scales, the reliability coefficient computed in this manner is also referred to as the internal-consistency reliability.

After the discussion so far, it should be clear that the more reliable a scale, the better (e.g., more valid) the scale. As mentioned earlier, one way to make a sum scale more reliable is to add items. Reliability and Item Analysis methods include options that allow you to compute how many items would have to be added in order to achieve a particular reliability, or how reliable the scale would be if a certain number of items were added. However, in practice, the number of items on a questionnaire is usually limited by various other factors (e.g., respondents get tired, overall space is limited, etc.). Let us return to our prejudice example and outline the steps that one would generally follow in order to design the scale so that it will be reliable:

Step 1: Generating items. The first step is to write the items. This is essentially a creative process where the researcher makes up as many items as possible that seem to relate to prejudices against foreign-made cars. In theory, one should "sample items" from the domain defined by the concept. In practice, for example in marketing research, focus groups are often utilized to illuminate as many aspects of the concept as possible. For example, we could ask a small group of highly committed American car buyers to express their general thoughts and feelings about foreign-made cars. In educational and psychological testing, one commonly looks at other similar questionnaires at this stage of the scale design, again, in order to gain as wide a perspective on the concept as possible.

Step 2: Choosing items of optimum difficulty. In the first draft of our prejudice questionnaire, we will include as many items as possible (note that the Reliability and Item Analysis module will handle up to 300 items in a single scale). We then administer this questionnaire to an initial sample of typical respondents, and examine the results for each item. First, we would look at various characteristics of the items, for example, in order to identify floor or ceiling effects. If all respondents agree or disagree with an item, then it obviously does not help us discriminate between respondents, and thus, it is useless for the design of a reliable scale. In test construction, the proportion of respondents who agree or disagree with an item, or who answer a test item correctly, is often referred to as the item difficulty. In essence, we would look at the item means and standard deviations and eliminate those items that show extreme means, and zero or nearly zero variances.
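A minimal sketch of this screening step, assuming NumPy and binary (agree/disagree) items; the cut-off values below are illustrative, not standard ones:

import numpy as np

def screen_items(items, low=0.10, high=0.90, min_sd=0.05):
    # Return the indices of items that show neither floor/ceiling effects nor (near-)zero variance.
    items = np.asarray(items, dtype=float)
    p = items.mean(axis=0)            # item "difficulty": proportion agreeing/answering correctly
    sd = items.std(axis=0, ddof=1)    # item standard deviation
    keep = (p > low) & (p < high) & (sd > min_sd)
    return np.flatnonzero(keep)

# Example with agree(1)/disagree(0) responses; the third item (index 2) shows a ceiling effect.
responses = np.array([[1, 0, 1],
                      [1, 1, 1],
                      [0, 1, 1],
                      [1, 0, 1],
                      [0, 1, 1]])
print(screen_items(responses))   # the all-1s item is flagged for removal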

Step 3: Choosing internally consistent items. Remember that a reliable scale is made up of items that mostly measure the true score; in our example, we would like to select items that measure mostly prejudice against foreign-made cars, and as little as possible of the esoteric aspects that we consider random error. To do so, we would look at the following spreadsheet:

STATISTICA Reliability Analysis

Summary for scale: Mean=46.1100  Std.Dv.=8.26444  Valid n: 100
Cronbach alpha: .794313   Standardized alpha: .800491
Average inter-item corr.: .297818

Variable   Mean if    Var. if    StDv. if   Itm-Totl   Squared    Alpha if
           deleted    deleted    deleted    Correl.    Multp. R   deleted
ITEM1      41.61000   51.93790   7.206795   .656298    .507160    .752243
ITEM2      41.37000   53.79310   7.334378   .666111    .533015    .754692
ITEM3      41.41000   54.86190   7.406882   .549226    .363895    .766778
ITEM4      41.63000   56.57310   7.521509   .470852    .305573    .776015
ITEM5      41.52000   64.16961   8.010593   .054609    .057399    .824907
ITEM6      41.56000   62.68640   7.917474   .118561    .045653    .817907
ITEM7      41.46000   54.02840   7.350401   .587637    .443563    .762033
ITEM8      41.33000   53.32110   7.302130   .609204    .446298    .758992
ITEM9      41.44000   55.06640   7.420674   .502529    .328149    .772013
ITEM10     41.66000   53.78440   7.333785   .572875    .410561    .763314

Shown above are the results for 10 items, which are discussed in greater detail in Examples. Of most interest to us are the three right-most columns in this spreadsheet. They show us the correlation between the respective item and the total sum score (without the respective item), the squared multiple correlation between the respective item and all others, and the internal consistency of the scale (coefficient Alpha) if the respective item were deleted. Clearly, items 5 and 6 "stick out," in that they are not consistent with the rest of the scale. Their correlations with the sum scale are .05 and .12, respectively, while all other items correlate at .45 or better. In the right-most column, we can see that the reliability of the scale would be about .82 if either of the two items were deleted. Thus, we would probably delete the two items from this scale.
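Two of these columns (the item-total correlation and Alpha if deleted) can be reproduced directly from the raw responses. A minimal sketch, assuming NumPy and a subjects-by-items response matrix (the function names are illustrative):

import numpy as np

def _alpha(m):
    # Cronbach's Alpha for the item matrix m (subjects x items).
    k = m.shape[1]
    return (k / (k - 1)) * (1 - m.var(axis=0, ddof=1).sum() / m.sum(axis=1).var(ddof=1))

def item_total_stats(items):
    # For each item: correlation with the sum of the remaining items, and Alpha if the item were deleted.
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    stats = []
    for j in range(items.shape[1]):
        rest = total - items[:, j]                        # sum scale without the item itself
        r_it = np.corrcoef(items[:, j], rest)[0, 1]       # the Itm-Totl Correl. column
        alpha_del = _alpha(np.delete(items, j, axis=1))   # the Alpha if deleted column
        stats.append((r_it, alpha_del))
    return stats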

Step 4: Returning to Step 1. After deleting all items that are not consistent with the scale, we may not be left with enough items to make up an overall reliable scale (remember that, the fewer items, the less reliable the scale). In practice, one often goes through several rounds of generating items and eliminating items, until one arrives at a final set that makes up a reliable scale.

Split-half reliability

An alternative way of computing the reliability of a sum scale is to divide it in some random manner into two halves. If the sum scale is perfectly reliable, we would expect that the two halves are perfectly correlated (i.e., r = 1.0). Less than perfect reliability will lead to less than perfect correlations. We can estimate the reliability of the sum scale via the Spearman-Brown split half coefficient:

r_sb = 2r_xy / (1 + r_xy)

In this formula, r_sb is the split-half reliability coefficient, and r_xy represents the correlation between the two halves of the scale.
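A minimal sketch of this procedure, assuming NumPy; the random split and variable names are illustrative:

import numpy as np

def split_half_reliability(items, seed=0):
    # Randomly split the items into two halves, correlate the half scores, apply Spearman-Brown.
    items = np.asarray(items, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(items.shape[1])
    half = items.shape[1] // 2
    score1 = items[:, order[:half]].sum(axis=1)
    score2 = items[:, order[half:]].sum(axis=1)
    r_xy = np.corrcoef(score1, score2)[0, 1]
    return 2 * r_xy / (1 + r_xy)   # r_sb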

The use of Power Analysis

The Power Analysis module implements the techniques of statistical power analysis, sample size estimation, and advanced techniques for confidence interval estimation. The main goal of the first two techniques is to allow you to decide, while in the process of designing an experiment, (a) how large a sample is needed to allow statistical judgments that are accurate and reliable, and (b) how likely your statistical test will be to detect effects of a given size in a particular situation. The third technique is useful in implementing objectives (a) and (b) above, and in evaluating the size of experimental effects in practice.

Performing power analysis and sample size estimation is an important aspect of experimental design, because without these calculations, sample size may be too high or too low. If sample size is too low, the experiment will lack the precision to provide reliable answers to the questions it is investigating. If sample size is too large, time and resources will be wasted, often for minimal gain.
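As an illustration of both calculations for an independent-samples t test, here is a minimal sketch assuming the Python statsmodels package is available (the Power Analysis module itself is a STATISTICA component; the effect size and alpha level below are chosen for illustration):

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# (a) Sample size per group needed to detect a medium effect (d = 0.5)
#     with 80% power at alpha = .05.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print("n per group:", round(n_per_group))

# (b) Power of a study that uses only 30 subjects per group for the same effect.
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print("power with n = 30 per group:", round(power, 2))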

The Power Analysis module provides a number of graphical and analytical tools to enable precise evaluation of the factors affecting power and sample size in many of the most commonly encountered statistical analyses. This information can be crucial to the design of a study that is cost-effective and scientifically useful.

Noncentrality interval estimation and the other advanced confidence interval procedures implemented in the Power Analysis module provide methods for analyzing the importance of an observed experimental result. An increasing number of influential statisticians suggest that confidence interval estimation should augment or replace traditional hypothesis testing approaches in the analysis of experimental data.
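As a sketch of the idea behind noncentrality interval estimation (assuming SciPy; the bracketing interval and example values are illustrative), a confidence interval for the noncentrality parameter of an observed t statistic can be found by inverting the noncentral t distribution:

from scipy.optimize import brentq
from scipy.stats import nct

def noncentrality_ci(t_obs, df, conf=0.95):
    # Bounds on the noncentrality parameter consistent with an observed t value.
    alpha = 1 - conf
    # The noncentral t CDF at t_obs decreases as the noncentrality parameter grows,
    # so each bound is the root of a monotone function of the noncentrality parameter.
    lower = brentq(lambda nc: nct.cdf(t_obs, df, nc) - (1 - alpha / 2), -50, 50)
    upper = brentq(lambda nc: nct.cdf(t_obs, df, nc) - alpha / 2, -50, 50)
    return lower, upper

# Example: an observed t of 2.5 with 60 degrees of freedom.
print(noncentrality_ci(2.5, 60))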
