7. Evaluating the Quality of Performance Measures: Content Representativeness | Performance Assessment for the Workplace: Volume I | The National Academies Press

this perspective, how representative is a purposive sample? Our approach to answering the question is based on the following conceptualization of the problem.

To begin with, the purposive sample of tasks/behaviors used for a hands-on performance measure is but one of many possible samples that might be chosen from the domain of interest. The sample could be characterized by its critical features, such as the average importance of the tasks contained in the hands-on performance measure, the average frequency with which those tasks are performed on the job, the average difficulty of the tasks, the average number of errors made while performing them, and so on. A 2nd, 3rd, 4th, . . . 2,000th sample of tasks could also be chosen from the job domain and their critical features characterized.

Now, focusing on the average importance of each of the 2,000 job samples drawn from the domain, a frequency distribution can be constructed with mean importance on the x-axis and frequency on the y-axis. This frequency distribution can be called a sampling distribution of means, or sampling distribution for short. The next step is to consider where the mean importance of the purposive sample falls within this sampling distribution. If it falls in the center of the distribution, the sample can be considered representative, at least in terms of the importance feature. If it falls within plus or minus two standard deviations (standard errors) of the mean of the sampling distribution, it can still be considered representative. (With 2,000 samples actually drawn, the mean of the sampling distribution would be roughly the mean for the entire domain of tasks.) However, if it is more than two standard deviations from the mean, then it is among the roughly 5 percent least probable samples with respect to the characteristic being evaluated. In that case, it would be considered an extreme sample, not representative of the job on the particular feature (importance). This process could be repeated for each of the other critical features of the job; a decision could then be made as to how representative the purposive sample is from a random sampling perspective.
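
The repeated-sampling procedure described here can be sketched in Python. This is only an illustration under stated assumptions: the domain of 200 tasks with importance ratings from 1 to 7, the sample size of 15 tasks, and the function names are all hypothetical and do not come from the text.

```python
import random
import statistics


def simulate_sampling_distribution(domain_scores, sample_size,
                                   n_samples=2000, seed=0):
    """Draw repeated random task samples; return the mean score of each."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(domain_scores, sample_size))
        for _ in range(n_samples)
    ]


def is_representative(purposive_mean, sample_means, n_sd=2.0):
    """True if the purposive mean lies within n_sd standard deviations
    of the center of the simulated sampling distribution."""
    center = statistics.fmean(sample_means)
    spread = statistics.pstdev(sample_means)
    return abs(purposive_mean - center) <= n_sd * spread


# Hypothetical domain: importance ratings (1 to 7) for 200 job tasks.
_rng = random.Random(42)
domain_importance = [_rng.randint(1, 7) for _ in range(200)]

# 2,000 random samples of 15 tasks each (sample size is an assumption).
sample_means = simulate_sampling_distribution(domain_importance, sample_size=15)
```

A purposive sample would then be checked by passing its mean importance to `is_representative(purposive_mean, sample_means)`, and the same check could be repeated for frequency, difficulty, and errors.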

The above process could be simulated on a computer, but there is no need. It has a long history and a straightforward analytic solution. The sampling distribution of means described above will be approximately normally distributed, with the approximation improving as the task sample size increases. It will have a mean equal to the domain mean and a standard deviation equal to the domain standard deviation divided by the square root of the sample size.
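
As a minimal sketch of this analytic solution, the Python below computes the standard error exactly as stated above (domain standard deviation divided by the square root of the sample size) and expresses the purposive sample's mean as a distance from the domain mean in standard-error units. The function names and the eight-task domain are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math
import statistics


def standard_error(domain_scores, sample_size):
    """Domain standard deviation divided by the square root of the sample size."""
    return statistics.pstdev(domain_scores) / math.sqrt(sample_size)


def z_of_purposive_sample(purposive_scores, domain_scores):
    """Distance of the purposive mean from the domain mean, in standard errors."""
    se = standard_error(domain_scores, len(purposive_scores))
    diff = statistics.fmean(purposive_scores) - statistics.fmean(domain_scores)
    return diff / se


# Hypothetical importance ratings for a tiny eight-task domain.
domain = [2, 4, 4, 4, 5, 5, 7, 9]
purposive = [5, 5, 7, 9]  # hypothetical hands-on tasks

z = z_of_purposive_sample(purposive, domain)
is_extreme = abs(z) > 2  # the two-standard-error rule described in the text
```

Under the two-standard-error rule, a purposive sample with `abs(z) > 2` would be treated as extreme on that feature; otherwise it would be considered representative.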

Finally, recognizing that the critical features of a job are most likely correlated, the features could be characterized not only one at a time, but also simultaneously. In this case, a set of mean feature scores (one each for importance, frequency, difficulty, and errors) would be sampled. Each hands-on measure, then, could be characterized by a set of four mean scores, or by a point in a multivariate space that corresponds to the set of four mean scores. By drawing repeated hands-on measures, a multivariate frequency