Journal of Occupational Psychology, 1985, 58, 265-275. Printed in Great Britain
© 1985 The British Psychological Society
Rater beliefs about others: Their effects
on rating errors and rater accuracy
KENNETH N. WEXLEY* AND MARGARET A. YOUTZ
Michigan State University
This study investigated the relationship between rater characteristics, rating errors
and rater accuracy. Several types of rater characteristics were studied: personality,
cognitive and job-related variables, and rater beliefs about human nature. It was
hypothesized that rater beliefs would correlate significantly with both rating errors
and rater accuracy and that there would be a significant relationship between
accuracy and rating errors. The results supported the first hypothesis. However,
rating errors and rating accuracy were not significantly correlated. Practical implications of these results are discussed.
According to Taft (1955), the extent to which a rater can make accurate judgements about
other people is a function of the rater’s general ability to judge, and of specific situational
and interactional factors such as the relationship between the rater and ratee, the type of
judgement demanded, the characteristics being judged, and the material available to the
rater. The present research focuses on gaining a better understanding of a rater’s ability to
judge other people. Specifically, this research examined the relationship between the personal characteristics of raters and both their accuracy in appraising others’ performance
effectiveness and the occurrence of rating errors.
Previous research has examined the effect of rater characteristics on performance
rating. In their review of the performance rating literature, Landy & Farr (1980) classified
rater characteristics into three general categories: demographic, psychological, and
job-related variables. Sex, age, race, and education have been the most widely investigated
rater demographic characteristics. Research to date indicates no consistent effects for
either rater sex (Rosen & Jerdee, 1973; Jacobson & Effertz, 1974; Pulakos & Wexley, 1983)
or rater age (Mandell, 1956; Klores, 1966). However, higher ratings are generally given by
raters to ratees of the same race (Cox & Krumboltz, 1958; Crooks, 1972; Hamner et al.,
1974; Wexley & Nemeroff, 1974), and by less educated raters. For the latter
variable, however, differences were too small to be of any practical significance (Cascio & Valenzi,
1977).
Psychological variables have been the most frequently examined of the three categories of rater characteristics. These variables can be further subdivided into those which
are cognitive in nature (e.g. intelligence and cognitive complexity) and those which focus
on the rater’s personality (e.g. self-esteem and interests). With regard to cognitive variables, it has been shown that rating ability is partially a function of the intelligence of the
rater (Bayroff et al., 1954; Borman, 1977) and the extent to which the rater engages in
analytical as opposed to global thinking (Gruenfeld & Arbuthnot, 1969). Research
findings on cognitive complexity have been inconsistent. Cognitive complexity has been
defined by Schneier (1977) as ‘... the degree to which a person possesses the ability to
perceive behavior in a multidimensional manner’ (p. 541). Schneier found that cognitively
complex raters exhibited less leniency and restriction of range when using behavioural
expectation scales than did cognitively simple raters. Bernardin et al. (1982), however,
found that rater cognitive complexity was unrelated to rating accuracy, halo error,
acceptability of format, or confidence in ratings. It is evident that more research is needed
to resolve the discrepancies in these findings.
*Requests for reprints should be addressed to Professor K. N. Wexley, Michigan State University,
Graduate School of Business Administration, Department of Management, East Lansing, MI 48824-1121,
USA.
Concerning personality variables, it has been shown that more accurate raters tend to
be free from self-doubt, tend not to worry or become stressed, have investigative interests,
and tend to be detail oriented in their approach to tasks (Borman, 1979b). According to
Taft (1955), the ability to judge others accurately also seems to be higher in people who are
emotionally adjusted, have dramatic and artistic interests, possess good insight into where
they stand in relation to peers (i.e. self-insight), and have an ability to predict how others
will respond to opinion items (i.e. social empathy). Mandell (1956) found that lenient
raters tend to like people, be uncritical, self-confident, not highly ambitious, and derive
job satisfaction from a feeling of rendering service.
As for job-related variables, it has been shown that experienced (Jurgensen, 1950;
Cascio & Valenzi, 1977) and better performing (Schneider & Bayroff, 1953; Kirchner &
Reisberg, 1962) raters give more reliable and less error-prone ratings. Several studies have
also shown that raters who are people oriented in their leadership style are more lenient in
their ratings of subordinates than raters who are more production oriented (Mandell,
1956; Taylor et al., 1959; Klores, 1966).
Although a variety of rater characteristics has already been examined, one important type of psychological variable, a rater’s beliefs, has been neglected. Individuals can
hold beliefs about a variety of issues (e.g. politics, religion, economics), but the type of
beliefs that would be most likely to influence a rater’s judgements about other people
would be their beliefs about human nature. In other words, how the rater views other
people in general might correlate significantly with the rater’s propensity to make certain
rating errors such as leniency and halo, and also relate to the rater’s accuracy in appraising
others. These relationships might occur because a person’s beliefs about the nature of
other human beings could influence both their observation and recall of behaviour as well
as their interpretation of that behaviour. They may rate others in terms of how they view
people in general rather than in terms of how the person actually performs, thereby
resulting in less accurate ratings. Accordingly, we chose to focus our research on several
beliefs dealing with the extent to which a rater perceives others to be trustworthy,
altruistic, independent, rational, and variable in nature. Given that there has been little
research replicating the results of most studies of rater characteristics, the present research
also examined the relationship between selected psychological (i.e. intelligence, self-esteem, and cognitive complexity) and job-related (i.e. leadership style) variables with
both rating errors and rater accuracy. Demographic variables could not be investigated in
this study due to the homogeneity and size of the sample of raters.
A secondary purpose of this study was to examine the relationship between the
occurrence of rating errors and rater accuracy. Although most research on rating errors
has assumed that they decrease accuracy, there is some evidence to date which questions
this basic assumption. Four studies (Borman, 1977, 1979a; Berman & Kenny, 1977;
Warmke, 1980) have examined the relationship between rating errors and accuracy, yielding mixed results. Although all the studies found fairly weak relationships, the direction
was positive in some studies and negative in others. Based on the inconsistent results of
previous research, we felt that the relationship between rating errors and rater accuracy
deserved further study. However, no directional hypotheses concerning this relationship
were made.
SETTING
Data for this study were gathered from a service organization with county branches
located throughout a US state. The main objective of this state-supported agency is to
help low-income families, especially those with young children, acquire the knowledge,
skills and attitudes necessary to prepare more nutritional meals. Specifically, the programme is expected to result in: (a) improved diets and health for the total family;
(b) increased knowledge of the essentials of nutrition; (c) increased ability to select and
buy food that satisfies nutritional needs; (d) increased ability to prepare and serve palatable meals; (e) improved practices in food production, storage, safety and sanitation; and
(f) increased ability to manage resources that relate to food, including federal assistance
programmes such as food stamps.
The two major methods used by this service agency to reach low-income families are
group work and one-to-one home visits conducted by programme aides. Group work
involves encouraging homemakers to attend neighbourhood group sessions about food
and nutrition. These group sessions are then followed up by home visits for 9-12 months
by the programme aides.
Programme aides are paraprofessionals who receive direction from professional
home economists, hereafter referred to as supervisors. These supervisors have primary
responsibility for the conduct and quality of the service programme.
METHOD
Subjects
Twenty-four supervisors constituting the entire population of supervisors in the
agency participated in this study. The supervisors were all female, ranged in age from 28 to
60 (mean = 44.5), had an average of 7.3 years of managerial experience in the agency, and
were from various racial and ethnic backgrounds (black = 25, white = 71, hispanic = 4 per
cent). A total of 82 female programme aides (all the aides in the agency at the time of the
study) took part in the research project as ratees. The aides ranged in age from 19 to 64
(mean = 38.4), had an average of 4.7 years of tenure, and also represented various racial
and ethnic backgrounds (black = 40, white = 47, hispanic = 13 per cent). The mean number
of aides per supervisor was 6.5 (range = 2-14).
Measures
All supervisors were administered five psychological instruments yielding 10
measures. Cognitive complexity was assessed using the Bieri Grid which requires individuals to describe 10 familiar people using 10 characteristics on a semantic differential
scale. An individual’s score is determined by the number of different ratings the person
uses in describing these people, with low scores indicating greater cognitive complexity.
An individual who is cognitively complex possesses a relatively differentiated system of
dimensions for perceiving the behaviour of others, while an individual who is cognitively
simple possesses a relatively undifferentiated system of dimensions for perceiving others
(Bieri et al., 1966). Tripodi & Bieri (1963) report test-retest reliability of 0.86 in a study
using the grid format used here.
The Adaptability Test (Tiffin & Lawshe, 1943, 1967) was used to measure mental
alertness. This 15-minute test consists of 35 items assessing verbal and numerical
reasoning ability. The test has been designed for use with a broad range of persons, from
those who are limited in mental alertness to those possessing a high degree. The alternate
forms and corrected split-half reliabilities of the Adaptability Test have been found to be
0.89 and 0.90, respectively (Tiffin & Lawshe, 1967).
The Leadership Opinion Questionnaire (LOQ) provides measures of two dimensions
of supervisory leadership: Consideration and Initiating Structure (Fleishman, 1951, 1969).
Consideration reflects the extent to which a supervisor is likely to have job relationships
with subordinates characterized by warmth, mutual trust, respect for subordinates’ ideas,
and consideration of their feelings. A high score indicates a climate of good rapport and
two-way communication, while a low score indicates that the supervisor is likely to be
more impersonal in relations with subordinates. Initiating Structure reflects the extent to
which a supervisor is likely to structure his or her own role and those of subordinates
toward goal attainment. A high score characterizes supervisors who play an active role in
directing group activities through planning, scheduling, criticizing, and so forth. A low
score characterizes supervisors who are likely to be relatively inactive in giving directions
in these ways. There are 20 items in each scale, with each item scored from zero to four.
Fleishman (1969) reported corrected split-half reliabilities of 0.70 and 0.79 for the Consideration and Structure scales, respectively, using 122 supervisors. Further, test-retest
(three-month interval) reliabilities of 0.80 (Consideration) and 0.74 (Structure) were
reported based on 31 supervisors.
Self-esteem was measured using the Rosenberg Self-Esteem Scale (Rosenberg, 1965).
This scale asks respondents to indicate their agreement with 10 statements about their own
perceived worth and competence. Silber & Tippett (1965) found a test-retest correlation
over two weeks of 0.85. A Guttman reproducibility coefficient of 0.92 has also been
obtained (Robinson & Shaver, 1973).
The Wrightsman (1964) Philosophy of Human Nature Scales provided measures of
the five beliefs concerning human nature*. These variables were defined by Wrightsman as
follows: (a) Trustworthiness—the extent to which people are seen as moral, honest, and
reliable; (b) Independence—the extent to which people are seen as maintaining their convictions in the face of society’s pressure toward conformity; (c) Rationality/Strength of
Will—the extent to which people are perceived to understand the motives behind their
behaviour and have control over their outcomes; (d) Altruism—the extent to which people
are believed to be unselfish, sympathetic and concerned for others; and (e) Variability—the extent to which people are seen as differing from one another in basic nature
as well as being able to change over time. Each scale consists of several statements such as
‘People usually tell the truth, even when they know they would be better off lying’ and
‘Most people inwardly dislike putting themselves out to help other people’. Respondents
were asked to indicate the extent to which they agreed or disagreed with each item using
a six-point Likert-type format. These subscales have been shown by Wrightsman (1964)
to have an average corrected split-half reliability of 0.69, an average test-retest reliability (three-month interval) of 0.78, and inter-subscale correlations ranging from -0.12
(Rationality/Strength of Will and Variability) to 0.69 (Altruism and Trustworthiness).
The degree of halo and leniency error exhibited by each supervisor was derived from
Behavioral Observation Scales (BOS) ratings (Latham & Wexley, 1977) completed by
each supervisor on all of her aides. The BOS instrument assessed five major job dimensions: Education of the Homemaker (23 items), Record Keeping (7 items), Job-Related
Development (10 items), New Family Recruitment and Volunteer Development (12
items), and Staff Relations (3 items). Supervisors were asked to appraise the frequency, on
a five-point scale ranging from almost always to almost never, with which they had
observed an aide exhibit each of the 55 behavioural items.
The degree of halo error exhibited by each rater was operationalized using the
standard deviation across BOS dimension scores. This method of measuring halo has been
used extensively by other researchers (Borman, 1975; Schneier, 1977; Bernardin, 1978;
Ivancevich, 1979; Bernardin & Pence, 1980). Specifically, a mean score for each BOS
dimension was calculated and the standard deviation across these five dimension scores
was determined for each supervisor. This standard deviation was then averaged across all
aides reporting to a supervisor in order to determine the supervisor’s overall degree of
halo.
*Another subscale called Complexity of Human Nature was not used in the research because it has
previously been found to have low split-half and test-retest reliabilities.
Leniency error was operationalized as the mean BOS rating given by a supervisor to
all of her ratees. This operational definition of leniency has been used previously by
Bernardin & Pence (1980) and Borman (1977). This measure was obtained by calculating
the average score across the 55 behavioural items on the BOS. This average score was then
averaged across all aides reporting to a supervisor in order to determine the supervisor’s
overall degree of leniency.
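As a rough illustration of the halo and leniency indices just described, the following Python sketch computes both for a single supervisor. The data layout, the function names and the ratings are invented for illustration. The text does not say whether a population or a sample standard deviation was used, so the population form is assumed here.

```python
from statistics import mean, pstdev

# Hypothetical BOS ratings for one supervisor: aide -> dimension -> item ratings (1-5).
# Only two dimensions and two items each are shown; the real instrument has
# five dimensions and 55 items.
ratings_by_aide = {
    "aide_1": {"Education": [4, 4], "Record Keeping": [2, 2]},
    "aide_2": {"Education": [5, 5], "Record Keeping": [5, 5]},
}

def halo_index(ratings):
    """Average, over aides, of the SD across that aide's dimension means."""
    per_aide_sd = []
    for dimensions in ratings.values():
        dimension_means = [mean(items) for items in dimensions.values()]
        # Assumption: population SD; the paper does not specify the formula.
        per_aide_sd.append(pstdev(dimension_means))
    return mean(per_aide_sd)

def leniency_index(ratings):
    """Average, over aides, of the mean rating across all of that aide's items."""
    per_aide_mean = []
    for dimensions in ratings.values():
        all_items = [r for items in dimensions.values() for r in items]
        per_aide_mean.append(mean(all_items))
    return mean(per_aide_mean)

halo = halo_index(ratings_by_aide)          # 0.5 for the invented data
leniency = leniency_index(ratings_by_aide)  # 4.0 for the invented data
```

Under this operationalization a smaller halo value is conventionally read as more halo, since the rater differentiates less between dimensions, while a larger leniency value reflects higher average ratings.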
To measure rater accuracy, supervisors viewed in small groups a 50-minute videotape
of a manager and two subordinates engaging in a manufacturing game (Wexley &
Nemeroff, 1975). After observing the videotape, each supervisor was given a behavioural
checklist and asked to indicate the frequency with which she had observed the manager
exhibit each of 26 effective or ineffective behaviours. Ratings were made on a frequency
scale ranging from zero to four times. The videotape was carefully constructed so that
the researchers knew the true frequency of each behaviour. (For more detail on the
construction of the videotape see Heneman & Wexley, 1983.)
A supervisor’s accuracy score was calculated by taking the absolute difference
between the supervisor’s frequency rating on each behavioral item and the true frequency
for that item. These difference scores were then summed across the 26 behavioural items to
yield an individual’s score. The lower the score, the more accurate the supervisor’s ratings.
This is similar to the accuracy measure used previously by Bernardin & Pence (1980) and
Heneman & Wexley (1983).
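The accuracy score described above amounts to a sum of absolute deviations from the known frequencies. A minimal Python sketch follows; the function name and the four-item example are hypothetical (the actual checklist had 26 items).

```python
def accuracy_score(rated, true):
    """Sum of absolute differences between rated and true frequencies.

    Lower values mean more accurate ratings; 0 is a perfect match.
    """
    if len(rated) != len(true):
        raise ValueError("rated and true must cover the same items")
    return sum(abs(r - t) for r, t in zip(rated, true))

# Invented example: true frequencies on the videotape vs. one rater's checklist.
true_frequencies = [2, 0, 4, 1]
rater_checklist = [3, 0, 2, 1]
score = accuracy_score(rater_checklist, true_frequencies)  # 1 + 0 + 2 + 0 = 3
```

Because the score is an error total, correlations involving it run in the opposite direction to accuracy itself.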
Procedure
The data for this study were collected over a 12-month period. First, data on rater
characteristics were gathered. Copies of all tests, with the exception of the Adaptability
Test, were given to the supervisors for self-administration. They were then mailed back to
the researchers, resulting in a 100 per cent return rate. Since the Adaptability Test was a
timed test, it was administered by the second author during regular working hours. Four
months after all the tests were administered, the BOS ratings were completed by the
supervisors as part of the agency’s normal performance appraisal programme.
Eight months later the supervisors were brought into a laboratory setting to obtain
measures of rater accuracy. Prior to observing the videotape from which the accuracy
scores were derived, a brief description of the content of the tape was provided by the
researchers, and the supervisors were given an opportunity to read through the
behavioural checklist they would be using to rate the manager. At the conclusion of
the tape, the supervisors were given time to complete the checklist.
RESULTS
The means, standard deviations, and alpha coefficients of the psychological measures
are reported in Table 1. Also reported in Table 1 are the means, standard deviations and
alpha coefficients of the BOS dimensions and total score. Psychometric considerations
(low reliability and high interscale correlations) made several modifications in the
Wrightsman scales necessary. Initially, Wrightsman had five subscales. Due to the
high intercorrelations between three of the subscales (0.67 to 0.79), Altruism, Trustworthiness and Independence were combined into one scale which was renamed Positive Human
Nature. The alpha coefficient for this new scale was found to be 0.95.
In addition, the reliability for the Strength of Will/Rationality subscale was found to
be quite low (alpha = 0.55). Examination of the content of this subscale revealed two distinct clusters of items: those dealing with Rationality (the extent to which one believes that
people understand the reasons for their behaviour) and those concerned with Strength of
Will (the extent to which one believes that people have control over their outcomes).
Hence, these two components were separated to form two scales (i.e. Rationality and
Strength of Will), each having an acceptable level of reliability (alphas = 0.71 and 0.79,
respectively).
The Consideration subscale of the Leadership Opinion Questionnaire was dropped
from further analyses because of its low level of reliability (alpha = 0.41).
Table 1. Means, standard deviations, and reliabilities of the variables
Variable                 Mean      Standard deviation   Alpha coefficient
Halo                     0.34      0.14
Leniency                 4.29      0.50
Accuracy                 44.78     8.07
LOQ-Structure            45.56     7.94                 0.75
LOQ-Consideration        55.19     4.59                 0.41
Self-esteem              36.12     3.30                 0.85
Cognitive Complexity     98.52     17.30
Mental Alertness         16.39     4.96
Variability              4.70      8.13                 0.76
Rationality              1.04      5.37                 0.71
Strength of Will         6.19      5.09                 0.79
Positive Human Nature    24.78     29.64                0.95
BOS-Education            90.69     16.04                0.85
BOS-Record Keeping       28.49     5.69                 0.69
BOS-Development          39.05     7.74                 0.72
BOS-Recruitment          40.39     11.94                0.82
BOS-Staff Relations      13.22     2.3                  0.60
Total BOS                211.84    36.54                0.93
Note. Due to the nature of the Cognitive Complexity and Mental Alertness
scales, it was not possible to compute alpha coefficients for those
variables.
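The alpha coefficients in Table 1 are presumably Cronbach's alpha, although the text does not name the statistic. Under that assumption, the following Python sketch shows how such a coefficient could be computed from item-level responses. The function name and the data are invented for illustration.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))  # transpose: one tuple per item
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Invented responses: four respondents, two items.
responses = [[2, 1], [1, 2], [3, 3], [4, 4]]
alpha = cronbach_alpha(responses)
```

Low values, such as the 0.41 reported for Consideration, indicate that the items of a scale do not vary together closely, which is why that scale was dropped.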
Table 2 presents the zero-order intercorrelations among rating errors, rater accuracy
and rater characteristics. It can be seen that the degree of halo error was not significantly
related to any rater characteristics. It is also evident that Leniency was positively correlated
with Positive Human Nature, while being negatively related to Variability. These findings
suggest that more lenient ratings tend to be given by supervisors who believe that other
people are basically good (i.e. trustworthy, altruistic and independent), and who believe
that people differ little from one another. Finally, the rater accuracy score was found to be
negatively correlated with supervisors’ beliefs about the variability of human nature.
Because lower accuracy scores indicate more accurate ratings, this suggests that supervisors
who believe that people differ greatly from one another rate more accurately.
Hierarchical multiple regression was used to determine the relative importance of the
rater beliefs as compared to the other rater characteristics included in this study. This was
done by entering belief measures into a regression equation after non-belief measures to
see if they accounted for a significant increase in the amount of variance explained in the
dependent variables. The procedure was then reversed so the incremental contribution of
the non-belief measures could be assessed. The results of these analyses are reported in
Table 3.
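The incremental test described above is usually carried out with the standard F statistic for a change in R². The paper does not state its exact formula or degrees of freedom, so the following Python sketch is an illustration under that assumption. The function name and the numbers in the example are hypothetical and are not taken from Table 3.

```python
def incremental_f(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the increase in R-squared when k_added predictors are entered.

    r2_reduced: R-squared of the model before the new block is entered
    r2_full:    R-squared after the block is entered
    n:          number of observations
    k_full:     number of predictors in the full model
    k_added:    number of predictors in the block just entered
    Returns (F, numerator df, denominator df).
    """
    df_num = k_added
    df_den = n - k_full - 1
    if df_den <= 0:
        raise ValueError("not enough observations for this many predictors")
    f_value = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
    return f_value, df_num, df_den

# Invented example: R-squared rises from 0.20 to 0.50 when 2 predictors are
# added to give a 4-predictor model fitted to 30 cases.
f_value, df_num, df_den = incremental_f(0.20, 0.50, n=30, k_full=4, k_added=2)
# F = (0.30 / 2) / (0.50 / 25) = 7.5 on (2, 25) degrees of freedom
```

The resulting F is compared with the critical value for its degrees of freedom. With small samples and several predictors the denominator degrees of freedom shrink quickly, so sizeable increments in R² can still fall short of significance.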
Examination of Table 3 reveals that neither belief nor non-belief measures made a
significant incremental contribution to the variance explained in either Halo or Leniency.
However, when accuracy was the dependent variable, it was found that entering belief
measures into the regression equation after non-belief measures did result in a significant
increase in the amount of variance explained. Furthermore, when the order of entry was
reversed, it was found that entering non-beliefs did not significantly increase R².
[Table 2. Zero-order intercorrelations among rating errors, rater accuracy and rater characteristics. The matrix is not legible in the source.]
Table 3. Results of hierarchical multiple regression analyses
                                Cumulative R²                 R² change                     F test for significance of incremental variance
Variable entered        Step    Halo    Leniency  Accuracy    Halo    Leniency  Accuracy    Halo     Leniency  Accuracy
Non-belief measures     1       0.23    0.33      0.20
Rater belief measures   2       0.26    0.51      0.55        0.03    0.18      0.35        0.155    1.52      3.17*
Rater belief measures   1       0.14    0.32      0.33
Non-belief measures     2       0.26    0.51      0.55        0.12    0.19      0.22        0.689    1.55      2.04
*P < 0.05.
Using data obtained from the 20 supervisors for whom both rating error and rater
accuracy data were available, it was found that neither halo (r = -0.13, P = 0.29) nor
leniency (r = -0.27, P = 0.13) correlated significantly with rater accuracy.
DISCUSSION
The results of this study suggest that rater beliefs about other people are likely to be
important correlates of both rater accuracy and leniency. Two factors combine to support
this conclusion. Looking at the correlation matrix, it can be seen that all of the significant
correlations between accuracy and leniency with rater characteristics involved rater
beliefs. Additional support for the importance of rater beliefs comes from the results of the
hierarchical multiple regression analyses which were used to determine the relative importance of rater beliefs as compared to the other rater characteristics included in this study.
The results of these analyses indicated that, for rater accuracy, only rater beliefs made a
significant incremental contribution to the amount of variance explained in accuracy. This
supports the relative importance of rater beliefs as compared to other types of rater
characteristics. Furthermore, the magnitude of the multiple R² is indicative of the
significance of rater beliefs in explaining rater accuracy.
Our results indicate that raters who are lenient tend to have a positive general view
about the nature of other people, believing that others are altruistic, independent and
trustworthy. Specifically, these raters tend to agree with statements such as ‘if you act in
good faith with people, almost all of them will reciprocate with fairness toward you’,
‘most people can make their own decisions uninfluenced by public opinion’, and ‘most
people are basically honest’. Conversely, these raters are likely to disagree with such statements as ‘it’s pathetic to see an unselfish person in today’s world because so many people
take advantage of them’ and ‘if you want people to do a job right, you should explain
things to them in great detail and supervise them closely’.
Theories which describe a rater’s cognitive processes when making performance
ratings (e.g. Landy & Farr, 1980; Wherry & Bartlett, 1982; Ilgen & Feldman, 1983) have
suggested that ratings are a function of several major components: the performance of the
ratee, the observation of that performance by the rater, and the recall and interpretation
of those observations. Using this theoretical framework, one could speculate that a rater’s
positive beliefs about other people in general could influence their observation of performance and/or their recall of and judgement about that performance. Through a process of
selective attention and selective memory, these raters might be more likely to observe and
remember behaviours that are consistent with their positive beliefs about others, thereby
resulting in more lenient ratings. Furthermore, their interpretation of observed behaviours
might be affected by their beliefs in such a way that more favourable attributions
regarding the causes of behaviour could be made by these types of raters.
The results of this research also showed that raters who believe in the variability of
people tend to rate others more accurately and less leniently. Apparently, raters who think
that people differ among themselves (i.e. believe in individual differences) and who think
that people can change across situations and time (i.e. believe in the dynamic nature of
behaviour) are better raters, perhaps because they are more likely to spread out their
ratings and, thus, be more accurate. In addition, these raters may spend more time observing the behaviour of subordinates rather than merely assuming that their behaviour in one
situation will be representative of their behaviour in other situations. The notion that some raters
feel they need not observe subordinates’ behaviour carefully is partially supported
by recent research by Favero & Ilgen (1983). They found that raters spend less time
observing the behaviour of ratees who are described by prototypical traits presumably
because such information led the rater to conclude that they possessed adequate information about the ratee and, hence, did not need to spend more time observing their
behaviour. Favero & Ilgen also found that these raters were less accurate. In a similar vein,
raters who believe less in the variability among ratees may conclude that they do not need
to carefully observe ratee behaviour and may consequently rate less accurately.
There have been recent warnings in the literature that rating errors and rater accuracy
do not necessarily covary negatively (Cooper, 1981). The present findings tend to support
these claims since the correlations between rating errors and rater accuracy were not
statistically significant. Thus, there was no direct support for the notion that supervisors
who commit halo and leniency errors will necessarily be less accurate.
The findings from this study should be viewed cautiously due to the limited sample
size and the fact that the measure of accuracy used was taken over a short time period and,
hence, did not require the long-term memory processes involved in more typical rating
situations. Nevertheless, the results are potentially important since, to date, no previous
studies have examined rater beliefs about other people as a correlate of either rater
accuracy or errors. The present results suggest that this rater characteristic needs to be
further examined in other types of organizational settings. If confirmed, these findings
have possible implications for developing more competent performance appraisal raters.
Perhaps those raters who possess positive beliefs about others and who do not see the
variability in people should be given more extensive training in accurately observing and
recalling behaviour and in recognizing individual differences. These findings may also
enhance our ability to select effective raters for managerial assessment centres.
REFERENCES
BAYROFF, A. G., HAGGERTY, H. R. & RUNDQUIST, E. (1954). Validity of ratings as related to rating
technique and conditions. Personnel Psychology, 7, 93-114.
BERMAN, D. S. & KENNY, D. A. (1977). Correlation bias: Not gone and not to be forgotten. Journal of
Personality and Social Psychology, 35, 882-887.
BERNARDIN, H. J. (1978). Effects of rater training on leniency and halo errors in student ratings of
instructors. Journal of Applied Psychology, 63, 301-308.
BERNARDIN, H. J., CARDY, R. L. & CARLYLE, J. J. (1982). Cognitive complexity and appraisal effectiveness:
Back to the drawing board? Journal of Applied Psychology, 67, 151-160.
BERNARDIN, H. J. & PENCE, E. C. (1980). Effects of rater training: Creating new response sets and
decreasing accuracy. Journal of Applied Psychology, 65, 60-66.
BiERi, J,, ATKINS, A, L,, BRIAR, S,, LEAMAN, R, L,, MILLER, H, & TRIPODI, T, (1966), Clinical and Social
Judgment. New York: Wiley,
BORMAN, W, C, (1975), Effects of instructions to avoid halo error on reliability and validity of performance
evaluation ratings. Journal of Applied Psychology, 60, 556-560,
BORMAN, W, C, (1977), Consistency of rating accuracy and rating errors in the judgment of human
performance. Organizational Behavior and Human Performance, 20,238-252,
BORMAN, W, C, (1979a), Format and training effects on rating errors and rating accuracy. Journal of
Applied Psychology, 64,410-421,
BORMAN, W, C, (19796), Individual difference correlates of accuracy in evaluating others’ performance
effectiveness. Applied Psychological Measurement, 3,103-115,
CASCIO, W, F, & VALENZI, E, R, (1977), Behaviorally anchored rating scores: Effects of education and job
experience of raters and ratees. Journal of Applied Psychology, 62, 278-282,
COOPER, W. H. (1981). Ubiquitous halo. Psychological Bulletin, 90, 218-244.
COX, J. A. & KRUMBOLTZ, J. D. (1958). Racial bias in peer ratings of basic airmen. Sociometry, 21, 292-299.
CROOKS, L. A. (ed.) (1972). An Investigation of Sources of Bias in the Prediction of Job Performance: A
Six-Year Study. Princeton, NJ: Educational Testing Service.
FAVERO, J. & ILGEN, D. (1983). The effects of rater characteristics on rater performance appraisal behavior
(Report No. 83-5). Arlington, VA: Office of Naval Research.
FLEISHMAN, E. A. (1951). Leadership Climate and Supervisory Behavior. Columbus, OH: Personnel
Research Board, Ohio State University.
FLEISHMAN, E. A. (1969). Leadership Opinion Questionnaire. Chicago: Science Research Associates.
GRUENFELD, L. & ARBUTHNOT, J. (1969). Field independence as a conceptual framework for prediction of
variability in ratings of others. Perceptual and Motor Skills, 28, 31-44.
HAMNER, W. C., KIM, J. S., BAIRD, L. & BIGONESS, W. J. (1974). Race and sex as determinants of ratings by
potential employers in a simulated work sampling task. Journal of Applied Psychology, 59, 705-711.
HENEMAN, R. L. & WEXLEY, K. N. (1983). The effects of time delay in rating and amount of information
observed on performance rating accuracy. Academy of Management Journal, 26, 677-686.
ILGEN, D. R. & FELDMAN, J. M. (1983). Performance appraisal: A process approach. In L. L. Cummings &
B. M. Staw (eds), Research in Organizational Behavior, vol. 5. Greenwich, CT: JAI Press.
IVANCEVICH, J. M. (1979). Longitudinal study of the effects of rater training on psychometric error in
ratings. Journal of Applied Psychology, 64, 502-508.
JACOBSON, M. B. & EFFERTZ, J. (1974). Sex roles and leadership: Perceptions of the leaders and the led.
Organizational Behavior and Human Performance, 12, 383-396.
JURGENSEN, C. E. (1950). Intercorrelations in merit rating traits. Journal of Applied Psychology, 34,
240-243.
KIRCHNER, W. K. & REISBERG, D. J. (1962). Difference between better and less effective supervisors in
appraisal of subordinates. Personnel Psychology, 15, 295-302.
KLORES, M. S. (1966). Rater bias in forced-distribution ratings. Personnel Psychology, 19, 411-421.
LANDY, F. J. & FARR, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72-107.
LATHAM, G. P. & WEXLEY, K. N. (1977). Behavioral observation scales for performance appraisal
purposes. Personnel Psychology, 30, 255-268.
MANDELL, M. M. (1956). Supervisory characteristics and ratings: A summary of recent research. Personnel
Psychology, 9, 435-440.
PULAKOS, E. D. & WEXLEY, K. N. (1983). The relationship among perceptual similarity, sex, and
performance ratings in manager-subordinate dyads. Academy of Management Journal, 26, 129-139.
ROBINSON, J. P. & SHAVER, P. R. (1973). Measures of Social Psychological Attitudes. Ann Arbor, MI:
Institute for Social Research.
ROSEN, B. & JERDEE, T. H. (1973). The influence of sex role stereotypes on evaluations of male and female
supervisory behavior. Journal of Applied Psychology, 57, 44-48.
ROSENBERG, M. (1965). Society and the Adolescent Self-image. Princeton, NJ: Princeton University Press.
SCHNEIDER, D. E. & BAYROFF, A. G. (1953). The relationship between rater characteristics and validity of
ratings. Journal of Applied Psychology, 37, 278-280.
SCHNEIER, C. E. (1977). Operational utility and psychometric characteristics of behavioral expectation
scales: A cognitive reinterpretation. Journal of Applied Psychology, 62, 541-548.
SILBER, E. & TIPPETT, J. (1965). Self-image stability: The problem of validation. Psychological Reports, 17,
323-329.
TAFT, R. (1955). The ability to judge people. Psychological Bulletin, 52, 1-23.
TAYLOR, E. K., PARKER, J. W., MARTENS, L. & FORD, G. L. (1959). Supervisory climate and performance
ratings, an exploratory study. Personnel Psychology, 12, 453-468.
TIFFIN, J. & LAWSHE, C. H. (1943). The Adaptability Test: A fifteen-minute mental alertness test for use in
personnel allocation. Journal of Applied Psychology, 27, 152-163.
TIFFIN, J. & LAWSHE, C. H. (1967). The Adaptability Test. Chicago, IL: Science Research Associates.
TRIPODI, T. & BIERI, J. (1963). Cognitive complexity as a function of own and provided constructs.
Psychological Reports, 13, 26.
WARMKE, D. L. (1980). Effects of accountability procedures upon the utility of peer ratings of present performance. (Doctoral dissertation, Ohio State University, 1979.) Dissertation Abstracts International, 40,
4011-B.
WEXLEY, K. N. & NEMEROFF, W. F. (1974). Effects of racial prejudice, race of applicant, and biographical
similarity on interviewer evaluations of job applicants. Journal of Social and Behavioral Sciences, 20,
66-78.
WEXLEY, K. N. & NEMEROFF, W. F. (1975). Effectiveness of positive reinforcement and goal setting as
methods of management development. Journal of Applied Psychology, 60, 446-450.
WHERRY, R. J. Sr & BARTLETT, C. J. (1982). The control of bias in ratings: A theory of rating. Personnel
Psychology, 35, 521-551.
WRIGHTSMAN, L. (1964). Measurement of philosophies of human nature. Psychological Reports, 14,
743-751.
Received 4 January 1985: revised version received 16 June 1985
