   Estimating the Reliability and Validity of Assessment Center Ratings Using Generalizability Theory
   
Authors Salimi Majid, Moshtaghian Abarghoei Hassan
Source Industrial and Organizational Psychology Studies - 1404 [Solar Hijri] - Volume 12 - Issue 2 - Pages 1-20
Abstract    The results of an assessment center, and any use made of those results, depend decisively on the reliability and validity of the center's ratings. The aim of this study was to examine the reliability and validity of a government organization's assessment center using generalizability theory. The sample comprised data on 340 managers (assessees) who had been rated on five competency dimensions by six raters under a completely crossed random design, and who were accordingly studied in terms of dimensions (d), rater/method (e), and assessees (p). Based on the findings, the relative and absolute g coefficients were 0.80 and 0.73, respectively. The analysis of variance further showed that the components related to assessees, dimensions, and their interaction together explained 58% of the total variance, whereas the components related to raters (methods) and their interaction with persons explained only 6%; thus, in line with the convergent-discriminant validity approach, the data of this study showed adequate validity.
Keywords assessment center, generalizability theory, reliability, convergent/discriminant validity
Address Allameh Tabataba'i University, Iran; National Iranian Oil Company, Assessment and Evaluation Center, Iran; Allameh Tabataba'i University, Iran; Shiraz Department of Education, Iran
پست الکترونیکی kavir311@yahoo.com
 
   Estimating the Reliability & Validity of Assessment Center Ratings Using Generalizability Theory  
   
Authors Salimi Majid, Moshtaghian Abarghoei Hassan
Abstract    Introduction
Over the past few decades, assessment centers have become a popular method of assessing behavior for recruitment, recognition, and development around the world (Krause, Rossberger, Dowdeswell, Venter, & Joubert, 2011; Lowry, 1997; Thornton & Byham, 1982). This popularity can be attributed to several factors: the multiple methods used in assessment centers have been shown to have little adverse impact, to be able to predict a variety of performance measures (Thornton & Rupp, 2006), to have high predictive validity coefficients ranging from 0.37 to 0.52 (Gaugler, Rosenthal, Thornton, & Bentson, 1987; Rupp, Thornton, & Gibbons, 2008; Thornton & Gibbons, 2009), and to demonstrate high criterion-related validity (Schmitt, Gooding, Noe, & Kirsch, 1984) while also having content validity (Gaugler, Rosenthal, Thornton, & Bentson, 1987; Iles, 1992). In addition, simulations, which are an essential part of assessment centers, are more accurate in assessing candidates under job-like situations and conditions (Lievens & Schollaert, 2011; Thornton & Rupp, 2006). Although assessment centers are usually costly in terms of time, effort, and money, their excellent predictive validity and other strengths make them worthwhile (Krause et al., 2006). Accordingly, there is general agreement that assessment center ratings predict job-relevant measures such as job performance or educational success (Hermelin, Lievens, & Robertson, 2007; Sackett, Shewach, & Keiser, 2017). Undoubtedly, using the ratings of applicants in an assessment center is acceptable only when the collected data have adequate reliability and validity. Without an acceptable level of reliability, any claim about the validity and appropriateness of the ratings is limited; and without validity, the results of the assessment cannot be used at all (Whetzel & Wheaton, 2007).
Regardless of the purpose for which an assessment center is designed, evidence supporting validity in the main constructs measured is always relevant and meaningful (Thornton, Rupp, & Hoffman, 2015). It is generally assumed that content validity, criterion validity, and construct validity (and their estimation) are related to each other, and it can be logically assumed that the existence of two of them necessarily implies the third (Arthur, Woehr, & Maldegen, 2000). Therefore, if a center has content validity and criterion validity, it can be logically assumed that it also has construct validity. Although research has supported the content and criterion validity of assessment centers, the results regarding their construct validity have been less promising (Thornton & Mueller-Hanson, 2004; Lievens & Christiansen, 2012; Jackson, Michaelides, Dewberry, Nelson, & Stephens, 2022). Accordingly, some researchers have turned to a variety of methodological approaches to address the construct validity problem of assessment centers; however, these efforts have failed to resolve it. Some researchers have even suggested eliminating dimensions from assessment center studies and focusing only on exercises (Jackson, Barney, Stillman, & Kirkley, 2007; Neidig & Neidig, 1984; Dewberry, 2024). Such a view is probably unacceptable because it contradicts the main purpose of assessment centers, which is to locate assessees on the various dimensions required for job performance (Bowler & Woehr, 2006, 2009). According to some researchers, the lack of a clear separation of the sources of variance has led efforts to assess the construct validity of assessment centers astray. To establish construct validity, both convergent and discriminant validity are required (Campbell & Fiske, 1959).
Convergent validity is observed when scores on the same construct are consistent despite the use of different measurement methods (e.g., different raters, multiple measures, etc.) (Kraiger & Teachout, 1990; Marcoulides, 1989). Discriminant validity is observed when individuals are rated differentially on the different trait measures of the instrument (Kraiger & Teachout, 1990). Generalizability theory provides a way to examine both convergent and discriminant validity (Kraiger & Teachout, 1990; Arthur, Woehr, & Maldegen, 2000). The main objective of this study was to use generalizability theory to examine the reliability and convergent/discriminant validity of the ratings from a government organization's assessment center.
Method
To study generalizability, data on 340 managers, who had been rated on a 9-point scale (1 to 9) on five competency dimensions (planning, leadership, problem recognition, innovation, and decision-making) by six raters using different methods (group exercise, group discussion, interview, and questionnaire), were extracted from the organization's manager-competency database and analyzed as a completely crossed random design with 5 dimensions (d), 6 methods (e), and 340 managers (p) using the EduG program (Cardinet, Johnson, & Pini, 2010). In the present study, variability due to individuals does not contribute to error variance, because individuals are the objects of measurement and are expected to differ; therefore, a sufficiently large variance component for persons indicates convergent validity (Kraiger & Teachout, 1990; Marcoulides, 1989), while the degree of consistency in the ratings of managers by different raters provides evidence of discriminant validity (Kraiger & Teachout, 1990).
Results
The variance component estimates showed that the main effect of dimensions accounted for 13.4% of the variance, evidence that performance varies across the dimensions measured.
In contrast, the variance attributed to raters (methods) was only about 0.3%, indicating that ratings vary very little across measurement methods (group exercises, interviews, etc.). The largest main effect belonged to persons, accounting for 29% of the total variance, which indicates that a high proportion of the variance in the ratings is due to differences between individuals in the competencies measured in this design. In other words, the center's raters genuinely differentiated among managers on the measured competencies, or the managers truly differed in the competencies under study. Since the managers constitute the object of measurement in this study, and not a source of error, such variability is desirable. Finally, slightly more than 36% of the total variance belongs to the residual term, that is, the proportion of variance accounted for by the three-way interaction of persons, raters (methods), and dimensions, together with other systematic or unsystematic sources of variability not measured in this design. There are therefore other important sources of variance that are not accounted for by differences between managers, differences between raters (methods), differences between dimensions, or their two-way interactions; in this design, such variance is undesirable. In accordance with the convergent-discriminant validity approach based on the components of generalizability theory (Kraiger & Teachout, 1990; Marcoulides, 1989; Arthur et al., 2000), examination of the estimated variance components for the main and interaction effects yielded a fairly clear pattern supporting the validity of the ratings on the studied dimensions. The main effect of dimensions accounted for a relatively high percentage of the total variance.
This indicates that performance varies along the dimensions under study. The interaction between dimensions and persons explained slightly more than 15% of the total variance; the variance associated with this interaction indicates the extent to which individuals' performance differs across the dimensions. This component essentially reflects discriminant validity across dimensions: performance along the dimensions shows relatively high variability, meaning that, as expected, the ratings capture well the latent individual differences on the dimensions under study (evidence of discriminant validity). The findings also showed that 29% of the variance in the total ratings is due to the main effect of persons, indicating that the raters differentiated among the managers' performance in the assessment center.
Discussion
According to the findings, the differentiation (universe-score) variance estimate (0.44) is much higher than the sum of the relative error variances (0.11) or the sum of the absolute error variances (0.16) of the instrumentation facets. The relative and absolute g coefficients were 0.80 and 0.73, respectively. The relative coefficient meets the conventional threshold of 0.80 (Cardinet, 2010); based on this coefficient, 80% of the observed variance in the average ratings is attributed to the true score, and 20% is related to the person × dimension × rater interaction and to unspecified facets that reduce the accuracy of individual results. Overall, then, the rating methods and the measurement precision obtained with this number of dimensions and methods/raters are relatively satisfactory and can provide a reliable rating of individuals' performance.
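The completely crossed p × d × e variance-component analysis that the study runs in EduG can be sketched with the standard ANOVA expected-mean-squares estimator. The NumPy code below uses synthetic ratings with made-up effect sizes (not the study's data); only the design sizes (340 persons, 5 dimensions, 6 raters/methods) follow the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, n_d, n_e = 340, 5, 6  # persons, dimensions, raters/methods (as in the study)

# Synthetic ratings X[p, d, e]; the effect SDs below are illustrative only.
X = (5.0
     + rng.normal(0, 0.7, (n_p, 1, 1))      # person effect
     + rng.normal(0, 0.4, (1, n_d, 1))      # dimension effect
     + rng.normal(0, 0.1, (1, 1, n_e))      # rater/method effect
     + rng.normal(0, 0.4, (n_p, n_d, 1))    # person x dimension
     + rng.normal(0, 0.1, (n_p, 1, n_e))    # person x rater
     + rng.normal(0, 0.1, (1, n_d, n_e))    # dimension x rater
     + rng.normal(0, 0.6, (n_p, n_d, n_e))) # residual (pde + error)

m = X.mean()
mp = X.mean(axis=(1, 2)); md = X.mean(axis=(0, 2)); me = X.mean(axis=(0, 1))
mpd = X.mean(axis=2); mpe = X.mean(axis=1); mde = X.mean(axis=0)

# Mean squares for the fully crossed random-effects ANOVA.
MSp = n_d * n_e * ((mp - m) ** 2).sum() / (n_p - 1)
MSd = n_p * n_e * ((md - m) ** 2).sum() / (n_d - 1)
MSe = n_p * n_d * ((me - m) ** 2).sum() / (n_e - 1)
MSpd = n_e * ((mpd - mp[:, None] - md[None, :] + m) ** 2).sum() / ((n_p - 1) * (n_d - 1))
MSpe = n_d * ((mpe - mp[:, None] - me[None, :] + m) ** 2).sum() / ((n_p - 1) * (n_e - 1))
MSde = n_p * ((mde - md[:, None] - me[None, :] + m) ** 2).sum() / ((n_d - 1) * (n_e - 1))
MSres = ((X - mpd[:, :, None] - mpe[:, None, :] - mde[None, :, :]
          + mp[:, None, None] + md[None, :, None] + me[None, None, :] - m) ** 2
         ).sum() / ((n_p - 1) * (n_d - 1) * (n_e - 1))

# Variance-component estimates from the expected mean squares.
v_res = MSres
v_pd = (MSpd - MSres) / n_e
v_pe = (MSpe - MSres) / n_d
v_de = (MSde - MSres) / n_p
v_p = (MSp - MSpd - MSpe + MSres) / (n_d * n_e)
v_d = (MSd - MSpd - MSde + MSres) / (n_p * n_e)
v_e = (MSe - MSpe - MSde + MSres) / (n_p * n_d)

total = v_p + v_d + v_e + v_pd + v_pe + v_de + v_res
for name, v in [("p", v_p), ("d", v_d), ("e", v_e), ("pd", v_pd),
                ("pe", v_pe), ("de", v_de), ("pde,res", v_res)]:
    print(f"{name:8s} {v:6.3f}  {100 * v / total:5.1f}%")
```

With real data, a large person component would support convergent validity and a large person × dimension component discriminant validity, mirroring the interpretation in the abstract.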
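The two g coefficients quoted in the discussion follow from the standard G-theory formulas, using the variance estimates reported in the text (differentiation variance 0.44, relative error 0.11, absolute error 0.16); a minimal check:

```python
# Variance estimates as reported in the abstract's discussion section.
sigma2_tau = 0.44    # differentiation (universe-score) variance
sigma2_delta = 0.11  # relative error variance
sigma2_Delta = 0.16  # absolute error variance

# Relative g: universe-score variance over itself plus relative error.
g_relative = sigma2_tau / (sigma2_tau + sigma2_delta)
# Absolute g (phi): universe-score variance over itself plus absolute error.
g_absolute = sigma2_tau / (sigma2_tau + sigma2_Delta)

print(round(g_relative, 2))  # 0.8
print(round(g_absolute, 2))  # 0.73
```

This reproduces the reported coefficients of 0.80 and 0.73 exactly, confirming that the quoted variances and coefficients are internally consistent.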
Keywords assessment center, generalizability theory, reliability, convergent/discriminant validity
 
 

Copyright 2023
Islamic World Science Citation Center
All Rights Reserved