

ORIGINAL ARTICLE 

Year: 2016 | Volume: 19 | Issue: 1 | Page: 58-65

Graphical modeling for item difficulty in medical faculty exams
L Tomak^{1}, Y Bek^{1}, MA Cengiz^{2}
^{1} Department of Biostatistics and Medical Informatics, Faculty of Medicine, Ondokuz Mayis University, Atakum, Samsun, Turkey
^{2} Department of Statistics, Faculty of Arts-Science, Ondokuz Mayis University, Atakum, Samsun, Turkey
Date of Acceptance: 26-Jun-2015
Date of Web Publication: 12-Jan-2016
Correspondence Address: L Tomak, Department of Biostatistics and Medical Informatics, Faculty of Medicine, Ondokuz Mayis University, Atakum, Samsun, Turkey
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/1119-3077.173701
Abstract   
Background: Several indexes are used in the evaluation of exam results. One important index is the difficulty level of the item, which is used in this study to obtain control charts. This article offers some suggestions for the improvement of multiple-choice tests using item analysis statistics.
Materials and Methods: Graphical modeling is important for the rapid and comparative evaluation of test results. The control chart is a tool that can be used to sharpen our teaching and testing skills by inspecting the weaknesses of measurements and producing reliable items. The research data for the application of control charts were obtained from the results of the fourth- and fifth-grade students' exams at Ondokuz Mayis University, Faculty of Medicine. The I-chart or the moving range (MR) chart is preferred for variable data without subgroups.
Results: All observations are within the control limits on the I-chart, but three points on the MR-chart lie on the lower control limit (LCL). Using the X̄-chart with subgroups, it was determined that the measurements were within the upper and lower control limits in both charts. The difficulty levels of items were examined by obtaining different variable control charts. The difficulty level of two items exceeded the upper control limit (UCL) in the R- and S-charts.
Conclusion: Control charts have the advantage of classifying items as acceptable or unacceptable based on item difficulty criteria.
Keywords: Item difficulty, quality control, statistical process control, variable control charts
How to cite this article: Tomak L, Bek Y, Cengiz MA. Graphical modeling for item difficulty in medical faculty exams. Niger J Clin Pract 2016;19:58-65
Introduction   
Statistical process control (SPC) is an important technique to assess a production process through control charts. Control charts are developed to monitor SPC and to determine variations in the production process.^{[1]}
Shewhart developed the first type of chart in the 1920s. Initially, control charts were used to solve industrial problems.^{[2]} After Shewhart, Levey and Jennings applied the control chart and SPC in the clinical laboratory in 1950. The Levey-Jennings chart is widely preferred in clinical laboratories today. This chart uses individual values, and it is plotted according to the reference value.^{[3],[4]}
Control charts are classified according to whether they monitor attribute (discrete) data or variable (continuous) data. In quality control, attribute characteristics are recorded as binary data such as “conforming/nonconforming,” “good/bad” or “defective/nondefective.”^{[5],[6]}
Some control charts monitor variability of characteristics. The characteristics are measured as variable data for variable control charts.^{[7]} Control charts for variables include charts for individual measurements; charts for variability or dispersion of several measures, such as the range and the standard deviation, and charts for central tendency of several measures, such as averages, medians, and midranges. Many applications include the companion use of a chart for variability and a chart for central tendency.^{[8]}
The principles of creating variable control charts and attribute control charts are the same. The aim is to determine the mean, the standard deviation, and the distance between the mean and the control limits based on the standard deviation. Control charts for variable characteristics are often used, and they are very effective in providing feedback about the performance of the process. These charts can be classified in different ways: for individual measurements, for subgroups, and for different combinations of subgroups.^{[2]}
The value of each measurement in control charts for individual observations is shown by a single point on the chart. There are three different charts for individual observations: the individual observations (I) chart, the moving range (MR) chart, and the I-MR chart, which combines the two.^{[9]}
Other control charts for variable characteristics are preferred if subgroups exist. These are, respectively, the average (X̄) chart, the range (R) chart, the standard deviation (S) chart, the X̄-R chart, which combines X̄ and R, and the X̄-S chart, which combines X̄ and S. In addition, there is the I-MR-R/S chart, which combines I, MR, and R or S.^{[5]}
The exam results in this study were evaluated using the following control charts: the individual observations (I) chart, the MR-chart, the I-MR chart, the average (X̄) chart, the R-chart, and the standard deviation (S) chart.
The production process used to improve individual test items and to increase the quality of the test as a whole is evaluated through these control charts. When the statistical process is “in control,” the test item is regarded as a “good” item. If the statistical process is not “in control,” the identified test item needs to be revised or thrown out, and we may want to find out why that particular item had problems producing reliable measurements.
The purpose of this study is to offer some suggestions for the improvement of multiple-choice tests using item analysis statistics and control charts, which are mainly used to control a manufacturing process. The control charts allow us to identify outliers among items. Inspecting a particular outlying item may give us some insight into the weaknesses in our measurements. Accordingly, the statistical processes of these examinations were evaluated by plotting different variable control charts.
Materials and Methods   
The data for the application of control charts were obtained from the results of the fourth- and fifth-grade students' exams at Ondokuz Mayis University, Faculty of Medicine.
Several indexes are used in the evaluation of exams. The difficulty level was used to obtain control charts for this study. Item difficulty is the percentage of students answering a question correctly. Values of the difficulty index range from 0% (very difficult) to 100% (very easy), or equivalently from 0 to 1 when expressed as a proportion. The difficulty index can be used to alert the instructor to potential problems. An item is considered acceptable when its difficulty level lies between 0.2 and 0.8.^{[10]}
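As a minimal sketch of the difficulty index described here, the following Python code computes the proportion of correct answers per item and applies the 0.2 to 0.8 acceptability band. The response matrix, the function names `item_difficulty` and `is_acceptable`, and the inclusive treatment of the band's end points are illustrative assumptions, not part of the study.

```python
# Hypothetical response matrix: rows are students, columns are items;
# 1 = correct answer, 0 = incorrect answer. Illustrative data only.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]

def item_difficulty(matrix):
    """Return the proportion of correct answers for each item (column)."""
    n_students = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_students for j in range(n_items)]

def is_acceptable(p, low=0.2, high=0.8):
    """Apply the 0.2-0.8 band from the text (end points assumed inclusive)."""
    return low <= p <= high

difficulty = item_difficulty(responses)
for j, p in enumerate(difficulty, start=1):
    print(f"item {j}: difficulty = {p:.2f}, acceptable = {is_acceptable(p)}")
```

In this toy example the first item is answered correctly by every student, so its index is 1.0 and it falls outside the acceptable band.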
We chose all of the fourth-grade blocks, which comprise 12 examination results; we also randomly chose one block among all the fifth-grade blocks, which comprises the results of 20 questions. The sensory block was selected for the whole fifth-grade class. The difficulty levels of the 20 questions were evaluated for five different training groups. Because there are five training groups, the subgroup size is taken as 5.
We used control charts for individual observations for the fourth-grade examination results of the whole class; the individual observations for the endocrine, gastroenterology, pediatrics, and pulmonary circulation blocks of the fourth-grade class are shown on the same charts. We also used the X̄-chart, R-chart and S-chart for the sensory block examination of the fifth-grade class to obtain control charts for subgroups.
The analysis of examination was evaluated by control charts. MINITAB Statistical Software, Release 15 for Windows was used to get variable control charts.^{[11]}
Control charts for individual observations
Individual observation chart
The individual observation chart is the most basic variable control chart used for individual measurements of a variable type characteristic.^{[7]} For this chart, the measurable characteristics are selected. The frequencies of obtained measurements are determined.
The control limits of this chart are calculated according to the general formula $\mu \pm k\sigma$, where “k” is the desired multiple of the standard deviation. The arithmetic mean (X̄) is generally used to estimate the mean of the process characteristic. For k = 3, the limits are as follows:^{[8],[12]}
$$UCL = \bar{X} + 3\sigma, \qquad CL = \bar{X}, \qquad LCL = \bar{X} - 3\sigma$$
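The MR is used to estimate the standard deviation (σ). The MR is defined as the difference between the largest and smallest observations in consecutive subsets of “n” observations; for n = 2 it is the absolute difference between two successive observations. For n = 2, the MRs and their average are computed as:^{[1],[6]}
$$MR_i = |X_i - X_{i-1}|, \quad i = 2, \ldots, m, \qquad \overline{MR} = \frac{1}{m-1}\sum_{i=2}^{m} MR_i$$
where m is the number of observations.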
When the MR is used to estimate the standard deviation, the derivation of control limits for the individual observations chart makes use of the relationship between the standard deviation and the range. The constant of proportionality is referred to as d_{2}.^{[6]} The standard deviation is estimated as $\hat{\sigma} = \overline{MR}/d_{2}$. Control limits are derived as follows:^{[9]}
$$UCL = \bar{X} + 3\frac{\overline{MR}}{d_{2}}, \qquad CL = \bar{X}, \qquad LCL = \bar{X} - 3\frac{\overline{MR}}{d_{2}}$$
Here d_{2}, a function of the subset size “n,” is tabulated in standard tables (Montgomery, 1996). For n = 2, d_{2} equals 1.128, and the coefficient is 3/1.128 = 2.66. The control limits therefore simplify to:^{[1],[12]}
$$UCL = \bar{X} + 2.66\,\overline{MR}, \qquad CL = \bar{X}, \qquad LCL = \bar{X} - 2.66\,\overline{MR}$$
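If both µ and σ are known, they can be used directly. If only σ is known, X̄ is calculated from the data and σ is used directly. Control limits are determined accordingly:^{[6],[9]}
$$UCL = \mu + 3\sigma, \qquad CL = \mu, \qquad LCL = \mu - 3\sigma \qquad (\mu \text{ and } \sigma \text{ known})$$
$$UCL = \bar{X} + 3\sigma, \qquad CL = \bar{X}, \qquad LCL = \bar{X} - 3\sigma \qquad (\text{only } \sigma \text{ known})$$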
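The I-chart calculation for the case where neither µ nor σ is known can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `i_chart_limits` and the difficulty values are hypothetical, moving ranges are taken over two consecutive observations (n = 2), and d_{2} = 1.128 is the tabulated constant quoted in the text.

```python
def i_chart_limits(values):
    """Return (lcl, center_line, ucl) for an individuals chart.

    Sigma is estimated from the average moving range of two consecutive
    observations divided by d2 = 1.128 (tabulated constant for n = 2).
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    d2 = 1.128
    sigma_hat = mr_bar / d2
    return mean - 3 * sigma_hat, mean, mean + 3 * sigma_hat

# Hypothetical difficulty indices, not the study data.
difficulty = [0.75, 0.80, 0.70, 0.85, 0.78, 0.72]
lcl, cl, ucl = i_chart_limits(difficulty)

# Points are numbered from 1, as on a control chart.
out_of_control = [i for i, x in enumerate(difficulty, start=1)
                  if x < lcl or x > ucl]
print(f"LCL = {lcl:.4f}, CL = {cl:.4f}, UCL = {ucl:.4f}")
print("points outside the limits:", out_of_control)
```

Since 3/1.128 is approximately 2.66, the half-width computed here matches the simplified coefficient given in the text.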
Moving range chart
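The MR is also used to estimate the standard deviation (σ) for the MR-chart.^{[1],[5]}
Once all MR values are obtained, the average of the MRs gives the center line. Control limits are as follows:^{[8]}
$$UCL = \overline{MR} + 3 d_{3}\frac{\overline{MR}}{d_{2}}, \qquad CL = \overline{MR}, \qquad LCL = \overline{MR} - 3 d_{3}\frac{\overline{MR}}{d_{2}}$$
where d_{3} is the tabulated constant that relates the standard deviation of the range to σ. The following constants (D_{3}, D_{4}) are used to simplify the formulas of the control limits:^{[1],[13]}
$$D_{3} = 1 - 3\frac{d_{3}}{d_{2}}, \qquad D_{4} = 1 + 3\frac{d_{3}}{d_{2}}$$
Both D_{3} and D_{4}, as functions of the subset size “n,” are tabulated in standard tables; D_{3} is set to 0 when the expression above is negative. The control limits then become:^{[6],[9]}
$$UCL = D_{4}\,\overline{MR}, \qquad CL = \overline{MR}, \qquad LCL = D_{3}\,\overline{MR}$$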
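A minimal Python sketch of the MR-chart limits follows. It assumes moving ranges of two consecutive observations, for which the tabulated constants are D_{3} = 0 and D_{4} = 3.267; the function name `mr_chart` and the data are hypothetical.

```python
# Tabulated constants for moving ranges of two consecutive observations (n = 2).
D3 = 0.0
D4 = 3.267

def mr_chart(values):
    """Return (moving_ranges, lcl, center_line, ucl) for an MR-chart, n = 2."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return moving_ranges, D3 * mr_bar, mr_bar, D4 * mr_bar

# Hypothetical difficulty indices, not the study data.
difficulty = [0.75, 0.80, 0.80, 0.85, 0.78, 0.72]
mrs, lcl, cl, ucl = mr_chart(difficulty)

# The first moving range belongs to observation 2, so numbering starts at 2.
on_lcl = [i for i, mr in enumerate(mrs, start=2) if mr == lcl]
above_ucl = [i for i, mr in enumerate(mrs, start=2) if mr > ucl]
print(f"LCL = {lcl:.4f}, CL = {cl:.4f}, UCL = {ucl:.4f}")
print("points on the LCL:", on_lcl, "points above the UCL:", above_ucl)
```

Because D_{3} = 0 for n = 2, the LCL is zero, and a moving range lies exactly on it whenever two consecutive observations are equal. A point on the LCL of an MR-chart therefore does not by itself signal an out-of-control process.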
I-MR chart
The I-MR chart is created using both the I-chart and the MR-chart.^{[7]}
Control charts for subgroups
Average (X̄) chart
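The general principle for the control limits of the X̄-chart is $\mu \pm 3\sigma_{\bar{X}}$ when standard values are given and $\bar{\bar{X}} \pm 3\hat{\sigma}_{\bar{X}}$ when no standard is given.
When the standard value is unknown, the mean of the subgroup averages is calculated. In order to determine $\bar{\bar{X}}$, the average $\bar{X}_{i}$ should be calculated for each of the k subgroups. $\bar{\bar{X}}$ gives the value of the CL:^{[12],[13]}
$$\bar{\bar{X}} = \frac{1}{k}\sum_{i=1}^{k}\bar{X}_{i}, \qquad CL = \bar{\bar{X}}$$
The next step is to calculate the UCL and LCL:^{[8],[9]}
$$UCL = \bar{\bar{X}} + 3\sigma_{\bar{X}}, \qquad LCL = \bar{\bar{X}} - 3\sigma_{\bar{X}}$$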
To obtain control limits, the value of standard deviation must be determined for the population. This can be determined in several ways. The first option is to use the standard error estimate σ/√n, and another option is to use the mean range.^{[1]}
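Standard error-based X̄-chart
Based on the central limit theorem, to obtain control limits, the standard deviation of the process is divided by the square root of the sample size:^{[5]}
$$UCL = \bar{\bar{X}} + 3\frac{\sigma}{\sqrt{n}}, \qquad CL = \bar{\bar{X}}, \qquad LCL = \bar{\bar{X}} - 3\frac{\sigma}{\sqrt{n}}$$
When the value of σ is estimated by $\bar{S}/c_{4}$, for example, where $\bar{S}$ is the average of the subgroup standard deviations, control limits are created as follows:^{[7]}
$$UCL = \bar{\bar{X}} + 3\frac{\bar{S}}{c_{4}\sqrt{n}}, \qquad CL = \bar{\bar{X}}, \qquad LCL = \bar{\bar{X}} - 3\frac{\bar{S}}{c_{4}\sqrt{n}}$$
Control limits can be simplified using the constant $A_{3} = 3/(c_{4}\sqrt{n})$:^{[6],[7]}
$$UCL = \bar{\bar{X}} + A_{3}\bar{S}, \qquad CL = \bar{\bar{X}}, \qquad LCL = \bar{\bar{X}} - A_{3}\bar{S}$$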
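The standard error-based X̄-chart limits can be sketched in Python as below. The sketch makes several assumptions that are not taken from the article: c_{4} is computed from its closed form for normally distributed data (a standard result) rather than read from a table, subgroups are of equal size, and the function names and data are hypothetical.

```python
import math

def c4(n):
    """Bias-correction constant for the sample standard deviation (normal data)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def a3(n):
    """A3 = 3 / (c4 * sqrt(n))."""
    return 3.0 / (c4(n) * math.sqrt(n))

def sample_sd(group):
    """Sample standard deviation with the n - 1 denominator."""
    mean = sum(group) / len(group)
    return math.sqrt(sum((x - mean) ** 2 for x in group) / (len(group) - 1))

def xbar_chart_from_s(subgroups):
    """Return (lcl, center_line, ucl) for equal-size subgroups using S-bar."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    grand_mean = sum(sum(g) / n for g in subgroups) / len(subgroups)
    s_bar = sum(sample_sd(g) for g in subgroups) / len(subgroups)
    half_width = a3(n) * s_bar
    return grand_mean - half_width, grand_mean, grand_mean + half_width

# Hypothetical data: each row is one item, each column one of five training groups.
items = [
    [0.90, 0.85, 0.88, 0.92, 0.86],
    [0.80, 0.84, 0.79, 0.83, 0.82],
    [0.88, 0.91, 0.87, 0.90, 0.89],
]
lcl, cl, ucl = xbar_chart_from_s(items)
print(f"c4(5) = {c4(5):.4f}, A3(5) = {a3(5):.3f}")
print(f"LCL = {lcl:.4f}, CL = {cl:.4f}, UCL = {ucl:.4f}")
```

For a subgroup size of five, the computed constants agree with the commonly tabulated values c_{4} ≈ 0.9400 and A_{3} ≈ 1.427.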
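Mean range-based X̄-chart
For normally distributed data, there is a special relationship between the range and the standard deviation:^{[8]}
$$W = \frac{R}{\sigma}, \qquad E(W) = d_{2}$$
W is called the relative range. The mean range is:^{[1]}
$$\bar{R} = \frac{1}{k}\sum_{i=1}^{k} R_{i}$$
where “k” is the number of subgroups. The estimator of σ is
$$\hat{\sigma} = \frac{\bar{R}}{d_{2}}$$
The estimator of σ/√n is
$$\hat{\sigma}_{\bar{X}} = \frac{\bar{R}}{d_{2}\sqrt{n}}$$
Control limits are:^{[5]}
$$UCL = \bar{\bar{X}} + 3\frac{\bar{R}}{d_{2}\sqrt{n}}, \qquad CL = \bar{\bar{X}}, \qquad LCL = \bar{\bar{X}} - 3\frac{\bar{R}}{d_{2}\sqrt{n}}$$
When the abbreviation $A_{2} = 3/(d_{2}\sqrt{n})$ is used, control limits are:^{[7],[13]}
$$UCL = \bar{\bar{X}} + A_{2}\bar{R}, \qquad CL = \bar{\bar{X}}, \qquad LCL = \bar{\bar{X}} - A_{2}\bar{R}$$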
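If both μ_{0} and σ_{0} are known, control limits will be as follows:^{[5]}
$$UCL = \mu_{0} + A\sigma_{0}, \qquad CL = \mu_{0}, \qquad LCL = \mu_{0} - A\sigma_{0}, \qquad A = \frac{3}{\sqrt{n}}$$
The value of A is found using a special table that shows the values of A for different subgroup sizes.^{[6]}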
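As a hedged illustration of the mean range-based X̄-chart when no standard values are given, the Python sketch below uses the commonly tabulated A_{2} constants for subgroup sizes 2 to 6. The dictionary `A2`, the function name `xbar_chart_from_r`, and the data are assumptions made for the example only.

```python
# Commonly tabulated A2 = 3 / (d2 * sqrt(n)) for subgroup sizes 2 to 6.
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577, 6: 0.483}

def xbar_chart_from_r(subgroups):
    """Return (lcl, center_line, ucl) for equal-size subgroups using R-bar."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    if n not in A2:
        raise ValueError("A2 is only tabulated here for n = 2..6")
    grand_mean = sum(sum(g) / n for g in subgroups) / len(subgroups)
    r_bar = sum(max(g) - min(g) for g in subgroups) / len(subgroups)
    half_width = A2[n] * r_bar
    return grand_mean - half_width, grand_mean, grand_mean + half_width

# Hypothetical data: each row is one item, each column one of five training groups.
items = [
    [0.90, 0.85, 0.88, 0.92, 0.86],
    [0.80, 0.84, 0.79, 0.83, 0.82],
    [0.88, 0.91, 0.87, 0.90, 0.89],
]
lcl, cl, ucl = xbar_chart_from_r(items)
print(f"LCL = {lcl:.4f}, CL = {cl:.4f}, UCL = {ucl:.4f}")
```

Only the subgroup ranges are needed here, which is why this variant is easier to compute by hand than the standard error-based one.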
R-chart
First, a measurable feature of the process is selected for the creation of the R-chart. The range is calculated for each subgroup. The mean range is then computed from the subgroup ranges.^{[5],[6]}
The center line of the R-chart is the mean range, denoted by R̄.^{[1],[12]}
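The relationship $\sigma_{R} = d_{3}\sigma$ is used to calculate the standard deviation of the range. When σ is estimated by $\bar{R}/d_{2}$, it can be written as follows:^{[1]}
$$\hat{\sigma}_{R} = d_{3}\frac{\bar{R}}{d_{2}}$$
The constants D_{3} and D_{4} are used to obtain control limits:^{[5],[13]}
$$UCL = \bar{R} + 3 d_{3}\frac{\bar{R}}{d_{2}}, \qquad LCL = \bar{R} - 3 d_{3}\frac{\bar{R}}{d_{2}}, \qquad D_{3} = 1 - 3\frac{d_{3}}{d_{2}}, \qquad D_{4} = 1 + 3\frac{d_{3}}{d_{2}}$$
When D_{3} and D_{4} are used, the control limits will be as follows:^{[7]}
$$UCL = D_{4}\bar{R}, \qquad CL = \bar{R}, \qquad LCL = D_{3}\bar{R}$$
If the value of σ is known, the control limits are as follows:^{[5]}
$$UCL = d_{2}\sigma + 3 d_{3}\sigma, \qquad CL = d_{2}\sigma, \qquad LCL = d_{2}\sigma - 3 d_{3}\sigma$$
The constants $D_{1} = d_{2} - 3d_{3}$ and $D_{2} = d_{2} + 3d_{3}$, which depend on the size of the subgroups, are used to write these limits as $LCL = D_{1}\sigma$ and $UCL = D_{2}\sigma$.^{[6]}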
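A minimal Python sketch of the R-chart for the case where σ is unknown is given below. It assumes the commonly tabulated D_{3} and D_{4} constants for subgroup sizes 2 to 6; the table `D3_D4`, the function name `r_chart`, and the data (in which one item is deliberately given a large spread) are hypothetical.

```python
# Commonly tabulated (D3, D4) pairs for subgroup sizes 2 to 6.
D3_D4 = {2: (0.0, 3.267), 3: (0.0, 2.574), 4: (0.0, 2.282),
         5: (0.0, 2.114), 6: (0.0, 2.004)}

def r_chart(subgroups):
    """Return (ranges, lcl, center_line, ucl, flagged) for an R-chart."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    d3, d4 = D3_D4[n]
    ranges = [max(g) - min(g) for g in subgroups]
    r_bar = sum(ranges) / len(ranges)
    lcl, ucl = d3 * r_bar, d4 * r_bar
    # Subgroups are numbered from 1, as on a control chart.
    flagged = [i for i, r in enumerate(ranges, start=1) if r < lcl or r > ucl]
    return ranges, lcl, r_bar, ucl, flagged

# Hypothetical data: each row is one item, each column one training group.
items = [
    [0.90, 0.85, 0.88, 0.92, 0.87],
    [0.80, 0.84, 0.79, 0.83, 0.82],
    [0.88, 0.91, 0.87, 0.90, 0.89],
    [0.95, 0.55, 0.90, 0.85, 0.92],  # one group found this item much harder
    [0.78, 0.82, 0.80, 0.81, 0.79],
    [0.86, 0.84, 0.88, 0.85, 0.87],
]
ranges, lcl, r_bar, ucl, flagged = r_chart(items)
print(f"LCL = {lcl:.4f}, CL = {r_bar:.4f}, UCL = {ucl:.4f}")
print("items outside the limits:", flagged)
```

In this toy data set the fourth item has a range of 0.40, well above the UCL, so it is the only item flagged.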
S-chart
A measurable feature of the process is used for the S-chart, too. The standard deviation of every subgroup is calculated.^{[8]}
When neither μ nor σ is known, the standard deviation (S) is calculated from the sample variance (S^{2}):^{[2]}
$$S = \sqrt{\frac{\sum_{i=1}^{n}(X_{i} - \bar{X})^{2}}{n - 1}}$$
S̄ is the average standard deviation for the “k” subgroups. When S is used as the estimate of σ, the results are biased, because the expected value of S, E(S), equals c_{4}σ, which is the center line. Here, c_{4} is a constant associated with the sample size. To correct for this bias, σ is estimated by $\bar{S}/c_{4}$, so that the center line is estimated by S̄. The standard deviation of S is $\sigma\sqrt{1 - c_{4}^{2}}$.^{[5],[7]}
When $\bar{S}/c_{4}$ is used as the estimate of σ, the control limits are:^{[2]}
$$UCL = \bar{S} + 3\frac{\bar{S}}{c_{4}}\sqrt{1 - c_{4}^{2}}, \qquad CL = \bar{S}, \qquad LCL = \bar{S} - 3\frac{\bar{S}}{c_{4}}\sqrt{1 - c_{4}^{2}}$$
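The constants B_{3} and B_{4} are given below:^{[7]}
$$B_{3} = 1 - \frac{3}{c_{4}}\sqrt{1 - c_{4}^{2}}, \qquad B_{4} = 1 + \frac{3}{c_{4}}\sqrt{1 - c_{4}^{2}}$$
The control limits can be written using B_{3} and B_{4}:^{[2]}
$$UCL = B_{4}\bar{S}, \qquad CL = \bar{S}, \qquad LCL = B_{3}\bar{S}$$
If both μ and σ are known, the control limits are:^{[6]}
$$UCL = c_{4}\sigma + 3\sigma\sqrt{1 - c_{4}^{2}}, \qquad CL = c_{4}\sigma, \qquad LCL = c_{4}\sigma - 3\sigma\sqrt{1 - c_{4}^{2}}$$
The constants B_{5} and B_{6} are given as follows:^{[5]}
$$B_{5} = c_{4} - 3\sqrt{1 - c_{4}^{2}}, \qquad B_{6} = c_{4} + 3\sqrt{1 - c_{4}^{2}}$$
When these constants are used, control limits will be as follows:^{[7]}
$$UCL = B_{6}\sigma, \qquad CL = c_{4}\sigma, \qquad LCL = B_{5}\sigma$$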
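The S-chart for the case where σ is unknown can be sketched in Python as follows. As assumptions for this illustration, c_{4} is computed from its closed form for normally distributed data instead of a table, B_{3} is set to zero when its expression is negative (the usual tabulation convention), and the function names and data are hypothetical.

```python
import math

def c4(n):
    """Bias-correction constant for the sample standard deviation (normal data)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def b3_b4(n):
    """Return (B3, B4); B3 is floored at zero, as in standard tables."""
    spread = 3.0 * math.sqrt(1.0 - c4(n) ** 2) / c4(n)
    return max(0.0, 1.0 - spread), 1.0 + spread

def sample_sd(group):
    """Sample standard deviation with the n - 1 denominator."""
    mean = sum(group) / len(group)
    return math.sqrt(sum((x - mean) ** 2 for x in group) / (len(group) - 1))

def s_chart(subgroups):
    """Return (sds, lcl, center_line, ucl, flagged) for an S-chart."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    b3, b4 = b3_b4(n)
    sds = [sample_sd(g) for g in subgroups]
    s_bar = sum(sds) / len(sds)
    lcl, ucl = b3 * s_bar, b4 * s_bar
    flagged = [i for i, s in enumerate(sds, start=1) if s < lcl or s > ucl]
    return sds, lcl, s_bar, ucl, flagged

# Hypothetical data: each row is one item, each column one training group.
items = [
    [0.90, 0.85, 0.88, 0.92, 0.87],
    [0.80, 0.84, 0.79, 0.83, 0.82],
    [0.88, 0.91, 0.87, 0.90, 0.89],
    [0.95, 0.55, 0.90, 0.85, 0.92],  # one group found this item much harder
    [0.78, 0.82, 0.80, 0.81, 0.79],
    [0.86, 0.84, 0.88, 0.85, 0.87],
]
sds, lcl, s_bar, ucl, flagged = s_chart(items)
print(f"LCL = {lcl:.4f}, CL = {s_bar:.4f}, UCL = {ucl:.4f}")
print("items outside the limits:", flagged)
```

For a subgroup size of five this gives B_{3} = 0 and B_{4} ≈ 2.089, in line with the commonly tabulated values.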
Control charts for different combinations of subgroups
The X̄-R chart consists of a combination of X̄ and R. The X̄-S chart consists of a combination of X̄ and S. The I-MR-R/S chart is the combination of I, MR and R or S.^{[1],[2]}
Results   
The variable control charts were created to evaluate item difficulty level derived from exam analysis.
The control charts for individual observations are as follows: the I-chart is given in [Figure 1] and the MR-chart is given in [Figure 2]. There were four separate item groups and exam blocks. There are 12 points on these charts. The first four points refer to the difficulty level of the endocrine block items for every group (four groups). The second four points refer to the difficulty level of the gastroenterology block items for every group (four groups). The third four points refer to the difficulty level of the pediatrics block items for every group (four groups). The last four points refer to the difficulty level of the pulmonary circulation block items for each group (four groups).
All of the data are within the control limits on the I-chart [Figure 1].
All the observations are within the control limits on the MR-chart. However, points 8, 12, and 13 lie on the LCL [Figure 2].
The standard error-based X̄-chart is given in [Figure 3]. All of the observations are within the control limits and the variations exhibit a random pattern, so the process is stable and under control.
The mean range-based X̄-chart is shown in [Figure 4]. Because all of the data lie within the control limits, we can conclude that the process is under control.
The S-chart is given in [Figure 5]. Because two points, 6 and 20, lie above the UCL, the statistical process is out of control.
The R-chart is shown in [Figure 6]. Two points, 6 and 20, lie above the UCL on this chart, so the process is unstable.
Discussion   
Six different variable control charts were used for continuous data, and the quality of the item analysis process was evaluated. Here, the difficulty levels of items were examined by obtaining different variable control charts.
Graphical modeling is important for the rapid and comparative evaluation of test results. The control chart is just one more tool we can use to sharpen our teaching and testing skills. Control charts allow us to identify outliers among items. Inspecting a particular unstable item may give us some insight into the weaknesses in our measurements.
Control charts have the advantage of giving quick summaries of various aspects of the quality of exam items. They simply classify items as acceptable or unacceptable based on item difficulty criteria. Also, charts of this type tend to be more easily understood by lecturers or other related people who are unfamiliar with quality control procedures or other statistical quality control indexes.
The item difficulty data have a normal distribution. It is therefore assumed that 68% of the values fall in the interval X̄ ± 1S, 95% in the interval X̄ ± 2S, and 99.7% in the interval X̄ ± 3S.^{[14],[15]} A value more than 3S from the average is an unexpected observation (it occurs by chance only about 0.3% of the time),^{[16]} so it usually indicates a problem with the process. A value more than 2S from the average may occur by chance about 5% of the time.^{[17],[18],[19]}
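The coverage percentages quoted for the normal distribution can be checked with a short Python sketch. It relies on the standard identity P(|Z| ≤ k) = erf(k/√2) for a standard normal variable Z; the function name `coverage` is an illustrative choice.

```python
import math

def coverage(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    inside = coverage(k)
    print(f"within ±{k}S: {inside:.4f}   outside: {1.0 - inside:.4f}")
```

The results are approximately 0.6827, 0.9545 and 0.9973, which round to the 68%, 95% and 99.7% figures used for the control limits.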
When there are no subgroups, the I-chart or the MR-chart is preferred for variable data. Both the I-chart and the MR-chart were created to analyze the item difficulty of the fourth-grade exams. All observations are within the control limits, but three points on the MR-chart lie on the LCL. Nevertheless, the process is assumed to be in control for both charts.
In the I-chart, the mean difficulty index was found to be 0.7738. This indicates that the mean difficulty index was high, which suggests that the questions were easy. However, the values of all the blocks were within the desired limits. While most of the observations were close to the mean difficulty level, the gastroenterology block items of the second group were quite easy (0.90) and the pediatrics block items of the first group were more difficult (0.63).
The MR-chart demonstrates variability. Although all points are within the control limits, there is variation on this chart originating from the first item group. This difference is especially notable in the pediatrics and pulmonary circulation training groups (internships).
The I-chart is the simplest of all control charts. It is preferred when a result is needed in a short time, particularly when using the X̄-, R- and S-charts is not practical. It is much more sensitive in detecting significant variations (variances) in cases where the sample size to be determined by the binomial distribution is very large.^{[7],[20]} The MR is generally used to evaluate the variability of item difficulty. If the subgroups are not taken into consideration, one or both of these charts can be preferred.^{[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21]}
Other variable charts, used in the presence of subgroups, were also obtained in this study. The first of them is the X̄-chart. Because the standard values were unknown, this chart was generated in two different ways for the estimation of σ, using the standard error and the mean range, and the measurements were within the upper and lower control limits in both charts. So we can say that the process is “in control.” The mean value was 0.86 in both charts. This indicates that the questions were easy.
The X̄-chart is the most commonly used control chart for central tendency since it is highly sensitive in detecting variations related to the distribution of the mean.^{[12],[22]} The standard error-based X̄-chart is not very practical. Using the mean range instead of the standard error as the measure of variation, an alternative mean range-based X̄-chart can be created more easily. When sample sizes are relatively small (n ≤ 10), the mean range-based X̄-chart can replace the standard error-based X̄-chart. Since the subgroup size in this study is five, it would be better to use the mean range-based X̄-chart.^{[8],[23]}
In the charts obtained in this study, the difficulty level values of two questions exceeded the upper control limit in the R- and S-charts. Variability in the process was outside the determined limit at two points. The R-chart is one of the most widely used control charts for measuring variation. The chart is popular because the mean range is easy to calculate. Its effectiveness in estimating the variation of the process is better when the subgroup size is smaller. The S-chart is used to determine whether the level of variation in the process is significant. Thus, control limits for the standard deviations of samples taken at regular intervals are created in this chart. When there is a strong variation within the established limits, the process is considered to be unstable.^{[1],[2],[24]} Since the subgroup size in this study was five, the R-chart was preferred to the S-chart.
Conclusion   
Control charts are effective instruments for detecting the specific causes of variability in the difficulty of exam items. If some points are outside the control limits, the corresponding item is out of control; it could be either too easy or too difficult. In that case the balance in the construction of the exam items has deteriorated, and corrective action should be taken. As soon as deterioration occurs in the process, the factor pushing the present observation outside the limits should be identified. Control charts thus help identify questions that need to be revised or thrown out. Tests can be improved by maintaining and developing a pool of “good” items from which future tests will be drawn in part or in whole.
References   
1. Bass I. Six Sigma Statistics with Excel and Minitab. USA: McGraw-Hill Company; 2007.
2. Winkel P, Zhang NF. Statistical Development of Quality in Medicine. England: John Wiley and Sons Press; 2007.
3. Westgard JO. Basic QC Practices. 2^{nd} ed. Madison: Westgard QC Inc.; 2002.
4. Westgard JO, Groth T. Design and evaluation of statistical control procedures: Applications of a computer “quality control simulator” program. Clin Chem 1981;27:1536-45.
5. Montgomery DC. Introduction to the Statistical Quality Control. 3^{rd} ed. New York: John Wiley and Sons Inc.; 1996.
6. Oakland JS. Statistical Process Control. 5^{th} ed. London: MPG Books Limited; 2003.
7. Chandra MJ. Statistical Quality Control. USA: CRC Press; 2001.
8. Wadsworth HM, Stephens K, Godfrey AB. Modern Methods for Quality Control and Improvement. New York: John Wiley and Sons Inc.; 1986.
9. Ryan TP. Statistical Methods for Quality Improvement. New York: John Wiley and Sons Inc.; 1986.
10. Backhoff E, Larrazolo N, Rosas M. The level of difficulty and discrimination power of the basic knowledge and skills examination (EXHCOBA). Revista Electrónica de Investigación Educativa 2000;2:1-16.
11. Minitab Inc. MINITAB Statistical Software, Release 15 for Windows. State College, Pennsylvania; 2009.
12. Kume H. Statistical Methods for Quality Improvement. 10^{th} ed. Tokyo: 3A Corporation; 1992.
13. Ishikawa K. Guide to Quality Control. Tokyo: Asian Productivity Organization; 1986.
14. Westgard JO, Barry PL, Hunt MR, Groth T. A multirule Shewhart chart for quality control in clinical chemistry. Clin Chem 1981;27:493-501.
15. Gray JJ, Wreghitt TG, McKee TA, McIntyre P, Roth CE, Smith DJ, et al. Internal quality assurance in a clinical virology laboratory. I. Internal quality assessment. J Clin Pathol 1995;48:168-73.
16. Westgard JO, Klee GG. Quality management. In: Burtis CA, Ashwood ER, Bruns DE, editors. Tietz Textbook of Clinical Chemistry and Molecular Diagnostics. 4^{th} ed. New York: Elsevier Press; 2006. p. 485-523.
17. Cembrowski GS, Martindale RA. Quality control and statistics. In: Bishop ML, Fody EP, Schoeff LE, editors. Clinical Chemistry: Principles, Procedures, Correlations. 4^{th} ed. Philadelphia: Lippincott Williams and Wilkins; 2004. p. 4889.
18. Goris N, De Clercq K. Quality assurance/quality control of foot and mouth disease solid phase competition enzyme-linked immunosorbent assay – Part II. Quality control: Comparison of two charting methods to monitor assay performance. Rev Sci Tech 2005;24:1005-16.
19. Taghizadegan S. Essentials of Lean Six Sigma. USA: Elsevier Press; 2006.
20. Lenz HC, Wilrich TH. Frontiers in Statistical Quality Control. Germany: German Copyright Law; 2006.
21. Koch DD, Oryall JJ, Quam EF, Feldbruegge DH, Dowd DE, Barry PL, et al. Selection of medically useful quality-control procedures for individual tests done in a multitest analytical system. Clin Chem 1990;36:230-3.
22. Thompson JR, Koronacki J. Statistical Process Control. 2^{nd} ed. USA: Chapman and Hall Press; 2002.
23. Juran JM, Godfrey AB. Juran's Quality Handbook. 5^{th} ed. New York, USA: McGraw-Hill; 1999.
24. Revere L, Black K. Integrating six sigma with total quality management: A case example for measuring medication errors. J Healthc Manag 2003;48:377-91.
