WVSU Research Journal, Vol. 13 No. 1 (2024)

Assessing the Quality of Multiple-Choice Test Items Based on BILOG-MG, R (ltm), IATA, and GSP-ROC

Nguyen Phuoc Hai | Nguyen Van Canh | Trinh Thi Kim Binh


Abstract:

The purpose of this study is to analyze, assess, and select 50 multiple-choice items from the final test of the English 1 course, taken by 876 university students, using four item-analysis tools: BILOG-MG, R (the ltm package), IATA, and a combination of the GSP chart and the ROC method (GSP-ROC). The results show which items are satisfactory and eligible for use in the test, and which are unsatisfactory and need to be reviewed, adjusted, and improved. Using several software tools in combination to analyze, assess, and select multiple-choice items is necessary for improving item quality. This contributes not only to better testing and assessment of learners, but also to better teaching and learning at universities today.
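To make the workflow concrete, the sketch below shows how dichotomously scored (0/1) item responses can be examined with the R ltm package cited above (Rizopoulos, 2006). It is an illustrative sketch, not the authors' analysis script: the file name responses.csv and the object names are hypothetical assumptions.

    # Hypothetical 876 x 50 matrix of dichotomously scored answers
    library(ltm)
    responses <- read.csv("responses.csv")

    # Classical item statistics: proportion correct, point-biserial
    # discrimination, and Cronbach's alpha
    descript(responses)

    # Two-parameter logistic (2PL) model: discrimination (a) and
    # difficulty (b) estimates per item
    fit2pl <- ltm(responses ~ z1)
    coef(fit2pl)

    # Three-parameter logistic (3PL) model adds a pseudo-guessing
    # parameter (c), as in Birnbaum (1968)
    fit3pl <- tpm(responses)
    coef(fit3pl)

    # Item characteristic curves for visual screening of item quality
    plot(fit2pl, type = "ICC")

Items with low discrimination, extreme difficulty, or a high guessing parameter under these models are the candidates for the review and adjustment described above.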



References:

  1. Baker, F. B. (2001). The basics of item response theory. Education Resources Information Center (ERIC).
  2. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Addison-Wesley.
  3. Bui, A. K., & Bui, N. P. (2018). Using IATA to analyze, assess and improve the quality of the multiple-choice questions in chapter power functions, exponential functions and logarithmic functions. Can Tho University Journal of Science, 54(9), 81–93.
  4. Bui, N. Q. (2017). Assessment of the quality of multiple choice test bank for the module of Introduction to Anthropology by using the RASCH model and QUEST software. Science & Technology Development, 20(X3), 42–54.
  5. Cartwright, F. (2007). IATA 3.0 Item and Test Analysis: A software tutorial and theoretical introduction.
  6. Doan, H. C., Le, A. V., & Pham, H. U. (2016). Applying 3-parameter logistic model in validating the level of difficulty, discrimination and guessing of items in a multiple choice test. Ho Chi Minh City University of Education Journal of Science, 7(85), 174–184.
  7. Du Toit, M. (2003). IRT from SSI: Bilog-MG, Multilog, Parscale, Testfact. Scientific Software International.
  8. Foster, R. C. (2021). KR20 and KR21 for some nondichotomous data (it’s not just Cronbach’s alpha). Educational and Psychological Measurement, 81(6), 1172–1202. https://doi.org/10.1177/001316442199253
  9. Kumar, R., & Indrayan, A. (2011). Receiver operating characteristic (ROC) curve for medical researchers. Indian Pediatrics, 48, 277–287. https://doi.org/10.1007/s13312-011-0055-4
  10. Lam, Q. T. (2011). Measurement in Education - Theory and Application. Hanoi: Vietnam National University Publishing House.
  11. Nguyen, P. H. (2016). Using GSP chart and ROC method to analyze multiple-choice test items and assess learning outcomes of students. Journal of Education Science, Vietnam Institute of Education Science, 134(11), 32–37.
  12. Nguyen, P. H. (2017). Using GSP chart and ROC method to analyze and select multiple-choice test items. Dong Thap University Journal of Science, 24(2), 11–17. https://doi.org/10.52714/dthu.24.2.2017.426
  13. Nguyen, P. H., & Du, T. N. (2014). Assessing the rating results and predicting students’ learning outcomes based on grey relational analysis and grey model. Can Tho University Journal of Science, 32, 43–50.
  14. Nguyen, P. H., & Du, T. N. (2015). The analysis and selection of multiple-choice test items based on S-P chart, Grey Relational Analysis, and ROC curve. Ho Chi Minh City University of Education Journal of Science, 6(72), 163.
  15. Nguyen, P. H., & Trinh, T. K. B. (2017). Assessment of students’ learning outcomes based on a combination of GSP chart and ROC method. AGU International Journal of Sciences, 17(5), 103–112.
  16. Nguyen, P. H., & Trinh, T. K. B. (2022). Assessment of students’ learning outcomes in higher education. International Journal of Uncertainty and Innovation Research, 4(1), 39–52. https://doi.org/10.34238/tnu-jst.5554
  17. Nguyen, V. C., & Nguyen, P. H. (2020). Analyzing and selecting multiple-choice test items based on classical test theory and item response theory. Ho Chi Minh City University of Education Journal of Science, 17(10), 1804–1818.
  18. Pham, T. M., & Bui, Đ. N. (2019). The IATA software for analyzing, evaluation of multiple-choice questions at Ha Noi Metropolitan University. Scientific Journal of Ha Noi Metropolitan University, 20, 97–108.
  19. Rasch, G. (1993). Probabilistic models for some intelligence and attainment tests. Education Resources Information Center (ERIC).
  20. Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response analysis. Journal of Statistical Software, 17(5), 1–25. https://doi.org/10.18637/jss.v017.i05
  21. Tavakol, M., & Dennick, R. (2012). Standard setting: The application of the receiver operating characteristic method. International Journal of Medical Education, 3, 198–200. https://doi.org/10.5116/ijme.506f.1aaa