Akhavan Masoumi, G., & Sadeghi, K. (2020). Impact of test format on vocabulary test performance of EFL learners: the role of gender. Language Testing in Asia, 10(1), 2.
Alemi, M., & Rezanejad, A. (2014). Native and non-native English teachers' rating criteria and variation in the assessment of L2 pragmatic production: The speech act of compliment. Issues in Language Teaching, 3(1), 65-88. https://ilt.atu.ac.ir/article_1374.html
Anthony, C. J., Styck, K. M., Volpe, R. J., & Robert, C. R. (2023). Using many-facet Rasch measurement and generalizability theory to explore rater effects for direct behavior rating–multi-item scales. School Psychology, 38(2), 119-128. https://doi.org/10.1037/spq0000518
Azizi, Z., & Namaziandost, E. (2023). Implementing Peer-Dynamic Assessment to Cultivate Iranian EFL Learners' Interlanguage Pragmatic Competence: A Mixed-Methods Approach. International Journal of Language Testing, 13(1), 18-43.
Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative proficiency. TESOL Quarterly, 16(4), 449-465. https://doi.org/10.2307/3586464
Bardovi-Harlig, K., & Hartford, B. S. (1993). Learning the rules of academic talk: A longitudinal study of pragmatic change. Studies in Second Language Acquisition, 15(3), 279-304. https://doi.org/10.1017/S0272263100012122
Bardovi-Harlig, K., & Su, Y. (2023). Developing an empirically-driven aural multiple-choice DCT for conventional expressions in L2 pragmatics. Applied Pragmatics, 5(1), 1-40. https://doi.org/10.1075/ap.20020.bar
Billmyer, K., & Varghese, M. (2000). Investigating instrument-based pragmatic variability: Effects of enhancing discourse completion tests. Applied Linguistics, 21(4), 517-552. https://doi.org/10.1093/applin/21.4.517
Brown, J. D. (2001). Six types of pragmatics tests in two different contexts. In K. Rose & G. Kasper (Eds.), Pragmatics in Language Teaching (pp. 301-325). New York: Cambridge University Press.
Brown, J. D. (2008). Raters, functions, item types, and the dependability of L2 pragmatics tests. In E. Alcón Soler & A. Martínez-Flor (Eds.), Investigating pragmatics in foreign language learning, teaching and testing (pp. 224-248). Clevedon, UK: Multilingual Matters.
Cardinet, J., Johnson, S., & Pini, G. (2011). Applying generalizability theory using EduG. Routledge.
Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385-405. https://doi.org/10.1177/0265532214565386
Chen, Y. S., & Liu, J. (2016). Constructing a scale to assess L2 written speech act performance: WDCT and e-mail tasks. Language Assessment Quarterly, 13(3), 231-250. https://doi.org/10.1080/15434303.2016.1213844
Cohen, A. D. (2020). Considerations in assessing pragmatic appropriateness in spoken language. Language Teaching, 53(2), 183-202.
Cordier, R., Munro, N., Wilkes-Gillan, S., Speyer, R., Parsons, L., & Joosten, A. (2019). Applying Item Response Theory (IRT) modeling to an observational measure of childhood pragmatics: The pragmatics observational measure-2. Frontiers in Psychology, 10, 408. https://doi.org/10.3389/fpsyg.2019.00408
Dabbagh, A., & Babaii, E. (2021). L1 pragmatic cultural schema and pragmatic assessment: Variations in non-native teachers' scoring criteria. TESL-EJ, 25(1), 1-17. https://eric.ed.gov/?id=EJ1302438
Derakhshan, A., Shakki, F., & Sarani, M. A. (2020). The effect of dynamic and non-dynamic assessment on the comprehension of Iranian intermediate EFL learners' speech acts of apology and request. Language Related Research, 11(4), 605-637. https://lrr.modares.ac.ir/article-14-40648-en.html
Engelhard, G., Jr., & Wind, S. (2017). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge.
Farashaiyan, A., Sahragard, R., Muthusamy, P., & Muniandy, R. (2020). Questionnaire development and validation of interlanguage pragmatic instructional approaches & techniques in EFL contexts. International Journal of Higher Education, 9(2), 330-342. https://eric.ed.gov/?id=EJ1255710
Fussman, S., & Mashal, N. (2022). Initial validation for the Assessment of Pragmatic Abilities and Cognitive Substrates (APACS) Hebrew battery in adolescents and young adults with typical development. Frontiers in Communication, 6, 758384. https://doi.org/10.3389/fcomm.2021.758384
Gordon, R. A., Peng, F., Curby, T. W., & Zinsser, K. M. (2021). An introduction to the many-facet Rasch model as a method to improve observational quality measures with an application to measuring the teaching of emotion skills. Early Childhood Research Quarterly, 55, 149-164. https://doi.org/10.1016/j.ecresq.2020.11.005
Han, C. (2021). Detecting and measuring rater effects in interpreting assessment: A methodological comparison of classical test theory, generalizability theory, and many-facet Rasch measurement. In Testing and Assessment of Interpreting: Recent Developments in China (pp. 85-113).
Hernández, T. A. (2018). L2 Spanish apologies development during short-term study abroad. Studies in Second Language Learning and Teaching, 8(3), 599-620. https://www.ceeol.com/search/article-detail?id=690173
Hernández, T. A., & Boero, P. (2018). Explicit intervention for Spanish pragmatic development during short‐term study abroad: An examination of learner request production and cognition. Foreign Language Annals, 51(2), 389-410. https://doi.org/10.1111/flan.12334
Hudson, T., Detmer, E., & Brown, J. D. (1992). A framework for testing cross-cultural pragmatics (Vol. 2). National Foreign Language Resource Center.
Iramaneerat, C., Yudkowsky, R., Myford, C. M., & Downing, S. M. (2008). Quality control of an OSCE using generalizability theory and many-faceted Rasch measurement. Advances in Health Sciences Education, 13, 479-493.
Karami, H. (2012). The relative impact of persons, items, subtests, and academic background on performance on a language proficiency test. Psychological Test and Assessment Modeling, 54(3), 211. https://ptam-journal.com/wp-content/uploads/2025/01/04_Ravand_.pdf
Kecskes, I. (2014). Intercultural pragmatics (Vol. 288). Oxford: Oxford University Press.
Khodi, A. (2021). The affectability of writing assessment scores: a G-theory analysis of rater, task, and scoring method contribution. Language Testing in Asia, 11(1), 30. https://doi.org/10.1186/s40468-021-00134-5
Kumar, D., Jaipurkar, R., Shekhar, A., Sikri, G., & Srinivas, V. (2021). Item analysis of multiple choice questions: A quality assurance test for an assessment tool. Medical Journal Armed Forces India, 77, S85-S89.
Li, G., Pan, Y., & Wang, W. (2021). Using generalizability theory and many-facet Rasch model to evaluate in-basket tests for managerial positions. Frontiers in Psychology, 12, 660553. https://doi.org/10.3389/fpsyg.2021.660553
Li, S., Li, X., Feng, Y., & Wen, T. (2023). Non-expert raters' scoring behavior and cognition in assessing pragmatic production in L2 Chinese. In Crossing Boundaries in Researching, Understanding, and Improving Language Education: Essays in Honor of G. Richard Tucker (pp. 79-102). Cham: Springer International Publishing. https://link.springer.com/chapter/10.1007/978-3-031-24078-2_4
Li, S., Wen, T., Li, X., Feng, Y., & Lin, C. (2023). Comparing holistic and analytic marking methods in assessing speech act production in L2 Chinese. Language Testing, 40(2), 249-275. https://doi.org/10.1177/026553222211139
Liu, J. (2007). Comparing native and non-native speakers' scoring in an interlanguage pragmatics test. Modern Foreign Languages, 30(4), 395-404.
Lozano-Ruiz, A., Fasfous, A. F., Ibanez-Casas, I., Cruz-Quintana, F., Perez-Garcia, M., & Pérez-Marfil, M. N. (2021). Cultural bias in intelligence assessment using a culture-free test in Moroccan children. Archives of Clinical Neuropsychology, 36(8), 1502-1510.
Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, 15(2), 158-180. https://doi.org/10.1177/026553229801500202
Mohammad Hosseinpur, R., Bagheri Nevisi, R., & Lowni, A. (2021). A tale of four measures of pragmatic knowledge in an EFL institutional context. Pragmatics, 31(1), 114-143. https://doi.org/10.1075/prag.18052.moh
Namaziandost, E., Nasri, M., Rahimi Esfahani, F., Neisi, L., & Ahmadpour KarimAbadi, F. (2020). A cultural comparison of Persian and English short stories regarding the use of emotive words: Implications for teaching English to Iranian young learners. Asian-Pacific Journal of Second and Foreign Language Education, 5(1), 7. https://doi.org/10.1186/s40862-020-00085-z
Neiriz, R. (2023). Developing and evaluating a contextualized interactional competence rating scale based on a metaphorical conceptualization: A pragmatic mixed-method approach. Journal of Second Language Studies, 6(1), 61-94. https://doi.org/10.1075/jsls.22003.nei
Reynolds, C. R., Altmann, R. A., & Allen, D. N. (2021). The problem of bias in psychological assessment. In Mastering modern psychological testing: Theory and methods (pp. 573-613). Cham: Springer International Publishing.
Richard, P. J., Devinney, T. M., Yip, G. S., & Johnson, G. (2009). Measuring organizational performance: Towards methodological best practice. Journal of Management, 35(3), 718-804. https://doi.org/10.1177/0149206308330560
Roever, C. (2008). Rater, item, and candidate effects in discourse completion tests: A FACETS approach. In E. Alcón Soler & A. Martínez-Flor (Eds.), Investigating pragmatics in foreign language learning, teaching and testing (pp. 249-266). Clevedon, UK: Multilingual Matters. https://books.google.nl/books
Saleem, A., Saleem, T., & Aziz, A. (2022). A pragmatic study of congratulation strategies of Pakistani ESL learners and British English speakers. Asian-Pacific Journal of Second and Foreign Language Education, 7(1), 8. https://doi.org/10.1186/s40862-022-00134-9
Shahi, R., Ravand, H., & Rohani, G. R. (2025). Examining the Effect of Item Difficulty and Rater Leniency on Iranian Test Takers' Performance on WDCT and DSAT: A Comparative Study. International Journal of Language Testing, 15(1), 1-19. https://doi.org/10.22034/ijlt.2024.454478.1341
Sitorus, T. A. P., Siregar, D. Y., Aulia, D. N., Zahra, N. A., Parinduri, A. I., Lubis, D. N. A., & Wardiah, F. D. (2025). A Systematic Review of Pragmatic Competence in Second Language Acquisition. Sintaksis: Publikasi Para ahli Bahasa dan Sastra Inggris, 3(1), 142-152. https://doi.org/10.61132/sintaksis.v3i1.1291
Sonnenburg-Winkler, S. L., Eslami, Z. R., & Derakhshan, A. (2020). Rater variation in pragmatic assessment: The impact of the linguistic background on peer-assessment and self-assessment. Lodz Papers in Pragmatics, 16(1), 67-85. https://doi.org/10.1515/lpp-2020-0004
Steyer, R. (2001). Classical (psychometric) test theory. In International Encyclopedia of the Social & Behavioral Sciences (pp. 1955-1962). https://doi.org/10.1016/B0-08-043076-7/00721-X
Su, Y., & Shin, S. Y. (2024). Comparing two formats of data-driven rating scales for classroom assessment of pragmatic performance with role-plays. Language Testing, 41(2), 357-383. https://doi.org/10.1177/02655322231210217
Sydorenko, T., Maynard, C., & Guntly, E. (2014). Rater behavior when judging language learners' pragmatic appropriateness in extended discourse. TESL Canada Journal, 32(1), 19-41. https://doi.org/10.18806/tesl.v32i1.1197
Taguchi, N. (2011). Rater variation in the assessment of speech acts. Pragmatics, 21(3), 453-471. https://doi.org/10.1075/prag.21.3.08tag
Taguchi, N., & Li, S. (2020). Contrastive pragmatics and second language (L2) pragmatics: Approaches to assessing L2 speech act production. Contrastive Pragmatics, 2(1), 1-23. https://brill.com/view/journals/jocp/2/1/article-p1_1.xml
Tajeddin, Z., & Alemi, M. (2014). Pragmatic rater training: Does it affect non-native L2 teachers' rating accuracy and bias? International Journal of Language Testing, 4(1), 66-83. https://www.ijlt.ir/article_114394.html
Tajeddin, Z., Alemi, M., & Khanlarzadeh, N. (2020). Rating Criteria and Norms for Pragmatic Assessment in the Context of EIL. In Pragmatics Pedagogy in English as an International Language (pp. 212-231). Routledge.
Timpe-Laughlin, V., & Choi, I. (2017). Exploring the validity of a second language intercultural pragmatics assessment tool. Language Assessment Quarterly, 14(1), 19-35. https://doi.org/10.1080/15434303.2016.1256406
Toe, D., Mood, D., Most, T., Walker, E., & Tucci, S. (2020). The assessment of pragmatic skills in young deaf and hard-of-hearing children. Pediatrics, 146(Supplement_3), S284-S291.
Wilson, A. C., & Bishop, D. V. (2022). A novel online assessment of pragmatic and core language skills: An attempt to tease apart language domains in children. Journal of Child Language, 49(1), 38-59.
Wolcott, M. D., Olsen, A. A., & Augustine, J. M. (2022). Item response theory in high-stakes pharmacy assessments. Currents in Pharmacy Teaching and Learning, 14(9), 1206-1214.
Xu, L., & Wannaruk, A. (2018). Reliability and validity of WDCT in testing interlanguage pragmatic competence for EFL learners. Journal of Language Teaching and Research, 6(6), 1206-1215. https://doi.org/10.17507/jltr.0606.07
Yamashita, S. O. (1996). Comparing six cross-cultural pragmatics measures (Unpublished doctoral dissertation). Temple University, Philadelphia, PA. https://www.proquest.com/openview/a45390785a21b1a799ba10f4e346bced/1?pq-origsite=gscholar&cbl=18750&diss=y
Yang, H. (2022). Second language learners' competence of and beliefs about pragmatic comprehension: Insights from the Chinese EFL context. Frontiers in Psychology, 12, 801315. https://doi.org/10.3389/fpsyg.2021.801315
Youn, S. J. (2007). Rater bias in assessing the pragmatics of KFL learners using facets analysis. Second Language Studies, 26(1), 85-163. http://hdl.handle.net/10125/40691
Youn, S. J. (2020). Interactional features of L2 pragmatic interaction in role‐play speaking assessment. TESOL Quarterly, 54(1), 201-233. https://doi.org/10.1002/tesq.542
Youn, S. J., & Bi, N. Z. (2019). Investigating test-takers' strategy use in task-based L2 pragmatic speaking assessment. Intercultural Pragmatics, 16(2), 185-218.
Zangoei, A., & Derakhshan, A. (2021). Measuring the predictability of Iranian EFL students' pragmatic listening comprehension with language proficiency, self-regulated learning in listening, and willingness to communicate. Journal of Applied Linguistics and Applied Literature: Dynamics and Advances, 9(2), 79-104.
Zhai, X., Haudek, K. C., Wilson, C., & Stuhlsatz, M. (2021). A framework of construct-irrelevant variance for contextualized constructed response assessment. Frontiers in Education, 6, 751283. https://doi.org/10.3389/feduc.2021.751283