Discrete Beta and Shifted Beta-Binomial models for ranking and rating data

Mariangela Sciandra1, Salvatore Fasola1, Alessandro Albano1, Chiara Di Maria1, Antonella Plaia2
1Department of Economics, Business and Statistics, University of Palermo, Viale delle Scienze, Building 13, 90128, Palermo, Italy
2Sustainable Mobility Center (Centro Nazionale per la Mobilità Sostenibile—CNMS), Milan, Italy

Abstract

Ranking and rating methods for preference data organize the underlying data differently, which can lead to a variety of probabilistic approaches to modelling them. As an alternative to existing approaches, two new flexible probability distributions are discussed as a modelling framework: the Discrete Beta and the Shifted Beta-Binomial. Through three real-world examples, we demonstrate the practical utility of these distributions. These illustrative cases show how the new distributions can effectively address real-world challenges, with a particular focus on data from surveys on environmental issues. Our analysis highlights the ability of the new distributions to capture the latent structures within preference data, offering valuable insights for the field.
