Using multiple attribute-based explanations of multidimensional projections to explore high-dimensional data

Computers and Graphics - Volume 98 - Page 93 - 2021
Alexandru Telea1, Zonglin Tian1, Mateus Espadoto2,3, Gijs van Steenpaal1, Xiaorui Zhai3, Daan van Driel3
1Department of Information and Computing Sciences, Utrecht University, Utrecht, 3584 CC, Netherlands
2Institute of Mathematics and Statistics, University of São Paulo, São Paulo 05508-090, Brazil
3Bernoulli Institute, University of Groningen, Groningen, 9747 AG, Netherlands

Abstract

Multidimensional projections (MPs) are effective methods for visualizing high-dimensional datasets to find structure in the data, such as groups of similar points and outliers. The insights obtained from MPs can be strengthened by complementing the projections with so-called explanatory mechanisms. We present and discuss a set of six such mechanisms, which explain MPs in terms of similar dimensions, local dimensionality, and correlations between dimensions. We implement our explanatory tools using an image-based approach, which is efficient to compute, scales well visually to large and dense MP scatterplots, and can handle any projection technique. We show how the resulting explanatory views can be combined to augment each other's value, and thereby lead to more refined insights into the data for several high-dimensional datasets, and how these insights correlate with known facts about the studied data.
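The abstract names local dimensionality as one explanatory view but does not say how it is computed. As a rough illustration only, the sketch below assumes one common reading: for each point, take its k nearest neighbors in the 2D projection, run PCA on the matching high-dimensional samples, and count how many components are needed to reach a chosen share of the variance. The function name `local_dimensionality` and the parameters `k` and `variance_kept` are hypothetical and are not taken from the paper.

```python
import numpy as np

def local_dimensionality(X, P, k=10, variance_kept=0.9):
    """Estimate, per point, how many principal components explain
    `variance_kept` of the variance of its neighborhood.

    X: (n, d) high-dimensional data.
    P: (n, 2) projection of X (any projection technique).
    Assumption: neighborhoods are the k nearest neighbors in P.
    """
    X = np.asarray(X, dtype=float)
    P = np.asarray(P, dtype=float)
    n = X.shape[0]
    # Pairwise squared distances in the projection (brute force, small n only).
    diff = P[:, None, :] - P[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    result = np.empty(n, dtype=int)
    for i in range(n):
        # k nearest projected neighbors, including the point itself.
        nbrs = np.argsort(dist2[i])[:k]
        local = X[nbrs] - X[nbrs].mean(axis=0)
        # Squared singular values are proportional to the PCA eigenvalues.
        var = np.linalg.svd(local, compute_uv=False) ** 2
        total = var.sum()
        if total == 0.0:
            # Degenerate neighborhood: all samples identical.
            result[i] = 0
            continue
        cumulative = np.cumsum(var) / total
        # Smallest component count whose cumulative share reaches the target.
        count = int(np.searchsorted(cumulative, variance_kept)) + 1
        result[i] = min(count, var.size)
    return result
```

The resulting per-point values could then be color-coded on the scatterplot. The brute-force neighbor search is chosen for brevity; a spatial index would be needed for large projections.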

Keywords

#Dimensionality reduction #Explanatory techniques #High-dimensional data analysis
