Application of a dendrogram seriation algorithm to extract pattern from plant breeding data

Euphytica - Volume 213 - Pages 1-11 - 2017
Vivi Noviati Arief¹, I. H. DeLacy¹, K. E. Basford¹, M. J. Dieters¹
¹School of Agriculture and Food Sciences, The University of Queensland, Brisbane, Australia

Abstract

A dendrogram is often used to display the results of hierarchical clustering; however, the order of objects in a standard dendrogram is arbitrary (the two branches joined at each merge can be swapped without changing the clustering), so similarity cannot be readily interpreted from the positions of the objects. An optimized dendrogram, produced by re-ordering the objects with a seriation method, has a customized ordering that reflects the similarity among objects, with the most similar objects located closest together. Hierarchical clustering has been applied to data from plant breeding programs to identify patterns in breeding populations and to study genotype-by-environment interactions. In this paper we demonstrate the advantage of an optimized dendrogram for interpreting plant breeding data and, given this advantage, argue that an optimized dendrogram should be the default whenever hierarchical clustering is used.
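The difference between an arbitrary and an optimized leaf order can be illustrated with a small Python sketch. It builds an average-linkage (UPGMA) dendrogram from a distance matrix, then reorders the leaves only by swapping the two branches at internal nodes, keeping the orientation with the shortest path length (the sum of distances between adjacent leaves). The path-length criterion, the brute-force search, the toy data and all function names are assumptions made for illustration; this is not presented as the seriation method used in the paper.

```python
from itertools import product


def upgma(dist):
    """Average-linkage (UPGMA) agglomerative clustering.

    dist is a symmetric list-of-lists distance matrix. Returns the dendrogram
    as nested 2-tuples whose leaves are object indices.
    """
    n = len(dist)
    clusters = {i: (i, 1) for i in range(n)}  # cluster id -> (subtree, size)
    d = {frozenset((i, j)): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of active clusters
        a, b = sorted(pair)
        ta, sa = clusters.pop(a)
        tb, sb = clusters.pop(b)
        del d[pair]
        for k in clusters:
            # Size-weighted average of the two merged clusters' distances to k.
            dak = d.pop(frozenset((a, k)))
            dbk = d.pop(frozenset((b, k)))
            d[frozenset((next_id, k))] = (sa * dak + sb * dbk) / (sa + sb)
        clusters[next_id] = ((ta, tb), sa + sb)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


def leaves(tree):
    """Leaf order of the dendrogram as built, with no reordering."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def n_internal(tree):
    """Number of internal (merge) nodes in the tree."""
    if isinstance(tree, int):
        return 0
    return 1 + n_internal(tree[0]) + n_internal(tree[1])


def oriented_leaves(tree, flips):
    """Leaf order after swapping the two branches at every internal node whose
    flag (consumed from the iterator `flips` in pre-order) is True."""
    if isinstance(tree, int):
        return [tree]
    swap = next(flips)
    left = oriented_leaves(tree[0], flips)
    right = oriented_leaves(tree[1], flips)
    return right + left if swap else left + right


def path_length(order, dist):
    """Sum of distances between neighbouring objects in a leaf order."""
    return sum(dist[u][v] for u, v in zip(order, order[1:]))


def seriate(tree, dist):
    """Pick the branch orientation with the shortest path length.

    Exhaustive search over all 2**(n-1) orientations, so only feasible for
    small n. The clustering itself is unchanged; only the leaf order differs.
    """
    best_cost, best_order = None, None
    for flips in product((False, True), repeat=n_internal(tree)):
        order = oriented_leaves(tree, iter(flips))
        cost = path_length(order, dist)
        if best_cost is None or cost < best_cost:
            best_cost, best_order = cost, order
    return best_order


if __name__ == "__main__":
    # Hypothetical toy data: five objects at positions on a line.
    pos = [0, 1, 4, 10, 11]
    dist = [[abs(p - q) for q in pos] for p in pos]
    tree = upgma(dist)
    default, optimized = leaves(tree), seriate(tree, dist)
    print(tree)                                     # ((3, 4), (2, (0, 1)))
    print(default, path_length(default, dist))      # [3, 4, 2, 0, 1] 13
    print(optimized, path_length(optimized, dist))  # [4, 3, 2, 1, 0] 11
```

Both orders come from the same tree, but only the seriated one places objects 1 and 2 next to each other. Because the search visits every orientation, it suits only a handful of objects; for realistic data a dynamic-programming method such as SciPy's `scipy.cluster.hierarchy.optimal_leaf_ordering`, which minimizes the same sum of distances between successive leaves, would be the practical choice.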
