Application of a dendrogram seriation algorithm to extract pattern from plant breeding data
Abstract
A dendrogram is often used to display the results of hierarchical clustering; however, the order of objects in a standard dendrogram is arbitrary, so similarity between neighbouring objects cannot be readily interpreted. An optimized dendrogram, produced by re-ordering the objects with a seriation method, has an ordering that reflects the similarity among objects, with the most similar objects placed closest together. Hierarchical clustering has been applied to data from plant breeding programs to identify patterns in breeding populations and to study genotype-by-environment interactions. In this paper we demonstrate the advantage of an optimized dendrogram for the interpretation of plant breeding data and, given this advantage, argue that an optimized dendrogram should be the default whenever hierarchical clustering is used.
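The idea of re-ordering a dendrogram can be illustrated with a small sketch. It is not the seriation algorithm used in the paper: it is a brute-force search, suitable only for a handful of leaves, that flips the branches of a fixed binary tree and keeps the leaf order with the shortest total distance between neighbouring leaves. The tree, the labels `A` to `D`, the one-dimensional positions, and the function names are all hypothetical and chosen for illustration.

```python
# Brute-force dendrogram seriation sketch (illustrative only).
# A tree is either a leaf label or a pair (left_subtree, right_subtree).

def orderings(tree):
    """Yield every leaf order reachable by flipping branches of the tree."""
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for left_order in orderings(left):
        for right_order in orderings(right):
            yield left_order + right_order  # branches as drawn
            yield right_order + left_order  # branches flipped


def path_length(order, dist):
    """Sum of distances between neighbouring leaves in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def seriate(tree, dist):
    """Return the leaf order with the smallest path length."""
    return min(orderings(tree), key=lambda order: path_length(order, dist))


# Hypothetical data: four objects placed on a line.
positions = {"A": 1, "B": 0, "C": 4, "D": 6}
dist = {a: {b: abs(positions[a] - positions[b]) for b in positions}
        for a in positions}

# The clustering groups {A, B} and {C, D}; the drawn order A, B, D, C is arbitrary.
tree = (("A", "B"), ("D", "C"))

default_order = ["A", "B", "D", "C"]
best = seriate(tree, dist)

print("default order:", default_order, path_length(default_order, dist))
print("seriated order:", best, path_length(best, dist))
```

The tree itself, and therefore the cluster membership, is unchanged; only the left-to-right placement of branches differs. Practical seriation methods avoid enumerating all flips, whose number grows exponentially with the number of leaves.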