Finding temporal structure in music: blues improvisation with LSTM recurrent networks

D. Eck¹, J. Schmidhuber¹
¹ IDSIA, Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, Manno, Switzerland

Abstract

We consider the problem of extracting essential ingredients of music signals, such as a well-defined global temporal structure in the form of nested periodicities (or meter). We investigate whether we can construct an adaptive signal processing device that learns by example how to generate new instances of a given musical style. Because recurrent neural networks (RNNs) can, in principle, learn the temporal structure of a signal, they are good candidates for such a task. Unfortunately, music composed by standard RNNs often lacks global coherence. The reason for this failure seems to be that RNNs cannot keep track of temporally distant events that indicate global music structure. Long short-term memory (LSTM) has succeeded in similar domains where other RNNs have failed, such as timing and counting and the learning of context-sensitive languages. We show that LSTM is also a good mechanism for learning to compose music. We present experimental results showing that LSTM successfully learns a form of blues music and is able to compose novel (and, we believe, pleasing) melodies in that style. Remarkably, once the network has found the relevant structure, it does not drift from it: LSTM is able to play the blues with good timing and proper structure for as long as one is willing to listen.

Keywords

#Intelligent networks #Multiple signal classification #Recurrent neural networks #Timing #Adaptive signal processing #Signal generators #Signal processing #Machine learning #Bars #Learning systems
