A survey on learning from data streams: current and future trends

João Gama
LIAAD-INESC-Porto LA, and FEP-University of Porto, R. de Ceuta 118-6, 4050, Porto, Portugal

Abstract

Keywords

