A survey on learning from data streams: current and future trends
References
Aggarwal, C.: On biased reservoir sampling in the presence of stream evolution. In: Dayal, U., Whang, K.-Y., Lomet, D.B., Alonso, G., Lohman, G.M., Kersten, M.L., Cha, S.K., Kim, Y.-K. (eds.) Proceedings of the International Conference on Very Large Data Bases, pp. 607–618. ACM, Seoul, Korea (2006)
Aggarwal, C., Han, J., Wang, J., Yu, P.: A framework for clustering evolving data streams. In: Proceedings of the International Conference on Very Large Data Bases, pp. 81–92. Morgan Kaufmann, Berlin (2003)
Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207–216. Washington, DC, USA (1993)
Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 137–147 (1999)
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Kolaitis, P.G. (ed.) Proceedings of the 21st Symposium on Principles of Database Systems, pp. 1–16. ACM Press, Madison (2002)
Babcock, B., Datar, M., Motwani, R.: Sampling from a moving window over streaming data. In: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 633–634. Society for Industrial and Applied Mathematics, San Francisco (2002)
Baeza-Yates, R.A., Broder, A.Z., Maarek, Y.S.: The new frontier of web search technology: seven challenges. In: SeCO Workshop. Lecture Notes in Computer Science, vol. 6585, pp. 3–9. Springer, Berlin (2010)
Bifet, A., Gavaldà, R.: Kalman filters and adaptive windows for learning in data streams. In: Todorovski, L., Lavrac, N. (eds.) Proceedings of the 9th Discovery Science, Lecture Notes in Artificial Intelligence, vol. 4265, pp. 29–40. Springer, Barcelona (2006)
Bifet, A., Gavaldà, R.: Mining adaptively frequent closed unlabeled rooted trees in data streams. In: Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, pp. 34–42. Las Vegas, USA (2008)
Bifet, A., Gavaldà, R.: Adaptive XML tree classification on evolving data streams. In: Machine Learning and Knowledge Discovery in Databases, European Conference, Lecture Notes in Computer Science, vol. 5781, pp. 147–162. Springer, Bled (2009)
Bifet, A., Holmes, G., Pfahringer, B.: Leveraging bagging for evolving data streams. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML/PKDD (1), Lecture Notes in Computer Science, vol. 6321, pp. 135–150. Springer, Berlin (2010)
Bifet, A., Holmes, G., Pfahringer, B., Gavaldà, R.: Improving adaptive bagging methods for evolving data streams. In: Zhou, Z.-H., Washio, T. (eds.) ACML, Lecture Notes in Computer Science, vol. 5828, pp. 23–37. Springer, Berlin (2009)
Brain, D., Webb, G.: The need for low bias algorithms in classification learning from large data sets. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) Principles of Data Mining and Knowledge Discovery PKDD-02, Lecture Notes in Artificial Intelligence, vol. 2431, pp. 62–73. Springer, Helsinki (2002)
Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. In: Proceedings of the Neural Information Processing Systems (2000)
Chakrabarti, A., Ba, K.D., Muthukrishnan, S.: Estimating entropy and entropy norm on data streams. In: STACS: 23rd Annual Symposium on Theoretical Aspects of Computer Science, pp. 196–205. Marseille, France (2006)
Chaudhry, N.: Stream Data Management, Chapter Introduction to Stream Data Management, pp. 1–11. Springer, Berlin (2005)
Chen, R., Sivakumar, K., Kargupta, H.: Collective mining of Bayesian networks from heterogeneous data. Knowl. Inform. Syst. J. 6(2), 164–187 (2004)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
Cormode, G., Muthukrishnan, S., Zhuang, W.: Conquering the divide: Continuous clustering of distributed data streams. In: ICDE: Proceedings of the International Conference on Data Engineering, pp. 1036–1045. Istanbul, Turkey (2007)
Cortes, C., Fisher, K., Pregibon, D., Rogers, A., Smith, F.: Hancock: a language for analyzing transactional data streams. ACM Trans. Progr. Languages Syst. 26(2), 301–338 (2004)
Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over sliding windows. In: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 635–644. Society for Industrial and Applied Mathematics, San Francisco (2002)
Domingos, P., Hulten, G.: Mining high-speed data streams. In: Parsa, I., Ramakrishnan, R., Stolfo, S. (eds.) Proceedings of the ACM Sixth International Conference on Knowledge Discovery and Data Mining, pp. 71–80. ACM Press, Boston (2000)
Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)
Gaber, M.M., Yu, P.S.: A framework for resource-aware knowledge discovery in data streams: a holistic approach with its application to clustering. In: ACM Symposium on Applied Computing, pp. 649–656. ACM Press, Boston (2006)
Gaber, M.M., Krishnaswamy, S., Zaslavsky, A.: Cost-efficient mining techniques for data streams. In: Proceedings of the second workshop on Australasian information security, pp. 109–114. Australian Computer Society, Inc., Melbourne (2004)
Gama, J.: Knowledge Discovery from Data Streams. Data Mining and Knowledge Discovery. Chapman & Hall/CRC Press, Atlanta (2010)
Gama, J., Fernandes, R., Rocha, R.: Decision trees for mining data streams. Intell. Data Anal. 10(1), 23–46 (2006)
Gama, J., Medas, P.: Learning decision trees from dynamic data streams. J. Univers. Comput. Sci. 11(8), 1353–1366 (2005)
Gama, J., Rocha, R., Medas, P.: Accurate decision trees for mining high-speed data streams. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 523–528. ACM Press, Washington, DC (2003)
Gama, J., Sebastião, R., Rodrigues, P.P.: Issues in evaluation of stream learning algorithms. In: KDD, pp. 329–338 (2009)
Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.: Mining frequent patterns in data streams at multiple time granularities. In: Kargupta, H., Joshi, A., Sivakumar, K., Yesha, Y. (eds.) Data Mining: Next Generation Challenges and Future Directions, pp. 105–124. AAAI/MIT Press, Cambridge (2004)
Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In: VLDB, pp. 79–88. Rome, Italy (2001)
Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation. Data Min. Knowl. Discov. 8, 53–87 (2004)
Hulten, G., Domingos, P.: Catching up with the data: research issues in mining data streams. In: Proceedings of Workshop on Research Issues in Data Mining and Knowledge Discovery, Santa Barbara, USA (2001)
Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 97–106. ACM Press, San Francisco (2001)
Ikonomovska, E., Gama, J., Džeroski, S.: Learning model trees from evolving data streams. Data Min. Knowl. Discov. 23, 128–168 (2011). doi:10.1007/s10618-010-0201-y
Kargupta, H., Joshi, A., Sivakumar, K., Yesha, Y.: Data Mining: Next Generation Challenges and Future Directions. AAAI Press and MIT Press, Cambridge (2004)
Kargupta, H., Park, B.-H.: Mining decision trees from data streams in a mobile environment. In: IEEE International Conference on Data Mining, pp. 281–288. IEEE Computer Society, San Jose (2001)
Kargupta, H., Park, B.-H., Dutta, H.: Orthogonal decision trees. IEEE Trans. Knowl. Data Eng. 18, 1028–1042 (2006)
Kifer, D., Ben-David, S., Gehrke, J.: Detecting change in data streams. In: Proceedings of the International Conference on Very Large Data Bases, pp. 180–191. Morgan Kaufmann, Toronto (2004)
Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of 28th International Conference on Very Large Data Bases, pp. 346–357. Morgan Kaufmann, Hong Kong (2002)
Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1997)
Muthukrishnan, S.: Massive data streams research: Where to go. Tech. Rep., Rutgers University (2010)
Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., San Mateo (1993)
Rodrigues, P.P., Gama, J., Pedroso, J.P.: Hierarchical clustering of time series data streams. IEEE Trans. Knowl. Data Eng. 20(5), 615–627 (2008)
Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. ACM Trans. Database Syst. 32(4), 301–312 (2007)
Tatbul, N., Cetintemel, U., Zdonik, S., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: Proceedings of the International Conference on Very Large Data Bases, pp. 309–320. VLDB Endowment, Berlin (2003)
Thakar, A.R., Szalay, A.S., Fekete, G., Gray, J.: The catalog archive server database management system. Comput. Sci. Eng. 10(1), 30–37 (2008)
Wald, A.: Sequential Analysis. John Wiley and Sons, Inc., New York (1947)