Adaptive probabilistic networks with hidden variables

Machine Learning - Volume 29 - Pages 213-244 - 1997
John Binder1, Daphne Koller2, Stuart Russell1, Keiji Kanazawa3
1Computer Science Division, University of California, Berkeley
2Computer Science Department, Stanford University, Stanford
3Microsoft Corporation, Redmond

Abstract

Probabilistic networks (also known as Bayesian belief networks) allow a compact description of complex stochastic relationships among several random variables. They are widely used for uncertain reasoning in artificial intelligence. In this paper, we investigate the problem of learning probabilistic networks with known structure and hidden variables. This is an important problem, because structure is much easier to elicit from experts than numbers, and the world is rarely fully observable. We present a gradient-based algorithm and show that the gradient can be computed locally, using information that is available as a byproduct of standard inference algorithms for probabilistic networks. Our experimental results demonstrate that using prior knowledge about the structure, even with hidden variables, can significantly improve the learning rate of probabilistic networks. We extend the method to networks in which the conditional probability tables are described by a small number of parameters. Examples include noisy-OR nodes and dynamic probabilistic networks. We show how this additional structure can be exploited by our algorithm to speed up learning even further. We also outline an extension to hybrid networks, in which some of the nodes take values in a continuous domain.
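As a brief, hedged sketch of the "local gradient" claim (the notation below is ours, following the standard multinomial parameterization; the abstract itself gives no formulas): write w_{ijk} = P(X_i = x_{ij} | Pa(X_i) = u_{ik}) for an entry of the conditional probability table of node X_i, and D = {d_1, ..., d_N} for the training cases. The log-likelihood gradient then decomposes locally as

\[
\frac{\partial \ln P(D \mid w)}{\partial w_{ijk}}
  = \sum_{m=1}^{N} \frac{P\left(X_i = x_{ij},\ \mathrm{Pa}(X_i) = u_{ik} \mid d_m, w\right)}{w_{ijk}} .
\]

Each posterior term P(X_i = x_{ij}, Pa(X_i) = u_{ik} | d_m, w) is a joint marginal of a node and its parents given the evidence, which is exactly the kind of quantity standard inference algorithms (e.g., junction-tree propagation) produce as a byproduct; for parameterized tables such as noisy-OR, the chain rule routes these same posteriors through the parameterization.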

Keywords

#probabilistic networks #hidden variables #algorithms #inference #artificial intelligence
