Building 3D Generative Models from Minimal Data
Springer Science and Business Media LLC - Pages 1-26 - 2023
Abstract
We propose a method for constructing generative models of 3D objects from a single 3D mesh and for improving them through unsupervised low-shot learning from 2D images. Our method produces a 3D morphable model that represents shape and albedo in terms of Gaussian processes. Whereas previous approaches have typically built 3D morphable models from multiple high-quality 3D scans through principal component analysis, we build 3D morphable models from a single scan or template. As we demonstrate in the face domain, these models can be used to infer 3D reconstructions from 2D data (inverse graphics) or from 3D data (registration). Specifically, we show that our approach can perform face recognition using only a single 3D template (one scan in total, not one scan per person). We extend our model to a preliminary unsupervised learning framework that learns the distribution of 3D faces from one 3D template and a small number of 2D images. Our approach is motivated as a potential model for the origins of face perception in human infants, who appear to start with an innate face template and subsequently develop a flexible system for perceiving the 3D structure of any novel face from experience with only 2D images of a relatively small number of familiar faces.
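The idea of deriving a shape model from a single template through a Gaussian process can be illustrated with a minimal sketch. The abstract does not state which kernel, parameters, or approximation the authors use, so everything below is an assumption made for illustration: a squared exponential kernel over vertex positions, the hypothetical values `sigma` and `length_scale`, a low-rank eigendecomposition of the kernel matrix, and a random point cloud standing in for the template mesh. Albedo could be handled analogously but is omitted here.

```python
import numpy as np


def squared_exponential_kernel(points, sigma=5.0, length_scale=30.0):
    """Scalar covariance between all pairs of template vertices (assumed kernel)."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return sigma ** 2 * np.exp(-sq_dists / (2.0 * length_scale ** 2))


def low_rank_basis(kernel_matrix, rank):
    """Leading eigenpairs of the kernel matrix, scaled into a deformation basis."""
    eigvals, eigvecs = np.linalg.eigh(kernel_matrix)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:rank]          # indices of the largest ones
    eigvals = np.clip(eigvals[order], 0.0, None)      # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(eigvals)       # shape (n_vertices, rank)


def sample_shape(template, basis, rng):
    """Draw one shape: the template plus a smooth random deformation."""
    # Independent standard normal coefficients per axis; the same scalar
    # kernel is reused for the x, y, and z displacements.
    coeffs = rng.standard_normal((basis.shape[1], 3))
    return template + basis @ coeffs


rng = np.random.default_rng(0)
template = rng.uniform(-50.0, 50.0, size=(200, 3))  # stand-in for mesh vertices
K = squared_exponential_kernel(template)
basis = low_rank_basis(K, rank=20)
new_shape = sample_shape(template, basis, rng)
print(new_shape.shape)  # (200, 3)
```

In this sketch the columns of `basis` play the role that principal components play in a model learned from many scans, except that they come from the kernel alone rather than from data. Nearby vertices receive correlated displacements, so sampled shapes deform smoothly around the single template.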
Keywords
#3D generative models #3D mesh #unsupervised learning #face recognition #3D structure