How close are we to solving the problem of automated visual surveillance?
Abstract
The problem of automated visual surveillance has spawned a lively research area, with 2005 seeing three conferences or workshops and special issues of two major journals devoted to the topic. These alone are responsible for somewhere in the region of 240 papers and posters on automated visual surveillance before we begin to count those presented in more general fora. Many of these systems and algorithms perform one small sub-part of the surveillance task, such as motion detection. But even with low-level image processing tasks it is often difficult to compare systems on the basis of published results alone. This review paper aims to answer the difficult question “How close are we to developing surveillance-related systems which are really useful?” The first section of this paper considers the question of surveillance in the real world: installations, systems and practices. The main body of the paper then considers existing computer vision techniques with an emphasis on higher-level processes such as behaviour modelling and event detection. We conclude with a review of the evaluative mechanisms that have grown from within the computer vision community in an attempt to provide some form of robust evaluation and cross-system comparability.