Call for papers
Machine Vision and Applications - Volume 6 - Pages 179-179 - 1993
The general framework for few-shot learning by kernel HyperNetworks
Machine Vision and Applications - Volume 34, Issue 4 - 2023
Marcin Sendera, Marcin Przewięźlikowski, Jan Miksa, Mateusz Rajski, Konrad Karanowski, Maciej Zięba, Jacek Tabor, Przemysław Spurek
Few-shot models aim at making predictions using a minimal number of labeled examples from a given task. The main challenge in this area is the one-shot setting, where only one element represents each class. We propose a general framework for few-shot learning via kernel HyperNetworks, a fusion of the kernel and hypernetwork paradigms. First, we introduce the classical realization of this framework, dubbed HyperShot. Compared to reference approaches that apply a gradient-based adjustment of the parameters, our models switch the classification module's parameters depending on the task's embedding. In practice, we utilize a hypernetwork that takes the aggregated information from the support data and returns the classifier's parameters tailored to the considered problem. Moreover, we introduce a kernel-based representation of the support examples that is delivered to the hypernetwork to create the parameters of the classification module. Consequently, we rely on relations between the support examples' embeddings instead of the backbone model's direct feature values. Thanks to this approach, our model can adapt to highly diverse tasks. While this method obtains very good results, it is limited by typical problems such as poorly quantified uncertainty due to limited data size. We further show that incorporating Bayesian neural networks into our general framework, an approach we call BayesHyperShot, solves this issue.
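A minimal PyTorch sketch of the idea this abstract describes: support embeddings (from some shared backbone) are turned into a kernel matrix, a hypernetwork maps that matrix to classifier weights, and queries are classified through their kernel values against the support set. The module sizes and the cosine kernel are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class KernelHyperNet(nn.Module):
    """Sketch: hypernetwork mapping a support-set kernel matrix
    to the weights of a task-specific linear classifier."""

    def __init__(self, n_support, n_classes, hidden=128):
        super().__init__()
        # Input: flattened n_support x n_support kernel matrix.
        # Output: weights and biases of a classifier acting on the
        # query's kernel vector against the support set.
        out_dim = n_classes * n_support + n_classes
        self.hyper = nn.Sequential(
            nn.Linear(n_support * n_support, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )
        self.n_support, self.n_classes = n_support, n_classes

    @staticmethod
    def cosine_kernel(a, b):
        a = nn.functional.normalize(a, dim=-1)
        b = nn.functional.normalize(b, dim=-1)
        return a @ b.t()

    def forward(self, support_emb, query_emb):
        # Relations between support embeddings, not raw feature values.
        k_ss = self.cosine_kernel(support_emb, support_emb)
        params = self.hyper(k_ss.flatten())
        w = params[: self.n_classes * self.n_support].view(self.n_classes, self.n_support)
        b = params[self.n_classes * self.n_support:]
        # A query is represented by its kernel values against the support set.
        k_qs = self.cosine_kernel(query_emb, support_emb)
        return k_qs @ w.t() + b  # (n_queries, n_classes) logits
```

Here `support_emb` and `query_emb` would come from a shared feature extractor such as a ResNet backbone.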
Convolutional neural network-based cross-corpus speech emotion recognition with data augmentation and features fusion
Machine Vision and Applications - Volume 33 - Pages 1-16 - 2022
Rashid Jahangir, Ying Wah Teh, Ghulam Mujtaba, Roobaea Alroobaea, Zahid Hussain Shaikh, Ihsan Ali
Speech emotion recognition (SER) is one of the most challenging and active research topics in data science due to its wide range of applications in human–computer interaction, computer games, mobile services and psychological assessment. In the past, several studies have employed handcrafted features to classify emotions and achieved good classification accuracy. However, such features degrade the classification accuracy in complex scenarios. Thus, recent studies employed deep learning models to automatically extract local representations from given audio signals. Although automated feature engineering overcomes the issues of the handcrafted feature extraction approach, there is still a need to further improve the performance of the reported techniques. This is because the reported techniques used single-layer and two-layer convolutional neural networks (CNNs), and these architectures are not capable of learning optimal features from complex speech signals. To overcome this limitation, this study proposed a novel SER framework, which applies data augmentation methods before extracting seven informative feature sets from each utterance. The extracted feature vector is used as input to a 1D CNN for emotion recognition using the EMO-DB, RAVDESS and SAVEE databases. Moreover, this study also proposed a cross-corpus SER model using all audio files of the common emotions of the aforementioned databases. The experimental results showed that the proposed SER framework outperformed existing SER frameworks. Specifically, it obtained 96.7% accuracy for EMO-DB with all utterances in seven emotions, 90.6% for RAVDESS with all utterances in eight emotions, 93.2% for SAVEE with all utterances in seven emotions and 93.3% for the cross-corpus model with 1930 utterances in six emotions. We believe that the proposed framework will contribute significantly to the SER domain.
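A hedged Python sketch of the pipeline shape the abstract describes: augment each utterance, extract several per-utterance feature sets, fuse them into one vector, and feed a small 1D CNN. The specific augmentations and the seven librosa feature sets chosen here (MFCC, chroma, mel spectrogram, spectral contrast, tonnetz, zero-crossing rate, RMS energy) are illustrative assumptions; the paper's exact sets may differ.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

def augment(y, sr):
    """Illustrative augmentations: additive noise and pitch shift."""
    noisy = y + 0.005 * np.random.randn(len(y))
    shifted = librosa.effects.pitch_shift(y=y, sr=sr, n_steps=2)
    return [y, noisy, shifted]

def extract_features(y, sr):
    """Seven feature sets per utterance, each averaged over time
    and fused (concatenated) into a single vector."""
    feats = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),
        librosa.feature.chroma_stft(y=y, sr=sr),
        librosa.feature.melspectrogram(y=y, sr=sr),
        librosa.feature.spectral_contrast(y=y, sr=sr),
        librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr),
        librosa.feature.zero_crossing_rate(y),
        librosa.feature.rms(y=y),
    ]
    return np.concatenate([f.mean(axis=1) for f in feats])

class Ser1DCNN(nn.Module):
    """Small multi-layer 1D CNN over the fused feature vector."""
    def __init__(self, n_features, n_emotions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(128, n_emotions),
        )
    def forward(self, x):           # x: (batch, n_features)
        return self.net(x.unsqueeze(1))
```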
Robust contour reconstruction of red blood cells and parasites in the automated identification of the stages of malarial infection
Machine Vision and Applications - Volume 22 - Pages 461-469 - 2010
Saravana Kumar Kumarasamy, S. H. Ong, K. S. W. Tan
We present a novel method for detecting malaria parasites and determining the stage of infection from digital images comprising red blood cells (RBCs). The proposed method is robust under varying conditions of image luminance, contrast and clumping of RBCs. Both strong and weak boundary edges of the RBCs and parasites are detected based on the similarity measure between local image neighborhoods and predefined edge filters. A rule-based algorithm is applied to link edge fragments to form closed contours of the RBCs and parasite regions, as well as to split clumps into constituent cells. A radial basis function (RBF) support vector machine determines the stage of infection from features extracted from each parasite region. The proposed method achieves 97% accuracy in cell segmentation and 86% accuracy in parasite detection when tested on a total of 530 digitally captured images of three species of malaria parasites: Plasmodium falciparum, Plasmodium yoelii and Plasmodium berghei.
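A small sketch of the two stages the abstract names, under stated assumptions: edge detection as normalized correlation between each 3x3 neighborhood and a tiny bank of predefined edge filters (the filter bank here is illustrative, and the explicit pixel loops are slow but clear), followed by an RBF-kernel SVM for staging. The rule-based contour linking is beyond this sketch.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative 3x3 edge filters (horizontal, vertical, two diagonals).
EDGE_FILTERS = [
    np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], float),
    np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], float),
    np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]], float),
    np.array([[1, 1, 0], [1, 0, -1], [0, -1, -1]], float),
]

def edge_similarity(image):
    """Per-pixel similarity between each 3x3 neighborhood and its
    best-matching edge filter (normalized cross-correlation), which
    responds to both strong and weak boundary edges."""
    h, w = image.shape
    best = np.zeros((h, w))
    for f in EDGE_FILTERS:
        fn = f / np.linalg.norm(f)
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                patch = image[i-1:i+2, j-1:j+2].astype(float)
                patch -= patch.mean()
                n = np.linalg.norm(patch)
                if n > 0:
                    best[i, j] = max(best[i, j], abs((patch * fn).sum()) / n)
    return best

# Staging: RBF-kernel SVM over features from each parasite region
# (e.g., size, colour and texture statistics; placeholders here).
clf = SVC(kernel="rbf", gamma="scale")
# clf.fit(region_features, stage_labels); clf.predict(new_regions)
```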
Detecting violent and abnormal crowd activity using temporal analysis of grey level co-occurrence matrix (GLCM)-based texture measures
Machine Vision and Applications - Volume 28 - Pages 361-371 - 2017
Kaelon Lloyd, Paul L. Rosin, David Marshall, Simon C. Moore
The severity of sustained injury resulting from assault-related violence can be minimised by reducing detection time. However, it has been shown that human operators perform poorly at detecting events found in video footage when presented with simultaneous feeds. We utilise computer vision techniques to develop an automated method of abnormal crowd detection that can aid a human operator in the detection of violent behaviour. We observed that behaviour in city centre environments often occurs in crowded areas, resulting in individual actions being occluded by other crowd members. We propose a real-time descriptor that models crowd dynamics by encoding changes in crowd texture using temporal summaries of grey level co-occurrence matrix features. We introduce a measure of inter-frame uniformity and demonstrate that the appearance of violent behaviour changes in a less uniform manner when compared to other types of crowd behaviour. Our proposed method is computationally cheap and offers real-time description. Evaluating our method using a privately held CCTV dataset and the publicly available Violent Flows, UCF Web Abnormality and UMN Abnormal Crowd datasets, we report a receiver operating characteristic score of 0.9782, 0.9403, 0.8218 and 0.9956, respectively.
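A hedged sketch of the descriptor's shape using scikit-image (the functions are spelled `greycomatrix`/`greycoprops` in older releases). The particular GLCM offsets, properties, and the inter-frame uniformity formula below are illustrative choices, not the paper's exact definitions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

PROPS = ("contrast", "homogeneity", "energy", "correlation")

def glcm_features(frame_gray):
    """Texture descriptor for one uint8 grayscale frame: GLCM
    properties over a few distances and angles."""
    glcm = graycomatrix(frame_gray, distances=[1, 2],
                        angles=[0, np.pi / 2], levels=256,
                        symmetric=True, normed=True)
    return np.array([graycoprops(glcm, p).ravel() for p in PROPS]).ravel()

def temporal_descriptor(frames):
    """Temporal summary of crowd texture: per-frame GLCM features plus
    an inter-frame uniformity score (spread of the change between
    consecutive frames). Violent crowds should change texture in a
    less uniform manner than normal crowds."""
    feats = np.stack([glcm_features(f) for f in frames])
    diffs = np.abs(np.diff(feats, axis=0))
    uniformity = diffs.std(axis=0) / (diffs.mean(axis=0) + 1e-8)
    return feats.mean(axis=0), uniformity
```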
Retina-like visual sensor for fast tracking and navigation robots
Machine Vision and Applications - Volume 10 - Pages 1-8 - 1997
Cheon W. Shin, Seiji Inokuchi, Kwang I. Kim
This paper describes the development of an anthropomorphic visual sensor which generates a spatially variant resolution image by using a retina-like structure. The sensor consists of a dove prism for image rotation and two linear CCD sensors with 512 pixel/line resolution, and holds approximately 45 kbytes of image data. The retina-like sensor has variable resolution, with density increasing towards the center of the visual field, and yields a polar-coordinate image directly. Motion analysis of an object in the scene from optical flow is considerably simplified when velocity is represented in polar coordinates rather than Cartesian coordinates. A calibration procedure for the proposed retina-like sensor is also presented, with experimental data to verify the validity of the system. This sensor holds promise for high-speed tracking applications, such as the eyes of navigation robots, because of its data reduction and polar mapping characteristics.
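A numpy sketch of the kind of space-variant, polar sampling such a sensor produces: ring spacing grows with eccentricity, so sampling density is highest at the center, and the output is indexed by (ring, angle) directly. The log-spaced radii and grid sizes are illustrative assumptions, not the sensor's actual geometry.

```python
import numpy as np

def retina_sample(image, n_rings=64, n_angles=128):
    """Resample a Cartesian grayscale image onto a log-polar grid:
    radii are log-spaced, so resolution increases towards the center,
    and the result is a polar-coordinate image indexed (ring, angle)."""
    h, w = image.shape
    cy, cx = h / 2.0, w / 2.0
    r_max = min(cy, cx)
    rings = r_max ** (np.arange(1, n_rings + 1) / n_rings)  # log-spaced
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    out = np.zeros((n_rings, n_angles), dtype=image.dtype)
    for i, r in enumerate(rings):
        ys = np.clip((cy + r * np.sin(angles)).astype(int), 0, h - 1)
        xs = np.clip((cx + r * np.cos(angles)).astype(int), 0, w - 1)
        out[i] = image[ys, xs]
    return out
```

Under this mapping, rotation of the scene about the optical axis becomes a simple shift along the angular axis, which is one reason optical-flow analysis simplifies in polar coordinates.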
Multitask learning for neural generative question answering
Machine Vision and Applications - Volume 29 - Pages 1009-1017 - 2018
Yanzhou Huang, Tao Zhong
Neural generative models in question answering (QA) usually employ sequence-to-sequence (Seq2Seq) learning to generate answers based on the user's questions, as opposed to retrieval-based models, which select the best-matched answer from a repository of pre-defined QA pairs. One key challenge of neural generative models in QA lies in generating high-frequency, generic answers regardless of the questions, partially due to optimizing the log-likelihood objective function. In this paper, we investigate multitask learning (MTL) in a neural network-based method under a QA scenario. We define our main task as generative QA via Seq2Seq learning, and our auxiliary task as discriminative QA via binary QA classification. Both the main task and the auxiliary task are learned jointly with shared representations, allowing improved generalization and the transfer of classification labels as extra evidence to guide the word sequence generation of the answers. Experimental results on both automatic evaluations and human annotations demonstrate the superiority of our proposed method over baselines.
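A hedged PyTorch sketch of the shared-representation MTL setup the abstract describes: one encoder feeds both a Seq2Seq decoder (generative main task) and a binary classification head (discriminative auxiliary task), trained with a weighted joint loss. The GRU architecture, sizes, and loss weight are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultitaskQA(nn.Module):
    """Shared encoder; head 1 generates the answer (Seq2Seq),
    head 2 classifies the QA pair (binary)."""
    def __init__(self, vocab, emb=128, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)  # shared
        self.decoder = nn.GRU(emb, hid, batch_first=True)  # generative
        self.gen_out = nn.Linear(hid, vocab)
        self.cls_out = nn.Linear(hid, 1)                   # discriminative

    def forward(self, question, answer_in):
        _, h = self.encoder(self.embed(question))
        dec, _ = self.decoder(self.embed(answer_in), h)
        return self.gen_out(dec), self.cls_out(h[-1]).squeeze(-1)

def joint_loss(gen_logits, targets, cls_logits, labels, alpha=0.5):
    """Weighted sum of generation NLL and binary classification loss;
    the classification signal acts as extra evidence for generation."""
    gen = nn.functional.cross_entropy(
        gen_logits.reshape(-1, gen_logits.size(-1)), targets.reshape(-1))
    cls = nn.functional.binary_cross_entropy_with_logits(cls_logits, labels)
    return gen + alpha * cls
```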
A system for the generation of in-car human body pose datasets
Machine Vision and Applications - Volume 32 - Pages 1-15 - 2020
João Borges, Sandro Queirós, Bruno Oliveira, Helena Torres, Nelson Rodrigues, Victor Coelho, Johannes Pallauf, José Henrique Brito, José Mendes, Jaime C. Fonseca
With the advent of autonomous vehicles, detection of the occupants' posture is crucial to tackle the needs of infotainment interaction or passive safety systems. Generative approaches have recently been proposed for in-car human body pose detection, but this type of approach requires a large training dataset to reach feasible accuracy. This requirement poses a difficulty, given the substantial time required to annotate such a large amount of data. In the in-car scenario, this difficulty increases even further, since a robust human body pose ground-truth system capable of working in it is needed but nonexistent. Currently, the gold standard for human body pose capture is based on optical systems, requiring up to 39 visible markers for a plug-in gait model, which in this case is not feasible given the occlusions inside the car. Other solutions, such as inertial suits, also have limitations linked to magnetic sensitivity and global positioning drift. In this paper, a system for the generation of images for human body pose detection in an in-car environment is proposed. To this end, we propose to smartly combine inertial and optical systems to suppress their individual limitations: by combining the global positioning of 3 visible head markers provided by the optical system with the inertial suit's relative human body pose, we obtain an occlusion-ready, drift-free full-body global positioning system. This system is then spatially and temporally calibrated with a time-of-flight sensor, automatically obtaining in-car image data with (multi-person) pose annotations. Besides quantifying the inertial suit's inherent sensitivity and accuracy, the feasibility of the overall system for human body pose capture in the in-car scenario was demonstrated. Our results quantify the errors associated with the inertial suit, pinpoint some sources of the system's uncertainty and propose how to minimize some of them. Finally, we demonstrate the feasibility of using system-generated data (which was made publicly available), independently or mixed with two publicly available generic (not in-car) datasets, to train 2 machine learning algorithms, demonstrating the improvement in their accuracy for the in-car scenario.
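A minimal numpy sketch of the core fusion step described above, assuming the standard Kabsch algorithm is an adequate stand-in for the paper's exact calibration: a rigid transform aligns the suit's head-segment points to the three optical head markers, then re-anchors the whole drift-prone suit skeleton in the drift-free optical global frame.

```python
import numpy as np

def kabsch(P, Q):
    """Rigid transform (R, t) that best maps points P onto Q (both Nx3),
    via SVD of the cross-covariance (Kabsch algorithm). Used here to
    align the suit's head segment to the 3 optical head markers."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, Qc - R @ Pc

def to_global(suit_joints, suit_head_pts, optical_head_pts):
    """Re-anchor suit joints (suit frame, Nx3) into the optical global
    frame through the head-marker correspondence."""
    R, t = kabsch(suit_head_pts, optical_head_pts)
    return suit_joints @ R.T + t
```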
Towards infield, live plant phenotyping using a reduced-parameter CNN
Machine Vision and Applications - Volume 31 - Pages 1-14 - 2019
John Atanbori, Andrew P. French, Tony P. Pridmore
There is an increase in consumption of agricultural produce as a result of the rapidly growing human population, particularly in developing nations. This has triggered high-quality plant phenotyping research to help with the breeding of high-yielding plants that can adapt to our continuously changing climate. Novel, low-cost, fully automated plant phenotyping systems, capable of infield deployment, are required to help identify quantitative plant phenotypes. The identification of quantitative plant phenotypes is a key challenge which relies heavily on the precise segmentation of plant images. Recently, the plant phenotyping community has started to use very deep convolutional neural networks (CNNs) to help tackle this fundamental problem. However, these very deep CNNs rely on millions of model parameters and generate very large weight matrices, thus making them difficult to deploy infield on low-cost, resource-limited devices. We explore how to compress existing very deep CNNs for plant image segmentation, thus making them easily deployable infield and on mobile devices. In particular, we focus on applying these models to the pixel-wise segmentation of plants into multiple classes including background, a challenging problem in the plant phenotyping community. We combined two approaches (separable convolutions and SVD) to reduce the model parameter numbers and weight matrices of these very deep CNN-based models. Our combined method reduced the weight matrices by up to 95% without affecting pixel-wise accuracy. These methods have been evaluated on two public plant datasets and one non-plant dataset to illustrate generality. We have successfully tested our models on a mobile device.
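A hedged PyTorch sketch of both compression steps the abstract combines: replacing a standard convolution with a depthwise-separable pair, and truncating a layer's weight matrix with SVD. The rank and layer sizes are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

def separable_conv(in_ch, out_ch, k=3):
    """Depthwise + pointwise pair replacing a standard k x k convolution:
    weight count drops from in*out*k*k to in*k*k + in*out."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, 1),
    )

def svd_compress(linear, rank):
    """Replace one Linear layer by two low-rank layers via truncated SVD:
    W (out x in) ~= (U * S)[:, :rank] @ Vt[:rank, :]."""
    W, b = linear.weight.data, linear.bias.data
    U, S, Vt = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(W.shape[1], rank, bias=False)
    second = nn.Linear(rank, W.shape[0])
    first.weight.data = Vt[:rank]                # rank x in
    second.weight.data = U[:, :rank] * S[:rank]  # out x rank
    second.bias.data = b
    return nn.Sequential(first, second)
```

The two pieces compose: separable convolutions shrink the convolutional feature extractor, while SVD truncation targets the large dense weight matrices, which is where most of the reported 95% reduction would come from.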