Multimedia Tools and Applications

Notable scientific publications

* Data are for reference only

Extracting the symmetry axes of partially occluded single apples in natural scene using convex hull theory and shape context algorithm
Multimedia Tools and Applications - Volume 76 - Pages 14075-14089 - 2016
Leilei Niu, Weicong Zhou, Dandan Wang, Dongjian He, Haihui Zhang, Huaibo Song
Accurate identification of apples partially occluded by branches and leaves is an urgent and key issue for a picking robot. The objective of this study was to accurately detect the symmetry axes of partially occluded single apples using convex hull theory and the Shape Context algorithm. Firstly, apple regions were obtained using the K-means clustering algorithm. Secondly, image pre-processing steps such as binarization, hole filling, area opening and edge detection were applied. Thirdly, false contours were removed based on convex hull theory to enhance the accuracy and stability of the method. Finally, the point-matching relationship between each pair of contours and the two best symmetrical contours were found using the Shape Context and Hungarian algorithms, and the symmetry axes of the apples were then extracted from the matching point pairs. A least-squares ellipse fitting algorithm and a moment-of-inertia algorithm were used for comparison with the presented algorithm. The angle difference between the extracted symmetry axis and the ideal symmetry axis was computed for every method, as was the program execution time. Ninety partially occluded single-apple images were tested. The experimental results showed that the average angle error of the Shape Context algorithm was 7.72°, i.e., 37.5 % of that of the ellipse fitting algorithm and 31.3 % of that of the inertia moment algorithm, and its average execution time was 1.86 s, i.e., 103 % of the ellipse fitting algorithm's and 106 % of the inertia moment algorithm's. In conclusion, it is feasible to use the proposed method to extract the symmetry axes of partially occluded apples.
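The matching step described in the abstract above can be sketched roughly as follows: given per-point descriptors for two contours (shape-context histograms in the paper; plain 2-D coordinates stand in for them here), the Hungarian algorithm finds the optimal point correspondence, and a symmetry axis is fitted through the midpoints of the matched pairs. This is an illustrative sketch under those assumptions, not the authors' implementation; `scipy.optimize.linear_sum_assignment` is a standard Hungarian-algorithm implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_contours(desc_a, desc_b):
    """Optimal one-to-one (Hungarian) matching between two descriptor sets."""
    desc_a, desc_b = np.asarray(desc_a, float), np.asarray(desc_b, float)
    # pairwise Euclidean cost matrix between all descriptor pairs
    cost = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return rows, cols

def symmetry_axis(pts_a, pts_b, rows, cols):
    """Fit an axis through the midpoints of matched point pairs via the
    principal direction of those midpoints."""
    mids = (np.asarray(pts_a, float)[rows] + np.asarray(pts_b, float)[cols]) / 2.0
    center = mids.mean(axis=0)
    cov = np.cov((mids - center).T)
    direction = np.linalg.eigh(cov)[1][:, -1]   # eigenvector of largest eigenvalue
    return center, direction

# Toy example: two contour halves mirrored about the vertical line x = 0.
a = np.array([[-3.0, 0.0], [-2.0, 1.0], [-1.0, 2.0]])
b = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 0.0]])
rows, cols = match_contours(a, b)
center, direction = symmetry_axis(a, b, rows, cols)
print(center, direction)
```

On this toy input the minimum-cost assignment pairs each point with its mirror image, so the fitted axis passes through x = 0 and points vertically.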
Infrared and visible image fusion based on quaternion wavelets transform and feature-level Copula model
Multimedia Tools and Applications - Volume 83, Issue 10 - Pages 28549-28577
Xiangfeng Luo, Kai Li, Anqi Wang, Zhancheng Zhang, Xiaojun Wu
JUIVCDv1: development of a still-image based dataset for indian vehicle classification
Multimedia Tools and Applications - Pages 1-28 - 2024
Sourajit Maity, Debam Saha, Pawan Kumar Singh, Ram Sarkar
An automatic vehicle classification (AVC) system designed from either still images or videos has the potential to bring significant benefits to the development of a traffic control system. Numerous articles on AVC have been published in the literature. Over the years, researchers in this domain have created and used a variety of datasets, but these datasets often do not reflect the exact scenarios of the Indian subcontinent due to the specific peculiarities of its road conditions, road congestion, and vehicle types. The primary goal of this paper is to create a new still-image dataset, called JUIVCDv1, which contains 12 different local vehicle classes collected using mobile cameras, for developing an automated vehicle management system. We also discuss the characteristics of current datasets and the various other factors taken into account while creating the dataset for the Indian scenario. Apart from this, we benchmark results on the developed dataset using eight state-of-the-art pre-trained convolutional neural network (CNN) models, namely Xception, InceptionV3, DenseNet121, MobileNetV2, VGG16, NASNetMobile, ResNet50 and ResNet152. Among these, the Xception, InceptionV3 and DenseNet121 models produce the best classification accuracy scores of 0.94, 0.93 and 0.92, respectively. These models are further combined into an ensemble to enhance the performance of the overall classification model. Majority-voting, weighted-average, and sum-rule ensemble approaches are used, giving accuracy scores of 0.95, 0.94, and 0.94, respectively.
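The ensemble rules reported in the abstract above are standard combination schemes; a minimal sketch of majority voting over per-model class labels and weighted averaging over per-model class probabilities might look like this (function names are illustrative, not from the paper):

```python
import numpy as np

def majority_vote(pred_sets):
    """Combine per-model class predictions by majority vote.
    pred_sets: (n_models, n_samples) integer class labels."""
    pred_sets = np.asarray(pred_sets)
    n_classes = pred_sets.max() + 1
    # per-sample vote counts, shape (n_classes, n_samples)
    votes = np.apply_along_axis(np.bincount, 0, pred_sets, minlength=n_classes)
    return votes.argmax(axis=0)

def weighted_average(prob_sets, weights):
    """Combine per-model class-probability outputs by a weighted average.
    prob_sets: (n_models, n_samples, n_classes)."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                  # normalize model weights
    return np.tensordot(w, np.asarray(prob_sets), axes=1).argmax(axis=1)

# Three models classifying four samples:
preds = [[0, 1, 2, 1], [0, 2, 2, 1], [1, 1, 2, 0]]
print(majority_vote(preds))
```

The sum rule is the special case of the weighted average with equal weights; in practice the weights are often set from each model's validation accuracy.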
Diabetic retinopathy detection and stage classification in eye fundus images using active deep learning
Multimedia Tools and Applications - Volume 80, Issue 8 - Pages 11691-11721 - 2021
Imran Qureshi, Jun Ma, Qaisar Abbas
Retinal fundus image analysis (RFIA) for diabetic retinopathy (DR) screening can be used to reduce the risk of blindness among diabetic patients. RFIA screening programs help ophthalmologists cope with this paramount visual impairment problem. In this article, automatic recognition of the DR stage is proposed based on a new multi-layer architecture of active deep learning (ADL). To develop the ADL system, we used a convolutional neural network (CNN) model to automatically extract features, in contrast to handcrafted features. However, training a CNN requires an immense amount of labeled data, which makes the classification phase difficult. As a result, a label-efficient CNN architecture, known as ADL-CNN, is presented using one of the active learning methods, expected gradient length (EGL). The ADL-CNN model can be seen as a two-stage process. First, it selects the most informative patches and images, using ground truth labels of training samples, to learn simple-to-complex retinal features. Next, it provides useful masks for prognostication to assist clinical specialists in annotating important eye samples and to segment regions of interest within the retinograph image in order to grade five severity levels of diabetic retinopathy. To test and evaluate the performance of the ADL-CNN model, the EyePACS benchmark is utilized and compared with state-of-the-art methods. Statistical metrics such as sensitivity (SE), specificity (SP), F-measure and classification accuracy (ACC) are used to measure the effectiveness of the ADL-CNN system. On 54,000 retinograph images, the ADL-CNN model achieved an average SE of 92.20%, SP of 95.10%, F-measure of 93% and ACC of 98%. Hence, the new ADL-CNN architecture outperforms previous methods in detecting DR-related lesions and recognizing the five severity levels of DR on a wide range of fundus images.
A system for detection of moving caption text in videos: a news use case
Multimedia Tools and Applications - Volume 80 - Pages 25607-25631 - 2021
Hossam Elshahaby, Mohsen Rashwan
Extraction of news text captions aims at a digital understanding of what is happening in a specific region during a certain period, which helps communication between different nations because plain text can easily be translated from one language to another. Moving text captions cause blurry effects that are a significant source of text quality impairment in news channels. Most existing text caption detection models do not address this problem in a way that captures the different dynamic motions of captions, gathers a full news story across several frames in the sequence, resolves the blurring effect of text motion, offers a language-independent model, or provides an end-to-end solution for the community to use. We process the frames coming in sequence and extract edge features using either the Hough transform or our color-based technique. We verify text existence using a pre-trained Convolutional Neural Network (CNN) text detection model. We analyze the caption motion status using a hybrid of a pre-trained Recurrent Neural Network (RNN) of the Long Short-Term Memory (LSTM) type and a correlation-based model. When the motion is determined to be horizontal scrolling, there are two problems. First, the text keeps moving without stopping, producing a strong blurring effect that degrades text quality and consequently lowers character recognition accuracy. Second, successive news stories are separated by the channel logo or long spaces. We solve the first problem by deblurring the text image using either a Bicubic Spline Interpolation (BSI) technique or a Denoising Autoencoder Neural Network (DANN). We solve the second problem using a Point Feature Matching (PFM) technique to match the on-screen channel logo against the channel logo database (ground truth). We evaluate our framework using the Abbyy® SDK as a standalone tool for text recognition supporting different languages.
Aggregated pyramid attention network for mass segmentation in mammograms
Multimedia Tools and Applications - Volume 81 - Pages 13335-13353 - 2021
Meng Lou, Yunliang Qi, Xiaorong Li, Chunbo Xu, Wenwei Zhao, Xiangyu Deng, Yide Ma
Intra-class inconsistency and inter-class indistinction are intractable problems that commonly arise in breast mass segmentation from mammograms. In this work, a novel deep learning segmentation model is presented to address these problems. Firstly, we propose a simple yet effective aggregated pyramid attention module (APAM) for capturing intra-class dependencies, aimed at effectively aggregating contextual dependencies from different receptive fields to reinforce feature representations. Then, a novel aggregated pyramid attention network (APANet) is developed to further alleviate both intra-class inconsistency and inter-class indistinction. APANet combines low-level spatial details and high-level contextual information via an encoder-decoder structure to further refine semantic representations. Finally, the proposed APANet is evaluated on two public mammographic databases, DDSM-BCRP and INbreast, achieving Dice Similarity Coefficients (DSC) of 91.04% and 94.02%, respectively.
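The Dice Similarity Coefficient used as the evaluation metric above is a standard overlap measure between a predicted and a ground-truth binary mask, DSC = 2|A ∩ B| / (|A| + |B|); a minimal sketch:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks.
    eps guards against division by zero when both masks are empty."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

# Toy 2x3 masks overlapping in two pixels: DSC = 2*2 / (3 + 3) = 2/3
print(dice_coefficient([[1, 1, 0], [0, 1, 0]],
                       [[1, 0, 0], [0, 1, 1]]))
```

A DSC of 1 means perfect overlap and 0 means no overlap, which is why segmentation papers report it as a percentage.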
Detection and localization of anomalies in video surveillance using novel optimization based deep convolutional neural network
Multimedia Tools and Applications - Volume 82 - Pages 28895-28915 - 2023
Baliram Sambhaji Gayal, Sandip Raosaheb Patil
Nowadays, demand for surveillance applications has increased in order to guarantee the safety and security of people and society. Due to the rapid growth of surveillance, human intervention is required to interpret human behaviors when monitoring anomalous activity. Hence, this research proposes an automatic anomaly detection model for monitoring anomalies in surveillance videos. Accordingly, a hierarchical social hunting optimization tuned deep convolutional neural network (HiS-Deep CNN) is proposed for video anomaly detection, for which object detection and tracking are performed first. The detection improvement of the classifier rests on the training algorithm, the hierarchical social hunting (HiS) optimization algorithm, which is designed around hybrid characteristics inherited from timber wolf and Ateles geoffroyi (spider monkey) search agents. Moreover, tracking of the video object is done using features of the Minimum Output Sum of Squared Error (MOSSE) tracking algorithm and a Simple Moving Average (SMA) based algorithm. The effectiveness of the anomaly detection model is analyzed in terms of sensitivity, accuracy, specificity, and Multiple Object Tracking Precision, which reach 96.62%, 96.56%, 96.14%, and 0.994, respectively.
Gait recognition based on margin sample mining loss
Multimedia Tools and Applications - Volume 82 - Pages 969-987 - 2022
Xuan Nie, Hongmei Li
Gait recognition, one of the newer biometric techniques, identifies a target pedestrian through their walking posture. Gait recognition is effective at long distances, is difficult to camouflage, and requires no contact or cooperation from the target pedestrian. However, the accuracy of gait recognition is affected by external factors such as the shooting angle of the video and the clothes and bags worn by the target. In this paper, we address these problems in two ways. Firstly, a gait recognition method based on margin sample mining (MSM) loss is proposed, enabling us to extract more discriminative spatio-temporal features. Secondly, we introduce a new input method that makes each input sequence more closely related, thus improving the gait recognition rate. Finally, the proposed method is verified on the CASIA-B and OU-MVLP datasets. On CASIA-B, the average recognition rate is obtained under normal walking (NM), walking with bags (BG) and walking in different clothing (CL). With rank-1 accuracy under the LT protocol, the proposed method reaches 96.4% under NM, 89.1% under BG and 71.2% under CL, and under normal walking conditions it performs better than the best existing gait recognition methods. On OU-MVLP, we obtain 87.5% accuracy.
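The paper's exact MSM loss is not reproduced in the abstract; a batch-hard variant in the same spirit (mine the farthest same-identity pair and the closest cross-identity pair over the whole batch, then apply a margin) can be sketched as follows, with all names illustrative:

```python
import numpy as np

def msm_style_loss(features, labels, margin=0.5):
    """Margin-sample-mining-style loss sketch: penalize when the hardest
    positive pair is not closer than the hardest negative pair by `margin`.
    features: (n, d) embeddings; labels: (n,) identity labels."""
    f = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    # full pairwise Euclidean distance matrix
    d = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=2)
    same = y[:, None] == y[None, :]
    np.fill_diagonal(same, False)          # exclude self-pairs
    diff = y[:, None] != y[None, :]
    hardest_pos = d[same].max()            # farthest same-identity pair
    hardest_neg = d[diff].min()            # closest cross-identity pair
    return max(0.0, hardest_pos - hardest_neg + margin)

# Two identities, two samples each: positives 2 apart, negatives 1 apart.
print(msm_style_loss([[0, 0], [2, 0], [1, 0], [3, 0]], [0, 0, 1, 1]))
```

When the embedding space is well separated (positives much closer than negatives), the hinge is inactive and the loss is zero.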
Background subtraction based on deep convolutional neural networks features
Multimedia Tools and Applications - Volume 78 - Pages 14549-14571 - 2018
Jianfang Dou, Qin Qin, Zimei Tu
Background modeling and subtraction, the task of detecting moving objects in a scene, is a fundamental and critical step for many high-level computer vision tasks. However, background subtraction modeling is still an open and challenging problem, particularly in practical scenarios with drastic illumination changes and dynamic backgrounds. In this paper, we propose a novel foreground detection method based on Convolutional Neural Networks (CNNs) to deal with the challenges confronting background subtraction. Firstly, given a clean background image without moving objects, an adjustable neighborhood of each pixel in the background image is constructed to form windows, and CNN features are extracted with a pre-trained CNN model for each window to form a feature-based background model. Secondly, for the current frame of a video scene, features are extracted with the same operation as for the background model, and the Euclidean distance is adopted to build a distance map between the current frame and the background image in CNN feature space. Thirdly, the distance map is fed into a graph cut algorithm to obtain the foreground mask. To deal with background changes, the background model is updated at a certain rate. Experimental results verify that the proposed approach effectively detects foreground objects in complex background environments and outperforms some state-of-the-art methods.
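The distance-map step above (per-pixel Euclidean distance between background and current-frame CNN features) and the fixed-rate background update can be sketched as follows, assuming features have already been extracted into H×W×C arrays; the graph cut stage is omitted, and the function names are illustrative:

```python
import numpy as np

def feature_distance_map(bg_feats, frame_feats):
    """Per-pixel Euclidean distance between background and current-frame
    feature vectors; large distances suggest foreground pixels."""
    diff = np.asarray(frame_feats, dtype=float) - np.asarray(bg_feats, dtype=float)
    return np.linalg.norm(diff, axis=-1)

def update_background(bg_feats, frame_feats, rate=0.05):
    """Running-average background model update at a fixed learning rate."""
    return (1.0 - rate) * np.asarray(bg_feats, float) \
         + rate * np.asarray(frame_feats, float)

# Toy 2x2 scene with 3-dim features; one pixel changes in the current frame.
bg = np.zeros((2, 2, 3))
frame = bg.copy()
frame[0, 0] = [3.0, 4.0, 0.0]
dmap = feature_distance_map(bg, frame)
print(dmap)   # nonzero only at the changed pixel
```

The distance map would then be the unary term handed to graph cut (or, more simply, thresholded) to produce the foreground mask.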
Adaptive golden eagle optimization based multi-objective scientific workflow scheduling on multi-cloud environment
Multimedia Tools and Applications - Pages 1-24 - 2023
S. Immaculate Shyla, T. Beula Bell, C. Jaspin Jeba Sheela
Cloud computing is an exemplar of emerging technologies and of the capacity to provide reliable cloud services. Giving consumers on-demand access to "unlimited" computing resources is one of the key features of cloud computing. The resources of a single cloud, however, are typically constrained and might not be able to handle unexpected spikes in user demand. In order to support resource sharing among clouds, the multi-cloud concept has thus been established, and offering resources and services across numerous clouds is now well supported. Conventional research on cloud scheduling aims to reduce cost or increase speed; however, the reliability of workflow scheduling is a major indicator of QoS and a vital problem. As a result, multi-objective scheduling for a scientific workflow in a multi-cloud environment is proposed in this research, with the goal of controlling the workflow while balancing cost and timeliness and satisfying the reliability criterion. The adaptive golden eagle optimisation (AGEO) algorithm is created to realise this idea. Solution encoding, fitness analysis, and updating functions are used in the proposed algorithm's validation. Different workflow models are employed for the experimental study, and performance is assessed using various indicators. The proposed approach attained a utilization of 1920, while PSO and GA achieved 1901 and 1900, respectively.
Total: 12,898