AUTOMATIC HEART DISEASE PREDICTION USING FEATURE SELECTION AND DATA MINING TECHNIQUE
Vol. 34, No. 1 - 2018
TOAN DINH TRAN, HUNG MINH LE, LANG VAN TRAN
This paper presents an automatic heart disease (HD) prediction method based on feature selection and data mining techniques, using the symptoms and clinical information provided in a patient dataset. Data mining, which allows hidden knowledge to be extracted from data and explores the relationships between attributes, is a promising technique for HD prediction. HD symptoms can be effectively learned by a computer to classify HD into different classes. However, the information provided may include redundant and interrelated symptoms, and using such information may degrade classification performance. Feature selection is an effective way to remove such noisy information while improving learning accuracy and making the learned model easier to understand. In our method, HD attributes are re-selected based on the ranks and weights assigned by the Infinite Latent Feature Selection (ILFS) method. A Support Vector Machine (SVM) algorithm is applied to classify a subset of the selected attributes into different HD classes. The Synthetic Minority Over-sampling Technique (SMOTE) is adopted to generate a larger and more varied set of data. The experiment is performed on the public Heart Disease dataset from the UCI Machine Learning Repository. Experimental results demonstrate that, using only a subset of 24 selected attributes out of a total of 46, our method achieves an accuracy of 97.87% in distinguishing ‘no presence’ of HD from ‘presence’ of HD and an accuracy of 93.92% in distinguishing five different classes of HD.
#Data mining #Heart Disease Prediction #Feature Selection #Classification
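The SMOTE over-sampling step named in the abstract above can be illustrated with a minimal pure-Python sketch. It assumes numeric feature vectors and Euclidean distance. The function name `smote_oversample`, the parameter `k`, and the toy data are hypothetical and not taken from the paper, and the ILFS ranking and SVM stages of the authors' pipeline are not reproduced here.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + u * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two hypothetical features (not real patient data).
minority_samples = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_samples = smote_oversample(minority_samples, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new data stay inside the region already covered by the minority class rather than being drawn from an unrelated distribution.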
THEORETICAL ANALYSIS OF PICTURE FUZZY CLUSTERING: CONVERGENCE AND PROPERTY
Vol. 34, No. 1 - 2018
Pham Thi Minh Phuong
Recently, picture fuzzy clustering (FC-PFS) has been introduced as a new computational intelligence tool for various problems in knowledge discovery and pattern recognition. However, an important question missing from the related research is the examination of the mathematical properties behind the picture fuzzy clustering algorithm, such as its convergence, boundary, and convergence rate. In this paper, we prove that FC-PFS converges to at least one local minimum. The similarities and differences between this algorithm and other clustering methods are compared. An analysis of the loss function is also considered.
#Convergence analysis #picture fuzzy sets #picture fuzzy clustering
Rough picture fuzzy set and picture fuzzy topologies
Vol. 31, No. 3 - 2015
Approximation of a picture fuzzy set on a crisp approximation space gives a rough picture fuzzy set. In this paper, the concept of a rough picture fuzzy set is introduced, and some topological structures of rough picture fuzzy sets are investigated, such as the lower and upper rough picture fuzzy approximation operators.
#Rough set #picture fuzzy set #rough picture fuzzy set #approximation operators #picture fuzzy topological space
NEW DISSIMILARITY MEASURES ON PICTURE FUZZY SETS AND APPLICATIONS
Vol. 34, No. 3 - 2018
Chau Minh Ngoc, Dinh Van Nguyen, Thao Xuan Nguyen, Nhung Thi Le
Dissimilarity measures between fuzzy sets, intuitionistic fuzzy sets, and picture fuzzy sets have been studied and applied in various areas. In this paper, we propose some new dissimilarity measures on picture fuzzy sets. These new dissimilarity measures overcome the restrictions of all existing dissimilarity measures on picture fuzzy sets. We then apply these new measures to pattern recognition problems. Finally, we introduce a multi-criteria decision making (MCDM) method that uses the new dissimilarity measures and apply it to supplier selection problems.
#Picture fuzzy set #dissimilarity measure #MCDM
Fuzziness measure, quantitative semantic mapping, and application of the interpolative approximate reasoning method in a medical expert system
Vol. 18, No. 3 - pp. 237-252 - 2012
Nguyễn Cát Hồ, Trần Đình Khang, Lê Xuân Việt, Trần Thái Sơn
VLSP Shared Task: Named Entity Recognition
Vol. 34, No. 4 - 2019
Vu M Tran, Huyen T M Nguyen, Luong X Vu, Quyen T Ngo, Hien T T Nguyen
Named entities (NE) are phrases that contain the names of persons, organizations, locations, times and quantities, monetary values, percentages, etc. Named Entity Recognition (NER) is the task of recognizing named entities in documents. NER is an important subtask of Information Extraction and has attracted researchers all over the world since the 1990s. For the Vietnamese language, although some research projects and publications on the NER task existed before 2016, no systematic comparison of the performance of NER systems had been done. In 2016, the organizing committee of the VLSP workshop decided to launch the first NER shared task in order to obtain an objective evaluation of Vietnamese NER systems and to promote the development of high-quality systems. As a result, the first dataset with morpho-syntactic and NE annotations was released for benchmarking NER systems. At VLSP 2018, the NER shared task was organized for the second time, providing a bigger dataset containing texts from various domains, but without morpho-syntactic annotation. These resources are available for research purposes via the VLSP website vlsp.org.vn/resources. In this paper, we describe the datasets as well as the evaluation results obtained from these two campaigns.
#CoNLL format #evaluation #named entity #named entity recognition #shared task #Vietnamese #VLSP workshop
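As an illustration of how NER output in a CoNLL-style tagging scheme is commonly scored, the sketch below computes entity-level precision, recall, and F1 from BIO tag sequences. It is a minimal sketch under stated assumptions: a single flat tag sequence, BIO labels such as `B-PER` and `I-PER`, and exact-match scoring. The function names and the toy tags are hypothetical, and the official VLSP scorer may differ in its details.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        # A B- tag always opens a span; a stray I- tag is treated leniently as one.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_tags, pred_tags):
    """Exact-match entity-level precision, recall, and F1."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy example: the person span matches, the second entity has the wrong type.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
precision, recall, f1 = entity_f1(gold, pred)
```

Scoring whole spans rather than individual tokens means a prediction only counts when both the boundaries and the entity type are correct.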
A TRANSFORMATION METHOD FOR ASPECT-BASED SENTIMENT ANALYSIS
Vol. 34, No. 4 - 2019
Thin Van Dang
Along with the explosion of user reviews on the Internet, sentiment analysis has become one of the trending research topics in the field of natural language processing. In the last five years, many shared tasks were organized to keep track of the progress of sentiment analysis for various languages. In the Fifth International Workshop on Vietnamese Language and Speech Processing (VLSP 2018), the Sentiment Analysis shared task was the first evaluation campaign for the Vietnamese language. In this paper, we describe our system for this shared task. We employ a supervised learning method based on Support Vector Machine classifiers combined with a variety of features. We obtained an F1-score of 61% for both domains, which ranked highest in the shared task. For the aspect detection subtask, our method achieved F1-scores of 77% and 69% for the restaurant domain and the hotel domain, respectively.
#sentiment analysis #aspect-based sentiment analysis #natural language processing #text analysis
Some operations on type-2 intuitionistic fuzzy sets
Vol. 28, No. 3 - pp. 274-283 - 2012
Bùi Dương Hải, Bùi Công Cường, Tống Hòang Anh
In recent decades, several extensions of the fuzzy set concept have been proposed. Type-2 fuzzy sets and intuitionistic fuzzy sets are two new concepts that have attracted considerable attention from researchers because of their wide range of applications. This paper introduces a new concept, the type-2 intuitionistic fuzzy set, and proves some properties of the operations on it.
NONSTANDARD FINITE DIFFERENCE SCHEMES FOR SOLVING A MODIFIED EPIDEMIOLOGICAL MODEL FOR COMPUTER VIRUSES
Vol. 34, No. 2 - 2018
Long Quang Dang, A Quang Dang, Tuan Manh Hoang
In this paper, we construct two families of nonstandard finite difference (NSFD) schemes that preserve the essential properties of a computer virus propagation model, such as positivity, boundedness, and stability. The first family of NSFD schemes is based on nonlocal discretization and is first-order accurate, while the second combines a classical Runge-Kutta method with the selection of a nonstandard denominator function and is fourth-order accurate. The theoretical study of these families of NSFD schemes is supported by numerical simulations. The numerical simulations confirm the accuracy and efficiency of the fourth-order NSFD schemes. They suggest that the disease-free equilibrium point is not only locally stable but also globally stable, and this fact is then proved theoretically. The experimental results also show that the global stability of the continuous model is preserved.
#Computer viruses #High order NSFD schemes #Lyapunov stability theorem #NSFD schemes #Numerical simulations
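The idea of a nonlocal discretization that preserves positivity and boundedness can be seen on a much simpler equation than the one studied in the paper. The sketch below uses the logistic equation y' = y(1 - y) purely as a stand-in (an assumption, not the authors' computer virus model) and replaces the term y^2 with y_n * y_{n+1}. The function names and the step size are hypothetical.

```python
def euler_step(y, h):
    """Standard explicit Euler step for y' = y * (1 - y)."""
    return y + h * y * (1.0 - y)


def nsfd_step(y, h):
    """NSFD step using the nonlocal discretization y^2 -> y_n * y_{n+1}.

    Solving (y_next - y) / h = y - y * y_next for y_next gives the update below.
    """
    return y * (1.0 + h) / (1.0 + h * y)


def iterate(step, y0, h, n_steps):
    """Apply a one-step scheme n_steps times and return the whole trajectory."""
    ys = [y0]
    for _ in range(n_steps):
        ys.append(step(ys[-1], h))
    return ys


# A deliberately large step size to expose the difference between the schemes.
h = 3.0
nsfd_path = iterate(nsfd_step, 0.5, h, 10)
euler_path = iterate(euler_step, 0.5, h, 10)
```

With this step size the Euler iterates overshoot the equilibrium y = 1 and leave the interval [0, 1], whereas the NSFD iterates stay inside (0, 1) and increase monotonically toward 1. For any h > 0 and 0 < y_n < 1, the NSFD update satisfies 0 < y_{n+1} < 1, which is the kind of property preservation the abstract refers to.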
An Evaluation Method for Unsupervised Anomaly Detection Algorithms
Vol. 32, No. 3 - 2017
Uy Quang Nguyen, Thanh Trung Nguyen, Van Huy Nguyen
In data mining, anomaly detection aims to identify observations that do not conform to an expected behavior. To date, a large number of techniques for anomaly detection have been proposed and developed. These techniques have been successfully applied to many real-world applications, such as fraud detection for credit cards and intrusion detection in network security. However, there is very little research on methods for evaluating the quality of unsupervised anomaly detection techniques. In this paper, the authors introduce a method for evaluating the performance of unsupervised anomaly detection techniques. The method is based on applying the internal validation metrics of clustering algorithms to anomaly detection. The experiments were conducted on a number of benchmark datasets. The results are compared with those of a recently proposed approach and show that some of the proposed metrics are very consistent when used to evaluate the performance of unsupervised anomaly detection algorithms.
#anomaly detection #evaluation #clustering validation
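One way to picture the use of an internal clustering validation metric for anomaly detection is to treat the detector's output as a two-group partition (normal versus anomalous) and score it without ground-truth labels. The sketch below uses the mean silhouette width for this purpose. The choice of silhouette is an assumption made for illustration, since the abstract does not name the specific metrics used, and the function name and toy points are hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette width of a two-group (normal vs. anomaly) partition."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        other = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] != labels[i]]
        if not same or not other:
            scores.append(0.0)  # usual convention for singleton groups
            continue
        a = sum(same) / len(same)    # mean distance within the point's own group
        b = sum(other) / len(other)  # mean distance to the other group
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Four clustered points and one distant point (toy data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
flags_far_point = [0, 0, 0, 0, 1]    # detector A flags the distant point
flags_inner_point = [0, 1, 0, 0, 0]  # detector B flags a point inside the cluster

score_a = mean_silhouette(points, flags_far_point)
score_b = mean_silhouette(points, flags_inner_point)
```

In this toy case the labelling that isolates the distant point receives the higher score, so the metric ranks the two detectors without access to true anomaly labels.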