An Analysis of Gaussian Kernel Density Estimation for Feature Selection of Gene Expression


Paper ID: EIJTEM_2015_2_2_1-7

Author's Name: M. Praneesh

Volume: 2

Issue: 2

Year: 2015

Page No: 1-7

Abstract:

Data mining is often defined as finding hidden information in a database. Clustering is a data mining (machine learning) technique used to determine group membership for data instances. Gene expression data contain noisy, irrelevant, and missing values, so they must be preprocessed with a dimensionality reduction technique before clustering. In this research, a Gaussian Kernel Density Estimation method is presented to select the best features for cancer recognition, and the Sparse Nonnegative Matrix Factorization (Sparse NMF) clustering method is then applied to group the tumor samples into meaningful clusters. Sample-based clustering, one type of gene expression data clustering, is performed in this work because the samples within a gene expression matrix are generally related to various diseases or drug effects. The performance of the proposed system is evaluated and compared with existing approaches. The experimental results show that feature selection using Gaussian Kernel Density Estimation works better than feature selection using the ICA method. Applying the Sparse NMF algorithm for sample clustering gives higher accuracy and uses less computing time than the existing work. For clustering validation, the Rand Index is used to measure the agreement between the class labels and the clustered samples.
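
As an illustration of the validation measure named in the abstract, the Rand Index can be computed as the fraction of sample pairs on which the class labels and the cluster assignments agree, meaning the pair is grouped together in both or separated in both. The sketch below is a minimal pure-Python version of that standard definition; the function name `rand_index` and the toy label lists are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which two labelings agree.

    A pair agrees when both labelings put the two samples in the same
    group, or both put them in different groups.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")

    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        # Fewer than two samples: no pairs to disagree on.
        return 1.0

    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical toy data: four tumor samples, two known classes.
class_labels = [0, 0, 1, 1]
cluster_ids = [1, 1, 0, 0]  # same grouping, different cluster names
print(rand_index(class_labels, cluster_ids))
```

Because the measure looks only at pairs, it does not depend on how the clusters are numbered, so a clustering that recovers the classes under different cluster names still scores 1.0.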

Keywords: Unsupervised Feature Selection, Tumor Clustering, Sparse NMF, Gaussian Kernel Density Estimation
