Protein secondary structure prediction using support vector machine and hierarchical clustering

Atasever, Sema; Aydın, Zafer; Erbay, Hasan

URI: http://hdl.handle.net/20.500.11787/4527

Date: 2018-04-30

Abstract:

Predicting secondary structure from a protein sequence plays a crucial role in predicting the 3D structure and understanding the function of proteins. As new genes and proteins are discovered, the protein databases and the datasets that can be used for training prediction models grow considerably. A two-stage hybrid classifier that employs dynamic Bayesian networks and a support vector machine (SVM) has been shown to provide state-of-the-art prediction accuracy. However, SVMs scale poorly to large datasets because model training involves a quadratic optimization problem. In this paper, we implemented two techniques on the CB513 benchmark for reducing the number of samples in the training set of the SVM. The first method randomly selects a fraction of the data samples from the training set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the training set and reduce the model training time by 82.38% without significantly decreasing the prediction accuracy. The second method clusters the data samples with a hierarchical clustering algorithm and replaces the training set samples with the nearest neighbors of the cluster centers. We employed single linkage clustering, average linkage clustering, and Ward's method for clustering the feature vectors. We optimized the number of clusters and the maximum number of nearest neighbors by computing the prediction accuracy on validation sets. We observed that clustering can also reduce the size of the training set by 50% without sacrificing prediction accuracy. Among the clustering techniques, Ward's method provided the best accuracy on test data.
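
The two training-set reduction ideas described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The function names (`stratified_subsample`, `ward_clusters`, `reduce_by_clustering`), the naive O(n^3) merge loop, and the choice to take nearest neighbors only from among each cluster's own members are assumptions made for illustration. A real experiment on CB513-sized data would likely use an optimized clustering library instead.

```python
import random
from collections import defaultdict

def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a random subset that keeps class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    keep = []
    for idx in by_class.values():
        k = max(1, round(fraction * len(idx)))  # keep at least one sample per class
        keep.extend(rng.sample(idx, k))
    return sorted(keep)

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering with Ward's merge criterion.

    At each step, merge the pair of clusters whose union increases the
    within-cluster sum of squares the least:
        cost(A, B) = |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2
    """
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(X))]
    centers = [list(x) for x in X]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sq_dist(centers[a], centers[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        centers[a] = centroid([X[i] for i in clusters[a]])
        del clusters[b], centers[b]  # b > a, so index a stays valid
    return clusters, centers

def reduce_by_clustering(X, n_clusters, max_neighbors):
    """Keep, per cluster, up to max_neighbors members nearest to its center.

    Assumption: neighbors are drawn from the cluster's own members.
    """
    clusters, centers = ward_clusters(X, n_clusters)
    keep = []
    for members, c in zip(clusters, centers):
        nearest = sorted(members, key=lambda i: sq_dist(X[i], c))
        keep.extend(nearest[:max_neighbors])
    return sorted(keep)
```

In this sketch, both functions return indices into the original training set, so the reduced set can be passed to any SVM trainer. The values of `n_clusters` and `max_neighbors` play the role of the two hyperparameters that the abstract says were tuned on validation sets.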

This item appears in the following collection: Bilgisayar Mühendisliği Bölümü Koleksiyonu (Computer Engineering Department Collection)