The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors

Medical & Biological Engineering & Computing - Volume 53 - Pages 137-149 - 2014
Raúl Cruz-Barbosa (1,2), Alfredo Vellido (3,4), Jesús Giraldo (2)
(1) Computer Science Institute, Universidad Tecnológica de la Mixteca, Huajuapan, México
(2) Institut de Neurociències and Unitat de Bioestadística, Universitat Autònoma de Barcelona, Bellaterra, Spain
(3) Departament de Ciències de la Computació, Universitat Politècnica de Catalunya, BarcelonaTech, Barcelona, Spain
(4) Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Barcelona, Spain

Abstract

G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance for pharmacology. The tertiary structure of the transmembrane domain, a gateway to the study of protein functionality, is unknown for almost all members of class C GPCRs, which are the target of the current study. As a result, their investigation must often rely on alignments of their amino acid sequences. Sequence alignment, however, entails the risk of missing relevant information. Various approaches have attempted to circumvent this risk through alignment-free transformations of the sequences on the basis of different amino acid physicochemical properties. In this paper, we use several of these alignment-free methods, as well as a basic amino acid composition representation, to transform the available sequences. Novel semi-supervised statistical machine learning methods are then used to discriminate the different class C GPCR types from the transformed data. This approach is relevant due to the existence of orphan proteins, to which type labels should be assigned in a process of deorphanization or reverse pharmacology. The reported experiments show that the proposed techniques provide accurate classification even in settings of extreme class-label scarcity, and that fair accuracy can be achieved even with very simple transformation strategies that ignore the sequence ordering.
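
As an illustration of the simplest kind of transformation mentioned in the abstract, the following minimal Python sketch computes an amino acid composition vector: the relative frequency of each of the 20 standard amino acids in a sequence, ignoring their order. The function name `amino_acid_composition`, the alphabet ordering, and the choice to skip non-standard symbols are assumptions made for this example; they are not taken from the paper, which may define or normalize the representation differently.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code. The alphabetical ordering
# used here is an assumption for illustration only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of relative amino acid frequencies.

    Symbols outside the 20 standard amino acids (for example 'X' or gap
    characters) are skipped, which is an assumption of this sketch.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Fixed-length output regardless of sequence length, so sequences of
    # different lengths can be compared without alignment.
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy sequence, not a real receptor.
    vector = amino_acid_composition("AACD")
    print(dict(zip(AMINO_ACIDS, vector)))
```

Because only counts are used, two sequences containing the same residues in a different order map to the same vector, which is what "ignoring the sequence ordering" means in practice. The resulting fixed-length vectors could then be passed to any classifier that accepts numeric features.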

References