<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD 2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="nlm-ta">J Proteomics Bioinform</journal-id>
			<journal-id journal-id-type="publisher-id">opg</journal-id>						
			<journal-title>Journal of Proteomics &amp; Bioinformatics</journal-title>			 
			<issn pub-type="epub">0974-276X</issn>
			<publisher>
				<publisher-name>OMICS Publishing Group</publisher-name>
				<publisher-loc>India, USA</publisher-loc>
			</publisher>
		</journal-meta>
		<article-meta>			
			<article-id pub-id-type="publisher-id">000063</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Research Article</subject>
				</subj-group>
				<subj-group subj-group-type="Discipline">
					<subject>Biochemistry</subject>
				</subj-group>
				<subj-group subj-group-type="System Taxonomy">
					<subject>Proteomics</subject>
					<subject>Bioinformatics</subject>
					<subject>Genomics</subject>
					<subject>Transcriptomics</subject>
					<subject>Biomarkers</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>An <italic>In Silico</italic> Approach to Cluster CAM Kinase Protein Sequences</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="author">
					<name>
						<surname>Murty</surname>
						<given-names>U. S. N</given-names>
					</name>										
					<xref ref-type="corresp" rid="cor1">&ast;</xref>					
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Kumar Banerjee</surname>
						<given-names>Amit</given-names>
					</name>										
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Arora</surname>
						<given-names>Neelima</given-names>
					</name>					
				</contrib>							
			</contrib-group>
			<aff>Bioinformatics Group, Biology Division, Indian Institute of Chemical Technology, Hyderabad-500007, A.P, India</aff>			
			<author-notes>
				<corresp id="cor1">&ast; To whom correspondence should be addressed: Dr. U.S.N Murty, Deputy Director/ Scientist "F" Head, Biology Division, Indian Institute of Chemical Technology, Hyderabad-500007, India, Phone: +91 40 27193134; Fax: +91 40 27193227; E-mail: <email>murty_usn@yahoo.com</email></corresp>
			</author-notes>
			<pub-date pub-type="collection">
			     <month>02</month>
				 <year>2009</year>
			</pub-date>
			<pub-date pub-type="epub">
				<day>20</day>
				<month>02</month>
				<year>2009</year>
			</pub-date>			
			<volume>2</volume>
			<issue>2</issue>
			<fpage>097</fpage>
			<lpage>107</lpage>
			<history>
			<date date-type="received">
			     <day>12</day>
				 <month>12</month>
				 <year>2008</year>
			</date>
			<date date-type="accepted">
			      <day>20</day>
				  <month>02</month>
				  <year>2009</year>
			</date>
			</history>
			<permissions>			 
			<copyright-statement><bold>Copyright:</bold> &copy; 2009 Murty USN, et al.</copyright-statement>
			<copyright-year>2009</copyright-year>
			<license license-type="open access">
			 <p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</p>
			 </license>
			 </permissions>			
			<abstract>
				<p>As biology becomes increasingly data driven, deriving information from the large volumes of available data has become a major challenge, and it calls for novel and effective methods for clustering and classifying such data. The CAM kinase family contains many enzymes involved in important physiological processes. In the present study, 13 physicochemical parameters were calculated <italic>in silico</italic> for 56 sequences of the CAM kinase family. Self Organizing Maps (SOM) were employed to cluster similar sequences and to visualize the high-dimensional data space, as they are known to preserve the topological relationships between features. The SOM yielded 4 clusters that were distinct from each other and marked by characteristic features.</p>				
			</abstract>
			 <kwd-group>
				<kwd>Kohonen map</kwd>
				<kwd>Self Organizing Maps (SOM)</kwd>
				<kwd>CAM Kinase</kwd>
				<kwd>Bioinformatics</kwd>
				<kwd><italic>in silico</italic></kwd>
				<kwd>clustering</kwd>				
			</kwd-group>
			<custom-meta-wrap>
				<custom-meta>
					<meta-name>citation</meta-name>
					<meta-value>Murty USN, Amit KB, Neelima A (2009) An In Silico Approach to Cluster CAM Kinase Protein Sequences.</meta-value>
				</custom-meta>
			</custom-meta-wrap>
		</article-meta>
	</front>
	<body>
	      <sec id="s1">
		   <title>Introduction</title>
		     <p>The urge to describe the complex phenomena of life, and to seek answers beyond the current understanding of life processes at the molecular level, continues to be a major inspiration in modern biology. The human mind is an advanced neural cognitive system, a fact exemplified by its ability to learn and make decisions; hence, the endeavor to automate learning and decision making by devising and employing machine learning techniques should not come as a surprise. In the present data-driven, information-starved world, the rush to produce enormous volumes of data would be of no avail without powerful and comprehensive machine learning methods for analysis. The exponential rise in the challenges posed to biologists has given new impetus to the development of efficient algorithms and methods for analyzing such data, and to the exploration of existing ones in biological contexts. The proliferation of low-cost technology, the astonishing growth in computing power and the interdisciplinary nature of the field have led to a revolution of sorts in recent times. The literature abounds with examples of the application of machine learning to biological systems (<xref ref-type="bibr" rid="r42">Tarca et al., 2007</xref>). Although both supervised and unsupervised learning methods are employed in bioinformatics analyses, unsupervised learning methods are attracting more interest because they require neither labeling nor predefined knowledge of classes and are valuable for understanding the basic nature of the data.</p>
			 <p>The Self Organizing Map (also known as the Kohonen Map) is an unsupervised learning algorithm (<xref ref-type="bibr" rid="r25">Kohonen et al., 2001</xref>) used for clustering and for reducing the dimensions of complex data without losing the 'essence' of the data. It organizes data by similarity, placing similar entities geometrically close to each other. SOMs have been applied in diverse fields such as assessment of water quality (<xref ref-type="bibr" rid="r46">Walley et al., 2000</xref>), classification of communities (<xref ref-type="bibr" rid="r10">Chon et al., 1996</xref>; <xref ref-type="bibr" rid="r3">Arab et al., 2004</xref>; <xref ref-type="bibr" rid="r44">Tison et al., 2005</xref>), gene expression studies (<xref ref-type="bibr" rid="r41">Tamayo et al., 1999</xref>), disease diagnosis (<xref ref-type="bibr" rid="r9">Chen et al., 2000</xref>; <xref ref-type="bibr" rid="r20">Hoshi et al., 2006</xref>), medical imaging (<xref ref-type="bibr" rid="r11">Chuang et al., 2007</xref>), biochemical profiling (<xref ref-type="bibr" rid="r23">Kaartinen et al., 1998</xref>) and epidemiology (<xref ref-type="bibr" rid="r32">Murty and Arora, 2007</xref>). Self organizing maps have earlier been used in the classification of protein families (<xref ref-type="bibr" rid="r2">Andrade et al., 1997</xref>), secondary structure determination (<xref ref-type="bibr" rid="r45">Unneberg et al., 2001</xref>) and pattern recognition in proteins (<xref ref-type="bibr" rid="r19">Hanke et al., 1996</xref>). Owing to its usefulness for multidimensional data visualization, SOM has become a method of choice in bioinformatics studies (<xref ref-type="bibr" rid="r21">Hsu et al., 2003</xref>). Previously, data mining techniques have been employed for clustering and classification of Internal Transcribed Spacer sequences in mosquito species (<xref ref-type="bibr" rid="r34">Banerjee et al., 2008</xref>, <xref ref-type="bibr" rid="r33">2009</xref>).</p>
			 <p>The interplay of the various inherent sequence and structural features of biological molecules is complex and intriguing. Slight variations in physicochemical properties are common even among members of the same protein family. Data mining techniques like SOM can be employed to aid knowledge discovery in such instances.</p>
			 <p>The Ca<sup>2+</sup>/calmodulin-dependent kinases (CaMK) belong to a family of structurally related serine/threonine-specific protein kinases that are activated in response to elevation of intracellular Ca<sup>2+</sup>, and include CaMKI, CaMKII, CaMKIV and the CaMK-kinases (CaMKKs). They regulate diverse biological events mediated by intracellular calcium, such as muscle contraction, neurotransmitter release and gene expression (<xref ref-type="bibr" rid="r14">Eto et al., 1999</xref>; <xref ref-type="bibr" rid="r36">Nairn et al., 1985</xref>; <xref ref-type="bibr" rid="r13">Edelman et al., 1987</xref>; <xref ref-type="bibr" rid="r40">Soderling et al., 1996</xref>; <xref ref-type="bibr" rid="r8">Braun et al., 1995</xref>). This study is an attempt to cluster CaM kinase sequences from different species on the basis of their physicochemical properties by applying Kohonen maps.</p>
		   </sec>
		   <sec sec-type="methods">
		    <title>Materials and Methods</title>
			 <sec>
			  <title>Sequence Collection and Pre-processing</title>
			   <p>CAM kinase protein sequences were retrieved from SWISS-PROT, a public domain protein database (<xref ref-type="bibr" rid="r5">Bairoch and Apweiler, 2000</xref>). The keyword 'Calcium/calmodulin-dependent protein kinase' was used for retrieval and yielded 68 sequences. Sequences annotated as putative, partial, precursor or fragment CAM kinase proteins were excluded from the study, leaving 56 unique proteins. The selected CAM kinase protein sequences were retrieved in FASTA format and used for further analysis.</p>
			 </sec>
			 <sec>
			  <title>Reconstruction of Phylogeny</title>
			   <p>All 56 sequences were considered for reconstruction of phylogeny. PHYLIP (<xref ref-type="bibr" rid="r16">Felsenstein, 1982</xref>) was used for this purpose. CLUSTALW (<xref ref-type="bibr" rid="r43">Thompson et al., 1994</xref>) was employed for the initial multiple sequence alignment. The alignment output was used as input for the Seqboot and Protpars programs, and finally the Consense program was used to obtain the best tree by the maximum parsimony method (<xref ref-type="bibr" rid="r15">Felsenstein, 1983</xref>), which was visualized with TREEVIEW (<xref ref-type="bibr" rid="r38">Page, 1996</xref>) (Fig. 7 in Supplement).</p>
			 </sec>	
			 </sec>		
			<sec id="s3">
			 <title>Features Identified as Parameters for SOM</title>
			  <sec>
			   <title>Physicochemical Characterization</title>
			    <p>Determining the physicochemical properties of proteins by traditional experimental methods is expensive, time consuming and cumbersome. ProtParam is a program for predicting various physical and chemical properties, which may be useful in enhancing our knowledge for experiment design. Physicochemical properties such as length, molecular weight, isoelectric point, number of negatively charged amino acids, number of positively charged amino acids, extinction coefficient (considering that all cysteine residues appear as half cystines), extinction coefficient &ast; (assuming that no cysteine appears as half cystine), instability index, aliphatic index and GRAVY were calculated using ProtParam (http://expasy.org/tools/protscale.html) (<xref ref-type="bibr" rid="r17">Gasteiger et al., 2005</xref>) for these sequences (Table 1 in Supplement). The amino acid composition of a protein sequence can reveal its nature; hence, amino acid composition was also computed (data not shown).</p>
			  </sec>
			  <sec>
			  	<title>Secondary Structure Prediction</title>
					<p>SOPMA (Self Optimized Prediction Method from Alignment) (<xref ref-type="bibr" rid="r18">Geourjon and Del&eacute;age, 1995</xref>) was employed for prediction of secondary structure features like alpha helix, extended strand, beta turn and random coils in terms of percentage for all the sequences (Table 2 in Supplement). These features (except amino acid composition) were considered as input parameters for self organizing maps for further analysis.</p>
			 </sec>
			 <sec>
			 	<title>Data Mining &ndash; Self Organizing Maps</title>
					<p>In a SOM, the neurons are organized in a lattice, typically a one- or two-dimensional array, which is placed in the input space and spans the input distribution. With a two-dimensional SOM network it is possible to obtain a map of the input space in which proximity between units or clusters on the map represents closeness of the input data. Each processing unit in the SOM lattice is associated with a weight vector of the same dimension as the input data. Using the weights of each processing unit as a set of coordinates, the lattice can be positioned in the input space. Throughout the learning stage, the weights of the units change position and &ldquo;move&rdquo; towards the input points. This movement becomes gradually slower, and the network is almost &ldquo;frozen&rdquo; in the input space at the end of the learning stage. On completion of the learning stage, each input can be associated with its nearest network unit and, for visualization, with the corresponding cell on the map. Cells that evidently contain analogous entities can be considered a cluster on the map. These clusters are generated during the learning phase without any prior information. The main applications of the SOM are the visualization of high-dimensional data in two dimensions and the construction of abstractions akin to other clustering techniques.</p>
			 </sec>
			 <sec>
			 	<title>Steps Involved in the Algorithm</title>
					<list id="l1" list-type="order">
						<list-item>
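							<p><bold>Initialization:</bold> Randomly initialize a weight vector <inline-formula><tex-math>W_i = [w_{i1}, w_{i2}, \ldots, w_{in}]</tex-math></inline-formula> for each neuron <italic>i</italic>, where <italic>n</italic> denotes the dimension of the input data.</p>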
						</list-item>
						<list-item>
							<p><bold>Sampling:</bold> Select an input vector <inline-formula><tex-math>X = [x_1, x_2, \ldots, x_n]</tex-math></inline-formula>.</p>
						</list-item>
						<list-item>
							<p><bold>Similarity matching:</bold> Find the winning neuron <italic>j</italic>, whose weight vector best matches the input vector: <inline-formula><tex-math>j(t) = \arg\min_i \lVert X - W_i \rVert</tex-math></inline-formula>.</p>
						</list-item>
						<list-item>
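							<p><bold>Updating:</bold> Update the weight vector of the winning neuron so that it moves still closer to the input vector. Also update the weight vectors of the neighbouring neurons; the further the neighbour, the smaller the change.</p>
							<p><disp-formula><tex-math>W_i(t+1) = W_i(t) + \alpha(t)\, h_{ij}(t)\, [X(t) - W_i(t)]</tex-math></disp-formula></p>
							<p><inline-formula><tex-math>\alpha(t)</tex-math></inline-formula>: learning rate that decreases with time <italic>t</italic>, <inline-formula><tex-math>0 &lt; \alpha(t) \le 1</tex-math></inline-formula></p>
							<p><disp-formula><tex-math>h_{ij}(t) = \exp\left(-\frac{\lVert r_j - r_i \rVert^2}{2\,\sigma(t)^2}\right)</tex-math></disp-formula></p>
							<p><inline-formula><tex-math>\lVert r_j - r_i \rVert^2</tex-math></inline-formula>: squared distance on the lattice between the winning neuron <italic>j</italic> and neuron <italic>i</italic></p>
							<p><inline-formula><tex-math>\sigma(t)</tex-math></inline-formula>: neighbourhood radius that decreases with time <italic>t</italic>.</p>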
						</list-item>
						<list-item>
							<p><bold>Continuation:</bold> Repeat steps 2&ndash;4 until the weight vectors no longer change or a fixed number of iterations has been reached. Then, for each input vector, find the best matching weight vector and allot the input vector to the corresponding neuron/cluster.</p>
						</list-item>
					</list>
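					<p>The steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch and not the software used in this study: the function names, the exponential decay schedule for the learning rate and neighbourhood radius, and the default values of <italic>alpha0</italic>, <italic>sigma0</italic> and the iteration count are assumptions made for the example.</p>
					<preformat><![CDATA[
```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_matching_unit(weights, x):
    """Step 3: lattice position of the neuron whose weights are closest to x."""
    return min(weights, key=lambda pos: squared_distance(x, weights[pos]))


def train_som(data, rows=2, cols=2, iterations=2000,
              alpha0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols SOM and return {lattice position: weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: random initial weight vector W_i for every neuron i.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        # Assumed schedule: alpha(t) and sigma(t) both decay exponentially.
        decay = math.exp(-t / iterations)
        alpha = alpha0 * decay
        sigma = sigma0 * decay
        x = rng.choice(data)                  # Step 2: sample an input X
        j = best_matching_unit(weights, x)    # Step 3: winning neuron j
        for i, w in weights.items():          # Step 4: update every neuron
            h = math.exp(-squared_distance(i, j) / (2 * sigma ** 2))
            for k in range(dim):
                w[k] += alpha * h * (x[k] - w[k])
    return weights


def assign_clusters(data, weights):
    """Step 5, final part: allot each input to its best matching neuron."""
    return [best_matching_unit(weights, x) for x in data]
```
]]></preformat>
					<p>In this sketch training stops after a fixed number of iterations, which is one of the two stopping rules named in step 5. Calling <italic>assign_clusters</italic> on the normalized parameter vectors would return one lattice position, such as (0, 1), per sequence.</p>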
			</sec>
			<sec>
				<title>Data Normalization</title>
					<p>The data were normalized linearly so that the values of each parameter ranged between 0 and 1. This gives all parameters equal importance during clustering and avoids bias towards parameters with large numeric ranges.</p>
					<p>Normalized value = (Original data value - Minimum data value) / (Maximum data value - Minimum data value)</p>
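					<p>A minimal Python sketch of this column-wise min-max normalization is given below. The function name is hypothetical, the sample rows are arbitrary numbers rather than values from this study, and mapping a constant column to 0 is an assumption made to avoid division by zero.</p>
					<preformat><![CDATA[
```python
def min_max_normalize(rows):
    """Scale every column of a table (list of equal-length rows) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column (max == min) is mapped to 0.0 by assumption.
        [(value - lo) / (hi - lo) if hi > lo else 0.0
         for value, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Arbitrary illustrative rows: two parameters measured for three items.
print(min_max_normalize([[0, 10], [5, 20], [10, 30]]))
# [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```
]]></preformat>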
			</sec>
		</sec>  
		<sec>
			<title>Results and Discussion</title>
				<p>The length of the considered sequences varied from 335 to 926 residues and the molecular weight ranged from 38163.7 to 105122.7. The sequences at the higher extreme of molecular weight were peripheral plasma membrane proteins belonging to <italic>Homo sapiens</italic>, <italic>Mus musculus</italic> and <italic>Rattus norvegicus</italic>. All the sequences possess more negatively charged than positively charged residues except Q10KY3, Q96NX5, Q91VB2, Q7TNJ7, Q9P7I2, P11730, Q07250, Q13554, Q13555 and Q6DGS3, while Q923T9 and Q2HJF7 contain equal numbers of negatively and positively charged residues.</p>
<p>The pH at which a protein carries no net charge and exists as a zwitterion is termed the isoelectric point (pI). The pI values of the considered CAM kinase protein sequences ranged from 4.83 to 9.11; 13 proteins (understandably those in which negatively charged residues do not outnumber positively charged ones, with the exception of Q00168) are basic and the rest are acidic. The instability index, which gives a clue about the stability of a protein <italic>in vitro</italic>, can be calculated using the following formula:</p>
<p><disp-formula><tex-math>II = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}\bigl(x(i)\,x(i+1)\bigr)</tex-math></disp-formula></p>
<p>where L denotes length of sequence, DIWV(x(i)x(i+1)) is the instability weight value for the dipeptide starting in position i.</p>
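<p>The formula above translates directly into code. The sketch below is illustrative only: the function names are hypothetical and the dipeptide weight table is a made-up two-letter toy, since the full 400-entry DIWV table is not reproduced in this article. The cut-off of 40 is the one applied to the sequences in this study.</p>
<preformat><![CDATA[
```python
def instability_index(sequence, diwv):
    """II = (10 / L) * sum of DIWV over the L - 1 consecutive dipeptides."""
    length = len(sequence)
    total = sum(diwv[sequence[i:i + 2]] for i in range(length - 1))
    return 10.0 / length * total


def is_unstable(index, threshold=40.0):
    """A value above the threshold classifies the protein as unstable."""
    return index > threshold


# Hypothetical weights for a two-letter alphabet, for illustration only.
TOY_DIWV = {"AA": 1.0, "AG": 2.0, "GA": 3.0, "GG": 4.0}

# Dipeptides of "AGGA" are AG, GG, GA, so II = (10 / 4) * (2 + 4 + 3).
print(instability_index("AGGA", TOY_DIWV))  # 22.5
```
]]></preformat>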
<p>The instability index is particularly useful for comparing the metabolic stabilities of proteins. Since a value &gt; 40 indicates an unstable protein, all the considered sequences were classified as unstable except Q14012 (37.09), Q9P7I2 (38), Q16566 (31.64) and O42844 (36.67). The aliphatic index (AI), which is defined as the relative volume of a protein occupied by aliphatic side chains, is regarded as a positive factor for the increase of thermal stability of globular proteins (Ikai, 1980). It can be calculated by the formula:</p>
<p>Aliphatic index = X (Ala) + a&ast;X (Val) + b&ast;X (Leu) + b&ast;X (Ile)</p>
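<p>where X (Ala), X (Val), X (Ile) and X (Leu) are the mole percentages (100 &times; mole fraction) of alanine, valine, isoleucine and leucine, and the coefficients a = 2.9 and b = 3.9 are the volumes of the valine and the leucine/isoleucine side chains relative to that of alanine.</p>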
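<p>As a worked illustration, the aliphatic index can be computed from a one-letter sequence as sketched below. The function name and the four-residue test sequence are hypothetical; the coefficients a = 2.9 and b = 3.9 and the use of mole percent are taken from the ProtParam documentation.</p>
<preformat><![CDATA[
```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    length = len(sequence)

    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / length

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


# Toy peptide with 25 mole percent each of Ala, Val, Leu and Ile:
# 25 + 2.9 * 25 + 3.9 * (25 + 25) = 292.5 (up to floating-point rounding).
print(aliphatic_index("AVLI"))
```
]]></preformat>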
<p>The aliphatic index ranged from 76.24 to 96.31. From the molar extinction coefficients of tyrosine, tryptophan and cystine (cysteine does not absorb appreciably at wavelengths &gt; 260 nm, while cystine does) at a given wavelength, the extinction coefficient of the native protein in water can be computed using the following equation:</p>
<p>E (Prot) = N (Tyr)&ast;Ext (Tyr) + N (Trp)&ast;Ext (Trp) + N (Cystine)&ast;Ext (Cystine)</p>
<p>where (for proteins in water measured at 280 nm): N = number of residues, Ext(Tyr) = 1490, Ext(Trp) = 5500, Ext(Cystine) = 125.</p>
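<p>The equation can be sketched in Python as follows. The function name and the flag are hypothetical; the sketch assumes that, when cysteines are treated as half cystines, every pair of cysteine residues forms one cystine and an unpaired residue contributes nothing.</p>
<preformat><![CDATA[
```python
# Molar extinction coefficients at 280 nm in water, as given in the text.
EXT_TYR, EXT_TRP, EXT_CYSTINE = 1490, 5500, 125


def extinction_coefficient(sequence, cysteines_paired=True):
    """E(Prot) from counts of Tyr (Y), Trp (W) and cystine (pairs of C)."""
    n_tyr = sequence.count("Y")
    n_trp = sequence.count("W")
    # Assumption: two cysteines make one cystine; none if all are reduced.
    n_cystine = sequence.count("C") // 2 if cysteines_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


print(extinction_coefficient("YWCC"))                          # 7115
print(extinction_coefficient("YWCC", cysteines_paired=False))  # 6990
```
]]></preformat>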
<p>The extinction coefficients of the considered sequences at 280 nm range from 30410 to 98180 M<sup>-1</sup> cm<sup>-1</sup>, assuming that all cysteine residues appear as half cystines. The high extinction coefficients of some sequences indicate a high content of Cys, Trp and Tyr. Extinction coefficients are useful in determining the protein concentration required for quantitative studies of protein-protein and protein-ligand interactions in solution.</p>
<p>The Grand Average of Hydropathy (GRAVY) value for a peptide or protein is calculated as the sum of the hydropathy values of all the amino acids divided by the number of residues in the sequence (<xref ref-type="bibr" rid="r27">Kyte and Doolittle, 1982</xref>). The low GRAVY values, which ranged from -0.571 to -0.214, indicate the possibility of better interaction with water.</p>
<p>The secondary structure indicates whether a given amino acid lies in a helix, strand or coil. Secondary structure features predicted using SOPMA are presented in Table 4. In the majority of the sequences, alpha helices were predominant, followed by random coils, extended strands and beta turns, whereas in some sequences (accession numbers: Q91YS8, Q63450, Q96NX5, Q91VB2, Q7TNJ7, Q13554, Q13555, Q923T9, O42844, Q8N5S9, Q8VBY2, P97756, Q96RR4, Q8C078, O88831) random coils outnumbered the other secondary structural features. For Calcium/calmodulin-dependent protein kinase type II beta chain (Protein ID: P28652) from <italic>Mus musculus</italic>, random coils were found to be equal to alpha helices.</p>
<p>The normalized data were clustered using SOM on a 2 &times; 2 grid (shown in <xref ref-type="fig" rid="g1">Figure 1</xref>). Unsupervised learning was performed on the fly on these data with a learning constant of 0.01 for 10,000 iterations, after which the data were clustered based on the neighborhood distance.</p>
			<fig id="g1">
				<label>Figure 1:</label>
				<caption>
				<title>2 &times; 2 grid showing SOM clusters.</title>
				</caption>
				<graphic xlink:href="JPB-02-097-g001.tif"/>
		  </fig>
		  <sec>
		  	<title>In short:</title>
				<p>Total number of sequences selected for study = 56</p>
				<p>Total number of input parameters = 13</p>
				<p>Total iterations per sequence to form a neuron = 100000</p>
				<p>Total iterations to form the 4-node (2 &times; 2) grid structure = 5600000</p>
				<p>Successful or winning neurons = 4</p>
				<p>Unsuccessful neuron = 0</p>
				<p>In short, all 4 neurons were successful and the data got assembled into 4 clusters. The pie chart below (<xref ref-type="fig" rid="g2">Fig.2</xref>) shows the distribution of sequences in the clusters.</p>
				<p><bold>Cluster (1, 1):</bold> This cluster contains 6 sequences, exclusively Calcium/calmodulin-dependent protein kinase kinase sequences from <italic>Homo sapiens</italic>, <italic>Mus musculus</italic> and <italic>Rattus norvegicus</italic>, and is therefore characterized by very similar trends that make it distinct from all other clusters. It is also marked by the lowest values of GRAVY and isoelectric point. At the same time, it shows distinctly high values of instability index and random coils and uniformly low extinction coefficients.</p>
		<fig id="g2">
				<label>Figure 2:</label>
				<caption>
				<title>Pie chart showing distribution of sequences in SOM clusters.</title>
				</caption>
				<graphic xlink:href="JPB-02-097-g002.tif"/>
		  </fig>
		  <fig id="g3">
				<label>Figure 3:</label>
				<caption>
				<title>Cluster (1, 1).</title>
				</caption>
				<graphic xlink:href="JPB-02-097-g003.tif"/>
		  </fig>
		  <fig id="g4">
				<label>Figure 4:</label>
				<caption>
				<title>Cluster (1, 2).</title>
				</caption>
				<graphic xlink:href="JPB-02-097-g004.tif"/>
		  </fig>
		  <fig id="g5">
				<label>Figure 5:</label>
				<caption>
				<title>Cluster (2, 1).</title>
				</caption>
				<graphic xlink:href="JPB-02-097-g005.tif"/>
		  </fig>
		  <p><bold>Cluster (1, 2):</bold> 5 sequences lie in this cluster. Q91YS8 and Q8IU85, although similar in length and types of amino acids, varied in isoelectric point, GRAVY, alpha helix and random coil content. Q96NX5, Q91VB2 and Q7TNJ7, which belong to Calcium/calmodulin-dependent protein kinase type 1G, clustered together and showed similar profiles, differing slightly in instability index, GRAVY and beta turn content. This cluster comprises the shortest sequences, in which random coils exceed alpha helices, and is marked by a uniformly high range of extended strands.</p>
<p><bold>Cluster (2, 1):</bold> This cluster is constituted by 26 sequences. Except for 4 sequences (Q24210, O14936, O70589 and Q62915), this cluster comprises sequences with low molecular weight and length. O14936, O70589 and Q62915, which are peripheral plasma membrane proteins, showed nearly identical profiles in the SOM cluster and were assembled in neighboring cells. Calcium/calmodulin-dependent serine/threonine-protein kinase sequences also showed a similar range of values and were placed in neighboring positions. Sequences belonging to Calcium/calmodulin-dependent protein kinase type II alpha chain also lay in proximity in the cluster with similar profiles, and differed markedly from the next sequence, a Calcium/calmodulin-dependent protein kinase type II delta chain sequence from <italic>Xenopus laevis</italic>.</p>
<p><bold>Cluster (2, 2):</bold> The 19 sequences assembled in this cluster are Calcium/calmodulin-dependent protein kinase type II sequences, except Q10KY3, which is described as Calcium/calmodulin-dependent serine/threonine-protein kinase 1. All these sequences are longer and of high molecular weight. In general, alpha helices outnumbered random coils in the considered sequences. Sequences belonging to Calcium/calmodulin-dependent protein kinase type II beta chain were positioned close together in this cluster and showed similar profiles for all the parameters, except for Q13554. The 3 sequences that belong to Calcium/calmodulin-dependent protein kinase type II gamma chain also clustered together with slight variation. The gradient in parameter values can be attributed to the fact that this cluster is an assemblage of various types of sequences, namely Calcium/calmodulin-dependent protein kinase type II &alpha;, &beta;, &delta; and &gamma; chains from various species. Even trivial differences at the sequence level and in the type of chain are clearly reflected in the case of sequences from <italic>Danio rerio</italic>.</p>
		<fig id="g6">
				<label>Figure 6:</label>
				<caption>
				<title>Cluster (2, 2).</title>
				</caption>
				<graphic xlink:href="JPB-02-097-g006.tif"/>
		  </fig>
			</sec>
		</sec>	
		<sec>
			<title>Future Perspective</title>
				<p>A considerable amount of raw sequence data pertaining to protein superfamilies and families exists in public domain databases. Conventional methods for defining a protein family rely on signatures, motifs and structural or functional domain information. The method presented in this report points in a different direction, towards further sub-classification of these large data sets. This approach may provide a cue for more sophisticated classification and clustering, enabling the categorization of new classes or subclasses, which may in turn aid in generating new criteria for tapping into this wealth of information.</p>
		</sec>
		<sec>
			<title>Conclusion</title>
				<p>Bioinformatics analyses have been employed by researchers to provide substantial information about biological macromolecules in a short time while eliminating, to a certain extent, the need for time-consuming and expensive experiments. With the exponential rise in the amount of data being generated, one cannot overlook the need to explore new methods for clustering and classification of such data. Recently, there have been attempts to employ data mining approaches in biologically relevant contexts (<xref ref-type="bibr" rid="r7">Banerjee et al., 2007</xref>, <xref ref-type="bibr" rid="r6">2008</xref>). Artificial Neural Networks (ANN) like Self Organizing Maps have an innate ability to learn and can recognize patterns in data without prior information (<xref ref-type="bibr" rid="r28">Lampinen and Oja, 1992</xref>). SOM is a highly effective data clustering tool for visualizing complex data by reducing dimensions. SOMs have been successfully exploited in bioinformatics in chromosome structural studies (<xref ref-type="bibr" rid="r26">Kyan et al., 2001</xref>), motif discovery (<xref ref-type="bibr" rid="r31">Mahony et al., 2006</xref>; <xref ref-type="bibr" rid="r4">Arrigo et al., 1991</xref>), identification of genome signatures (<xref ref-type="bibr" rid="r1">Abe et al., 2002</xref>), codon usage diversity (<xref ref-type="bibr" rid="r24">Kanaya et al., 2001</xref>; <xref ref-type="bibr" rid="r47">Wang et al., 2001</xref>), gene prediction (<xref ref-type="bibr" rid="r29">Mahony et al., 2004</xref>), identification of transcription binding sites (<xref ref-type="bibr" rid="r30">Mahony et al., 2005</xref>), sequence analysis (<xref ref-type="bibr" rid="r37">Oja et al., 2005</xref>), nucleic acid classification (<xref ref-type="bibr" rid="r35">Naenna et al., 2003</xref>) and gene expression analysis (<xref ref-type="bibr" rid="r39">Ressom et al., 2003</xref>; <xref ref-type="bibr" rid="r12">Covell et al., 2003</xref>).</p>
<p>In this study, physicochemical properties were calculated for 56 CAM kinase sequences using <italic>in silico</italic> tools. SOMs were employed to segregate the data according to variation in these properties and to group the sequences into separate clusters according to the trends observed. SOMs appear to be well suited to clustering and visualizing such sequence data for easy interpretation, owing to their simplicity.</p>
		</sec>     
			 
	</body>	    	  	 	
	<back>
	<ack>
			<p>The authors thank Dr. J.S.Yadav, Director, IICT for his continuous support and encouragement. AKB thanks CSIR for a Senior Research Fellowship. NA thanks DST for a Research Associate fellowship. The authors thank the anonymous reviewers for their valuable suggestions for improvement of the manuscript.</p>
	</ack>			
		<ref-list>
			<title>References</title>
				<ref id="r1">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Abe</surname>
								<given-names>T</given-names>
							</name>
							<name>
								<surname>Kanaya</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>Kinouchi</surname>
								<given-names>M</given-names>
							</name>
							<name>
								<surname>Ichiba</surname>
								<given-names>Y</given-names>
							</name>
							<name>
								<surname>Kozuki</surname>
								<given-names>T</given-names>
							</name><etal/>														
							</person-group>
							<year>2002</year>
							<article-title>A novel bioinformatics strategy for unveiling hidden genome signatures of eukaryotes: Self organizing map of oligonucleotide frequency</article-title>
							<source>Genome Informatics</source>
							<volume>13</volume>
							<fpage>12</fpage>
							<lpage>20</lpage>
				</citation>
				</ref>
				<ref id="r2">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Andrade</surname>
								<given-names>MA</given-names>
							</name>
							<name>
								<surname>Casari</surname>
								<given-names>G</given-names>
							</name>
							<name>
								<surname>Sander</surname>
								<given-names>C</given-names>
							</name>
							<name>
								<surname>Valencia</surname>
								<given-names>A</given-names>
							</name>							
							</person-group>
							<year>1997</year>
							<article-title>Classification of protein families and detection of the determinant residues with an improved self organizing map</article-title>
							<source>Biological Cybernetics</source>
							<volume>76</volume>
							<fpage>441</fpage>
							<lpage>450</lpage>
			 	</citation>
				</ref>
				<ref id="r3">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Arab</surname>
								<given-names>A</given-names>
							</name>
							<name>
								<surname>Lek</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>Lounaci</surname>
								<given-names>A</given-names>
							</name>
							<name>
								<surname>Park</surname>
								<given-names>YS</given-names>
							</name>							
							</person-group>
							<year>2004</year>
							<article-title>Spatial and temporal patterns of benthic invertebrate communities in an intermittent river (North Africa)</article-title>
							<source>Ann De Limnol</source> 
							<volume>40</volume>
							<fpage>317</fpage>
							<lpage>327</lpage>
			 	</citation>
				</ref>
				<ref id="r4">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Arrigo</surname>
								<given-names>P</given-names>
							</name>
							<name>
								<surname>Giuliano</surname>
								<given-names>F</given-names>
							</name>
							<name>
								<surname>Scalia</surname>
								<given-names>F</given-names>
							</name>
							<name>
								<surname>Rapallo</surname>
								<given-names>A</given-names>
							</name>
							<name>
								<surname>Damiani</surname>
								<given-names>G</given-names>
							</name>
							</person-group>
							<year>1991</year>
							<article-title>Identification of a new motif on nucleic acid sequence data using Kohonen's self-organizing map</article-title>
							<source>Computer Applications in the Biosciences</source>
							<volume>7</volume> 							
							<fpage>353</fpage>
							<lpage>357</lpage>
			 	</citation>
				</ref>
				<ref id="r5">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Bairoch</surname>
								<given-names>A</given-names>
							</name>
							<name>
								<surname>Apweiler</surname>
								<given-names>R</given-names>
							</name>														
							</person-group>
							<year>2000</year>
							<article-title>The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000</article-title>
							<source>Nucleic Acids Research</source> 
							<volume>28</volume>
							<fpage>45</fpage>
							<lpage>48</lpage>
			 	</citation>
				</ref>
				<ref id="r6">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Banerjee</surname>
								<given-names>AK</given-names>
							</name>
							<name>
								<surname>Arora</surname>
								<given-names>N</given-names>
							</name>
							<name>
								<surname>Varakantham</surname>
								<given-names>P</given-names>
							</name>
							<name>
								<surname>Murty</surname>
								<given-names>USN</given-names>
							</name>							
							</person-group>
							<year>2008</year>
							<article-title>Exploring the Interplay of Sequence and Structural Features in Determining the Flexibility of AGC Kinase Protein Family: A Bioinformatics Approach</article-title>
							<source>Journal of Proteomics &amp; Bioinformatics</source>
							<volume>1</volume>
							<fpage>77</fpage>
							<lpage>89</lpage>
			 	</citation>
				</ref>
				<ref id="r7">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Banerjee</surname>
								<given-names>AK</given-names>
							</name>
							<name>
								<surname>Arora</surname>
								<given-names>N</given-names>
							</name>
							<name>
								<surname>Murty</surname>
								<given-names>USN</given-names>
							</name>														
							</person-group>
							<year>2007</year>
							<article-title>Stability of ITS2 secondary structure in <italic>Anopheles</italic>: What Lies Beneath?</article-title>
							<source>International Journal of Integrative Biology</source> 
							<volume>1</volume>
							<fpage>232</fpage>
							<lpage>238</lpage>
			 	</citation>
				</ref>
				<ref id="r8">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Braun</surname>
								<given-names>AP</given-names>
							</name>
							<name>
								<surname>Schulman</surname>
								<given-names>H</given-names>
							</name>														
							</person-group>
							<year>1995</year>
							<article-title>The multifunctional calcium/calmodulin-dependent protein kinase: From form to function</article-title>
							<source>Annu Rev Physiol</source> 
							<volume>57</volume>
							<fpage>417</fpage>
							<lpage>445</lpage>
			 	</citation>
				</ref>
				<ref id="r9">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Chen</surname>
								<given-names>D</given-names>
							</name>
							<name>
								<surname>Chang</surname>
								<given-names>RF</given-names>
							</name>
							<name>
								<surname>Huang</surname>
								<given-names>YL</given-names>
							</name>							
							</person-group>
							<year>2000</year>
 							<article-title>Breast Cancer Diagnosis using Self-Organizing Map for Sonography</article-title>
							<source>Ultrasound in Med &amp; Biol</source> 
							<volume>26</volume>
							<fpage>405</fpage>
							<lpage>411</lpage>
			 	</citation>
				</ref>
				<ref id="r10">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Chon</surname>
								<given-names>TS</given-names>
							</name>
							<name>
								<surname>Park</surname>
								<given-names>YS</given-names>
							</name>
							<name>
								<surname>Moon</surname>
								<given-names>KH</given-names>
							</name>
							<name>
								<surname>Cha</surname>
								<given-names>EY</given-names>
							</name>							
							</person-group>
							<year>1996</year>
 							<article-title>Patternizing communities by using an artificial neural network</article-title>
							<source>Ecol Model</source> 
							<volume>90</volume>
							<fpage>69</fpage>
							<lpage>78</lpage>
			 	</citation>
				</ref>
				<ref id="r11">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Chuang</surname>
								<given-names>CH</given-names>
							</name>
							<name>
								<surname>Cheng</surname>
								<given-names>PE</given-names>
							</name>
							<name>
								<surname>Liou</surname>
								<given-names>M</given-names>
							</name>
							<name>
								<surname>Liou</surname>
								<given-names>CE</given-names>
							</name>
							<name>
								<surname>Kuo</surname>
								<given-names>YT</given-names>
							</name>							
							</person-group>
							<year>2007</year>
							<article-title>Application of Self-Organizing Map (SOM) for Cerebral Cortex Reconstruction</article-title>
							<source>International Journal of Computational Intelligence Research</source> 
							<volume>3</volume>
							<fpage>26</fpage>
							<lpage>30</lpage>
			 	</citation>
				</ref>
				<ref id="r12">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Covell</surname>
								<given-names>DG</given-names>
							</name>
							<name>
								<surname>Wallqvist</surname>
								<given-names>A</given-names>
							</name>
							<name>
								<surname>Rabow</surname>
								<given-names>AA</given-names>
							</name>
							<name>
								<surname>Thanki</surname>
								<given-names>N</given-names>
							</name>														
							</person-group>
							<year>2003</year>
 							<article-title>Molecular Classification of Cancer: Unsupervised Self-Organizing Map Analysis of Gene Expression Microarray Data</article-title>
							<source>Molecular Cancer Therapeutics</source> 
							<volume>2</volume>
							<fpage>317</fpage>
							<lpage>332</lpage>
			 	</citation>
				</ref>
				<ref id="r13">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Edelman</surname>
								<given-names>AM</given-names>
							</name>
							<name>
								<surname>Blumenthal</surname>
								<given-names>DK</given-names>
							</name>
							<name>
								<surname>Krebs</surname>
								<given-names>EG</given-names>
							</name>																					
							</person-group>
							<year>1987</year>
 							<article-title>Protein serine/threonine kinases</article-title>
							<source>Annu Rev Biochem</source> 
							<volume>56</volume>
							<fpage>567</fpage>
							<lpage>613</lpage>
			 	</citation>
				</ref>
				<ref id="r14">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Eto</surname>
								<given-names>K</given-names>
							</name>
							<name>
								<surname>Takahashi</surname>
								<given-names>N</given-names>
							</name>
							<name>
								<surname>Kimura</surname>
								<given-names>Y</given-names>
							</name>
							<name>
								<surname>Masuho</surname>
								<given-names>Y</given-names>
							</name>
							<name>
								<surname>Arai</surname>
								<given-names>K</given-names>
							</name><etal/>																					
							</person-group>
							<year>1999</year>
							<article-title>Ca<sup>2+</sup>/Calmodulin-dependent Protein Kinase Cascade in <italic>Caenorhabditis elegans</italic>: Implication in Transcriptional Activation</article-title>
							<source>J Biol Chem</source> 
							<volume>274</volume>
							<issue>32</issue>
							<fpage>22556</fpage>
							<lpage>22562</lpage>
			 	</citation>
				</ref>
				<ref id="r15">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Felsenstein</surname>
								<given-names>J</given-names>
							</name>																												
							</person-group>
							<year>1983</year>
 							<article-title>Parsimony in systematics: biological and statistical issues</article-title>
							<source>Annual Review of Ecology and Systematics</source> 
							<volume>14</volume>
							<fpage>313</fpage>
							<lpage>333</lpage>
			 	</citation>
				</ref>
				<ref id="r16">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Felsenstein</surname>
								<given-names>J</given-names>
							</name>																												
							</person-group>
							<year>1982</year>
 							<article-title>Numerical methods for inferring evolutionary trees</article-title>
							<source>Quarterly Review of Biology</source> 
							<volume>57</volume>
							<fpage>379</fpage>
							<lpage>404</lpage>
			 	</citation>
				</ref>
				<ref id="r17">
				<citation citation-type="book">
							<person-group>
							<name>
								<surname>Gasteiger</surname>
								<given-names>E</given-names>
							</name>
							<name>
								<surname>Hoogland</surname>
								<given-names>C</given-names>
							</name>
							<name>
								<surname>Gattiker</surname>
								<given-names>A</given-names>
							</name>
							<name>
								<surname>Duvaud</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>Wilkins</surname>
								<given-names>MR</given-names>
							</name><etal/>																					
							</person-group>
							<year>2005</year>
 							<article-title>Protein identification and analysis tools on the ExPASy server</article-title>
							<source>In: Walker JM (ed) The Proteomics Protocols Handbook</source>
							<publisher-name>Humana</publisher-name>
							<publisher-loc>New York</publisher-loc>
							<fpage>571</fpage>
							<lpage>607</lpage>
			 	</citation>
				</ref>
				<ref id="r18">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Geourjon</surname>
								<given-names>C</given-names>
							</name>
							<name>
								<surname>Del&eacute;age</surname>
								<given-names>G</given-names>
							</name>																												
							</person-group>
							<year>1995</year>
 							<article-title>SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments</article-title>
							<source>Comput Appl Biosci</source>
							<volume>11</volume>							
							<fpage>681</fpage>
							<lpage>684</lpage>
			 	</citation>
				</ref>
				<ref id="r19">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Hanke</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Beckmann</surname>
								<given-names>G</given-names>
							</name>
							<name>
								<surname>Bork</surname>
								<given-names>P</given-names>
							</name>
							<name>
								<surname>Reich</surname>
								<given-names>JG</given-names>
							</name>																												
							</person-group>
							<year>1996</year>
 							<article-title>Self-Organizing hierarchic network for pattern recognition in protein sequence</article-title>
							<source>Protein Science</source>
							<volume>5</volume>
							<fpage>72</fpage>
							<lpage>82</lpage>
			 	</citation>
				</ref>
				<ref id="r20">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Hoshi</surname>
								<given-names>K</given-names>
							</name>
							<name>
								<surname>Kawakami</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Sato</surname>
								<given-names>W</given-names>
							</name>
							<name>
								<surname>Sato</surname>
								<given-names>K</given-names>
							</name>
							<name>
								<surname>Sugawara</surname>
								<given-names>A</given-names>
							</name><etal/>																												
							</person-group>
							<year>2006</year>
 							<article-title>Assisting the Diagnosis of Thyroid Diseases with Bayesian-Type and SOM-Type Neural Networks Making Use of Routine Test Data</article-title>
							<source>Chemical &amp; Pharmaceutical Bulletin</source>
							<volume>54</volume>
							<fpage>1162</fpage>
							<lpage>1169</lpage>
			 	</citation>
				</ref>
				<ref id="r21">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Hsu</surname>
								<given-names>AL</given-names>
							</name>
							<name>
								<surname>Tang</surname>
								<given-names>SL</given-names>
							</name>
							<name>
								<surname>Halgamuge</surname>
								<given-names>SK</given-names>
							</name>																																			
							</person-group>
							<year>2003</year>
 							<article-title>An unsupervised hierarchical dynamic self organizing approach to cancer class discovery and marker gene identification in microarray data</article-title>
							<source>Bioinformatics</source>
							<volume>19</volume>
							<fpage>2131</fpage>
							<lpage>2140</lpage>
			 	</citation>
				</ref>
				<ref id="r22">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Ikai</surname>
								<given-names>A</given-names>
							</name>																												
							</person-group>
							<year>1980</year>
 							<article-title>Thermostability and aliphatic index of globular proteins</article-title>
							<source>J Biochem</source> 
							<volume>88</volume>
							<fpage>1895</fpage>
							<lpage>1898</lpage>
			 	</citation>
				</ref>
				<ref id="r23">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Kaartinen</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Hiltunen</surname>
								<given-names>Y</given-names>
							</name>
							<name>
								<surname>Kovanen</surname>
								<given-names>PT</given-names>
							</name>
							<name>
								<surname>Ala-Korpela</surname>
								<given-names>M</given-names>
							</name>																																			
							</person-group>
							<year>1998</year>
 							<article-title>Application of self organizing maps for the detection and classification of human blood plasma lipoprotein lipid profiles on the basis of 1H NMR spectroscopy data</article-title>
							<source>NMR in Biomedicine</source>
							<volume>11</volume>
							<fpage>168</fpage>
							<lpage>176</lpage>
			 	</citation>
				</ref>
				<ref id="r24">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Kanaya</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>Kinouchi</surname>
								<given-names>M</given-names>
							</name>
							<name>
								<surname>Abe</surname>
								<given-names>T</given-names>
							</name>
							<name>
								<surname>Kudo</surname>
								<given-names>Y</given-names>
							</name>
							<name>
								<surname>Yamada</surname>
								<given-names>Y</given-names>
							</name><etal/>																																			
							</person-group>
							<year>2001</year>
 							<article-title>Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): Characterization of horizontally transferred genes with emphasis on the <italic>E. coli</italic> O157 genome</article-title>
							<source>Gene</source>
							<volume>276</volume>
							<fpage>89</fpage>
							<lpage>99</lpage>
			 	</citation>
				</ref>
				<ref id="r25">
				<citation citation-type="book">
							<person-group>
							<name>
								<surname>Kohonen</surname>
								<given-names>T</given-names>
							</name>																																										
							</person-group>
							<year>2001</year>
 							<article-title>Self-Organizing Maps</article-title>
							<edition>3rd edition</edition>							
							<publisher-loc>Berlin, Heidelberg</publisher-loc>
							<publisher-name>Springer Press</publisher-name>
			 	</citation>
				</ref>
				<ref id="r26">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Kyan</surname>
								<given-names>MJ</given-names>
							</name>
							<name>
								<surname>Guan</surname>
								<given-names>L</given-names>
							</name>
							<name>
								<surname>Arnison</surname>
								<given-names>MR</given-names>
							</name>
							<name>
								<surname>Cogswell</surname>
								<given-names>CJ</given-names>
							</name>																																									
							</person-group>
							<year>2001</year>
							<article-title>Feature Extraction of Chromosomes From 3-D Confocal Microscope Images</article-title>
							<source>IEEE Transactions on Biomedical Engineering</source>
							<volume>48</volume>
							<fpage>1306</fpage>
							<lpage>1318</lpage>
			 	</citation>
				</ref>
				<ref id="r27">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Kyte</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Doolittle</surname>
								<given-names>RF</given-names>
							</name>																																																
							</person-group>
							<year>1982</year>
 							<article-title>A simple method for displaying the hydropathic character of a protein</article-title>
							<source>J Mol Biol</source>
							<volume>157</volume>
							<fpage>105</fpage>
							<lpage>132</lpage>
			 	</citation>
				</ref>
				<ref id="r28">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Lampinen</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Oja</surname>
								<given-names>E</given-names>
							</name>																																																
							</person-group>
							<year>1992</year>
 							<article-title>Clustering properties of hierarchical self-organizing maps</article-title>
							<source>Journal of Mathematical Imaging and Vision</source>
							<volume>2</volume>
							<fpage>261</fpage>
							<lpage>272</lpage>
			 	</citation>
				</ref>
				<ref id="r29">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Mahony</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>McInerney</surname>
								<given-names>JO</given-names>
							</name>
							<name>
								<surname>Smith</surname>
								<given-names>TJ</given-names>
							</name>
							<name>
								<surname>Golden</surname>
								<given-names>A</given-names>
							</name>																																									
							</person-group>
							<year>2004</year>
							<article-title>Gene prediction using the Self-Organizing Map: Automatic generation of multiple gene models</article-title>
							<source>BMC Bioinformatics</source>
							<volume>5</volume>
							<fpage>23</fpage>							
			 	</citation>
				</ref>
				<ref id="r30">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Mahony</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>Hendrix</surname>
								<given-names>D</given-names>
							</name>
							<name>
								<surname>Golden</surname>
								<given-names>A</given-names>
							</name>
							<name>
								<surname>Smith</surname>
								<given-names>TJ</given-names>
							</name>
							<name>
								<surname>Rokhsar</surname>
								<given-names>D</given-names>
							</name>																																									
							</person-group>
							<year>2005</year>
 							<article-title>Transcription factor binding site identification using the self-organizing map</article-title>
							<source>Bioinformatics</source>
							<volume>21</volume>
							<fpage>1807</fpage>
							<lpage>1814</lpage>
			 	</citation>
				</ref>
				<ref id="r31">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Mahony</surname>
								<given-names>S</given-names>
							</name>
							<name>
								<surname>Benos</surname>
								<given-names>PV</given-names>
							</name>
							<name>
								<surname>Smith</surname>
								<given-names>TJ</given-names>
							</name>
							<name>
								<surname>Golden</surname>
								<given-names>A</given-names>
							</name>																																																
							</person-group>
							<year>2006</year>
							<article-title>Self-organizing neural networks to support the discovery of DNA-binding motifs</article-title>
							<source>Neural Networks</source>
							<volume>19</volume>
							<fpage>950</fpage>
							<lpage>962</lpage>
			 	</citation>
				</ref>
				<ref id="r32">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Murty</surname>
								<given-names>USN</given-names>
							</name>
							<name>
								<surname>Arora</surname>
								<given-names>N</given-names>
							</name>																																																						
							</person-group>
							<year>2007</year>
 							<article-title>Application Of Self-Organizing Maps For Prioritization Of Malaria Control Operations In Changlang District, Arunachal Pradesh</article-title>
							<source>The Internet Journal of Epidemiology</source>
							<volume>4</volume>
							<issue>2</issue>
			 	</citation>
				</ref>
				<ref id="r33">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Banerjee</surname>
								<given-names>AK</given-names>
							</name>
							<name>
								<surname>Arora</surname>
								<given-names>N</given-names>
							</name>
							<name>
								<surname>Murty</surname>
								<given-names>USN</given-names>
							</name>																																																							
							</person-group>
							<year>2009</year>
 							<article-title>Clustering and Classification of <italic>Anopheline</italic> Spacer Sequences using Self Organizing Maps</article-title>
							<source>The Internet Journal of Genomics and Proteomics</source>
							<volume>7</volume>
							<issue>1</issue>
			 	</citation>
				</ref>
				<ref id="r34">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Banerjee</surname>
								<given-names>AK</given-names>
							</name>
							<name>
								<surname>Kiran</surname>
								<given-names>K</given-names>
							</name>
							<name>
								<surname>Murty</surname>
								<given-names>USN</given-names>
							</name>
							<name>
								<surname>Venkateswarlu</surname>
								<given-names>Ch</given-names>
							</name>																																																
							</person-group>
							<year>2008</year>
 							<article-title>Classification and identification of mosquito species using artificial neural networks</article-title>
							<source>Comput Biol Chem</source>
							<volume>32</volume>
							<fpage>442</fpage>
							<lpage>447</lpage>
			 	</citation>
				</ref>
				<ref id="r35">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Naenna</surname>
								<given-names>T</given-names>
							</name>
							<name>
								<surname>Bress</surname>
								<given-names>RA</given-names>
							</name>
							<name>
								<surname>Embrechts</surname>
								<given-names>MJ</given-names>
							</name>																																																							
							</person-group>
							<year>2003</year>
							<article-title>DNA classifications with self-organizing maps (SOMs)</article-title>
							<source>Proceedings of the IEEE International Workshop on Soft Computing in Industrial Applications</source>
			 	</citation>
				</ref>
				<ref id="r36">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Nairn</surname>
								<given-names>AC</given-names>
							</name>
							<name>
								<surname>Hemmings</surname>
								<given-names>HC</given-names>
								<suffix>Jr</suffix>
							</name>
							<name>
								<surname>Greengard</surname>
								<given-names>P</given-names>
							</name>																																																							
							</person-group>
							<year>1985</year>
 							<article-title>Protein kinases in the brain</article-title>
							<source>Annu Rev Biochem</source>
							<volume>54</volume>
							<fpage>931</fpage>
							<lpage>976</lpage>
			 	</citation>
				</ref>
				<ref id="r37">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Oja</surname>
								<given-names>M</given-names>
							</name>
							<name>
								<surname>Sperber</surname>
								<given-names>GO</given-names>								
							</name>
							<name>
								<surname>Blomberg</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Kaski</surname>
								<given-names>S</given-names>								
							</name>																																																							
							</person-group>
							<year>2005</year>
							<article-title>Self-organizing map-based discovery and visualization of human endogenous retroviral sequence groups</article-title>
							<source>International Journal of Neural Systems</source>
							<volume>15</volume>
							<fpage>163</fpage>
							<lpage>179</lpage>
			 	</citation>
				</ref>
				<ref id="r38">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Page</surname>
								<given-names>RDM</given-names>
							</name>																																																														
							</person-group>
							<year>1996</year>
 							<article-title>TREEVIEW: An application to display phylogenetic trees on personal computers</article-title>
							<source>Computer Applications in the Biosciences</source>
							<volume>12</volume>
							<fpage>357</fpage>
							<lpage>358</lpage>
			 	</citation>
				</ref>
				<ref id="r39">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Ressom</surname>
								<given-names>H</given-names>
							</name>
							<name>
								<surname>Wang</surname>
								<given-names>D</given-names>								
							</name>
							<name>
								<surname>Natarajan</surname>
								<given-names>P</given-names>
							</name>																																																														
							</person-group>
							<year>2003</year>
 							<article-title>Clustering gene expression data using adaptive double self-organizing map</article-title>
							<source>Physiol Genomics</source>
							<volume>14</volume>
							<fpage>35</fpage>
							<lpage>46</lpage>
			 	</citation>
				</ref>
				<ref id="r40">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Soderling</surname>
								<given-names>TR</given-names>
							</name>																																																																					
							</person-group>
							<year>1996</year>
 							<article-title>Structure and regulation of calcium/calmodulin-dependent protein kinases II and IV</article-title>
							<source>Biochim Biophys Acta</source>
							<volume>1297</volume>
							<fpage>131</fpage>
							<lpage>138</lpage>
			 	</citation>
				</ref>
				<ref id="r41">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Tamayo</surname>
								<given-names>P</given-names>
							</name>
							<name>
								<surname>Slonim</surname>
								<given-names>D</given-names>								
							</name>
							<name>
								<surname>Mesirov</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Zhu</surname>
								<given-names>Q</given-names>								
							</name>
							<name>
								<surname>Kitareewan</surname>
								<given-names>S</given-names>
							</name><etal/>																																																														
							</person-group>
							<year>1999</year>
 							<article-title>Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation</article-title>
							<source>Proc Natl Acad Sci USA</source>
							<volume>96</volume>
							<fpage>2907</fpage>
							<lpage>2912</lpage>
			 	</citation>
				</ref>
				<ref id="r42">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Tarca</surname>
								<given-names>AL</given-names>
							</name>
							<name>
								<surname>Carey</surname>
								<given-names>VJ</given-names>								
							</name>
							<name>
								<surname>Chen</surname>
								<given-names>X</given-names>
							</name>
							<name>
								<surname>Romero</surname>
								<given-names>R</given-names>								
							</name>
							<name>
								<surname>Draghici</surname>
								<given-names>S</given-names>
							</name>																																																														
							</person-group>
							<year>2007</year>
 							<article-title>Machine Learning and Its Applications to Biology</article-title>
							<source>PLoS Computational Biology</source>
							<volume>3</volume>
							<fpage>e116</fpage>							
			 	</citation>
				</ref>
				<ref id="r43">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Thompson</surname>
								<given-names>JD</given-names>
							</name>
							<name>
								<surname>Higgins</surname>
								<given-names>DG</given-names>								
							</name>
							<name>
								<surname>Gibson</surname>
								<given-names>TJ</given-names>
							</name>																																																																					
							</person-group>
							<year>1994</year>
							<article-title>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</article-title>
							<source>Nucleic Acids Research</source>
							<volume>22</volume>
							<fpage>4673</fpage>
							<lpage>4680</lpage>							
			 	</citation>
				</ref>
				<ref id="r44">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Tison</surname>
								<given-names>J</given-names>
							</name>
							<name>
								<surname>Park</surname>
								<given-names>YS</given-names>								
							</name>
							<name>
								<surname>Coste</surname>
								<given-names>M</given-names>
							</name>
							<name>
								<surname>Wasson</surname>
								<given-names>JG</given-names>								
							</name>
							<name>
								<surname>Ector</surname>
								<given-names>L</given-names>
							</name><etal/>																																																																					
							</person-group>
							<year>2005</year>
 							<article-title>Typology of diatom communities and the influence of hydro-ecoregions: A study on the French hydrosystem scale</article-title>
							<source>Wat Res</source>
							<volume>39</volume>
							<fpage>3177</fpage>
							<lpage>3188</lpage>							
			 	</citation>
				</ref>
				<ref id="r45">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Unneberg</surname>
								<given-names>P</given-names>
							</name>
							<name>
								<surname>Merelo</surname>
								<given-names>JJ</given-names>								
							</name>
							<name>
								<surname>Chacon</surname>
								<given-names>P</given-names>
							</name>
							<name>
								<surname>Moran</surname>
								<given-names>F</given-names>								
							</name>																																																																												
							</person-group>
							<year>2001</year>
 							<article-title>SOMCD: Method for evaluating protein secondary structure from UV circular dichroism spectra</article-title>
							<source>Proteins: Structure, Function, and Bioinformatics</source>
							<volume>42</volume>
							<fpage>460</fpage>
							<lpage>470</lpage>							
			 	</citation>
				</ref>
				<ref id="r46">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Walley</surname>
								<given-names>WJ</given-names>
							</name>
							<name>
								<surname>Martin</surname>
								<given-names>RW</given-names>								
							</name>
							<name>
								<surname>O'Connor</surname>
								<given-names>MA</given-names>
							</name>																																																																																			
							</person-group>
							<year>2000</year>
 							<article-title>Self-organising maps for classification of river quality from biological and environmental data. In: R. Denzer, D.A. Swayne, M. Purvis and G. Schimak, Editors, Environmental Software Systems: Environmental Information and Decision Support</article-title>
							<source>IFIP Conference Series, Kluwer Academic Publishers, Boston</source>
							<fpage>27</fpage>
							<lpage>41</lpage>
			 	</citation>
				</ref>
				<ref id="r47">
				<citation citation-type="journal">
							<person-group>
							<name>
								<surname>Wang</surname>
								<given-names>HC</given-names>
							</name>
							<name>
								<surname>Badger</surname>
								<given-names>J</given-names>								
							</name>
							<name>
								<surname>Kearney</surname>
								<given-names>P</given-names>
							</name>
							<name>
								<surname>Li</surname>
								<given-names>M</given-names>								
							</name>																																																																												
							</person-group>
							<year>2001</year>
 							<article-title>Analysis of codon usage patterns of bacterial genomes using the self-organizing map</article-title>
							<source>Molecular Biology and Evolution</source>
							<volume>18</volume>
							<fpage>792</fpage>
							<lpage>800</lpage>							
			 	</citation>
				</ref>								
        </ref-list>		
		<glossary>
			<def-list>
				<title>Abbreviations</title>
				<def-item>
					<term>Blast</term>
					<def>
						<p>Basic Local Alignment Search Tool</p>
					</def>
				</def-item>
				<def-item>
					<term>Blastp</term>
					<def>
						<p>BLAST search of a protein query against a protein database</p>
					</def>
				</def-item>
				<def-item>
					<term>EBI</term>
					<def>
						<p>European Bioinformatics Institute</p>
					</def>
				</def-item>
				<def-item>
					<term>EST</term>
					<def>
						<p>Expressed sequence tag</p>
					</def>
				</def-item>
				<def-item>
					<term>i.d</term>
					<def>
						<p>Identity</p>
					</def>
				</def-item>
				<def-item>
					<term>IDA</term>
					<def>
						<p>Information-dependent acquisition</p>
					</def>
				</def-item>
				<def-item>
					<term>MS</term>
					<def>
						<p>Mass Spectrometry</p>
					</def>
				</def-item>
				<def-item>
					<term>MS/MS</term>
					<def>
						<p>Tandem mass spectrometry</p>
					</def>
				</def-item>
				<def-item>
					<term>NCBI</term>
					<def>
						<p>National Center for Biotechnology Information</p>
					</def>
				</def-item>
				<def-item>
					<term>NCBInr</term>
					<def>
						<p>NCBI nonredundant protein database</p>
					</def>
				</def-item>
				<def-item>
					<term>NCBI-EST</term>
					<def>
						<p>NCBI Expressed Sequence Tag database</p>
					</def>
				</def-item>
				<def-item>
					<term>PAGE</term>
					<def>
						<p>Polyacrylamide gel electrophoresis</p>
					</def>
				</def-item>
				<def-item>
					<term>PMF</term>
					<def>
						<p>Peptide Mass Fingerprinting</p>
					</def>
				</def-item>
				<def-item>
					<term>rpsblast</term>
					<def>
						<p>Reverse position-specific BLAST (searches for conserved domains in a protein)</p>
					</def>
				</def-item>
				<def-item>
					<term>tblastn</term>
					<def>
						<p>Translated BLAST nucleotide (protein query searched against a translated nucleotide database)</p>
					</def>
				</def-item>
				<def-item>
					<term>2-D</term>
					<def>
						<p>Two-dimensional</p>
					</def>
				</def-item>
				<def-item>
					<term>2-DE</term>
					<def>
						<p>Two-dimensional electrophoresis</p>
					</def>
				</def-item>								
			 </def-list>
			</glossary> 			
		</back>
		<floats-wrap>
	<table-wrap position="float" id="t1">
	<label>Table 1.</label>
  			<caption>
  				<title>Peptide sequences of identified proteins.</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">Spot Id.</th>
            <th align="left">Protein identity</th>
            <th align="left">Coverage (%), Score</th>
			<th align="left">Matched peptide sequence</th>											
         </tr>
      </thead>
      <tbody>
         <tr>
            <td>1</td>
            <td>Phenylalanine hydroxylase 32442452 (nr, <italic>D. rerio</italic>) BJ911084 (MEPD, <italic>O. latipes</italic>)</td>
            <td>25<sup>a</sup> 76/79</td>
			<td>13/86 peptides</td>											
         </tr>
         <tr>
            <td>2</td>
            <td>Phenylalanine hydroxylase 32442452 (nr, <italic>D. rerio</italic>) BJ737725 (MEPD, <italic>O. latipes</italic>)</td>
            <td>25<sup>a</sup> 74/79</td>
			<td>12/86 peptides</td>														
         </tr>
         <tr>
            <td>3</td>
            <td>Selenium binding protein 1 XP_707845 (nr, <italic>D. rerio</italic>) BJ007475 (MEPD, <italic>O. latipes</italic>) AAH5690 (nr, <italic>D. rerio</italic>) BJ007475 (MEPD, <italic>O. latipes</italic>) BJ007475 (est, <italic>O. latipes</italic>)</td>
            <td>28<sup>a</sup> 28/72 5<sup>b</sup> 43/44 26<sup>b</sup> 144/68</td>
			<td>14/56 peptides EEIVYLPCIYR;LILPSLISSR MVEPVEVLWK STGILKPDYLATVDVDPK EEIVYLPCIYR; IYVIDVGTDPRAPK</td>										
         </tr>
         <tr>
            <td>4</td>
            <td>Keratin 18 type 1 CAA74664 (nr, <italic>O. mykiss</italic>) BJ498018 (MEPD, <italic>O. latipes</italic>) AAC38007 (nr, <italic>C. auratus</italic>) BJ747804 (MEPD, <italic>O. latipes</italic>)</td>
            <td>23<sup>a</sup> 40/54 14<sup>b</sup> 137/54</td>
			<td>10/67 peptides LQDALEEQK; MAMQNLNDR; VMTVTQTLVDGK</td>													
         </tr>
		 <tr>
            <td>5</td>
            <td>Unidentified</td>           		         
            <td></td>
            <td></td>           		                                            				
         </tr>
		 <tr>
            <td>6</td>
            <td>DJ-1 BAD67176 (nr, <italic>O. latipes</italic>) Unnamed protein (RKIP) CAG08164 (nr, <italic>T. nigroviridis</italic>) BJ747456 (MEPD, <italic>O. latipes</italic>) Raf kinase inhibitor protein AU170544 (est, <italic>O. latipes</italic>)</td>
			<td>30<sup>b</sup>  126/50 14<sup>b</sup> 126/50 16<sup>b</sup> 254/61</td>
            <td>NVVICPDTSLEEASK GAEEMETVIPVDVMRR QGPYDVVLLPGGMPGAQNLAESPAVK LYEQLAGK; LYTLALTDPDAPSR YGSLEIDELGK; GNDVSSGCVLSDYVGSGPPK; LYTLALTDPDAPSR</td>
         </tr>
		  <tr>
            <td>7</td>
            <td>Acidic ribosomal phosphoprotein P0 AAP20211 (nr, <italic>P. major</italic>) BJ003458 (MEPD, <italic>O. latipes</italic>)</td>
            <td>14<sup>b</sup> 223/49</td>
            <td>IIQLLDDYPK;GHLENNPALEK TSFFQALGITTK DLLLANKVPAAAR</td>
         </tr>
		  <tr>
            <td>8</td>
            <td>Natural killer enhancing factor AAY25400 (nr, <italic>P. olivaceus</italic>) BJ714211 (MEPD, <italic>O. latipes</italic>) RahpC-TCA family AV669883 (est, <italic>O. latipes</italic>)</td>           		         
            <td>27<sup>b</sup> 172/47 11<sup>b</sup> 81/65</td>
            <td>SVEETLRL AVMPDGQFK;QITINDLPVGR GLFIIDDKGVL ; TISTDYGVLKEDEGIAYR IGSLAPDFTAK TISTDYGVLKEDEGIAYR</td>
         </tr>
		  <tr>
            <td>9</td>
            <td>Unidentified</td>           		         
            <td></td>
            <td></td>           		                                            				
         </tr>
		  <tr>
            <td>10</td>
            <td>Proteasome &alpha; type 1 AAP20159 (nr, <italic>P. major</italic>) BJ01318 (est, <italic>O. latipes</italic>)</td>
            <td>25<sup>b</sup> 141/49 28<sup>b</sup> 167/62</td>
            <td>LVSLIGSK;QGSATVGLK IHQIEYAMEAVK; ALRETLPAEQDLTTK Same 4 peptides + FVFDRPLPTSR</td>
         </tr>
		  <tr>
            <td>11</td>
            <td>Hypoxanthine guanine phosphoribosyl transferase CAA35648 (nr, <italic>C. longicaudatus</italic>) BJ729370 (MEPD, <italic>O. latipes</italic>)</td>
            <td>10<sup>b</sup> 62/50</td>
            <td>VIGGDDLSTLTGK SIPMTVDFIR</td>           		                                            				
         </tr>
		  <tr>
            <td>12</td>
            <td>F-actin capping protein A AAR16282 (nr, <italic>T. rubripes</italic>) AM151914 (MEPD, <italic>O. latipes</italic>)</td>           		         
            <td>3<sup>b</sup> 84/51</td>
            <td>ILLNNDNLLR</td>           		                                            				
         </tr>
		  <tr>
            <td>13</td>
            <td>Actin capping protein B subunit AAA52222 (nr, <italic>G. gallus</italic>) BJ729321 (MEPD, <italic>O. latipes</italic>)</td>
            <td>15<sup>b</sup> 208/47</td>
            <td>TGSGTMNLGGSLTR LVEDMENKIR TKDIVNGLR STLNEIYFGK</td>           		                                            				
         </tr>
		  <tr>
            <td>14</td>
            <td>Enolase AAA70080 (nr, <italic>S. pombe</italic>) BJ722994 (MEPD, <italic>O. latipes</italic>)</td>
            <td>15<sup>b</sup></td>
            <td>IEEELGSR IEEELGDK</td>           		                                            				
         </tr>
		  <tr>
            <td>15</td>
            <td>14-3-3 protein zeta/delta (RKIP-1) P29361 (nr, <italic>O. aries</italic>) BJ727482 (MEPD, <italic>O. latipes</italic>)</td>
            <td>5<sup>b</sup> 84/50</td>
            <td>SVTEQGAELSNEER</td>           		                                            				
         </tr>		 
     </tbody>
 	  </table>
	  <table-wrap-foot>
  				<fn>
  					<p><sup>a</sup>: MALDI-TOF identification.</p>
					<p><sup>b</sup>: LC-ESI-Q-TOF identification.</p>
  				</fn>
  			</table-wrap-foot>
	 </table-wrap>
	</floats-wrap> 
</article>	 
