IDENTIFICATION OF MOTIFS IN BIOACTIVE PEPTIDE PRECURSORS

An in silico study was carried out to identify motifs present in the precursors of antimicrobial, antithrombotic, casein-derived and mineral-binding bioactive peptides using online servers. The MEME Suite was used to define the common consensus patterns present in bioactive peptide precursors. Although three different consensus patterns were identified in the precursors, these patterns overlapped in many sequences, and all three motifs may occur together in many peptide precursor sequences.


Introduction
Bioactive peptides have been defined as specific protein fragments that have a positive impact on body functions and conditions and may ultimately influence health [3]. Fitzgerald & Murray (2006) define bioactive peptides as 'peptides with hormone- or drug-like activity that eventually modulate physiological function through binding interactions to specific receptors on target cells, leading to induction of physiological responses' [4]. Most of the bioactivities of bioactive peptides are encrypted within the primary sequence of the native protein, and the peptides must be released in one of the following ways: hydrolysis by digestive enzymes such as trypsin and pepsin [5,7], food processing [6], or hydrolysis by proteolytic microorganisms or by proteolytic enzymes derived from those microorganisms [4].
MEME (Multiple EM for Motif Elicitation; rhymes with 'team') [1,2] is one of the most widely used tools for discovering novel 'signals', called motifs, in sets of biological sequences. It is an unsupervised learning algorithm for discovering motifs in sets of protein or DNA sequences, and its applications include the discovery of new transcription factor binding sites and protein domains. To use MEME via the website, the user provides a set of sequences in FASTA format, either by uploading a file or by cut-and-paste. The only other required input is an email address to which the results are sent.
Functional similarities between milk and blood coagulation, as well as sequence homologies between the fibrinogen γ-chain and κ-casein, have been reported [8]. Jollès et al. (1986) showed that bovine κ-casein f106-116 inhibited platelet aggregation and combined with the receptor site, consequently preventing fibrinogen from binding to blood platelets [9]. This inhibition was dependent on peptide concentration.
The two smaller tryptic peptides (κ-casein f106-112 and f113-116) exerted a much smaller effect on platelet aggregation and did not inhibit fibrinogen binding. These peptides are referred to as casoplatelins. The behaviour of κ-casein f106-116 is similar to that of the C-terminal peptide of the human fibrinogen γ-chain [12]. The mechanism of milk clotting, defined by the interaction of κ-casein with chymosin, bears a remarkable similarity to the process of blood clotting, defined by the interaction of fibrinogen with thrombin (Jollès and Henschen, 1982). Structural homologies between cow κ-casein [10] and the human fibrinogen γ-chain were found by Jollès and Caen. The κ-casein fragments named casoplatelins, obtained from tryptic hydrolysates, show antithrombotic activity by inhibiting fibrinogen binding to platelets [9,10]. These peptides are released during gastrointestinal digestion and absorbed intact into the blood, which supports the concept that they exert an antithrombotic effect in vivo. The potential physiological effects of these antithrombotic peptides have not been established, but such peptides have been detected in the plasma of newborn children after breastfeeding or ingestion of cow milk-based infant formulae [11].

Materials and Methods
Nineteen amino acid sequences of antithrombotic bioactive peptides were retrieved from NCBI in FASTA format. The OOPS (one occurrence per sequence) model of MEME (Multiple EM for Motif Elicitation) was used to find the motifs present in the given sequences. The output shows a coloured graphical alignment and the common regular expression of each motif; the blocks mark the start and end points within each sequence and show the length in amino acids.
The E-value describes the statistical significance of the motif. MEME usually finds the most statistically significant (low E-value) motifs first. The E-value is an estimate of the expected number of motifs with the given log-likelihood ratio (or higher), and with the same width and site count, that one would find in a similarly sized set of random sequences. The width is fixed for each motif, because no gaps are allowed in MEME motifs. The sites are the conserved regions in the input sequences that match the motif, and the site count is the number of sites contributing to the construction of the motif. The information content of the motif, in bits, is equal to the sum of the uncorrected information content, R(), in the columns of the LOGO; this is equal to the relative entropy of the motif relative to a uniform background frequency model. The relative entropy of the motif, by contrast, is computed in bits relative to the background letter frequencies given in the command line summary. It is equal to the log-likelihood ratio (llr) divided by the number of contributing sites of the motif times 1/ln(2): re = llr / (sites * ln(2)).
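The relation re = llr / (sites * ln(2)) can be checked with a few lines of Python. This is only a minimal sketch: the function name and the numeric values are illustrative assumptions, not values from the MEME runs in this study, and the llr is assumed to be expressed in natural-log units, as the ln(2) factor implies.

```python
import math


def relative_entropy_bits(llr: float, sites: int) -> float:
    """Relative entropy of a motif in bits: re = llr / (sites * ln(2)).

    llr   -- log-likelihood ratio of the motif (assumed to be in nats)
    sites -- number of sites contributing to the motif
    """
    if sites <= 0:
        raise ValueError("sites must be a positive integer")
    return llr / (sites * math.log(2))


# Hypothetical values for illustration only (not taken from the study).
example_llr = 97.0
example_sites = 7
print(f"re = {relative_entropy_bits(example_llr, example_sites):.2f} bits")
```

Dividing by the number of sites gives a per-site value, and the 1/ln(2) factor converts it from natural-log units to bits, so motifs built from different numbers of sites can be compared on the same scale.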

Sequence LOGO
MEME motifs are represented by position-specific probability matrices that specify the probability of each possible letter appearing at each possible position in an occurrence of the motif. These are displayed as "sequence LOGOS", which contain a stack of letters at each position in the motif. The total height of the stack is the "information content" of that position in the motif, in bits. The letters in a LOGO are coloured by category; for proteins, the categories are based on the biochemical properties of the various amino acids.
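The stack height described above can be illustrated with a short Python sketch. It assumes a uniform background over the 20 amino acids and no small-sample correction, and it scales each letter by its probability times the stack height, which is the usual sequence-logo convention. The function names and the toy probability matrix are hypothetical and are not taken from the MEME output of this study.

```python
import math

PROTEIN_ALPHABET_SIZE = 20  # standard amino acids; uniform background assumed


def column_information_bits(probs, alphabet_size=PROTEIN_ALPHABET_SIZE):
    """Uncorrected information content (bits) of one motif column.

    probs maps each letter with non-zero probability to its probability.
    """
    total = sum(probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("column probabilities must sum to 1")
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return math.log2(alphabet_size) - entropy


def letter_heights(probs, alphabet_size=PROTEIN_ALPHABET_SIZE):
    """Height of each letter in the stack: probability times stack height."""
    stack_height = column_information_bits(probs, alphabet_size)
    return {letter: p * stack_height for letter, p in probs.items()}


# Toy three-column motif (hypothetical values for illustration only).
toy_motif = [
    {"R": 1.0},                      # fully conserved position
    {"G": 0.5, "A": 0.5},            # two equally likely residues
    {"K": 0.7, "R": 0.2, "Q": 0.1},  # mostly K
]

for position, column in enumerate(toy_motif, start=1):
    print(position, round(column_information_bits(column), 3), letter_heights(column))

# Summing the column values gives the information content of the whole motif.
motif_information = sum(column_information_bits(c) for c in toy_motif)
print(round(motif_information, 3))
```

A fully conserved position reaches the maximum of log2(20) bits, while a position where all 20 residues are equally likely has a height of zero.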

Results and Discussion
By default, MEME looks for up to three motifs, each of which may be present in some or all of the input sequences. MEME chooses the width and number of occurrences of each motif automatically in order to minimize the E-value of the motif, that is, the expected number of equally well-conserved patterns in random sequences. By default, only motif widths between 6 and 50 are considered. The MEME output is HTML and shows the motifs as local multiple alignments of (subsets of) the input sequences, as well as in several other formats. 'Block diagrams' show the relative positions of the motifs in each of the input sequences.
After the sequences are submitted in the MEME query box, the results are displayed graphically, and each motif is shown as a sequence logo or regular expression. Sequences are referred to here as seq 1, seq 2, etc., and their accession numbers are given in Table 1.9. Owing to page restrictions, only the figures and tables for motif one of each peptide type are given here, along with details of the peptide sequences used in the study.
For the antithrombotic peptide sequences, three motifs were found. Motif one is present in seq 6, 8, 9, 10, 11, 12 and 13. Motif two is found in seq 6, 9, 10, 11, 12 and 13. Motif three is carried by seq 6, 8, 9, 10, 11, 12, 13 and 18. On the basis of this analysis, seq 6, 8, 9, 10, 11, 12 and 13 contain all three motifs, whereas seq 1-5, 7, 14-17 and 19 contain no consensus region; seq 18 has a consensus region like that of seq 6 and 8-13.
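The analyses in this study were run through the MEME web server. For readers who prefer the command-line version of MEME, the settings described in this section (protein sequences, OOPS model, three motifs, widths 6 to 50) could be expressed roughly as in the sketch below. The helper name, the FASTA file name and the output directory are hypothetical, and the options are passed explicitly on the assumption that command-line defaults may differ from those of the web form. The sketch only builds and prints the command; it does not run MEME.

```python
import shlex


def build_meme_command(fasta_path, out_dir, model="oops",
                       nmotifs=3, minw=6, maxw=50):
    """Build an argument list for the `meme` command-line tool."""
    if model not in {"oops", "zoops", "anr"}:
        raise ValueError("model must be 'oops', 'zoops' or 'anr'")
    if not 1 <= minw <= maxw:
        raise ValueError("need 1 <= minw <= maxw")
    return [
        "meme", fasta_path,
        "-protein",               # input sequences are amino acid sequences
        "-mod", model,            # site distribution model
        "-nmotifs", str(nmotifs), # number of motifs to search for
        "-minw", str(minw),       # minimum motif width
        "-maxw", str(maxw),       # maximum motif width
        "-oc", out_dir,           # output directory (overwritten if present)
    ]


# Hypothetical file and directory names, for illustration only.
cmd = build_meme_command("antithrombotic.fasta", "meme_antithrombotic")
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string makes it straightforward to hand to `subprocess.run` later without shell quoting problems.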
Thirty-five opioid peptide precursor sequences were analysed. The consensus region of motif one is present in opioid peptide precursors seq 32, 27, 26, 34, 14, 31, 29, 18, 19, 35 and 33. The consensus region of motif two is present in seq 34, 32, 14, 31, 29, 18, 35, 30, 1, 19, 17 and 33. The consensus pattern of motif three is present in seq 32, 27, 26, 34, 14 and 30. Motifs one, two and three are all present in seq 34, 32, 14 and 30. The motifs present in these sequences may overlap.
Ten mineral-binding peptide precursor sequences were analysed. The consensus region of motif one is present in mineral-binding peptide precursors seq 2, 5 and 7. The consensus region of motif two is present in seq 6 and 7, and the consensus pattern of motif three is present in seq 5 and 6. Motifs one and two are both present in seq 7. Motifs one and three share their sequence with seq 6, and seq 5 carries the consensus patterns of motifs one and three.
Eleven immunomodulatory peptide precursor sequences were analysed. The consensus region of motif one is present in immunomodulatory peptide precursors seq 3 and 4. The consensus region of motif two is present in seq 2 and 9, and the consensus pattern of motif three is present in seq 1 and 5.
For the combined study, one hundred and eleven precursor sequences of antimicrobial, immunomodulatory, opioid and casein peptides were analysed. The consensus region of motif one is present in seq 10, 6, 11, 12, 8, 9 and 15. The consensus region of motif two is present in seq 12, 10, 6, 11, 8, 9 and 13, and the consensus pattern of motif three is present in seq 12, 10, 6, 11, 8, 9 and 13. Motifs one, two and three are all present in seq 10, 12, 11, 6, 8 and 9. Interestingly, motifs two and three are also both present in seq 13.
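The overlap reported for the combined data set can be reproduced by intersecting the per-motif sequence lists. The sketch below uses the sequence numbers stated above for the combined study; the variable and function names are illustrative assumptions.

```python
# Sequence numbers carrying each motif in the combined data set,
# as listed in the text above.
motif_to_seqs = {
    "motif one": {10, 6, 11, 12, 8, 9, 15},
    "motif two": {12, 10, 6, 11, 8, 9, 13},
    "motif three": {12, 10, 6, 11, 8, 9, 13},
}


def sequences_with_all_motifs(motif_map):
    """Return the sorted sequence numbers that carry every motif."""
    return sorted(set.intersection(*motif_map.values()))


def motifs_by_sequence(motif_map):
    """Map each sequence number to the list of motifs it carries."""
    by_seq = {}
    for motif, seqs in motif_map.items():
        for seq in seqs:
            by_seq.setdefault(seq, []).append(motif)
    return dict(sorted(by_seq.items()))


all_three = sequences_with_all_motifs(motif_to_seqs)
print(all_three)  # [6, 8, 9, 10, 11, 12]

by_seq = motifs_by_sequence(motif_to_seqs)
print(by_seq[13])  # ['motif two', 'motif three']
print(by_seq[15])  # ['motif one']
```

The same check can be applied to the per-class results by replacing the three sets with the sequence numbers listed for that peptide class.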

Conclusion
In genetics, a sequence motif is a nucleotide or amino acid sequence pattern that is widespread and has, or is conjectured to have, biological significance. For proteins, a sequence motif is distinguished from a structural motif, which is formed by the three-dimensional arrangement of amino acids that may not be adjacent. For example, the IQ motif may be written as [FILV]Qxxx[RK]Gxxx[RK]xx[FILVWY], where x signifies any amino acid and square brackets indicate alternatives. Usually, however, the first letter is I, and both [RK] choices resolve to R. Since the last choice is so wide, the pattern IQxxxRGxxxR is sometimes equated with the IQ motif itself, but a more accurate description would be a consensus sequence for the IQ motif.
De novo computational discovery of motifs is now very common: software programs take multiple input sequences and attempt to identify one or more candidate motifs. The MEME Suite is well suited to this task, as it generates statistical information for each sequence. The MEME results described above indicate that, although bioactive peptide precursors come from different sources, they contain common patterns of amino acids. Motifs may also overlap with one another because of their common consensus patterns.