Proteins are an important group of biomolecules present in every living being and are known to carry out critical functions of the body. Chemically, a protein is a polymer of amino acids linked by peptide bonds and arranged in a sequential manner. This sequential arrangement of amino acids is referred to as the primary structure. The primary structure of a protein is determined by the gene corresponding to that protein. A specific sequence of nucleotides in DNA is transcribed into messenger RNA, which is read by the ribosome in a process called translation. The sequence of a protein is unique to that protein and defines its structure and function.
Structurally, the polypeptide chain of a protein has an N-terminus and a C-terminus, which are determined by the nature of the linkage between two amino acids. The properties of the protein are chiefly determined by the type of amino acids present in the primary structure. The sequence of amino acids in a protein or polypeptide chain is determined by Edman degradation and mass spectrometry.
In recent years, advances in protein sequencing techniques have generated a large amount of data, which is deposited in databases. The protein sequence databases contain data on protein sequences. The deposition of sequence data in these databases has led to the development of data analysis tools in bioinformatics. These tools help in understanding the properties of the particular protein whose sequence is under consideration. Analysis of the primary protein sequence / structure also helps in planning laboratory experiments for purification and in understanding the physicochemical properties of the protein, its amino acid composition, etc.
Retrieving sequences from database/s
Introduction: The sequence data generated by high-throughput techniques is saved in databases so that the information is readily available for analysis. The primary set of data is stored in primary databases. The sequence data of a protein can be downloaded and analysed with various analysis tools.
Exercise: To retrieve protein sequence data from NCBI's protein database
Go to NCBI: http://www.ncbi.nlm.nih.gov/
Select "Protein" from the dropdown menu of databases
Type the name of the protein, e.g. myoglobin
Click on the link provided
Note the details of the protein (locus, accession number) and observe the protein sequence in GenPept format
Click on "FASTA" and observe the sequence in FASTA format
Copy the sequence and paste it into a Word document, OR click on "Send to", choose the destination "File", select the FASTA format and click on "Create File". Save the file at a specific location for further use
Result: Silk produced by the silkworm consists of two main proteins, sericin and fibroin. The sequence of silk fibroin L-chain was retrieved from the NCBI database, and details such as the accession number and the GenPept format were observed. Its FASTA format is as follows
>gi|19221230|gb|AAL83649.1| silk fibroin [Bombyx mori]
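The retrieval exercise above can also be scripted. The following is a minimal sketch, not an official NCBI client: it builds an E-utilities `efetch` URL for a protein accession and parses a FASTA record with the older `gi|...|gb|...|` style header shown in the result. The helper names (`efetch_fasta_url`, `parse_fasta`, `parse_legacy_header`) are hypothetical and chosen only for illustration; no network request is made by the code itself.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_fasta_url(accession):
    """Build an E-utilities URL that returns one protein record as FASTA text."""
    query = {"db": "protein", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(query)

def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    # Header without the leading '>', sequence lines joined into one string.
    return lines[0][1:].strip(), "".join(lines[1:])

def parse_legacy_header(header):
    """Extract gi number, source database, accession and description."""
    parts = header.split("|")
    if len(parts) < 5 or parts[0] != "gi":
        raise ValueError("not a gi|...|db|accession| header")
    return {"gi": parts[1], "db": parts[2], "accession": parts[3],
            "description": parts[4].strip()}
```

Opening the URL returned by `efetch_fasta_url("AAL83649.1")` in a browser, or with `urllib.request.urlopen`, should give the same FASTA text as the "FASTA" link on the record page, assuming the accession is still live.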
Translation of DNA / RNA sequences into protein
Introduction: Translate is an online tool for the translation of DNA / RNA sequences into a protein sequence. The tool is provided through ExPASy (Expert Protein Analysis System) by the Swiss Institute of Bioinformatics.
EXERCISE 1: You are provided with the sequence of a gene. Translate the gene sequence and find the protein product.
>gi|50540477|ref|NM_001002706.1| Danio rerio lysozyme g-like 1 (lygl1), mRNA
Go to http://web.expasy.org/translate/
Paste the given DNA / RNA sequence into the input box
Click on "Translate sequence"
Note down the results
Result: The ExPASy tool examines the input sequence in all six possible frames (i.e. reading the sequence from 5' to 3' and from 3' to 5', starting with nt 1, nt 2 and nt 3). The translated gene sequence gives several frames, one of which is as follows
5'3' Frame 1
X X X X X X X X X X X X X F S C N H N S T T F S G L T S H S S N I L F C S Q Q L Stop V I Met G I P V I L T Met Y F L A C I Y G D I Met K I D T T G A S E V T A K Q D K L T V K G V E A S K K L A E H D L A R Met E Q Y K S K I L K V A R A K Q Met D P A V I A A I I S R E S R A G A A L K D G W G D H G N G F G L Met Q V D K R Y H K L V G A W D S E E H L T Q G T E I L I G Y I K D I K A K F P T W T K E Q C F K G G I S A Y N A G V K N V Q T Y E R Met D V G T T G G D Y A N D V V A R A Q W F K S K G Y Stop G I N V V Stop C Y F Stop Stop L S L T T D H S F I L Y F V F A G N K Stop N V F I Q K K K K K K K K K K
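The six-frame translation described in the result can be illustrated with a short sketch. This is not the ExPASy implementation; it assumes the standard genetic code, writes stop codons as `*` rather than "Stop", writes methionine as `M` rather than "Met", and maps any codon containing an ambiguous base (such as N) to `X`. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def translate(dna, offset=0):
    """Translate one reading frame; '*' marks a stop, 'X' an ambiguous codon."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frames(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(dna, offset)
        frames[f"3'5' Frame {offset + 1}"] = translate(reverse, offset)
    return frames
```

In this sketch a run of `X` at the start of a frame arises whenever the input begins with ambiguous bases, which is one plausible explanation for the leading X residues in the frame shown above.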
Finding the isoelectric point and molecular weight
Introduction: Compute pI/Mw is a tool that calculates the estimated pI and Mw of a specified Swiss-Prot/TrEMBL entry or a user-entered amino acid sequence. These parameters are useful if you want to know the approximate region of a 2-D gel where a protein may be found.
Exercise: You are given a protein sequence. Find the theoretical pI and molecular weight of the sequence.
V I M G I P V I L T M Y F L A C I Y G D I M K I D T T G A S E V T A K Q D K L T V K G V E A S K K L A E H D L A R M E Q Y K S K I L K V A R A K Q M D P A V I A A I I S R E S R A G A A L K D G W G D H G N G F G L M Q V D K R Y H K L V G A W D S E E H L T Q G T E I L I G Y I K D I K A K F P T W T K E Q C F K G G I S A Y N A G V K N V Q T Y E R M D V G T T G G D Y A N D V V A R A Q W F K S K G Y
Go to http://web.expasy.org/compute_pi/
Paste the single-letter amino acid sequence of the protein, or upload the sequence from a file or from the UniProt database.
Click on "Compute pI/Mw"
Note the results
Result: The theoretical isoelectric point and molecular weight of the given protein sequence were estimated with the Compute pI/Mw tool to be 9.04 and 21859.19 Da, respectively
10 20 30 40 50 60
VIMGIPVILT MYFLACIYGD IMKIDTTGAS EVTAKQDKLT VKGVEASKKL AEHDLARMEQ
70 80 90 100 110 120
YKSKILKVAR AKQMDPAVIA AIISRESRAG AALKDGWGDH GNGFGLMQVD KRYHKLVGAW
130 140 150 160 170 180
DSEEHLTQGT EILIGYIKDI KAKFPTWTKE QCFKGGISAY NAGVKNVQTY ERMDVGTTGG
Theoretical pI/Mw: 9.04 / 21859.19
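The molecular weight part of such a calculation can be sketched as a sum of average residue masses plus one water molecule for the free termini. This is an illustration under stated assumptions, not the tool's own code: the mass table holds commonly used average residue masses, the peptide is assumed to be linear and unmodified, and `molecular_weight` is a hypothetical helper name. The pI is left out because it needs a table of pK values and an iterative net-charge calculation.

```python
# Average residue masses in Da (residue = amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water is added back for the free N- and C-termini

def molecular_weight(sequence):
    """Average molecular weight of an unmodified linear peptide, in Da."""
    seq = "".join(sequence.split()).upper()  # tolerate spaces and lower case
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER
```

Because whitespace is stripped, the spaced single-letter sequence from the exercise can be passed in directly.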
Study of peptides
PeptideCutter predicts potential cleavage sites cleaved by proteases or chemicals in a given protein sequence. PeptideCutter returns the query sequence with the possible cleavage sites mapped on it and/or a table of cleavage site positions.
PeptideCutter searches a protein sequence from the SWISS-PROT and/or TrEMBL databases or a user-entered protein sequence for protease cleavage sites. Single proteases and chemicals, a selection or the whole list of proteases and chemicals can be used. Different forms of output of the results are available: tables of cleavage sites either grouped alphabetically according to enzyme names or sequentially according to the amino acid number. A third option for output is a map of cleavage sites. The sequence and the cleavage sites mapped onto it are grouped in blocks, the size of which can be chosen by the user to provide a convenient form of print-out.
The program accepts the complete input as one single sequence, even if several are entered.
Numbers and space characters are ignored.
If a sequence in FASTA format is entered, the first line is ignored during further steps of the program.
If letters are entered that do not determine an amino acid (B, J, X or Z), the user will be asked for correction.
The program is case insensitive.
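The input rules listed above can be summarised in a small normalisation routine. This is a sketch of the described behaviour, not PeptideCutter's code: `clean_sequence` is a hypothetical name, and raising an exception stands in for "the user will be asked for correction".

```python
INVALID = set("BJXZ")  # letters that do not determine a single amino acid

def clean_sequence(raw):
    """Normalise user input following the rules listed above."""
    lines = raw.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # the FASTA header line is ignored
    joined = "".join(lines)  # everything entered is treated as one sequence
    # Drop digits and whitespace, and fold to upper case (case insensitive).
    letters = [ch.upper() for ch in joined if not (ch.isdigit() or ch.isspace())]
    bad = sorted(set(letters) & INVALID)
    if bad:
        raise ValueError("ambiguous residue codes need correction: " + ", ".join(bad))
    return "".join(letters)
```

With this in place, a numbered block listing copied from a results page can be pasted as is, since the position numbers and spaces are stripped before analysis.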
Go to http://web.expasy.org/peptide_cutter/
Paste the given sequence
Choose the enzyme or chemical to be used for the cleavage
Click on "Perform"
Results: PeptideCutter predicted 9 possible cleavage sites for CNBr in the given protein sequence
10 20 30 40 50 60
MESLKKLFQP VHEKVDETWS KVTIVGVGQV GMAAAFSMLT QNVTNNIALV DMMADKLKGE
70 80 90 100 110 120
MMDLQHGSAF MRNAKIQSST DYSITAGSKI CVVTAGVRQR EGESRLDLVQ RNTDVLKQII
130 140 150 160 170 180
PQLIKYSPDT ILVIASNPVD ILTYVTWKIS GLPKHRVIGS GTNLDSARFR YLLSDRLGIA
190 200 210 220 230 240
TTSCHGYIIG EHGDSSVPVW SAVNIAGVRL SDLNNQIGTD DDPENWKELH ENVVKSAYEV
250 260 270 280 290 300
IKLKGYTSWA IGLSLAQIVR AILTNANSVH AVSTYLKGEH GIEDEVFLSL PCVLSHCGVS
310 320 330
DVIRQPLTEL EVAQLRKSAK VMAKVQNDIK F
The sequence is 331 amino acids long.
Name of enzyme    No. of cleavages    Positions of cleavage sites
CNBr              9                   1 32 38 52 53 61 62 71 322
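The reported positions coincide with the methionine residues of the sequence, which is what one expects for CNBr, a reagent that cleaves on the C-terminal side of Met. A minimal sketch of that rule follows; the function names are hypothetical, and complete cleavage with no exceptions is assumed.

```python
def cnbr_cleavage_sites(sequence):
    """1-based positions of Met residues; CNBr cleaves on their C-terminal side."""
    return [i for i, aa in enumerate(sequence.upper(), start=1) if aa == "M"]

def cnbr_fragments(sequence):
    """Peptides expected after complete cleavage after every Met."""
    fragments, start = [], 0
    for pos in cnbr_cleavage_sites(sequence):
        fragments.append(sequence[start:pos])  # fragment ends with the Met
        start = pos
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder
    return fragments
```

For the first 80 residues of the sequence above, `cnbr_cleavage_sites` returns 1, 32, 38, 52, 53, 61, 62 and 71, matching the start of the table row.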
Analysis of physical and chemical properties of proteins
ProtParam is a tool which allows the computation of various physical and chemical parameters for a given protein stored in Swiss-Prot or TrEMBL or for a user-entered sequence. The computed parameters include the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).
Go to http://web.expasy.org/protparam/
Enter the sequence provided
Click on "Compute parameters"
Analyse and record the results.
Result: The physical and chemical properties of the given protein sequence were computed by ProtParam. Some of them are
Number of amino acids: 331
Molecular weight: 36362.8
Theoretical pI: 6.76
Amino acid composition:
Ala ( A ) 23 6.9 %
Arg ( R ) 13 3.9 %
Asn ( N ) 15 4.5 %
Asp ( D ) 19 5.7 %
Cys ( C ) 4 1.2 %
Gln ( Q ) 14 4.2 %
Glu ( E ) 16 4.8 %
Gly ( G ) 22 6.6 %
His ( H ) 9 2.7 %
Ile ( I ) 25 7.6 %
Leu ( L ) 31 9.4 %
Lys ( K ) 21 6.3 %
Met ( M ) 9 2.7 %
Phe ( F ) 6 1.8 %
Pro ( P ) 9 2.7 %
Ser ( S ) 29 8.8 %
Thr ( T ) 19 5.7 %
Trp ( W ) 5 1.5 %
Tyr ( Y ) 8 2.4 %
Val ( V ) 34 10.3 %
Pyl ( O ) 0 0.0 %
Sec ( U ) 0 0.0 %
( B ) 0 0.0 %
( Z ) 0 0.0 %
( X ) 0 0.0 %
Total number of negatively charged residues (Asp + Glu): 35
Total number of positively charged residues (Arg + Lys): 34
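The composition table and the two charged-residue totals are simple counts over the sequence. The sketch below shows one way to obtain them; it is not ProtParam's code, and the function names are hypothetical. Percentages are rounded to one decimal place to match the table.

```python
from collections import Counter

def composition(sequence):
    """Return {residue: (count, percent)} for a one-letter protein sequence."""
    seq = "".join(sequence.split()).upper()
    counts = Counter(seq)
    return {aa: (n, round(100 * n / len(seq), 1)) for aa, n in sorted(counts.items())}

def charged_residues(sequence):
    """Counts of (Asp + Glu, Arg + Lys), the two totals listed above."""
    counts = Counter("".join(sequence.split()).upper())
    return counts["D"] + counts["E"], counts["R"] + counts["K"]
```

As a check against the table, 23 Ala in 331 residues gives 6.9 % and 34 Val gives 10.3 %, and Asp 19 + Glu 16 and Arg 13 + Lys 21 give the totals 35 and 34.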
Atomic composition:
Carbon C 1609
Hydrogen H 2603
Nitrogen N 443
Oxygen O 487
Sulfur S 13
Total number of atoms: 5155
Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.
Ext. coefficient 39670
Abs 0.1% (=1 g/l) 1.091, assuming all pairs of Cys residues form cystines
Ext. coefficient 39420
Abs 0.1% (=1 g/l) 1.084, assuming all Cys residues are reduced
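The two extinction coefficients above can be reproduced from the residue counts in the composition table (Tyr 8, Trp 5, Cys 4). The sketch assumes the commonly used molar extinction values at 280 nm of 1490 for Tyr, 5500 for Trp and 125 for each cystine; these constants and the function names are assumptions of this example, not something stated in the result.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1) assumed in this estimate.
EXT_TYR, EXT_TRP, EXT_CYSTINE = 1490, 5500, 125

def extinction_coefficients(n_tyr, n_trp, n_cys):
    """Return (all Cys paired as cystines, all Cys reduced)."""
    reduced = n_tyr * EXT_TYR + n_trp * EXT_TRP
    oxidised = reduced + (n_cys // 2) * EXT_CYSTINE  # two Cys per cystine
    return oxidised, reduced

def absorbance_0_1_percent(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/l solution over a 1 cm path length."""
    return ext_coefficient / molecular_weight
```

With 8 Tyr, 5 Trp and 4 Cys this gives 39670 and 39420, and dividing by the molecular weight of 36362.8 gives 1.091 and 1.084, in agreement with the values listed.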
Estimated half-life:
The N-terminal residue of the sequence considered is M (Met).
The estimated half-life is: 30 hours (mammalian reticulocytes, in vitro).
>20 hours (yeast, in vivo).
>10 hours (Escherichia coli, in vivo).
The instability index ( II ) is computed to be 15.96
This classifies the protein as stable.
Aliphatic index: 102.72
Grand average of hydropathicity (GRAVY): -0.028
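The aliphatic index and GRAVY values follow directly from the amino acid composition. The sketch below assumes the Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula, X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent; the function names are hypothetical and this is not ProtParam's own code.

```python
# Kyte-Doolittle hydropathy values per residue.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = "".join(sequence.split()).upper()
    return sum(HYDROPATHY[aa] for aa in seq) / len(seq)

def aliphatic_index(sequence):
    """X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), with X in mole percent."""
    seq = "".join(sequence.split()).upper()
    weighted = (seq.count("A") + 2.9 * seq.count("V")
                + 3.9 * (seq.count("I") + seq.count("L")))
    return 100 * weighted / len(seq)
```

Applied to any sequence with the composition reported above (for example Ala 23, Val 34, Ile 25, Leu 31 out of 331 residues), these formulas give an aliphatic index of 102.72 and a GRAVY of -0.028, consistent with the listed results.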
Peptide primary structure
Introduction: PepDraw is a tool that was developed to facilitate the study of the chemical structure and properties of peptides. It allows users to draw the primary chemical structure of an amino acid sequence and predict some chemical properties such as mass, charge, and hydrophobicity. PepDraw was designed to be a powerful yet user-friendly tool for peptide analysis. It is particularly useful for teaching students about the structure and properties of the amino acids.
Go to http://www.tulane.edu/~biochem/WW/PepDraw/index.html
Paste the given sequence
Click on "Draw peptide"
Record the properties.
Result: The peptide structure and its properties, analysed using PepDraw, are as follows
Isoelectric point (pI):
+46.33 Kcal * mol-1
4595 M-1 * cm-1
4470 M-1 * cm-1
Peptide structure image: [image not reproduced]
Random protein sequence generation
Introduction: RandSeq is a tool which generates a random protein sequence. One can use equal amounts of all amino acids to generate a random sequence, or specify the percentage of each amino acid. The tool generates random protein sequences which can be analysed using different tools.
Go to http://web.expasy.org/randseq/
Choose the parameters / composition of each amino acid
Click on "Submit"
Analyse the results
Result: A random protein sequence generated with equal composition of all amino acids, to be analysed further, is as follows
Virtual Sequence: RND29006
ID RND_29006 Unreviewed ; 200 AA.
AC RND29006 ;
DE Randomly generated sequence, created by ExPASy WWW server tool
DE RandSeq for 184.108.40.206.
CC -!- MISCELLANEOUS: This sequence was generated using equal composition for all amino acids.
SQ SEQUENCE 200 AA ; 23795 MW ; FED65773033E0235 CRC64 ;
WFWYDMPEME QDMDSKQVYM GRGKDDIICT INNRYPAFHC LNCPNMQMTE NNRFGRCRDS
TLWWSQHASA NCPQMYRCKP NGEAHIWEEY VCNWTWKKIK GFPGMVYKIP WPDHSITLFI
DMELGLQCLT KSSHAFPLMV PFARGHYETS WHHGYCQVGT VVDQFAWSQQ TCFEAHVIFI
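A random protein sequence of this kind can be sketched in a few lines. This is only an illustration of the idea and makes no claim about how RandSeq works internally: each residue is drawn independently, so "equal composition" here means equal probability per position rather than exactly equal counts. The function name and its parameters are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_protein(length, percentages=None, seed=None):
    """Draw a random sequence; residues are equally likely unless percentages are given."""
    rng = random.Random(seed)  # a seed makes the result reproducible
    if percentages is None:
        weights = [1.0] * len(AMINO_ACIDS)
    else:
        unknown = set(percentages) - set(AMINO_ACIDS)
        if unknown:
            raise ValueError("unknown residues: " + ", ".join(sorted(unknown)))
        # Residues missing from the mapping get weight 0 and are never drawn.
        weights = [percentages.get(aa, 0.0) for aa in AMINO_ACIDS]
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))
```

A call such as `random_protein(200)` produces a 200-residue sequence comparable in length to the entry above, and passing a mapping like `{"A": 50, "G": 50}` restricts the draw to the listed residues in the given proportions.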