Knowledge of the secondary structure of a protein is necessary for
a complete understanding of its function. However, far more protein sequences
are known than structures, the main reason being the non-availability
of protein crystals that would allow the structure to be solved by X-ray
diffraction. The number of proteins with known structure (~5000) is a
very small fraction of all the protein sequences (>100,000) that have been
identified so far. The Protein Data Bank stores the data related to the known
protein structures. A number of statistical methods have been proposed
for prediction of the secondary structure of a protein chain. This is a mammoth
task considering the various ways in which a polypeptide may fold and also
the specific guidelines, if any, that it follows in the process of forming
the three-dimensional structure from the elongated polypeptide chain.
The above discussion shows that identification of the secondary
structure of a Protein is an open problem on which a large number of researchers
around the world are currently working. Work in this field was initiated
by the Chief Investigator of this project at IIT Kharagpur in 1996 with
a group consisting of Bio-chemists and Computer specialists. In the last 3
years this team has made considerable progress and worked out a strategy
to address the problem. The current project proposal has been developed
on the basis of this framework.
The main tasks of the proposed project are summarised below:
PART II : PARTICULARS OF INVESTIGATORS
Indicate whether Principal Investigator / Co-investigator : Principal Investigator
Designation : Professor
Department : Computer Science & Technology
Institute / University : Bengal Engineering College ( Deemed University )
Address : P.O-Botanic Garden, Howrah-711103, West-Bengal
PIN : 711103
Telephone : 033-668-5437 (Off)
Telephone : 033-668-5426 (Res)
Fax : 033-668-2916
e.mail : ppc@ppc.becs.ac.in
& 033-668-4564
No. of Projects being handled at present : 5 ( 3 funded by US multinationals Intel, Nortel, Fujitsu & 2 funded by DRDO, Govt. of India ), & SAS ( Bangalore ).
14. Name : Prof. L. M. Patnaik
Indicate whether Principal Investigator / Co-investigator : Co-investigator / Consultant
Designation : Professor, Microprocessor Application Lab
Department : Department of Computer Science and Automation
Institute / University : Indian Institute of Science
Address : Bangalore
PIN : 560012
Telephone : 080-3342085 / 080-3341683   Telex : 080-3341683   Fax :
e.mail : lalit@micro.iisc.ernet.in / lalit@postoffice.iisc.ernet.in
No. of Projects being handled at present : 1
15. Name : Prof. Swagata Dasgupta
Indicate whether Principal Investigator / Co-investigator : Co-investigator
Designation : Asst. Professor
Department : Chemistry
Institute / University : IIT Kharagpur
Address : Kharagpur, West-Bengal
PIN : 721302
Telephone : 03222-55221 to 55224   Telex :   Fax :
e.mail : swagata@che.iitkgp.ernet.in
No. of Projects being handled at present : 1
Note : Use separate page, if more investigators are involved.
PART III : TECHNICAL DETAILS OF PROJECT
( Under the following heads on separate sheets )
Since 1996 Prof. P. Pal Chaudhuri has been building a research group of Bio-chemists and Computer Scientists at IIT Kharagpur to develop a CA based model of a protein chain. After completing his assignment as Visiting Faculty at Intel Research Labs, Portland, USA, Prof. Pal Chaudhuri joined Bengal Engineering College ( Deemed University ) and built a multi-institutional research group drawn from Bengal Engineering College ( DU ), IIT Kharagpur, Jadavpur University, and ISI Calcutta. This research team has started developing a CA based model to study the Protein Folding problem. Very encouraging results have been derived from this model.
FROM THE FOUNDATION LAID BY THIS RESEARCH GROUP UNDER
THE SUPERVISION OF PROF. P PAL CHAUDHURI.
TO PREDICT THE SECONDARY FOLDED STRUCTURE OF A
PROTEIN CHAIN FROM THE GIVEN PRIMARY STRUCTURE.
A Protein is a long chain of amino acids. Most of its structural and chemical properties depend on the sequence of amino acids present in it. This linear structure is referred to as the primary structure. Depending on the interaction between these amino acids and the solvent in a cell, the Protein Chain folds itself into unique structures. The folding may generate helices (α-helix), parallel strands (β-sheets), or random structures. This folded structure is referred to as the Secondary Structure. The enzymatic or catalytic properties of a protein depend on this secondary structure. Hence identification of the secondary folded structure of a protein is of paramount importance for bio-chemists, medical scientists, and drug researchers.
Against the above background, the problem we address in this project is: given the Primary Structure of a Protein Chain, predict its Secondary (Folded) Structure.
16.3 Objectives: The secondary structure of a Protein is usually determined by X-ray crystallography. However, out of the thousands of proteins identified so far, only a few can be crystallized and studied with X-ray crystallography. Besides, crystallization often destroys the original uncrystallised structure.
Against this background, researchers around the world have tried to build a theoretical model to study the Protein Folding problem. However, to date no theoretical model has evolved that predicts the secondary structure of a protein with a reasonable level of confidence.
In view of this situation we have been developing a totally new model to address the problem. The model is based on Cellular Automata ( CA ). The Chief Investigator of this project has been doing research in the field of CA since the late 80s. Against this background, the project objective has been set as follows:
The amino acids and their chain forming a given protein will be modelled
based on the theoretical framework of Cellular Automata. Based on the data
available in the Protein Data Bank, the model will be tuned and made robust.
A software package will then be developed based on this robust model to
predict the secondary structure of a Protein Chain with 70 to 80% correct
prediction.
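The proposal does not state which CA rules or neighbourhoods the model uses. Purely as an illustration of what a Cellular Automata framework means computationally, the following minimal sketch assumes a one-dimensional, two-state, three-neighbourhood CA with null boundaries in which each cell may follow its own rule. The rule numbers 90 and 150, and the function names, are assumptions chosen for the example and are not taken from the project.

```python
from typing import List


def ca_step(state: List[int], rules: List[int]) -> List[int]:
    """Apply one synchronous update to a 1-D, two-state, 3-neighbourhood CA.

    Cell i follows the Wolfram rule number rules[i]. Cells beyond either
    end of the array are treated as permanently 0 (null boundary).
    """
    if len(state) != len(rules):
        raise ValueError("one rule per cell is required")
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        # Encode (left, self, right) as a number 0..7 and look up that
        # bit of the rule number to get the next state of the cell.
        neighbourhood = (left << 2) | (state[i] << 1) | right
        nxt.append((rules[i] >> neighbourhood) & 1)
    return nxt


def evolve(state: List[int], rules: List[int], steps: int) -> List[List[int]]:
    """Return the list of states visited, starting with the initial one."""
    history = [list(state)]
    for _ in range(steps):
        history.append(ca_step(history[-1], rules))
    return history


if __name__ == "__main__":
    # Hypothetical 4-cell hybrid CA: alternate rules 90 and 150.
    for row in evolve([0, 1, 0, 0], [90, 150, 90, 150], 3):
        print(row)
```

In such a sketch a chain of cells could stand for a chain of residues, with the per-cell rule playing the role of a residue-specific parameter. How the actual model maps amino acids to CA parameters is not specified here.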
Knowledge of the three-dimensional structure of proteins is necessary
for a complete understanding of their function. However, far more protein
sequences are known than structures, the main reason being the non-availability
of protein crystals that would allow the structure to be solved by X-ray
diffraction. The number of proteins with known structure (~5000) is a
very small fraction of all the protein sequences (>100,000) that have been
identified, and this has also been considered a limitation to the use of
statistical methods in secondary structure prediction [1].
This is a mammoth task considering the various ways in which a polypeptide
may fold and also the specific guidelines, if any, that it follows in the
process of forming the three-dimensional structure from the elongated polypeptide
chain.
Some researchers believe that secondary structural
elements like the α-helix or the β-sheet
form first and then bring distant parts of the protein together, which
then get involved in other non-covalent interactions to form a globular
structure. Another school of thought considers a hydrophobic "collapse"
to occur that would tend to bury all the hydrophobic amino acid residues
so as to minimise their exposure to the solvent. The resulting coils
and loops would then in a sense "settle down" to form the structure of
the protein by formation of the various secondary structural elements along
with other non-covalent interactions.
Prediction of protein structure and function from its amino acid sequence,
or from the nucleotide sequence of the corresponding gene, is one of the
largest challenges in biophysics [2,3]. There are many approaches to this
problem. One is a thermodynamic approach, which is based on computing
the free energies of all possible conformations and selecting the structure
that lies at the minimum of this free energy profile. A review by Neumaier [4] states that the "geometry defined by the global minimum of potential energy surface is the correct geometry describing the conformation observed in folded proteins". This underlying assumption forms the foundation of the molecular mechanics approach to predicting protein structure. The general principle behind molecular mechanics is that a computer simulation sorts through a large number of different folding conformations, gradually increasing the stability of the conformations, until the native state is reached.
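The idea of enumerating conformations and keeping the one of lowest energy can be illustrated on a toy problem. The sketch below does not use a molecular mechanics force field. It assumes the much simpler two-dimensional HP lattice model, in which every residue is either hydrophobic (H) or polar (P), a conformation is a self-avoiding walk on a square grid, and the energy is -1 for every pair of H residues that are lattice neighbours without being chain neighbours. All names are illustrative.

```python
from itertools import product

# Unit moves on the square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coords(moves):
    """Return the lattice coordinates of a walk, or None if it self-intersects."""
    pos = (0, 0)
    coords = [pos]
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in coords:
            return None
        coords.append(pos)
    return coords


def energy(seq, coords):
    """Count -1 for each non-bonded H-H contact on the lattice."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):  # skip chain neighbours
            if seq[i] == "H" and seq[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:
                    e -= 1
    return e


def best_fold(seq):
    """Exhaustively search all walks and return (energy, moves) of a minimum."""
    best = None
    for moves in product("UDLR", repeat=len(seq) - 1):
        coords = fold_coords(moves)
        if coords is None:
            continue
        e = energy(seq, coords)
        if best is None or e < best[0]:
            best = (e, "".join(moves))
    return best


if __name__ == "__main__":
    print(best_fold("HPPHPH"))
```

The number of walks grows as 4^(n-1), so exhaustive search of this kind is only feasible for very short chains, which is one way to see why the full problem is described as a mammoth task.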
Another approach involves a homology search where the query sequence
is compared with sequences of proteins whose structures and/or functions
are known, and if there is a high similarity, the corresponding properties
are assumed to be identical. Statistics is used here to quantify the extent
of similarity [5,6]. Sequence alignment is a commonly used
method. The rapidly increasing volume of nucleotide and amino acid sequence
data has become a major source of information. The reason for attempting
a sequence alignment is that the sequences may share a common origin; we say
they are homologous if they have some sort of evolutionary
relationship. In that case there may be conserved amino acid residues that
facilitate sequence alignment, allowing the structure to be predicted
to some extent. It is then possible to compare the shared sequences
with one another to try to define the common domains that provide a particular
function [7-14].
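As a minimal sketch of how similarity between a query and a known sequence can be quantified, the following code performs a global alignment by dynamic programming (the Needleman-Wunsch algorithm) and reports percent identity. The scoring values (match +1, mismatch -1, gap -2) are assumptions made for the example; practical homology searches use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue in both rows."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    al_a, al_b, s = needleman_wunsch("ACDE", "ADE")
    print(al_a, al_b, s, percent_identity(al_a, al_b))
```

A high percent identity over a long alignment is the kind of evidence on which the corresponding structural properties are assumed to carry over from the known protein to the query.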
In the statistical approach a propensity is defined for an entity that may be either an individual amino acid residue or a specific sequence of residues. This entity is considered to have a certain structure/function and a value for the overall propensity is computed. The query sequence is subsequently assigned a structure based on such computations. Reddy et al. [15] have used a statistical approach, based on residue properties, to predict the effects of a substitution mutation. Statistical methods are widely used in homology searching. Selbig and Argos [16] used clustering and Bayesian classification to develop correlations between sequence patterns and structural motifs. Chou and Elrod [17] have developed a classifier for prediction of membrane protein type and sub-cellular location. Kihara et al. [18] have combined statistics and physico-chemical properties to detect the membrane protein class, defined by the number of transmembrane segments, and to locate transmembrane segments in an amino acid sequence.
Statistical methods are based on deriving empirical relationships between
the primary and secondary structure of a protein [19-22]. They use the
database of proteins of known structure to derive these relationships.
These methods have the advantage of consistently and explicitly using the
ever-expanding databases. However, they have the disadvantage of not using
the physico-chemical knowledge of residue interactions.
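One common way to make the notion of propensity concrete, assumed here for illustration, is the Chou-Fasman style ratio: the fraction of occurrences of a residue found in a given structural state, divided by the fraction of all residues found in that state. A value above 1 means the residue favours that state. The sketch below computes this ratio from sequences paired with per-residue structure labels. The single-letter labels H, E and C and the tiny data set are hypothetical.

```python
from collections import Counter


def propensities(sequences, structures):
    """Return {(residue, state): propensity} from labelled training chains.

    propensity = (n(residue, state) / n(residue)) / (n(state) / N)
    """
    residue_total = Counter()
    state_total = Counter()
    joint = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for aa, s in zip(seq, ss):
            residue_total[aa] += 1
            state_total[s] += 1
            joint[(aa, s)] += 1
    n = sum(residue_total.values())
    table = {}
    for aa in residue_total:
        for s in state_total:
            observed = joint[(aa, s)] / residue_total[aa]
            expected = state_total[s] / n
            table[(aa, s)] = observed / expected
    return table


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Protein Data Bank.
    table = propensities(["AAGG"], ["HHHC"])
    for key in sorted(table):
        print(key, round(table[key], 3))
```

Because the table is built only from counts in the database, it reflects the empirical character of these methods and, as noted above, carries no explicit physico-chemical knowledge of residue interactions.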
17.2 National Status:
In India there are a few laboratories working on the analysis of protein
structures. Dr. P. Balaram of the Molecular Biophysics Unit of IISc, Bangalore
works on the recognition of novel conformational features in proteins.
Dr. M. Bansal, also at the Molecular Biophysics Unit, recently published
a paper in which the structural and sequence characteristics of α-helices
were studied. This is required for reliable protein structure prediction
and de novo design of proteins. Dr. N. Srinivasan at MBU, IISc has worked
on the validation of protein structures. Dr. C. Mukhopadhyay of
Calcutta University is interested in computer simulation of folding
pathways, which can give important insights into the folding trajectories.
Dr. P. Chakrabarti and Dr. G. Basu of the Bose Institute in Calcutta also
work on the analysis of protein structures. There are many others working in
this field of analyzing protein structures, but protein structure prediction
still has a long way to go.
18. Work Plan
18.1 Methodology:
The methodology to solve the problem addressed in this Project is as follows:
( i ) Develop a theoretical model for a Protein.
( ii ) Validate the model and make it robust with the data available in
the Protein Data Bank.
( iii ) Develop a software package based on the CA based model to predict
the secondary structure of a given Protein Chain.
Note : We target 70 to 80% correct prediction with the proposed
model.
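The proposal does not define how the percentage of correct prediction is measured. A widely used per-residue measure, assumed here, is the three-state accuracy Q3: the percentage of residues whose predicted state (helix, strand or coil) matches the state observed in the known structure. The sketch below uses the hypothetical single-letter labels H, E and C.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings differ in length")
    if not observed:
        raise ValueError("empty structure string")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Hypothetical 8-residue chain with one mis-predicted residue.
    print(q3("HHHEECCC", "HHHEEECC"))
```

Under this measure the stated target corresponds to a Q3 value between 70 and 80 over the test proteins.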
18.2 Organisation of work elements: The Project encompasses the following
tasks:
T1: Refine the CA theory and develop a Cellular Automata ( CA ) based model of
Amino Acids.
T2: Develop a robust CA based model of a Protein Chain based on
Task T1.
T3: Predict the location of single-turn α-helices in the secondary structure of a given
protein chain based on the model developed under T2.
T4: Predict the location of multi-turn α-helices in the secondary structure of a given
protein chain.
T5: Predict the location of β-sheets in the secondary structure of a given protein
chain.
T6: Complete the software package for the model.
T7: Complete the documentation, user manual & technical report for the software
module developed.
NOTE: The framework for task T1 has already been developed. The CA based model of amino acids has been found to be quite robust. This model is being tested exhaustively to ensure that each specific biochemical property of an amino acid is reflected in a specific parameter of the CA based model. Based on this groundwork, the tasks T2 to T5 will be undertaken under the proposed project.
18.3 Suggested plan of action for utilisation of research outcome expected from the project.
with highest priority assigned to number 1 option.
Year / quarter ( Qi refers to the ith quarter of a year, i = 1, 2, 3, 4 )
Year   Q1   Q2   Q3   Q4
1st    T1   T2   T2   T3
2nd    T3   T3   T4   T4
3rd    T5   T6   T6   T7
Sl.No. / Institute / Address / Aspects / Amount / %
1. Institute : Bengal Engineering College ( D. Univ. )
   Address : Botanic Garden, Howrah-711103
   Aspects : Model, its validation, robustness & the prediction algorithm and the software package - nearly 90% of the work.
2. Institute : IIT Kharagpur
   Address : Kharagpur-721302
   Aspects : Providing consultation and checking biochemical properties of the protein parameters in the CA based model.
   Amount : 5.62 Lakhs
   % : 10%
3. Institute : IISc Bangalore
   Address : Bangalore-560012
   Aspects : Providing consultation / advice on the computing aspects.
Sl.No.  Item              Year 1        Year 2  Year 3  Year 4  Year 5  Total
1.      Computing Server  15.00 Lakhs   Nil     Nil                     15.00 Lakhs
        Sub-Total ( A )   19.00 Lakhs   Nil     Nil                     19.00 Lakhs
----------------------------------------------------------------------------------------------------
B. Recurring
B.1 Manpower ( Lakhs of Rs. )
Sl.No.  Position       No.  Consolidated Emolument  Year 1  Year 2  Year 3  Year 4  Year 5  Total
2.      Project Staff  2    8000/month              1.92    1.92    1.92                    5.76
Sub-Total ( B.1 ) = 18.82
B.2 Consumables ( Lakhs of Rs. )
Sl.No.  Item                                             Quantity  Year 1  Year 2  Year 3  Year 4  Year 5  Total
        Stationery, Hard disk, Printer, Cartridge etc.             1.50    1.50    1.50
Sub-Total ( B.2 ) = 4.50
Sub-Total ( B = B.1 + B.2 + B.3 + B.4 + B.5 ) = 41.12
Grand Total ( A + B ) = 60.12
Note :
Please give justification for each head and sub-head separately mentioned in the above table.
Financial Year : April - March
Count six months from submission of the proposal to arrive at expected time point for commencement of the project.
In case of multi-institutional project the budget estimate
to be given separately for each institution.
PART V : EXISTING FACILITIES
Sl.No.  Name of equipments / accessories  Make  Model  Funding Agency  Year of Procurement
PCs and workstations are available in the C.S.T. department of B.E. College.
However, all this equipment is heavily used for the UG and PG programmes. Further,
the work to be undertaken is highly computing intensive. Hence the support of
a powerful computing server and a workstation is essential for this
project.
PART VI : DECLARATION / CERTIFICATION
It is certified that
( a ) the research work proposed in the scheme / project does not in any way duplicate the work already done or being carried out elsewhere on the subject.
( b ) the same project has not been submitted to any other agency / ies for financial support.
( c ) the emoluments for the manpower proposed are those admissible to persons of corresponding status employed in the institute / university or as per the Ministry of Science & Technology guidelines.
( d ) necessary provision for the scheme / project will be made in the Institute / University / State budget in anticipation of the sanction of the scheme / project.
( e ) if the project involves the utilisation of genetically engineered organism, it is agreed by us that we will ensure that an application will be submitted through our Institutional Biosafety Committee and we will declare that while conducting experiments the Biosafety Guidelines of the Department of Biotechnology would be followed in toto.
( f ) if the project involves field trials / experiments / exchange of specimen, etc. we will ensure that ethical clearances would be taken from concerned ethical Committees / Competent authorities and the same would be conveyed to the Department of Biotechnology before implementing the project.
( g ) We agree to accept the Terms and Conditions as enclosed as Annexure - III.
Signature of Project Coordinator Signature of Executive Authority of
( applicable only for multi-institutional projects ) Institute / University with seal
Date : Date :
Signature of Principal Investigator :
Date :
Signature of Co - Investigator Signature of Co - Investigator
Date : Date :
PART VII : PROFORMA FOR BIODATA OF
Project Coordinator / Principal Investigator / Co- Investigators
Name : Prof. P. Pal Chaudhuri
Date of Birth : 26th Oct, 1941
Sex : Male
SC / ST : No
Educational ( Post - Graduation onwards & Professional Career )
Sl.No. Institution Place Degree Awarded Year Award / Prize /certificate
______________________________________________________________________________
Research Experience in Various institutions ( if necessary,
attach separate sheets ).
Professional and research experience, publication list is enclosed as the
Annexure -I.
Publications ( Numbers only )
Books :
Research Papers, Reports :
General articles :
Patents :
Others ( Please specify ) :
List of important publications relevant to the proposed
area of work.
Sl.No. Title of Paper Authors Reference of Journal Year of Publication
Project( s ) submitted / being pursued / carried out by Investigator
Sl.No.   Title of Project   Funding Agency   Duration ( From - To )   No. of Scientists / Associates working under the project   Total approved Cost of the Project ( in Rs. )
Cellular Automata.
Conversion   Nortel USA   1998 - 1999   2   US $ 50,000
Highlights of progress of the project ( s ) to date ( in 200 words ) for ongoing projects only
( if necessary attach separate sheets )
Place :
Date : Signature of Investigator
PART VII : PROFORMA FOR BIODATA OF
Project Consultant / Advisor
Name : Prof. L. M. Patnaik
Date of Birth :
Sex : Male
SC / ST : No
Educational ( Post - Graduation onwards & Professional Career ) : Prof. L. M. Patnaik is a very
senior professor and internationally renowned researcher. His brief bio-data is enclosed,
along with a letter stating his commitment to be associated with this project as Consultant / Adviser.
Sl.No.   Institution   Place   Degree Awarded   Year   Award / Prize / certificate
Research Experience in Various institutions ( if necessary,
attach separate sheets ).
Publications ( Numbers only )
Books :
Research Papers, Reports :
General articles :
Patents :
Others ( Please specify ) :
List of important publications relevant to the proposed
area of work.
Sl.No. Title of Paper Authors Reference of Journal Year of Publication
Project( s ) submitted / being pursued / carried out by Investigator
Sl.No.   Title of Project   Funding Agency   Duration ( From - To )   No. of Scientists / Associates working under the project   Total approved Cost of the Project ( in Rs. )
Highlights of progress of the project ( s ) to date ( in 200 words ) for ongoing projects only
( if necessary attach separate sheets )
Place :
Date : Signature of Investigator
PART VII : PROFORMA FOR BIODATA OF
Project Co- Investigators
Name : Dr. Swagata Dasgupta
Date of Birth :
Sex : Female
SC / ST : No
Educational ( Post - Graduation onwards & Professional Career )
Sl.No. Institution Place Degree Awarded Year Award / Prize /certificate
______________________________________________________________________________
NY, USA
Research Experience in Various institutions ( if necessary,
attach separate sheets ).
Research Experience:
Saha Institute of Nuclear Physics ( July 1988- December 1989)
Isolation and purification of crystallins from goat eye lens.
Rensselaer Polytechnic Institute, Troy NY, USA ( January 1990 - December 1994 )
Protein structure analysis from Protein Data Bank, Study of protein-protein
interactions.
Note: A letter from Dr. Dasgupta is enclosed ; the letter specifies her commitment
to be associated with this project as Co-Investigator.
Publications ( Numbers only )
Books : Nil
Research Papers, Reports : 15
General articles :
Patents : Nil
Others ( Please specify ) :
List of important publications relevant to the proposed
area of work.
Sl.No.   Title of Paper   Authors   Reference of Journal   Year of Publication
Project( s ) submitted / being pursued / carried out by Investigator
Sl.No.   Title of Project   Funding Agency   Duration ( From - To )   No. of Scientists / Associates working under the project   Total approved Cost of the Project ( in Rs. )
Highlights of progress of the project ( s ) to date ( in 200 words ) for ongoing projects only
( if necessary attach separate sheets )
Place :
Date : Signature of Investigator
Annexure - I
Status of the Institute submitting the Project Proposal
Annexure - II
Illustrative List of Subject Areas for the consideration
of the Project Proposal
1. AGRICULTURE & ALLIED AREAS
2.3 Non-infectious Diseases of Humans
2.8 Hospital / Contaminated Waste Management
2.9 Any Other ( Please specify )
3.1 Plant Biodiversity
3.2 Microbial Biodiversity
3.3 Pollution Control / Reclamation
3.4 Forestry and Green Cover
3.5 Any Other ( Please specify )
5. AQUACULTURE
modelling of industrial processes.
7.5 Any Other ( Please specify )
Preservation. etc.
8.2 Food Additives / Ingredients
8.3 Food borne diseases / Diagnostics
8.6 Any Other ( Please specify )
9. BASIC RESEARCH ( Please indicate subject area / Keywords )
10. BIOINFORMATICS, BIOCOMPUTING, STRUCTURAL / THEORETICAL BIOLOGY, DATABASE DEVELOPMENT, ALGORITHMS / SOFTWARE DEVELOPMENT, ETC. ( Please indicate subject area / keywords )
4. ANIMAL SCIENCES
4.1 Breeding and Genetic Enhancement
4.2 Feed and Nutrition