Universitat Internacional de Catalunya
Structural Bioinformatics
Other languages of instruction: Catalan, Spanish
Teaching staff
Questions will be answered before or after class. Questions that are not answered in person will be answered via videoconference.
Introduction
This course provides the fundamental concepts needed to use structural information in biomedical and clinical problems. To that end, we address the fundamental relationship between structure and function by analyzing the three-dimensional structures of proteins, DNA and RNA. We will see how to process the three-dimensional structures of macromolecules both graphically and automatically. We will then cover the different ways of generating structural information, including cutting-edge methods based on Artificial Intelligence algorithms. Finally, we will see how structural bioinformatics can be used to address biomedical problems related to the molecular understanding of hereditary diseases.
Pre-course requirements
It is recommended to have completed and passed:
Introduction to bioinformatics
Biomolecular interactions
Objectives
- Understanding how the relationship between structure and function is the basis for the application of structural bioinformatics methods in biomedicine
- Knowing how to generate structural information: difference between extraction and prediction
- Using structural information to analyze genetic information and solve associated biomedical/clinical problems
Competences/Learning outcomes of the degree programme
- CB01 - Students must demonstrate that they have and understand knowledge in an area of study that is based on general secondary education, and it tends to be found at a level that, although it is based on advanced textbooks, also includes some aspects that involve knowledge from the cutting-edge of their field of study.
- CB03 - Students must have the ability to bring together and interpret significant data (normally within their area of study) to issue judgements that include a reflection on significant issues of a social, scientific and ethical nature.
- CB04 - That students can transmit information, ideas, problems and solutions to specialist and non-specialist audiences.
- CB05 - That students have developed the necessary learning skills to undertake subsequent studies with a high degree of autonomy.
- CE07 - To apply statistical tools to Health Science studies.
- CE19 - To be aware of the principles of biomedical science related to health and learn how to work in any field of Biomedical Sciences (biomedical companies, bioinformatics laboratories, research laboratories, clinical analysis companies, etc.).
- CG07 - To incorporate basic concepts related to the field of biomedicine both at a theoretical and an experimental level.
- CG10 - To design, write up and execute projects connected to the field of Biomedical Sciences.
- CG11 - To be aware of basic concepts from different fields connected to biomedical sciences.
- CT01 - To develop the organisational and planning skills that are suitable in each moment.
- CT02 - To develop the ability to resolve problems.
- CT03 - To develop analytical and summarising skills.
- CT04 - To interpret experimental results and identify consistent and inconsistent elements.
- CT05 - To use the internet as a means of communication and a source of information.
- CT06 - To know how to communicate, give presentations and write up scientific reports.
- CT07 - To be capable of working in a team.
- CT08 - To reason and evaluate situations and results from a critical and constructive point of view.
- CT09 - To have the ability to develop interpersonal skills.
- CT10 - To be capable of autonomous learning.
- CT11 - To apply theoretical knowledge to practice.
- CT12 - To apply scientific method.
- CT13 - To be aware of the general and specific aspects related to the field of nutrition and ageing.
- CT14 - To respect the fundamental rights of equality between men and women, and the promotion of human rights and the values that are specific to a culture of peace and democratic values.
Learning outcomes of the subject
The following are considered as specific learning outcomes for this subject:
- Learning to analyse biomedical problems and to identify the aspects that require the use of structural information. Learning to obtain the structural information of interest by applying techniques from the field of structural bioinformatics, such as the extraction of structural information, molecular modelling/prediction, database management/big data generation, etc. Applying knowledge to understanding the molecular basis of hereditary diseases and to predicting the pathogenicity of genetic variants.
Syllabus
Bioinformatics is a tool for managing heterogeneous information and extracting knowledge about clinical/biomedical problems, from image diagnosis to identifying pathogenic genetic variants. In Structural Bioinformatics we focus on problems that occur at the molecular level. Most of them are related to the molecular view of disease, especially genetic disease. Examples: pathogenic variants in coding and non-coding sequences. These variants cause molecular malfunction; to understand this malfunction we first need to understand function, and this means looking at structure. It is important to note that the information we have about structure determines what we know about function. That is: quality determines usefulness.
Relevant questions: (i) how to determine which level of information we want; (ii) how to extract 3D information; (iii) how to draw conclusions from what we have (seeking consistency between different sources of evidence), particularly in the context of biomedical applications.
A. STRUCTURE: THE LOCUS OF FUNCTION
1) PRINCIPLES OF STRUCTURAL BIOLOGY
1.1 Function at the sequence level.
- Master Class
- Domains, disordered regions, repetitive regions.
- Value and limitations. Sequence-level information is useful for a first assessment of the impact of protein variants, but inadequate for deeper understanding. It is of no value for drug design or for a mechanistic interpretation of disease.
- Case Methods
- UniProt: the most important resource for general information on proteins.
- Specific resources for functional domains, disordered regions, and repetitive regions: Pfam and SMART.
1.2 The structure of macromolecules
- Master Class
- What do we mean by structure? Examples of 3D for different macromolecules, e.g., protein, RNA, DNA, and chromatin structure.
- Key relevant aspects of function that depend on structure: stability and interactions (the role of environment)
1.3 Relating structure to function.
- Master Class
- The broad spectrum of protein function. Antibody, hemoglobin, α-galactosidase, BRCA1, etc. Moonlighting proteins.
- DNA: the structure of the genetic code
- Chromatin: packing is the key to compaction. But it also requires unpacking.
- Case Methods.
- PDB: The experimental understanding of structure and the storage databases. PDB. EMDB.
1.4 What parts of structure are relevant for function? It depends on the protein. How does bioinformatics allow us to know them?
- Master Class.
- Understanding structure I. (i) The hierarchy of protein structure: 1D, 2D, 3D, and quaternary. (ii) Core/Surface. (iii) The need to have a 3D structure in many problems.
- Understanding structure II. (i) The network of atomic interactions. Not all interactions are the same (ii) The energetics of 3D structures. Protein stability predictions (iii) Protein-complexes.
- Understanding structure III. Disordered proteins/regions. Resources
B. USING BIOINFORMATICS TO GENERATE STRUCTURAL INFORMATION
2) EXTRACTING INFORMATION FROM STRUCTURE: VISUAL EXPLORATION AND COMPUTATIONAL ANALYSES.
- Master Class.
- How to retrieve experimental structural information: (i) by protein/gene name (UniProt); (ii) by BLAST sequence search
- Possible situations: is structure always available?
- Choose your visualization software: PyMOL / VMD
- What to look at? The different representations we can choose: Calpha, main chain, all atoms, focus on local regions.
- Clarifying your goal (e.g., understanding the contribution of a residue to stability or to interactions with other molecules) defines the amount of work
- Tools: Arpeggio, STRING.
- Be careful with what you analyze: some structural features are not biologically meaningful, e.g. crystal contacts. Working with the monomer is the safest option; the next steps are reading the original papers and checking specialized websites like Eldorado (for NMR) or the biological complexes at the EBI.
- Case Methods:
- PyMOL: Representing protein sequence variants
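The Cα representation mentioned among the possible views above can also be obtained programmatically. The following is a minimal sketch, assuming the structure is available as PDB-format text with standard fixed-width ATOM records; the names `CAlpha`, `extract_calpha`, `ca_distance` and the demo helper `format_atom_line` are illustrative choices, not part of the course material or of any library.

```python
import math
from typing import NamedTuple


class CAlpha(NamedTuple):
    """One C-alpha atom: chain, residue name, residue number, coordinates."""
    chain: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float


def extract_calpha(pdb_text: str) -> list:
    """Return the C-alpha atoms found in the ATOM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Keep only ATOM records that are long enough to hold coordinates.
        # HETATM records are skipped, which also avoids calcium ions named CA.
        if not line.startswith("ATOM  ") or len(line) < 54:
            continue
        # Atom name occupies columns 13-16 (0-based slice 12:16).
        if line[12:16].strip() != "CA":
            continue
        atoms.append(CAlpha(
            chain=line[21],
            res_name=line[17:20].strip(),
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


def ca_distance(a: CAlpha, b: CAlpha) -> float:
    """Euclidean distance between two C-alpha atoms (same units as the input)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def format_atom_line(serial, name, res, chain, res_seq, x, y, z) -> str:
    """Hypothetical demo helper: build a fixed-width ATOM line for toy input."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Toy input with made-up coordinates, for illustration only.
toy_pdb = "\n".join([
    format_atom_line(1, " N", "MET", "A", 1, 0.0, 0.0, 0.0),
    format_atom_line(2, " CA", "MET", "A", 1, 1.0, 2.0, 3.0),
    format_atom_line(3, " CA", "GLY", "A", 2, 4.0, 6.0, 3.0),
])
toy_calphas = extract_calpha(toy_pdb)
print(len(toy_calphas), ca_distance(toy_calphas[0], toy_calphas[1]))
```

Reducing a structure to its Cα trace keeps one point per residue, which is often enough to inspect the overall fold or to measure distances between residues of interest before moving to an all-atom view.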
3) GENERATING STRUCTURAL INFORMATION: STRUCTURE PREDICTION
3.1 From sequence to structure: a key problem.
- Master Class.
- How sequence determines 3D structure. Why is prediction such a difficult problem?
- Alternatives: the most fundamental relationship in molecular biology: conservation vs structure/stability
- Another fundamental principle: the conservation of structure beyond sequence identity
- Case Methods:
- Homology. Sequence alignment and comparison.
- Building and understanding MSAs: generalizing sequence comparison to many sequences. A difficult problem: NP-complete. Main tools: Clustal, T-Coffee, MUSCLE, etc. Representing and editing MSAs.
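Sequence comparison between two sequences, the building block that MSA tools generalize, can be sketched with the classic Needleman-Wunsch global alignment. This is a minimal illustration only: the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for simplicity, whereas real tools use substitution matrices and more elaborate gap penalties. The function names are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global pairwise alignment. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns as a percentage of the alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


# Toy sequences, for illustration only.
nw_score, nw_a, nw_b = needleman_wunsch("ACGT", "AGT")
print(nw_score, nw_a, nw_b, percent_identity(nw_a, nw_b))
```

The dynamic-programming table grows with the product of the sequence lengths, which is manageable for two sequences but becomes intractable when extended exactly to many sequences; this is why MSA tools rely on heuristics.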
- Case Methods:
- Structure comparison. Going beyond sequence similarity: aligning different versions of the same structure. Finding how similar different structures are. Structure comparison: DALI. Structure classification: CATH, SCOP.
3.2 Homology/comparative modeling.
- Master Class
- Predicting the structure of close homologs
- The comparative modeling pipeline
- Quality checks: PROSA, program listings, etc
- Online modeling: SWISS-MODEL
- Predicting the structure of remote homologs/paralogs: threading
- The fold space
- Threading: principles and tools.
- Master Class
- Ab initio structure prediction. AlphaFold: the new paradigm.
3.3. Disorder predictions.
- Master Class.
- Disordered state
- Predicting disorder
4) APPLYING STRUCTURAL BIOINFORMATICS IN BIOMEDICINE
5) APPLYING MACHINE LEARNING TO UNDERSTAND DISEASE
- Master Class.
- A mild introduction to artificial intelligence and machine learning.
- Case Methods.
- Building a machine learning model: four main steps
- Retrieving protein sequence variants. ClinVar. HGMD. Mapping variants to our protein: what information do we need?
- Generating pathogenicity predictions for our variants. PolyPhen-2, SIFT, PON-P2, CADD, REVEL, EVE, etc.
- Predicting functional impact vs. clinical phenotype.
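Mapping a variant onto a protein requires at least the reference residue, its position, and the substituted residue, and the reference residue should agree with the sequence being used. The sketch below illustrates this check under stated assumptions: variants are written as simple missense strings in three-letter protein notation (e.g. `p.Arg175His`), positions are 1-based, and the names `MissenseVariant`, `parse_missense` and `matches_sequence` are hypothetical. The toy sequence is invented for the example.

```python
import re
from typing import NamedTuple

# Standard three-letter to one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

MISSENSE_PATTERN = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


class MissenseVariant(NamedTuple):
    ref: str       # reference residue, one-letter code
    position: int  # 1-based position in the protein sequence
    alt: str       # substituted residue, one-letter code


def parse_missense(text: str) -> MissenseVariant:
    """Parse a string like 'p.Arg175His'; raise ValueError for anything else."""
    m = MISSENSE_PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"not a simple missense variant: {text!r}")
    ref3, pos, alt3 = m.groups()
    if ref3 not in THREE_TO_ONE or alt3 not in THREE_TO_ONE:
        raise ValueError(f"unknown residue code in {text!r}")
    return MissenseVariant(THREE_TO_ONE[ref3], int(pos), THREE_TO_ONE[alt3])


def matches_sequence(sequence: str, variant: MissenseVariant) -> bool:
    """True if the variant's reference residue is found at its position."""
    if not 1 <= variant.position <= len(sequence):
        return False
    return sequence[variant.position - 1] == variant.ref


# Toy protein sequence, for illustration only.
toy_sequence = "MKRLH"
toy_variant = parse_missense("p.Arg3His")
print(toy_variant, matches_sequence(toy_sequence, toy_variant))
```

A mismatch between the reported reference residue and the sequence usually signals that the variant was described against a different isoform or numbering scheme, which is worth resolving before any pathogenicity prediction is run.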
Lectures: 15 hours
Case methods: 15 hours
Practical sessions: 0 hours
Teaching and learning activities
In person
- Lectures: 50-minute presentation of a theoretical topic by the teacher.
- Clinical cases or case methods (CM): Presentation of a real or imaginary situation. Students work on the questions asked in small groups or in active interaction with the teacher and the answers are discussed. The teacher actively intervenes and, if necessary, contributes new knowledge.
Virtual education (VE): Online material that the student can consult from any computer, at any time, and which will contribute to self-learning of concepts related to the subject.
Evaluation systems and criteria
In person
1. Students in the first call:
Resolution of case methods: 35%
Final exam: 50%
Presentation of a scientific article: 15%
2. Students in the second or subsequent call: same criteria as in the first call. The grade for continuous assessment and participation will be kept.
- General points to keep in mind about the assessment system:
- In order to be able to make an average, a minimum grade of 5 must be obtained in the final exam.
- Attendance at case methods is mandatory.
- Participation in class is understood as the contribution of interesting ideas or the raising of pertinent questions that help to improve the quality of the session, whether it is a master class or case methods.
- Attendance at theoretical classes is not mandatory, but attendees must abide by the rules indicated by the teachers. If you do not arrive on time, enter quietly without disturbing or interrupting the class. If at least 65% of the students do not attend, class participation will be given a very low score.
Bibliography and resources
- Kessel, A., & Ben-Tal, N. (2018). Introduction to proteins: structure, function, and motion. Chapman and Hall/CRC.
- Xiong, J. (2006). Essential bioinformatics. Cambridge University Press.
- Creighton, T. E. (2010). The biophysical chemistry of nucleic acids & proteins. Helvetian Press.
- Tompa, P., & Fersht, A. (2010). Structure and function of intrinsically disordered proteins.