COMP571/BIOC571
Who
Instructor:
- Huw A. Ogilvie
- Duncan Hall 3098
- hao3@rice.edu
TA:
- Mohammadamin Edrisi
- Duncan Hall 3111
- edrisi@rice.edu
Where and when
Distribution of class materials and assignment submission will be conducted via Canvas.
Seminars will be held in Duncan Hall 1046 on Tuesdays and Thursdays, from 2:30 to 3:45 PM. (Please note that this room is different from the original venue.)
Office hours will be by appointment.
Intended audience
Anyone interested in learning about algorithms and their use in biological sequence analysis. A good understanding of both algorithms and probability is essential. Knowledge of genetics is a plus, but not essential.
Course objectives and learning outcomes
The primary objective of the course is to teach the theory behind methods in biological sequence analysis, including sequence alignment, sequence motifs, and phylogenetic tree reconstruction. By the end of the course, students are expected to understand and be able to write basic implementations of the algorithms which power those methods.
Recommended textbooks
For sequence alignment:
“Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids”, by Richard Durbin et al., Cambridge University Press, 1998.
“Problems and Solutions in Biological Sequence Analysis”, by Mark Borodovsky and Svetlana Ekisheva, Cambridge University Press, 2006.
For phylogenetics:
“Molecular Evolution: A Statistical Approach”, by Ziheng Yang, Oxford University Press, 2014.
Software for the course
Algorithms and statistics will be demonstrated using R and Python. Don’t worry if you are not fluent in either language, as no programs will have to be written from scratch.
The NumPy library for scientific computing will be used with Python. To install NumPy, first install the latest official distribution of Python 3. This can be downloaded for macOS or for Windows from Python.org, and should already be included with your operating system if you are using Linux.
Then use the Python package manager pip to install NumPy from the command line by running `pip install numpy`.
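To confirm that the installation worked, a quick check along the following lines can be run with Python 3. This is a minimal sketch rather than part of the official course materials; the helper name `numpy_status` is hypothetical and chosen only for illustration.

```python
import importlib.util


def numpy_status():
    """Return the installed NumPy version string, or None if NumPy is missing.

    numpy_status is a hypothetical helper name used only for this illustration.
    """
    # find_spec returns None when the package cannot be located.
    if importlib.util.find_spec("numpy") is None:
        return None
    import numpy
    return numpy.__version__


version = numpy_status()
if version is None:
    print("NumPy not found; try running: pip install numpy")
else:
    print("NumPy", version, "is installed")
```

If the script reports that NumPy was not found, the most likely cause is that pip installed the package for a different Python installation than the one running the script.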
Schedule
The course is organized around four themes:
- Algorithms and data structures for sequence alignment
- Algorithms and data structures for sequence motifs
- Phylogenetic distances and hierarchical clustering methods
- Optimality criteria and tree search methods
Each theme will have a corresponding homework assignment. Themes 1 and 2 will be covered in the first midterm, and themes 3 and 4 in the second midterm.
The schedule below is preliminary and may change subject to Rice University policy.
Week | Tuesday class | Thursday class | Assignment |
---|---|---|---|
2018/08/20 | Introduction to the course | Central dogma and sequence homology^{1,2} | |
2018/08/27 | Global alignment (Needleman–Wunsch)^{1} | Local alignment (Smith–Waterman)^{1} | |
2018/09/03 | Heuristic alignment (BLAST/FASTA)^{1} | Burrows–Wheeler transform^{1} | #1 issued |
2018/09/10 | Position specific score matrices^{2} | Bayesian statistics and PSSMs^{2} | |
2018/09/17 | Markov models of sequence evolution^{2} | Hidden Markov models^{2} | #1 due |
2018/09/24 | Viterbi algorithm^{2} | Forward algorithm^{2} | #2 issued |
2018/10/01 | First in-class midterm^{1,2} | Tree data structures^{3,4} | |
2018/10/08 | Midterm recess | No class (catch up midterm) | |
2018/10/15 | Simple phylogenetic distances^{3} | GTR nested model distances^{3} | #2 due |
2018/10/22 | Clustering (neighbor-joining)^{3} | Clustering (UPGMA)^{3} | #3 issued |
2018/10/29 | K-mer distance (kWIP)^{3} | Tree search (NNI, SPR, hill climbing)^{4} | |
2018/11/05 | Parsimony (Fitch’s algorithm)^{4} | The Felsenstein zone^{4} | #3 due, #4 issued |
2018/11/12 | Phylogenetic likelihood^{4} | Felsenstein’s algorithm^{4} | |
2018/11/19 | Course review | Thanksgiving recess | |
2018/11/26 | Second in-class midterm^{3,4} | TBA | #4 due |
Superscript numbers refer to the theme(s) for that day’s class or midterm. Assignments will be both issued and due before midnight on Fridays.
Grade policies
- First in-class midterm: 25%
- Second in-class midterm: 25%
- Four homework assignments: 12.5% each
Students with a strong and valid excuse for not attending a midterm will be allowed to choose one of the following options:
- Sit the midterm on a different day or time
- Adjust their grading to increase the contribution of the corresponding homework assignments to match the midterm’s contribution
- Adjust their grading to double the contribution of the alternate midterm
Students with a strong and valid excuse for being unable to submit a homework assignment will be allowed to choose one of the following options:
- Submit the homework assignment on a later day and time
- Adjust their grading to increase the contribution of the other homework assignments to match the assignment contribution
- Adjust their grading to increase the contribution of the corresponding midterm to match the assignment contribution
For both assignments and midterms, the strength and validity of excuses, and which of the above options are made available, will be determined solely by the instructor. Without a strong and valid excuse, a penalty of 10 percentage points per day will be applied to any assignment submitted after the deadline. Because each assignment is worth 12.5% of the final grade, this is equivalent to 1.25 percentage points off the final course grade per day.
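The late-penalty arithmetic can be illustrated with a short Python sketch. It assumes that lateness is counted in whole days and that a penalized score cannot fall below zero; neither detail is stated in this syllabus, and the function name `assignment_contribution` is hypothetical.

```python
ASSIGNMENT_WEIGHT = 12.5  # percent of the final grade per assignment (grade policy)
PENALTY_PER_DAY = 10.0    # percentage points off the assignment score per day late


def assignment_contribution(score_percent, days_late=0):
    """Course-grade points earned from one assignment scored out of 100.

    Assumptions (not stated in the syllabus): days_late is a whole number of
    days, and the penalized score cannot drop below zero.
    """
    penalized = max(0.0, score_percent - PENALTY_PER_DAY * days_late)
    return penalized * ASSIGNMENT_WEIGHT / 100.0


on_time = assignment_contribution(80)
two_days_late = assignment_contribution(80, days_late=2)
print(on_time, two_days_late, on_time - two_days_late)
```

Under these assumptions, an assignment scored at 80% contributes 10 points to the final grade when submitted on time and 7.5 points when submitted two days late, a difference of 2 × 1.25 = 2.5 points.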
Absence policies
Attendance is expected at every class. Attendance at the midterm exams is compulsory: without a strong and valid excuse, a student who misses a midterm will not pass the course, even if they would otherwise have received a passing grade.
