Important note: The information contained in the course syllabus, other than the absence policies, may be subject to change with reasonable advance notice, as deemed appropriate by the instructor.

Instructor:

TA:

Where and when

Distribution of class materials and assignment submission will be conducted via Canvas.

Seminars will be held in Duncan Hall 1046 on Tuesdays and Thursdays from 2:30 to 3:45 PM. (Please note that this room is different from the original venue.)

Office hours will be by appointment.

Intended audience

Anyone interested in learning about algorithms and their use in biological sequence analysis. A good understanding of both algorithms and probability is essential. Knowledge of genetics is a plus, but not essential.

Course objectives and learning outcomes

The primary objective of the course is to teach the theory behind methods in biological sequence analysis, including sequence alignment, sequence motifs, and phylogenetic tree reconstruction. By the end of the course, students are expected to understand and be able to write basic implementations of the algorithms which power those methods.

Recommended textbooks

For sequence alignment:

“Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids”, by Richard Durbin et al., Cambridge University Press, 1998.

“Problems and Solutions in Biological Sequence Analysis”, by Mark Borodovsky and Svetlana Ekisheva, Cambridge University Press, 2006.

For phylogenetics:

“Molecular Evolution: A Statistical Approach”, by Ziheng Yang, Oxford University Press, 2014.

Software for the course

Algorithms and statistics will be demonstrated using R and Python. Don’t worry if you are not fluent in either language, as no programs will have to be written from scratch.

The NumPy library for scientific computing will be used with Python. To install NumPy, first install the latest official distribution of Python 3. This can be downloaded for macOS or for Windows from Python.org, and should already be included with your operating system if you are using Linux.

Then use the Python package manager pip to install NumPy from the command line by running pip install numpy.
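To confirm that the installation worked, a short script along the following lines can be run. This is only a sketch: the function name numpy_status and the tiny test array are illustrative choices, not part of the course materials.

```python
import importlib.util
import sys


def numpy_status():
    """Return a short message saying whether NumPy can be imported."""
    # find_spec returns None when the package is not installed
    if importlib.util.find_spec("numpy") is None:
        return "NumPy not found; try running: pip install numpy"
    import numpy as np
    # A tiny computation to confirm the library actually works
    total = int(np.array([1, 2, 3]).sum())
    return f"NumPy {np.__version__} is installed (test sum = {total})"


print("Python", sys.version.split()[0])
print(numpy_status())
```

If the script reports that NumPy was not found, check that pip and the python command you are running belong to the same Python 3 installation.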

Schedule

The course is organized around four themes:

1. Algorithms and data structures for sequence alignment
2. Algorithms and data structures for sequence motifs
3. Phylogenetic distances and hierarchical clustering methods
4. Optimality criteria and tree search methods

Each theme will have a corresponding homework assignment. Themes 1 and 2 will be covered in the first midterm, and themes 3 and 4 in the second midterm.

The schedule below is preliminary and may change, subject to Rice University policy.

| Week | Tuesday class | Thursday class | Assignment |
|---|---|---|---|
| 2018/08/20 | Introduction to the course | Central dogma and sequence homology¹,² | |
| 2018/08/27 | Global alignment (Needleman–Wunsch)¹ | Local alignment (Smith–Waterman)¹ | |
| 2018/09/03 | Heuristic alignment (BLAST/FASTA)¹ | Burrows–Wheeler transform¹ | #1 issued |
| 2018/09/10 | Position specific score matrices² | Bayesian statistics and PSSMs² | |
| 2018/09/17 | Markov models of sequence evolution² | Hidden Markov models² | #1 due |
| 2018/09/24 | Viterbi algorithm² | Forward algorithm² | #2 issued |
| 2018/10/01 | First in-class midterm¹,² | Tree data structures³,⁴ | |
| 2018/10/08 | Midterm recess | No class (catch up midterm) | |
| 2018/10/15 | Simple phylogenetic distances³ | GTR nested model distances³ | #2 due |
| 2018/10/22 | Clustering (neighbor-joining)³ | Clustering (UPGMA)³ | #3 issued |
| 2018/10/29 | K-mer distance (kWIP)³ | Tree search (NNI, SPR, hill climbing)⁴ | |
| 2018/11/05 | Parsimony (Fitch’s algorithm)⁴ | The Felsenstein zone⁴ | #3 due; #4 issued |
| 2018/11/12 | Phylogenetic likelihood⁴ | Felsenstein’s algorithm⁴ | |
| 2018/11/19 | Course review | Thanksgiving recess | |
| 2018/11/26 | Second in-class exam³,⁴ | TBA | #4 due |

Superscript numbers refer to the theme(s) for that day’s class or midterm. Assignments will be both issued and due before midnight on Fridays.

Grading

• First in-class midterm: 25%
• Second in-class midterm: 25%
• Four homework assignments: 12.5% each

Students with a strong and valid excuse for not attending a midterm will be allowed to choose one of the following options:

• Sit the midterm on a different day or time
• Adjust their grading to increase the contribution of the corresponding homework assignments to match the midterm’s contribution
• Adjust their grading to double the contribution of the other midterm

Students with a strong and valid excuse for being unable to submit a homework assignment will be allowed to choose one of the following options:

• Submit the homework assignment on a later day and time
• Adjust their grading to increase the contribution of the other homework assignments to match the assignment’s contribution
• Adjust their grading to increase the contribution of the corresponding midterm to match the assignment’s contribution

For both assignments and midterms, the strength and validity of excuses, and which of the above options are made available, are solely at the instructor’s discretion. Without a strong and valid excuse, a penalty of 10 percentage points per day (equivalent to 1.25 points off the final course percentage per day) will be applied to any assignment submitted after the deadline.
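The late-penalty arithmetic can be illustrated with a short Python sketch. The function names are hypothetical, and the floor at zero is an assumption that the policy above does not state, as is the treatment of days_late as a whole number of days.

```python
HOMEWORK_WEIGHT = 12.5  # each assignment's share of the final course percentage
PENALTY_PER_DAY = 10.0  # percentage points taken off the assignment score per day late


def late_adjusted_score(raw_score, days_late):
    """Assignment score (0-100) after the late penalty.

    Flooring at zero is an assumption, not stated in the policy.
    """
    return max(0.0, raw_score - PENALTY_PER_DAY * days_late)


def course_contribution(raw_score, days_late):
    """Points the assignment contributes to the final course percentage."""
    return late_adjusted_score(raw_score, days_late) * HOMEWORK_WEIGHT / 100


print(course_contribution(90, 0))  # 11.25
print(course_contribution(90, 2))  # 8.75
```

In this example, an assignment scored at 90% and submitted two days late contributes 8.75 rather than 11.25 points, a loss of 2 × 1.25 = 2.5 points from the final course percentage.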

Absence policies

Attendance is expected at every class. Attendance at the midterm exams is compulsory: without a strong and valid excuse, a student who misses a midterm will not pass the course, even if they would otherwise have received a passing grade.

Rice Honor Code

In this course, all students will be held to the standards of the Rice Honor Code, a code that you pledged to honor when you matriculated at this institution. If you are unfamiliar with the details of this code and how it is administered, you should consult the Honor System Handbook at http://honor.rice.edu/honor-system-handbook/. This handbook outlines the University’s expectations for the integrity of your academic work, the procedures for resolving alleged violations of those expectations, and the rights and responsibilities of students and faculty members throughout the process.

Students with a disability

If you have a documented disability or other condition that may affect academic performance you should: 1) make sure this documentation is on file with Disability Support Services (Allen Center, Room 111 / adarice@rice.edu / x5841) to determine the accommodations you need; and 2) talk with me to discuss your accommodation needs.