COMP571 (Fall 2022)


Important note: The information contained in the course syllabus, other than the absence policies, may be subject to change with reasonable advance notice, as deemed appropriate by the instructor.

Who

Instructor:

TAs:

Where and when

Lectures will be held in Maxfield Hall 252 on Mondays and Wednesdays from 4:00 to 5:15 PM. All lectures will be delivered in person, and attendance is expected unless you have a reasonable excuse. One scheduled office hour will be held on Zoom at a time to be determined based on students' scheduling conflicts; the whole class is encouraged to attend so that everyone can benefit from the discussion. Office hours will not be recorded. Individual appointments outside this time are welcome.

Class materials will be distributed, and assignments and projects submitted, via Canvas. Slack will be used for coordination and communication around group projects.

Intended audience

COMP571 is generally intended for students studying computer science, biology, or genomics who wish to learn how to apply algorithms and statistical models to important problems in biology and genomics.

Course objectives and learning outcomes

The primary objective of the course is to teach the theory behind methods in biological sequence analysis, including sequence alignment, sequence motifs, and phylogenetic tree reconstruction. By the end of the course, students are expected to understand, and be able to write basic implementations of, the algorithms that power those methods.

Course materials

The main materials for this course will be the lectures and the course blog. That said, Statistical Methods in Bioinformatics by Ewens & Grant (in Springer's Statistics for Biology and Health series) is a good resource, and a PDF copy should be accessible here or through Fondren Library.

Software for the course

Algorithms and statistics will be demonstrated using Python, and assignments and projects will require some Python coding. R may be used for some demonstrations, since it is well suited to data visualization, but not for assessment.

The NumPy and SciPy libraries for scientific computing will be used with Python. To install these libraries, first install the latest official distribution of Python 3. It can be downloaded from Python.org for macOS or Windows; if you are using Linux, it should already be included with your operating system.

Then use Python's package manager, pip, to install NumPy and SciPy from the command line by running pip3 install numpy scipy.
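After installing, you may want to confirm that both libraries are visible to your Python 3 interpreter. The short script below is one way to do this, offered as a sketch rather than a required step: it uses only the standard library's importlib.metadata (Python 3.8 or later), and the helper name installed_versions is hypothetical.

```python
"""Report whether the course's scientific Python packages are installed."""
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # pip has not installed this distribution for this interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions(["numpy", "scipy"]).items():
        status = ver if ver is not None else "not installed (try: pip3 install numpy scipy)"
        print(f"{name}: {status}")
```

If pip3 and python3 on your machine point at different Python installations, running python3 -m pip install numpy scipy installs the packages into the interpreter you will actually use.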

Schedule and assessment

The course is organized around the following five themes, which the theory-based homework assignments will cover:

  1. Substitution models for gapless alignments
  2. Global alignment, local alignment, and BLAST
  3. Hidden Markov Models and transmembrane domain identification
  4. Sankoff’s algorithm and hill climbing
  5. Phylogenetic likelihood and coalescent theory

Assignments should be completed individually; copying answers from other sources, including other students, without attribution will be considered plagiarism.

In addition to these assignments, each student will complete one project: implementing a novel or existing statistical model, applying it to a public data set, and writing up the results in the style of a scientific paper. The statistical model should be relevant to one or more of the course themes. Projects will be designed by groups of 3–5 students, but the implementation, application, and write-up will be individual. Copying and pasting code or report text (in part or in whole) from other sources, including other students, without attribution will be considered plagiarism. A design document, completed and submitted by each group rather than individually, will be due on November 4th. Project code and reports will be due on December 2nd.

The schedule below may change, subject to Rice University policy.

| Monday’s date | Monday’s lecture | Wednesday’s lecture | Homework |
|---|---|---|---|
| 08/22/22 | Introduction | Central dogma and gene models | |
| 08/29/22 | Canceled | Canceled | |
| 09/05/22 | Labor Day, no scheduled classes | Margaret Dayhoff’s Atlas¹ | |
| 09/12/22 | Homology testing using log-odds² | Global alignment² | |
| 09/19/22 | Local alignment² | BLAST (gapless phase)² | |
| 09/26/22 | BLAST (gapped phase)² | Hidden Markov models³ | #1 issued |
| 10/03/22 | Viterbi algorithm³ | Forward algorithm³ | #1 due |
| 10/10/22 | Midterm recess, no scheduled classes | Backward algorithm³ | |
| 10/17/22 | Canceled | Baum-Welch algorithm | |
| 10/24/22 | Phylogenetic trees⁴ | Hill climbing⁴ | #2 issued |
| 10/31/22 | The Felsenstein zone⁴ | Time-reversible models of evolution⁵ | #2 due |
| 11/07/22 | Felsenstein’s pruning algorithm⁵ | Wright–Fisher model⁵ | |
| 11/14/22 | Kingman’s coalescent⁵ | Bayesian phylogenetic inference | |
| 11/21/22 | Metropolis algorithm | Thanksgiving recess, no scheduled classes | #3 issued |
| 11/28/22 | Buffer | Buffer | #3 due |
| 12/05/22 | Scheduled classes finished | Scheduled classes finished | |

Each row in the above table lists the lecture topics, homework, and project milestones for the week beginning on the specified Monday and ending the following Sunday. Superscript numbers refer to the theme(s) for that day’s class or midterm. Assignments will be issued before midnight on the Monday of the week in which they are listed as issued, and will be due before midnight on the Friday of the week in which they are listed as due.

Project reports

The goal of the project design document is to let your instructor and TA provide feedback before you spend a lot of time implementing the required code. It is not intended to be as formal as a typical grant application or software requirements specification. It should describe the dataset(s) and method(s) you plan to use and implement, and explain why they are appropriate for the problem at hand, in enough detail for your instructor and TA to understand your approach.

I (Huw) expect this to be approximately 2 pages long, but it may be longer if that is needed for a sufficient explanation. If figures and tables improve the clarity of your proposal, by all means include them. You should include references, but the document formatting and citation style are up to you.

Grade policies

  • Attendance: 5%
  • Homework assignments: 15% each
  • Project design document: 10%
  • Project implementation: 15%
  • Project report: 25%
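The breakdown above amounts to a weighted sum of marks. The sketch below illustrates that arithmetic; it assumes three homework assignments (the number listed in the schedule, and the count that makes the weights total 100%), and the component names and the function final_percentage are hypothetical.

```python
import math

# Share of the final percentage for each component (hypothetical names).
# Assumes three homework assignments at 15% each, as listed in the schedule.
WEIGHTS = {
    "attendance": 0.05,
    "homework_1": 0.15,
    "homework_2": 0.15,
    "homework_3": 0.15,
    "design_document": 0.10,
    "implementation": 0.15,
    "report": 0.25,
}


def final_percentage(marks, weights=WEIGHTS):
    """Weighted sum of per-component marks (each 0-100), giving a 0-100 result."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 100%")
    missing = set(weights) - set(marks)
    if missing:
        raise KeyError(f"no mark for: {sorted(missing)}")
    return sum(weight * marks[name] for name, weight in weights.items())


if __name__ == "__main__":
    example = {name: 80.0 for name in WEIGHTS}
    print(round(final_percentage(example), 2))  # prints 80.0
```

Because the weights sum to 1, a student with the same mark on every component gets that mark as their final percentage (up to floating-point rounding).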

Assignments or projects submitted late with a strong and valid excuse will be accepted without penalty. Whether an excuse is strong and valid is solely at the instructor’s discretion. Without a strong and valid excuse, the final course percentage will be reduced by 2 percentage points for each day any submission is late, up to the contribution of that submission to the final percentage. For example, if a submitted homework assignment is given a mark of 70%, it contributes 70% × 15% = 10.5% to the final percentage, so at most 10.5 percentage points can be deducted for submitting it late.
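The late policy can be expressed as a capped deduction. This is a sketch under stated assumptions, not an official calculator: it reads "2%" as 2 percentage points of the final course percentage, counts lateness in whole days, and uses hypothetical function names.

```python
PENALTY_PER_DAY = 2.0  # percentage points per day late (assumed reading of "2%")


def contribution(mark, weight):
    """Percentage points a submission adds to the final percentage.

    mark is 0-100; weight is the component's share of the final grade (0-1).
    """
    return mark * weight


def late_deduction(mark, weight, days_late, excused=False):
    """Percentage points deducted for lateness, capped at the submission's contribution."""
    if excused or days_late <= 0:
        return 0.0
    return min(PENALTY_PER_DAY * days_late, contribution(mark, weight))


if __name__ == "__main__":
    # A homework marked 70% with a 15% weight contributes about 10.5 points.
    for days in (0, 3, 7):
        print(days, round(late_deduction(70.0, 0.15, days), 2))
```

With those inputs, 3 days late costs 6 points, while 7 days late would cost 14 points but is capped at the 10.5 points the homework contributes.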

No assignments or projects will be accepted after the end of the semester on Tuesday, December 13, 2022. In exceptional circumstances, if a student is unable to complete an assignment or project before the semester ends, the final percentage will be calculated by scaling the marks from the assessments that the student has completed. Again, this will be solely at the instructor’s discretion.
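One plausible reading of "scaling" is proportional rescaling: divide the weighted marks a student earned by the total weight of the assessments they completed. The sketch below assumes that reading (the instructor decides what actually applies), and the function name rescaled_percentage is hypothetical.

```python
def rescaled_percentage(marks, weights):
    """Final percentage (0-100) computed from completed components only.

    marks maps each completed component name to its mark (0-100);
    weights maps every component name to its share of the final grade (0-1).
    Assumes proportional rescaling over the completed components.
    """
    completed_weight = sum(weights[name] for name in marks)
    if completed_weight == 0:
        raise ValueError("no completed assessment to scale")
    earned = sum(weights[name] * mark for name, mark in marks.items())
    return earned / completed_weight


if __name__ == "__main__":
    weights = {"homework": 0.5, "design_document": 0.25, "report": 0.25}
    # The report was not completed, so only 75% of the grade is assessed.
    print(round(rescaled_percentage({"homework": 80.0, "design_document": 60.0}, weights), 2))
```

If every component is completed, the completed weight is 1 and the result is just the ordinary weighted sum.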

Absence policies

Please let your instructor know if you are going to be absent from a lecture, and why. In-person attendance at lectures is expected unless you have a reasonable excuse. A non-exhaustive list of reasonable excuses includes illness, COVID exposure, time-sensitive experiments, and conference or workshop commitments.

Rice Honor Code

In this course, all students will be held to the standards of the Rice Honor Code, a code that you pledged to honor when you matriculated at this institution. If you are unfamiliar with the details of this code and how it is administered, you should consult the Honor System Handbook at http://honor.rice.edu/honor-system-handbook/. This handbook outlines the University’s expectations for the integrity of your academic work, the procedures for resolving alleged violations of those expectations, and the rights and responsibilities of students and faculty members throughout the process.

Students with a disability

If you have a documented disability or other condition that may affect academic performance you should: 1) make sure this documentation is on file with Disability Support Services (Allen Center, Room 111 / adarice@rice.edu / x5841) to determine the accommodations you need; and 2) talk with me to discuss your accommodation needs.
