Computational Genomics (BioE 582)
This course covers the study and implementation of data mining and machine learning methods. Particular attention is given to methods that are useful for analyzing data from genome comparisons, microarray gene expression experiments, and protein function prediction. Students will gain practical skills in addition to theoretical knowledge, especially in microarray data analysis. We will use R and Bioconductor for microarray data analysis and MATLAB for other implementation tasks. All three tools are widely used in industry and academia.
Topics covered:
- microarray technology and experiments, preprocessing of microarray data
- statistical methods (hypothesis testing, resampling, bootstrap, multiple testing)
- distances and expression measures
- feature selection
- cluster and classification analysis for microarray data
- inference of genetic networks
Syllabus:
- Lecture 1: Introduction to the course
- Lecture 2: Introduction to Microarrays
- Lecture 3: Introduction to R
- Lecture 4: Review of statistics I
- Lecture 5: Review of statistics II
- Lecture 6: Review of statistics III
- Lecture 7: Preprocessing DNA Microarray Data
- Lecture 8: Preprocessing Affymetrix Data
- Lecture 9: Introduction to Bioconductor
- Lecture 10: Identification of differentially expressed genes
- Lecture 11: Multiple testing for the identification of differentially expressed genes
- Lecture 12: Significance Analysis of Microarrays (SAM)
- Lecture 13: Annotation
- Lecture 14: Genomic data-mining method 1 - overview
- Lecture 15: Genomic data-mining method 2 - clustering
- Lecture 16: Genomic data-mining method 3 - dimension reduction in unsupervised learning
- Lecture 17: Genomic data-mining method 4 - classification
- Lecture 18: Genomic data-mining method 5 - feature selection
- Lecture 19: Application of Bayesian classifiers
- Lecture 20: Genomic data-mining method 6 - Support vector machines
- Lecture 21: Identification of Transcription Factor Binding Sites
- Lecture 22: Using Bayesian Networks to Analyze expression data
We will normally post 2 lectures per week (on Mondays).
Prerequisites:
- Calculus: Basic knowledge
- Linear Algebra: Matrix operations
- Statistics: Hypothesis testing (F-test, t-test)
Textbook:
Stekel, D., Microarray Bioinformatics, First edition, Cambridge University Press, ISBN 0-521-52587-X
Grading:
- Homework
  - Homework worth 100 points will be assigned each week.
  - Homework will be posted on Wednesday and is due the following Wednesday.
  - In any week, we may post homework early. This will not affect the due date.
  - Late homework will be accepted until the first Friday following the due date, with a penalty of 20 points per day late. Homework will not be accepted after that Friday.
- Comprehensive project (due on the same day as the final exam)
- Midterm and final exams
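As an illustration of the late-homework rule above, here is a minimal Python sketch. It rests on assumptions the policy does not spell out: lateness is counted in whole calendar days after the Wednesday due date (so Thursday is 1 day late and Friday is 2), and a score never drops below zero. The function name `late_homework_score` is hypothetical and used only for this example.

```python
def late_homework_score(raw_score, days_late):
    """Return the score after the late penalty, or None if not accepted.

    Assumptions (not stated in the policy): days_late counts whole
    calendar days after the Wednesday due date, and scores floor at 0.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 2:
        # More than 2 days after Wednesday is past Friday: not accepted.
        return None
    # 20 points are deducted for each day late.
    return max(0, raw_score - 20 * days_late)


# A homework graded 90/100, submitted on time, Thursday, Friday, Saturday.
for days in range(4):
    print(days, late_homework_score(90, days))
```

Under these assumptions, a homework that would have earned 90 points receives 70 if turned in on Thursday and 50 if turned in on Friday.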
