Advanced Bioinformatics


 

IB 67110  (2 credits)

Course home page: http://gel.ym.edu.tw/~chc/ab2004.html 


Course organizer: Ueng-Cheng Yang (楊永正)

Office: Traditional Medicine Building, Wing B (傳醫乙棟), Room 712,  Tel: (02) 28267128,  Email: yang@ym.edu.tw

Course organizer: Chuan-Hsiung Chang (張傳雄)

Office: IG Room 555,  Tel: (02) 28267316,  Email: cchang@ym.edu.tw

Lecture room: Traditional Medicine Building, Wing B (傳醫乙棟), Room 713

Thursdays, 10:10am-12:00pm

Office hours by appointment




Course description

An introduction to the fundamental theory and practice of bioinformatics.

Goals for the course: The course will familiarize students with the fundamental principles of bioinformatics. By the end of the course, students will
have a working knowledge of a variety of topics important in bioinformatics, and a grasp of the underlying principles that is adequate for them to
evaluate and use new bioinformatics methods as they arise in the future.

Course Requirements:

Syllabus

The assigned reading materials will be available in Acrobat (pdf) format.

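  1. Intended audience: Ph.D. program, Institute of Bioinformatics
     Instructors in charge: Ueng-Cheng Yang (楊永正), Chuan-Hsiung Chang (張傳雄)

     Time: Thursdays, 10:10am-12:00pm
     Location: Traditional Medicine Building, Wing B (傳醫大樓乙棟), 7th floor, Room 713

     Contact: Ueng-Cheng Yang, (02) 28267128; Chuan-Hsiung Chang, (02) 28267316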

    Week  Date   Topic                                      Hours  Instructor          Reading Assignments

    1     02/23  Course Introduction & Overview             2      Chuan-Hsiung Chang

    First Section Topic: * Sequence Comparison *

    2     03/02  Dot Plot                                   2      Chuan-Hsiung Chang  Articles # 01-04
    3     03/09  Dynamic Programming (Global & Local)       2      Chuan-Hsiung Chang  Articles # 05-13
    4     03/16  Scoring Matrices                           2      Chuan-Hsiung Chang  Articles # 14-17
    5     03/23  Heuristic Sequence Search (FASTA & BLAST)  2      Chuan-Hsiung Chang  Articles # 18-29
    6     03/30  Multiple Sequence Alignment                2      Chuan-Hsiung Chang  Articles # 30-37

    Second Section Topic: * Pathway Analysis *

    7     04/06  TBA                                        2      Ueng-Cheng Yang
    8     04/13  TBA                                        2      Ueng-Cheng Yang
    9     04/20  Genome-scale sequence alignment            2      Chuan-Hsiung Chang  Articles # 38-50
    10    04/27  Midterm Report-1                           2      Chuan-Hsiung Chang
    11    05/04  Midterm Report-2                           2      Chuan-Hsiung Chang
    12    05/11  Midterm Report-3                           2      Chuan-Hsiung Chang
    13    05/18  TBA                                        2      Ueng-Cheng Yang
    14    05/25  TBA                                        2      Ueng-Cheng Yang
    15    06/01  TBA                                        2      Ueng-Cheng Yang
    16    06/08  TBA                                        2      Ueng-Cheng Yang
    17    06/15  TBA                                        2      Ueng-Cheng Yang
    18    06/22  Final Report                               2      Ueng-Cheng Yang


    Grading

    Midterm:  There will be an in-class presentation report on May 11, 2006, worth fifty points.

    Final:  There will be an in-class presentation report on June 22, 2006, worth fifty points.


    Reading assignments

    * Sequence Comparison *

    Dot Plot
        1. Gibbs, A. J. & McIntyre, G. A. (1970).
            The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences.
            Eur. J. Biochem. 16, 1-11.
            [pdf]

        2. Sonnhammer EL, Durbin R. (1995).
            A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis.
            Gene. 167, GC1-10.
            [pdf]

        3. Junier T, Pagni M. (2000).
            Dotlet: diagonal plots in a web browser.
            Bioinformatics. 16, 178-179.
            [pdf]

        4. Huang Y, Zhang L. (2004).
            Rapid and sensitive dot-matrix methods for genome analysis.
            Bioinformatics. 20, 460-466.
            [pdf]

    Dynamic Programming

        5. Eddy SR. (2004).
            What is dynamic programming?
            Nat Biotechnol. 22, 909-910.
            [pdf]

        Global Alignment:
        6. Needleman, S. B. & Wunsch, C. D. (1970).
            A general method applicable to the search for similarities in the amino acid sequence of two proteins.
            J. Mol. Biol. 48, 443-453.
            [pdf]

        Local Alignment:
        7. Smith, T. F. & Waterman, M. S. (1981).
            Identification of common molecular subsequences.
            J. Mol. Biol. 147, 195-197.
            [pdf]

        8. Waterman, M. S. (1983).
            Sequence alignments in the neighborhood of the optimum with general application to dynamic programming.
             Proc. Natl. Acad. Sci. USA, 80, 3123-3124.
            [pdf]

    Statistical Significance of Sequence Alignments & others
        9. Gotoh, O. (1982).
            An improved algorithm for matching biological sequences.
            J. Mol. Biol. 162, 705-708.
            [pdf]

        10. Waterman, M. (1994).
            Estimating statistical significance of sequence alignments.
            Phil. Trans. R. Soc. Lond. B 344, 383-390.
            [pdf]

        11. Altschul, S.F. & Gish, W. (1996).
            Local alignment statistics.
            Meth. Enzymol. 266, 460-480.
            [pdf]

        12. Pearson, W. R. (1998).
            Empirical statistical estimates for sequence similarity searches.
            J Mol Biol. 276, 71-84.
            [pdf]

        13. W. R. Pearson and T. C. Wood. (2001).
            "Statistical significance in biological sequence comparison"
            in Handbook of Statistical Genetics, D. J. Balding, M. Bishop, and C. Cannings eds.
            London: Wiley, pp. 39-65.
            [pdf]

    Substitution Matrices
        14. Henikoff, S. & Henikoff, J. G. (1993).
            Performance evaluation of amino acid substitution matrices.
            Proteins: Struct., Funct., Genet. 17, 49-61.
            [pdf]

        PAM:
        15. Dayhoff, M. O., Schwartz, R. M. & Orcutt, B. C. (1978).
            A model of evolutionary change in proteins. [pdf]
            Matrices for detecting distant relationships. [pdf]
            In Atlas of Protein Sequence and Structure, (Dayhoff, M. O., ed.), 5, 345-358.
            National Biomedical Research Foundation, Washington DC.

        BLOSUM:
        16. Henikoff, S. & Henikoff, J. G. (1992).
            Amino acid substitution matrices from protein blocks.
            Proc. Natl. Acad. Sci. USA, 89, 10915-10919.
            [pdf]

        17. Eddy SR. (2004).
            Where did the BLOSUM62 alignment score matrix come from?
            Nat Biotechnol. 22, 1035-1036.
            [pdf]

    FASTA
        18. Lipman DJ, Pearson WR. (1985).
              Rapid and sensitive protein similarity searches.
              Science 227, 1435-1441.
              [pdf]

        19. Pearson WR, Lipman DJ. (1988).
              Improved tools for biological sequence comparison.
              Proc Natl Acad Sci U S A. 85, 2444-2448.
              [pdf]

        20. Pearson WR. (1995).
              Comparison of methods for searching protein sequence databases.
              Protein Sci. 4, 1145-1160.
              [pdf]

        21. Pearson WR. (1997).
              Identifying distantly related protein sequences.
              Comput. Appl. Biosci. 13, 325-332.
              [pdf]

        22. Pearson WR. (2000).
              Flexible sequence similarity searching with the FASTA3 program package.
              Methods Mol Biol. 132, 185-219.
              [pdf]

        23. Smoot ME, Guerlain SA, Pearson WR. (2004).
              Visualization of near-optimal alignments.
              Bioinformatics. 20, 953-958.
              [pdf]

    BLAST
        24. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990).
              Basic local alignment search tool.
              J. Mol. Biol. 215, 403-410.
              [pdf]

        25. Shpaer EG, Robinson M, Yee D, Candlin JD, Mines R, Hunkapiller T. (1996).
              Sensitivity and selectivity in protein similarity searches: a comparison of Smith-Waterman in hardware to BLAST and FASTA.
              Genomics. 38, 179-191.
              [pdf]

        26. Altschul, S.; Madden, T.; Schaffer, A.; Zhang, J.; Zhang, Z.; Miller, W.; and Lipman, D. (1997).
              Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
              Nucleic Acids Res. 25, 3389-3402.
              [pdf]

        27. Altschul, S.F. & Koonin, E.V. (1998).
              Iterated profile searches with PSI-BLAST - a tool for discovery in protein databases.
              Trends Biochem. Sci. 23, 444-447.
              [pdf]

        28. Zhang Z, Schaffer AA, Miller W, Madden TL, Lipman DJ, Koonin EV, Altschul SF. (1998).
              Protein sequence similarity searches using patterns as seeds. (PHI-BLAST)
              Nucleic Acids Res. 26, 3986-3990.
              [pdf]

        29. A. Pearson, H. Peri, O. Jabado, O. Wood. (2000).
              eBLAST - Building a Better BLAST.
              [pdf]

    Profile Analysis
        30. Gribskov, M., McLachlan, A. D. & Eisenberg, D. (1987).
               Profile analysis: Detection of distantly related proteins.
               Proc. Natl. Acad. Sci. USA, 84, 4355-4358.
               [pdf]

    Multiple Sequence Alignment
        31. Lipman, D. J., Altschul, S. F. & Kececioglu, J. (1989).
             A tool for multiple sequence alignment.
             Proc. Natl. Acad. Sci. USA, 86, 4412-4415.
              [pdf]

        32. Thompson JD, Higgins DG, Gibson TJ. (1994).
              CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
              Nucleic Acids Res. 22, 4673-4680.
              [pdf]

        33. Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. (2003).
              Multiple sequence alignment with the Clustal series of programs.
              Nucleic Acids Res. 31, 3497-3500.
              [pdf]

        34. Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S; NISC Comparative Sequencing Program. (2003).
              LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.
              Genome Res. 13, 721-731.
              [pdf]

        35. Sammeth M, Morgenstern B, Stoye J. (2003).
              Divide-and-conquer multiple alignment with segment-based constraints.
              Bioinformatics. 19 Suppl 2, II189-II195.
              [pdf]

        36. Edgar RC. (2004).
              MUSCLE: multiple sequence alignment with high accuracy and high throughput.
              Nucleic Acids Res. 32, 1792-1797.
              [pdf]

        37. Edgar RC. (2004).
              MUSCLE: a multiple sequence alignment method with reduced time and space complexity.
              BMC Bioinformatics. 5, 113-121.
              [pdf]

    Suffix Tree & MUMmer
        38. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. (1999).
              Alignment of whole genomes.
              Nucleic Acids Res. 27, 2369-2376.
              [pdf]

        39. Delcher AL, Phillippy A, Carlton J, Salzberg SL. (2002).
              Fast algorithms for large-scale genome alignment and comparison.
              Nucleic Acids Res. 30, 2478-2483.
              [pdf]

        40. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. (2004).
              Versatile and open software for comparing large genomes.
              Genome Biol. 5, R12.
              [pdf]

        41. Wong PW, Lam TW, Lu N, Ting HF, Yiu SM. (2004).
              An efficient algorithm for optimizing whole genome alignment with noise.
              Bioinformatics. [Epub ahead of print].
              [pdf]

    PatternHunter, PipMaker, zPicture, VISTA, MAVID, & Mauve
        42. Ma B, Tromp J, Li M. (2002).
             PatternHunter: faster and more sensitive homology search.
             Bioinformatics. 18, 440-445.
              [pdf]

        43. Li M, Ma B, Kisman D, Tromp J. (2004).
              PatternHunter II: highly sensitive and fast homology search.
              J Bioinform Comput Biol. 2, 417-439.
              [pdf]

        44. Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W. (2000).
              PipMaker--a web server for aligning two genomic DNA sequences.
              Genome Res. 10, 577-586.
              [pdf]

        45. Ovcharenko I, Loots GG, Hardison RC, Miller W, Stubbs L. (2004).
              zPicture: dynamic alignment and visualization tool for analyzing conservation profiles.
              Genome Res. 14, 472-477.
              [pdf]

        46. Shah N, Couronne O, Pennacchio LA, Brudno M, Batzoglou S, Bethel EW, Rubin EM, Hamann B, Dubchak I. (2004).
              Phylo-VISTA: interactive visualization of multiple DNA sequence alignments.
              Bioinformatics. 20, 636-643.
              [pdf]

        47. Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I. (2004).
              VISTA: computational tools for comparative genomics.
              Nucleic Acids Res. 32(Web Server issue), W273-279.
              [pdf]

        48. Bray N, Pachter L. (2003).
              MAVID multiple alignment server.
              Nucleic Acids Res. 31, 3525-3526.
              [pdf]

        49. Bray N, Pachter L. (2004).
              MAVID: constrained ancestral alignment of multiple sequences.
              Genome Res. 14, 693-699.
              [pdf]

        50. Darling AC, Mau B, Blattner FR, Perna NT. (2004).
              Mauve: multiple alignment of conserved genomic sequence with rearrangements.
              Genome Res. 14, 1394-1403.
              [pdf]



    * Pathway Analysis *


Web sites

Sites mentioned in class