Foundations of Computational and Systems Biology



Study Materials




Media player software, such as QuickTime® Player, RealOne™ Player, or Windows Media® Player, is required to play the .avi files in this section. RasMol software is required to view the .pdb files in this section. Use the Python interpreter to run the .py files in this section.


Lecture 8 Motif Finding Animation

Gibbs Sampler, Weak Motif Animation (AVI) courtesy of Professor Chris Burge.

Python Programming

Most of the homework assignments will include problems that involve writing simple programs in the scripting language Python. Python, like Perl, is widely used in bioinformatics and computational biology. Because many students may have little or no programming experience, Dr. Peter Woolf will offer a hands-on Python tutorial across three sessions during the second week of classes.

Python Tutorial Overview

The aim of this tutorial is to give students a basic working knowledge of the scripting language Python. This course is intended for students with little or no programming experience, and will focus on the tools and utilities needed to do research in bioinformatics and computational biology.

My goal is to make the class informal and hands-on, so please speak up if something does not make sense. Programming cannot easily be learned by watching; it must be learned by doing.

At a minimum, by the end of this class you should be able to read a FASTA sequence from a file, parse it, and write the reverse complement of that sequence to a file.

Tutorial Outline

Session One: Introduction to Unix, Text Editors, Basic Python Commands and Data Structures

Session Two: Flow Control in Python, Input/Output, Files, HTML

Session Three: Modules, Program Organization, and Regular Expressions

Text

Lutz, Mark, and David Ascher. Learning Python. 2nd ed. Beijing; Cambridge, MA: O'Reilly, 2003. ISBN: 9780596002817.

Online Resources

The tutorial will roughly follow the structure of the standard documentation tutorial that can be found at: Online Python Tutorial.

If you are already a proficient programmer, look at: Dive into Python.

A good Unix-command Cheat Sheet can be found at: Unix-command Cheat Sheet.

For an introduction to regular expressions: Regular Expression HOWTO.

To quickly test your regular expressions, try the program: Kodos.

Finally, for lots of examples of good Python code related to Bioinformatics and Computational Biology, see: Biopython Web site.

In Class Exercise for Session Two

Review the notes on Unix Commands and Beginner's Python (PDF).

  • Parse the string in fasta.txt (TXT) to obtain the reverse complement of the sequence section alone. Output this new string to a file called output.txt.
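One possible approach to this exercise is sketched below in Python 3. It is not the course's reference solution. It assumes that fasta.txt holds a single record, that is, one header line starting with ">" followed by one or more sequence lines, and that the sequence uses only the bases A, C, G, and T. The function names are chosen for illustration.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta_sequence(path):
    """Return the sequence of a single-record FASTA file as one string.

    Assumption: the file has one '>' header line; every other non-blank
    line is part of the sequence.
    """
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                parts.append(line)
    return "".join(parts)


def reverse_complement(seq):
    """Complement every base, then reverse the resulting string."""
    return seq.translate(COMPLEMENT)[::-1]


def write_reverse_complement(in_path, out_path):
    """Read a FASTA file and write the reverse complement to out_path."""
    seq = read_fasta_sequence(in_path)
    with open(out_path, "w") as out:
        out.write(reverse_complement(seq) + "\n")
```

With these definitions, calling write_reverse_complement("fasta.txt", "output.txt") would carry out the exercise. Characters outside A, C, G, and T (for example N) pass through the translation unchanged.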

In Class Exercises for Session Three

In Python you can write programs that run as standalone programs or that you import into other Python code. In fact, you have already been using Python programs every time you use an import statement.

As an example of the framework of a basic Python program, see SampleProg.py (PY).

  • Load SampleProg.py from within the Python Interpreter and test the functions read_format and do_comparison.
  • Modify SampleProg.py so that it compares two numbers given on the command line using do_comparison.
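The contents of SampleProg.py are not reproduced here, so the sketch below uses a hypothetical stand-in for do_comparison; the real function may behave differently. It illustrates the general pattern in Python 3: functions that can be imported elsewhere, plus a guarded block that runs only when the file is executed directly. The file name compare.py is also an assumption.

```python
import sys


def do_comparison(a, b):
    """Hypothetical stand-in: describe how a relates to b."""
    if a < b:
        return "less than"
    if a > b:
        return "greater than"
    return "equal to"


def main(argv):
    """Compare two numbers taken from a command-line argument list."""
    if len(argv) != 3:
        print("usage: python compare.py NUM1 NUM2")
        return 1
    first, second = float(argv[1]), float(argv[2])
    print(f"{first} is {do_comparison(first, second)} {second}")
    return 0


if __name__ == "__main__":
    # This block runs only when the file is executed as a script,
    # not when it is imported as a module from other code.
    main(sys.argv)
```

Saved as compare.py, the script could be run as "python compare.py 3 7", while "import compare" from the interpreter would make compare.do_comparison available without triggering the command-line handling.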

Regular expressions are a powerful text parsing tool that is widely used in bioinformatics. See the notes on regular expressions (PDF) for a summary of the commands.

  • Write a regular expression to extract all of the carbon atom position data from the file example.pdb (PDB). Print this data out.
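A minimal sketch of one way to approach this exercise in Python 3 follows. The file example.pdb is not reproduced here, so the records in SAMPLE_PDB are made-up illustrative data. The pattern assumes that fields in ATOM records are separated by whitespace, that every record has a one-character chain identifier, and that an atom name beginning with "C" denotes carbon. Real PDB files are fixed-width, so adjacent columns can run together and this pattern may need adjusting.

```python
import re

# Made-up ATOM records for illustration only (not taken from example.pdb).
SAMPLE_PDB = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
ATOM      3  C   ALA A   1      10.550   6.328  -4.112  1.00  0.00           C
ATOM      4  O   ALA A   1       9.427   6.652  -4.477  1.00  0.00           O
ATOM      5  CB  ALA A   1      12.292   4.712  -4.903  1.00  0.00           C
"""

# Fields matched in order: record type, serial number, atom name starting
# with C (captured), residue name, chain ID, residue number, then the
# x, y, and z coordinates (captured).
CARBON_ATOM = re.compile(
    r"^ATOM\s+\d+\s+(C\S*)\s+\S+\s+\S\s+-?\d+\s+"
    r"(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)",
    re.MULTILINE,
)


def carbon_positions(pdb_text):
    """Return (atom_name, x, y, z) tuples for each carbon ATOM record."""
    return [
        (name, float(x), float(y), float(z))
        for name, x, y, z in CARBON_ATOM.findall(pdb_text)
    ]


for name, x, y, z in carbon_positions(SAMPLE_PDB):
    print(name, x, y, z)
```

To apply the same function to a file, read its contents into a string first, for example with open("example.pdb").read(), and pass that string to carbon_positions.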

 

 

 


 







