CS 4/504, Fall 2001 Computational Biology

Homework 6, Due: 11/15/01

 

 

  1. Use Entrez to search for AAL05380, and display the sequence in FASTA format. Paste this into the search box for a PSI-BLAST search using the nr database (see http://www.ncbi.nlm.nih.gov/blast). How many of the sequences were new on this iteration? What are the score and E-value of the top-scoring sequence?

 

Ans: All of them. This is an iterative algorithm, and this is the first iteration, so everything returned was new! The top E-value (on 1 Dec 2001, around 17.00) was e-115, for AF286365.
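
The Entrez lookup in this question can also be scripted. The following is a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint (which is not part of the assignment); the function names `build_efetch_url` and `fetch_fasta` are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (an assumption; the assignment uses the web form).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str) -> str:
    """Return an efetch URL asking for a protein record as plain-text FASTA."""
    params = {
        "db": "protein",      # protein sequence database
        "id": accession,      # accession to look up, e.g. "AAL05380"
        "rettype": "fasta",   # FASTA rather than the full GenPept record
        "retmode": "text",    # plain text rather than XML
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession: str) -> str:
    """Download the FASTA record as a string (needs network access)."""
    with urlopen(build_efetch_url(accession)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_fasta("AAL05380")` should return text that can be pasted into the PSI-BLAST search box, provided the service is reachable.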

 

  2. Run a second iteration of PSI-BLAST. You will have to go back to the original PSI-BLAST window (not the results window) and press the “FORMAT!” button again. This will change the results window. How many of the top 10 scoring sequences were added on this iteration (iteration 2)? Have the top score and E-value changed?

 

Ans: One (AF042100) was new. The new top E-value is e-119, better than before even though the top-scoring sequence (AF286365) is the same as in the last iteration. This is because the second iteration used more information (the results of the first iteration), so a match this good is less likely to arise by chance than it was on the first iteration, which is what the lower E-value means.
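
To make the link between score and E-value concrete, here is a minimal sketch based on the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the database length. The function names are illustrative, and the sketch ignores the effective-length corrections that BLAST applies, so it shows the trend rather than reproducing BLAST's exact numbers.

```python
import math


def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Karlin-Altschul estimate E = m * n * 2**(-S') for bit score S'."""
    return query_len * db_len * 2.0 ** (-bit_score)


def p_at_least_one(e_value: float) -> float:
    """Probability of at least one chance hit this good: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


def bits_gained(e_old: float, e_new: float) -> float:
    """Bit-score increase implied by an E-value drop, for a fixed search space."""
    return math.log2(e_old / e_new)


# Reading the reported "e-115" and "e-119" as 1e-115 and 1e-119.
print(round(bits_gained(1e-115, 1e-119), 1))
```

Under the assumption that the search space did not change between iterations, the drop from e-115 to e-119 corresponds to a gain of roughly 13 bits for the same sequence. For E-values this small, P and E are practically equal, which is why the E-value is often read loosely as a probability.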

 

  3. How many of the top 10 were added on iteration 3? What was the top score and E-value? How many of the top 10 were added on iteration 4? What was the top score and E-value?

 

Ans: On iteration 3, none were new and the top E-value was again e-119 for AF286365. On iteration 4, none were new and the top E-value was again e-119 for AF286365. It looks like the algorithm has converged, and we’re unlikely to find anything new.

 

  4. Look up how PSI-BLAST works. What do your observations tell you about how quickly this particular iterative algorithm converges?

 

Ans: PSI-BLAST builds a profile from the top results of one search, and then uses the profile to search for new sequences. The profile is essentially a scoring matrix with a separate set of scores for each position. Using this new scoring information, the next iteration matches proteins in the database against the profile. The sequences returned may include some that were not returned on the prior iteration, and some sequences may score higher than before under the new scoring matrix. Using these new results, PSI-BLAST computes a new scoring matrix to be used for the next iteration, and so on. Once the top-scoring sequences stop changing, the scoring matrix also stops changing, so the algorithm has converged. This happens very quickly in most cases (as in this one, where nothing new appeared after iteration 2).
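
The loop described above can be sketched in a few lines. This is a toy illustration, not PSI-BLAST itself: it assumes ungapped sequences of equal length, a uniform background, a simple pseudocount, and a raw score threshold, whereas the real program uses gapped local alignments, sequence weighting, and an E-value cutoff. All names and the made-up peptides are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # uniform background frequency (a simplification)


def build_profile(seqs, pseudocount=1.0):
    """Return one log-odds score table per column of equal-length sequences."""
    profile = []
    for column in zip(*seqs):
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / BACKGROUND)
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum the position-specific scores of seq against the profile."""
    return sum(column[aa] for column, aa in zip(profile, seq))


def psi_search(query, database, threshold, max_iter=10):
    """Iterate search -> rebuild profile until the hit set stops changing."""
    profile = build_profile([query])  # iteration 1 uses the query alone
    previous, new_counts = set(), []
    for _ in range(max_iter):
        hits = {s for s in database if score(profile, s) >= threshold}
        new_counts.append(len(hits - previous))  # hits new on this iteration
        if hits == previous:                     # nothing changed: converged
            break
        previous = hits
        profile = build_profile([query, *sorted(hits)])
    return previous, new_counts


# Made-up peptides, not real database entries.
hits, new_counts = psi_search("ACDE", ["ACDF", "ACGF", "WWWW"], threshold=2.0)
print(sorted(hits), new_counts)
```

With this toy data the first iteration finds one hit, the second adds one more because the rebuilt profile is more permissive at the variable positions, and the third adds none, so the loop stops. The score of the first hit also rises from one iteration to the next, which mirrors the improving E-value for AF286365 seen in question 2.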

 

  5. Go to an online ClustalW webpage (for example http://dot.imgen.bcm.tmc.edu:9331/multi-align/Options/clustalw.html, or you could install ClustalW on your own machine and run it <grin>). Enter the top two and the bottom two scoring sequences from a BLAST search on AAL05380. (You may have to redo the BLAST search from question 1, and you will certainly have to do a lot of scrolling and cutting and pasting.) Answer the following questions:
    1. Annoying, isn’t it? Imagine doing this with 100 or 200 sequences. Do you see why biologists want better tools?
    2. Describe the alignments you see (just enough so that I know you actually did the exercise). Be sure to try the JALViewer. If it works for you, it’s MUCH easier.
    3. Can you find the guide tree for these four sequences? If so, what does it look like?

 

Ans:

  1. Yes indeed. Damned annoying. That’s why we have grad students do it!
  2. The top 2 and bottom 2 were AF286365, AF156820, AY032069, and AF077704 (respectively, on 1 December 2001 at 17.43). The first two of these sequences were very long, and the last two very short. The alignment has long gaps at the beginning and end, with a moderately long gap at the beginning of the second sequence as well. It would appear that the middle of the long sequence is the highly conserved part.
  3. Nope. I can’t find it. Remember, ClustalW first builds a guide tree, then uses that to build the MSA. I know that if you install ClustalW yourself, you can get this guide tree. But I couldn’t find it on this online service.
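
For a feel of what a guide tree for four sequences looks like, here is a minimal sketch that clusters a pairwise distance table into a Newick string. ClustalW builds its guide tree by neighbour-joining; this sketch swaps in the simpler UPGMA (average-linkage) method for brevity, so it illustrates the idea rather than ClustalW's actual procedure. The labels and distances are invented, not taken from the real alignment.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster by average linkage and return a Newick string (topology only).

    `dist` maps frozenset({a, b}) -> distance for every pair of names.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)                       # working copy; grows as clusters merge
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        # Size-weighted average distance from the merged cluster to the others.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Hypothetical distances: two similar long sequences, two similar short ones.
names = ["long1", "long2", "short1", "short2"]
dist = {frozenset(pair): 0.8 for pair in combinations(names, 2)}
dist[frozenset(("long1", "long2"))] = 0.1
dist[frozenset(("short1", "short2"))] = 0.2
print(upgma(names, dist))
```

With these invented distances the two long sequences pair up first, then the two short ones, and the two pairs join last. That is the shape one might expect for the four sequences in this question, but only a real ClustalW run can confirm it.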