# Statistical Bioinformatics

**Homework 1**

**Question 1 (10 pts)**

a) What is the probability of observing 61325 when rolling fair dice?

*Probabilities for fair dice*: P(1)=P(2)=P(3)=P(4)=P(5)=P(6)=1/6

b) What is the probability of observing 61325 when rolling loaded dice?

*Probabilities for loaded dice*: P(1)=P(2)=P(3)=P(4)=P(5)=0.1 and P(6)=0.5
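A minimal Python sketch for checking the arithmetic, assuming 61325 denotes five independent rolls observed in that exact order, so its probability is the product of the per-face probabilities. The names `sequence_probability`, `FAIR`, and `LOADED` are illustrative only.

```python
from math import prod


def sequence_probability(rolls, face_probs):
    """Probability of an exact, ordered sequence of independent rolls."""
    return prod(face_probs[int(face)] for face in rolls)


# Face probabilities as given in the question
FAIR = {face: 1 / 6 for face in range(1, 7)}
LOADED = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

observed = "61325"
p_fair = sequence_probability(observed, FAIR)
p_loaded = sequence_probability(observed, LOADED)
print(f"fair:   {p_fair:.6e}")
print(f"loaded: {p_loaded:.6e}")
```

The loaded distribution sums to 1 (five faces at 0.1 plus 0.5), so it is a valid probability model. Only the single 6 in the sequence benefits from the loading.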

**Question 2 (50 pts)**. On a hypothetical island, a virus outbreak threatens to become a future pandemic. Researchers have narrowed down the cause of the outbreak to two viruses (virus 1 and virus 2). The DNA sequencing lab receives a sample for further analysis. Unfortunately, the sample was contaminated, and removing the foreign DNA leaves the lab with only a short DNA fragment: AGTAGCTTCCAG. Given all the available information (provided below), how can the lab determine which virus caused the outbreak?

Nucleotide probabilities of virus 1

P(A)=P(T)=0.3

P(G)=P(C)=0.2

Nucleotide probabilities of virus 2

P(A)=P(T)=P(G)=P(C)=0.25

Assume:

– Virus 1 and Virus 2 are equally likely to occur in nature.

– Nucleotides are independent and identically distributed.
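One way to frame the question is as a Bayesian model comparison. Under the i.i.d. assumption, the likelihood of the fragment under each virus is the product of its per-nucleotide probabilities, and with equal priors the posterior of each virus is proportional to that likelihood. The Python sketch below is illustrative; the names `VIRUS1`, `VIRUS2`, and `likelihood` are hypothetical.

```python
from math import log, prod

VIRUS1 = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}
VIRUS2 = {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}


def likelihood(seq, probs):
    """P(seq | model) when nucleotides are i.i.d. under that model."""
    return prod(probs[base] for base in seq)


fragment = "AGTAGCTTCCAG"
prior1 = prior2 = 0.5  # both viruses equally likely, as assumed

l1 = likelihood(fragment, VIRUS1)
l2 = likelihood(fragment, VIRUS2)

# Bayes' rule: P(virus k | seq) = P(seq | k) P(k) / sum_j P(seq | j) P(j)
evidence = l1 * prior1 + l2 * prior2
post1 = l1 * prior1 / evidence
post2 = l2 * prior2 / evidence

# Log-odds score: positive favours virus 1, negative favours virus 2
log_odds = sum(log(VIRUS1[b] / VIRUS2[b]) for b in fragment)

print(f"P(seq | virus 1) = {l1:.4e}")
print(f"P(seq | virus 2) = {l2:.4e}")
print(f"posteriors: virus 1 = {post1:.4f}, virus 2 = {post2:.4f}")
print(f"log-odds (virus 1 vs virus 2) = {log_odds:.4f}")
```

The fragment contains three of each nucleotide, so the comparison reduces to whether six A/T positions at 0.3 and six G/C positions at 0.2 outweigh twelve positions at 0.25. With equal priors, the sign of the log-odds gives the same decision as comparing the posteriors.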

**Question 3 (40 pts)**

Align the two sequences shown below using the Needleman-Wunsch algorithm.

Use a match score of 4, a mismatch score of -4, and a gap penalty of -2.
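For reference, here is a sketch of the standard global-alignment recurrence with a linear gap cost, in the style of Durbin et al. Taking $x$ as sequence 1 and $y$ as sequence 2, and writing the gap penalty of -2 as a cost $d = 2$ subtracted per gap position, are notational assumptions:

$$
F(i,j) = \max\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) - d \\
F(i,\,j-1) - d
\end{cases}
\qquad
s(a,b) = \begin{cases} 4 & a = b \\ -4 & a \neq b \end{cases},
\qquad d = 2,
$$

with $F(0,0) = 0$, $F(i,0) = -i\,d$, and $F(0,j) = -j\,d$. Each cell's traceback pointer records which of the three terms attains the maximum, and the alignment score is $F(n,m)$, where $n = 11$ and $m = 13$ are the lengths of the two sequences.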

Show:

a) the dynamic programming matrix with scores (as shown in Figure 6.1 of Ewens or Figure 2.5 of Durbin, which is available under Course Content)

b) the traceback pointers

c) the alignment score

Sequences:

sequence 1:

AGAGCTCACAA

sequence 2:

AGTAGCTTCCAAA
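A minimal Python sketch of Needleman-Wunsch with a linear gap penalty, intended for checking a hand-filled matrix. It assumes rows correspond to sequence 1 and columns to sequence 2. It records every tied pointer in each cell (D = diagonal, U = up, L = left) and reconstructs one optimal alignment by breaking ties in the order D, U, L; other tie-breaking choices can yield different alignments with the same score. The names `needleman_wunsch` and `print_matrix` are illustrative, not from the course materials.

```python
def needleman_wunsch(x, y, match=4, mismatch=-4, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns the score matrix F, the pointer matrix, and one optimal alignment.
    """
    n, m = len(x), len(y)
    # F[i][j] = best score for aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # ptr[i][j] lists every move that attains F[i][j]:
    # "D" = x_i aligned to y_j, "U" = x_i against a gap, "L" = y_j against a gap
    ptr = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
        ptr[i][0] = "U"
    for j in range(1, m + 1):
        F[0][j] = j * gap
        ptr[0][j] = "L"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [
                (F[i - 1][j - 1] + s, "D"),
                (F[i - 1][j] + gap, "U"),
                (F[i][j - 1] + gap, "L"),
            ]
            best = max(score for score, _ in candidates)
            F[i][j] = best
            ptr[i][j] = "".join(move for score, move in candidates if score == best)

    # Trace back from the bottom-right cell, taking the first recorded
    # pointer at each step (ties broken in the order D, U, L).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j][0]
        if move == "D":
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "U":
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return F, ptr, "".join(reversed(top)), "".join(reversed(bottom))


def print_matrix(x, y, F, ptr):
    """Print each cell as its score followed by its pointer letters."""
    print("      " + "".join(f"{c:>4}   " for c in "-" + y))
    for i, row in enumerate(F):
        label = "-" if i == 0 else x[i - 1]
        cells = "".join(f"{F[i][j]:>4}{ptr[i][j]:<3}" for j in range(len(row)))
        print(f"{label:>6}" + cells)


seq1 = "AGAGCTCACAA"
seq2 = "AGTAGCTTCCAAA"
F, ptr, aligned1, aligned2 = needleman_wunsch(seq1, seq2)
print_matrix(seq1, seq2, F, ptr)
print(aligned1)
print(aligned2)
print("alignment score:", F[-1][-1])
```

With this particular scoring, a mismatch costs the same as two gap positions, so several alignments can tie. Keeping all pointers in each cell makes those ties visible in the printed matrix.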