Burkhard Morgenstern, Institut für Mikrobiologie und Genetik: Molecular Evolution and Reconstruction of Phylogenetic Trees (Molekulare Evolution und Rekonstruktion von phylogenetischen Bäumen), WS 2006/2007

Page 1:

Burkhard Morgenstern

Institut für Mikrobiologie und Genetik

Molecular Evolution and Reconstruction of Phylogenetic Trees

Winter semester (WS) 2006/2007

Page 2:

Goal:

Phylogeny reconstruction based on molecular sequence data (DNA, RNA, protein sequences)

Page 3:

Multiple sequence alignment

Molecular phylogeny reconstruction relies on comparative nucleic acid and protein sequence analysis

Alignment most important tool for sequence comparison

Multiple alignment contains more information than pair-wise alignment

Page 4:

Tools for multiple sequence alignment

Y I M Q E V Q Q E R

Sequence is duplicated in the course of evolution (e.g. speciation event)


Page 6:

Tools for multiple sequence alignment

Y I M Q E V Q Q E R

Y I M Q E V Q Q E R

Page 7:

Tools for multiple sequence alignment

Y I M Q E A Q Q E R

Y L M Q E V Q Q E R

Substitutions occur


Page 9:

Tools for multiple sequence alignment

Y A I M Q E A Q Q E R

Y L M - - V Q Q E R V

Insertions/deletions (indels) occur


Page 11:

Tools for multiple sequence alignment

Y A I M Q E A Q Q E R

Y L M V Q Q E R V

Because of insertions/deletions, sequence similarity is no longer immediately visible!

Page 12:

Tools for multiple sequence alignment

Y A I M Q E A Q Q E R -

Y - L M V - - Q Q E R V

Alignment brings together related parts of the sequences by inserting gaps into sequences


Page 14:

Tools for multiple sequence alignment

Y A I M Q E A Q Q E R -

Y - L M V - - Q Q E R V

Mismatches correspond to substitutions

Gaps correspond to indels
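The reading of an alignment as a record of events can be made concrete with a minimal Python sketch. It takes the two aligned rows shown on this slide and counts identical columns, mismatch columns (candidate substitutions) and gap columns (candidate indel positions). The function name `classify_columns` is a hypothetical helper introduced only for illustration.

```python
def classify_columns(row1, row2):
    """Count match, mismatch and gap columns of a pairwise alignment."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            counts["gap"] += 1        # column hints at an insertion/deletion
        elif a == b:
            counts["match"] += 1      # residue unchanged
        else:
            counts["mismatch"] += 1   # column hints at a substitution
    return counts


# The two rows from the slide, written without spaces.
print(classify_columns("YAIMQEAQQER-", "Y-LMV--QQERV"))
# {'match': 6, 'mismatch': 2, 'gap': 4}
```

Note that four gap columns do not mean four indel events: adjacent gap positions may stem from a single insertion or deletion.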

Page 15:

Tools for multiple sequence alignment

Pairwise alignment: alignment of two sequences

Multiple alignment: alignment of N > 2 sequences

Page 16:

Tools for multiple sequence alignment

s1 R Y I M R E A Q Y E S A Q

s2 R C I V M R E A Y E

s3 Y I M Q E V Q Q E R

s4 W R Y I A M R E Q Y E

Assumption: sequence family related by common ancestry; similarity due to common history

Sequence similarity not obvious (insertions and deletions may have happened)

Page 17:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Multiple alignment = arrangement of sequences by introducing gaps

Alignment reveals sequence similarities


Page 20:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

General information in multiple alignment:

Functionally important regions more conserved than non-functional regions

Local sequence conservation indicates functionality!
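To illustrate how local conservation can be read off a multiple alignment, here is a small Python sketch that scores every column of the four-sequence example by the fraction of rows sharing the most common residue. This simple measure and the helper name `column_conservation` are assumptions made for illustration; the slides do not prescribe a particular conservation score.

```python
from collections import Counter


def column_conservation(rows):
    """Per column: fraction of rows that carry the most common residue."""
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]   # gaps are not residues
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(rows))
    return scores


alignment = [
    "-RYI-MREAQYESAQ",  # s1
    "-RCIVMREA-YE---",  # s2
    "--YI-MQEVQQER--",  # s3
    "WRYIAMRE-QYE---",  # s4
]
scores = column_conservation(alignment)
fully_conserved = [i + 1 for i, s in enumerate(scores) if s == 1.0]
print(fully_conserved)  # 1-based columns: [4, 6, 8, 12] (I, M, E, E)
```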

Page 21:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Phylogeny reconstruction based on multiple alignment:

Estimate pairwise distances between sequences (distance-based methods for tree reconstruction)

Estimate evolutionary events (parsimony and maximum likelihood methods)

Page 22:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Task in bioinformatics: Find best multiple alignment for given sequence set

Page 23:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Astronomical number of possible alignments!

Page 24:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - - - Y E -

s3 Y I - - - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Astronomical number of possible alignments!
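How quickly the number of alignments grows can be checked even for the pairwise case. The sketch below counts all pairwise alignments of two sequences of lengths m and n, assuming that each column is either a residue pair, a gap in the first row, or a gap in the second row, and that different orderings of adjacent gap columns count as different alignments. Under these assumptions the count is the Delannoy number D(m, n); the function name is hypothetical.

```python
def count_pairwise_alignments(m, n):
    """Number of alignments of two sequences with lengths m and n."""
    # D[i][j] = number of alignments of a length-i prefix with a length-j prefix
    D = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # last column: residue pair, gap in row 2, or gap in row 1
            D[i][j] = D[i - 1][j - 1] + D[i - 1][j] + D[i][j - 1]
    return D[m][n]


print(count_pairwise_alignments(3, 3))    # 63
print(count_pairwise_alignments(10, 10))  # 8097453
print(count_pairwise_alignments(100, 100))
```

Two sequences of only ten residues already admit about eight million alignments; for multiple alignments the numbers grow much faster still.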

Page 25:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - - - Y E -

s3 Y I - - - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Computer has to decide: which one is best?

Page 26:

Tools for multiple sequence alignment

Questions in development of alignment programs:

(1) What is a good alignment?

→ objective function ('score')

(2) How to find a good alignment?

→ optimization algorithm

First question far more important!

Page 27:

Tools for multiple sequence alignment

Before defining an objective function (scoring scheme):

What is a biologically good alignment?

Page 28:

Tools for multiple sequence alignment

Criteria for alignment quality:

1. 3D-Structure: align residues at corresponding positions in 3D structure of protein!


Page 31:

Tools for multiple sequence alignment

Species related by common history

Page 32:

Tools for multiple sequence alignment

Genes / proteins related by common history

Page 33:

Tools for multiple sequence alignment

Criteria for alignment quality:

1. 3D-Structure: align residues at corresponding positions in 3D structure of protein!

2. Evolution: align residues with common ancestors!

Page 34:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Alignment = hypothesis about sequence evolution

Mismatches correspond to substitutions

Gaps correspond to insertions/deletions

Page 35:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Alignment = hypothesis about sequence evolution

Search for most plausible scenario!

Estimate probabilities for individual evolutionary events: insertions/deletions, substitutions

Page 36:

Tools for multiple sequence alignment

s1 - R Y I - M R E A Q Y E S A Q

s2 - R C I V M R E A - Y E - - -

s3 - Y - I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Alignment = hypothesis about sequence evolution

Search for most plausible scenario!

Estimate probabilities for individual evolutionary events: insertions/deletions, substitutions

Page 37:

Tools for multiple sequence alignment

Compute score s(a,b) for degree of similarity between amino acids a and b based on probability $p_{a,b}$ of substitution a → b (or b → a)

(Extremely simplified!)


Page 39:

Tools for multiple sequence alignment

Reason for different substitution probabilities $p_{a,b}$:

Different physical and chemical properties of amino acids

Amino acids with similar properties are more likely to be substituted for each other

Page 41:

Tools for multiple sequence alignment

Use penalty for gaps introduced into alignment

Simplest approach: linear gap costs: penalty proportional to gap length

Non-linear gap penalties more realistic: long gap caused by single insertion/deletion

Most frequently used: affine linear gap penalties: more realistic, but efficient to calculate!
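The difference between linear and affine gap costs can be sketched in a few lines of Python. The parameter values and function names below are hypothetical, and the affine form used here (opening cost for the first gap position, extension cost for each further position) is only one of several conventions in use.

```python
def linear_gap_cost(length, g=2.0):
    """Linear gap cost: penalty proportional to gap length."""
    return g * length


def affine_gap_cost(length, gap_open=5.0, gap_extend=0.5):
    """Affine gap cost: expensive to open a gap, cheap to extend it."""
    if length == 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


for length in (1, 2, 10):
    print(length, linear_gap_cost(length), affine_gap_cost(length))
# 1 2.0 5.0
# 2 4.0 5.5
# 10 20.0 9.5
```

With these illustrative values, one gap of length 10 costs 9.5 under the affine scheme, while ten separate gaps of length 1 cost 50. This reflects the idea that a long gap is often caused by a single insertion or deletion.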

Page 42:

Traditional objective functions:

Define score of an alignment as

Sum of individual similarity scores s(a,b)

Minus gap penalties

Needleman-Wunsch scoring system for pairwise alignment (1970)

Page 43:

Pair-wise sequence alignment

Example:

T Y W I V

T - - L V

Score = s(T,T) + s(I,L) + s(V,V) - 2g

Assumption: linear gap penalty!
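The score of the example alignment can be computed with a short Python sketch. The similarity values in `toy_sim` and the gap cost g = 3 are made-up numbers for illustration only; in practice s(a,b) would come from a substitution matrix.

```python
def alignment_score(row1, row2, sim, g):
    """Sum of similarity scores minus a linear penalty g per gap position."""
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score -= g
        else:
            # the table is treated as symmetric: s(a,b) = s(b,a)
            score += sim[(a, b)] if (a, b) in sim else sim[(b, a)]
    return score


# Hypothetical similarity scores for the residue pairs in the example.
toy_sim = {("T", "T"): 5, ("I", "L"): 2, ("V", "V"): 4}
print(alignment_score("TYWIV", "T--LV", toy_sim, g=3))  # 5 + 2 + 4 - 2*3 = 5
```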

Page 44:

Pair-wise sequence alignment

T Y W I V

T - - L V

Dynamic-programming algorithm finds

alignment with best score.

(Needleman and Wunsch, 1970)
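A minimal Python sketch of the dynamic-programming idea follows, assuming a linear gap penalty g and a user-supplied similarity function. The scoring function `toy_sim` (+2 for identical residues, -1 otherwise) is a hypothetical stand-in for a real substitution matrix, and the code is an illustration of the principle rather than a reproduction of the original 1970 formulation.

```python
def needleman_wunsch(x, y, sim, g):
    """Global pairwise alignment with linear gap penalty g."""
    n, m = len(x), len(y)
    # F[i][j] = best score for aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -g * i
    for j in range(1, m + 1):
        F[0][j] = -g * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sim(x[i - 1], y[j - 1]),  # align two residues
                F[i - 1][j] - g,                            # gap in y
                F[i][j - 1] - g,                            # gap in x
            )
    # Traceback from the bottom-right corner to recover one optimal alignment.
    row_x, row_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sim(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1])
            row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - g:
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(row_x)), "".join(reversed(row_y))


def toy_sim(a, b):
    """Hypothetical similarity: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


score, row_x, row_y = needleman_wunsch("TYWIV", "TLV", toy_sim, g=1)
print(score)
print(row_x)
print(row_y)
```

The two nested loops fill one table cell per pair of sequence positions, which is where the running time proportional to the product of the sequence lengths comes from.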

Page 45:

Pair-wise sequence alignment

T Y W I V

T - - L V

Running time proportional to product of sequence lengths

Time complexity O(l1 * l2)

Page 46:

Pair-wise sequence alignment

Algorithm for pairwise alignment can be generalized to multiple alignment of N sequences

Time-complexity O(l1 * l2 * … * lN)

Not feasible in practice (running time too long!)

Heuristic necessary, i.e. fast algorithm that does not necessarily produce mathematically best alignment
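A quick back-of-the-envelope calculation in Python shows why the generalized algorithm is impractical. The sketch counts the cells of the dynamic-programming table, assuming one cell per combination of prefix lengths (including the empty prefix); the function name and the sequence length of 100 are illustrative choices.

```python
def dp_table_cells(lengths):
    """Number of cells in a DP table over N sequences of the given lengths."""
    cells = 1
    for length in lengths:
        cells *= length + 1   # prefixes of length 0..length
    return cells


print(dp_table_cells([100, 100]))   # 10201 cells for two sequences
print(dp_table_cells([100] * 4))    # about 1.0e8 cells for four sequences
print(dp_table_cells([100] * 10))   # about 1.1e20 cells for ten sequences
```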

Page 47:

'Progressive' Alignment

Most popular approach to (global) multiple sequence alignment:

Progressive Alignment

Since mid-Eighties: Feng/Doolittle, Higgins/Sharp, Taylor, …

Page 48:

'Progressive' Alignment

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP

Page 49:

'Progressive' Alignment

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP

Guide tree

Page 50:

'Progressive' Alignment

WCEAQTKNGQGWVPSNYITPVN

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASFQPVAALERIN

WLNYNEERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”

Page 51:

'Progressive' Alignment

WCEAQTKNGQGWVPSNYITPVN

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASVQ--PVAALERIN------

WLN-YNEERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”

Page 52:

'Progressive' Alignment

WCEAQTKNGQGWVPSNYITPVN-

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASVQ--PVAALERIN------

WLN-YNEERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”

Page 53:

'Progressive' Alignment

WCEAQTKNGQGWVPSNYITPVN--------

WW--RLNDKEGYVPRNLLGLYP--------

AVVIQDNSDIKVVP--KAKIIRD-------

YAVESEA---SVQ--PVAALERIN------

WLN-YNE---ERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”

Page 54:

'Progressive' Alignment

WCEAQTKNGQGWVPSNYITPVN--------

WW--RLNDKEGYVPRNLLGLYP--------

AVVIQDNSDIKVVP--KAKIIRD-------

YAVESEA---SVQ--PVAALERIN------

WLN-YNE---ERGDFPGTYVEYIGRKKISP

Most important implementation: CLUSTAL W

Page 55:

'Progressive' Alignment

CLUSTAL W; Thompson et al., 1994 (~17,000 citations)

Pairwise distances as 1 - percentage of identity

Calculate un-rooted tree with Neighbor Joining

Define root as central position in tree

Define sequence weights based on tree

Gap penalties calculated based on various parameters
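The first step, a distance defined as 1 minus the fraction of identical positions, can be sketched in Python as follows. This is an illustration of the idea and not CLUSTAL W's own code; in particular, ignoring columns that contain a gap is an assumption made here, and `identity_distance` is a hypothetical helper name.

```python
def identity_distance(row1, row2):
    """Distance = 1 - fraction of identical residues over gap-free columns."""
    compared = identical = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            continue              # assumption: columns with gaps are ignored
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 1.0                # nothing comparable: treat as maximally distant
    return 1.0 - identical / compared


# Pairwise alignment used earlier in the slides: 6 of 8 gap-free columns identical.
print(identity_distance("YAIMQEAQQER-", "Y-LMV--QQERV"))  # 0.25
```

A matrix of such distances for all sequence pairs is the input for the Neighbor Joining step that produces the guide tree.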

Page 56:

Tools for multiple sequence alignment

Problems with traditional approach:

Results depend on gap penalty

Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction

Algorithm produces global alignments.

Page 57:

Tools for multiple sequence alignment

Problems with traditional approach:

But:

Many sequence families share only local similarity

E.g. sequences share one conserved motif

Page 58:

The DIALIGN approach

Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103

Combination of global and local methods

Assemble multiple alignment from gap-free local pair-wise alignments ("fragments")

Page 59:

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


Page 65:

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 66:

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 67:

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 68:

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 69:

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Consistency!

Page 70:

The DIALIGN approach

atc------TAATAGTTAaactccccCGTGC-TTag

cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg

caaa--GAGTATCAcc----------CCTGaaTTGAATaa

Page 71:

More methods for multiple alignment:

T-Coffee

PIMA

Muscle

Prrp

Mafft

ProbCons

Page 72:

Substitution matrices

Similarity score s(a,b) for amino acids a and b based on probability $p_{a,b}$ of substitution a → b

Idea: it is more reasonable to align amino acids that are often replaced by each other!

Page 73:

Substitution matrices

Assumptions:

$p_{a,b}$ does not depend on sequence position

Sequence positions independent of each other

$p_{a,b} = p_{b,a}$ (symmetry!)

Page 74:

Substitution matrices

Compute score s(a,b) for degree of similarity between amino acids a and b:

Probability $p_{a,b}$ of substitution a → b (or b → a)

Frequency $q_a$ of a

Define

$s(a,b) = \log \frac{p_{a,b}}{q_a q_b}$
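The log-odds definition can be tried out with a few lines of Python. The probabilities below are invented numbers for illustration, and the natural logarithm is an assumption, since the slide does not state the base of the logarithm.

```python
import math


def log_odds_score(p_ab, q_a, q_b):
    """s(a,b) = log( p_ab / (q_a * q_b) ), using the natural logarithm."""
    return math.log(p_ab / (q_a * q_b))


# Hypothetical values: both residues have background frequency 0.1,
# so a chance pairing is expected with probability 0.1 * 0.1 = 0.01.
print(log_odds_score(0.02, 0.1, 0.1))   # pair seen twice as often as chance: > 0
print(log_odds_score(0.005, 0.1, 0.1))  # pair seen half as often as chance: < 0
```

A positive score thus marks residue pairs that occur in related sequences more often than expected by chance, a negative score those that occur less often.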


Page 77:

Substitution matrices

To calculate $p_{a,b}$:

Consider alignments of related proteins and count substitutions

a → b (or b → a)

ESWTS-RQWERYTIALMSDQRREVLYWIALY

ERWTSERQWERYTLALMS-QRREALYWIALY
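Counting substitutions in the aligned pair above can be sketched in Python as follows. Residue pairs are stored as unordered pairs, in line with the symmetry assumption $p_{a,b} = p_{b,a}$, and columns containing a gap are skipped; the helper name `count_residue_pairs` is hypothetical, and turning the counts into probabilities would require many more alignments than this single example.

```python
from collections import Counter


def count_residue_pairs(row1, row2):
    """Count unordered residue pairs over the gap-free columns of an alignment."""
    counts = Counter()
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            continue                          # indels are not substitutions
        counts[tuple(sorted((a, b)))] += 1    # (a,b) and (b,a) count as the same pair
    return counts


row1 = "ESWTS-RQWERYTIALMSDQRREVLYWIALY"
row2 = "ERWTSERQWERYTLALMS-QRREALYWIALY"
counts = count_residue_pairs(row1, row2)
substitutions = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
print(substitutions)  # {('R', 'S'): 1, ('I', 'L'): 1, ('A', 'V'): 1}
```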


Page 79:

Substitution matrices

Problems involved:

1. Probability $p_{a,b}$ depends on time t since sequences separated in evolution: $p_{a,b} = p_{a,b}(t)$

2. Protein families contain multiple sequences: phylogenetic tree must be known!

3. Alignment of protein families must be known!

4. Multiple mutations at one sequence position

Page 80:

Substitution matrices

M. Dayhoff et al., Atlas of Protein Sequence and Structure, 1978

PAM matrices

Page 81:

Substitution matrices

Calculation of $p_{a,b}(t)$:

Consider multiple alignments of closely related protein families

Count occurrence of a and b at corresponding positions in alignments using phylogenetic tree

Estimate $p_{a,b}(t)$ for small times t

Calculate conditional probabilities p(a|b,t) for small t

Normalize to distance 1 PAM (= percentage of accepted mutations)

Calculate p(a|b,t) for larger evolutionary distances by matrix multiplication

Calculate $p_{a,b}(t)$ for larger evolutionary distances
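The extrapolation to larger distances by matrix multiplication can be illustrated with a toy Python sketch. It uses an invented three-letter alphabet and an invented one-step matrix `toy_step`, where entry [i][j] is taken to be the probability of finding letter j after one step given letter i; real PAM matrices are 20 x 20 and are estimated from data, so all numbers here are assumptions.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(M, t):
    """M multiplied with itself t times (t = 0 gives the identity matrix)."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, M)
    return result


# Hypothetical one-step transition matrix; every row sums to 1.
toy_step = [
    [0.990, 0.007, 0.003],
    [0.007, 0.990, 0.003],
    [0.003, 0.003, 0.994],
]

for t in (1, 50, 250):
    P = mat_power(toy_step, t)
    print(t, [round(p, 3) for p in P[0]])  # row for the first letter
```

With growing t the probability of finding the original letter unchanged decreases, while each row still sums to 1.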


Page 83:

Substitution matrices

Alternative: BLOSUM matrices

S. Henikoff and J.G. Henikoff, PNAS, 1992

Basis: BLOCKS database, gap-free regions of multiple alignments.

Cluster sequences whose percentage of identity is at least L

Estimate $p_{a,b}(t)$ directly.

Default values: L = 62 (BLOSUM62), L = 50 (BLOSUM50)
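The clustering step can be sketched in Python under a few stated assumptions: segments are gap-free and of equal length, two segments are linked if their percentage of identical positions reaches the threshold L, and clusters are formed by single linkage (a segment joins a cluster if it is linked to any member). The toy segments and the helper names are invented for illustration and are not taken from the BLOCKS database.

```python
def percent_identity(s, t):
    """Percentage of identical positions of two equal-length gap-free segments."""
    return 100.0 * sum(a == b for a, b in zip(s, t)) / len(s)


def cluster_segments(segments, threshold):
    """Single-linkage clusters (lists of indices) at the given identity threshold."""
    parent = list(range(len(segments)))

    def find(i):
        # follow parent links to the cluster representative
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if percent_identity(segments[i], segments[j]) >= threshold:
                parent[find(j)] = find(i)   # merge the two clusters

    clusters = {}
    for i in range(len(segments)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


toy_block = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAAACC", "CCCCCCCCCC"]
print(cluster_segments(toy_block, threshold=85))  # [[0, 1, 2], [3]]
```

Segments 0 and 2 are only 80% identical, yet they end up in the same cluster because both are 90% identical to segment 1.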