How to calculate probability of event

because I suck at math

#1  Postby Rumraket » Aug 01, 2015 7:16 pm

So, I'm trying to work out how to calculate the probability of some event, but I don't know how.

It's actually a pretty simple thing (I think?): Take the human genome (~3.2 billion base-pairs), randomly insert 100 mutations, then calculate the probability of those 100 specific mutations happening before they did. Every base-pair can mutate of course, and there are multiple mutations possible at every site.

To keep it simple, I just want these mutations included:
Insertion of any basepair anywhere (that means 4 different kinds of insertions possible at 3.2 billion sites)
Deletions of any basepair anywhere (delete one basepair at any of 3.2 billion basepairs)
Any substitution anywhere (change any basepair at any of 3.2 billion sites into one of the 3 others).

I think you can simplify this to say there are 8 possible changes at any site (1 deletion, 4 insertions, 3 substitutions)

100 of those in total will happen in a human genome. How does one make a formula to calculate the probability?
Half-Life 3 - I want to believe
Rumraket
THREAD STARTER
 
Posts: 13215
Age: 40

Re: How to calculate probability of event

#2  Postby Rumraket » Aug 01, 2015 7:25 pm

The probability of a mutation, if there is only 1 happening, at any one site would be 1 in the total number of sites, which would be 3.2 billion. So that's just 1 in 3.2 billion, right?

But there are 100 happening, and there are 8 different types possible at every site in the 3.2 billion.

#3  Postby Rumraket » Aug 01, 2015 7:42 pm

C'mon, where are all you math-gives-me-a-hardon dudes? :lol:

#4  Postby zoon » Aug 01, 2015 8:06 pm

Don't trust me, I'm not a mathematician, I'm just putting in a guess before the maths wizards arrive. If you are taking the probability of any one of those mutations to be 1 divided by 3.2 billion, then the probability of 2 specific mutations would be (1 divided by 3.2 billion) squared, and so on. The probability of exactly those 100 specific mutations would be (1 divided by 3.2 billion) to the hundredth power, which is on the small side. I've probably completely misunderstood the question?
zoon
 
Posts: 3230

#5  Postby Rumraket » Aug 01, 2015 8:13 pm

No, I think you're definitely on the right track there. I just need some confirmation and it's a go :P

Edit: I think we still need to factor in that there are 8 possible kinds of mutations at each of those 3.2 billion sites.

#6  Postby zoon » Aug 01, 2015 8:49 pm

Rumraket wrote:No, I think you're definitely on the right track there. I just need some confirmation and it's a go :P

Edit: I think we still need to factor in that there are 8 possible kinds of mutations at each of those 3.2 billion sites.

Each one of those 8 possible mutations still counts as a single mutation, with a probability of (1 divided by 3.2 billion). So I think the calculation I gave holds?? If you were looking for the probability of exactly one mutation at each of those 100 sites, without specifying which of the possible 8 mutations, then the probability of a mutation at any one of those sites would become (8 divided by 3.2 billion), and the probability of just one mutation at exactly each of those 100 sites would be (8 divided by 3.2 billion) to the hundredth. But I could easily be out by a factor of 100 billion or so.

#7  Postby Rumraket » Aug 01, 2015 9:01 pm

zoon wrote:
Rumraket wrote:No, I think you're definitely on the right track there. I just need some confirmation and it's a go :P

Edit: I think we still need to factor in that there are 8 possible kinds of mutations at each of those 3.2 billion sites.

Each one of those 8 possible mutations still counts as a single mutation, with a probability of (1 divided by 3.2 billion).

Hmmm yeah, 1 of 8 will happen, yes, but there are 8 possibilities, so the probability of a specific mutation at any given site would be 1/8th of 1 in 3.2 billion?

I guess that would make the total probability (1/8th divided by 3.2 billion) to the 100th power?

#8  Postby zoon » Aug 01, 2015 9:14 pm

Rumraket wrote:
zoon wrote:
Rumraket wrote:No, I think you're definitely on the right track there. I just need some confirmation and it's a go :P

Edit: I think we still need to factor in that there are 8 possible kinds of mutations at each of those 3.2 billion sites.

Each one of those 8 possible mutations still counts as a single mutation, with a probability of (1 divided by 3.2 billion).

Hmmm yeah, 1 of 8 will happen, yes, but there are 8 possibilities, so the probability of a specific mutation at any given site would be 1/8th of 1 in 3.2 billion?

I guess that would make the total probability (1/8th divided by 3.2 billion) to the 100th power?

Yes, I agree, that should have been my first calculation. :cheers:

#9  Postby Rumraket » Aug 01, 2015 9:20 pm

So that would give an end probability of ~1.5 x 10^-1041 according to my calculator.
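
A number this small underflows an ordinary floating-point value, so one way to check it is to work with base-10 logarithms. The sketch below is only an illustration under the thread's simplifying assumptions (every site and every mutation type equally likely, mutations independent, order fixed); the function name is made up for this example.

```python
import math

def log10_ordered_probability(n_sites, n_kinds, n_mutations):
    """log10 of (1 / (n_sites * n_kinds)) ** n_mutations.

    Assumes every (site, kind) pair is equally likely and that the
    mutations are independent draws in one fixed order.
    """
    return -n_mutations * math.log10(n_sites * n_kinds)

log_p = log10_ordered_probability(3_200_000_000, 8, 100)
exponent = math.floor(log_p)            # power of ten
mantissa = 10 ** (log_p - exponent)     # leading digits, between 1 and 10
print(f"{mantissa:.2f} x 10^{exponent}")  # about 1.50 x 10^-1041
```

Working in logarithms keeps every intermediate value in normal floating-point range, which is why a pocket calculator fails where this does not.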

#10  Postby zoon » Aug 01, 2015 10:44 pm

I have a suspicion that's the probability of getting those 100 mutations in a particular order, and if the order they arrive in doesn't matter, then the probability is multiplied by the number of ways of ordering 100 objects in a row? Which is 1 x 2 x 3 x 4 x ... x 100 (i.e. 100!)? All the same, the chance of getting that exact collection of 100 mutations would still be small enough to be discounted for practical purposes? Was there any particular reason for the query?
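
The effect of ignoring arrival order can be checked with exact integer arithmetic, since Python integers have no size limit. This is a rough sketch under the same assumptions as the thread, plus the assumption that all 100 mutations are distinct; the constant and helper names are invented for illustration.

```python
import math
from fractions import Fraction

N_SITES = 3_200_000_000   # genome size used in the thread
N_KINDS = 8               # 1 deletion + 4 insertions + 3 substitutions
N_MUT = 100               # number of mutations

# Probability of the 100 specific mutations arriving in one fixed order.
p_ordered = Fraction(1, (N_SITES * N_KINDS) ** N_MUT)

# If arrival order is irrelevant and all 100 mutations are distinct,
# the same set can arrive in 100! different orders.
p_unordered = p_ordered * math.factorial(N_MUT)

def log10_fraction(frac):
    # math.log10 accepts arbitrarily large Python integers.
    return math.log10(frac.numerator) - math.log10(frac.denominator)

print(round(log10_fraction(p_ordered), 3))    # about -1040.824
print(round(log10_fraction(p_unordered), 3))  # about -882.854
```

So the factor of 100! (about 9.3 x 10^157) moves the result from roughly 10^-1041 to roughly 10^-883, which is still negligible for practical purposes.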

#11  Postby lucek » Aug 01, 2015 11:25 pm

Not to throw a monkey wrench into the works here but this is far more complicated. The odds of each of the 8 alternatives differ from one another because of the different chemical reactions required for them to happen. Even substitutions have slight biases.
Next time a creationist says, "Were you there to watch the big bang", say "Yes we are".
"Nutrition is a balancing act during the day, not a one-shot deal from a single meal or food.":Sciwoman
lucek
 
Posts: 3641

United States (us)

#12  Postby Rumraket » Aug 01, 2015 11:58 pm

lucek wrote:Not to throw a monkey wrench into the works here but this is far more complicated. The odds of each of the 8 alternatives differ from one another because of the different chemical reactions required for them to happen. Even substitutions have slight biases.

I know the real events are much, much more complicated. Transversion vs transition bias, deletion vs insertion (all these have different probability distributions and even then, their distributions are different in some areas of the genome compared to others), duplication can almost universally only happen in areas with high numbers of repeats and prone to unequal crossover etc. etc.

The point is not to get an accurate representation of the real biochemistry of mutation (one could probably write a fucking dissertation on that). I have even excluded some mutations from my example.

What I want to learn is how to do the math of these kinds of problems of mutations in general. If there are N sites, X number of mutations happen, and there are Y number of possible mutations at each site, how do I calculate the probability? Say N is 3 x 10^9, X is 100 and Y is 8. How's the formula look?

That way I can just plug in the numbers, so to speak, as I come across different situations.

#13  Postby Rumraket » Aug 02, 2015 12:11 am

zoon wrote:I have a suspicion that's the probability of getting those 100 mutations in a particular order

Well that's what I'm looking for.

Suppose we put 100 random mutations in a 3.2 billion base genome. Then after the fact has happened, we have a very specific order of mutations. Each of those 100 mutations will be a particular one out of the 8 possible, and they will have happened in a particular location out of those 3.2 billion. So what I want to calculate is, as I expect, an absurdly low number. The probability of that specific event that happened, before it happened.

zoon wrote:Was there any particular reason for the query?

Yeah. Several, mostly to do with creationist probability bullshittery as you could probably have guessed.

#14  Postby zoon » Aug 02, 2015 2:03 am

I think, for the purpose of this calculation, the order of the mutations along the genome is just an aspect of what distinguishes one mutation from another (the other aspect being which of the 8 kinds each mutation is). The order I was talking about would be the equivalent of the temporal order in which the mutations happened.

The calculation is the same as if there were N x Y marbles in a bag, numbered from 1 to N x Y, then you take one marble out, write down its number and put it back again, and repeat that procedure X times. Then the question is, what is the probability of getting a particular collection of X numbers, for example, if N is 5, Y is 2, and X is 3, what is the probability of getting 7, 2 and 9? The chance of getting 7 the first time is 1/10, the chance of getting 7 the first time and 2 the second is (1/10) squared, and the chance of getting 7 the first time, 2 the second time and 9 the third time is (1/10) cubed, or 1/1000, 0.001, which was where we were with the calculation. But since it does not matter for your question whether we drew 7 then 2 then 9, or, say, 2 then 9 then 7, we need to multiply that probability by (1 x 2 x 3), or 6, because there were 6 different orders we could have drawn the same collection of 3 numbers. So the final probability is (.001 x 6), or .006. But with the figures you were using, the answer would, as you say, be minute anyway.
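
The marble example is small enough to check by brute force. The following sketch simply lists all 10 x 10 x 10 equally likely draw sequences and counts those that contain 7, 2 and 9 in any order; the variable names are made up for this illustration.

```python
from itertools import product
from fractions import Fraction

n_marbles = 5 * 2            # N x Y marbles, numbered 1..10
target = sorted([7, 2, 9])   # the collection we want, order ignored

# Enumerate every equally likely sequence of 3 draws with replacement.
draws = list(product(range(1, n_marbles + 1), repeat=3))
hits = sum(1 for d in draws if sorted(d) == target)

probability = Fraction(hits, len(draws))
print(hits, len(draws), float(probability))  # 6 1000 0.006
```

The count of 6 matches 3! orderings of three distinct numbers. The same multiplier would not apply if the target collection contained a repeated number.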

#15  Postby Thommo » Aug 06, 2015 11:56 am

Rumraket wrote:So, I'm trying to work out how to calculate the probability of some event, but I don't know how.

It's actually a pretty simple thing (I think?): Take the human genome (~3.2 billion base-pairs), randomly insert 100 mutations, then calculate the probability of those 100 specific mutations happening before they did. Every base-pair can mutate of course, and there are multiple mutations possible at every site.

To keep it simple, I just want these mutations included:
Insertion of any basepair anywhere (that means 4 different kinds of insertions possible at 3.2 billion sites)
Deletions of any basepair anywhere (delete one basepair at any of 3.2 billion basepairs)
Any substitution anywhere (change any basepair at any of 3.2 billion sites into one of the 3 others).

I think you can simplify this to say there are 8 possible changes at any site (1 deletion, 4 insertions, 3 substitutions)

100 of those in total will happen in a human genome. How does one make a formula to calculate the probability?


There are a number of complications to this, that will make it hideous to get an exact algebraic answer, and to tell the truth I'm not entirely sure what you're after, so it's unclear whether making a number of simplifying assumptions is going to be a problem or not.

One aspect is that you refer to something happening "before they did". Are you actually interested in the time aspect? If so then we probably need to consider some fixed probability of mutation per base pair per unit time. That itself might be complicated by multiple simultaneous mutations being more, or less likely (i.e. non-independent trials) and mutation at individual sites not being equiprobable. Individual mutations may also be more or less likely than others.

Another aspect is that your assumption that there are 8 possible mutations at each site is ok, but the number of sites is going to change in the event of insertion or deletion. Given how large the number of sites is that probably won't matter in ballpark terms (3.2bn + 100 isn't so very different from 3.2bn - 100, relatively speaking).

We also have multiple pathways to the same end point - e.g. insertion after the millionth base pair, followed by mutation of that newly created base pair, followed by deletion of that newly inserted/mutated pair gives back the original genome, as does any two mutations of the millionth (or any other) base pair followed by it mutating a third time back to its original state.

Rumraket wrote:I know the real events are much, much more complicated. Transversion vs transition bias, deletion vs insertion (all these have different probability distributions and even then, their distributions are different in some areas of the genome compared to others), duplication can almost universally only happen in areas with high numbers of repeats and prone to unequal crossover etc. etc.

The point is not to get an accurate representation of the real biochemistry of mutation (one could probably write a fucking dissertation on that). I have even excluded some mutations from my example.

What I want to learn is how to do the math of these kinds of problems of mutations in general. If there are N sites, X number of mutations happen, and there are Y number of possible mutations at each site, how do I calculate the probability? Say N is 3 x 10^9, X is 100 and Y is 8. How's the formula look?

That way I can just plug in the numbers, so to speak, as I come across different situations.


Ok, so making some guesses at assumptions and ignoring the "time" aspect I'll try and answer this and hope it's the sort of thing you're after.

Step 1 - confirming the probability no two mutations occur at the same site is high.
(You can ignore this step if you're already comfortable with combinatorics problems with replacement and ensuring no duplicates occur)

Let's suppose that there are 100 mutations and no two of them occur at the same base pair, to simplify things. Given that there are exactly 100 mutations, and that each is equally likely to occur at any base pair, we can work out the probability that no two occur at the same site as follows:-

First mutation cannot coincide with another (as no mutation has yet occurred)
Second mutation can be at any site except the one already chosen = (3,200,000,000 - 1)/3,200,000,000
Third mutation can be at any site except the two already chosen = (3,200,000,000 - 2)/3,200,000,000
Fourth mutation can be at any site except the three already chosen = (3,200,000,000 - 3)/3,200,000,000
...
Hundredth mutation can be at any site except the ninety-nine already chosen = (3,200,000,000 - 99)/3,200,000,000

Then multiplying up to get the probability of all of these occurring together:-
3,200,000,000! / (3,200,000,000^100 x (3,200,000,000 - 100)!)

Note: ! denotes "factorial" the result of multiplying together that number with every smaller positive integer, e.g. 6! = 6x5x4x3x2x1.

Which gives a probability of 0.999998, so is unlikely to affect our results too much.
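
Step 1 can be reproduced by multiplying the hundred fractions directly, which avoids the enormous factorials. This is a minimal sketch assuming uniformly random, independent sites; the function name is invented here, and the exponential cross-check is the usual "birthday problem" approximation rather than anything from the thread.

```python
import math

def prob_no_shared_site(n_sites, n_mutations):
    """Probability that n_mutations uniformly random sites are all distinct."""
    p = 1.0
    for k in range(n_mutations):
        # The (k+1)-th mutation must avoid the k sites already used.
        p *= (n_sites - k) / n_sites
    return p

p_distinct = prob_no_shared_site(3_200_000_000, 100)
print(f"{p_distinct:.6f}")  # 0.999998

# Birthday-problem approximation: exp(-k(k-1) / (2N)).
approx = math.exp(-100 * 99 / (2 * 3_200_000_000))
print(f"{approx:.6f}")      # 0.999998
```

Evaluating the factorial form with log-gamma functions would lose the answer to rounding error here, because the two huge terms cancel almost exactly. The term-by-term product does not have that problem.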

Step 2 - calculation of the number of different combinations of mutation sites

Since we have simplified our cases down, we are now in a position to put an approximated number on the chance that given a hundred mutations occurred it was these hundred mutations.

First is a fairly straightforward implementation of the formula for number of combinations (as distinct from permutations).

Calculating C(3,200,000,000, 100) gives:
3,200,000,000! / (100! x 3,199,999,900!) = something catastrophically large, I have no software that can crunch it currently installed. Somewhere around 3.5 x 10^792 (roughly 3,200,000,000^100 / 100!).

Step 3 - calculation of the number of different mutations that can occur at those sites

Again, this is very simple, each of the hundred sites can have one of 8 mutations occur, so we multiply 8x8x...x8 = 8^100 = 2.04 x 10^90

Multiplying them up we have something around 3.5 x 10^792 x 2.04 x 10^90 ~ 7 x 10^882 different ways the mutations can occur, so correspondingly the probability of any exact configuration being chosen is simply the inverse (i.e. 1/x) which is about 1.4 x 10^-883.
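
Steps 2 and 3 can be evaluated exactly, because Python integers are arbitrary precision and `math.comb` (Python 3.8+) computes binomial coefficients directly. The sketch below is illustrative only and keeps the simplifying assumptions above (100 distinct sites, 8 equally likely changes per site); the helper `sci` is a made-up formatting function that truncates rather than rounds. The exact binomial coefficient comes out near 3.5 x 10^792.

```python
import math

N_SITES = 3_200_000_000
N_KINDS = 8
N_MUT = 100

site_choices = math.comb(N_SITES, N_MUT)  # ways to pick 100 distinct sites
kind_choices = N_KINDS ** N_MUT           # one of 8 changes at each site
configurations = site_choices * kind_choices

def sci(big_int, digits=1):
    """Format a large positive integer as 'm x 10^e', truncating extra digits."""
    s = str(big_int)
    return f"{s[0]}.{s[1:1 + digits]} x 10^{len(s) - 1}"

print(sci(site_choices))     # roughly 3.5 x 10^792
print(sci(kind_choices))     # roughly 2.0 x 10^90
print(sci(configurations))   # roughly 7.1 x 10^882

# Probability of one exact configuration, as a power of ten.
print(round(-math.log10(configurations), 3))  # about -882.854
```

A log10 of about -882.854 corresponds to a probability of roughly 1.4 x 10^-883.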
Thommo
 
Posts: 27174

#16  Postby Rumraket » Aug 06, 2015 1:19 pm

Thank you Thommo, I think this is what I was looking for.

For crunching large numbers I just use Wolfram Alpha, it usually doesn't complain about big numbers. Neither Excel nor the standard Windows calculator will even accept the input. :lol:

#17  Postby Thommo » Aug 06, 2015 1:27 pm

No worries. Those are standard combinatorial techniques by the way, that stuff on permutations and combinations is really handy for all sorts of problems.

The formula for picking k objects from n "with replacement" and getting no duplicates is also fairly standard, though I'm not aware of any particular name for it.

I can't stress enough the number of simplifying assumptions I made, variations in the probabilities, size of genome and ignoring the timing of mutation though, which could all theoretically affect results a lot in some circumstances.

#18  Postby igorfrankensteen » Aug 07, 2015 2:14 am

Don't overlook the fact that it isn't the variables you are watching which determine possibilities or probabilities.

It's the mechanism of activity which does so.

Take coin flips: it isn't the coin, or the number of faces which determines probabilities discussed, it's the physics of the flip.
igorfrankensteen
 
Name: michael e munson
Posts: 2114
Age: 67
Male

Country: United States

#19  Postby Rumraket » Oct 21, 2016 11:53 am

Thommo gave a good answer earlier, but with a number of additional simplifying assumptions, (such as ignoring the possibility of the same nucleotide mutating multiple times), isn't the probability of some particular set of mutations really just

1 in (G/1 x S/1)^n

Where G is genome size, S is number of possible substitutions, and n is total number of mutations?

For a 3 billion base pair genome, 3 possible substitutions at each site (each nucleotide can be substituted to 1 of the 3 others) and 100 total mutations, it would be roughly 1 in ~2.66 x 10^995 ?
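
For plugging in different values of G, S and n, Python's `decimal` module is one convenient option, since its default exponent range is far wider than a float's. This is a sketch of the simplified formula only: it counts ordered sequences of mutations and ignores repeated hits at the same site. The variable names follow the post.

```python
from decimal import Decimal

G = 3_000_000_000   # genome size (base pairs)
S = 3               # possible substitutions per site
n = 100             # number of mutations

ways = Decimal(G * S) ** n     # (G x S)^n ordered outcomes
probability = 1 / ways         # chance of one specific ordered outcome

print(f"{ways:.2E}")           # about 2.66E+995
print(f"{probability:.2E}")    # about 3.76E-996
```

Because this counts ordered sequences, it corresponds to the "particular order" figure discussed earlier in the thread, not to the unordered count from the combinations approach.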

#20  Postby Rumraket » Oct 21, 2016 1:16 pm

Why in the hell would I divide by 1? :picard: