📊

Probability

Requisites

Story

Characters

Andrey Kolmogorov

Key questions

Why use probability and not some other mechanism?

Statistics or probability

Why study probability?

Love of wisdom.

But if you want to win someone.

“I am not much given to regret, so I puzzled over this one a while. Should have taken much more statistics in college, I think.” Max Levchin, PayPal co-founder, Slide founder. Quote of the week from the website of the American Statistical Association, November 23, 2010.

Luck. Coincidence. Risk. Doubt. Fortune. Chance. Uncertainty. What is this thing called randomness?

R vs Python vs MATLAB vs Octave vs Julia

https://www.linkedin.com/pulse/r-vs-python-matlab-octave-julia-who-winner-siva-prasad-katru/

Is a story proof a fully valid mathematical proof?

[2] book.

If two events are dependent, knowing that one occurred changes the probability of the other, and vice versa. Dependence alone does not mean that one event causes the other.

Probability

💡
An informal definition of probability. The logic of uncertainty.

Outcome. A possible result of an experiment.

Sample space. The set $S$ of all outcomes of an experiment.

Probability function. One that assigns probabilities to the outcomes, $Pr:S\to[0,1]$, such that

$$\sum_{\omega \in S}Pr(\omega)=1$$

Event. A subset of the sample space, $E\subseteq S$.

Population. Not yet.

Experiment. Activity.

What is the goal of probability? Probability measures the chance that an event $A$ will occur; it is denoted $P(A)$. Probability does not say which decisions are good, and it does not predict the future!

How to think about the elements of a sample? Remember that the elements of a sample are ontologically distinct: A is A (the law of identity). For example, in a set of books, each book is distinguishable from every other.

Probability modeling

Tree Model

Conditional probability.

P[Postpostriority|Apostriori]

💡
"The probability of B given A" and "if A then B" express similar concepts in different ways.

A Set Theory Dictionary for probability problems [2]

No element of $A$ implies at least one element of $A^c$.

https://en.m.wikipedia.org/wiki/Base_rate_fallacy

Probability approaches

Classical Approach or Naive Probability

$$P(A)=\dfrac{|A|}{|S|}$$

where $S$ is a finite sample space whose outcomes are equally likely and $A\subseteq S$ is an event.
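The counting definition above can be tried out directly. The following is a minimal sketch in Python, assuming a sample space of two fair six-sided dice; the helper name `naive_probability` and the chosen event are illustrative, not part of the notes.

```python
from fractions import Fraction
from itertools import product


def naive_probability(event, sample_space):
    """Return |A| / |S| for a finite sample space with equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


# Sample space S: all ordered pairs from rolling two fair six-sided dice.
two_dice = list(product(range(1, 7), repeat=2))

# Event A: the two faces sum to 7.
p_sum_seven = naive_probability(lambda roll: sum(roll) == 7, two_dice)
print(p_sum_seven)  # 1/6
```

Using `Fraction` keeps the result exact, which makes it easy to compare with a hand count.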

The Personal Opinion Approach

The worst knowledge.

Relative Frequency Theory

Probability interpretation?

Statistics? Repeat experiment n times, that is S.

Modern Approach or Axiomatic Probability

👁️‍🗨️
$\varnothing$ denotes the null event in probability: the event that contains no outcomes and therefore never occurs.

The general definition of probability. A probability space consists of a sample space $S$; an event space $F$, a set of events, each of which is a subset of $S$; and a probability function $P$, which takes an event $A$ and returns $P(A)$, satisfying the following axioms:

$P(\varnothing)=0$
$P(A)\geq0$
$P(S)=1$

We can see that $P$ assigns each event a real number between 0 and 1 as output, i.e. $0\le P(A)\le1$.

👁️‍🗨️
When $A$ and $B$ are disjoint events, i.e. $A\cap B=\varnothing$, in probability we call them mutually exclusive events.

Theorem 1.1 Let $A_1,A_2,\dots$ be mutually exclusive events. Then

$$P\left(\bigcup_{j=1}^{\infty}A_j\right)=\sum_{j=1}^{\infty}P(A_j)$$

Theorem 1.2

$$P(A^C)=1-P(A)$$

Theorem 1.3

$$\text{If } A\subset B,\text{ then }P(A)\le P(B)$$

Theorem 1.4 (Inclusion-exclusion). For any events $A_1,\dots,A_n$,

$$P\left(\bigcup_{i=1}^{n}A_i\right)=\sum_{i=1}^{n}P(A_i)-\sum_{i<j}P(A_i\cap A_j)+\sum_{i<j<k}P(A_i\cap A_j\cap A_k)-\dots+(-1)^{n+1}P(A_1\cap A_2 \cap\dots\cap A_n)$$
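Inclusion-exclusion alternates signs over intersections of 1, 2, ..., n events. As a sanity check, here is a small Python sketch that applies the alternating sum to three events and compares it with a direct count of the union. The sample space (the integers 1 to 30, equally likely) and the three events are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def union_probability(events, sample_space):
    """P(A_1 ∪ ... ∪ A_n) by inclusion-exclusion, with equally likely outcomes."""
    total = Fraction(0)
    for size in range(1, len(events) + 1):
        sign = (-1) ** (size + 1)  # + for single events, - for pairs, + for triples, ...
        for group in combinations(events, size):
            intersection = set.intersection(*group)
            total += sign * Fraction(len(intersection), len(sample_space))
    return total


sample_space = set(range(1, 31))
multiples_of_2 = {n for n in sample_space if n % 2 == 0}
multiples_of_3 = {n for n in sample_space if n % 3 == 0}
multiples_of_5 = {n for n in sample_space if n % 5 == 0}
events = [multiples_of_2, multiples_of_3, multiples_of_5]

by_formula = union_probability(events, sample_space)
by_counting = Fraction(len(set.union(*events)), len(sample_space))
print(by_formula, by_counting)  # both 11/15
```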

Worked examples.

Two candidates, A and B, will take a test. The probability that A passes is 1/7, and the probability that B passes is 2/9. What is the probability that at least one of the candidates passes?
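The problem does not say how the two results are related, so any numeric answer needs an extra assumption. The sketch below assumes the two candidates pass independently of each other (an assumption, not part of the statement) and applies inclusion-exclusion for two events.

```python
from fractions import Fraction

p_a = Fraction(1, 7)  # P(A passes), from the problem statement
p_b = Fraction(2, 9)  # P(B passes), from the problem statement

# Assumption (not stated in the problem): the two results are independent,
# so P(A and B) = P(A) * P(B).
p_both = p_a * p_b

# Inclusion-exclusion for two events.
p_at_least_one = p_a + p_b - p_both

# Cross-check through the complement: 1 - P(neither passes).
p_via_complement = 1 - (1 - p_a) * (1 - p_b)

print(p_at_least_one)  # 1/3
```

Without independence the answer is only bounded: it lies between 2/9 (A passes only when B does) and 23/63 (the two never pass together).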

R, "Vector thinking".

If you want to create a vector.

# name <- c(values)
vector <- c(3,1,4,1,5,9)

It is a structured language.

fn(parameters)

If you want to get the largest value.

max(vector)

When you want to create a simulation.

sample

Montmort's matching problem.

Birthday problem.
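As a sketch of the birthday problem named above: the probability that at least two people in a group share a birthday is one minus the probability that all birthdays are distinct. The code assumes 365 equally likely birthdays and independent people (the usual textbook simplifications); the function name is illustrative.

```python
def birthday_match_probability(people, days=365):
    """P(at least two of `people` share a birthday), assuming equally likely, independent birthdays."""
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for group_size in (10, 23, 50):
    print(group_size, round(birthday_match_probability(group_size), 4))
```

Under these assumptions a group of 23 is the smallest for which a shared birthday is more likely than not.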

Python

Julia

Conditional probability

Definition.

Let $A$ and $B$ be events with $P(B)>0$. The conditional probability of $A$ given that the event $B$ has occurred, or $A$ given $B$, is denoted by $P(A|B)$ and is defined as

$$P(A|B)=\dfrac{P(A\cap B)}{P(B)}$$

Conditional probability is a learning process: $P(A)$ expresses our knowledge of event $A$ before the experiment takes place (the a priori probability of $A$), $B$ is the evidence we observe, and $P(A|B)$ is our updated knowledge of $A$.

$A\cap B$ means that $A$ and $B$ both happen. $A|B$, by contrast, means that $A$ happens given that $B$ has happened.
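The difference between $A\cap B$ and $A|B$ is easy to see by enumeration. The following is a minimal sketch, assuming two fair dice and two events picked only for illustration; it restricts the sample space to $B$ and then counts the outcomes that are also in $A$.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered results of two fair six-sided dice.
sample_space = set(product(range(1, 7), repeat=2))


def prob(event):
    """Naive probability |event| / |S|."""
    return Fraction(len(event), len(sample_space))


def conditional(a, b):
    """P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if not b:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)


sum_is_eight = {roll for roll in sample_space if sum(roll) == 8}     # event A
first_is_even = {roll for roll in sample_space if roll[0] % 2 == 0}  # event B

print(prob(sum_is_eight))                        # 5/36
print(prob(sum_is_eight & first_is_even))        # 1/12
print(conditional(sum_is_eight, first_is_even))  # 1/6
```

Here observing $B$ raises the probability of $A$ from 5/36 to 1/6, which is the "learning" reading of conditioning.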

Prosecutor's fallacy

Monty Hall

Bayes

$$P(A|B)=\dfrac{P(B|A)P(A)}{P(B)}$$

Informally, you might think of Bayes' rule as

$$P(\text{cause}|\text{effect})=\dfrac{P(\text{effect}|\text{cause})P(\text{cause})}{P(\text{effect})}$$
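A small numeric sketch of the cause/effect reading follows. All numbers are hypothetical (a condition with 1% prevalence, a test with 95% sensitivity and a 5% false-positive rate), and $P(\text{effect})$ is expanded with the law of total probability, which these notes have not introduced yet.

```python
from fractions import Fraction


def posterior(prior, likelihood, likelihood_given_not):
    """P(cause|effect) via Bayes' rule.

    prior                = P(cause)
    likelihood           = P(effect | cause)
    likelihood_given_not = P(effect | not cause)
    """
    # P(effect) by the law of total probability.
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, for illustration only.
prior = Fraction(1, 100)                # P(cause): prevalence
sensitivity = Fraction(95, 100)         # P(effect | cause): positive test if ill
false_positive_rate = Fraction(5, 100)  # P(effect | not cause)

p_cause_given_effect = posterior(prior, sensitivity, false_positive_rate)
print(p_cause_given_effect, float(p_cause_given_effect))  # 19/118, about 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is low, which is the pattern behind the base rate fallacy.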

Worked examples

A family has 3 children, creatively named $A$, $B$, and $C$.

Random variables and their distributions

Expectation

Continuous random variables

Moments

Joint distributions

Transformations

Overview and descriptive statistics

Data.

Statistician collects data.

Population. Set of measurements of interest to the experimenter.

What? Sample. Subset of population.

Variable. A characteristic that varies from one experimental unit to another. Example: hair color.

Who? Experimental units. Objects on which a variable is measured. Blackbox is an active subject.

A measurement or datum results when a variable is actually observed on an experimental unit.

A set of measurements, called data, can be either a sample or a population. That is, $\text{measurements} = \text{data} = \text{sample}$ or $\text{measurements} = \text{data} = \text{population}$.

Variable types. Qualitative variables measure a characteristic (a category). Quantitative variables measure a numerical quantity and are either discrete (countable) or continuous (uncountable).

How many variables have you measured?

Statistics

| Descriptive Statistics | Inferential Statistics |
| --- | --- |
| We can enumerate the population easily. | We cannot enumerate the population easily, so we choose a sample. |
| Describe the population. No need for inference; you can draw the conclusions directly. | Inference (i.e. supposed conclusions) about the population from samples. |


Take samples to make inferences about the population, then predict the future behavior of a black box, secure stable knowledge and make decisions. Remember that past results are no guarantee of future performance. "There are three kinds of lies: lies, damned lies, and statistics." You need to make statistics work for you, not lie for you!

Inferential statistics

  1. Define the objective.
  1. Design of the experiment.
  1. Collect data with math standard.
  1. Make inferences.
  1. Determine reliability of the inference.

Graphing Variables

Use a data distribution to describe:

Graphing Qualitative Variables

Graphing Quantitative Variables

Descriptive Statistics

Measures of Location

Measures of Variability

Chebyshev theorem

Z-score

A z-score (also called a standard score) tells you how many standard deviations a data point lies from the mean.
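In symbols, the z-score of a data point $x$ is $z=(x-\mu)/\sigma$. Here is a short Python sketch using only the standard library; it assumes the data are the whole population (so `pstdev` is used rather than the sample standard deviation), and the data set is made up for illustration.

```python
from statistics import mean, pstdev


def z_scores(data):
    """Standard scores (x - mean) / sd, using the population standard deviation."""
    mu = mean(data)
    sigma = pstdev(data)
    return [(x - mu) / sigma for x in data]


# Made-up data with mean 5 and population standard deviation 2.
scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(scores)
```

A z-score of 2 for the last point means it sits two standard deviations above the mean.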

Standard deviation.

Probability

Definition. Probability is the logic of uncertainty.

Set dictionary

Vandermonde's identity

Naive probability

How to count?

If you want to count outcomes, how to count them?

Experiment.

Random experiment.

Sample spaces. All possible outcomes.

Event. Set of possible outcomes.

Elementary event. Events containing only one outcome are called elementary events; for simplicity, such an event and its single outcome are written interchangeably.

$|S|=\text{number of outcomes}$

Experimental probability is probability determined from the results of an experiment repeated many times; that is, we compute the probability of a future event based on our observations of past events.

Theoretical probability is probability that is determined on the basis of reasoning. Axiomatic.
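The two notions can be compared side by side. The sketch below estimates an experimental probability by simulation and sets it against the theoretical value obtained by counting; the event (two fair dice summing to 7), the number of trials and the seed are arbitrary choices for illustration.

```python
import random


def experimental_probability(trials, seed=0):
    """Relative frequency of 'sum is 7' over `trials` simulated rolls of two fair dice."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(
        1
        for _ in range(trials)
        if rng.randint(1, 6) + rng.randint(1, 6) == 7
    )
    return hits / trials


theoretical = 6 / 36  # 6 favorable outcomes out of 36 equally likely ones
estimate = experimental_probability(100_000)
print(theoretical, estimate)
```

The estimate moves closer to the theoretical value as the number of trials grows, but any single run only approximates it.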

Combination

Bayes

Conditional

Philosophical questions

Worked examples

Random Variables

Def.

Probability Distributions for Discrete Random Variables.

If X is a discrete random variable, the function given by f(x)=P(X=x) for each x within the range of X is called the probability distribution of X.

Probability Distributions for Continuous Random Variables. Probability density function.

$$F(x)=\int_{-\infty}^{x} f(t)\,dt$$

$F(x)$ is the probability distribution function and $f(x)$ is the density function.

$$\sigma_X=\sqrt{Var(X)}$$

$$Var(X)=\sum(x_i-\mu_X)^2P(x_i)$$
// Standard deviation of a discrete distribution given as [value, probability] pairs.
const sigma = (table, mu) => Math.sqrt(
	table.reduce((acc, [x, p]) => acc + Math.pow(x - mu, 2) * p, 0)
)

Functions

| Name | Tags | Concept |
| --- | --- | --- |
| Probability function or probability mass function | Discrete | |
| Density function | Continuous | |
| Distribution function or cumulative distribution function | Continuous, Discrete | $F$: sum of the probability function values, or integral of the density function. |


Distributions

// To be translated into English. Worked examples.

https://en.wikipedia.org/wiki/Relationships_among_probability_distributions#/media/File:Relationships_among_some_of_univariate_probability_distributions.jpg

Discrete Uniform Distribution

What does the random variable characterize or measure?

The discrete uniform distribution is characterized by its constant probability $1/n$ over the $n=b-a+1$ integer values $x\in[a,b]$ in the domain of a discrete random variable. Its parameters are $a$ and $b$.

Formula and plot of the distribution.

The probability mass function (or probability function) of a uniform random variable is therefore $f(x)=\dfrac{1}{b-a+1}=\dfrac{1}{n}$ for $x\in [a,b]$, where $x_i \neq x_j$ when $i \neq j$.

Its cumulative distribution function, for $i \in [a,b]$: $F(i;a,b)=\dfrac{\lfloor i\rfloor -a+1}{b-a+1}$, where $i$ is the argument of the function, $a$ is the infimum of the domain and $b$ is its supremum. Plots:

https://dk81.github.io/dkmathstats_site/rmath-uniform-plots.html

Moment generating function.

$$M_X(t)=\dfrac{e^{at}-e^{(b+1)t}}{n(1-e^t)}$$

Mean.

$$E(X)=\dfrac{a+b}{2}$$

Variance.

$$Var(X)=\dfrac{(b-a+1)^2-1}{12}=\dfrac{n^2-1}{12}$$

Bernoulli Distribution

What does the random variable characterize or measure?

It measures the probability of success $f$ of an experiment with two possible outcomes, "success" and "failure", whose probabilities are $p$ and $1-p$ respectively, so that the number of successes in a single trial has a Bernoulli distribution. Its parameter is $p$.

Formula and plot of the distribution.

$f(x;p)=p^x(1-p)^{1-x}$ for $x=0,1$, so that $P(X=1)=f(1;p)=p$ and $P(X=0)=f(0;p)=1-p$.

Cumulative distribution function:

$$F(k;p) = \begin{cases} 0 & k<0 \\ 1-p & 0\leq k<1 \\ 1 & k \geq1 \end{cases}$$

Moment generating function.

$$M_X(t)=q+pe^t,\quad q=1-p$$

Mean.

$$E(X)=p$$

Variance.

$$Var(X)=pq$$

Binomial Distribution

What does the random variable characterize or measure?

  1. $n$ Bernoulli trials.
  1. The trials are identical and independent; that is, the probability of success $p$ stays the same from one trial to the next.
  1. The random variable denotes the number of successes obtained in $n$ trials.

Formula and plot of the distribution.

$f(x;n,p)= {n \choose x}p^x(1-p)^{n-x}$, where $n$ is the number of trials and $x\in\{0,1,\dots,n\}$.

$F(x;n,p)=\sum^ {\lfloor {x} \rfloor}_{i=0} {n\choose i}p^i(1-p)^{n-i}$ (cumulative distribution function).

Moment generating function.

$$M_X(t)=(1-p+pe^t)^n$$

Mean.

$$E(X)=np$$

Variance.

$$Var(X)=npq,\quad q=1-p$$
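The binomial mean $np$ and variance $npq$ can be checked numerically from the probability mass function. A minimal sketch, assuming the arbitrary values $n=10$ and $p=0.3$:

```python
from math import comb


def binomial_pmf(x, n, p):
    """f(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3  # arbitrary illustrative parameters
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                                 # should be 1
mean = sum(x * f for x, f in enumerate(pmf))                     # should be n*p
variance = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))   # should be n*p*(1-p)

print(total, mean, variance)
```

Up to floating-point rounding, the three printed values are 1, 3 and 2.1.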

Multinomial Distribution

What does the random variable characterize or measure?

It is the generalization of the binomial distribution to $k$ categories or events instead of 2 (success or failure). Its parameters are $n>0$ and $p_1,\dots,p_k$, where $\sum p_i=1$.

Formula and plot of the distribution.

$$f(x)=\dfrac{n!}{x_1!\,x_2!\cdots x_k!}p_1^{x_1}\cdots p_k^{x_k}$$

Moment generating function.

$$M_X(t)=\left(\sum^k_{i=1}p_ie^{t_i}\right)^n$$

Mean.

$$E(X_i)=np_i$$

Variance.

$$Var(X_i)=np_i(1-p_i)$$

Geometric Distribution

What does the random variable characterize or measure?

  1. A sequence of Bernoulli trials.
  1. The trials are identical and independent, with the same probability of success $p$ (the parameter), so that $P(X=1)=P[\text{success on the first trial}]=p$.
  1. The random variable denotes the number of trials $x$ needed to obtain the first success. Its sample space is $S=\{E,FE,FFE,FFFE,\dots\}$, where $E$ is a success and $F$ a failure.

Formula and plot of the distribution.

$f(x)=(1-p)^{x-1}p$, where $0<p\leq1$ and $x=1,2,3,\dots$

$$F(x)=1-(1-p)^x$$

Moment generating function.

$M_X(t)=\dfrac{pe^t}{1-(1-p)e^t}$, for $t<-\ln(1-p)$.

Mean.

$$E(X)=\dfrac{1}{p}$$

Variance.

$$Var(X)=\dfrac{1-p}{p^2}$$
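The "number of trials until the first success" reading can be simulated directly. The following sketch assumes $p=0.25$, 100,000 repetitions and a fixed seed (all arbitrary choices); it compares the sample mean with $1/p$ and the sample variance with $(1-p)/p^2$.

```python
import random


def trials_until_first_success(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        trials += 1
    return trials


rng = random.Random(0)  # fixed seed so the run is reproducible
p = 0.25
samples = [trials_until_first_success(p, rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_variance = sum((x - sample_mean) ** 2 for x in samples) / len(samples)

print(sample_mean, 1 / p)                # both close to 4
print(sample_variance, (1 - p) / p**2)   # both close to 12
```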

Negative Binomial Distribution

What does the random variable characterize or measure?

  1. A sequence of Bernoulli trials, identical and independent, with the same probability of success $p$ (parameter).
  1. The trials are observed until exactly $r$ successes are obtained. The value of $r$ is fixed by the experimenter.
  1. The random variable denotes the number of trials $x$ needed to obtain $r$ successes.

Formula and plot of the distribution.

$f(x)={x-1 \choose r-1}(1-p)^{x-r}p^r$ for $r=1,2,3,\dots$ and $x=r,r+1,r+2,\dots$

Moment generating function.

$M_X(t)=\left(\dfrac{pe^{t}}{1-(1-p)e^{t}}\right)^r$ for $t<-\ln(1-p)$.

Mean.

$$E(X)=\dfrac{r}{p}$$

Variance.

$$Var(X)=\dfrac{r(1-p)}{p^2}$$

Poisson Distribution

What does the random variable characterize or measure?

The random variable measures the number of occurrences $x$ of a specified event in a given unit of time, length or space $s$, during which an average of $\lambda$ such events (the rate of occurrence) can be expected. The events occur at random and independently of one another. The distribution arises from the need to approximate binomial distributions with a large number of trials.

The Poisson distribution is often used in situations where we are counting the number of successes in a particular region or interval of time, and there are a large number of trials, each with a small probability of success. The Poisson paradigm is also called the law of rare events. The interpretation of "rare" is that the $p_j$ are small, not that $\lambda$ is small.

Formula and plot of the distribution.

$f(x)=\dfrac{e^{-k}k^x}{x!}$ for $x=0,1,2,\dots$ and $k>0$.

$k=\lambda s$, where $\lambda$ is the average number of occurrences of the event per unit and $s$ is the size of the observation period.

Cumulative function.

Moment generating function

$$M_X(t)=e^{k(e^t-1)}$$

Mean

$$E(X)=k$$

Variance

$$Var(X)=k$$
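The link to large binomial distributions can be seen numerically. The sketch below compares a binomial probability mass function with many trials and a small success probability against a Poisson one with $k=np$; the values $n=1000$ and $p=0.003$ are arbitrary choices for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, k):
    """e^(-k) * k^x / x!"""
    return exp(-k) * k**x / factorial(x)


n, p = 1000, 0.003  # many trials, each with a small success probability
k = n * p           # matching Poisson parameter

# Largest pointwise gap between the two pmfs over x = 0..20.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, k)) for x in range(21))

for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, k), 5))
print("largest gap:", max_gap)
```

The two columns agree to about three decimal places here; the agreement degrades as $p$ grows with $n$ fixed.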

Hypergeometric Distribution

What does the random variable characterize or measure?

  1. The experiment consists of drawing a random sample of size $n$, without replacement and without regard to order, from a set of $N$ objects.
  1. Of the $N$ objects, $r$ have the trait (characteristic) of interest, while the remaining $N-r$ objects do not.
  1. The random variable is the number of objects in the sample that have the trait.

Formula and plot of the distribution.

$f(x)=\dfrac{{r \choose x}{N-r \choose n-x}}{{N \choose n}}$, where $N$, $r$ and $n$ are positive integers and are the parameters.

Such that $\max(0,n-(N-r))\le x \le \min(n,r)$.

Cumulative function.

Moment generating function.

Where $F_1$ is the generalization of the hypergeometric function.

Mean.

$$E(X)=n\dfrac{r}{N}$$

Variance.

$$Var(X)=n\dfrac{r(N-r)(N-n)}{N^2(N-1)}$$

Considerations.

Worked examples

  1. A random variable is:

A function whose domain is the sample space.

2. So that $f(y) = P(Y = y)$ holds for every $y$, the probability distribution of the discrete variable $Y$ can be represented by

A graph, table or formula.

6.31 A point $D$ is chosen on the line segment $AB$, whose midpoint is $C$ and whose length is $a$. If $X$, the distance from $D$ to $A$, is a random variable having the uniform density with $\alpha = 0$ and $\beta = a$, what is the probability that $AD$, $BD$ and $AC$ will form a triangle?

We know that:

In every triangle, the sum of the lengths of any two sides is always greater than the length of the remaining side.

That is:

$$\text{Triangle}\iff \text{satisfies the triangle inequality}$$

For our particular case:

$$\begin{gathered} AD+BD>AC\\ AD+AC>BD\\ AC+BD>AD \end{gathered}$$
💡
The inequalities must be strict for the figure to be called a triangle. With "less than or equal to" we would be talking about a line segment, which clearly lacks the defining feature: a polygon with three sides.

Substituting according to the conditions of the problem:

$$\begin{gathered} AD=x,\quad BD=a-x,\quad AC=\dfrac{a}{2}\\ x+(a-x)>\dfrac{a}{2}\implies a>0\\ x+\dfrac{a}{2}>a-x\implies x>\dfrac{a}{4}\\ \dfrac{a}{2}+(a-x)>x \implies x<\dfrac{3a}{4} \end{gathered}$$

Thus $X$, as a random variable, satisfies:

$$\alpha=0<\dfrac{a}{4}<X<\dfrac{3a}{4}<\beta=a$$

Therefore, the probability is:

$$P\left(\dfrac{a}{4}<X<\dfrac{3a}{4}\right)=\int_{a/4}^{3a/4}\dfrac{1}{a}\,dx=0.5$$
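The result of exercise 6.31 can be cross-checked by simulation. The sketch below draws $X$ uniformly on $[0, a]$ and tests the three strict triangle inequalities directly, without using the simplified bounds; the segment length $a=1$, the number of trials and the seed are arbitrary choices.

```python
import random


def forms_triangle(x, a):
    """True if AD = x, BD = a - x and AC = a/2 satisfy the strict triangle inequality."""
    ad, bd, ac = x, a - x, a / 2
    return ad + bd > ac and ad + ac > bd and ac + bd > ad


def estimate_probability(a=1.0, trials=100_000, seed=0):
    """Monte Carlo estimate of P(AD, BD, AC form a triangle) for X ~ Uniform(0, a)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(forms_triangle(rng.uniform(0, a), a) for _ in range(trials))
    return hits / trials


print(estimate_probability())  # close to 0.5
```

Changing `a` should not change the estimate, since the answer depends only on the proportion of the segment between $a/4$ and $3a/4$.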

Accuracy (predictions, outcomes)

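import numpy as np
# predictions and outcomes are assumed to be equal-length NumPy arrays.
accuracy = np.mean(predictions == outcomes)
# Equivalent: np.count_nonzero(predictions == outcomes) / len(predictions)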

Stochastic process

A stochastic process is a collection of random variables, indexed by an ordered time
variable.

Markov chain

Origin of Markov chains (video) | Khan Academy
Introduction to Markov chains
https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/markov_chains

Kosambi–Karhunen–Loève theorem.

https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch18.pdf

TODO

Kappa

Mean-field theory

Mean-field games

Next steps

Exercises [1].

Let $X$ be the random variable representing the number of heads seen after flipping a fair coin three times. Let $Y$ be the random variable representing the outcome of rolling two fair six-sided dice and multiplying their values.

Evaluate

$$E[X+Y]=E[X]+E[Y]=1.5+12.25=13.75$$
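The two expectations can be verified by enumerating the outcomes exactly. This is a minimal sketch using only the standard library; it encodes heads as 1 and tails as 0, which is a convention chosen here for convenience.

```python
from fractions import Fraction
from itertools import product

# X: number of heads in three fair coin flips (1 = heads, 0 = tails).
flips = list(product([0, 1], repeat=3))
e_x = sum(Fraction(sum(f), len(flips)) for f in flips)

# Y: product of the values shown by two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))
e_y = sum(Fraction(a * b, len(rolls)) for a, b in rolls)

# Linearity of expectation: E[X + Y] = E[X] + E[Y].
e_sum = e_x + e_y

print(e_x, e_y, e_sum)  # 3/2 49/4 55/4
```

Note that $E[Y]=49/4$ equals $3.5^2$ because the two dice are independent; linearity itself needs no independence.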

References

  1. MIT OpenCourseWare. (2022, December 21). MIT OpenCourseWare. Retrieved from https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/resources/mit6_006s20_ps0-questions
  1. Downey, A. Think Bayes: Bayesian Statistics in Python (2nd ed.). O'Reilly.

[7] Definition and examples of outcome | define outcome - Probability - Free Math Dictionary Online. Icoachmath.com. Retrieved June 10, 2021 from http://www.icoachmath.com/math_dictionary/outcome.html