Probability

Requisites

Set theory

Counting methods

Story

Characters

Andrey Kolmogorov

Key questions

Why use probability and no other mechanism?

Statistics or probability

Why study probability?

Love of wisdom.

But if you want to win someone.

“I am not much given to regret, so I puzzled over this one a while. Should have taken much more statistics in college, I think.” Max Levchin, PayPal co-founder and Slide founder. Quote of the week from the website of the American Statistical Association, November 23, 2010.

Luck. Coincidence. Risk. Doubt. Fortune. Chance. Uncertainty. What is this thing called randomness?

R vs Python vs MATLAB vs Octave vs Julia

https://www.linkedin.com/pulse/r-vs-python-matlab-octave-julia-who-winner-siva-prasad-katru/

Is a story proof fully valid mathematical proof?

[2] book.

If they are dependent events, then an event causes the other one — and vice versa.

Probability

💡

An informal definition of probability. The logic of uncertainty.

Outcome. A possible result of an experiment.

Sample space. A set $S$ of all outcomes of an experiment.

Probability function. A function that assigns a probability to each outcome, $Pr:S\to[0,1]$ , such that

$\sum_{\omega \in S}Pr(\omega)=1$ .

Event. A subset of the sample space, $E\subseteq S$ .

Population. Not yet.

Experiment. Activity.

What is the goal of probability? Probability measures the chance that an event $A$ will occur; it is denoted $P(A)$ . Probability does not tell us which decisions are good, and it does not predict the future!

How to think about the elements of a sample? Remember that the elements are ontologically distinct: A is A (the law of identity). For example, in a set of books, each book is distinct from every other book.

Probability modeling

Tree Model

Conditional probability.

P[Postpostriority|Apostriori]

💡

"The probability of B given A" and "if A then B" express similar concepts in different ways.

A Set Theory Dictionary for probability problems [2]

If the outcome is not an element of $A$ , then it is an element of $A^c$ .

https://en.m.wikipedia.org/wiki/Base_rate_fallacy

Probability approaches

Classical Approach or Naive Probability

P(A)=\dfrac{|A|}{|S|}

Where S is a finite sample space and an event $A\subseteq S$ and with outcomes equally likely.

Example. Birthday problem. There are $k$ people in a room. Assume each person's birthday is equally likely to be any of the 365 days of the year (we exclude February 29), and that people's birthdays are independent (we assume there are no twins in the room). What is the probability that two or more people in the group have the same birthday?
- Frank, P.; Goldstein, S.; Kac, M.; Prager, W.; Szegö, G.; Birkhoff, G., eds. (1964). Selected Papers of Richard von Mises. 2. Providence, Rhode Island: Amer. Math. Soc. pp. 313–334.
Since the outcomes are assumed equally likely, naive probability applies.
$S=\text{All possible k persons' birthday sequences in a room}\\=\{ (x_1,x_2,...,x_k) | x_1\in [1,365],x_2\in [1,365],...,x_k\in [1,365] \}$
$P(\text{At least 1 birthday match})=\dfrac{|\text{At least 1 birthday match}|}{|\text{S}|}\\=\dfrac{|\text{At least 1 birthday match}|}{\text{Ways to assign } k\text{ birthdays}}\\ =\dfrac{|\text{At least 1 birthday match}|}{365^k}$
But calculating $|\text{At least 1 birthday match}|$ directly is hard, so we calculate its complement.
❓
Is $\text{At least 1 birthday match} \subseteq \text{All possible k persons' birthday sequences in a room}?$ Yes, it is. So is its complement, i.e. $\text{No birthday match} \subseteq \text{All possible k persons' birthday sequences in a room}$ .
$P(\text{At least 1 birthday match})=1-P(\text{No birthday match})\\=1-\dfrac{|\text{No birthday match}|}{365^k}$
We know that $\text{No birthday match}\\=\{ (x_1,x_2,...,x_k) | x_i\in [1,365],\ x_i\neq x_j \text{ for } i\neq j \}$ . There are 365 choices for $x_1$ , 364 for $x_2$ , ..., and $365-k+1$ for $x_k$ , so its cardinality, the number of $k$-permutations of 365 days, is $365!/(365-k)!$ (for $k\le 365$ ).
Therefore,
$P(\text{At least 1 birthday match})=1-\dfrac{365!/(365-k)!}{365^k}=1-\dfrac{365!}{365^k(365-k)!}$
Figure: Probability that in a room of k people, at least two were born on the same day. This probability first exceeds 0.5 when k=23. For k≥366 we are guaranteed to have a match [2].
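As a rough numerical check of the formula above, here is a minimal Python sketch. The function name `p_birthday_match` and the `days` parameter are illustrative choices, not part of the original notes. It multiplies the factors $(365-i)/365$ instead of computing factorials, which avoids huge intermediate numbers.

```python
from math import prod

def p_birthday_match(k: int, days: int = 365) -> float:
    """Probability that at least two of k people share a birthday."""
    if k > days:
        return 1.0  # pigeonhole: a match is guaranteed
    # P(no match) = (days/days) * ((days-1)/days) * ... * ((days-k+1)/days)
    p_no_match = prod((days - i) / days for i in range(k))
    return 1.0 - p_no_match

# Smallest group size for which a match is more likely than not.
smallest_k = next(k for k in range(1, 367) if p_birthday_match(k) > 0.5)
print(smallest_k, p_birthday_match(smallest_k))
```

Under these assumptions the search should stop at the same group size quoted in the figure caption.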

The Personal Opinion Approach

The worst knowledge.

Relative Frequency Theory

Probability interpretation?

Statistics? Repeat experiment n times, that is S.

Modern Approach or Axiomatic Probability

👁️‍🗨️

\varnothing

denotes the null event in probability, the event of nothing happening.

The general definition of probability. A probability space consists of a sample space $S$ ; an event space $F$ , a set of events, each of which is a subset of $S$ ; and a probability function, which takes an event $A$ and returns $P(A)$ , satisfying the following axioms:

P(\varnothing)=0

P(A)\geq0

P(S)=1

We can see that $P$ assigns each event a real number between 0 and 1 as output, i.e. $0\le P(A)\le1$ .

👁️‍🗨️

When $A$ and $B$ are disjoint events, i.e.

A\cap B=\varnothing

, in probability we call them mutually exclusive events.

Theorem 1.1 Let $A_1,A_2,...$ be mutually exclusive events, then

P(\bigcup\limits_{j=1}^{\infty}A_j)=\sum^\infty_{j=1}P(A_j)

Theorem 1.2

P(A^c)=1-P(A)

Proof.
$A$ and $A^c$ are disjoint events and their union is $S$ .
By the axiom $P(S)=1$ and Theorem 1.1,
$P(S)=1=P(A\cup A^c)=P(A)+P(A^c)\\ P(A^c)=1-P(A)$

Theorem 1.3

\text{If } A\subset B,\text{then }P(A)\le P(B)

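Proof.
Write $B=A\cup(B\cap A^c)$ , a union of two disjoint events. By Theorem 1.1, $P(B)=P(A)+P(B\cap A^c)\ge P(A)$ , since $P(B\cap A^c)\ge0$ .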

Theorem 1.4 (Inclusion-exclusion). For any events $A_1,...,A_n$

P(\bigcup\limits_{i=1}^{n}A_i)=\sum_{i}P(A_i)-\sum_{i<j}P(A_i\cap A_j)+\sum_{i<j<k}P(A_i\cap A_j\cap A_k)-...+(-1)^{n+1}P(A_1\cap A_2 \cap...\cap A_n)

Proof.
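This is not a proof, but inclusion-exclusion can be checked numerically on a small finite sample space with equally likely outcomes. The sketch below is only an illustration: the sample space $\{1,...,12\}$ , the three events, and the helper names `prob` and `inclusion_exclusion` are all assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, space):
    """Naive probability |event| / |space| for equally likely outcomes."""
    return Fraction(len(event), len(space))

def inclusion_exclusion(events, space):
    """Alternating sum over intersections of every non-empty group of events."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * prob(set.intersection(*group), space)
    return total

space = set(range(1, 13))
events = [
    {n for n in space if n % 2 == 0},  # even numbers
    {n for n in space if n % 3 == 0},  # multiples of 3
    {n for n in space if n > 9},       # numbers above 9
]
lhs = prob(set.union(*events), space)    # P(A1 ∪ A2 ∪ A3) computed directly
rhs = inclusion_exclusion(events, space)  # the alternating sum
print(lhs, rhs)
```

Both sides use exact fractions, so they can be compared with `==` rather than with a floating-point tolerance.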

Worked examples.

A die is a cube whose 6 sides are labeled with the integers from 1 to 6. The die is fair if all 6 sides are equally likely to come up on top when the die is rolled. The plural form of "die" is "dice". Why is $P(\text{the total after rolling 4 fair dice is 21})>P(\text{the total after rolling 4 fair dice is 22})?$
$\text{4 fair dice}=\{(d_1,d_2,d_3,d_4)|d_1\in[1,6],d_2\in[1,6],...,d_4\in[1,6]\}$ , so $|\text{4 fair dice}|=6^4=1296$ equally likely sequences.
$\text{4 fair dice results} = \{sum(x)\in\mathbb{N}| x\in\text{4 fair dice}\}=\{4,5,...,24\}$
$\text{4 fair dice is 21}=\{x\in\text{4 fair dice}|sum(x)=21\}=seq_1\cup seq_2\cup seq_3$ , where $seq_1=\{(6,5,5,5),...\},seq_2=\{(6,6,6,3),...\},seq_3=\{(6,6,5,4),...\}$ contain the rearrangements of each pattern.
$|\text{4 fair dice is 21}|=|seq_1|+|seq_2|+|seq_3|=\dfrac{4!}{3!}+\dfrac{4!}{3!}+\dfrac{4!}{2!}=4+4+12=20$
$\text{4 fair dice is 22}=\{x\in\text{4 fair dice}|sum(x)=22\}=seq_1\cup seq_2$ , where $seq_1=\{(6,6,6,4),...\},seq_2=\{(6,6,5,5),...\}$ .
$|\text{4 fair dice is 22}|=|seq_1|+|seq_2|=\dfrac{4!}{3!}+\dfrac{4!}{2!2!}=4+6=10$
The sums are not equally likely, so we divide by the number of sequences, not by the number of distinct sums.
$P(\text{the total after rolling 4 fair dice is 21})=\dfrac{|\text{4 fair dice is 21}|}{|\text{4 fair dice}|}=\dfrac{20}{1296}$
$P(\text{the total after rolling 4 fair dice is 22})=\dfrac{|\text{4 fair dice is 22}|}{|\text{4 fair dice}|}=\dfrac{10}{1296}$
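The counts for 21 and 22 can be cross-checked by brute force, since four dice give only $6^4$ sequences. A minimal sketch (the variable names are illustrative):

```python
from collections import Counter
from itertools import product

# Enumerate every ordered roll of 4 dice and tally the totals.
totals = Counter(sum(roll) for roll in product(range(1, 7), repeat=4))
n_outcomes = 6 ** 4

for target in (21, 22):
    print(target, totals[target], totals[target] / n_outcomes)
```

Enumeration counts ordered sequences, which is exactly what the naive definition requires when the dice are distinguishable.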

A palindrome is an expression such as "A man, a plan, a canal: Panama" that reads the same backwards as forwards, ignoring spaces, capitalization, and punctuation. Assume for this problem that all words of the specified length are equally likely, that there are no spaces or punctuation, and that the alphabet consists of the lowercase letters a,b,…,z. Why is $P(\text{a random 2-letter word is a palindrome})=P(\text{a random 3-letter word is a palindrome})?$
$P(\text{a random 2-letter word is a palindrome})=\dfrac{|\text{2-letter palindrome}|}{|\text{2-letter word}|}=\dfrac{26}{26^2}=\dfrac{1}{26}$
$P(\text{a random 3-letter word is a palindrome})=\dfrac{|\text{3-letter palindrome}|}{|\text{3-letter word}|}=\dfrac{26\cdot26}{26^3}=\dfrac{1}{26}$

Three people get into an empty elevator at the first floor of a building that has floors. Each presses the button for their desired floor (unless one of the others has already pressed that button). Assume that they are equally likely to want to go to floors through (independently of each other). What is the probability that the buttons for consecutive floors are pressed?
$\frac{14}{243}$

Why is the probability that all 3 people in a group of 3 were born on January 1 less than the probability that, in a group of 3 people, one was born on January 1, another on January 2, and the remaining one on January 3?

Martin and Gale play an exciting game of "toss the coin," where they toss a fair coin until the pattern HH occurs (two consecutive Heads) or the pattern TH occurs (Tails followed immediately by Heads). Martin wins the game if and only if the first appearance of the pattern HH occurs before the first appearance of the pattern TH. Note that this game is scored with a "moving window"; that is, in the event of TTHH on the first four flips, Gale wins, since TH appeared on flips two and three before HH appeared on flips three and four. Why is it true that Martin is less likely to win because, as soon as Tails is tossed, TH will definitely occur before HH?
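One way to build intuition is to simulate the game. The sketch below is a hedged illustration, not part of the original problem: the function name `martin_wins`, the seed, and the number of trials are arbitrary choices. It relies on the observation that from the second flip on, the first Heads ends the game, and the winner is decided by the flip just before it.

```python
import random

def martin_wins(rng: random.Random) -> bool:
    """Play one game; return True if HH appears before TH."""
    prev = rng.choice("HT")
    while True:
        cur = rng.choice("HT")
        if cur == "H":
            # The window (prev, cur) is HH (Martin wins) or TH (Gale wins).
            return prev == "H"
        prev = cur

rng = random.Random(0)  # fixed seed so the run is reproducible
trials = 100_000
estimate = sum(martin_wins(rng) for _ in range(trials)) / trials
print(estimate)
```

Martin can only win when the first two flips are both Heads, so the estimate should land near $1/4$ .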

Elk dwell in a certain forest. There are $N$ elk, of which a simple random sample of size $n$ are captured and tagged ("simple random sample" means that all $N \choose n$ sets of $n$ elk are equally likely). The captured elk are returned to the population, and then a new sample is drawn, this time with size $m$ . This is an important method that is widely used in ecology, known as capture-recapture. What is the probability that exactly $k$ of the $m$ elk in the new sample were previously tagged? (Assume that an elk that was captured before doesn't become more or less likely to be captured again.)
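A sketch of one way to compute this, assuming the second sample is also a simple random sample drawn without replacement: choose $k$ of the $n$ tagged elk and $m-k$ of the $N-n$ untagged elk, out of $N \choose m$ equally likely samples. The function name `p_tagged` and the example numbers are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def p_tagged(k: int, N: int, n: int, m: int) -> Fraction:
    """P(exactly k tagged elk in a new sample of size m), hypergeometric form."""
    if k < 0 or k > m:
        return Fraction(0)
    # comb returns 0 when asked to choose more items than are available.
    return Fraction(comb(n, k) * comb(N - n, m - k), comb(N, m))

# Hypothetical numbers: 20 elk, 5 tagged, new sample of 4.
N, n, m = 20, 5, 4
distribution = [p_tagged(k, N, n, m) for k in range(m + 1)]
print(distribution)
```

The probabilities over $k=0,...,m$ should sum to 1, which is a convenient sanity check on the counting argument.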

Two candidates, A and B, will take a test. The probability that A passes is 1/7, and the probability that B passes is 2/9. What is the probability that at least one of the candidates passes?

Montmort's matching problem.

R, "Vector thinking".

If you want to create a vector.

# name <- c(values)
vector <- c(3,1,4,1,5,9)

It is a structured language.

fn(parameters)

If you want to get the largest value.

max(vector)

When you want to create a simulation.

sample

Montmort's matching problem.

Birthday problem.

Python

Julia

Conditional probability

Definition.

Let $A$ and $B$ be events with $P(B)>0$ . The conditional probability of $A$ given that the event $B$ has occurred, or $A$ given $B$ , is denoted by $P(A|B)$ and is defined as

P(A|B)=\dfrac{P(A\cap B)}{P(B)}

Conditional probability is a learning process: $P(A)$ expresses our knowledge of event $A$ before the experiment takes place, that is, the prior (a priori) probability of $A$ ; $B$ is the evidence we observe; and $P(A|B)$ is our updated knowledge of $A$ after observing $B$ .

$A\cap B$ means that $A$ and $B$ both happen. $A|B$ , in contrast, means that $A$ happens given that $B$ has happened.

Prosecutor's fallacy

Monty Hall

Bayes

P(A|B)=\dfrac{P(B|A)P(A)}{P(B)}

Informally, you might think of Bayes' rule as

P(cause|effect)=\dfrac{P(effect|cause)P(cause)}{P(effect)}

Worked examples

Mr. Jones has two children. The older is a girl. What is the probability that both children are girls? Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys? This was posed by Martin Gardner in Scientific American.
Assume that gender is binary, $P(boy)=P(girl)$ , and that the genders of two children are independent.
$\text{Children}=\{GG,GB,BG, BB\}$ , where the first letter is the older child.
$P(\text{both girls}|\text{elder is a girl})=\dfrac{P(\text{both girls }\cap \text{elder is a girl})}{P(\text{elder is girl})}=\dfrac{1/4}{2/4}=1/2$
$P(\text{both boys}|\text{at least one boy})=\dfrac{P(\text{both boys }\cap \text{at least one boy})}{P(\text{at least one boy})}=\dfrac{1/4}{3/4}=1/3$
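Both answers can be reproduced by enumerating the four equally likely families and keeping only those consistent with the evidence. This is a minimal sketch; the helper `cond_prob` and the tuple encoding (older child first) are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

# Each family is (older child, younger child); all four are equally likely.
families = list(product("GB", repeat=2))

def cond_prob(event, given) -> Fraction:
    """P(event | given) by counting within the families where `given` holds."""
    kept = [f for f in families if given(f)]
    return Fraction(sum(event(f) for f in kept), len(kept))

p_jones = cond_prob(lambda f: f == ("G", "G"), lambda f: f[0] == "G")
p_smith = cond_prob(lambda f: f == ("B", "B"), lambda f: "B" in f)
print(p_jones, p_smith)
```

Conditioning shrinks the sample space: "the older is a girl" keeps two families, while "at least one is a boy" keeps three.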

A spam filter is designed by looking at commonly occurring phrases in spam. Suppose that 80% of email is spam. In 10% of the spam emails, the phrase "free money" is used, whereas this phrase is only used in 1% of non-spam emails. A new email has just arrived, which does mention "free money". What is the probability that it is spam?
$A:\text{ event that an email is spam}$
$B:\text{ event that an email mention "free money" }$
$P(A|B)=\dfrac{P(A)P(B|A)}{P(B)}=\dfrac{0.8\times0.1}{(0.2)(0.01)+(0.8)(0.1)}=0.97560975609$

The screens used for a certain type of cell phone are manufactured by 3 companies, A, B, and C. The proportions of screens supplied by A, B, and C are 0.5, 0.3, and 0.2, respectively, and their screens are defective with probabilities 0.01, 0.02, and 0.03, respectively. Given that the screen on such a phone is defective, what is the probability that Company A manufactured it?
$P(A)=0.5,P(B)=0.3,P(C)=0.2$
$P(D|A)=0.01,P(D|B)=0.02,P(D|C)=0.03$
$P(A|D)=\dfrac{P(A)P(D|A)}{P(A)P(D|A)+P(B)P(D|B)+P(C)P(D|C)}=\dfrac{0.5\times0.01}{0.5\times0.01+0.3\times0.02+0.2\times0.03}=\dfrac{0.005}{0.017}=0.29411764705$
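The same computation generalizes to any finite set of hypotheses: multiply each prior by its likelihood, then normalize by the total (the law of total probability). The function name `posterior` and the dictionary layout below are illustrative assumptions, not notation from the source.

```python
from fractions import Fraction

def posterior(priors: dict, likelihoods: dict) -> dict:
    """Bayes' rule over a finite partition: P(H|E) = P(H)P(E|H) / sum of all P(H)P(E|H)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(E) by the law of total probability
    return {h: j / evidence for h, j in joint.items()}

# Screen manufacturers from the example above.
priors = {"A": Fraction(5, 10), "B": Fraction(3, 10), "C": Fraction(2, 10)}
defect_rates = {"A": Fraction(1, 100), "B": Fraction(2, 100), "C": Fraction(3, 100)}
post = posterior(priors, defect_rates)
print(post["A"], float(post["A"]))
```

Using `Fraction` keeps the result exact, so it can be compared directly with the decimal value in the worked example.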

A family has 3 children, creatively named $A$ , $B$ , and $C$ .

Discuss intuitively whether the event " $A$ is older than $B$ " is independent of the event " $A$ is older than $C$ ".
You might think something like "They are independent events, since neither comparison necessarily determines the other." But that is not true: they are dependent events. Suppose there are $n+1$ children, call them $A_0,A_1,A_2,...,A_{n}$ , and write $x>y$ to mean that $x$ is older than $y$ . Learning that $A_0>A_1,A_2,...,A_{n-1}$ , i.e. that $A_0$ is older than all of them, tells us two things:
1. It is evidence that $A_0$ sits at or near the top of the birth order.
2. $A_0$ is therefore relatively old, so it is hard for $A_n$ to be older.
Therefore, they are dependent events: observing that $A_0$ is older than the others lowers the probability that $A_n$ is the oldest. The dependence comes from the shared information about $A_0$ , not from one event causing the other.

Find the probability that $A$ is older than $B$ , given that $A$ is older than $C$ .
$R:\text{A is older than B}$
$T:\text{A is older than C}$
$S=\{ABC,ACB,BAC,BCA,CAB,CBA\}$ , where each element is a birth order.
$P(R|T)=\dfrac{P(R\cap T)}{P(T)}=\dfrac{\dfrac{|R\cap T|}{|S|}}{\dfrac{|T|}{|S|}}=\dfrac{|R \cap T|}{|T|}=\dfrac{2}{3}$
$P(R\cap T)=P(\text{A is the eldest child})=\dfrac{|\{ABC,ACB\}|}{|S|}=\dfrac{2}{6}=\dfrac{1}{3}$
$P(T)=\dfrac{|\{ABC,ACB,BAC\}|}{|S|}=\dfrac{3}{6}=\dfrac{1}{2}$
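The dependence can also be seen by enumerating the six equally likely birth orders. In this sketch each order is a tuple listed from oldest to youngest; the helper `older` and the variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

orders = list(permutations("ABC"))  # each tuple lists children from oldest to youngest

def older(x: str, y: str):
    """Return a predicate that is true when x appears before (is older than) y."""
    return lambda order: order.index(x) < order.index(y)

R, T = older("A", "B"), older("A", "C")

p_R = Fraction(sum(R(o) for o in orders), len(orders))
p_T = Fraction(sum(T(o) for o in orders), len(orders))
p_RT = Fraction(sum(R(o) and T(o) for o in orders), len(orders))
p_R_given_T = p_RT / p_T
print(p_R, p_R_given_T)
```

If $R$ and $T$ were independent, `p_R_given_T` would equal `p_R`; comparing the two printed values shows whether that holds.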

Random variables and their distributions

Expectation

Continuous random variables

Moments

Joint distributions

Transformations

Overview and descriptive statistics

Data.

Statistician collects data.

Population. Set of measurements of interest to the experimenter.

What? Sample. Subset of population.

Variable. A characteristic that varies across the experimental units under study. Example: hair color.

Who? Experimental units. Objects on which a variable is measured. Blackbox is an active subject.

A measurement or datum results when a variable is actually observed on an experimental unit.

A set of measurements, called data, can be either a sample or a population. I.e. $\text{Measurements} = data = sample$ or $\text{Measurements} = data = population$ .

Variable types. Qualitative variables measure a characteristic (a quality or category); quantitative variables measure a numerical quantity and are either discrete (countable) or continuous (uncountable).

How many variables have you measured?

Univariate data.

Bivariate data.

Multivariate data.

Statistics

Descriptive Statistics	Inferential Statistics
We can enumerate the population easily.	We cannot enumerate the population easily, so we choose a sample.
Describe the population. No inference is needed; conclusions can be drawn directly.	Infer (i.e. draw tentative conclusions) about the population from samples.

Use samples to make inferences about the population, then predict the future behavior of a black box, secure stable knowledge, and make decisions. Remember that past results are no guarantee of future performance. "There are three kinds of lies: lies, damned lies, and statistics." You need to make statistics work for you, not lie for you!

Inferential statistics

Define the objective.

Design of the experiment.

Collect data with math standard.

Make inferences.

Determine reliability of the inference.

Graphing Variables

Use a data distribution to describe:

What values of the variables have been measured?

How often each value has occurred?

Graphing Qualitative Variables

Graphing Quantitative Variables

Descriptive Statistics

Measures of Location

Measures of Variability

Chebyshev theorem

Z-score

A z-score (also called a standard score) tells you how many standard deviations a data point lies from the mean: $z=\dfrac{x-\mu}{\sigma}$ .
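A minimal sketch of the computation, assuming the data are a whole population (so the population standard deviation `pstdev` is used; for a sample one would typically use `stdev` instead). The data values and the function name `z_scores` are made up for illustration.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Return (x - mean) / standard deviation for every data point."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (an assumption)
    return [(x - mu) / sigma for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
scores = z_scores(data)
print(scores)
```

A positive score means the point lies above the mean, a negative score below it.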

Standard deviation.

Probability

Definition. Probability is the logic of uncertainty.

Set dictionary

Vandermonde's identity

Naive probability

How to count?

If you want to count outcomes, how to count them?

Experiment.

Random experiment.

Sample spaces. All possible outcomes.

Event. Set of possible outcomes.

Elementary event. An event containing only one outcome is called an elementary event; for simplicity, the event and its outcome are written interchangeably.

$|S|=\text{number of outcomes}$

Experimental probability is probability determined from the results of an experiment repeated many times; that is, we compute the probability of a future event from our observations of past events.

Theoretical probability is probability that is determined on the basis of reasoning. Axiomatic.

Combination

Bayes

Conditional

Philosophical questions

Worked examples

Random Variables

Def.

Probability Distributions for Discrete Random Variables.

If X is a discrete random variable, the function given by f(x)=P(X=x) for each x within the range of X is called the probability distribution of X.

Probability Distributions for Continuous Random Variables. Probability density function.

$F(x)=\int_{-\infty}^{x} f(t)\,dt$

$F(x)$ is the cumulative distribution function; $f(x)$ is the density function.

$\sigma_X=\sqrt{Var(X)}$

Var(X)=\sum(x_i-\mu_X)^2P(x_i)

// Standard deviation of a discrete random variable.
// table: array of [value, probability] pairs; mu: the mean.
const sigma = (table, mu) =>
  Math.sqrt(table.reduce((acc, [x, p]) => acc + (x - mu) ** 2 * p, 0));

Functions

Name	Tags	Concept
Probability function or probability mass function	Discrete
Density function	Continuous
Distribution function or cumulative distribution function	Continuous, Discrete	Accumulation (sum or integral) of the probability or density function.

Distributions

// Worked examples.

Discrete Uniform Distribution

What does the random variable characterize or measure?

The discrete uniform distribution is characterized by a constant probability $1/(b-a+1)$ over the $n=b-a+1$ integer values $x\in\{a,a+1,...,b\}$ of a discrete random variable. Its parameters are $a$ and $b$ .

Formula and graph of the distribution.

The probability mass function (or probability function) of a uniform random variable is $f(x)=\dfrac{1}{b-a+1}=\dfrac{1}{n}$ , for $x\in \{a,...,b\}$ , where $x_i \neq x_j$ when $i \neq j$ . Its graph:

Its cumulative distribution function: for $i \in [a,b]$ , $F(i;a,b)=\dfrac{\lfloor i\rfloor -a+1}{b-a+1}$ , where $i$ is the argument of the function, $a$ is the infimum of the domain, and $b$ is its supremum. Graph:

https://dk81.github.io/dkmathstats_site/rmath-uniform-plots.html

Moment generating function.

$M_X(t)=\dfrac{e^{at}-e^{(b+1)t}}{n(1-e^t)}$

Mean.

$E(X)=\dfrac{a+b}{2}$

Variance.

$Var(X)=\dfrac{(b-a+1)^2-1}{12}=\dfrac{n^2-1}{12}$

Bernoulli Distribution

What does the random variable characterize or measure?

It models an experiment with two possible outcomes, "success" and "failure", whose probabilities are $p$ and $q=1-p$ respectively. The number of successes (0 or 1) has a Bernoulli distribution. Its parameter is $p$ .

Formula and graph of the distribution.

$f(x;p)=p^x(1-p)^{1-x}$ for $x=0,1$ , so that $P(X=1)=f(1;p)=p$ and $P(X=0)=f(0;p)=1-p$ .

Graph:

$F(k;p) = \begin{cases} 0 & k<0 \\ 1-p & 0\le k<1 \\ 1 & k \geq1 \end{cases}$

Graph:

Moment generating function.

$M_X(t)=q+pe^t$

Mean.

$E(X)=p$

Variance.

$Var(X)=pq$

Binomial Distribution

What does the random variable characterize or measure?

$n$ Bernoulli trials.

The trials are identical and independent, that is, the probability of success $p$ remains unchanged from one trial to the next.

The random variable denotes the number of successes obtained in $n$ trials.

Formula and graph of the distribution.

$f(x;n,p)= {n \choose x}p^x(1-p)^{n-x}$ where $n$ is the number of trials and $x\in\{0,1,...,n\}$ .

$F(x;n,p)=\sum^ {\lfloor {x} \rfloor}_{i=0} {n\choose i}p^i(1-p)^{n-i}$ (cumulative distribution function).

Moment generating function.

$M_X(t)=(1-p+pe^t)^n$

Mean.

$E(X)=np$

Variance.

$Var(X)=npq$ , where $q=1-p$ .

Multinomial Distribution

What does the random variable characterize or measure?

It generalizes the binomial distribution to $k$ categories or events instead of 2 (success or failure). Its parameters are $n>0$ and $p_1,...,p_k$ where $\sum p_i=1$ .

Formula and graph of the distribution.

$f(x_1,...,x_k)=\dfrac{n!}{x_1!x_2!...x_k!}p_1^{x_1}...p_k^{x_k}$ , where $\sum x_i=n$ .

Moment generating function.

$M_X(t_1,...,t_k)=(\sum^k_{i=1}p_ie^{t_i})^n$

Mean.

$E(X_i)=np_i$

Variance.

$Var(X_i)=np_i(1-p_i)$

Geometric Distribution

What does the random variable characterize or measure?

A sequence of Bernoulli trials.

The trials are identical and independent, with the same probability of success $p$ (the parameter), so that $P(X=1)=P[\text{success on the first trial}]=p$

The random variable denotes the number of trials $x$ needed to obtain the first success. Its sample space is $S=\{E,FE,FFE,FFFE,...\}$ , where $E$ denotes a success and $F$ a failure.

Formula and graph of the distribution.

$f(x)=(1-p)^{x-1}p$ , where $0<p\leq1$ and $x=1,2,3,...$

$F(x)=1-(1-p)^x$

Moment generating function.

$M_X(t)=\dfrac{pe^t}{1-(1-p)e^t},$ for $t<-\ln(1-p)$ .

Mean.

$E(X)=\dfrac{1}{p}$

Variance.

$Var(X)=\dfrac{1-p}{p^2}$

Negative Binomial Distribution

What does the random variable characterize or measure?

A sequence of Bernoulli trials, identical and independent, with the same probability of success $p$ (a parameter).

The trials are observed until exactly $r$ successes are obtained, where $r$ is fixed by the experimenter.

The random variable denotes the number of trials $x$ needed to obtain $r$ successes.

Formula and graph of the distribution.

$f(x)={x-1 \choose r-1}(1-p)^{x-r}p^r$ for $r=1,2,3,...$ and $x=r,r+1,r+2,...$ .

Moment generating function.

$M_X(t)=\left(\dfrac{pe^{t}}{1-(1-p)e^{t}}\right)^r$ for $t<-\ln(1-p)$ .

Mean.

$E(X)=\dfrac{r}{p}$

Variance.

$Var(X)=\dfrac{r(1-p)}{p^2}$
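Because negative binomial formulas differ between parameterizations, it can help to check the moments numerically against the probability function for the number of trials $x$ needed to reach $r$ successes. The sketch below truncates the infinite sum at a large cutoff; the values $r=3$ , $p=0.4$ , the cutoff, and the function name `nbinom_pmf` are arbitrary assumptions. For this number-of-trials form, the sums should approach $r/p$ and $r(1-p)/p^2$ .

```python
from math import comb

def nbinom_pmf(x: int, r: int, p: float) -> float:
    """P(the r-th success occurs on trial x), for x = r, r+1, ..."""
    return comb(x - 1, r - 1) * (1 - p) ** (x - r) * p ** r

r, p = 3, 0.4
xs = range(r, 400)  # truncation point; the remaining tail is negligible here

total = sum(nbinom_pmf(x, r, p) for x in xs)
mean = sum(x * nbinom_pmf(x, r, p) for x in xs)
var = sum((x - mean) ** 2 * nbinom_pmf(x, r, p) for x in xs)
print(total, mean, var)
print(r / p, r * (1 - p) / p ** 2)
```

If a textbook counts failures before the $r$-th success instead of trials, its mean is shifted by $r$ , which is a common source of mismatched formulas.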

Poisson Distribution

What does the random variable characterize or measure?

The random variable measures the number of occurrences $x$ of a specified event in a given unit of time, length, or space $s$ , during which an average of $\lambda$ such events can be expected to occur, i.e. the rate of occurrence of these events. The events occur at random and independently of one another. It arises from the need to approximate binomial distributions with large $n$ .

The Poisson distribution is often used in situations where we are counting the number of successes in a particular region or interval of time, and there are a large number of trials, each with a small probability of success. The Poisson paradigm is also called the law of rare events. The interpretation of "rare" is that the $p_j$ are small, not that $\lambda$ is small.

Formula and graph of the distribution.

$f(x)=\dfrac{e^{-k}k^x}{x!}$ for $x=0,1,2,...$ and $k>0$ .

$k=\lambda s$ , where $\lambda$ is the average number of occurrences of the event per unit and $s$ is the size of the observation period.

Cumulative distribution function.

Moment generating function

$M_X(t)=e^{k(e^t-1)}$

Mean

$E(X)=k$

Variance

Var(X)=k

Hypergeometric Distribution

What does the random variable characterize or measure?

The experiment consists of drawing a random sample of size $n$ , without replacement and without regard to order, from a set of $N$ objects.

Of the $N$ objects, $r$ have the trait (characteristic) of interest, while the remaining $N-r$ objects do not.

The random variable is the number of objects in the sample that have the trait.

Formula and graph of the distribution.

$f(x)=\dfrac{{r \choose x}{N-r \choose n-x}}{{N \choose n}}$ where $N$ , $r$ , and $n$ are positive integers and are the parameters.

Such that $max(0,n-(N-r))\le x \le min(n,r)$

Cumulative distribution function.

Moment generating function.

Where $F_1$ is the generalized hypergeometric function.

Mean.

$E(X)=n\dfrac{r}{N}$

Variance.

$Var(X)=n\dfrac{r(N-r)(N-n)}{N^2(N-1)}$

Considerations.

If the number of units sampled ( $n$ ) is small relative to the number of objects from which the sample is drawn ( $N$ ), then the binomial distribution can be used to approximate hypergeometric probabilities.

A general rule is that the approximation is usually satisfactory if $n/N \le 0.05$ .

If $n$ is small relative to $N$ , the composition of the group being sampled does not change much from one trial to the next, even though the sampled objects are not returned. Thus the probability of success does not change considerably from one trial to the next either and, for any practical purpose, can be treated as a constant.

Hence the distribution of $X$ , the number of successes obtained in $n$ trials, can be approximated by the binomial distribution with parameters $n$ and $p = r/N$ .
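To get a feel for the quality of this approximation, one can compare the two probability functions directly. In the sketch below the values $N=1000$ , $r=300$ , $n=20$ (so $n/N=0.02$ ) and the function names are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, r: int, n: int) -> float:
    """P(x objects with the trait in a sample of n drawn without replacement)."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(x successes in n independent trials with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

N, r, n = 1000, 300, 20  # n / N = 0.02, within the 0.05 rule of thumb
p = r / N

# Largest pointwise difference between the exact and approximate probabilities.
max_gap = max(abs(hypergeom_pmf(x, N, r, n) - binom_pmf(x, n, p))
              for x in range(n + 1))
print(max_gap)
```

Rerunning with a larger $n$ relative to $N$ (for example $n=500$ ) should show the gap growing, which is the situation the rule of thumb warns about.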

Worked examples

A random variable is:

A function whose domain is the sample space.

For $f(y) = P(Y = y)$ to hold for every $y$ , the probability distribution of the discrete variable $Y$ can be represented by

a graph, a table, or a formula.

6.31 A point $D$ is chosen on the line $AB$ , whose midpoint is $C$ and whose length is $a$ . If $X$ , the distance from $D$ to $A$ , is a random variable having the uniform density with $\alpha = 0$ and $\beta = a$ , what is the probability that $AD$ , $BD$ , and $AC$ will form a triangle?

We know that:

In every triangle, the sum of the lengths of any two sides is always greater than the length of the remaining side.

That is:

\text{Triangle}\iff \text{satisfies the triangle inequality}

In our particular case:

AD+BD>AC\\ AD+AC>BD\\ AC+BD>AD

💡

The inequalities must be strict for the figure to be called a triangle. With equality we would have a line segment, which clearly fails the defining property: a polygon with three sides.

Substituting according to the conditions of the problem:

AD=x, BD=a-x, AC=\dfrac{a}{2}\\ x+(a-x)>\dfrac{a}{2}\implies a>0\\ x+\dfrac{a}{2}>a-x\implies x>\dfrac{a}{4}\\ \dfrac{a}{2}+(a-x)>x \implies x<\dfrac{3a}{4}\\

Thus $X$ , as a random variable, satisfies:

\alpha=0<\dfrac{a}{4}<X<\dfrac{3a}{4}<\beta=a

Therefore, the probability is:

P(\dfrac{a}{4}<X<\dfrac{3a}{4})=\int_{a/4}^{3a/4}\dfrac{1}{a}\text{ }dx=0.5
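The result can be sanity-checked with a small Monte Carlo simulation that draws $X$ uniformly on $(0,a)$ and tests the strict triangle inequality. The segment length $a=1$ , the seed, the number of trials, and the function name `forms_triangle` are arbitrary assumptions for this sketch.

```python
import random

def forms_triangle(x: float, a: float) -> bool:
    """True if lengths AD = x, BD = a - x, AC = a / 2 satisfy the strict triangle inequality."""
    sides = sorted([x, a - x, a / 2])
    # It suffices that the two shorter sides together exceed the longest one.
    return sides[0] + sides[1] > sides[2]

rng = random.Random(42)  # fixed seed for reproducibility
a = 1.0
trials = 200_000
hits = sum(forms_triangle(rng.uniform(0, a), a) for _ in range(trials))
estimate = hits / trials
print(estimate)
```

The estimate should be close to the integral above, with the usual Monte Carlo error on the order of $1/\sqrt{\text{trials}}$ .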

Accuracy (predictions, outcomes)

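import numpy as np
# predictions and outcomes are NumPy arrays of equal length.
accuracy = np.count_nonzero(predictions == outcomes) / len(predictions)
# Equivalent: np.mean(predictions == outcomes)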

Stochastic process

A stochastic process is a collection of random variables, indexed by an ordered time variable.

Markov chain

Kosambi–Karhunen–Loève theorem.

https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch18.pdf

TODO

Kappa

Mean-field theory

Mean-field games

Next steps

💽Data mining. Data analysis.

Exercises [1].

Let $X$ be the random variable representing the number of heads seen after flipping a fair coin three times. Let $Y$ be the random variable representing the outcome of rolling two fair six-sided dice and multiplying their values.

Evaluate

$E[X]$
After $3$ flips you could get the possible outcomes 0, 1, 2, and 3 heads. We know the total possibilities are $2^3=8$ , so we’ll calculate the probability for each event.
$E[X]=\sum_{0\le i\le3}i \dfrac{3 \choose i}{8}= 1.5$

$E[Y]$
There are $6^2$ possible outcomes, but what are all products of multiplying two numbers between 1 and 6? For example, $6=1\times 6=2 \times 3$ .
So, $p(Y=6)=\dfrac{4}{36}$ , since 6 has four divisors (1, 2, 3, 6) and each one gives an ordered pair with both factors between 1 and 6.
×	1	2	3	4	5	6
1						6
2			6
3		6
4
5
6	6
Generally speaking, $p(Y=y)=\dfrac{i}{36}$ where $i$ is the number of divisors $d$ of $y$ such that both $d$ and $y/d$ are between 1 and 6. For example, $p(Y=36)=\dfrac{1}{36}$ since $6$ is the only divisor that satisfies the property. The sets of such divisors for $y=1,2,3,4,...,36$ would be $\{\{1\}, \{1,2\}, \{1,3\}, \{1,2,4\},..., \{6\} \}$ .
The largest product is 36, since 6 and 6 are the biggest factors, and the smallest is 1; therefore, we are searching products between 1 and 36 ( $1 \le product\le36$ ). We ignore the primes greater than 6, and any other value that cannot be written as such a product, since their probability equals 0.
$E[Y]=\sum_{v\in V}v p(Y=v)\\ =1p(Y=1)+2p(Y=2)+...+36p(Y=36)\\ =1\dfrac{1}{36}+2\dfrac{2}{36}+...+36\dfrac{1}{36} =12.25$
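```
// Number of ordered pairs (i, n / i) with both factors between 1 and 6,
// i.e. the number of ways two dice can multiply to n.
function divisors(n) {
  let numbers = 0;
  for (let i = 1; i <= 6; i++)
    if (n % i === 0 && n / i <= 6)
      numbers++;
  return numbers;
}
// E[Y]: sum over the products 1..36 of value times probability.
Array.from({length: 36}, (e, i) => i + 1)
  .reduce((acc, curr) => acc + curr * divisors(curr) / 36, 0);
```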

$E[X+Y]=E[X]+E[Y]=1.5+12.25=13.75$
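The three expectations can be cross-checked by enumerating the equally likely outcomes of each experiment with exact fractions. This is a minimal sketch; encoding heads as 1 and tails as 0 and the variable names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

# X: number of heads in three fair coin flips (1 = heads, 0 = tails).
flips = list(product([0, 1], repeat=3))
e_x = Fraction(sum(sum(f) for f in flips), len(flips))

# Y: product of two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))
e_y = Fraction(sum(a * b for a, b in rolls), len(rolls))

# Linearity of expectation: E[X + Y] = E[X] + E[Y].
e_sum = e_x + e_y
print(e_x, e_y, e_sum)
```

Linearity holds whether or not $X$ and $Y$ are independent, so no joint enumeration is needed for the last line.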

References

MIT OpenCourseWare. (2022, December 21). MIT OpenCourseWare. Retrieved from https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/resources/mit6_006s20_ps0-questions

Downey, Allen. Think Bayes: Bayesian Statistics in Python, 2nd edition. O'Reilly.

The Canon

Name	Author	Why?
Probability and Statistics for Engineering and the Sciences. Eighth edition.	Jay Devore
Introduction to Probability (second edition), Chapman & Hall/CRC Press (2019), ISBN 9781138369917	Joseph K. Blitzstein, Jessica Hwang
John E. Freund's Mathematical Statistics with Applications, 8th edition, by Miller and Miller. ISBN: 9780321807090	John E. Freund
Introduction to Probability and Statistics, 14th edition. ISBN: 978-1133103752	William Mendenhall, Robert J. Beaver, Barbara M. Beaver
A First Course in Probability. ISBN: 9780321794772	Sheldon M. Ross
Understanding Probability. ISBN: 978-1-107-65856-1	H. C. Tijms
Digital textbook on probability and statistics.	Marco Taboga
Probability Theory: The Logic of Science	E. T. Jaynes

Complement

Name	Author
Statistics versus words. Randall Collins. https://sci-hub.se/10.2307/223353	Randall Collins
An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities.	George Boole
Human Action: A Treatise on Economics	Ludwig von Mises
Edgeworth's Writings on Chance, Probability and Statistics	Philip Mirowski
Logic and Probability	Stanford Encyclopedia of Philosophy

[7] Definition and examples of outcome | define outcome - Probability - Free Math Dictionary Online. Icoachmath.com. Retrieved June 10, 2021 from http://www.icoachmath.com/math_dictionary/outcome.html
