# Artificial Intelligence

# Requisites

compressive sensing (sparse coding),

information theory,

control theory,

economics,

logic,

operations research,

game theory,

and optimization.

# Introduction

## What is AI?

We struggle to define what is intelligent, but not what is artificial.

https://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai/

We use rational agents as our approach. An agent is an entity that can perceive an environment X and can act on it where X can be virtual or physical. How an agent decides to act given all previous considerations is a black box.

So, intelligence, or rationality, is when an agent makes the best decision for a given environment and goal under constraints, and acts in accordance with it. Therefore, we evaluate an agent by its results, not by its mental state.

If an agent makes the best decision in all environments and goals, it is called artificial general intelligence.

If an agent makes the best decision in a specific environment, it is called weak AI.

From a scientific point of view, the goal is understanding complete AI; from an engineering point of view, the goal is making incomplete, imperfect, and weak artificial intelligence.

Some constraints are lack of knowledge, time to learn, time to execute, money, actuators, and sensors.

But now black boxes become white boxes. If you are here, you are interested in white boxes, which are computational procedures. You’re going to learn them!

The goal of AI

Agents. How can we create intelligence?

Tools. How can we use AI techniques to solve problems?

**An intelligent agent**

Perception

Robotics

Language

Knowledge

Reasoning

Learning

How can we make systems that behave as humans?

Are we there yet?

Today, machines do narrow tasks; humans do broad tasks.

AI agents:

achieving human-level intelligence

AI tools

Understanding and solving real problems.

Predicting poverty.

Self-driving cars.

Authentication.

## Ecosystem

https://www.microsoft.com/en-us/research/collaboration/bair/

https://people.eecs.berkeley.edu/~yima/ by https://twitter.com/YiMaTweets

## Story

**Computing Machinery and Intelligence.**

The Imitation Game (Turing test).

Player A is a computer who claims to be a man.

Player B is a man.

Interrogator.

Total Turing Test.

Loebner Prize.

The Argument from Extrasensory Perception. When Turing wrote the paper (1950), people were interested in extrasensory perception, viz. telepathy, clairvoyance, precognition, and psychokinesis, so Turing prepared an argument for that situation.

https://courses.cs.washington.edu/courses/csep590a/06au/projects/history-ai.pdf

https://plato.stanford.edu/entries/artificial-intelligence/

https://journals.sagepub.com/doi/pdf/10.1177/0008125619864925

https://dl.acm.org/doi/fullHtml/10.1145/2063176.2063177

https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/

Association for Computing Machinery (ACM). (2022, December 22). January 2023 CACM: The End of Programming. Youtube. Retrieved from https://www.youtube.com/watch?v=OnYJXm9NvyA&ab_channel=AssociationforComputingMachinery(ACM)

## Levels

State-based models: search problems, MDPs, games

Variable-based models: CSPs, Bayesian networks

Logic-based models: propositional logic, first-order logic

## Related work

Cognitive science

Philosophy of mind

## Worked examples

## “Surely computers cannot be intelligent—they can do only what their programmers tell them.” Is the latter statement true, and does it imply the former?

Argument 0.

If we assume "The Universal Machine can do all possible processes", we cannot also say "Some possible processes can't be done by the Universal Machine", since a contradiction follows, namely "The Universal Machine is not universal". The big questions are “Are computers universal machines?”, “Are humans universal machines?”, and “Can we generate life, intelligence, or consciousness from inorganic materials?”

Argument 1.

Formally,

$p:\text{computers can be intelligent}$

$q:\text{computers can do only what their programmers tell them}$

Implication:

$q\implies \lnot p$

Contrapositive (logically equivalent):

$p\implies \lnot q$

If $q$ is true, then $\lnot q$ is false and the contrapositive forces $p$ to be false. So the conclusion $\lnot p$ depends on $q$ actually being true.

But, is $q$ true?

All computer actions are programmed.

Some computer actions, such as randomness drawn from entropy devices, are not programmed.

Some computer actions are not programmed.

$q$ is false.

https://ieeexplore.ieee.org/document/7313842

BUG? Turing Machine Deterministic and Nondeterministic equivalence.

## FAQ

### I learned in Theory of Computation that some problems are undecidable, but I see those problems solved with artificial intelligence. How is that?

# AI Framework.

Intelligent Agents. Rational agents.

## Agent class

```
graph TD
subgraph Architecture
Information --> Program
Program --> Decision
end
```

## Code

```
@startuml
Agent <|-- "Rational Agent"
class Agent {
  actuators
  sensors
  type process: 'competitive' | 'cooperative'
  performance measure
  percepts(environment)
  choose the best decision by performance measure(): Decision
}
note left of Agent::percepts
  What is the environment like now?
end note
note left of Agent::"choose the best decision by performance measure"
  What action should I take now?
  It's called the agent program too.
  Building a small program is a key challenge.
  No brute force such as a vast table!
  But remember, sometimes, memory = intelligence.
  In fact, if you know all the possible results, it's called Omniscience.
  Memory is called safe steps. Decreasing memory increases heuristic steps.
end note
class "Rational Agent" {}
note right: Learning and Autonomy
"Rational Agent" <|-- "Reflex agent"
"Rational Agent" <|-- "Model-based reflex agent"
"Rational Agent" <|-- "Goal-based agent"
"Rational Agent" <|-- "Utility-based agent"
Agent --> Environment: action
Environment --> Agent: percepts
class Environment {
  observable: 'Fully' | 'Partially'
  agents: Agent[] // Single, Multi
  deterministic: 'deterministic' | 'nondeterministic' | 'stochastic'
  sequential: 'episodic' | 'sequential'
  static: 'static' | 'dynamic' | 'semi-dynamic'
  discrete: 'discrete' | 'continuous'
  known: 'Known' | 'Unknown'
}
class Actuators
Agent o-- Actuators
class "Reflex agent" {
  choose the best decision by performance measure(): Decision
  - rule match(state, rules)
  - interpret input(environment): state
}
class "Model-based reflex agent" {
  current state
  choose the best decision by performance measure(): Decision
  - transit()
}
note right of "Reflex agent"::"choose the best decision by performance measure"
  condition-action rules f(x)
end note
note right of "Model-based reflex agent"::"choose the best decision by performance measure"
  state machine (DFA/Mealy/Moore/...)
end note
@enduml
```
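The reflex agent from the diagram can be sketched in Python. This is only a minimal illustration: the class name `ReflexAgent`, the method names, and the two-cell vacuum world are assumptions made for the example, not part of the notes.

```python
from typing import Callable, List, Tuple

# A rule pairs a condition on the interpreted state with an action (hypothetical type).
Rule = Tuple[Callable[[dict], bool], str]


class ReflexAgent:
    """Chooses an action from condition-action rules, using only the current percept."""

    def __init__(self, rules: List[Rule], default_action: str = "noop"):
        self.rules = rules
        self.default_action = default_action

    def interpret_input(self, percept: dict) -> dict:
        # In this sketch the percept already is the state.
        return percept

    def rule_match(self, state: dict) -> str:
        # The first rule whose condition holds decides the action.
        for condition, action in self.rules:
            if condition(state):
                return action
        return self.default_action

    def act(self, percept: dict) -> str:
        return self.rule_match(self.interpret_input(percept))


# Two-cell vacuum world, assumed only for illustration.
vacuum_rules: List[Rule] = [
    (lambda s: s["dirty"], "suck"),
    (lambda s: s["location"] == "A", "right"),
    (lambda s: s["location"] == "B", "left"),
]
agent = ReflexAgent(vacuum_rules)
print(agent.act({"location": "A", "dirty": True}))
```

The rule table is the whole "agent program" here; a model-based agent would additionally keep a state that `interpret_input` updates between percepts.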

## Worked examples

## Further readings

Economic agents.

Robotics.

# Solving problems by Searching.

```
graph TD
S["S, environment"] -->|"cost(S,action1, A)"| A["A, new environment"]
S -->|"cost(S,action2, B)"| B["B, new environment"]
S -->|"cost(S,action3, C)"| C["C, new environment"]
subgraph possible_solutions3
C-->E["..."]
end
subgraph possible_solutions2
B-->F["..."]
end
subgraph possible_solutions1
A-->D["..."]
end
D-->Goal["Goal, new environment == goal"]
```

## Searching problem model

```
classDiagram
class SearchProblem {
heuristic()
start_state()
is_goal(state)
expand(state)
valid_actions_from(state)
action_cost(state, action, next_state)
next_state(state, action)
}
class State {
distance_from_start_state
previous_state
environment
build()
relax()
reconstruct_path()
}
class SearchingStrategy {
findPlanFor(problem)
}
class Agent {
searchingStrategy: SearchingStrategy
problem: SearchProblem
act(state)
}
Agent *-- SearchingStrategy
Agent *-- SearchProblem
SearchingStrategy <|-- BFS
SearchingStrategy <|-- DFS
SearchingStrategy <|-- AStar
SearchingStrategy <|-- Dijkstra
SearchingStrategy <|-- LinearProgramming
```

## Searching strategies

## Searching for solutions

Considerations.

We build a tree or graph on demand by searching strategies.

We must avoid cycles in order to prevent infinite loops.

Each new state saves a reference to its previous state and we only choose valid actions, so when our searching algorithm reaches the goal, it reconstructs the path from the start state.

$S\to A\to ...\to Goal$
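The considerations above (build the graph on demand, skip visited states, keep a reference to the previous state, reconstruct the path at the goal) can be sketched with breadth-first search. The function name `bfs_path` and the toy graph are assumptions for illustration.

```python
from collections import deque


def bfs_path(start, is_goal, neighbors):
    """Breadth-first search that returns the path from start to a goal, or None."""
    previous = {start: None}  # doubles as the visited set and the parent pointers
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            # Walk the parent pointers back to the start state.
            path = []
            while state is not None:
                path.append(state)
                state = previous[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in previous:  # skip visited states to avoid cycles
                previous[nxt] = state
                frontier.append(nxt)
    return None


# Hypothetical graph with a cycle S <-> A.
graph = {"S": ["A", "B"], "A": ["S", "Goal"], "B": ["A"], "Goal": []}
path = bfs_path("S", lambda s: s == "Goal", lambda s: graph[s])
print(path)
```

Because `neighbors` is a function, states are generated only when they are expanded, which matches the idea of building the tree on demand.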

Codification matters.

## Uninformed search strategies

BFS

DFS

## Informed Search Strategies

A*, Greedy Search, Hill Climbing, Simulated Annealing, Best-First Search

## Heuristic Functions

A heuristic is a function that estimates the cost from the current state to the goal.

$f(n)$ is the real or estimated cost of a solution through state $n$; in A*, $f(n)=g(n)+h(n)$.

$g(n)$ is the cost to reach $n$ from the start state: $g(n)=\sum_{i=0}^{n-1} cost(s_i,a_i,s_{i+1})$, where $s_0$ is the start state and $s_n$ is state $n$.

$h(n)$ is the estimated cost to reach the goal state from the $n$ state, so it uses the available information from the problem or environment state in order to estimate the cost.

$h^*(n)$ is the real cost to reach the goal state from the $n$ state:

$h^*(n)=\sum_{i=n}^{g-1} cost(s_i,a_i,s_{i+1})$, where $s_g$ is the goal state.

Note $h(n) \to 0$ and $h^*(n)\to 0$ as $n$ approaches the goal state.

### Properties

Main idea: the heuristic estimate $\le$ the actual cost.

Admissibility.

$0 \le h(x) \le h^*(x)$

Consistency.

$h(x)-h(y) \le cost(x,y)$

Dominance.

Optimal.

### How do you find a heuristic function?

Relax your problem; use available information about the current state or the goal; use min and max functions; or use distance functions such as Manhattan distance, Euclidean distance, Hamming distance, … and norms.
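As an illustration of a relaxed-problem heuristic, the sketch below runs A* on a grid using Manhattan distance, which ignores walls and so never overestimates the number of unit-cost steps. The grid encoding (`#` for walls), the 4-connected moves, and the names `a_star` and `manhattan` are assumptions for the example.

```python
import heapq


def manhattan(a, b):
    """Distance on a wall-free grid: the relaxed version of the maze problem."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(grid, start, goal):
    """Return the cost of a cheapest 4-connected path on a grid, or None.

    grid is a list of strings; '#' marks a wall and every step costs 1.
    """
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}  # best known cost from the start to each cell
    frontier = [(manhattan(start, goal), start)]  # ordered by f = g + h
    while frontier:
        f, node = heapq.heappop(frontier)
        if node == goal:
            return g[node]
        if f > g[node] + manhattan(node, goal):
            continue  # stale queue entry: a cheaper route was found later
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g[node] + 1
                if new_g < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = new_g
                    f_next = new_g + manhattan((nr, nc), goal)
                    heapq.heappush(frontier, (f_next, (nr, nc)))
    return None


maze = [
    "..#.",
    "..#.",
    "....",
]
cost = a_star(maze, (0, 0), (0, 3))
print(cost)
```

On this maze the heuristic value at the start is 3 while the real cost is larger because of the wall, which is the sense in which the estimate stays below the actual cost.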

# Beyond Classical Search.

Heuristic. Safe steps.

Offline, Online.

Solving problems by searching relies on graph algorithms that **generate** new nodes by heuristic and safe steps and **test** them; wrong answers are **rejected**.

## Hands-on Projects

### 8-Queen solver

### Hanoi tower

### Maze solver

Graph Theory Visualizer: Maze. (2022, July 03). Retrieved from https://graph-theory.sanchezcarlosjr.com

Project 0 - Unix, Python and Autograder Tutorial - CS 188: Introduction to Artificial Intelligence, Spring 2021. (2022, September 29). Retrieved from https://inst.eecs.berkeley.edu/~cs188/sp21/project0/#question-1-addition

### The farmer, fox, goose, and grain

### Integral solver

### Pacman

Project 1 - Search - CS 188: Introduction to Artificial Intelligence, Spring 2021. (2022, September 29). Retrieved from https://inst.eecs.berkeley.edu/~cs188/fa22/projects/proj1/

## Worked examples

## References

https://aimacode.github.io/aima-javascript/3-Solving-Problems-By-Searching/

How to solve it: Modern Heuristics by Zbigniew Michalewicz, David B. Fogel.

# Adversarial Search

## Minimax

https://www.youtube.com/watch?v=l-hh51ncgDI&ab_channel=SebastianLague

## Monte-Carlo

**matchmaking algorithms**

## Hands-on Projects

### Tic Tac Toe

### Chess

### Pacman v2

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases. (2023, April 20). Retrieved from https://www.chessengines.org

Project 2. (2022, October 13). Retrieved from https://inst.eecs.berkeley.edu/~cs188/fa22/projects/proj2

# Variable-based models with Factor Graphs

Now we embark on our journey through variable-based models, in which we think in terms of **variables**, **factors**, and **weights**. In particular, in this section we explore factor graphs and their special cases: constraint satisfaction problems (CSPs), Markov networks, and Bayesian networks. We no longer rely on searching through all possible solutions. Instead, we model the problem as assignments to variables, which lets the algorithms **infer** the variable ordering, etc.

```
graph TD
subgraph Variables
X1((X1))
X2((X2))
X3((X3))
end
subgraph Factors
f1[f1]
f2[f2]
f3[f3]
f4[f4]
end
X1 --- f1
X1 --- f2
X2 --- f2
X2 --- f3
X3 --- f3
X3 --- f4
```

## Formal definition

Constraint Satisfaction problems are defined by a set of variables $X_i$, each with a domain $D_i$ of possible values, and a set of constraints $C$. The aim is to find an assignment of the variables $X_i$ from the domains $D_i$ in such a way that none of the constraints $C$ are violated. Informally, our goal is to find the best assignment of values to the variables.

Variable-based models

A constraint satisfaction problem consists of three components X, D, and C:

X is a set of variables

D is a set of domains

C is a set of constraints that specify allowable combinations of values
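The three components X, D, and C can be written down directly and handed to a generic solver. The sketch below uses plain backtracking over binary constraints; the function name `backtrack`, the dictionary encoding of constraints, and the map-coloring instance are assumptions for illustration, and real solvers add variable ordering and constraint propagation.

```python
def backtrack(variables, domains, constraints, assignment=None):
    """Return one assignment satisfying all binary constraints, or None.

    constraints maps a pair (X, Y) to a predicate over their two values.
    """
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        candidate = {**assignment, var: value}
        # Check only the constraints whose two variables are both assigned.
        consistent = all(
            pred(candidate[x], candidate[y])
            for (x, y), pred in constraints.items()
            if x in candidate and y in candidate
        )
        if consistent:
            result = backtrack(variables, domains, constraints, candidate)
            if result is not None:
                return result
    return None


def different(a, b):
    return a != b


# Hypothetical map-coloring instance.
X = ["WA", "NT", "SA", "Q"]
D = {v: ["red", "green", "blue"] for v in X}
C = {("WA", "NT"): different, ("WA", "SA"): different,
     ("NT", "SA"): different, ("NT", "Q"): different, ("SA", "Q"): different}
solution = backtrack(X, D, C)
print(solution)
```

The problem description (X, D, C) stays separate from the solving procedure, which is the point of the variable-based view.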

Factors

Each assignment $x=(x_1,\dots,x_n)$ has a weight $Weight(x)=\prod_j f_j(x)$.

Objective: $\arg\max_x Weight(x)$

The best-known category of continuous-domain CSPs is linear programming.

**Message Passing**

## Exercises and Projects

## Summary

### Key decisions

## FAQ

## Reference Notes

https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models#bayesian-networks

https://gtsam.org/2020/06/01/factor-graphs.html

https://stanford-cs221.github.io/spring2024-extra/modules/csps/csps1.pdf

# Knowledge, reasoning, and planning

Although traditional logical agents provide us expressiveness in a compact way, they are inherently deterministic and struggle to handle unstructured data. These systems follow predefined rules, making it difficult to manage uncertainty and ambiguity across diverse domains. Additionally, representing and processing unstructured data (e.g., text, images, time series, video) is challenging and often requires significant manual effort and expense. This lack of flexibility limits their ability to generalize across different domains as effectively as modern Deep Learning models.

### Logical Agents

Knowledge-based agents

Different syntax, same semantics: $2+3\iff3+2$

Same syntax, different semantics: $3/2 \text{ (Python 2.7)} \not\iff 3/2 \text{ (Python 3)}$

A knowledge base is a set of sentences. Each sentence is an assertion about the world, expressed in a representation **language** for a specific domain. Logic consists of syntax, semantics, and inference rules. The formulas by themselves are just symbols (syntax); they don’t provide meaning.

A knowledge-based agent is composed of a knowledge base which depends on domain-specific content and an inference mechanism. They can represent states, actions, and weights, incorporate new percepts, update internal representations of the world, and deduce hidden properties of the world.

Semantics is the interpretation function.

In a declarative approach to building a logical agent, we add new sentences to tell it what it needs to know, and we query it about what is known.

Natural Language?

We can save knowledge in different data models and apply different inference mechanisms, for example knowledge graphs.

Entailment. It adds trivial information to KB.

Contradiction.

Contingency. It adds non-trivial information to KB.

http://intrologic.stanford.edu/dictionary/logical_entailment.html

Learning formulas.

A language needs syntax, semantics, and implementation level.

The syntax of a language defines a set of valid formulas.

Prolog, Relational databases, SQL, Datalog?

Intelligent agents need knowledge about the world to choose good actions.

A model or world $w$ in propositional logic is an assignment of truth values to propositional symbols.

Modeling and inference

Propositional logic with only Horn clauses

Propositional logic

Modal logic

First-order logic

Second-order logic

Tell[f] → KB

Possible responses:

- Already knew that: entailment ($KB\models f)$

- Don’t believe that: contradiction ($KB\models \neg f)$

- Learned something new (update KB): contingent.

Ask[f] → KB

Possible responses:

- Yes: entailment ($KB\models f)$

- No: contradiction ($KB\models \neg f)$

- I don’t know: contingent.

A knowledge base KB is satisfiable if $M(KB)\ne\emptyset$
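The three possible responses can be computed by model checking: enumerate every world, keep the models of KB, and see whether $f$ holds in all, none, or some of them. The sketch assumes formulas are represented as Python predicates over a world (a dict of truth values); the names `models` and `ask` and the rain/wet example are hypothetical.

```python
from itertools import product


def models(formula, symbols):
    """All truth assignments (worlds) over symbols in which formula holds."""
    worlds = (dict(zip(symbols, values))
              for values in product([False, True], repeat=len(symbols)))
    return [w for w in worlds if formula(w)]


def ask(kb, f, symbols):
    """Classify f against kb by enumerating worlds (model checking)."""
    kb_and_f = models(lambda w: kb(w) and f(w), symbols)
    kb_and_not_f = models(lambda w: kb(w) and not f(w), symbols)
    if not kb_and_not_f:
        return "entailment"      # every model of KB satisfies f
    if not kb_and_f:
        return "contradiction"   # no model of KB satisfies f
    return "contingent"          # f holds in some models of KB but not all


symbols = ["rain", "wet"]


def kb(w):
    # (rain -> wet) and rain
    return (not w["rain"] or w["wet"]) and w["rain"]


print(ask(kb, lambda w: w["wet"], symbols))
```

Enumeration is exponential in the number of symbols, and an unsatisfiable KB (no models at all) vacuously entails everything under this check.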

Execution engine.

Knowledge base (domain-specific facts) + inference engine.

Syntax, set of possible worlds, truth condition.

Sound Algorithm.

Complete Algorithm.

Theorem-proving.

Model-checking.

https://www.youtube.com/watch?v=xL0kNw5TudI&t=192s

https://www.youtube.com/watch?v=oM5LUGPO7Zk&list=PLh7QmcIRQB-uiOS4GMlBbq0jkvtqhqtq0

https://www.youtube.com/watch?v=CAsq7hm3sbI&ab_channel=IITDelhiJuly2018

https://www.youtube.com/watch?v=xFpndTg7ZqA&t=1s&ab_channel=IITDelhiJuly2018

https://www.youtube.com/watch?v=h6zCkrZ8ehE&t=1s&ab_channel=RichNeapolitan

tammet. (2022, December 13). gkc. Retrieved from https://github.com/tammet/gkc

### Program

```
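class KnowledgeAgent:
    """Generic knowledge-based agent: tell the KB, ask the KB, act."""

    def __init__(self, kb):
        self.kb = kb  # knowledge base exposing tell() and ask()
        self.t = 0    # time step counter

    def act(self, environment):
        # The make_* helpers are domain-specific and must be supplied.
        self.kb.tell(make_percept_sentence(environment, self.t))
        action = self.kb.ask(make_action_query(self.t))
        self.kb.tell(make_action_sentence(action, self.t))
        self.t += 1
        return action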
```

## Inference machine

## First-Order Logic

## Inference in First-Order Logic

## Worked examples

## Knowledge graph

Prolog

RDF

Datalog

SQL and open cypher (Apache Age)

## Projects

**Card fraud detector**

**Make an online quiz system about Artificial Intelligence**

**8-eight queen**

**Pacman Finder**

https://inst.eecs.berkeley.edu/~cs188/sp21/project3/

**Wordle Solver**

https://swi-prolog.discourse.group/t/wordle-solver/5124

https://cheatle.occasionallycogent.com/

## Resources

- http://www.learnprolognow.org/ is a great place to start

- http://cs.union.edu/~striegnk/courses/nlp-with-prolog/html/ covers some advanced topics

- http://www.coli.uni-saarland.de/projects/milca/courses/comsem/html/ more advanced

- http://www.mtome.com/Publications/PNLA/prolog-digital.pdf This is my personal favorite

You may also consider picking up some of the following books

- Clocksin - Mellish: Programming in Prolog

- Covington - Nute - Vellino: Prolog Programming in Depth

- Sterling - Shapiro : The Art of Prolog

- Bratko : Prolog Programming for Artificial Intelligence

## FAQ

## What are real-world projects where people use PROLOG?

https://www.quora.com/What-is-Prolog-used-for-today

https://www.drdobbs.com/parallel/the-practical-application-of-prolog/184405220

# Classical Planning

# Planning and Acting in the Real World

# Knowledge representation

# Uncertain knowledge and reasoning

# Quantifying Uncertainty

# Probabilistic Reasoning

# Probabilistic Reasoning over Time

# Making Simple Decisions

# Making Complex Decisions

# Machine Learning

Machine learning is the “field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959). Traditional programming and classic artificial intelligence involve writing rules that act on data to produce answers. But if you flip this approach, you get machine learning. In this case, we gather a large amount of data and answers, apply a learning algorithm, and as an output, we acquire rules or models. These models can then make predictions without being specifically programmed to perform the task.

**Machine Learning.** The main driver of recent successes in AI. Move from “code” to “data” to manage information complexity. The goal is **generalization**.

Reflex-based models. Linear classifiers, deep neural networks.

Modeling. Simplify the real world into a well-defined mathematical model. Example: planning a route from A to B in a city. Inference. Developing algorithms that answer questions with the model. Learning. Start with a model without parameters and use data to learn those parameters by applying an algorithm.

We classify machine learning as supervised learning, unsupervised learning, and reinforcement learning. The table below gives you an overview of learning algorithms.

| Learning algorithm | When to use | Relevant metrics |
| --- | --- | --- |
| Linear Regression | When there's a linear relationship between the input and output. | Mean Squared Error (MSE), R-squared, Adjusted R-squared |
| Logistic Regression | For binary classification problems. | Accuracy, Precision, Recall, AUC-ROC |
| Decision Trees | When there's a need to understand the decision-making process. Useful for both classification and regression. | Gini Index, Information Gain for model construction; Accuracy, Precision, Recall for evaluation |
| Random Forest | When model interpretability is less important and you need higher performance. | Out-of-bag (OOB) error, Accuracy, Precision, Recall |
| K-Nearest Neighbors | When instances of the same class are generally close to each other in the feature space. | Accuracy, Precision, Recall, F1 Score |
| Support Vector Machines | When there's a clear margin of separation between classes. | Accuracy, Precision, Recall, F1 Score |
| Neural Networks | For complex problems like image recognition, speech recognition, and natural language processing. | Depends on the task, but often includes Accuracy, Precision, Recall, AUC-ROC, and loss metrics like Cross-Entropy Loss |
| XGBoost | Best for heterogeneous structured datasets. | |

Ensam

1.1 Introduction

1.2 Applications

1.3 Main approaches to machine learning

1.4 Machine learning paradigms

1.5 Basic concepts

1.6 Fundamental problems

1.7 Evaluation of learned models

2.1 Introduction

2.2 Historical development of the paradigm

2.3 Decision trees

2.4 Induction rules

2.5 Applications

2.6 Selected topics

3.1 Introduction

3.2 Historical development of the paradigm

3.3 Genetic algorithms

3.4 Genetic programming

3.5 Applications

3.6 Bio-inspired algorithms

4.1 Introduction

4.2 Historical development of the paradigm

4.3 Bayes' theorem

4.4 Naive Bayes

4.5 Applications

4.6 Probabilistic graphical models

5.1 Introduction

5.2 Historical development of the paradigm

5.3 Artificial Neural Networks (ANN)

5.4 Backpropagation algorithm

5.5 Applications

5.6 Review of ANN architectures

6.1 Introduction

6.2 Historical development of the paradigm

6.3 K-nearest neighbors

6.4 Support vector machines

6.5 Applications

6.6 Selected topics

https://github.com/afshinea/stanford-cs-229-machine-learning/tree/master/en

https://realpython.com/python-ai-neural-network/

### Books

*François Fleuret’s Homepage*. (n.d.). Retrieved June 16, 2023, from https://fleuret.org/francois/#lbdl

### Notes

ICML. (n.d.). International Conference on Machine Learning. https://icml.cc/

Domingos, P. (2017). The Master Algorithm. The MIT Press.

GECCO. (2022). The Genetic and Evolutionary Computation Conference. https://gecco-2022.sigevo.org/HomePage

Mitchell, T. (1997). Machine Learning. McGraw Hill.

Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. The MIT Press.

NeurIPS (2021). Conference on Neural Information Processing Systems.

https://nips.cc/

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

CVF. (n.d.). Computer Vision Foundation.

http://openaccess.thecvf.com/menu.py

Nunes, L. (2006). Fundamentals of Natural Computing: Basic Concepts, Algorithms, and Applications. Chapman & Hall/CRC. [Classic].

Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach. Pearson. [Classic].

Sucar, L. E. (2015). Probabilistic Graphical Models: Principles and Applications. Springer. [Classic].

Tan, P., Steinbach, M., Karpatne, A. & Kumar, V. (2018). Introduction to Data Mining (2nd ed.). Pearson.

*Clasificación con Árboles de Decisión: el algoritmo CART | Codificando Bits*. (n.d.). Retrieved June 16, 2023, from https://www.codificandobits.com/blog/clasificacion-arboles-decision-algoritmo-cart/

- Sanz, F. (2020, November 30). Cómo funciona el algoritmo XGBoost en Python. The Machine Learners. https://www.themachinelearners.com/xgboost-python/

- Graff, M. (2022). Aprendizaje Computacional. https://ingeotec.github.io/AprendizajeComputacional/

- Lex Fridman. (2020, May 5).
*Daphne Koller: Biomedicine and Machine Learning | Lex Fridman Podcast #93*.

*Ep 67 - Thamar Solorio (U.Houston - Bloomberg) - Lo bueno de la Academia y el NLP by Hacia Afuera*. (n.d.). Retrieved June 16, 2023, from https://podcasters.spotify.com/pod/show/elia-ia/episodes/Ep-67---Thamar-Solorio-U-Houston---Bloomberg---Lo-bueno-de-la-Academia-y-el-NLP-e1ppqte

- Ep 58 - Luciana Benotti (Universidad Nacional de Córdoba) - Inclusión y diversidad en Inteligencia Artificial by Hacia Afuera. (n.d.). Retrieved June 16, 2023, from https://podcasters.spotify.com/pod/show/elia-ia/episodes/Ep-58---Luciana-Benotti-Universidad-Nacional-de-Crdoba---Inclusin-y-diversidad-en-Inteligencia-Artificial-e1nrmek

- Ep 65 - Ivana Feldfeber (DataGénero) - Datos e inclusión by Hacia Afuera. (n.d.). Spotify for Podcasters. Retrieved June 16, 2023, from https://podcasters.spotify.com/pod/show/elia-ia/episodes/Ep-65---Ivana-Feldfeber-DataGnero---Datos-e-inclusin-e1ou88s

*Women in AI*. (n.d.). Spotify. Retrieved June 16, 2023, from https://open.spotify.com/show/62v63cucHe8HdZD6ooyCOg

# Learning from Examples

When you have a dataset with features ($X$) and labels ($Y$), supervised learning means finding the mapping from $X$ to $Y$.

What makes this an interesting algorithm? It learns from examples: you have a **training set**, you model the task (for example, as linear regression), and the algorithm gives you a **hypothesis**, which is a model that maps input **features** to a **target**. The learner is an optimization algorithm that needs an optimization problem, so our task splits into finding the right **optimization model** and then employing the right **optimization algorithm**.

The optimization problem relies on $\min_w Loss(x,y,w)$.

Loss minimization tasks. min TrainLoss(w)

The score is how confident we are.

The margin is how correct we are.

You might ask how can you get the training set, how can you deploy the hypothesis, and how can you know what method to apply, the answers are in.

### Development cycle

```
Split data into train, val, test
Exploratory data analysis
Repeat:
  - Implement feature/tune hyperparameters
  - Run learning algorithm
  - Sanity check train and val error rates
  - Look at errors to brainstorm improvements
  - Log as much as you can (reports)
Run on test set to get final error rates
```

Most of the time, the test metric does not decrease.

Optimization.

Discrete optimization: find the best discrete object.

$\min_{p \in \text{Paths}} \text{Cost}(p)$

Algorithmic tool: dynamic programming.

Continuous optimization: find the best vector of real numbers.

$\min_{w \in \mathbb{R}^d} \text{TrainingError}(w)$

Algorithmic tool: gradient descent.

Ground truth.

It refers to the expected label associated with a dataset.

```
graph TD
TrainingSet --> LearningAlgorithm
LearningAlgorithm --> Hypothesis
```

Given the features $x\in\mathcal{X}$, the **hypothesis** $h$ is a **predictor** of the target $y\in\mathcal{Y}$. $\mathcal{X}$ denotes the space of input values and $\mathcal{Y}$ the space of output values. So the supervised learning goal is to find a good predictor $h: \mathcal{X} \to \mathcal{Y}$.

```
class Model
fit(trainingset)
apply a learning algorithm to training examples
generates a hypothesis
predict(instances)
apply the hypothesis to instances
```

We call it a **regression** problem when $y$ is continuous; when $y$ is discrete, we call it a **classification** problem.

Suppose you have a linear regression problem. You may represent $h$ as $h(x)=mx+b$, an affine function. More generally, $h(\mathbf{x})=\boldsymbol{\theta} \mathbf{x}^T$ with $\boldsymbol{\theta}=[\theta_0,\theta_1,\theta_2,\dots,\theta_n]$ and $\mathbf{x}=[1,x_1,\dots,x_n]$, where $\boldsymbol{\theta}$ are the “parameters” that allow us to make good predictions and the constant feature $1$ pairs with the bias $\theta_0$.
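A small sketch of the linear hypothesis, assuming the constant feature 1 is placed first so that it lines up with the bias $\theta_0$. The function name `predict` is hypothetical.

```python
def predict(theta, x):
    """Evaluate h(x) = theta_0 + theta_1 * x_1 + ... + theta_n * x_n."""
    features = [1.0] + list(x)  # prepend the constant feature for the bias theta_0
    if len(features) != len(theta):
        raise ValueError("theta must have one more entry than x")
    return sum(t * f for t, f in zip(theta, features))


# The affine function h(x) = 2x + 1 written as theta = [b, m].
theta = [1.0, 2.0]
print(predict(theta, [3.0]))
```

Folding the bias into the parameter vector is what lets the affine function be written as a single dot product.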

## Loss functions

A. Chadha, V. Jain, Distilled Notes for Machine Learning , https://www.vinija.ai, 2022, Accessed: July 1 2022.

## Distance metrics

Computing edit distance

Input: two strings, s and t

Output: minimum number of character insertions, deletions, and substitutions between s and t.

Example:

s: a cat

t: the cats!

General principles: reduce the problem to smaller subproblems and abstract away details.
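The edit distance problem above reduces to subproblems on prefixes of the two strings, which is where dynamic programming comes in. A minimal memoized sketch follows; the function name `edit_distance` and the prefix-based recurrence are one common formulation, not necessarily the one the notes intended.

```python
from functools import lru_cache


def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # d(i, j) is the distance between the prefixes s[:i] and t[:j].
        if i == 0:
            return j  # insert the remaining j characters
        if j == 0:
            return i  # delete the remaining i characters
        if s[i - 1] == t[j - 1]:
            return d(i - 1, j - 1)  # last characters match: no edit needed
        return 1 + min(
            d(i - 1, j),      # delete from s
            d(i, j - 1),      # insert into s
            d(i - 1, j - 1),  # substitute
        )

    return d(len(s), len(t))


print(edit_distance("a cat", "the cats!"))
```

Memoization keeps the number of distinct subproblems at most (len(s) + 1) × (len(t) + 1), instead of exploring every edit sequence.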

## Linear prediction

Linear Regression

$\min_w F(w) = \sum_{i=1}^n (w x_i-y_i)^2$

Input: a set of pairs $(x_i, y_i)$.

Output: $w\in \mathbb{R}$ that minimizes the squared error $F(w)= \sum_{i=1}^n (w x_i - y_i)^2$.

Algorithm: gradient descent.
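A sketch of gradient descent on this one-dimensional objective, using the derivative $F'(w)=\sum_{i=1}^n 2(w x_i-y_i)x_i$. The step size, iteration count, and toy data are assumptions chosen so that the loop converges on this example.

```python
def gradient_descent(points, step_size=0.01, iterations=200):
    """Minimize F(w) = sum((w * x - y) ** 2) over a scalar w."""
    w = 0.0
    for _ in range(iterations):
        # dF/dw = sum of 2 * (w * x - y) * x over all pairs
        gradient = sum(2 * (w * x - y) * x for x, y in points)
        w -= step_size * gradient  # move against the gradient
    return w


points = [(1, 2), (2, 4), (3, 6)]  # generated by y = 2x
w = gradient_descent(points)
print(w)
```

With a step size that is too large for the data, the same loop diverges, which is why the step size is usually tuned.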

Linear prediction.

Score: a weighted combination of features.

Weight vector w.

$w\cdot\phi(x)$

$f_p$

## Binary linear classifier

Decision boundary

Separate the space into different subspaces to classify.

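$f_w(x)=\text{sign}(w\cdot\phi(x))$

$Loss_{hinge}(x,y,w)=\max(1-(w\cdot\phi(x))y,\,0)$

Case analysis

$\nabla_w Loss_{hinge}(x,y,w) = \begin{cases} 0 & \text{if } (w\cdot\phi(x))y > 1 \\ -\phi(x)y & \text{otherwise} \end{cases}$

Stepping against this gradient increases the margin.

$Loss_{logistic}(x,y,w)=\log(1+e^{-(w\cdot\phi(x))y})$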
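The hinge loss and its case analysis can be sketched as follows, assuming labels $y\in\{+1,-1\}$ and feature vectors given as plain lists. The helper names `dot`, `hinge_loss`, and `hinge_gradient` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w, phi, y):
    """max(1 - margin, 0), where margin = (w . phi(x)) * y and y is +1 or -1."""
    return max(1 - dot(w, phi) * y, 0)


def hinge_gradient(w, phi, y):
    """(Sub)gradient with respect to w: zero if margin > 1, otherwise -phi * y."""
    if dot(w, phi) * y > 1:
        return [0.0] * len(phi)
    return [-p * y for p in phi]


w = [0.5, 0.0]
phi = [1.0, 2.0]
print(hinge_loss(w, phi, 1), hinge_gradient(w, phi, 1))
```

At a margin of exactly 1 the loss is not differentiable; returning $-\phi(x)y$ there is one valid subgradient choice.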

## Gradient descent and Stochastic gradient descent

The gradient $\nabla f$ gives the direction and rate of steepest increase of $f$ at a point.

The goal is to move in the opposite direction of the gradient.

Least squares regression.

Objective function:

$TrainLoss(w) = \dfrac{1}{|D_{train}|} \sum_{(x,y)\in D_{train}} (w\cdot\phi(x)-y)^2$

Gradient:

$\nabla_w TrainLoss(w) = \dfrac{1}{|D_{train}|} \sum_{(x,y)\in D_{train}} 2(w\cdot\phi(x)-y)\phi(x)$

SGD

For each $(x,y)$ in $D_{train}$:

$w \leftarrow w - \text{step\_size}\cdot\nabla_w Loss(x,y,w)$
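A sketch of the SGD loop for squared loss with a scalar weight, where each update uses the gradient of the loss on a single example, $2(wx-y)x$. The step size, number of epochs, shuffling, and noise-free toy data are assumptions for illustration.

```python
import random


def sgd(train, step_size=0.05, epochs=100, seed=0):
    """Stochastic gradient descent on squared loss for a scalar weight w.

    Loss(x, y, w) = (w * x - y) ** 2, so its gradient is 2 * (w * x - y) * x.
    """
    rng = random.Random(seed)
    data = list(train)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a random order each epoch
        for x, y in data:
            w -= step_size * 2 * (w * x - y) * x  # update after every example
    return w


train = [(1, 3), (2, 6), (3, 9)]  # generated by y = 3x
w = sgd(train)
print(w)
```

Compared with full gradient descent, each update here is much cheaper but noisier, since it looks at one example rather than the whole training set.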

It’s about quality.

SGD can be worse than GD if the dataset has noise.

Step size

Strategies

Constant

Decreasing

## Maximum likelihood estimation

Maximum likelihood estimation is the goal of training classifiers: we find the parameters $\theta$ that maximize the probability of the observed data. $p(y=1|x;\theta)$ is the conditional probability that the output is the class $y=1$ given the input variables $x$ and the model parameters $\theta$.
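To make the idea concrete, the sketch below scores a few candidate values of $\theta$ by the log-likelihood of a tiny dataset under a logistic model $p(y=1|x;\theta)=\sigma(\theta x)$ and keeps the best one. The logistic model, the scalar $\theta$, the dataset, and the candidate grid are all assumptions for illustration; summing the log-probabilities relies on treating the examples as independent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(theta, data):
    """Sum of log p(y | x; theta) for a logistic model with scalar x and y in {0, 1}."""
    total = 0.0
    for x, y in data:
        p = sigmoid(theta * x)  # p(y = 1 | x; theta)
        total += math.log(p if y == 1 else 1.0 - p)
    return total


# Hypothetical dataset: mostly negative x -> class 0, positive x -> class 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (0.5, 0)]
candidates = [-1.0, 0.0, 0.5, 1.0, 2.0]
best = max(candidates, key=lambda theta: log_likelihood(theta, data))
print(best, log_likelihood(best, data))
```

In practice the maximization is done with gradient-based optimization rather than a grid of candidates; the grid only keeps the example short.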

independently and identically distributed

## Types of problem

Pattern recognition

Recommender systems

## Types of data

**Vector Data**: This is the most common and simplest form of data in machine learning. The dataset is a 2D tensor where each data point can be encoded as a vector. Examples can be anything from housing price prediction data (features being the number of rooms, location, size of the house, etc.) to text data (after applying some sort of vectorization like bag-of-words or TF-IDF).

- Natural language.

**Time series or Sequence Data**: Time series data captures a series of data points recorded over regular time intervals. The order of data points is important here because the same set of data points in a different time order might mean something entirely different. Sequence data is very similar, but time isn't necessarily a factor here. Examples of these are stock price data, weather data, or any type of data where time plays a crucial role. For sequence data, a sentence or a DNA sequence would be a good example as the order of words or genes is important.

**Image Data**: Images are represented as 3D tensors (height, width, color_depth). However, a batch of images used for training a model is stored in a 4D tensor (batch_size, height, width, color_depth). Deep learning models like Convolutional Neural Networks (CNNs) are designed to extract features from these 4D tensors and use them to classify images, detect objects, and more. Applications range from medical imaging (detecting diseases) to self-driving cars (identifying pedestrians, signs, etc.).

**Video Data**: Video data can be thought of as a series of images, so naturally, this extends the image data tensor by one more dimension, the frame dimension. So a video dataset would be a 5D tensor (batch_size, frames, height, width, color_depth). Video data is used in various applications like activity recognition, video synthesis, and object tracking in videos.

**Graph data.**

### Data labeling

## Fine-tuning

**What is Transfer Learning?**

## Datasets

## Notebooks

https://www.youtube.com/watch?v=T-fAkfU9j_o&ab_channel=Elpensamientoenllamas

## Natural Computing

NACO

https://fcampelo.github.io/EC-Bestiary/

black hole algorithm

Mandelbrot set from scratch, Markov text-generation, and John Conway’s Game of Life ar

## Pattern Recognition

- Bishop, C. M. (2006).
*Pattern Recognition and Machine Learning*. Springer. Available at: Microsoft Research

- Duda, R. O., Stork, D. G., & Hart, P. E. (2001).
*Pattern Classification*(2nd ed.). Wiley.

- Fu, K. (1974).
*Syntactic Methods in Pattern Recognition*. Academic Press.

- Jürgen, M. & Matthias, N. (2018).
*Pattern Recognition: Introduction, Features, Classifiers and Principles*. Berlin: De Gruyter Oldenbourg (De Gruyter Graduate). Available at: EBSCOhost.

- Koutroumbas, K. & Theodoridis, S. (2009).
*Pattern Recognition*. Academic Press. Available at: EBSCOhost

- Murty, M. N. & Devi, V. S. (2011).
*Pattern Recognition: An Algorithmic Approach*. Springer London. Available at: EBSCOhost

- Massachusetts Institute of Technology. (n.d.).
*Pattern Recognition and Analysis*. MIT OpenCourseWare. https://ocw.mit.edu/courses/media-arts-and-sciences/mas-622j-pattern-recognition-and-analysis-fall-2006/syllabus/

## Deep Learning

https://people.idsia.ch/~juergen/deep-learning-history.html

## Tensor

## Automatic differentiation

The purpose of the activation function is to introduce nonlinearities into the model. Artificial neural networks are inspired by biological neural networks.

What is a gradient? It is the vector of partial derivatives of a function with respect to its parameters. Gradient-based optimization methods use it to decide how to update those parameters.

The chain rule lets us pass the gradients from the output of a neuron back to its input.
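To make the chain rule concrete, here is a minimal sketch for a single neuron. The sigmoid activation, the squared-error loss, and the names `forward`, `backward`, and `loss` are assumptions made for illustration; these notes do not prescribe them. The analytic gradient is compared against a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward pass: pre-activation z and activation a."""
    z = w * x + b
    a = sigmoid(z)
    return z, a

def loss(w, b, x, y):
    """Squared error between the neuron's output and the target y."""
    return (forward(w, b, x)[1] - y) ** 2

def backward(w, b, x, y):
    """Gradients of the loss with respect to w, b and x via the chain rule."""
    _, a = forward(w, b, x)
    dloss_da = 2.0 * (a - y)      # d loss / d a
    da_dz = a * (1.0 - a)         # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz   # chain rule: output -> pre-activation
    # dz/dw = x, dz/db = 1, dz/dx = w
    return dloss_dz * x, dloss_dz, dloss_dz * w

w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b, grad_x = backward(w, b, x, y)

# Finite-difference estimates used as a sanity check.
eps = 1e-6
numeric_grad_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
numeric_grad_x = (loss(w, b, x + eps, y) - loss(w, b, x - eps, y)) / (2 * eps)
```

`grad_x` is the quantity that would be handed to the previous layer in a deeper network, which is how gradients travel from the output back to the input.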

Activation

Forward propagation

**back propagation**

https://playground.tensorflow.org

https://realpython.com/python-ai-neural-network/

Neural Networks and Deep Learning http://neuralnetworksanddeeplearning.com/

**Batch**

**Input**

**Activations**

**Weights**

**Output**

## Optimizers

## Hyperparameters

In a Keras model, hyperparameters such as optimizer, loss, and metrics have crucial roles in defining how the model will be trained and evaluated. Let's discuss each of these and the context in which they should be used:

**Optimizer**: Optimizers in Keras help to adjust the attributes of your neural network such as weights and learning rate to reduce the losses. Different optimizers suit different problems and can significantly affect the model's performance and convergence speed.

**SGD**: Stochastic Gradient Descent, which is the most basic optimizer. It's robust but can be slow and sensitive to the learning rate choice.

**RMSprop**: Usually a good choice for recurrent neural networks.

**Adam**: A good default choice for many problems, it combines the advantages of RMSprop and SGD with momentum.

**Adagrad, Adadelta, Adamax, Nadam**: Other variants of optimizers, each with its strengths, but in most cases, Adam should suffice.

**Loss**: Loss function or cost function is a method to calculate the disparity between the predicted output and the actual output. This is the function that the model will strive to minimize.

**Mean Squared Error (MSE)**: Used for regression problems (predicting a continuous value).

**Binary Cross-Entropy**: Used for binary classification problems (predicting a yes/no outcome).

**Categorical Cross-Entropy**: Used for multi-class classification problems, where the outputs are one-hot encoded.

**Sparse Categorical Cross-Entropy**: Like categorical cross-entropy, but for integer targets.

**Weighted cross-entropy loss**: Used for imbalanced multi-class classification problems, where the outputs are one-hot encoded.
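The losses listed above can be written out in a few lines of NumPy. This is a sketch, not the Keras implementation: the function names and the clipping constant `EPS` are assumptions, and the weighted variant is omitted. It also shows that categorical and sparse categorical cross-entropy compute the same value and differ only in how the targets are encoded.

```python
import numpy as np

EPS = 1e-12  # assumed clipping constant to avoid log(0)

def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return float(np.mean((y_true - y_pred) ** 2))

def binary_cross_entropy(y_true, p):
    """y_true in {0, 1}; p is the predicted probability of class 1."""
    p = np.clip(p, EPS, 1 - EPS)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def categorical_cross_entropy(y_onehot, probs):
    """Targets are one-hot rows; probs are predicted class probabilities."""
    probs = np.clip(probs, EPS, 1.0)
    return float(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))

def sparse_categorical_cross_entropy(labels, probs):
    """Targets are integer class indices instead of one-hot rows."""
    probs = np.clip(probs, EPS, 1.0)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

# Hypothetical predictions for two samples and three classes.
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
onehot = np.eye(3)[labels]

cce = categorical_cross_entropy(onehot, probs)
scce = sparse_categorical_cross_entropy(labels, probs)
```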

**Metrics**: Metrics are used to judge the performance of your model. Choosing the right metric is essential to judge your model accurately.

**Accuracy**: Suitable for classification problems, especially if the classes are balanced.

**Precision, Recall, F1-score**: These are more informative than accuracy for binary classification, especially if the classes are imbalanced.

**MSE, RMSE, MAE (Mean Absolute Error)**: Suitable for regression problems.
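A small example shows why accuracy can mislead on imbalanced classes. The helper names are hypothetical, and returning 0.0 when a ratio is undefined is a convention assumed here, not a universal rule.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred):
    """Binary metrics for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Imbalanced toy data: 9 negatives, 1 positive; the model always predicts negative.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here the accuracy is high even though the model never finds the positive example, which the recall and F1-score expose.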

Other hyperparameters include:

**Learning rate**: This determines the size of each update step and therefore how fast the model learns. A learning rate that is too high can cause training to diverge; one that is too low makes convergence very slow.
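The effect of the learning rate is easy to see on a toy problem. The sketch below runs plain gradient descent on $f(w) = w^2$; the function, the starting point, and the three rates are assumptions picked for illustration.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimize f(w) = w^2 (gradient 2w) and return the final w."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w

too_small = gradient_descent(0.001)   # barely moves away from the start
reasonable = gradient_descent(0.1)    # approaches the minimum at 0
too_large = gradient_descent(1.1)     # |w| grows every step: divergence
```

Each step multiplies `w` by `1 - 2 * learning_rate`, so the iterate shrinks toward 0 only when that factor has magnitude below 1.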

**Batch size**: This is the number of samples that will be propagated through the network simultaneously. Smaller batch sizes require less memory but can result in noisy gradient updates. Larger batch sizes require more memory but can result in more accurate gradient updates.

**Number of epochs**: This is the number of times the model will iterate over the entire dataset. A common practice is to set a generous upper bound and use a technique like early stopping, which halts training when the validation metric stops improving, to limit overfitting.
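Early stopping can be sketched independently of any framework. The function name `early_stopping_epoch`, the `patience` parameter, and the validation losses below are hypothetical; Keras provides its own `EarlyStopping` callback, which this sketch does not reproduce.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs through all epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:
            best = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation losses, one per epoch: improvement ends after epoch 3.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
stop_epoch = early_stopping_epoch(val_losses, patience=2)
```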

Choosing the right hyperparameters often involves trial and error and can be guided by experience, knowledge about the problem and the data, or hyperparameter tuning techniques such as grid search or random search.

## Convolutional Neural Networks

locality structure

## Recurrent Neural Networks

- Neural networks are universal approximators for continuous functions:
**Mathematical functions**

- Recurrent neural networks are equivalent to Turing machines:
**Algorithms**

- The backpropagation algorithm finds a configuration of the network that imitates the behavior of the data.

Leshno, M., Lin, V. Ya., Pinkus, A., & Schocken, S. (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. *Neural Networks*, 6(6), 861–867.

Siegelmann, H. T., & Sontag, E. D. (1992, July). On the computational power of neural nets. In *Proceedings of the Fifth Annual Workshop on Computational Learning Theory* (pp. 440–449).

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. *Nature*, 323(6088), 533–536.

[English] Python AI: How to Build a Neural Network & Make Predictions

https://realpython.com/python-ai-neural-network/

[Spanish, translated by Google] Python AI: Cómo construir una red neuronal y hacer predicciones https://realpython-com.translate.goog/python-ai-neural-network/?_x_tr_sl=en&_x_tr_tl=es&_x_tr_hl=en-US&_x_tr_pto=wapp

**Sites**

- Lewis, O. (2023). Awesome Artificial Intelligence (AI).
[English] https://github.com/owainlewis/awesome-artificial-intelligence

### Books

- Russell, S., & Norvig, P. (2021). *Artificial Intelligence: A Modern Approach* (4th US ed.). [English, official site] https://aima.cs.berkeley.edu/

- Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2020). Dive into Deep Learning. [English] https://d2l.ai/

### Courses

- DEEP LEARNING · Deep Learning. (n.d.). https://atcold.github.io/NYU-DLSP21/

### Videos

- Irving Vasquez. (2022, August 23).
*Introducción a las redes neuronales - Presentación del curso*. [Spanish]

Understanding LSTM Networks:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Recurrent Neural Networks and LSTM explained:

https://purnasaigudikandula.medium.com/recurrent-neural-networks-and-lstm-explained-7f51c7f6bbb9

Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM): https://www.youtube.com/watch?v=WCUNPb-5EYI

https://homes.cs.washington.edu/~pedrod/

## TensorFlow

### TensorFlow APIs for training

TensorFlow offers an input pipeline API (`tf.data`) together with the Keras and Estimator APIs. Distributed processing is done with `tf.distribute`.

Why an input pipeline? Because the data might not fit in memory, because it makes efficient use of the hardware, and because it decouples loading from preprocessing. Batch processing typically follows the ETL stages: the extract stage reads data from memory or remote storage and parses the file format; the transform stage performs domain-specific transformations; the load stage transfers the data to the accelerator.

There is a big gap between GPU/TPU processing power and CPU processing power, so a slow input pipeline can leave the accelerator idle.

A typical batch-processing job for deep learning looks like:

```
dataset = read data from storage as a stream
dataset = apply distributed pipeline operators to dataset  # executed as a dataflow graph
model = build the architecture of the model with high-level APIs
model.fit(dataset)
```

The main optimizations are software pipelining, parallel transformation, and parallel extraction.
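The ETL stages can be illustrated without TensorFlow using plain Python generators. This is only an analogy, under the assumption of a toy comma-separated record format: `extract`, `transform`, and `batch` are hypothetical helpers, not `tf.data` functions. The point is that records stream through the stages one at a time, so the full dataset never has to sit in memory.

```python
from itertools import islice

def extract(lines):
    """Extract: stream raw records one at a time instead of loading everything."""
    for line in lines:
        yield line

def transform(records):
    """Transform: parse each 'x,y' record into a (float, float) pair."""
    for record in records:
        x, y = record.strip().split(",")
        yield float(x), float(y)

def batch(examples, batch_size):
    """Group examples into lists of at most batch_size items."""
    iterator = iter(examples)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk

# Hypothetical in-memory source; in practice this would be a file or remote store.
raw = ["1,2\n", "3,4\n", "5,6\n"]

# Load: consuming the batches is the point where data would be handed to the model.
batches = list(batch(transform(extract(raw)), batch_size=2))
```

Because each stage is lazy, loading and preprocessing stay decoupled: any stage can be swapped out without touching the others.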

TensorFlow Datasets is a project that helps onboard new users.

## Unsupervised Learning - Clustering

Data has lots of rich latent structures. We want methods to discover these structures automatically.

Input: a training set of input points

Output: assignment of each point to a cluster

K-means algorithm

DBSCAN

Hierarchical clustering

## K-Means algorithm

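Even though K-means is not the best algorithm for clustering data, it illustrates the task and a simple solution. In K-means, each cluster $k=1,...,K$ is represented by a centroid $\mu_k \in \mathbb{R}^d$, and the goal is to assign each vector $\phi(x_i)$ to its closest centroid. Formally, the objective function is

$$\text{Loss}(z, \mu) = \sum_{i=1}^{n} \|\phi(x_i) - \mu_{z_i}\|^2$$

where $n$ is the number of training points and $z_i \in \{1,...,K\}$ is the cluster assigned to point $i$.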

Algorithm: K-means

```
Initialize \mu_1, ..., \mu_K randomly
for t = 1, ..., T:
    Step 1: set assignments z given \mu   (assign each point to its closest centroid)
    Step 2: set centroids \mu given z     (set each centroid to the mean of its points)
```
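The two alternating steps can be sketched in NumPy as follows. Several details are assumptions not fixed by the pseudocode: centroids are initialized from randomly chosen training points, an empty cluster keeps its previous centroid, and the loop stops early once the centroids stop moving. The sample points stand in for the feature vectors $\phi(x_i)$.

```python
import numpy as np

def kmeans(points, K, T=100, seed=0):
    """Alternating minimization for K-means; returns (assignments z, centroids mu)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as K distinct random training points (one common choice).
    mu = points[rng.choice(len(points), size=K, replace=False)].astype(float)
    z = np.zeros(len(points), dtype=int)
    for _ in range(T):
        # Step 1: assign each point to its closest centroid.
        distances = np.linalg.norm(points[:, None, :] - mu[None, :, :], axis=2)
        z = distances.argmin(axis=1)
        # Step 2: move each centroid to the mean of its assigned points.
        new_mu = mu.copy()
        for k in range(K):
            if np.any(z == k):          # keep the old centroid if a cluster is empty
                new_mu[k] = points[z == k].mean(axis=0)
        if np.allclose(new_mu, mu):     # converged: centroids stopped moving
            break
        mu = new_mu
    return z, mu

def loss(points, z, mu):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((points - mu[z]) ** 2))

# Two well-separated blobs of hypothetical 2-D feature vectors.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
z, mu = kmeans(points, K=2)
final_loss = loss(points, z, mu)
```

Neither step can increase the loss, so the procedure converges, but only to a local minimum that depends on the initialization.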

# Knowledge in Learning

# Learning Probabilistic Models

# Reinforcement Learning

# * Communicating, perceiving, and acting

# MACTI (Temporal)

**Induction**

https://www.youtube.com/watch?v=9AwJrXAz9QA

https://www.youtube.com/watch?v=CDYLHa63ws4

https://www.youtube.com/watch?v=ERYgaGKaHoE

https://www.youtube.com/watch?v=KX4DdZeRAsI

https://www.youtube.com/watch?v=zdpDR_F2ovg

https://www.youtube.com/watch?v=eAmFytbeNTc

Morozov, E. (2023, April 3). *Ni es inteligente ni es artificial: esa etiqueta es una herencia de la Guerra Fría*. El País. https://elpais.com/ideas/2023-04-03/ni-es-inteligente-ni-es-artificial-esa-etiqueta-es-una-herencia-de-la-guerra-fria.html

Various authors (2023, March 10). *Declaración de Montevideo sobre Inteligencia Artificial y su impacto en América Latina*. https://www.fundacionsadosky.org.ar/declaracion-de-montevideo-sobre-inteligencia-artificial-y-su-impacto-en-america-latina/

Podcast T4-E06-Sebastián Ramírez-Contribuyendo al Opensource • Saturdays.AI. (n.d.). Retrieved June 12, 2023

[Podcast] https://saturdays.ai/2022/09/07/podcast-t4-e06-sebastian-ramirez-contribuyendo-al-opensource/

- Parr, T., & Howard, J. (2018).
*The Matrix Calculus You Need For Deep Learning* (arXiv:1802.01528). arXiv. [English] https://doi.org/10.48550/arXiv.1802.01528

- Kutyniok, G. (2022).
*The Mathematics of Artificial Intelligence* (arXiv:2203.08890). arXiv. [English] https://doi.org/10.48550/arXiv.2203.08890

- Werness, B., & Hu, R. (n.d.).
*22. Appendix: Mathematics for Deep Learning — Dive into Deep Learning 1.0.0-beta0 documentation*. [English] https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/index.html

- Berner, J., Grohs, P., Kutyniok, G., & Petersen, P. (2022).
*The Modern Mathematics of Deep Learning* (pp. 1–111). [English] http://arxiv.org/abs/2105.04026

- Porat, B. (2014). A Gentle Introduction to Tensors.
*Israel: Department of Electrical Engineering Technion, Israel Institute of Technology*. [English] https://www.ese.wustl.edu/~nehorai/Porat_A_Gentle_Introduction_to_Tensors_2014.pdf

*GPT-3: La supernova del modelado del lenguaje | Ivan Vladimir Meza Ruiz*. (2023). Personal blog. https://turing.iimas.unam.mx/~ivanvladimir/posts/chat-gpt/

### Programming in Python

## Other

https://github.com/ivanvladimir/Proyectos-MeIA/tree/main

# Explainability of Complex Machine Learning Models

```
graph TD
ExplanationMethods --> ExplainableByDesign --> ExplanationsForGlassBoxes
ExplainableByDesign --> EngineeredExplanations
ExplanationMethods --> PostHocExplanationsForBlackBoxModels
PostHocExplanationsForBlackBoxModels --> Local
Local --> Counterfactuals
Local --> FeatureImportance
FeatureImportance-->LIME
FeatureImportance-->SHAP
FeatureImportance-->DALEX
FeatureImportance-->NAM
FeatureImportance-->CIU
FeatureImportance-->GRADCAM
FeatureImportance-->IG
Local --> Prototypes
PostHocExplanationsForBlackBoxModels --> Global
Global--> Prototypes
Global-->SetOfLocalExplanations
Global-->ModelDistillation
```

Post-hoc local feature importance, rule-based, prototypes, counterfactuals

# Perception

# Robotics

# *Philosophical Foundations

# Weak AI: Can Machines Act Intelligently?

# Strong AI: Can Machines Really Think?

# The Ethics and Risks of Developing Artificial Intelligence

# AI: Present and Future

# Build your product with Artificial Intelligence

## Low level

Tensorflow

OpenCV

dlib

## Mid level

https://developers.google.com/mediapipe

## High level

`face_recognition`

https://github.com/steven2358/awesome-generative-ai

# TODO

independent and identically distributed (i.i.d.)