🧠

Artificial Intelligence

Image generated by ChatGPT4o

Requisites

compressive sensing (sparse coding),

information theory,

control theory,

economics,

logic,

operations research,

game theory,

and optimization.

Introduction

What is AI?

We struggle to define what "intelligent" means, not what "artificial" means.

https://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai/

We use rational agents as our approach. An agent is an entity that can perceive an environment X and can act on it where X can be virtual or physical. How an agent decides to act given all previous considerations is a black box.

So, intelligence or rationality is when an agent makes the best decision in a given environment and goal with constraints and acts in accordance with it. Therefore, we evaluate an agent by its results, not by its mental state.

If an agent makes the best decision in all environments and goals, it is called artificial general intelligence.

If an agent makes the best decision in a specific environment, it is called weak AI.

From a scientific point of view, the goal is to understand complete AI; from an engineering point of view, the goal is to build incomplete, imperfect, weak artificial intelligence.

Some constraints are lack of knowledge, time to learn, time to execute, money, actuators, and sensors.

But now the black boxes become white boxes. If you are here, you are interested in white boxes, which are computational procedures. You’re going to learn them!

The goal of AI
Agents. How can we create intelligence?
Tools. How can we use AI techniques to solve problems?

An intelligent agent

Perception
Robotics
Language
Knowledge
Reasoning
Learning

How can we make systems that behave as humans?
Are we there yet?
Today, machines do narrow tasks; humans do broad tasks.

AI agents:
achieving human-level intelligence

AI tools
Understanding and solving real problems.
Predicting poverty.
Self-driving cars.
Authentication.

Ecosystem

https://inside.com/ai

https://www.csail.mit.edu/

https://bair.berkeley.edu/

http://ai.ucsd.edu/

https://www.microsoft.com/en-us/research/collaboration/bair/

https://people.eecs.berkeley.edu/~yima/ by https://twitter.com/YiMaTweets

Story

Computing Machinery and Intelligence.

The Imitation Game (Turing test).

Player A is a computer who claims to be a man.

Player B is a man.

Interrogator.

Total Turing Test.

Loebner Prize.

The Argument from Extrasensory Perception. At the time, people were interested in extrasensory perception (viz., telepathy, clairvoyance, precognition, and psychokinesis), so Turing addressed it as a possible objection to the test.

https://ai-timeline.sanchezcarlosjr.com/

https://courses.cs.washington.edu/courses/csep590a/06au/projects/history-ai.pdf

https://plato.stanford.edu/entries/artificial-intelligence/

https://journals.sagepub.com/doi/pdf/10.1177/0008125619864925

https://dl.acm.org/doi/fullHtml/10.1145/2063176.2063177

https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/

Association for Computing Machinery (ACM). (2022, December 22). January 2023 CACM: The End of Programming. Youtube. Retrieved from https://www.youtube.com/watch?v=OnYJXm9NvyA&ab_channel=AssociationforComputingMachinery(ACM)

💡
“We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.” McCarthy, John, Marvin L. Minsky, Nathaniel Rochester, and Claude E. Shannon. "A proposal for the dartmouth summer research project on artificial intelligence, august 31, 1955." AI magazine 27, no. 4 (2006): 12-12.

Levels

https://www.youtube.com/watch?v=oM5LUGPO7Zk

State-based models: search problems, MDPs, games

Variable-based models: CSPs, Bayesian networks

Logic-based models: propositional logic, first-order logic

Related work

Cognitive science

Philosophy of mind

Worked examples

FAQ

I learned in the Theory of computation some problems are undecidable, but I see those problems solved with Artificial Intelligence, how is that?

AI Framework.

Intelligent Agents. Rational agents.

https://ai.facebook.com/blog/yann-lecun-advances-in-ai-research/

Agent class

👉🏼
Agent = Architecture + Program.
Artificial intelligence’s concern is the
program, an autonomous computational program.
graph TD
subgraph Architecture
  Information --> Program
  Program --> Decision
end
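
The decomposition "Agent = Architecture + Program" can be sketched as follows. This is only a minimal sketch under assumptions: the thermostat scenario, the names `Agent` and `thermostat_program`, and the temperature thresholds are hypothetical and do not come from these notes.

```python
from typing import Callable


class Agent:
    """Architecture: passes information (a percept) to the program and returns its decision."""

    def __init__(self, program: Callable[[float], str]):
        self.program = program

    def act(self, percept: float) -> str:
        return self.program(percept)


def thermostat_program(temperature: float) -> str:
    """Hypothetical agent program: a simple reflex rule on the current percept."""
    if temperature < 18.0:
        return "heat"
    if temperature > 24.0:
        return "cool"
    return "idle"


agent = Agent(thermostat_program)
decisions = [agent.act(t) for t in (15.0, 21.0, 30.0)]
print(decisions)
```

Swapping `thermostat_program` for another function changes the behavior without touching the architecture, which is the point of the split.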

Worked examples

Further readings

Economic agents.

Robotics.

Autonomic computing

Solving problems by Searching.

💡
Searching problems are optimization problems.
graph TD
  S["S, environment"] -->|"cost(S,action1, A)"| A["A, new environment"]
  S -->|"cost(S,action2, B)"| B["B, new environment"]
  S -->|"cost(S,action3, C)"| C["C, new environment"]
  subgraph possible_solutions3
    C-->E["..."]
  end
 subgraph possible_solutions2
    B-->F["..."]
  end
  subgraph possible_solutions1
    A-->D["..."]
  end
  D-->Goal["Goal, new environment == goal"]

Searching problem model

classDiagram
    class SearchProblem {
      heuristic()
      start_state()
      is_goal(state)
      expand(state)
      valid_actions_from(state)
      action_cost(state, action, next_state)
      next_state(state, action)
    }
    class State {
       distance_from_start_state
       previous_state
       environment
       build()
       relax()
       reconstruct_path()
    }
    class SearchingStrategy {
       findPlanFor(problem)
    }
    class Agent {
      searchingStrategy: SearchingStrategy
      problem: SearchProblem
      act(state)
    }
    Agent *-- SearchingStrategy
    Agent *-- SearchProblem
    SearchingStrategy <|-- BFS
    SearchingStrategy <|-- DFS
    SearchingStrategy <|-- AStar
    SearchingStrategy <|-- Dijkstra
    SearchingStrategy <|-- LinearProgramming

Searching strategies

Searching for solutions

Considerations.

We build a tree or graph on demand by searching strategies.

We must avoid cycles to prevent infinite loops.

Each new state stores a reference to its previous state, and we only choose valid actions, so when the search algorithm reaches the goal it can reconstruct the path from the start state.

$S \to A \to \dots \to Goal$
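
The considerations above (expanding the graph on demand, avoiding cycles, and keeping a reference to the previous state) can be sketched with breadth-first search. This is a minimal sketch, assuming a small hypothetical graph; the function names `bfs` and `reconstruct_path` are illustrative.

```python
from collections import deque


def reconstruct_path(previous, state):
    """Follow the previous-state references back to the start state."""
    path = []
    while state is not None:
        path.append(state)
        state = previous[state]
    return path[::-1]


def bfs(start, is_goal, neighbors):
    """Breadth-first search; returns a path from start to a goal, or None."""
    previous = {start: None}      # also acts as the visited set, so cycles are skipped
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            return reconstruct_path(previous, state)
        for nxt in neighbors(state):   # states are generated on demand
            if nxt not in previous:
                previous[nxt] = state
                frontier.append(nxt)
    return None


# Hypothetical graph with a cycle between S and A.
graph = {"S": ["A", "B"], "A": ["S", "C"], "B": ["C"], "C": ["Goal"], "Goal": []}
path = bfs("S", lambda s: s == "Goal", lambda s: graph[s])
print(path)
```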

Codification matters.

Uninformed search strategies

BFS

DFS

Informed Search Strategies

A*, Greedy Search, Hill Climbing, Simulated Annealing, Best-First Search

Heuristic Functions

A heuristic is a function that estimates the distance from the current state to the goal.

$f(n)$ is the real or estimated cost of a solution that passes through state $n$.

$g(n)$ is the cost to reach $n$ from the start state: $g(n) = \sum_{i=0}^{n-1} cost(i, action, i+1)$

$h(n)$ is the estimated cost to reach the goal state from state $n$, so it uses the available information from the problem or environment state in order to estimate the cost.

$h^*(n)$ is the real cost to reach the goal state from state $n$, where the upper limit $g$ is the index of the goal state along the path:

$h^*(n) = \sum_{i=n}^{g-1} cost(i, action, i+1)$

Note that $h(n) \to 0$ and $h^*(n) \to 0$ as $n$ approaches the goal.

Properties

Main idea: estimated heuristic $\le$ actual costs.

Admissibility.

$0 \le h(x) \le \text{cost to goal}$

Consistency.

$h(x) - h(y) \le cost(x, y)$

Dominance.

Optimal.

How do you find a heuristic function?

Relax your problems, use available information about the current state or the goal, use min, and max functions, or use some distance functions such as Manhattan distance, Euclidean distance, Hamming distance, … and norms.
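
As an illustration of a heuristic obtained from a distance function, here is a minimal A* sketch on a grid that orders the frontier by $g(n) + h(n)$ with the Manhattan distance as $h$. The grid, the unit step cost, and the names `a_star` and `manhattan` are assumptions made for this example.

```python
import heapq


def manhattan(a, b):
    """Manhattan distance: admissible and consistent for unit-cost 4-connected grids."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(grid, start, goal):
    """Return the cost of a cheapest path on the grid, or None if unreachable.
    Cells marked '#' are walls; every step costs 1."""
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}                                    # best known cost from the start
    frontier = [(manhattan(start, goal), start)]      # priority = g(n) + h(n)
    while frontier:
        f, (r, c) = heapq.heappop(frontier)
        if (r, c) == goal:
            return g[(r, c)]
        if f > g[(r, c)] + manhattan((r, c), goal):
            continue                                  # stale entry, a cheaper one was pushed later
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g[(r, c)] + 1
                if new_g < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = new_g
                    heapq.heappush(frontier, (new_g + manhattan((nr, nc), goal), (nr, nc)))
    return None


grid = ["..#.",
        "..#.",
        "...."]
cost = a_star(grid, (0, 0), (0, 3))
print(cost)
```

The heuristic never overestimates here: the Manhattan distance from the start is 3, while the wall forces a real cost of 7.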

Beyond Classical Search.

Heuristic. Safe steps.

Offline, Online.

Algorithms that solve problems by searching are graph algorithms: they generate new nodes using heuristics and safe steps, test them, and reject the wrong answers.

Hands-on Projects

8-Queen solver

Hanoi tower

Maze solver

Graph Theory Visualizer: Maze. (2022, July 03). Retrieved from https://graph-theory.sanchezcarlosjr.com

Project 0 - Unix, Python and Autograder Tutorial - CS 188: Introduction to Artificial Intelligence, Spring 2021. (2022, September 29). Retrieved from https://inst.eecs.berkeley.edu/~cs188/sp21/project0/#question-1-addition

The farmer, fox, goose, and grain

Integral solver

Pacman

Project 1 - Search - CS 188: Introduction to Artificial Intelligence, Spring 2021. (2022, September 29). Retrieved from https://inst.eecs.berkeley.edu/~cs188/fa22/projects/proj1/

Worked examples

References

https://aimacode.github.io/aima-javascript/3-Solving-Problems-By-Searching/

How to solve it: Modern Heuristics by Zbigniew Michalewicz, David B. Fogel.

Adversarial Search

Minimax

https://www.youtube.com/watch?v=l-hh51ncgDI&ab_channel=SebastianLague

Monte-Carlo

matchmaking algorithms

Hands-on Projects

Tic Tac Toe

Chess

Pacman v2

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases. (2023, April 20). Retrieved from https://www.chessengines.org

Project 2. (2022, October 13). Retrieved from https://inst.eecs.berkeley.edu/~cs188/fa22/projects/proj2

Variable-based models with Factor Graphs

Now we embark on our journey through variable-based models, in which we think in terms of variables, factors, and weights. In particular, in this section we explore factor graphs and their special cases: Constraint Satisfaction Problems (CSPs), Markov networks, and Bayesian networks. We no longer rely on searching through all possible solutions. Instead, we assign values to variables, which lets the algorithms work out details such as the variable ordering.

graph TD
    subgraph Variables
        X1((X1))
        X2((X2))
        X3((X3))
    end
    
    subgraph Factors
        f1[f1]
        f2[f2]
        f3[f3]
        f4[f4]
    end
    
    X1 --- f1
    X1 --- f2
    X2 --- f2
    X2 --- f3
    X3 --- f3
    X3 --- f4

Formal definition

Constraint satisfaction problems are defined by a set of variables $X_i$, each with a domain $D_i$ of possible values, and a set of constraints $C$. The aim is to find an assignment of values from the domains $D_i$ to the variables $X_i$ such that none of the constraints in $C$ is violated. Informally, our goal is to find the best assignment of values to the variables.

Variables-based models

A constraint satisfaction problem consists of three components X, D, and C:

X is a set of variables

D is a set of domains

C is a set of constraints that specify allowable combinations of values
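
The three components X, D, and C can be written down directly and solved with backtracking search. This is a minimal sketch, assuming a hypothetical three-region map-coloring problem with binary constraints; `backtrack` and `different` are illustrative names.

```python
def backtrack(variables, domains, constraints, assignment=None):
    """Return one assignment satisfying every binary constraint, or None."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        candidate = {**assignment, var: value}
        # Only check constraints whose two variables are already assigned.
        consistent = all(
            check(candidate[a], candidate[b])
            for (a, b), check in constraints.items()
            if a in candidate and b in candidate
        )
        if consistent:
            result = backtrack(variables, domains, constraints, candidate)
            if result is not None:
                return result
    return None


def different(a, b):
    return a != b


X = ["WA", "NT", "SA"]                                   # variables
D = {v: ["red", "green", "blue"] for v in X}             # domains
C = {("WA", "NT"): different,                            # constraints
     ("WA", "SA"): different,
     ("NT", "SA"): different}

solution = backtrack(X, D, C)
print(solution)
```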

Factors

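Each assignment $x = (x_1, \dots, x_n)$ has a weight $Weight(x)$, the product of all factors $f_j$ evaluated on that assignment.

Objective: $\arg\max_x Weight(x)$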
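
To make the objective concrete, the sketch below computes the weight of an assignment as the product of its factors and finds the maximum-weight assignment by brute force. The factor values are hypothetical; the factor scopes mirror the diagram above (f1 on X1, f2 on X1 and X2, f3 on X2 and X3, f4 on X3).

```python
from itertools import product

# Hypothetical factors over three binary variables.
factors = [
    lambda x: 2.0 if x["X1"] == 1 else 1.0,           # f1(X1)
    lambda x: 3.0 if x["X1"] == x["X2"] else 1.0,     # f2(X1, X2)
    lambda x: 3.0 if x["X2"] == x["X3"] else 1.0,     # f3(X2, X3)
    lambda x: 1.0 if x["X3"] == 1 else 0.5,           # f4(X3)
]


def weight(x):
    """Weight of an assignment: product of all factors evaluated on it."""
    w = 1.0
    for f in factors:
        w *= f(x)
    return w


names = ["X1", "X2", "X3"]
assignments = [dict(zip(names, values)) for values in product([0, 1], repeat=3)]
best = max(assignments, key=weight)
print(best, weight(best))
```

Brute force enumerates all $2^3$ assignments, which is only feasible for tiny problems; it is used here purely to show what the objective means.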

Linear programming problems are the best-known class of continuous-domain CSPs.

Message Passing

Exercises and Projects

Summary

Key decisions

FAQ

Reference Notes

https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models#bayesian-networks

https://gtsam.org/2020/06/01/factor-graphs.html

https://stanford-cs221.github.io/spring2024-extra/modules/csps/csps1.pdf

Knowledge, reasoning, and planning

Although traditional logical agents provide us expressiveness in a compact way, they are inherently deterministic and struggle to handle unstructured data. These systems follow predefined rules, making it difficult to manage uncertainty and ambiguity across diverse domains. Additionally, representing and processing unstructured data (e.g., text, images, time series, video) is challenging and often requires significant manual effort and expense. This lack of flexibility limits their ability to generalize across different domains as effectively as modern Deep Learning models.

Logical Agents

Knowledge-based agents

Different syntax, same semantics: $2+3 \iff 3+2$

Same syntax, different semantics: $3/2 \text{ (Python 2.7)} \not\iff 3/2 \text{ (Python 3)}$

A knowledge base is a set of sentences, each sentence is an assertion about the world given a representation language for a specific domain. Logic consists of syntax, semantics, and inference rules. The formulas by themselves are just symbols (syntax), they don’t provide meaning.

A knowledge-based agent is composed of a knowledge base which depends on domain-specific content and an inference mechanism. They can represent states, actions, and weights, incorporate new percepts, update internal representations of the world, and deduce hidden properties of the world.

Semantics is the interpretation function.

In a declarative approach to building a logical agent, we tell the agent what it needs to know by adding new sentences, and we ask it queries about what is known.

Natural Language?

We can save knowledge in different data models, such as knowledge graphs, and apply different inference mechanisms.

Entailment. It adds trivial information to KB.

Contradiction.

Contingency. It adds non-trivial information to KB.

http://intrologic.stanford.edu/dictionary/logical_entailment.html

Learning formulas.

A language needs syntax, semantics, and implementation level.

The syntax of a language defines a set of valid formulas.

Prolog, Relational databases, SQL, Datalog?

Intelligent agents need knowledge about the world to choose good actions.

A model or world $w$ in propositional logic is an assignment of truth values to propositional symbols.

Modeling and inference

Propositional logic with only Horn clauses

Propositional logic

Modal logic

First-order logic

Second-order logic

Tell[f] → KB

Possible responses:

Ask[f] → KB

Possible responses:

A knowledge base KB is satisfiable if $M(KB) \neq \emptyset$, that is, if at least one model makes every sentence in KB true.
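
Model checking makes these definitions concrete for propositional logic: enumerate every world, keep those where the formula holds, and test satisfiability (a non-empty set of models) and entailment. This is a minimal sketch by enumeration; the symbols `Rain` and `Wet` and the function names are assumptions for the example.

```python
from itertools import product


def models(formula, symbols):
    """M(f): all worlds (truth assignments over symbols) in which formula is true."""
    worlds = (dict(zip(symbols, values))
              for values in product([False, True], repeat=len(symbols)))
    return [w for w in worlds if formula(w)]


def satisfiable(formula, symbols):
    """A formula is satisfiable when its set of models is not empty."""
    return len(models(formula, symbols)) > 0


def entails(kb, query, symbols):
    """KB entails query when query is true in every model of KB."""
    return all(query(w) for w in models(kb, symbols))


symbols = ["Rain", "Wet"]


def kb(w):
    # Rain, and Rain -> Wet
    return w["Rain"] and ((not w["Rain"]) or w["Wet"])


def contradiction(w):
    return w["Rain"] and not w["Rain"]


print(satisfiable(kb, symbols),
      satisfiable(contradiction, symbols),
      entails(kb, lambda w: w["Wet"], symbols))
```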

Execution engine.

Knowledge base (domain-specific facts) + inference engine.

Syntax, set of possible worlds, truth condition.

Sound Algorithm.

Complete Algorithm.

Theorem-proving.

Model-checking.

https://www.youtube.com/watch?v=xL0kNw5TudI&t=192s

https://www.youtube.com/watch?v=oM5LUGPO7Zk&list=PLh7QmcIRQB-uiOS4GMlBbq0jkvtqhqtq0

https://www.youtube.com/watch?v=CAsq7hm3sbI&ab_channel=IITDelhiJuly2018

https://www.youtube.com/watch?v=xFpndTg7ZqA&t=1s&ab_channel=IITDelhiJuly2018

https://www.youtube.com/watch?v=h6zCkrZ8ehE&t=1s&ab_channel=RichNeapolitan

tammet. (2022, December 13). gkc. Retrieved from https://github.com/tammet/gkc

Program

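class KnowledgeAgent:
    """Generic knowledge-based agent."""

    def __init__(self, KB):
        self.KB = KB  # knowledge base
        self.t = 0    # time step counter

    def act(self, environment):
        # tell, ask and the Make* helpers are domain-specific
        # and are not defined in these notes.
        tell(self.KB, MakePerceptSentence(environment, self.t))
        action = ask(self.KB, MakeActionQuery(self.t))
        tell(self.KB, MakeActionSentence(action, self.t))
        self.t = self.t + 1
        return action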

Inference machine

First-Order Logic

Inference in First-Order Logic

Worked examples

Knowledge graph

Prolog

RDF

Datalog

SQL and open cypher (Apache Age)

https://opencypher.org/

Projects

Card fraud detector

Make an online quiz system about Artificial Intelligence

8-queens

Pacman Finder

https://inst.eecs.berkeley.edu/~cs188/sp21/project3/

Wordle Solver

https://swi-prolog.discourse.group/t/wordle-solver/5124

https://cheatle.occasionallycogent.com/

Resources

You may also consider picking up some of the following books

  • Clocksin - Mellish: Programming in Prolog

FAQ

What are real-world projects where people use PROLOG?

https://www.quora.com/What-is-Prolog-used-for-today

https://www.cs.nmsu.edu/ALP/2011/03/natural-language-processing-with-prolog-in-the-ibm-watson-system/

https://www.drdobbs.com/parallel/the-practical-application-of-prolog/184405220

Classical Planning

Planning and Acting in the Real World

Knowledge representation

Uncertain knowledge and reasoning

Quantifying Uncertainty

Probabilistic Reasoning

Probabilistic Reasoning over Time

Making Simple Decisions

Making Complex Decisions

Machine Learning

Machine Learning is the "field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959). Traditional programming and classic artificial intelligence involve writing rules that act on data to produce answers. But if you flip this approach, you get machine learning. In this case, we gather a large amount of data and answers, apply a learning algorithm, and as an output, we acquire rules or models. These models can then make predictions without being specifically programmed to perform the task.

Machine Learning.


The main driver of recent successes in AI. Move from “code” to “data” to manage information complexity. The goal is generalization.

Reflex-based models. Linear classifiers, deep neural networks.
Modeling. Simplify the real world into a well-defined mathematical model. Example: planning a route from A to B in a city.
Inference. Develop algorithms that answer questions given the model.
Learning. Start from a model without parameter values and use data to learn those parameters by applying an algorithm.

We classify machine learning into supervised learning, unsupervised learning, and reinforcement learning. The table below gives an overview of learning algorithms.

Created by https://solclover.com with Plotly
| Learning algorithm | When to use | Relevant metrics |
|---|---|---|
| Linear Regression | When there's a linear relationship between the input and output. | Mean Squared Error (MSE), R-squared, Adjusted R-squared |
| Logistic Regression | For binary classification problems. | Accuracy, Precision, Recall, AUC-ROC |
| Decision Trees | When there's a need to understand the decision-making process. Useful for both classification and regression. | Gini Index, Information Gain for model construction; Accuracy, Precision, Recall for evaluation |
| Random Forest | When model interpretability is less important and you need higher performance. | Out-of-bag (OOB) error, Accuracy, Precision, Recall |
| K-Nearest Neighbors | When instances of the same class are generally close to each other in the feature space. | Accuracy, Precision, Recall, F1 Score |
| Support Vector Machines | When there's a clear margin of separation between classes. | Accuracy, Precision, Recall, F1 Score |
| Neural Networks | For complex problems like image recognition, speech recognition, and natural language processing. | Depends on the task, but often includes Accuracy, Precision, Recall, AUC-ROC, and loss metrics like Cross-Entropy Loss |
| XGBoost | Best for heterogeneous structured datasets. | |

Ensam

1.1 Introduction
1.2 Applications
1.3 Main approaches to machine learning
1.4 Machine learning paradigms
1.5 Basic concepts
1.6 Fundamental problems
1.7 Evaluation of learned models

2.1 Introduction
2.2 Historical development of the paradigm
2.3 Decision trees
2.4 Induction rules
2.5 Applications
2.6 Selected topics

3.1 Introduction
3.2 Historical development of the paradigm
3.3 Genetic algorithms
3.4 Genetic programming
3.5 Applications
3.6 Bio-inspired algorithms

4.1 Introduction
4.2 Historical development of the paradigm
4.3 Bayes' theorem
4.4 Naive Bayes
4.5 Applications
4.6 Probabilistic graphical models

5.1 Introduction
5.2 Historical development of the paradigm
5.3 Artificial Neural Networks (ANN)
5.4 Backpropagation algorithm
5.5 Applications
5.6 Review of ANN architectures

6.1 Introduction
6.2 Historical development of the paradigm
6.3 K-nearest neighbors
6.4 Support vector machines
6.5 Applications
6.6 Selected topics

https://github.com/afshinea/stanford-cs-229-machine-learning/tree/master/en

https://course.fast.ai/

https://www.fast.ai/

https://realpython.com/python-ai-neural-network/

https://huggingface.co/

Books

  1. François Fleuret’s Homepage. (n.d.). Retrieved June 16, 2023, from https://fleuret.org/francois/#lbdl

Notes

ICML. (n.d.). International Conference on Machine Learning. https://icml.cc/


Domingos, P. (2017). The Master Algorithm. The MIT Press.

GECCO. (2022). The Genetic and Evolutionary Computation Conference. https://gecco-2022.sigevo.org/HomePage


Mitchell, T. (1997). Machine Learning. McGraw Hill.


Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. The MIT Press.


NeurIPS (2021). Conference on Neural Information Processing Systems.
https://nips.cc/

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.


CVF. (n.d.). Computer Vision Foundation.
http://openaccess.thecvf.com/menu.py

Nunes, L. (2006). Fundamentals of Natural Computing: Basic Concepts, Algorithms, and Applications. Chapman & Hall/CRC. [Classic].

Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach. Pearson. [Classic].

Sucar, L. E. (2015). Probabilistic Graphical Models: Principles and Applications. Springer. [Classic].

Tan, P., Steinbach, M., Karpatne, A. & Kumar, V. (2018). Introduction to Data Mining (2nd ed.). Pearson.

  1. Clasificación con Árboles de Decisión: el algoritmo CART | Codificando Bits. (n.d.). Retrieved June 16, 2023, from https://www.codificandobits.com/blog/clasificacion-arboles-decision-algoritmo-cart/
  1. Sanz, F. (2020, November 30). Cómo funciona el algoritmo XGBoost en Python. The Machine Learners. https://www.themachinelearners.com/xgboost-python/
  1. Graff, M. (2022). Aprendizaje Computacional. https://ingeotec.github.io/AprendizajeComputacional/

Learning from Examples

When you have a dataset with features ($X$) and labels ($Y$), supervised learning means finding the mapping from $X$ to $Y$.

What makes this an interesting algorithm? It learns from examples: you have a training set, you model the task (for example, as linear regression), and the algorithm outputs a hypothesis, which is a model that maps input features to the target. The learner is an optimization algorithm that needs an optimization problem, so our task splits into finding the right optimization model and then employing the right optimization algorithm.

The optimization problem relies on $\min_w Loss(x, y, w)$.

Loss minimization tasks. min TrainLoss(w)

The score is how confident we are.
The margin is how correct we are.

You might ask how can you get the training set, how can you deploy the hypothesis, and how can you know what method to apply, the answers are in.

Development cycle

Split data into train, val, test
Exploratory data analysis
Repeat:
 - Implement feature/tune hyperparameters
 - Run learning algorithm
 - Sanity check train and val error rates
 - Look at errors to brainstorm improvements
 - Log as much as you can (reports)
Run on test set to get final error rates
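
The first step of the cycle, splitting the data, can be sketched as below. This is a minimal sketch, assuming a 60/20/20 split and a fixed shuffle seed; the fractions and the name `split` are illustrative choices, not prescriptions from these notes.

```python
import random


def split(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut the list into train, validation, and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split(list(range(100)))
print(len(train), len(val), len(test))
```

The test part is set aside and only used once at the end, so that the final error rate is not influenced by the tuning loop.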

Most of the time, the test metric does not decrease.

Optimization.

Discrete optimization: find the best discrete object.

min Cost(p)
p in Paths

Algorithmic tool: dynamic programming

Continuous optimization: find the best vector of real numbers.
min TrainingError(w)
w in R^d

Algorithmic tool: gradient descent



Stanford CS221: Artificial Intelligence: Principles and techniques

Ground truth.
It refers to the expected label associated with a dataset.

graph TD
  TrainingSet --> LearningAlgorithm
  LearningAlgorithm -->  Hypothesis

Given features from $\chi$, the hypothesis $h$ is a predictor of the target $y$. $\chi$ denotes the space of input values and $y$ the space of output values. So the goal of supervised learning is to find a good predictor $h: \chi \to y$.


class Model
  fit(trainingset)
     apply a learning algorithm to training examples
     generates a hypothesis
  predict(instances)
      apply the hypothesis to instances

We call it a regression problem when $y$ is continuous and a classification problem when $y$ is discrete.

Suppose you have a linear regression problem. You may represent $h$ as $h(x)=mx+b$, an affine function. More generally, $h(\bold{x})=\bold{\theta} \bold{x}^T$ with $\bold{\theta}=[\theta_0,\theta_1,\theta_2,...,\theta_n]$ and $\bold{x}=[x_0,x_1,...,x_{n-1},1]$, where $\bold{\theta}$ are the “parameters” that allow us to make good predictions and the constant 1 makes $\theta_n$ play the role of the intercept $b$.

Regression
Binary classification

Loss functions

A. Chadha, V. Jain, Distilled Notes for Machine Learning , https://www.vinija.ai, 2022, Accessed: July 1 2022.

Distance metrics

Computing edit distance

Input: two strings, s and t
Output: minimum number of character insertions, deletions, and substitutions between s and t.

Example:

s: a cat
t: the cats!

General principles: reduce the problem to smaller subproblems and abstract away details.
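
The edit distance problem above can be reduced to subproblems on prefixes of the two strings. The sketch below is one possible recursive formulation with memoization; the name `edit_distance` and the helper `recurse` are illustrative.

```python
from functools import lru_cache


def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""

    @lru_cache(maxsize=None)
    def recurse(m: int, n: int) -> int:
        # Distance between the first m characters of s and the first n of t.
        if m == 0:
            return n                          # insert the remaining n characters
        if n == 0:
            return m                          # delete the remaining m characters
        if s[m - 1] == t[n - 1]:
            return recurse(m - 1, n - 1)      # last characters match, no edit needed
        return 1 + min(recurse(m - 1, n - 1),   # substitute
                       recurse(m - 1, n),       # delete from s
                       recurse(m, n - 1))       # insert into s

    return recurse(len(s), len(t))


print(edit_distance("a cat", "the cats!"))
```

For the example strings, one cheapest sequence of edits substitutes "a" with "e" and inserts "t", "h", "s", and "!", for a total of 5.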

Linear prediction

Linear Regression


$\min_w F(w) = \sum_{i=1}^n (w x_i - y_i)^2$

Input: a set of pairs $(x_i, y_i)$.
Output: $w \in \mathbb{R}$ that minimizes the squared error $F(w) = \sum_{i=1}^n (w x_i - y_i)^2$.

Algorithm Gradient Descent.
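
For the one-dimensional squared error $F(w) = \sum_i (w x_i - y_i)^2$, the derivative is $F'(w) = \sum_i 2 (w x_i - y_i) x_i$, and gradient descent repeatedly steps against it. The following is a minimal sketch; the data points, step size, and iteration count are assumptions chosen so that the example converges.

```python
def gradient_descent(points, step_size=0.01, iterations=500):
    """Minimize F(w) = sum((w * x - y) ** 2) over a single real weight w."""
    w = 0.0
    for _ in range(iterations):
        gradient = sum(2 * (w * x - y) * x for x, y in points)
        w -= step_size * gradient        # move against the gradient
    return w


points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # hypothetical data generated by y = 2x
w = gradient_descent(points)
print(w)
```

A step size that is too large makes the iterates diverge, so in practice it has to be tuned for the data at hand.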

Linear prediction.
Score: a weighted combination of features.
Weight vector w.

$w \cdot \phi(x)$

$f_p$

Binary linear classifier

Decision boundary

Separate the space into different subspaces to classify.

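$f_w(x) = \text{sign}(w \cdot \phi(x))$

$Loss_{hinge}(x, y, w) = \max(1 - (w \cdot \phi(x))\,y,\ 0)$
Case analysis:
$\nabla_w Loss_{hinge}(x, y, w) = 0$ if $(w \cdot \phi(x))\,y > 1$
$\nabla_w Loss_{hinge}(x, y, w) = -\phi(x)\,y$ if $(w \cdot \phi(x))\,y < 1$
A gradient step therefore increases the margin of examples that are misclassified or too close to the decision boundary.

$Loss_{logistic}(x, y, w) = \log(1 + e^{-(w \cdot \phi(x))\,y})$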
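
The score, margin, and the two losses can be computed directly. This is a minimal sketch in plain Python, assuming labels $y \in \{-1, +1\}$, the margin defined as the score $w \cdot \phi(x)$ times $y$, and hypothetical weight and feature vectors.

```python
import math


def dot(w, phi):
    return sum(wi * pi for wi, pi in zip(w, phi))


def margin(w, phi, y):
    """Score times label: positive when the prediction is correct."""
    return dot(w, phi) * y


def hinge_loss(w, phi, y):
    return max(1.0 - margin(w, phi, y), 0.0)


def hinge_gradient(w, phi, y):
    """Gradient with respect to w: -phi * y when the margin is below 1, else zero."""
    if margin(w, phi, y) < 1.0:
        return [-p * y for p in phi]
    return [0.0 for _ in phi]


def logistic_loss(w, phi, y):
    return math.log(1.0 + math.exp(-margin(w, phi, y)))


w = [1.0, -1.0]       # hypothetical weights
phi = [2.0, 0.5]      # hypothetical feature vector, score = 1.5
print(hinge_loss(w, phi, +1), hinge_loss(w, phi, -1), hinge_gradient(w, phi, -1))
```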

Gradient descent and Stochastic gradient descent

The gradient $\nabla f$ gives the direction in which $f$ increases fastest at a point, and its magnitude gives the rate of increase.
The goal is to move in the opposite direction of the gradient.
Least squares regression.
Objective function:

$TrainLoss(w) = \dfrac{1}{|D_{train}|} \sum_{(x,y)\in D_{train}} (w \cdot \phi(x) - y)^2$
Gradient:
$\nabla_w TrainLoss(w) = \dfrac{1}{|D_{train}|} \sum_{(x,y)\in D_{train}} 2\,(w \cdot \phi(x) - y)\,\phi(x)$

SGD
For each (x, y) in D_train:
w ← w - step_size · ∇_w Loss(x, y, w)
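
The SGD loop above updates the weight after every single example instead of after a full pass over the data. Here is a minimal sketch with the squared loss on one real weight; the data, step size, and number of epochs are assumptions for the example.

```python
def squared_loss_gradient(x, y, w):
    """Derivative of (w * x - y) ** 2 with respect to w."""
    return 2 * (w * x - y) * x


def sgd(train, loss_gradient, step_size=0.05, epochs=200):
    """Stochastic gradient descent: one update per training example."""
    w = 0.0
    for _ in range(epochs):
        for x, y in train:
            w -= step_size * loss_gradient(x, y, w)
    return w


train = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # hypothetical data generated by y = 3x
w = sgd(train, squared_loss_gradient)
print(w)
```

Each update is cheap because it looks at one example only, which is why many passes with small steps are affordable.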

It’s about quality.

SGD can be worse than GD if the dataset has noise.

Step size

Strategies
Constant
Decreasing

Maximum likelihood estimation

Maximum likelihood estimation is a common objective when training classifiers: we find the parameters $\theta$ that maximize the probability of the observed data. $p(y=1 \mid x;\theta)$ is the conditional probability that the output is the class $y=1$ given the input variables $x$ and the model parameters $\theta$.
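
As a small illustration, the sketch below evaluates the log-likelihood of a one-parameter logistic model, $p(y=1 \mid x;\theta) = \sigma(\theta x)$, and picks the best $\theta$ from a grid. The model form, the data, and the grid search are assumptions made for this example; real training would normally use gradient-based optimization.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(theta, data):
    """Sum of log p(y | x; theta) for labels y in {0, 1}, assuming i.i.d. examples."""
    total = 0.0
    for x, y in data:
        p = sigmoid(theta * x)                  # p(y = 1 | x; theta)
        total += math.log(p if y == 1 else 1.0 - p)
    return total


# Hypothetical, deliberately non-separable data so that the maximum is finite.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]
candidates = [i / 10 for i in range(-30, 31)]
theta_hat = max(candidates, key=lambda th: log_likelihood(th, data))
print(theta_hat, log_likelihood(theta_hat, data))
```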

Independent and identically distributed (i.i.d.)

Types of problem

Pattern recognition

Recommender systems

Types of data

  1. Vector Data: This is the most common and simplest form of data in machine learning. The dataset is a 2D tensor where each data point can be encoded as a vector. Examples can be anything from housing price prediction data (features being the number of rooms, location, size of the house, etc.) to text data (after applying some sort of vectorization like bag-of-words or TF-IDF).
  1. Natural language.
  1. Time series or Sequence Data: Time series data captures a series of data points recorded over regular time intervals. The order of data points is important here because the same set of data points in a different time order might mean something entirely different. Sequence data is very similar, but time isn't necessarily a factor here. Examples of these are stock price data, weather data, or any type of data where time plays a crucial role. For sequence data, a sentence or a DNA sequence would be a good example as the order of words or genes is important.
  1. Image Data: Images are represented as 3D tensors (height, width, color_depth). However, a batch of images used for training a model is stored in a 4D tensor (batch_size, height, width, color_depth). Deep learning models like Convolutional Neural Networks (CNNs) are designed to extract features from these 4D tensors and use them to classify images, detect objects, and more. Applications range from medical imaging (detecting diseases) to self-driving cars (identifying pedestrians, signs, etc.).
  1. Video Data: Video data can be thought of as a series of images, so naturally, this extends the image data tensor by one more dimension, the frame dimension. So a video dataset would be a 5D tensor (batch_size, frames, height, width, color_depth). Video data is used in various applications like activity recognition, video synthesis, and object tracking in videos.
  1. Graph data.

Data labeling

https://github.com/HumanSignal/awesome-data-labeling

Fine-tuning

What is Transfer Learning?

Datasets

Notebooks

https://www.youtube.com/watch?v=T-fAkfU9j_o&ab_channel=Elpensamientoenllamas

Natural Computing

NACO

https://fcampelo.github.io/EC-Bestiary/

black hole algorithm

Mandelbrot set from scratch, Markov text-generation, and John Conway’s Game of Life ar

Pattern Recognition

  1. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. Available at: Microsoft Research
  1. Duda, R. O., Stork, D. G., & Hart, P. E. (2001). Pattern Classification (2nd ed.). Wiley.
  1. Fu, K. (1974). Syntactic Methods in Pattern Recognition. Academic Press.
  1. Jürgen, M. & Matthias, N. (2018). Pattern Recognition: Introduction, Features, Classifiers and Principles. Berlin: De Gruyter Oldenbourg (De Gruyter Graduate). Available at: EBSCOhost.
  1. Koutroumbas, K. & Theodoridis, S. (2009). Pattern Recognition. Academic Press. Available at: EBSCOhost
  1. Murty, M. N. & Devi, V. S. (2011). Pattern Recognition: An Algorithmic Approach. Springer London. Available at: EBSCOhost
  1. Massachusetts Institute of technology. (n.d.). Mitopencourseware.
    Pattern Recognition and Analysis.
    https://ocw.mit.edu/courses/media-arts-and-sciences/mas-622j-pattern-recognition-and-analysis-fall-2006/syllabus/

Deep Learning

https://people.idsia.ch/~juergen/deep-learning-history.html

François Chollet - Deep Learning with Python-Manning Publications (2021)

Tensor

Automatic differentiation

The purpose of the activation function is to introduce nonlinearities into the model. Artificial neural networks are inspired by biological neural networks.

What is a gradient? Optimization method.

The chain rule lets us pass gradients from the output of a neuron back to its input.
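
The chain rule can be traced by hand on a single neuron. This is a minimal sketch, assuming one input, a sigmoid activation, and a squared-error loss; all names and numeric values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x, y):
    z = w * x + b              # pre-activation
    a = sigmoid(z)             # activation (the nonlinearity)
    loss = (a - y) ** 2        # squared error
    return z, a, loss


def backward(w, b, x, y):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw, and likewise for b."""
    _, a, _ = forward(w, b, x, y)
    dloss_da = 2 * (a - y)
    da_dz = a * (1 - a)        # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz
    return dloss_dz * x, dloss_dz   # (dL/dw, dL/db)


w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backward(w, b, x, y)
print(grad_w, grad_b)
```

A finite-difference check, perturbing `w` slightly and recomputing the loss, is a common way to confirm that a hand-derived gradient is right.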

Activation

Forward propagation

back propagation

https://playground.tensorflow.org

https://realpython.com/python-ai-neural-network/

Neural Networks and Deep Learning http://neuralnetworksanddeeplearning.com/

Batch

Input

Activations

Weights

Output

Optimizers

Optimizers. Credits.

Hyperparameters

In a Keras model, hyperparameters such as optimizer, loss, and metrics have crucial roles in defining how the model will be trained and evaluated. Let's discuss each of these and the context in which they should be used:

  1. Optimizer: Optimizers in Keras help to adjust the attributes of your neural network such as weights and learning rate to reduce the losses. Different optimizers suit different problems and can significantly affect the model's performance and convergence speed.
    • SGD: Stochastic Gradient Descent, which is the most basic optimizer. It's robust but can be slow and sensitive to the learning rate choice.
    • RMSprop: Usually a good choice for recurrent neural networks.
    • Adam: A good default choice for many problems, it combines the advantages of RMSprop and SGD with momentum.
    • Adagrad, Adadelta, Adamax, Nadam: Other variants of optimizers, each with its strengths, but in most cases, Adam should suffice.
  1. Loss: Loss function or cost function is a method to calculate the disparity between the predicted output and the actual output. This is the function that the model will strive to minimize.
    • Mean Squared Error (MSE): Used for regression problems (predicting a continuous value).
    • Binary Cross-Entropy: Used for binary classification problems (predicting a yes/no outcome).
    • Categorical Cross-Entropy: Used for multi-class classification problems, where the outputs are one-hot encoded.
    • Sparse Categorical Cross-Entropy: Like categorical cross-entropy, but for integer targets.
    • Weighted cross-entropy loss: Used for unbalanced multi-class classification problems, where the outputs are one-hot encoded.
  1. Metrics: Metrics are used to judge the performance of your model. Choosing the right metric is essential to judge your model accurately.
    • Accuracy: Suitable for classification problems, especially if the classes are balanced.
    • Precision, Recall, F1-score: These are more informative than accuracy for binary classification, especially if the classes are imbalanced.
    • MSE, RMSE, MAE (Mean Absolute Error): Suitable for regression problems.
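To make the losses and metrics above concrete, here is a minimal pure-Python sketch of several of them. The function names are assumptions made for this example; in Keras you would normally pass the built-in losses and metrics to `model.compile` instead of writing them by hand.

```python
import math


def mse(y_true, y_pred):
    # Mean squared error for regression targets.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # y_true holds 0/1 labels, p_pred holds predicted probabilities of class 1.
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    # Targets are one-hot rows; probs are rows of class probabilities.
    total = 0.0
    for target_row, prob_row in zip(one_hot, probs):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(target_row, prob_row))
    return total / len(one_hot)


def sparse_categorical_cross_entropy(labels, probs, eps=1e-12):
    # Same loss, but targets are integer class indices instead of one-hot rows.
    return sum(-math.log(max(row[k], eps)) for k, row in zip(labels, probs)) / len(labels)


def precision_recall_f1(y_true, y_pred):
    # Binary classification metrics computed from hard 0/1 predictions.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The categorical and sparse categorical versions return the same value when the one-hot rows and the integer labels describe the same classes; only the target encoding differs.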

Other hyperparameters include:

Choosing the right hyperparameters often involves trial and error and can be guided by experience, knowledge about the problem and the data, or hyperparameter tuning techniques such as grid search or random search.
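Grid search and random search can be sketched in a few lines. In the example below, `train_and_score` is a hypothetical stand-in for "train a model with this learning rate and return its validation loss"; it simply runs gradient descent on the toy loss (w - 3)^2. The search ranges and trial counts are also assumptions.

```python
import math
import random


def train_and_score(lr, steps=50):
    # Hypothetical stand-in for training: gradient descent on (w - 3)^2,
    # returning the final loss as the "validation" score (lower is better).
    w = 0.0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)
    return (w - 3.0) ** 2


def grid_search(candidates):
    # Try every candidate value and keep the one with the lowest score.
    return min(candidates, key=train_and_score)


def random_search(n_trials, low=1e-4, high=1.0, seed=0):
    # Sample learning rates log-uniformly in [low, high] and keep the best.
    rng = random.Random(seed)
    best_lr, best_loss = None, float("inf")
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(math.log10(low), math.log10(high))
        loss = train_and_score(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss


if __name__ == "__main__":
    print("grid:", grid_search([0.001, 0.01, 0.1]))
    print("random:", random_search(20))
```

Sampling on a log scale is a common choice for learning rates because useful values often span several orders of magnitude.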

Convolutional neural networks

locality structure

Recurrent neural networks

  1. Neural networks are universal approximators for continuous functions: mathematical functions
  1. Recurrent neural networks are equivalent to Turing machines: algorithms
  1. The backpropagation algorithm searches for a network configuration (a set of weights) that mimics the behavior of the data

Leshno, M., Lin, V. Y., Pinkus, A., & Schocken, S. (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6), 861–867.
Siegelmann, H. T., & Sontag, E. D. (1992, July). On the computational power of neural nets. In Proceedings of the fifth annual workshop on Computational learning theory (pp. 440-449).
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
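As a small illustration of backpropagation, the sketch below computes the gradients of a one-hidden-layer network by the chain rule and compares them against a finite-difference estimate. The architecture (tanh hidden units, one linear output, squared-error loss), the names, and the learning rate are assumptions made for this example.

```python
import math
import random


def init_params(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return [W1, b1, W2, 0.0]


def forward(params, x):
    W1, b1, W2, b2 = params
    # Hidden activations h = tanh(W1 x + b1), output y = W2 . h + b2.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sum(w * hi for w, hi in zip(W2, h)) + b2
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backward(params, x, target):
    # Chain rule, applied from the output back to the input layer.
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    dy = y - target                                         # dL/dy
    dW2 = [dy * hi for hi in h]                             # dL/dW2
    db2 = dy                                                # dL/db2
    dz = [dy * w * (1 - hi * hi) for w, hi in zip(W2, h)]   # through tanh
    dW1 = [[d * xi for xi in x] for d in dz]                # dL/dW1
    return dW1, dz, dW2, db2                                # dL/db1 equals dz


def numerical_grad_W2(params, x, target, j, eps=1e-5):
    # Central finite difference on W2[j], used only to check backward().
    W1, b1, W2, b2 = params
    plus = W2[:j] + [W2[j] + eps] + W2[j + 1:]
    minus = W2[:j] + [W2[j] - eps] + W2[j + 1:]
    return (loss([W1, b1, plus, b2], x, target)
            - loss([W1, b1, minus, b2], x, target)) / (2 * eps)


def sgd_step(params, x, target, lr=0.05):
    # One gradient-descent update using the backpropagated gradients.
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = backward(params, x, target)
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, dW1)]
    b1 = [b - lr * g for b, g in zip(b1, db1)]
    W2 = [w - lr * g for w, g in zip(W2, dW2)]
    return [W1, b1, W2, b2 - lr * db2]
```

Repeating `sgd_step` on the training examples is what moves the network toward a configuration that reproduces the data.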

[In English] Python AI: How to Build a Neural Network & Make Predictions

https://realpython.com/python-ai-neural-network/

[In Spanish, translated by Google] Python AI: Cómo construir una red neuronal y hacer predicciones https://realpython-com.translate.goog/python-ai-neural-network/?_x_tr_sl=en&_x_tr_tl=es&_x_tr_hl=en-US&_x_tr_pto=wapp

Sites

  1. Lewis, O. (2023). Awesome Artificial Intelligence (AI).

    [English] https://github.com/owainlewis/awesome-artificial-intelligence

Books

  1. Russell, S., & Norvig, P. (2021). Artificial Intelligence: A Modern Approach, 4th US ed. [English, official site] https://aima.cs.berkeley.edu/
  1. Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2020). Dive into Deep Learning. [English] https://d2l.ai/

Courses

  1. DEEP LEARNING · Deep Learning. (n.d.). https://atcold.github.io/NYU-DLSP21/

Videos

  1. Irving Vasquez. (2022, August 23). Introducción a las redes neuronales - Presentación del curso. [Spanish]

Understanding LSTM Networks:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Recurrent Neural Networks and LSTM explained:

https://purnasaigudikandula.medium.com/recurrent-neural-networks-and-lstm-explained-7f51c7f6bbb9

Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM): https://www.youtube.com/watch?v=WCUNPb-5EYI

https://homes.cs.washington.edu/~pedrod/

Tensorflow

Tensorflow APIS for training

TensorFlow offers tf.data for the input pipeline, plus Keras and Estimator for building and training models. Distributed batch processing is done with tf.distribute.

Why an input pipeline? Because the data might not fit in memory, because it lets you use the hardware efficiently, and because it decouples loading from preprocessing. ETL (extract, transform, load) names the typical stages of batch processing: the extract stage reads data from memory or remote storage and parses the file format; the transform stage performs domain-specific transformations; the load stage transfers the data to the accelerator.

There is a large gap between GPU/TPU processing power and CPU processing power, so a slow CPU-side input pipeline can leave the accelerator waiting for data.

A typical batch-processing job for deep learning looks like this:

dataset = read data from storage as a stream

dataset = apply distributed pipe operators to dataset (executed as a dataflow graph)
model = build the architecture of the model with high-level APIs
model.fit(dataset)
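The extract, transform, and load stages can be imitated with plain Python generators, which stream records instead of holding the whole dataset in memory. This is only a sketch: the record format (`"pixel,label"` strings), the scaling by 255, and the function names are assumptions, and it does not use TensorFlow. In tf.data the analogous steps are expressed with `Dataset` methods such as `map` and `batch`.

```python
from itertools import islice


def extract(records):
    # Extract: stream raw records one at a time (here, CSV-like strings).
    for line in records:
        yield line


def transform(stream):
    # Transform: parse each record and apply a domain-specific step
    # (scaling a pixel value to [0, 1] is an assumed example).
    for line in stream:
        pixel, label = line.split(",")
        yield float(pixel) / 255.0, int(label)


def batch(stream, batch_size):
    # Load: group examples into batches, as they would be fed to the model.
    it = iter(stream)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    raw = ["255,1", "0,0", "51,1"]
    for b in batch(transform(extract(raw)), batch_size=2):
        print(b)
```

Because every stage is lazy, a record is only read and parsed when the consumer asks for the next batch.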

The main optimizations are software pipelining, parallel transformation, and parallel extraction.
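Two of these optimizations can be sketched with the Python standard library. `parallel_map` applies a transformation to several items at once (the same idea covers parallel extraction if the items are file shards), and `prefetch` is a simple form of software pipelining: a background thread prepares the next items while the consumer processes the current one. Both helper names, the worker count, and the buffer size are assumptions; tf.data provides its own implementations, for example `Dataset.prefetch`.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_map(fn, items, workers=4):
    # Parallel transformation: run fn on several items concurrently.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def prefetch(stream, buffer_size=2):
    # Software pipelining: a producer thread fills a bounded queue
    # while the consumer is busy with earlier items.
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for item in stream:
            q.put(item)
        q.put(sentinel)  # signal the end of the stream

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item


if __name__ == "__main__":
    squares = parallel_map(lambda x: x * x, range(5))
    print(list(prefetch(iter(squares))))
```

This sketch assumes the consumer reads the stream to the end; if it stops early, the producer thread stays blocked on the full queue until the program exits.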

TensorFlow Datasets is a project that helps onboard new users by providing ready-to-use datasets.

Unsupervised Learning - Clustering

Data has lots of rich latent structures. We want methods to discover these structures automatically.

Input: a training set of input points
Output: assignment of each point to a cluster

https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68

K-means algorithm
DBSCAN
Hierarchical clustering

K-Means algorithm

Even though K-means is not the best algorithm for clustering data, it illustrates the task and a simple solution. In K-means, each cluster $k=1,\dots,K$ is represented by a centroid $\mu_k \in \mathbb{R}^d$, and the objective is to assign each vector $\phi(x_i)$ to its closest centroid. Formally, with $z_i \in \{1,\dots,K\}$ denoting the cluster assigned to point $i$, the objective function is

$$\min_{z}\min_{\mu} \text{Loss}_{\text{kmeans}}(z,\mu)=\sum_{i=1}^{n}\|\phi(x_i)-\mu_{z_i}\|^2$$

Algorithm: K-means

Initialize \mu_1, ..., \mu_K randomly
for t=1,...,T:
    Step 1: set assignments z given \mu: z_i = \arg\min_k ||\phi(x_i)-\mu_k||^2
    Step 2: set centroids \mu given z: \mu_k = mean of \phi(x_i) over all i with z_i = k
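The two alternating steps can be sketched in pure Python as follows. Two choices here are assumptions: the centroids are initialized by sampling K of the data points, and the feature map \phi is taken to be the identity, so each point is used directly as its own feature vector.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, K, T=20, seed=0):
    rng = random.Random(seed)
    # Initialization (assumed): pick K distinct data points as centroids.
    centroids = [list(p) for p in rng.sample(points, K)]
    assignments = [0] * len(points)
    for _ in range(T):
        # Step 1: assign each point to its closest centroid.
        assignments = [
            min(range(K), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Step 2: move each centroid to the mean of its assigned points.
        for k in range(K):
            members = [p for p, z in zip(points, assignments) if z == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return assignments, centroids


def kmeans_loss(points, assignments, centroids):
    # Sum of squared distances from each point to its assigned centroid.
    return sum(squared_distance(p, centroids[z]) for p, z in zip(points, assignments))


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    z, mu = kmeans(data, K=2)
    print(z, mu, kmeans_loss(data, z, mu))
```

Neither step can increase the loss, so the procedure converges, but only to a local minimum that depends on the initialization.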

Knowledge in Learning

Learning Probabilistic Models

Reinforcement Learning

* Communicating, perceiving, and acting

MACTI (Temporary)

Induction

https://www.youtube.com/watch?v=9AwJrXAz9QA

https://www.youtube.com/watch?v=CDYLHa63ws4

https://www.youtube.com/watch?v=ERYgaGKaHoE

https://www.youtube.com/watch?v=KX4DdZeRAsI

https://www.youtube.com/watch?v=zdpDR_F2ovg

https://www.youtube.com/watch?v=eAmFytbeNTc

Morozov, E. (2023, April 3). Ni es inteligente ni es artificial: esa etiqueta es una herencia de la Guerra Fría. El País. https://elpais.com/ideas/2023-04-03/ni-es-inteligente-ni-es-artificial-esa-etiqueta-es-una-herencia-de-la-guerra-fria.html

Various authors (2023, March 10). Declaración de Montevideo sobre Inteligencia Artificial y su impacto en América Latina. https://www.fundacionsadosky.org.ar/declaracion-de-montevideo-sobre-inteligencia-artificial-y-su-impacto-en-america-latina/

Podcast T4-E06-Sebastián Ramírez-Contribuyendo al Opensource • Saturdays.AI. (n.d.). Retrieved June 12, 2023

[Podcast] https://saturdays.ai/2022/09/07/podcast-t4-e06-sebastian-ramirez-contribuyendo-al-opensource/

GPT-3: La supernova del modelado del lenguaje | Ivan Vladimir Meza Ruiz. (2023). Personal blog. https://turing.iimas.unam.mx/~ivanvladimir/posts/chat-gpt/

Programming in Python

Others

https://github.com/ivanvladimir/Proyectos-MeIA/tree/main

Explainability of Complex Machine Learning Models

graph TD
  ExplanationMethods --> ExplainableByDesign --> ExplanationsForGlassBoxes
  ExplainableByDesign --> EngineeredExplanations
  ExplanationMethods --> PostHocExplanationsForBlackBoxModels
  PostHocExplanationsForBlackBoxModels --> Local
  Local --> Counterfactuals
  Local --> FeatureImportance
  FeatureImportance-->LIME
  FeatureImportance-->SHAP
  FeatureImportance-->DALEX
  FeatureImportance-->NAM
  FeatureImportance-->CIU
  FeatureImportance-->GRADCAM
  FeatureImportance-->IG
  Local --> Prototypes
  PostHocExplanationsForBlackBoxModels --> Global
  Global--> Prototypes
  Global-->SetOfLocalExplanations
  Global-->ModelDistillation

  

Post-hoc methods: local feature importance, rule-based explanations, prototypes, and counterfactuals
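To give a feel for post-hoc local feature importance, the sketch below scores each feature of a single input by replacing it with a baseline value and measuring how much the model's prediction changes. This is a simple perturbation-style stand-in, not an implementation of LIME, SHAP, or any other method in the diagram; `black_box` is a hypothetical model with made-up weights, and the all-zeros baseline is an assumption.

```python
import math


def black_box(x):
    # Hypothetical black-box model: a fixed logistic scorer used only here.
    z = 2.0 * x[0] - 1.0 * x[1] + 0.0 * x[2]
    return 1.0 / (1.0 + math.exp(-z))


def local_importance(model, x, baseline):
    # For one input x, replace each feature with its baseline value and
    # record how much the prediction drops (positive = feature pushed it up).
    base_pred = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]
        scores.append(base_pred - model(perturbed))
    return scores


if __name__ == "__main__":
    print(local_importance(black_box, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

The explanation is local: the scores describe this one prediction and can change for a different input or a different baseline.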

Perception

Robotics

*Philosophical Foundations

Weak AI: Can Machines Act Intelligently?

Strong AI: Can Machines Really Think?

The Ethics and Risks of Developing Artificial Intelligence

AI: Present and Future

Build your product with Artificial Intelligence

Low level

Tensorflow

OpenCV

dlib

Mid level

https://developers.google.com/mediapipe

High level

face_recognition

https://github.com/steven2358/awesome-generative-ai

TODO

independently and identically distributed

Resources

https://github.com/sanchezcarlosjr/artificial-intelligence