Artificial Intelligence
Prerequisites
compressive sensing (sparse coding),
information theory,
control theory,
economics,
logic,
operations research,
game theory,
and optimization.
Introduction
What is AI?
We struggle to define what is intelligent, not what is artificial.
https://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai/
We use rational agents as our approach. An agent is an entity that can perceive an environment X and can act on it where X can be virtual or physical. How an agent decides to act given all previous considerations is a black box.
So, intelligence or rationality is when an agent makes the best decision in a given environment and goal with constraints and acts in accordance with it. Therefore, we evaluate an agent by its results, not by its mental state.
If an agent makes the best decision in all environments and goals, it is called artificial general intelligence.
If an agent makes the best decision in a specific environment, it is called weak AI.
From a scientific point of view, the goal is to understand complete AI; from an engineering point of view, the goal is to build incomplete, imperfect, weak artificial intelligence.
Some constraints are lack of knowledge, time to learn, time to execute, money, actuators, and sensors.
But now black boxes become white boxes. If you are here, then you are interested in white boxes, which are computational procedures. You're going to learn them!
The goal of AI
Agents. How can we create intelligence?
Tools. How can we use AI techniques to solve real problems?
An intelligent agent
Perception
Robotics
Language
Knowledge
Reasoning
Learning
How can we make systems that behave like humans?
Are we there yet?
Today, machines do narrow tasks; humans do broad tasks.
AI agents:
achieving human-level intelligence
AI tools
Understanding and solving real problems.
Predicting poverty.
Self-driving cars.
Authentication.
Ecosystem
https://www.microsoft.com/en-us/research/collaboration/bair/
https://people.eecs.berkeley.edu/~yima/ by https://twitter.com/YiMaTweets
Story
Computing Machinery and Intelligence.
The Imitation Game (Turing test).
Player A is a computer that claims to be a man.
Player B is a man.
Interrogator.
Total Turing Test.
Loebner Prize.
The Argument from Extrasensory Perception. During the Cold War, people were interested in clairvoyance, telepathy, and precognition, so Turing prepared an argument for that situation.
https://courses.cs.washington.edu/courses/csep590a/06au/projects/history-ai.pdf
https://plato.stanford.edu/entries/artificial-intelligence/
https://journals.sagepub.com/doi/pdf/10.1177/0008125619864925
https://dl.acm.org/doi/fullHtml/10.1145/2063176.2063177
https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/
Association for Computing Machinery (ACM). (2022, December 22). January 2023 CACM: The End of Programming. YouTube. Retrieved from https://www.youtube.com/watch?v=OnYJXm9NvyA&ab_channel=AssociationforComputingMachinery(ACM)
Levels
State-based models: search problems, MDPs, games
Variable-based models: CSPs, Bayesian networks
Logic-based models: propositional logic, first-order logic
Related work
Cognitive science
Philosophy of mind
Worked examples
“Surely computers cannot be intelligent—they can do only what their programmers tell them.” Is the latter statement true, and does it imply the former?
Argument 0.
If we assume "the Universal Machine can do all possible processes," we cannot also say "some possible processes can't be done by the Universal Machine," since that yields a contradiction, namely "the Universal Machine is not universal." The big questions are: are computers universal machines? Are humans universal machines? Can we generate life, intelligence, or consciousness from inorganic materials?
Argument 1.
Formally, let $p$ be "computers can do only what their programmers tell them" and $q$ be "computers cannot be intelligent."
Implication: the argument asserts $p$ and concludes $q$, that is, $p \to q$.
Conversely, the implication fails: a programmer can tell a computer to learn from experience, so a machine doing only what it is told can still act intelligently; $p$ doesn't imply $q$.
But is $p$ true?
All computer actions are programmed. ($p$)
There are computer actions, such as randomness from entropy devices, that are not programmed.
Therefore, some computer actions are not programmed. ($\neg p$)
So $p$ is false.
https://ieeexplore.ieee.org/document/7313842
BUG? Deterministic and nondeterministic Turing machine equivalence.
FAQ
I learned in the theory of computation that some problems are undecidable, but I see those problems solved with Artificial Intelligence. How is that?
AI Framework.
Intelligent Agents. Rational agents.
Agent class
graph TD
subgraph Architecture
Information --> Program
Program --> Decision
end
Code
@startuml
Agent <|-- "Rational Agent"
class Agent {
  actuators
  sensors
  type process: 'competitive' | 'cooperative'
  performance measure
  percepts(environment)
  choose the best decision by performance measure(): Decision
}
note left of Agent::percepts
  What is the environment like now?
end note
note left of Agent::"choose the best decision by performance measure"
  What action should I take now? This is also called the agent program.
  Building a small program is a key challenge: no brute force such as a vast table!
  But remember, sometimes memory = intelligence. In fact, if you know all the
  possible results, it's called omniscience. Memory gives safe steps; decreasing
  memory increases heuristic steps.
end note
class "Rational Agent" {}
note right: Learning and Autonomy
"Rational Agent" <|-- "Reflex agent"
"Rational Agent" <|-- "Model-based reflex agent"
"Rational Agent" <|-- "Goal-based agent"
"Rational Agent" <|-- "Utility-based agent"
Agent --> Environment: action
Environment --> Agent: percepts
class Environment {
  observable: 'Fully' | 'Partially'
  agents: Agent[] // single, multi
  deterministic: 'deterministic' | 'nondeterministic' | 'stochastic'
  sequential: 'episodic' | 'sequential'
  static: 'static' | 'dynamic' | 'semi-dynamic'
  discrete: 'discrete' | 'continuous'
  known: 'Known' | 'Unknown'
}
class Actuators
Agent o-- Actuators
class "Reflex agent" {
  choose the best decision by performance measure(): Decision
  - rule match(state, rules)
  - interpret input(environment): state
}
note right of "Reflex agent"::"choose the best decision by performance measure"
  condition-action rules f(x)
end note
class "Model-based reflex agent" {
  current state
  choose the best decision by performance measure(): Decision
  - transit()
}
note right of "Model-based reflex agent"::"choose the best decision by performance measure"
  state machine (DFA/Mealy/Moore/...)
end note
@enduml
Worked examples
Further readings
Economic agents.
Robotics.
Solving problems by Searching.
graph TD
S["S, environment"] -->|"cost(S,action1, A)"| A["A, new environment"]
S -->|"cost(S,action2, B)"| B["B, new environment"]
S -->|"cost(S,action3, C)"| C["C, new environment"]
subgraph possible_solutions3
C-->E["..."]
end
subgraph possible_solutions2
B-->F["..."]
end
subgraph possible_solutions1
A-->D["..."]
end
D-->Goal["Goal, new environment == goal"]
Searching problem model
classDiagram
class SearchProblem {
heuristic()
start_state()
is_goal(state)
expand(state)
valid_actions_from(state)
action_cost(state, action, next_state)
next_state(state, action)
}
class State {
distance_from_start_state
previous_state
environment
build()
relax()
reconstruct_path()
}
class SearchingStrategy {
findPlanFor(problem)
}
class Agent {
searchingStrategy: SearchingStrategy
problem: SearchProblem
act(state)
}
Agent *-- SearchingStrategy
Agent *-- SearchProblem
SearchingStrategy <|-- BFS
SearchingStrategy <|-- DFS
SearchingStrategy <|-- AStar
SearchingStrategy <|-- Dijkstra
SearchingStrategy <|-- LinearProgramming
Searching strategies
Searching for solutions
Considerations.
We build a tree or graph on demand by searching strategies.
We have to avoid cycles in order to prevent infinite loops.
Each new state saves a reference to its previous state, and we only choose valid actions, so when our search algorithm reaches the goal, it reconstructs the path back to the start state.
Codification matters.
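A minimal sketch of breadth-first search against the SearchProblem interface diagrammed above; the method names follow the class diagram, while the requirement that states be hashable is our assumption. It illustrates the considerations listed: cycle avoidance via a visited set and path reconstruction via previous-state references.

from collections import deque

def bfs(problem):
    # Breadth-first search over the SearchProblem interface.
    # Returns the list of states from the start state to a goal, or None.
    start = problem.start_state()
    frontier = deque([start])
    previous = {start: None}  # visited set + previous-state references
    while frontier:
        state = frontier.popleft()
        if problem.is_goal(state):
            path = []  # reconstruct the path back to the start state
            while state is not None:
                path.append(state)
                state = previous[state]
            return list(reversed(path))
        for action in problem.valid_actions_from(state):
            successor = problem.next_state(state, action)
            if successor not in previous:  # avoid cycles and infinite loops
                previous[successor] = state
                frontier.append(successor)
    return None  # no solution found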
Uninformed search strategies
BFS
DFS
Informed Search Strategies
A*, Greedy Search, Hill Climbing, Simulated Annealing, Best-First Search
Heuristic Functions
A heuristic is a function that estimates the distance from the current state to the goal.
$f(n)$ is the real or estimated cost of the solution through node $n$.
$g(n)$ is the cost to reach $n$ from the start state.
$h(n)$ is the estimated cost to reach the goal state from $n$, so it uses the available information from the problem or environment state in order to estimate the cost.
$h^*(n)$ is the real cost to reach the goal state from $n$.
Note $f(n) = g(n) + h(n)$ and, for an admissible heuristic, $h(n) \leq h^*(n)$.
Properties
Main idea: the estimated heuristic cost should never exceed the actual cost.
Admissibility: $h(n) \leq h^*(n)$ for every node $n$.
Consistency: $h(n) \leq \text{cost}(n, a, n') + h(n')$ for every successor $n'$ of $n$.
Dominance: $h_2$ dominates $h_1$ if $h_2(n) \geq h_1(n)$ for all $n$; prefer the dominant admissible heuristic.
Optimality: A* is optimal with an admissible heuristic (tree search) or a consistent one (graph search).
How do you find a heuristic function?
Relax your problem, use available information about the current state or the goal, use min and max functions, or use distance functions such as Manhattan distance, Euclidean distance, Hamming distance, and norms.
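To make these concrete, here is a small sketch of the distance functions named above, assuming states and goals are tuples of coordinates (or equal-length sequences for Hamming distance). The max-combination trick works because the maximum of admissible heuristics is still admissible.

import math

def manhattan(state, goal):
    # Sum of absolute coordinate differences
    return sum(abs(s - g) for s, g in zip(state, goal))

def euclidean(state, goal):
    # Straight-line distance
    return math.sqrt(sum((s - g) ** 2 for s, g in zip(state, goal)))

def hamming(state, goal):
    # Number of positions where the state differs from the goal
    return sum(s != g for s, g in zip(state, goal))

def combined(state, goal):
    # The max of admissible heuristics is admissible and dominates each one
    return max(manhattan(state, goal), hamming(state, goal))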
Beyond Classical Search.
Heuristic. Safe steps.
Offline, Online.
Solving problems by searching means running graph algorithms that generate new nodes by heuristic and safe steps and test them; wrong answers are rejected.
Hands-on Projects
8-Queen solver
Hanoi tower
Maze solver
Graph Theory Visualizer: Maze. (2022, July 03). Retrieved from https://graph-theory.sanchezcarlosjr.com
Project 0 - Unix, Python and Autograder Tutorial - CS 188: Introduction to Artificial Intelligence, Spring 2021. (2022, September 29). Retrieved from https://inst.eecs.berkeley.edu/~cs188/sp21/project0/#question-1-addition
The farmer, fox, goose, and grain
Integral solver
Pacman
Project 1 - Search - CS 188: Introduction to Artificial Intelligence, Spring 2021. (2022, September 29). Retrieved from https://inst.eecs.berkeley.edu/~cs188/fa22/projects/proj1/
Worked examples
References
https://aimacode.github.io/aima-javascript/3-Solving-Problems-By-Searching/
How to solve it: Modern Heuristics by Zbigniew Michalewicz, David B. Fogel.
Adversarial Search
Minimax
https://www.youtube.com/watch?v=l-hh51ncgDI&ab_channel=SebastianLague
Monte-Carlo
matchmaking algorithms
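Below is a minimal minimax sketch for a two-player zero-sum game; the game interface (is_terminal, utility from the maximizer's point of view, moves, result) is an assumption for illustration, not a fixed API.

def minimax(state, game, maximizing=True):
    # Return (value, best_move) for the player to move.
    if game.is_terminal(state):
        return game.utility(state), None
    best_move = None
    if maximizing:
        best_value = float('-inf')
        for move in game.moves(state):
            value, _ = minimax(game.result(state, move), game, False)
            if value > best_value:
                best_value, best_move = value, move
    else:
        best_value = float('inf')
        for move in game.moves(state):
            value, _ = minimax(game.result(state, move), game, True)
            if value < best_value:
                best_value, best_move = value, move
    return best_value, best_move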
Hands-on Projects
Tic Tac Toe
Chess
Pacman v2
Chess engines. (2023, April 20). Retrieved from https://www.chessengines.org
Project 2. (2022, October 13). Retrieved from https://inst.eecs.berkeley.edu/~cs188/fa22/projects/proj2
Variable-based models with Factor Graphs
Now we embark on our journey through variable-based models, in which we will think in terms of variables, factors, and weights. In particular, in this section we explore factor graphs and their special cases: Constraint Satisfaction Problems (CSPs), Markov networks, and Bayesian networks. We no longer rely on searching all possible solutions. Instead, we assign values to variables, allowing algorithms to exploit structure, infer variable orderings, and so on.
graph TD
subgraph Variables
X1((X1))
X2((X2))
X3((X3))
end
subgraph Factors
f1[f1]
f2[f2]
f3[f3]
f4[f4]
end
X1 --- f1
X1 --- f2
X2 --- f2
X2 --- f3
X3 --- f3
X3 --- f4
Formal definition
Constraint satisfaction problems are defined by a set of variables $X$, each with a domain $D$ of possible values, and a set of constraints $C$. The aim is to find an assignment of values to the variables from their domains in such a way that none of the constraints are violated. Informally, our goal is to find the best assignment of values to the variables.
Variable-based models
A constraint satisfaction problem consists of three components X, D, and C:
X is a set of variables
D is a set of domains
C is a set of constraints that specify allowable combinations of values
Factors
Each assignment $x$ of values to variables has a weight $\text{Weight}(x) = \prod_j f_j(x)$, the product of all factors evaluated at $x$.
Objective: $\arg\max_x \text{Weight}(x)$.
Continuous-domain CSPs with linear constraints are the realm of linear programming.
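A brute-force sketch of the weight computation on a tiny factor graph like the one diagrammed above; the binary domains and factor tables are toy assumptions. Real solvers exploit the graph structure instead of enumerating every assignment.

import itertools

domain = [0, 1]  # assumed binary domain for X1, X2, X3
factors = [
    lambda x1, x2, x3: 1.0 if x1 == 1 else 0.5,   # f1(X1)
    lambda x1, x2, x3: 1.0 if x1 != x2 else 0.1,  # f2(X1, X2)
    lambda x1, x2, x3: 1.0 if x2 != x3 else 0.1,  # f3(X2, X3)
    lambda x1, x2, x3: 1.0 if x3 == 0 else 0.7,   # f4(X3)
]

def weight(assignment):
    # Weight(x) = product of all factors evaluated at the assignment x
    w = 1.0
    for f in factors:
        w *= f(*assignment)
    return w

best = max(itertools.product(domain, repeat=3), key=weight)
print(best, weight(best))  # the maximum-weight assignment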
Message Passing
Exercises and Projects
Summary
Key decisions
FAQ
Reference Notes
https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models#bayesian-networks
https://gtsam.org/2020/06/01/factor-graphs.html
https://stanford-cs221.github.io/spring2024-extra/modules/csps/csps1.pdf
Knowledge, reasoning, and planning
Although traditional logical agents provide us expressiveness in a compact way, they are inherently deterministic and struggle to handle unstructured data. These systems follow predefined rules, making it difficult to manage uncertainty and ambiguity across diverse domains. Additionally, representing and processing unstructured data (e.g., text, images, time series, video) is challenging and often requires significant manual effort and expense. This lack of flexibility limits their ability to generalize across different domains as effectively as modern Deep Learning models.
Logical Agents
Knowledge-based agents
Different syntax, same semantics: $2 + 3$ and $3 + 2$ both denote 5.
Same syntax, different semantics: 3/2 denotes 1 in Python 2 (integer division) and 1.5 in Python 3.
A knowledge base is a set of sentences, each sentence is an assertion about the world given a representation language for a specific domain. Logic consists of syntax, semantics, and inference rules. The formulas by themselves are just symbols (syntax), they don’t provide meaning.
A knowledge-based agent is composed of a knowledge base which depends on domain-specific content and an inference mechanism. They can represent states, actions, and weights, incorporate new percepts, update internal representations of the world, and deduce hidden properties of the world.
Semantics is the interpretation function.
In a declarative approach to building a logical agent, we add new sentences: we tell the agent what it needs to know and query what is known.
Natural Language?
We can save knowledge in different data models, such as knowledge graphs, and apply different inference mechanisms.
Entailment. It adds trivial information to KB.
Contradiction.
Contingency. It adds non-trivial information to KB.
http://intrologic.stanford.edu/dictionary/logical_entailment.html
Learning formulas.
A language needs syntax, semantics, and implementation level.
The syntax of a language defines the set of valid formulas.
Prolog, Relational databases, SQL, Datalog?
Intelligent agents need knowledge about the world to choose good actions.
A model or world in propositional logic is an assignment of truth values to propositional symbols.
Modeling and inference
Propositional logic with only Horn clauses
Propositional logic
Modal logic
First-order logic
Second-order logic
Tell[f] → KB
Possible responses:
- Already knew that: entailment (KB ⊨ f).
- Don't believe that: contradiction (KB ⊨ ¬f).
- Learned something new (update KB): contingent.
Ask[f] → KB
Possible responses:
- Yes: entailment (KB ⊨ f).
- No: contradiction (KB ⊨ ¬f).
- I don't know: contingent.
A knowledge base KB is satisfiable if some model (assignment of truth values) makes every formula in KB true.
Execution engine.
Knowledge base (domain-specific facts) + inference engine.
Syntax, set of possible worlds, truth condition.
Sound algorithm: it derives only entailed formulas.
Complete algorithm: it derives every entailed formula.
Theorem-proving.
Model-checking.
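A tiny model-checking sketch: to decide entailment, enumerate every model (truth assignment) and check that the query holds wherever the KB holds. The lambda-based encoding of formulas is an assumption for brevity.

import itertools

def entails(kb, f, symbols):
    # KB entails f iff f is true in every model in which KB is true.
    for values in itertools.product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not f(model):
            return False
    return True

# Example: KB = P and (P -> Q); query f = Q (modus ponens).
kb = lambda m: m['P'] and ((not m['P']) or m['Q'])
f = lambda m: m['Q']
print(entails(kb, f, ['P', 'Q']))  # True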
https://www.youtube.com/watch?v=xL0kNw5TudI&t=192s
https://www.youtube.com/watch?v=oM5LUGPO7Zk&list=PLh7QmcIRQB-uiOS4GMlBbq0jkvtqhqtq0
https://www.youtube.com/watch?v=CAsq7hm3sbI&ab_channel=IITDelhiJuly2018
https://www.youtube.com/watch?v=xFpndTg7ZqA&t=1s&ab_channel=IITDelhiJuly2018
https://www.youtube.com/watch?v=h6zCkrZ8ehE&t=1s&ab_channel=RichNeapolitan
tammet. (2022, December 13). gkc. Retrieved from https://github.com/tammet/gkc
Program
class KnowledgeAgent:
    def __init__(self, kb):
        self.kb = kb  # knowledge base
        self.t = 0    # time step

    def act(self, environment):
        # Incorporate the latest percept, ask the KB for an action,
        # record the chosen action, and advance time.
        tell(self.kb, make_percept_sentence(environment, self.t))
        action = ask(self.kb, make_action_query(self.t))
        tell(self.kb, make_action_sentence(action, self.t))
        self.t += 1
        return action
Inference machine
First-Order Logic
Inference in First-Order Logic
Worked examples
Knowledge graph
Prolog
RDF
Datalog
SQL and open cypher (Apache Age)
Projects
Card fraud detector
Make an online quiz system about Artificial Intelligence
8-eight queen
Pacman Finder
https://inst.eecs.berkeley.edu/~cs188/sp21/project3/
Wordle Solver
https://swi-prolog.discourse.group/t/wordle-solver/5124
https://cheatle.occasionallycogent.com/
Resources
- http://www.learnprolognow.org/ is a great place to start
- http://cs.union.edu/~striegnk/courses/nlp-with-prolog/html/ covers some advanced topics
- http://www.coli.uni-saarland.de/projects/milca/courses/comsem/html/ more advanced
- http://www.mtome.com/Publications/PNLA/prolog-digital.pdf This is my personal favorite
You may also consider picking up some of the following books
- Clocksin - Mellish: Programming in Prolog
- Covington - Nute - Vellino: Prolog Programming in Depth
- Sterling - Shapiro : The Art of Prolog
- Bratko : Prolog Programming for Artificial Intelligence
FAQ
What are real-world projects where people use PROLOG?
https://www.quora.com/What-is-Prolog-used-for-today
https://www.drdobbs.com/parallel/the-practical-application-of-prolog/184405220
Classical Planning
Planning and Acting in the Real World
Knowledge representation
Uncertain knowledge and reasoning
Quantifying Uncertainty
Probabilistic Reasoning
Probabilistic Reasoning over Time
Making Simple Decisions
Making Complex Decisions
Machine Learning
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). Traditional programming and classic artificial intelligence involve writing rules that act on data to produce answers. But if you flip this approach, you get machine learning: we gather a large amount of data and answers, apply a learning algorithm, and, as output, we acquire rules or models. These models can then make predictions without being specifically programmed to perform the task.
Machine Learning.
The main driver of recent successes in AI. Move from "code" to "data" to manage information complexity. The goal is generalization.
Reflex-based models. Linear classifiers, deep neural networks.
Modeling. Simplify the real world into a well-defined mathematical model. Example: planning how to go from A to B in a city. Inference. Developing algorithms to answer questions with respect to the model. Learning. Start with a model whose parameters are unknown and use data to learn those parameters by applying an algorithm.
We classify machine learning into supervised learning, unsupervised learning, and reinforcement learning. The table below gives an overview of learning algorithms.
| Learning algorithm | When to use | Relevant metrics |
| --- | --- | --- |
| Linear Regression | When there's a linear relationship between the input and output. | Mean Squared Error (MSE), R-squared, Adjusted R-squared |
| Logistic Regression | For binary classification problems. | Accuracy, Precision, Recall, AUC-ROC |
| Decision Trees | When there's a need to understand the decision-making process. Useful for both classification and regression. | Gini Index, Information Gain for model construction; Accuracy, Precision, Recall for evaluation |
| Random Forest | When model interpretability is less important and you need higher performance. | Out-of-bag (OOB) error, Accuracy, Precision, Recall |
| K-Nearest Neighbors | When instances of the same class are generally close to each other in the feature space. | Accuracy, Precision, Recall, F1 Score |
| Support Vector Machines | When there's a clear margin of separation between classes. | Accuracy, Precision, Recall, F1 Score |
| Neural Networks | For complex problems like image recognition, speech recognition, and natural language processing. | Depends on the task, but often includes Accuracy, Precision, Recall, AUC-ROC, and loss metrics like Cross-Entropy Loss |
| XGBoost | Best for heterogeneous structured datasets. | |
Ensembles
1.1 Introduction
1.2 Applications
1.3 Main approaches to machine learning
1.4 Machine learning paradigms
1.5 Basic concepts
1.6 Fundamental problems
1.7 Evaluation of learned models
2.1 Introduction
2.2 Historical development of the paradigm
2.3 Decision trees
2.4 Rule induction
2.5 Applications
2.6 Selected topics
3.1 Introduction
3.2 Historical development of the paradigm
3.3 Genetic algorithms
3.4 Genetic programming
3.5 Applications
3.6 Bio-inspired algorithms
4.1 Introduction
4.2 Historical development of the paradigm
4.3 Bayes' theorem
4.4 Naive Bayes
4.5 Applications
4.6 Probabilistic graphical models
5.1 Introduction
5.2 Historical development of the paradigm
5.3 Artificial Neural Networks (ANN)
5.4 Backpropagation algorithm
5.5 Applications
5.6 Review of ANN architectures
6.1 Introduction
6.2 Historical development of the paradigm
6.3 K-nearest neighbors
6.4 Support vector machines
6.5 Applications
6.6 Selected topics
https://github.com/afshinea/stanford-cs-229-machine-learning/tree/master/en
https://realpython.com/python-ai-neural-network/
Books
- François Fleuret’s Homepage. (n.d.). Retrieved June 16, 2023, from https://fleuret.org/francois/#lbdl
Notes
ICML. (n.d.). International Conference on Machine Learning. https://icml.cc/
Domingos, P. (2017). The Master Algorithm. The MIT Press.
GECCO. (2022). The Genetic and Evolutionary Computation Conference. https://gecco-2022.sigevo.org/HomePage
Mitchell, T. (1997). Machine Learning. McGraw Hill.
Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. The MIT Press.
NeurIPS (2021). Conference on Neural Information Processing Systems.
https://nips.cc/
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. The MIT Press.
CVF. (n.d.). Computer Vision Foundation.
http://openaccess.thecvf.com/menu.py
Nunes, L. (2006). Fundamentals of Natural Computing: Basic Concepts, Algorithms, and Applications. Chapman & Hall/CRC. [Classic].
Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach. Pearson. [Classic].
Sucar, L. E. (2015). Probabilistic Graphical Models: Principles and Applications. Springer. [Classic].
Tan, P., Steinbach, M., Karpatne, A. & Kumar, V. (2018). Introduction to Data Mining (2nd ed.). Pearson.
- Clasificación con Árboles de Decisión: el algoritmo CART | Codificando Bits. (n.d.). Retrieved June 16, 2023, from https://www.codificandobits.com/blog/clasificacion-arboles-decision-algoritmo-cart/
- Sanz, F. (2020, November 30). Cómo funciona el algoritmo XGBoost en Python. The Machine Learners. https://www.themachinelearners.com/xgboost-python/
- Graff, M. (2022). Aprendizaje Computacional. https://ingeotec.github.io/AprendizajeComputacional/
- Lex Fridman. (2020, May 5). Daphne Koller: Biomedicine and Machine Learning | Lex Fridman Podcast #93.
- Ep 67 - Thamar Solorio (U.Houston - Bloomberg) - Lo bueno de la Academia y el NLP by Hacia Afuera. (n.d.). Retrieved June 16, 2023, from https://podcasters.spotify.com/pod/show/elia-ia/episodes/Ep-67---Thamar-Solorio-U-Houston---Bloomberg---Lo-bueno-de-la-Academia-y-el-NLP-e1ppqte
- Ep 58 - Luciana Benotti (Universidad Nacional de Córdoba) - Inclusión y diversidad en Inteligencia Artificial by Hacia Afuera. (n.d.).
Retrieved June 16, 2023, from
https://podcasters.spotify.com/pod/show/elia-ia/episodes/Ep-58---Luciana-Benotti-Universidad-Nacional-de-Crdoba---Inclusin-y-diversidad-en-Inteligencia-Artificial-e1nrmek
- Ep 65 - Ivana Feldfeber (DataGénero) - Datos e inclusión by Hacia
Afuera. (n.d.). Spotify for Podcasters. Retrieved June 16, 2023, from
https://podcasters.spotify.com/pod/show/elia-ia/episodes/Ep-65---Ivana-Feldfeber-DataGnero---Datos-e-inclusin-e1ou88s
- Women in AI. (n.d.). Spotify. Retrieved June 16, 2023, from https://open.spotify.com/show/62v63cucHe8HdZD6ooyCOg
Learning from Examples
When you have a dataset with features ($X$) and labels ($Y$), supervised learning means finding the relation mapping from $X$ to $Y$.
What does such a learning algorithm look like? It learns from examples: you have a training set, you model the task (for example, as linear regression), and the algorithm outputs a hypothesis, a model that maps input features to targets. The learner is an optimization algorithm that needs an optimization problem; that is, our task splits into finding the right optimization model and then employing the right optimization algorithm.
The optimization problem is a minimization over the parameters.
Loss minimization task: $\min_{w} \text{TrainLoss}(w)$.
The score $w \cdot \phi(x)$ is how confident we are.
The margin $(w \cdot \phi(x))\, y$ is how correct we are.
You might ask how you can get the training set, how you can deploy the hypothesis, and how you can know what method to apply; the answers are in the development cycle below.
Development cycle
Split data into train, val, test
Exploratory data analysis
Repeat:
- Implement feature/tune hyperparameters
- Run learning algorithm
- Sanity check train and val error rates
- Look at errors to brainstorm improvements
- Log as far as you can (reports)
- Run on test set to get final error rates
Most of the time, the test metric does not decrease.
Optimization.
Discrete optimization: find the best discrete object, $\min_{p \in \text{Paths}} \text{Cost}(p)$. Algorithmic tool: dynamic programming.
Continuous optimization: find the best vector of real numbers, $\min_{w \in \mathbb{R}^d} \text{TrainingError}(w)$. Algorithmic tool: gradient descent.
Ground truth.
It refers to the true, expected label associated with each example in a dataset.
graph TD
TrainingSet --> LearningAlgorithm
LearningAlgorithm --> Hypothesis
Given the features $x \in X$, the hypothesis $h : X \to Y$ is a predictor that maps to a target $y \in Y$. The features denote the space of input values $X$, and the target the space of output values $Y$. So the supervised learning goal is to find a good predictor $h$.
class Model:
    def fit(self, training_set):
        # Apply a learning algorithm to the training examples
        # and generate a hypothesis.
        ...

    def predict(self, instances):
        # Apply the hypothesis to new instances.
        ...
We call it a regression problem when $y$ is continuous; when $y$ is discrete, we call it a classification problem.
Suppose you have a linear regression problem; you may represent $h$ as $h(x) = w_1 x + w_0$, an affine function. More generally, $h(x) = w \cdot \phi(x)$, where $w$ are the "parameters" that allow us to make good predictions.
Loss functions
A. Chadha, V. Jain, Distilled Notes for Machine Learning , https://www.vinija.ai, 2022, Accessed: July 1 2022.
Distance metrics
Computing edit distance
Input: two strings, s and t
Output: minimum number of character insertions, deletions, and substitutions between s and t.
Example:
s: a cat
t: the cats!
General principles: reduce the problem to smaller subproblems and abstract away details.
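Here is the classic dynamic-programming solution to the edit-distance problem stated above, reducing the problem to prefixes of s and t:

def edit_distance(s, t):
    # dp[i][j] = minimum edits to turn s[:i] into t[:j]
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete every character of s[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert every character of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitute = dp[i - 1][j - 1] + (s[i - 1] != t[j - 1])
            delete = dp[i - 1][j] + 1
            insert = dp[i][j - 1] + 1
            dp[i][j] = min(substitute, delete, insert)
    return dp[m][n]

print(edit_distance("a cat", "the cats!"))  # 5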
Linear prediction
Linear Regression
Input: a set of $(x_i, y_i)$ pairs.
Output: $w \in \mathbb{R}$ that minimizes the squared error $F(w) = \sum_{i=1}^{n} (x_i w - y_i)^2$.
Algorithm: gradient descent.
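A minimal numpy sketch of gradient descent on the one-dimensional squared-error objective above; the toy data and the step size are assumptions.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # assumed toy inputs
y = np.array([2.1, 3.9, 6.2, 7.8])   # assumed toy targets, roughly y = 2x

def F(w):
    return np.sum((x * w - y) ** 2)      # F(w) = sum_i (x_i w - y_i)^2

def dF(w):
    return np.sum(2 * (x * w - y) * x)   # gradient of F with respect to w

w, step_size = 0.0, 0.01
for _ in range(100):
    w -= step_size * dF(w)  # move against the gradient
print(w, F(w))  # w converges near 2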
Linear prediction.
Score: a weighted combination of features.
Weight vector w.
Binary linear classifier
Decision boundary
Separate the space into different subspaces to classify.
Case analysis of the hinge loss gradient:
$\nabla_w \text{Loss}_{\text{hinge}}(x, y, w) = \begin{cases} 0 & \text{if } (w \cdot \phi(x))\, y > 1 \\ -\phi(x)\, y & \text{otherwise} \end{cases}$
Each update increases the margin when the prediction is not yet confidently correct.
Gradient descent and Stochastic gradient descent
The gradient gives the direction and rate of fastest increase of $f$ at a point.
The goal is to move in the opposite direction of the gradient.
Least squares regression.
Objective function: $\text{TrainLoss}(w) = \frac{1}{|D_{\text{train}}|} \sum_{(x, y) \in D_{\text{train}}} (w \cdot \phi(x) - y)^2$
Gradient: $\nabla_w \text{TrainLoss}(w) = \frac{2}{|D_{\text{train}}|} \sum_{(x, y) \in D_{\text{train}}} (w \cdot \phi(x) - y)\, \phi(x)$
SGD
For each (x, y) in D_train:
    w ← w - step_size · ∇_w Loss(x, y, w)
Each SGD update is cheap but noisy: it trades per-update quality for quantity of updates.
SGD can be worse than GD if the dataset has noise.
Step size
Strategies
Constant
Decreasing
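A sketch of stochastic gradient descent on the hinge loss using the decreasing step-size strategy just listed; the toy dataset and epoch count are assumptions.

import numpy as np

def sgd_hinge(train, d, epochs=20):
    # Loss(x, y, w) = max(0, 1 - (w · x) y), with labels y in {-1, +1}
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for x, y in train:
            t += 1
            step_size = 1.0 / np.sqrt(t)  # decreasing step size
            if np.dot(w, x) * y <= 1:     # margin too small: nonzero gradient
                w -= step_size * (-y * x)
            # otherwise the gradient is zero and w is unchanged
    return w

train = [(np.array([1.0, 0.5]), 1), (np.array([-1.0, 0.3]), -1),
         (np.array([0.8, -0.2]), 1), (np.array([-0.7, -0.4]), -1)]
print(sgd_hinge(train, d=2))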
Maximum likelihood estimation
Maximum likelihood estimation is the goal of training classifiers; that is, we find the parameters that maximize the probability of the actual observed data: $\hat{\theta} = \arg\max_\theta \prod_{i=1}^{n} p(y_i \mid x_i; \theta)$. Here $p(y \mid x; \theta)$ refers to the conditional probability that the output is the class $y$ given the input variables $x$ and the parameters of the model $\theta$.
The product form assumes the examples are independently and identically distributed (i.i.d.).
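As a sketch, the log-likelihood for logistic regression shows how the i.i.d. assumption turns the product of probabilities into a sum of logs; the toy data and parameter values are assumptions.

import numpy as np

def log_likelihood(theta, X, y):
    # p(y = 1 | x; theta) = sigmoid(theta · x), labels y in {0, 1}
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    # i.i.d. data: the joint probability factorizes, so the log turns it into a sum
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[1.0, 0.0], [0.5, 1.0], [-1.0, 0.2], [-0.3, -1.0]])
y = np.array([1, 1, 0, 0])
theta = np.array([1.0, 0.5])
print(log_likelihood(theta, X, y))
# Maximum likelihood estimation: argmax over theta, usually via gradient ascent.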
Types of problem
Pattern recognition
Recommender systems
Types of data
- Vector Data: This is the most common and simplest form of data in machine learning. The dataset is a 2D tensor where each data point can be encoded as a vector. Examples can be anything from housing price prediction data (features being the number of rooms, location, size of the house, etc.) to text data (after applying some sort of vectorization like bag-of-words or TF-IDF).
- Natural language.
- Time series or Sequence Data: Time series data captures a series of data points recorded over regular time intervals. The order of data points is important here because the same set of data points in a different time order might mean something entirely different. Sequence data is very similar, but time isn't necessarily a factor here. Examples of these are stock price data, weather data, or any type of data where time plays a crucial role. For sequence data, a sentence or a DNA sequence would be a good example as the order of words or genes is important.
- Image Data: Images are represented as 3D tensors (height, width, color_depth). However, a batch of images used for training a model is stored in a 4D tensor (batch_size, height, width, color_depth). Deep learning models like Convolutional Neural Networks (CNNs) are designed to extract features from these 4D tensors and use them to classify images, detect objects, and more. Applications range from medical imaging (detecting diseases) to self-driving cars (identifying pedestrians, signs, etc.).
- Video Data: Video data can be thought of as a series of images, so naturally, this extends the image data tensor by one more dimension, the frame dimension. So a video dataset would be a 5D tensor (batch_size, frames, height, width, color_depth). Video data is used in various applications like activity recognition, video synthesis, and object tracking in videos.
- Graph data.
Data labeling
Fine-tuning
What is Transfer Learning?
Datasets
Notebooks
https://www.youtube.com/watch?v=T-fAkfU9j_o&ab_channel=Elpensamientoenllamas
Natural Computing
NACO
https://fcampelo.github.io/EC-Bestiary/
black hole algorithm
Mandelbrot set from scratch, Markov text generation, and John Conway's Game of Life.
Pattern Recognition
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. Available at: Microsoft Research
- Duda, R. O., Stork, D. G., & Hart, P. E. (2001). Pattern Classification (2nd ed.). Wiley.
- Fu, K. (1974). Syntactic Methods in Pattern Recognition. Academic Press.
- Jürgen, M. & Matthias, N. (2018). Pattern Recognition: Introduction, Features, Classifiers and Principles. Berlin: De Gruyter Oldenbourg (De Gruyter Graduate). Available at: EBSCOhost.
- Koutroumbas, K. & Theodoridis, S. (2009). Pattern Recognition. Academic Press. Available at: EBSCOhost
- Murty, M. N. & Devi, V. S. (2011). Pattern Recognition: An Algorithmic Approach. Springer London. Available at: EBSCOhost
- Massachusetts Institute of Technology. (n.d.). MIT OpenCourseWare: Pattern Recognition and Analysis. https://ocw.mit.edu/courses/media-arts-and-sciences/mas-622j-pattern-recognition-and-analysis-fall-2006/syllabus/
Deep Learning
https://people.idsia.ch/~juergen/deep-learning-history.html
Tensor
Automatic differentiation
The purpose of the activation function is to introduce nonlinearities into the model. Neural networks are inspired by biological neural networks.
What is a gradient? It drives the optimization method (gradient descent).
The chain rule lets us pass the gradients from a neuron's output back to its input.
Activation
Forward propagation
Backpropagation
https://playground.tensorflow.org
https://realpython.com/python-ai-neural-network/
Neural Networks and Deep Learning http://neuralnetworksanddeeplearning.com/
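The following numpy sketch runs forward propagation and one backpropagation step for a one-hidden-layer network with sigmoid activations; the sizes, data, and learning rate are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # activation: introduces nonlinearity

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))              # input batch (4 examples, 2 features)
y = np.array([[0.0], [1.0], [1.0], [0.0]])
W1 = rng.normal(size=(2, 3))             # weights: input -> hidden
W2 = rng.normal(size=(3, 1))             # weights: hidden -> output

# Forward propagation
A1 = sigmoid(X @ W1)                     # hidden activations
y_hat = sigmoid(A1 @ W2)                 # output

# Backpropagation: the chain rule passes gradients from the output back to the input
loss = np.mean((y_hat - y) ** 2)
d_out = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)
dW2 = A1.T @ d_out
d_hidden = (d_out @ W2.T) * A1 * (1 - A1)
dW1 = X.T @ d_hidden

# One gradient descent step on the weights
learning_rate = 0.5
W1 -= learning_rate * dW1
W2 -= learning_rate * dW2
print(loss)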
Batch
Input
Activations
Weights
Output
Optimizers
Hyperparameters
In a Keras model, hyperparameters such as optimizer, loss, and metrics have crucial roles in defining how the model will be trained and evaluated. Let's discuss each of these and the context in which they should be used:
- Optimizer: Optimizers in Keras help to adjust the attributes of your neural network such as weights and learning rate to reduce the losses. Different optimizers suit different problems and can significantly affect the model's performance and convergence speed.
- SGD: Stochastic Gradient Descent, which is the most basic optimizer. It's robust but can be slow and sensitive to the learning rate choice.
- RMSprop: Usually a good choice for recurrent neural networks.
- Adam: A good default choice for many problems, it combines the advantages of RMSprop and SGD with momentum.
- Adagrad, Adadelta, Adamax, Nadam: Other variants of optimizers, each with its strengths, but in most cases, Adam should suffice.
- Loss: Loss function or cost function is a method to calculate the disparity between the predicted output and the actual output. This is the function that the model will strive to minimize.
- Mean Squared Error (MSE): Used for regression problems (predicting a continuous value).
- Binary Cross-Entropy: Used for binary classification problems (predicting a yes/no outcome).
- Categorical Cross-Entropy: Used for multi-class classification problems, where the outputs are one-hot encoded.
- Sparse Categorical Cross-Entropy: Like categorical cross-entropy, but for integer targets.
- Weighted Cross-Entropy: Used for unbalanced multi-class classification problems, where the outputs are one-hot encoded.
- Metrics: Metrics are used to judge the performance of your model. Choosing the right metric is essential to judge your model accurately.
- Accuracy: Suitable for classification problems, especially if the classes are balanced.
- Precision, Recall, F1-score: These are more informative than accuracy for binary classification, especially if the classes are imbalanced.
- MSE, RMSE, MAE (Mean Absolute Error): Suitable for regression problems.
Other hyperparameters include:
- Learning rate: This determines how fast the model learns; a too-high learning rate might cause the model to diverge, while a too-small one might cause it to converge too slowly.
- Batch size: This is the number of samples that will be propagated through the network simultaneously. Smaller batch sizes require less memory but can result in noisy gradient updates. Larger batch sizes require more memory but can result in more accurate gradient updates.
- Number of epochs: This is the number of times the model will iterate over the entire dataset. You should set this as high as possible and use techniques like Early Stopping to prevent overfitting.
Choosing the right hyperparameters often involves trial and error and can be guided by experience, knowledge about the problem and the data, or hyperparameter tuning techniques such as grid search or random search.
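A minimal sketch of how these hyperparameters fit together in a Keras model; the architecture, sizes, and data names (x_train, y_train) are illustrative assumptions.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # optimizer + learning rate
    loss='sparse_categorical_crossentropy',                  # integer class targets
    metrics=['accuracy'],                                    # evaluation metric
)

# Batch size, epochs, and early stopping to prevent overfitting:
# model.fit(x_train, y_train, batch_size=32, epochs=100, validation_split=0.2,
#           callbacks=[tf.keras.callbacks.EarlyStopping(patience=5)])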
Convolutional neural networks
Locality structure
Recurrent neural networks
- Neural networks are universal approximators of continuous functions: mathematical functions.
- Recurrent neural networks are equivalent to Turing machines: algorithms.
- The backpropagation algorithm will find a configuration of the network that imitates the behavior of the data.
"Multilayer feedforward networks with a nonpolynomial activation function can approximate any function". Neural Networks. 6 (6): 861–867. Siegelmann, H. T., & Sontag, E. D. (1992, July). On the computational power of neural nets. In Proceedings of the fifth annual workshop on Computational learning theory (pp. 440-449).
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. nature, 323(6088), 533-536.
[In English] Python AI: How to Build a Neural Network & Make Predictions
https://realpython.com/python-ai-neural-network/
[In Spanish, Google-translated] Python AI: Cómo construir una red neuronal y hacer predicciones https://realpython-com.translate.goog/python-ai-neural-network/?_x_tr_sl=en&_x_tr_tl=es&_x_tr_hl=en-US&_x_tr_pto=wapp
Sites
- Lewis, O. (2023). Awesome Artificial Intelligence (AI).
[English] https://github.com/owainlewis/awesome-artificial-intelligence
Books
- Russell, S., & Norvig, P. (2021). Artificial Intelligence: A Modern Approach (4th US ed.). [English, official site] https://aima.cs.berkeley.edu/
- Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2020). Dive into Deep Learning. [Inglés]https://d2l.ai/
Courses
- DEEP LEARNING · Deep Learning. (n.d.). https://atcold.github.io/NYU-DLSP21/
Videos
- Irving Vasquez. (2022, August 23). Introducción a las redes neuronales - Presentación del curso. [Spanish]
Understanding LSTM Networks:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks and LSTM explained:
https://purnasaigudikandula.medium.com/recurrent-neural-networks-and-lstm-explained-7f51c7f6bbb9
Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM): https://www.youtube.com/watch?v=WCUNPb-5EYI
https://homes.cs.washington.edu/~pedrod/
Tensorflow
TensorFlow APIs for training
TensorFlow offers the input pipeline (tf.data), Keras, and Estimator. Distributed processing is done with tf.distribute.
Why an input pipeline? Because the data might not fit in memory, we want to utilize hardware efficiently, and we want to decouple loading from preprocessing. ETL is the typical set of stages for batch processing: the Extract stage reads from memory or remote storage and parses the file format; the Transform stage performs domain-specific transformations; the Load stage transfers data to the accelerator.
GPU/TPU processing power has a big gap with respect to CPU processing.
A typical batch-processing pipeline for deep learning looks like:
dataset = read data from storage as a stream
dataset = apply distributed pipeline operators to the dataset, executed as a dataflow graph
model = build the architecture of the model with high-level APIs
model.fit(dataset)
Optimizations include software pipelining, parallel transformation, and parallel extraction.
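A hedged tf.data sketch of the ETL stages and optimizations above: interleave for parallel extraction, map with num_parallel_calls for parallel transformation, and prefetch for software pipelining. The file pattern and feature spec are assumptions.

import tensorflow as tf

# Extract: read (assumed) TFRecord shards in parallel
files = tf.data.Dataset.list_files("data/train-*.tfrecord")
dataset = files.interleave(tf.data.TFRecordDataset,
                           num_parallel_calls=tf.data.AUTOTUNE)

# Transform: parse and apply domain-specific transformations in parallel
def parse(record):
    features = tf.io.parse_single_example(record, {
        "x": tf.io.FixedLenFeature([20], tf.float32),  # assumed feature spec
        "y": tf.io.FixedLenFeature([], tf.int64),
    })
    return features["x"], features["y"]

dataset = dataset.map(parse, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.shuffle(10_000).batch(32)

# Load: prefetch overlaps preprocessing with accelerator compute
dataset = dataset.prefetch(tf.data.AUTOTUNE)
# model.fit(dataset)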
TensorFlow Datasets is a project that provides ready-to-use datasets to onboard new users.
Unsupervised Learning - Clustering
Data has lots of rich latent structures. We want methods to discover these structures automatically.
Input: a training set of input points
Output: assignment of each point to a cluster
K-means algorithm
DBSCAN
Hierarchical clustering
K-Means algorithm
Even though K-means is not the best algorithm for clustering data, it illustrates the task and a simple solution. In K-means, each cluster is represented by a centroid, and the objective is to assign each vector to its closest centroid. Formally, the objective function is $\text{Loss}_{\text{kmeans}}(z, \mu) = \sum_{i=1}^{n} \lVert \phi(x_i) - \mu_{z_i} \rVert^2$.
Algorithm: K-means
Initialize μ_1, ..., μ_K randomly
for t = 1, ..., T:
    Step 1: set assignments z given μ
    Step 2: set centroids μ given z
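A compact numpy implementation of the pseudocode above; the two-blob toy data and the fixed iteration count are assumptions (production code would check for convergence).

import numpy as np

def kmeans(X, K, T=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()  # random centroids
    z = np.zeros(len(X), dtype=int)
    for _ in range(T):
        # Step 1: set assignments z given mu (closest centroid)
        distances = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        z = distances.argmin(axis=1)
        # Step 2: set centroids mu given z (mean of assigned points)
        for k in range(K):
            if np.any(z == k):
                mu[k] = X[z == k].mean(axis=0)
    return mu, z

X = np.vstack([np.random.default_rng(1).normal(0, 0.5, size=(20, 2)),
               np.random.default_rng(2).normal(5, 0.5, size=(20, 2))])
mu, z = kmeans(X, K=2)
print(mu)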
Knowledge in Learning
Learning Probabilistic Models
Reinforcement Learning
* Communicating, perceiving, and acting
MACTI (temporary)
Induction
https://www.youtube.com/watch?v=9AwJrXAz9QA
https://www.youtube.com/watch?v=CDYLHa63ws4
https://www.youtube.com/watch?v=ERYgaGKaHoE
https://www.youtube.com/watch?v=KX4DdZeRAsI
https://www.youtube.com/watch?v=zdpDR_F2ovg
https://www.youtube.com/watch?v=eAmFytbeNTc
Morozov, E. (2023, April 3). Ni es inteligente ni es artificial: esa etiqueta es una herencia de la Guerra Fría. El País. https://elpais.com/ideas/2023-04-03/ni-es-inteligente-ni-es-artificial-esa-etiqueta-es-una-herencia-de-la-guerra-fria.html
Varios. (2023, March 10). Declaración de Montevideo sobre Inteligencia Artificial y su impacto en América Latina. https://www.fundacionsadosky.org.ar/declaracion-de-montevideo-sobre-inteligencia-artificial-y-su-impacto-en-america-latina/
Podcast T4-E06-Sebastián Ramírez-Contribuyendo al Opensource • Saturdays.AI. (n.d.). Retrieved June 12, 2023
[Podcast] https://saturdays.ai/2022/09/07/podcast-t4-e06-sebastian-ramirez-contribuyendo-al-opensource/
- Parr, T., & Howard, J. (2018). The Matrix Calculus You Need For Deep Learning (arXiv:1802.01528). arXiv. [English] https://doi.org/10.48550/arXiv.1802.01528
- Kutyniok, G. (2022). The Mathematics of Artificial Intelligence (arXiv:2203.08890). arXiv. [English] https://doi.org/10.48550/arXiv.2203.08890
- Werness, B., & Hu, R. (n.d.). 22. Appendix: Mathematics for Deep Learning — Dive into Deep Learning 1.0.0-beta0 documentation. [English] https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/index.html
- Berner, J., Grohs, P., Kutyniok, G., & Petersen, P. (2022). The Modern Mathematics of Deep Learning (pp. 1–111). [English] http://arxiv.org/abs/2105.04026
- Porat, B. (2014). A Gentle Introduction to Tensors. Israel: Department of Electrical Engineering, Technion, Israel Institute of Technology. [English] https://www.ese.wustl.edu/~nehorai/Porat_A_Gentle_Introduction_to_Tensors_2014.pdf
GPT-3: La supernova del modelado del lenguaje | Ivan Vladimir Meza Ruiz. (2023). Personal blog. https://turing.iimas.unam.mx/~ivanvladimir/posts/chat-gpt/
Python programming
Others
https://github.com/ivanvladimir/Proyectos-MeIA/tree/main
Explainability of Complex Machine Learning Models
graph TD
ExplanationMethods --> ExplainableByDesign --> ExplanationsForGlassBoxes
ExplainableByDesign --> EngineeredExplanations
ExplanationMethods --> PostHocExplanationsForBlackBoxModels
PostHocExplanationsForBlackBoxModels --> Local
Local --> Counterfactuals
Local --> FeatureImportance
FeatureImportance-->LIME
FeatureImportance-->SHAP
FeatureImportance-->DALEX
FeatureImportance-->NAM
FeatureImportance-->CIU
FeatureImportance-->GRADCAM
FeatureImportance-->IG
Local --> Prototypes
PostHocExplanationsForBlackBoxModels --> Global
Global--> Prototypes
Global-->SetOfLocalExplanations
Global-->ModelDistillation
Post-hoc local explanations: feature importance, rule-based explanations, prototypes, counterfactuals.
Perception
Robotics
*Philosophical Foundations
Weak AI: Can Machines Act Intelligently?
Strong AI: Can Machines Really Think?
The Ethics and Risks of Developing Artificial Intelligence
AI: Present and Future
Build your product with Artificial Intelligence
Low level
Tensorflow
OpenCV
dlib
Mid level
https://developers.google.com/mediapipe
High level
face_recognition
https://github.com/steven2358/awesome-generative-ai
TODO
independently and identically distributed