These notes are based on the course "인공지능을 위한 프로그래밍" (Programming for Artificial Intelligence) taught by Professor 최윤재 at the KAIST 김재철 AI 대학원 (Kim Jaechul Graduate School of AI).
Artificial Intelligence
Make machines/computers mimic human intelligence
Concept as old as the computer (Chess program by Alan Turing)
Machine Learning
Use statistical methods to teach machines to learn from data to be good at a specific task (Spam filtering)
Deep Learning
Train machines to be good at complex tasks based on neural networks and massive data
Machine Learning Categories
- Supervised Learning: Learn a function that maps an input x to an output y
  - Examples
    - Image classification
    - French-English translation
    - Image captioning
- Unsupervised Learning: Learn a distribution / manifold of the data X (no labels y)
  - Examples
    - Clustering
    - Low-rank matrix factorization
    - Kernel density estimation
    - Generative models
- Reinforcement Learning: Given an environment E and a set of actions A, learn a function that maximizes the long-term reward R
  - Examples
    - Go
    - Atari
    - Self-driving cars
The boundary between Supervised Learning and Unsupervised Learning is blurry...
Optimization
Every model needs training (i.e., learning a function).
\(f(x; \theta)\) -> the function we want to learn (our objective)
Find \(\theta\) that minimizes \(\mathrm{loss}(y, f(x; \theta))\) -> optimization
Finding \(\theta^*\) that minimizes the loss achieves the objective -> the function is successfully learned (= the model is trained)
Our approach runs this chain in reverse: we reach the goal by minimizing the loss.
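To make this concrete, here is a minimal worked example (my own, not from the lecture), assuming a one-dimensional linear model and MSE loss:

\[
f(x; \theta) = \theta x, \qquad
\mathrm{loss}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - \theta x_i\bigr)^2, \qquad
\theta^{*} = \arg\min_{\theta}\,\mathrm{loss}(\theta) = \frac{\sum_i x_i y_i}{\sum_i x_i^{2}}
\]

Setting \(\frac{d}{d\theta}\mathrm{loss}(\theta) = 0\) yields the closed form above; most models have no closed form, which is why we need the numerical methods below.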
How to Optimize a Model
Numerical method: Iteratively find a better \(\theta\) until you are satisfied
Gradient Descent (GD): Update the parameters \(\theta\) using the entire training data X, Y

Stochastic Gradient Descent (SGD): Update the parameters using a subset (minibatch) of the training data
Why use SGD?
- The full dataset may be too big to process in one update
- The noise from minibatch gradients helps escape local minima
- If the minibatches are i.i.d., the expected SGD update equals the GD update
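Below is a minimal NumPy sketch of both update rules (my own illustration, not from the lecture); the learning rate, batch size, and synthetic data are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1000)                                # 1-D inputs
y = 3.0 * X + rng.normal(scale=0.1, size=1000)           # targets, true slope = 3

lr = 0.1

# Full-batch Gradient Descent: each update uses ALL samples.
theta = 0.0
for _ in range(100):
    grad = -2.0 * np.mean(X * (y - theta * X))           # d(MSE)/d(theta)
    theta -= lr * grad
print("GD  estimate:", theta)

# Stochastic Gradient Descent: each update uses a small minibatch.
theta = 0.0
for epoch in range(10):
    idx = rng.permutation(len(y))                        # reshuffle every epoch
    for batch in np.array_split(idx, len(y) // 32):      # minibatches of ~32
        xb, yb = X[batch], y[batch]
        grad = -2.0 * np.mean(xb * (yb - theta * xb))
        theta -= lr * grad
print("SGD estimate:", theta)
```

Both estimates converge to the true slope; SGD gets there with far cheaper per-step updates, at the cost of noisier trajectories.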

Evaluation
When do we stop training?
- Accuracy: Used for multi-class classification
- Area Under the ROC curve (AUROC): Used for binary classification
- Precision & Recall: Used for information retrieval
- BLEU Score: Used for machine translation
- Perplexity: Used for language modeling
- FID Score: Used for image generation
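As a sketch, assuming scikit-learn is available (the lecture does not prescribe a library), the first few metrics can be computed like this:

```python
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, recall_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]                       # ground-truth labels
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]                       # hard predictions
y_score = [0.1, 0.9, 0.4, 0.3, 0.8, 0.6, 0.7, 0.95]     # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("AUROC    :", roc_auc_score(y_true, y_score))     # needs scores, not hard labels
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
```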
Train & Validation & Test
N-fold Cross Validation: Test the model's performance across multiple train/validation/test splits
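A minimal sketch of N-fold cross validation with scikit-learn (my own example; the dataset and model are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on 4 folds, evaluate on the held-out fold, rotate.
scores = cross_val_score(model, X, y, cv=5)
print("Per-fold accuracy:", scores)
print("Mean accuracy    :", scores.mean())
```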

Overfitting & Underfitting
Data complexity vs. model capacity: a model with too much capacity memorizes the training data (overfitting), while a model with too little capacity cannot even fit it (underfitting).
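A small sketch of this trade-off, assuming scikit-learn and a polynomial-regression setup of my own choosing: a degree-1 model underfits the noisy sine data, a degree-25 model overfits it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.1, size=30)   # noisy sine data

X_test = rng.uniform(-1, 1, size=(200, 1))
y_test = np.sin(3 * X_test[:, 0])

for degree in (1, 4, 25):   # under-capacity, adequate, over-capacity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_err = np.mean((model.predict(X) - y) ** 2)
    test_err = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The degree-25 model drives the training error toward zero while its test error blows up, which is the overfitting signature.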

Regularization
Restrict the freedom of your model (downsize the hypothesis space)
\(L_1\) regularization (encourages sparse weights), \(L_2\) regularization (penalizes large weights)
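A minimal scikit-learn sketch (my own illustration) contrasting the two penalties; `alpha` is the regularization strength:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

# L1 (Lasso) drives many weights exactly to zero -> sparse model.
lasso = Lasso(alpha=1.0).fit(X, y)
# L2 (Ridge) shrinks all weights toward zero but rarely zeroes them out.
ridge = Ridge(alpha=1.0).fit(X, y)

print("L1 nonzero weights:", (lasso.coef_ != 0).sum(), "/ 20")
print("L2 nonzero weights:", (ridge.coef_ != 0).sum(), "/ 20")
```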
Popular Classifiers
- Logistic Regression
  - Probability -> Odds -> Log of odds (logit): \(\ln\frac{p}{1-p}\)
  - Assume the logit can be modeled linearly: \(\ln\frac{p}{1-p} = \theta^\top x\)
  - Trained via gradient descent
- Support Vector Machine (SVM)
  - Maximize the margin between the two classes
  - Trained via constrained optimization (Lagrange multipliers)
- Decision Tree
  - Build a tree based on features
  - Trained via the CART algorithm
- Ensembles
  - Use multiple classifiers to improve performance
  - Bagging
  - Boosting
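A minimal scikit-learn sketch of the classifiers above (my own illustration; all hyperparameters are library defaults, and the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),                               # maximum-margin classifier
    "Decision Tree": DecisionTreeClassifier(),  # CART under the hood
    "Bagging": BaggingClassifier(),             # ensemble: bootstrap + aggregate
    "Boosting": GradientBoostingClassifier(),   # ensemble: sequential weak learners
}

for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    print(f"{name:20s} test accuracy = {clf.score(X_te, y_te):.3f}")
```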
Popular Clustering Methods
- K-means
  - Assign each sample to the closest centroid
  - Update each centroid using all of its member samples
  - Repeat the two steps above until convergence
- Mixture of Gaussians
  - A generalization of K-means clustering
  - Each sample has a probabilistic (soft) membership to every cluster
  - Trained via Expectation-Maximization (EM)
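A minimal scikit-learn sketch of both methods (my own illustration; the blob data is synthetic):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-means: hard assignments, found by alternating assign/update steps.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("K-means labels:", kmeans.labels_[:10])

# Gaussian mixture: soft (probabilistic) memberships, fit with EM.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print("GMM soft memberships (first sample):", gmm.predict_proba(X[:1]).round(3))
```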
