Basic ML

Pr4AI Study

2022. 8. 29. 19:45

These notes are based on Programming for AI, a course taught by Professor 최윤재 at the KAIST 김재철 AI 대학원.

 

Artificial Intelligence

Make machines/computers mimic human intelligence

A concept as old as the computer itself (e.g., Alan Turing's chess program)

 

Machine Learning

Use statistical methods to teach machines to learn from data to be good at a specific task (e.g., spam filtering)

 

Deep Learning

Train machines to be good at complex tasks based on neural networks and massive data

 

Machine Learning Categories

  • Supervised Learning: Learn a function that maps an input x to an output y
    • Examples
      • Image classification
      • French-English translation
      • Image captioning
  • Unsupervised Learning: Learn a distribution / manifold function of data X (no label y)
    • Examples
      • Clustering
      • Low-rank matrix factorization
      • Kernel density estimator
      • Generative Models
  • Reinforcement Learning: Given an environment E and a set of actions A, learn a function that maximizes the long-term reward R
    • Examples
      • Go
      • Atari
      • Self-driving car

The boundary between supervised and unsupervised learning is blurry...

 

Optimization

Every model needs training (i.e., learning a function).

\(f(x; \theta)\) -> objective function (our goal)

Find the \(\theta\) that minimizes \(\text{loss}(y, f(x; \theta))\) -> optimization

 

Find \(\theta^*\) -> minimize the loss -> achieve the objective -> successfully learn the function (= train the model)

In practice we proceed in the reverse order: starting from the function we want to learn, we set an objective, express it as a loss, and then search for the \(\theta^*\) that minimizes it.

 

How to Optimize Model

Numerical method: Iteratively find a better \(\theta\) until you are satisfied

Gradient Descent: Updating parameters \(\theta\) based on training data X, Y

Stochastic Gradient Descent: Updating parameters based on a subset of the training data

Why use SGD? The full dataset may be too big to process at once; the noise in minibatch gradients helps escape local minima; and if the minibatches are i.i.d., the SGD update matches the full-batch GD update in expectation.
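Below is a minimal NumPy sketch of minibatch SGD for a linear model under squared loss; the hyperparameter names (lr, n_epochs, batch_size) are illustrative, not from the lecture.

```python
import numpy as np

# Minimal sketch of minibatch SGD for linear regression under squared loss.
# Hyperparameter names (lr, n_epochs, batch_size) are illustrative.
def sgd_linear_regression(X, y, lr=0.01, n_epochs=100, batch_size=32):
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_epochs):
        idx = np.random.permutation(n)  # shuffle so minibatches are ~i.i.d.
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            pred = X[batch] @ theta                                  # f(x; theta) = theta^T x
            grad = 2 * X[batch].T @ (pred - y[batch]) / len(batch)   # d(MSE)/d(theta)
            theta -= lr * grad                                       # descend along the gradient
    return theta
```

Setting batch_size = n recovers plain gradient descent; batch_size = 1 is the classic stochastic version.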

 

Evaluation

When do we stop training?

  • Accuracy: Used for multi-class classification
  • Area under the ROC(AUROC): Used for binary classification
  • Precision & Recall: Used for information retrieval (see the sketch after this list)
  • BLEU Score: Used for machine translation
  • Perplexity: Used for language modeling
  • FID Score: Used for image generation
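As a quick illustration of the classification metrics above, here is a hypothetical NumPy sketch computing accuracy, precision, and recall from 0/1 label arrays (the names y_true and y_pred are assumptions):

```python
import numpy as np

# Hypothetical sketch: accuracy, precision, and recall from 0/1 label arrays.
def binary_metrics(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)  # of the predicted positives, how many were right
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    return accuracy, precision, recall
```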

 

Train & Validation & Test

N-fold Cross Validation: Evaluate the model's performance across multiple train/validation/test splits

Not often used in modern deep learning, where even a single training run is expensive
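A minimal sketch of N-fold cross validation, assuming a hypothetical train_and_score callback that fits a model on the train split and returns a validation score:

```python
import numpy as np

# Minimal N-fold cross validation sketch. `train_and_score` is a
# hypothetical callback: fit on the train split, score on the held-out fold.
def n_fold_cv(X, y, train_and_score, n_folds=5):
    folds = np.array_split(np.random.permutation(len(X)), n_folds)
    scores = []
    for i in range(n_folds):
        val_idx = folds[i]  # fold i is held out for validation
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(train_and_score(X[train_idx], y[train_idx],
                                      X[val_idx], y[val_idx]))
    return np.mean(scores)  # average performance across folds
```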

Overfitting & Underfitting

Data complexity vs. model capacity: a model with too much capacity for the data overfits; one with too little underfits

Regularization

Restrict the freedom of your model (downsize the hypothesis space)

\(L_1\) regularization, \(L_2\) regularization
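As a concrete formulation (a standard one, not quoted from the lecture): with regularization strength \(\lambda\), the training objective becomes \(\text{loss}(y, f(x; \theta)) + \lambda \|\theta\|_1\) for \(L_1\) and \(\text{loss}(y, f(x; \theta)) + \lambda \|\theta\|_2^2\) for \(L_2\). A larger \(\lambda\) shrinks the hypothesis space more aggressively; \(L_1\) additionally pushes weights to exactly zero, giving sparse models.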

 

Popular Classifiers

  • Logistic Regression (see the sketch after this list)
    • Probability -> Odds -> Log of odds (Logit)
      Logit: \(\ln\frac{p}{1-p}\)
    • Assume the logit can be modeled linearly
    • Trained via gradient descent
  • Support Vector Machine (SVM)
    • Maximize the margin between two classes
    • Trained via constrained optimization (Lagrange multipliers)
  • Decision Tree
    • Build a tree based on features
    • Trained via the CART algorithm
  • Ensembles
    • Use multiple classifiers to improve performance
    • Bagging
    • Boosting
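Following the logit formulation above, here is a minimal sketch of logistic regression trained via gradient descent: we model \(\ln\frac{p}{1-p} = \theta^\top x\), so \(p = \sigma(\theta^\top x)\). Hyperparameter names are illustrative.

```python
import numpy as np

# Minimal logistic regression sketch: model the logit linearly,
# ln(p/(1-p)) = theta^T x, i.e. p = sigmoid(theta^T x),
# and train via gradient descent on the cross-entropy loss.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, n_epochs=500):
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        p = sigmoid(X @ theta)         # predicted P(y = 1 | x)
        grad = X.T @ (p - y) / len(y)  # gradient of mean cross-entropy
        theta -= lr * grad
    return theta
```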

Popular Clustering

  • K-means
    • Update membership of each sample using the closest centroid
    • Update the centroid value using all the member samples
    • Repeat the above two steps until convergence (see the sketch after this list)
  • Mixture of Gaussians
    • Generalization of K-means clustering
    • A sample has a probabilistic membership to each cluster
    • Trained via Expectation-Maximization (EM)
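A minimal sketch of the two alternating K-means steps above (assignment and centroid update); n_iters is an illustrative stopping rule, and the sketch assumes no cluster goes empty.

```python
import numpy as np

# Minimal K-means sketch mirroring the two alternating steps:
# (1) assign each sample to its closest centroid,
# (2) recompute each centroid as the mean of its member samples.
def kmeans(X, k, n_iters=100):
    centroids = X[np.random.choice(len(X), k, replace=False)]  # init from data
    for _ in range(n_iters):
        # (1) membership update: distance of every sample to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (2) centroid update: mean of each cluster's members (assumes none empty)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids
```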

 
