1. The Machine Learning Landscape
TERM
- Training set : the examples that the system uses to learn
- Training instance (sample) : each training example
- Accuracy : a performance measure; the ratio of correctly predicted examples (commonly used for classification)
- Data Mining : applying ML techniques to dig into large amounts of data to help discover patterns that were not immediately apparent
- Labels : the desired solutions included in the training data you feed to the algorithm
(the value you want to predict = the answer key)
- Learning rate : how fast an online learning system should adapt to changing data
CHAPTER 1) The Machine Learning Landscape
: clarifying what ML is & why we may want to use it.
1. What?
💡 science (and art) of programming computers so they learn from data.
💡 field of study that gives computers the ability to learn without being explicitly programmed - Arthur Samuel, 1959
2. Why Use?
- programs that automatically learn → much shorter code / easier to maintain / most likely more accurate
- ML can help humans learn
[Figure: the Machine Learning approach]
3. Types
- whether or not they are trained with human supervision
: supervised / unsupervised / semisupervised / Reinforcement Learning
- whether or not they can learn incrementally on the fly
: online / batch learning
- simply comparing new data points to known data points vs. detecting patterns in the training data to build a predictive model
: instance-based / model-based learning
A) Supervised / Unsupervised Learning
: classified according to the amount and type of supervision they get during training.
- Supervised Learning
💡 : need to give it many examples including both their predictors and their labels.
⇒ classification : predict a class (e.g., spam / not spam)
⇒ regression : predict a target numeric value given a set of features (= attributes), called predictors
- Most important SL algorithms
k-Nearest Neighbors / Linear Regression / Logistic Regression / Support Vector Machines / Decision Trees and Random Forests / Neural Networks
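A minimal supervised-learning sketch with scikit-learn (the tiny dataset below is invented for illustration): the model is given both the predictors and their labels during training, then classifies new instances.

```python
# Supervised learning: fit on labeled examples, predict labels for new data.
from sklearn.linear_model import LogisticRegression

X_train = [[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]]  # predictors (one feature)
y_train = [0, 0, 0, 1, 1, 1]                           # labels (desired solutions)

clf = LogisticRegression()
clf.fit(X_train, y_train)           # learn from (predictors, labels) pairs
print(clf.predict([[2.5], [8.5]]))  # -> [0 1]
```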
- Unsupervised Learning
💡 : the training data is unlabeled (learn without a teacher)
- Most important UL algorithms
- Clustering (K-Means / DBSCAN / Hierarchical Cluster Analysis (HCA))
: detect groups of similar instances / find connections without help (see the sketch after this list)
- hierarchical clustering algorithm : subdivide each group into smaller groups
- Anomaly detection and novelty detection (One-class SVM / Isolation Forest)
: detecting unusual transactions to prevent fraud, catching manufacturing defects, automatically removing outliers from a dataset
cf) Novelty detection : the algorithm is trained on only normal data, then flags new instances that look different
- Visualization and dimensionality reduction
(Principal Component Analysis (PCA) / Kernel PCA / Locally-Linear Embedding (LLE) / t-distributed Stochastic Neighbor Embedding (t-SNE))
- Visualization algorithms : output a 2D or 3D representation of the data that can easily be plotted
cf) Dimensionality reduction : simplify the data without losing too much information
⇒ runs much faster, uses less disk & memory space, may perform better
cf) Feature extraction : merge several correlated features into one
- Association rule learning (Apriori / Eclat)
: dig into large amounts of data and discover interesting relations between attributes.
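A minimal unsupervised sketch (toy blobs invented for illustration): no labels are given, K-Means discovers the groups on its own, and PCA reduces the dimensionality for plotting.

```python
# Unsupervised learning: clustering + dimensionality reduction, no labels used.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# two well-separated blobs in 3D; the algorithm never sees which is which
X = np.vstack([rng.normal(0, 0.5, (50, 3)), rng.normal(5, 0.5, (50, 3))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:3], kmeans.labels_[-3:])  # two discovered clusters

X_2d = PCA(n_components=2).fit_transform(X)  # 3D -> 2D for visualization
print(X_2d.shape)                            # (100, 2)
```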
- Semisupervised Learning
💡 a combination of a lot of unlabeled data & a little bit of labeled data
- Reinforcement Learning
💡 The learning system (the agent) observes the environment, selects and performs actions, and gets rewards/penalties in return.
: must learn by itself what the best strategy (= policy) is
cf) AlphaGo
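A minimal sketch of that observe → act → reward loop, using tabular Q-learning on a hypothetical 5-cell corridor (the environment and all constants here are invented for illustration, not from the book):

```python
# Tabular Q-learning: the agent learns a policy (best action per state) from rewards.
import random

N_STATES = 5        # cells 0..4; the reward sits at cell 4
ACTIONS = [-1, 1]   # step left / step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # nudge Q toward (reward + discounted best future value)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# the learned policy should prefer moving right (+1) in every cell
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```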
B) Batch / Online Learning
: whether or not the system can learn incrementally from a stream of incoming data
- Batch Learning
💡 : incrementally X ⇒ must be trained using all the available data (all at once)
Offline learning : first the system is trained, then it is launched into production and runs without learning anymore
⇒ if new data arrives ⇒ train a new version of the system from scratch on the full dataset, then stop the old system and replace it with the new one
😓Disadvantages😓
⇒ take a lot of time/computing resources/money ⇒ done offline
⇒ if amount of data is huge, may be IMPOSSIBLE
😀Advantages😀
⇒ simple and often works fine
- Online Learning
💡 train the system incrementally by feeding it data instances sequentially, either individually or by small groups
: mini-batches (small chunks of instances fed in sequentially)
- out-of-core learning : using online learning to train on huge datasets that cannot fit in one machine's main memory (load part of the data, run a training step, repeat)
😀Advantages😀
⇒ fast and cheap
⇒ change rapidly & autonomously
⇒ save a huge amount of space.
😓Disadvantages😓
⇒ high learning rate → rapidly adapts to new data, BUT quickly forgets the old data
⇒ low learning rate → learns more slowly, BUT is less sensitive to noise in the new data
⇒ if bad data is fed → clients will notice that performance will gradually decline.
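A minimal online-learning sketch with scikit-learn's SGDRegressor (the simulated stream below is invented for illustration): partial_fit updates the model one mini-batch at a time instead of retraining from scratch, so each batch can be discarded after use.

```python
# Online learning: incremental updates from a stream of mini-batches.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01)  # eta0 = learning rate

for step in range(200):                    # simulated data stream
    X_batch = rng.uniform(0, 10, (32, 1))  # mini-batch of 32 instances
    y_batch = 3.0 * X_batch.ravel() + rng.normal(0, 0.1, 32)
    model.partial_fit(X_batch, y_batch)    # incremental step; old data not needed

print(model.coef_)  # should approach [3.0]
```

A higher eta0 would track changes faster but, as noted above, would also forget old data faster.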
C) Instance-Based / Model-Based Learning
: How they generalize
- Instance-based Learning
💡 learn the examples by heart, then generalize to new cases by comparing them to the learned examples using a measure of similarity
: memorizing exact matches alone is certainly not the best way to generalize; the similarity measure does the real work
ex) k-Nearest Neighbors (KNN), see the sketch below
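A minimal instance-based sketch (toy data invented for illustration): KNN stores the training instances and classifies a new point by looking at its k most similar stored neighbors, rather than learning model parameters.

```python
# Instance-based learning: generalize by similarity to stored examples.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[1.5], [10.5]]))  # -> [0 1], decided by the 3 nearest neighbors
```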
- Model-based Learning
💡 build a model of these examples, then use the model to make predictions
model selection → define a utility function (= fitness function) that measures how good the model is, or a cost function that measures how bad it is
→ study the data
→ select a model
→ train it (the learning algorithm searches for the model parameter values that minimize the cost function)
→ apply the model to make predictions on new cases (= inference), hoping that the model will generalize well
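A minimal model-based sketch of those four steps (the toy dataset is invented for illustration): a linear model is selected, trained to minimize squared error, then used for inference.

```python
# Model-based learning: select a model, fit parameters, predict new cases.
import numpy as np
from sklearn.linear_model import LinearRegression

# 1. study the data: y looks roughly linear in x
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# 2. select a model: y = theta0 + theta1 * x
model = LinearRegression()

# 3. train it: find the parameter values that minimize the squared-error cost
model.fit(X, y)

# 4. inference: apply the model to a new case, hoping it generalizes well
print(model.predict([[6.0]]))  # roughly 12
```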
4. Main Challenges of Machine Learning
: Bad Algorithm / Bad data
1) Insufficient Quantity of Training Data
: Even for very simple problems, typically need thousands of examples.
2) Nonrepresentative Training Data
: training data must be representative of the new cases you want to generalize to
⇒ if not, unlikely to make accurate predictions
sample too small → sampling noise (nonrepresentative data by chance)
sample large but sampling method flawed → sampling bias / ex) the 1936 Literary Digest poll that wrongly predicted Roosevelt would lose
3) Poor-Quality Data
: will make it harder for the system to detect the underlying patterns → less likely to perform well.
→ spend a lot of time cleaning up your training data.
4) Irrelevant Features
: “Garbage in, Garbage out”
- feature engineering : coming up with a good set of features to train on (enough relevant features & not too many irrelevant ones)
- feature selection : selecting the most useful features to train on among existing features (see the sketch after this list)
- feature extraction : combining existing features to produce a more useful one
- creating new features by gathering new data
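A minimal feature-selection sketch (data and coefficients invented for illustration): SelectKBest scores each feature against the target and keeps only the k most useful ones, dropping the irrelevant ones.

```python
# Feature selection: keep the features that actually relate to the target.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                # 5 candidate features
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.1, 100)  # only 0 and 3 matter

selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
print(selector.get_support())  # -> [ True False False  True False]
```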
5) Overfitting the Training Data
: the model performs well on the training data, but it does not generalize well.
It happens when the model is too complex relative to the amount and noisiness of the training data.
Find the right balance : fitting the training data perfectly & keeping the model simple enough to ensure that it will generalize well.
- To simplify the model by selecting one with fewer parameters
- To gather more training data
- To reduce the noise
- Regularization : constraining a model to make it simpler and reduce the risk of overfitting (see the sketch below)
cf) Training error(LOW) & Generalization error(HIGH) ⇒ OVERFITTING
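A minimal regularization sketch (toy data invented for illustration): an unconstrained degree-15 polynomial model chases the noise in 20 training points, while Ridge penalizes large weights; both score well on the training set, but the constrained model typically generalizes better, matching the cf) note above.

```python
# Regularization: Ridge constrains the weights to reduce overfitting.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

def make_data(n):
    X = rng.uniform(-1, 1, (n, 1))
    y = 0.5 * X.ravel() + rng.normal(0, 0.3, n)  # noisy, actually linear data
    return X, y

X_train, y_train = make_data(20)
X_test, y_test = make_data(200)

poly = PolynomialFeatures(degree=15)             # deliberately too complex
Xp_train = poly.fit_transform(X_train)
Xp_test = poly.transform(X_test)

free = LinearRegression().fit(Xp_train, y_train)  # unconstrained: fits the noise
ridge = Ridge(alpha=1.0).fit(Xp_train, y_train)   # constrained: simpler model

# low training error but high generalization error = overfitting;
# the regularized model usually wins on the test set here
print(free.score(Xp_test, y_test), ridge.score(Xp_test, y_test))
```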
6) Underfitting the Training Data
: occurs when the model is too simple to learn the underlying structure of the data.
- Selecting a more powerful model, with more parameters (see the sketch below)
- Feeding better features to the learning algorithm (feature engineering)
- Reducing the constraints on the model.
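A minimal underfitting sketch (toy data invented for illustration): a plain line is too simple for quadratic data; adding polynomial features gives the model more parameters so it can capture the underlying structure.

```python
# Fixing underfitting: move to a more powerful model via polynomial features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, (100, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.1, 100)   # quadratic relationship

simple = LinearRegression().fit(X, y)          # underfits: just a line
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
stronger = LinearRegression().fit(X_poly, y)   # matches the structure

print(simple.score(X, y), stronger.score(X_poly, y))  # low R^2 vs. near 1.0
```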