
The following are my notes from the ISLR book, available freely online here. I used Notion's toggles to present these notes in a Q+A format; the Notion page is here. The Jupyter Notebook of the lab and exercises can be found here.

2.1 - What is Statistical Learning?

2.2 - Assessing Model Accuracy

2.4 - Exercises

  1. Inflexible or flexible method?
    1. sample size n is large, number of predictors p is small - a flexible method will perform better because there is enough training data to keep its variance low.
    2. p is extremely large, n is small - an inflexible method is better because with so few data points a flexible method would have very high variance.
    3. relationship is highly non-linear - a flexible method will reduce bias and give an accurate model, but we also need a lot of training samples to keep variance in check.
    4. variance of the error terms is extremely high - an inflexible method is better because it has low variance and will not chase fluctuations in the training data that are really just noise (see the sketch after this list).
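
    A minimal simulation of scenario 4 (the simulated data and degree choices are my own illustration, not from ISLR): with few observations and very noisy errors, a flexible polynomial fit chases the noise, while an inflexible linear fit wins on test error.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: 1 + 2 * x                        # true relationship (nearly linear)
    sigma = 3.0                                    # extremely noisy error terms
    x_tr = rng.uniform(0, 5, 15)                   # small training set
    y_tr = f(x_tr) + rng.normal(0, sigma, 15)
    x_te = rng.uniform(0, 5, 1000)                 # large test set
    y_te = f(x_te) + rng.normal(0, sigma, 1000)

    for degree in (1, 10):                         # inflexible vs flexible fit
        coef = np.polyfit(x_tr, y_tr, degree)      # least-squares polynomial fit
        mse = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
        print(f"degree {degree}: test MSE = {mse:.2f}")
    ```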
  2. Classification or Regression? Inference or prediction? n and p?
    1. collect data on the top 500 firms - profit, number of employees, industry, and CEO salary. Trying to understand which factors affect CEO salary.
      • Regression problem - CEO salary is a quantitative output variable
      • Inference - trying to understand how the different predictors affect the output
      • n = 500
      • p = 3 (CEO salary is the output variable, not a predictor)
    2. a new product will be either a success or a failure. We collect data on 20 similar products - success or failure, price, marketing budget, competition price, and ten other variables.
      • Classification problem - trying to classify the new product as a success or a failure
      • Prediction - based on previous data, predict whether the new product will succeed
      • n = 20
      • p = 13
    3. predicting the % change in the US dollar in relation to weekly changes in world stock markets. We collect weekly data for all of 2012 - % change in the dollar, % change in the US market, % change in the British market, % change in the German market.
      • Regression problem
      • Combination of prediction and inference - we want to predict the result as well as understand what drives it
      • n = 52 (weeks in 2012)
      • p = 3 (% change in the dollar is the output variable)
  3. bias-variance decomposition
    1. draw out curves of Bias^2, Variance, Irreducible error, Training error, Testing error

      Bias-Variance Tradeoff Graph

    2. explain

      1. Bias^2 decreases as model flexibility increases, because a more flexible model can approximate the true underlying function more closely
      2. Variance increases as model flexibility increases, because a more flexible model can produce very different fits on different training sets
      3. Irreducible error is a horizontal line denoting the lower bound on the expected test error; it comes from the variance of the error term (the Bayes error rate is its analogue in classification), which no model can remove
      4. Expected test error = Bias^2 + Variance + Irreducible error, which is why the test error curve is U-shaped and always sits above the irreducible error line (see the sketch below)
      5. Training error approaches 0 as the model gets more flexible and overfits the training data.
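
      The decomposition in point 4 can be checked empirically. A minimal sketch (the true function, noise level, and polynomial degrees are my own assumptions, not from ISLR): fit polynomials of increasing flexibility to many training sets drawn from a known function, then estimate Bias^2 and Variance at a single test point.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      f = lambda x: np.sin(2 * x)      # true function (assumed for illustration)
      sigma = 0.3                      # std dev of the irreducible noise
      x_test, n, n_sims = 1.0, 30, 500

      for degree in (1, 3, 9):         # flexibility = polynomial degree
          preds = np.empty(n_sims)
          for s in range(n_sims):
              x = rng.uniform(-2, 2, n)              # fresh training set each time
              y = f(x) + rng.normal(0, sigma, n)
              coef = np.polyfit(x, y, degree)        # fit a degree-d polynomial
              preds[s] = np.polyval(coef, x_test)    # prediction at the test point
          bias2 = (preds.mean() - f(x_test)) ** 2
          var = preds.var()
          print(f"degree {degree}: bias^2={bias2:.4f} variance={var:.4f} "
                f"expected test MSE ~ {bias2 + var + sigma**2:.4f}")
      ```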
  4. real-life examples
    1. classification - cancer or not based on blood test results; product success or failure based on other similar products; type of bread based on a picture; email spam or not spam
    2. regression - home prices based on location and other features; how much weight somebody can lift based on height and weight; number of people at a concert based on prior ticket sales and location
    3. cluster analysis - grouping movies into genres based on sales data; identifying fashion trends based on price and purchase data
  5. advantages and disadvantages of flexible approach to regression and classification?

    advantages to flexibility

    • able to predict complex functions (non-linear)
    • low bias

    disadvantages to flexibility

    • requires a lot of training data
    • high variance with respect to the training dataset
    • risk of overfitting

    A less flexible method might be better if the data is straightforward to model (close to linear), if there are not many observations in the training data, or if the variance of the error term is high.

  6. a parametric statistical learning method reduces the problem of learning a function down to fitting several parameters. It is a two-step process - make an assumption about the functional form of the data, then estimate those parameters.

    a non-parametric method does not make any assumptions about the underlying function, but it runs into overfitting problems, requires a lot of training data, and needs a careful choice of the smoothness parameter. A minimal contrast between the two is sketched below.
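
    A small sketch of the contrast (the simulated data and K = 5 are my own choices; the estimators are scikit-learn's): linear regression is parametric - the whole fit collapses to two numbers - while KNN regression keeps the entire training set and is controlled only by the smoothness parameter K.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 10, (100, 1))
    y = 2.0 + 0.5 * X.ravel() + rng.normal(0, 1, 100)   # roughly linear data

    # Parametric: assume f(X) = b0 + b1*X, so learning reduces to estimating b0, b1.
    lin = LinearRegression().fit(X, y)
    print("estimated parameters:", lin.intercept_, lin.coef_)

    # Non-parametric: no assumed form; predictions average the K nearest neighbors.
    knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

    x_new = np.array([[4.0]])
    print("linear:", lin.predict(x_new), " KNN:", knn.predict(x_new))
    ```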

  7. KNN example
    1. Euclidean distances from the test point X1 = X2 = X3 = 0 (using the observation table from the exercise):
      • Obs 1 (0, 3, 0, Red): 3
      • Obs 2 (2, 0, 0, Red): 2
      • Obs 3 (0, 1, 3, Red): sqrt(10) ≈ 3.16
      • Obs 4 (0, 1, 2, Green): sqrt(5) ≈ 2.24
      • Obs 5 (-1, 0, 1, Green): sqrt(2) ≈ 1.41
      • Obs 6 (1, 1, 1, Red): sqrt(3) ≈ 1.73

    2. with K = 1, the single nearest neighbor is observation 5 (Green), so we predict Green
    3. with K = 3, the three nearest neighbors are observations 5 (Green), 6 (Red), and 2 (Red), so we predict Red
    4. if the Bayes decision boundary is highly non-linear, we want a flexible model to capture it accurately, so we will want to pick a small K - smaller K gives a more flexible (more jagged) decision boundary. A quick code check of the K = 1 and K = 3 answers follows.
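
    A minimal check of these answers in code (the observation table is the one from the exercise; the classifier is scikit-learn's):

    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Observations from the exercise: columns are X1, X2, X3.
    X = np.array([[0, 3, 0], [2, 0, 0], [0, 1, 3],
                  [0, 1, 2], [-1, 0, 1], [1, 1, 1]])
    y = np.array(["Red", "Red", "Red", "Green", "Green", "Red"])
    test = np.array([[0, 0, 0]])

    print("distances:", np.linalg.norm(X - test, axis=1))
    for k in (1, 3):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
        print(f"K={k} prediction:", knn.predict(test))  # Green for K=1, Red for K=3
    ```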