STA314: Statistical Methods for Machine Learning I

Overview

Machine learning (ML) is a set of techniques that allow computers to learn from data and past experience, rather than requiring humans to specify the desired behaviour by hand. ML has become increasingly central both to statistics as an academic discipline and to the data science industry. This course provides a broad introduction to commonly used ML methods, as well as the key statistical concepts underlying ML. It serves as a foundation for more advanced courses, such as STA414 (Statistical Methods for Machine Learning II).

We will cover popular statistical methods for supervised and unsupervised learning from data as well as important concepts used in the field, including: training error, test error and cross-validation; classification, regression, and logistic regression; variable selection; penalized regression; principal components analysis; stochastic gradient descent; decision trees and random forests; k-means clustering and nearest neighbour methods. Computational tutorials will support effective application of these methods.



Announcements

  • Homework 4 solution is available below.
  • Homework 3 solution is available below.
  • The last tutorial will be on Dec 4th.
  • Homework 4 will be due on Nov 29th, 11:59pm.
  • The midterm will be on Oct 25th. We will have makeup classes on Oct 23rd.
  • Homework 1 will be released at 11:59pm on September 20th via Crowdmark.
  • The first lecture starts on September 11th.


Course information

Course email and textbooks

Staff

Instructor | Email | Office | Office hours (OH) | OH mode
Xin Bing | xin.bing@utoronto.ca | UY 9192 | Mon, 3:00PM-4:00PM; Tue, 3:00PM-4:00PM | Zoom link

TAs

Section | TA | OH | OH mode
LEC0101 | Liam Welsh | Fri, 11AM-12PM | Zoom link
LEC0101 | Junhao Zhu | Mon, 9AM-10AM | In-person, SS621
LEC0101 | Weizheng Zhang | Tue, 10AM-11AM | Zoom link
LEC0201 | Ziyi Liu | Thu, 1PM-2PM | Zoom link
LEC0201 | Haochen Song | Fri, 9AM-10AM | Zoom link
LEC0201 | Tong Li | Mon, 10AM-11AM | Zoom link

Lectures

Section | Time | Location
LEC0101 | Wed, 11AM-1PM | MP103
LEC0201 | Wed, 3PM-5PM | SS2118

Tutorials

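Section | Tutorial Session | Time | Location | TA
LEC0101 | 101 | Mon, 12PM-1PM | SS1083 | Junhao
LEC0101 | 102 | Mon, 12PM-1PM | SS2105 | Weizheng
LEC0101 | 103 | Mon, 12PM-1PM | UC52 | Liam
LEC0201 | 201 | Mon, 4PM-5PM | SS1072 | Haochen
LEC0201 | 202 | Mon, 4PM-5PM | SS1074 | Ziyi
LEC0201 | 203 | Mon, 4PM-5PM | SS1087 | Tong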


Homework & exams

Item | Out | Due | Credit | Content | Solution
Homework 1 | Sep 21, 12 AM | Oct 4, 11:59 PM | 10% | Lec01-03 | [Q1-Q4.pdf], [Q5.pdf]
Homework 2 | Oct 5, 12 AM | Oct 18, 11:59 PM | 10% | Lec04-05 | [Q1.pdf], [Q4-Q5.pdf]
Midterm exam | Oct 25, location: EX200 | LEC0101, 11AM-1PM; LEC0201, 3PM-5PM | 30% | Lec01-06 | [sol-101.pdf], [sol-201.pdf]
Homework 3 | Nov 2, 12 AM | Nov 19, 11:59 PM | 10% | Lec06-08 | [Q1-Q3.pdf], [Q4-Q5.pdf]
Homework 4 | Nov 16, 12 AM | Nov 29, 11:59 PM | 10% | Lec06-10 | [sol-derivation], [sol-coding]
Final exam | Dec 9, 2-4 PM | | 30% | Lec01-11 |


Lectures

This is a preliminary schedule; it may change throughout the term.

Dates | Lecture Topic | Lecture Slides | Tutorial | Suggested Readings
Sep 11 | Logistics and introduction | [Lec00.pdf] | No tutorial | ISL 1
Sep 13 | Introduction to Statistical Learning, the bias-variance tradeoff | [Lec01.pdf] | Sep 18, [Notes], [R code] | [Linear algebra review]; ISL 2.1, 2.2, 2.3
Sep 20 | Linear regression | [Lec02.pdf] | Sep 25, [R code] | ISL 3.1, 3.2, 3.3, 3.6
Sep 27 | Model selection under linear models, cross-validation | [Lec03.pdf] | Oct 2, [Rmd] | ISL 5.1, 6.1.1, 6.1.2, 6.1.3; ESL 7.10
Oct 4 | Subset selection under linear models, regularized linear regression | [Lec04.pdf] | No tutorial, Thanksgiving; [OLS vs Ridge] (optional reading) | ISL 6.1.1, 6.1.2, 6.2, 6.4; ESL 7.10
Oct 11 | More on regularized linear regression | [Lec05.pdf] | Oct 16 | ISL 6.2, 6.4
Oct 18 | Move beyond linearity | [Lec06.pdf] | No tutorial | ISL 7.1-7.4, 7.6, 7.7
Oct 23 | Introduction to classification | [Lec07.pdf] | Oct 30, [Rmd] | ISL 4.1, 4.2
Nov 1 | Logistic regression, gradient descent | [Lec08.pdf] | No tutorial | ISL 4.3-4.6, 9.1-9.5; ESL 4.3, 4.4, 12.1, 12.2; PRML 4.1, 4.3; ConvOpt 2.1-2.3, 3.1, 3.2, 4.1, 4.2; [Multivariate calculus]
Nov 15 | Multi-class logistic regression, discriminant analysis | [Lec09.pdf] | Nov 20, [Rmd] | ISL 4.3-4.6; ESL 5.1, 5.2; ConvOpt 2.5, 5.1, 5.2
Nov 22 | Tree based approaches: decision tree | [Lec10.pdf] | No tutorial | ISL 8.1; ESL 9.2
Nov 29 | Bootstrap, bagging, random forest, boosting | [Lec11.pdf] | Dec 4, [Rmd] | ISL 5.2, 5.3.4, 8.2; ESL 7.11, 8.7, 10.1, 15; [Notes on gradient boosting machine]
Dec 6 | Unsupervised learning: K-means clustering and PCA | [Lec12.pdf] | Class ends |