Assignments
Problem set #1
1. Complete Exercise 4.1 in Hsieh’s book.
2. Complete Exercise 4.5 in Hsieh’s book. (Source data: SWE_Nino_Nina.csv)
3. Complete Exercise 4.6 in Hsieh’s book. (Source data: nino12_long_anom.csv & nino34_long_anom.csv)
4. For the SWE data in SWE_Nino_Nina.csv, calculate the 95% CI of the median value using bootstrapping. Be sure to avoid the percentile method.
5. Reproduce the figure for Q3 in the Pre-course Quiz. (Figure Link) Figure credit: Nicolas P. Rougier (2021).
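For the bootstrap question, one interval that avoids the plain percentile method is the "basic" (reverse-percentile) bootstrap, which reflects the bootstrap quantiles around the point estimate. A minimal sketch on synthetic stand-in data (the real exercise uses SWE_Nino_Nina.csv; the gamma distribution below is only an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic, skewed stand-in for the SWE series
# (the actual data come from SWE_Nino_Nina.csv).
data = rng.gamma(shape=2.0, scale=50.0, size=200)

theta_hat = np.median(data)

# Resample with replacement and recompute the median B times.
B = 10_000
boot_medians = np.median(
    rng.choice(data, size=(B, data.size), replace=True), axis=1
)

# Basic (reverse-percentile) interval: reflect the bootstrap quantiles
# around the point estimate rather than using them directly.
q_lo, q_hi = np.quantile(boot_medians, [0.025, 0.975])
ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)
print(f"median = {theta_hat:.1f}, 95% basic CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

`scipy.stats.bootstrap` offers the same idea via `method="basic"` (and the bias-corrected `method="BCa"`), if you prefer a library routine over the hand-rolled loop.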
Problem set #2
1. Complete Exercise 5.7 in Hsieh’s book. (Source data: Milwaukee_wind_direction_ozone.csv)
2. Visualize SWE_tele.csv and make a few arguments that the generated figures can visually support.
3. Complete Exercise 5.8 in Hsieh’s book. (Source data: SWE_tele.csv)
4. Complete Exercise 5.9 in Hsieh’s book. (Source data: SWE_tele.csv)
Problem set #3
1. Complete Exercise 6.5 in Hsieh’s book. Please use the cross-validation technique to tune at least one model hyperparameter. (Source data: SWE_tele.csv)
2. Complete Exercise 8.1 in Hsieh’s book. Please tune the learning rate for the MLP NN model. I have generated the input data for you, which can be downloaded from this link (data_noise-*.csv).
3. Visualize the regression results of Exercise 8.1, at least for the case with Gaussian noise at 0.5 times the standard deviation of \( y_{\textrm{signal}} \).
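A common way to combine both requirements above (cross-validated tuning, and the MLP learning rate specifically) is scikit-learn's `GridSearchCV`. A minimal sketch assuming synthetic stand-in data (the real exercise uses the provided data_noise-*.csv files; the grid values and network size are illustrative assumptions, not prescribed settings):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in: a sine signal plus Gaussian noise.
X = rng.uniform(0, 4 * np.pi, size=(200, 1))
y = np.sin(X).ravel() + 0.3 * rng.standard_normal(200)

# 5-fold cross-validation over a small grid of initial learning rates.
grid = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=3000, random_state=0),
    param_grid={"learning_rate_init": [1e-3, 1e-2, 1e-1]},
    cv=5,
)
grid.fit(X, y)
print("best learning rate:", grid.best_params_["learning_rate_init"])
```

`grid.cv_results_` holds the mean validation score for every grid point, which is useful evidence when you justify the chosen learning rate in your write-up.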
Problem set #4
1. Complete Exercise 12.1 in Hsieh’s book. For subquestion (a), visualize the results to reproduce Figure 12.1(b). Make sure to label/mark correct and incorrect predictions. (Source data: forest_training.csv & forest_testing.csv)
2. Following the first question, use the support vector machine to classify the forest types in the given dataset. Feel free to choose the one-versus-the-rest or one-versus-one approach (and specify your choice). Train using the first two predictors and compare the results with linear discriminant analysis by visualizing them similarly.
3. Generate a synthetic signal with added noise, \(y = \sin x + 0.5 \times \mathcal{N}(0, 1)\), and collect 40 data points distributed within the range \(x = [0, 4\pi]\). Then use (a) ridge regression, (b) kernel ridge regression, and (c) Gaussian process regression to model the data and give predictions over the range \(x = [0, 8\pi]\), with visualization. Describe and justify your kernel selection and hyperparameter tuning process wherever necessary. Compare the results from the three regression methods.
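For the synthetic-signal question, the data generation and the three fitted models can be set up entirely with scikit-learn. A minimal sketch (the kernel choices and hyperparameter values below are illustrative assumptions; tuning and justifying them is the point of the exercise):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# 40 noisy samples of sin(x) on [0, 4*pi], as the problem specifies.
x_train = np.sort(rng.uniform(0, 4 * np.pi, 40))[:, None]
y_train = np.sin(x_train).ravel() + 0.5 * rng.standard_normal(40)

# Prediction grid extends to [0, 8*pi] to expose extrapolation behavior.
x_pred = np.linspace(0, 8 * np.pi, 400)[:, None]

models = {
    "ridge": Ridge(alpha=1.0),  # linear in x: cannot follow the sine
    "kernel_ridge": KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5),
    "gpr": GaussianProcessRegressor(
        kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.25)
    ),
}
for name, model in models.items():
    model.fit(x_train, y_train)
    y_hat = model.predict(x_pred)
    print(f"{name}: prediction range [{y_hat.min():.2f}, {y_hat.max():.2f}]")
```

Plotting `y_hat` for each model over `x_pred` (plus the GPR's predictive standard deviation from `predict(..., return_std=True)`) makes the comparison on the unseen half of the domain immediate.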
Problem set #5
1. Complete Exercise 12.5 in Hsieh’s book. Please use the Linear Discriminant Analysis (LDA) model. You’ll need to determine some details about how to process the data; specify them. (Source data: SydneyAirport_weather.csv)
2. Complete Exercise 14.3 in Hsieh’s book. Please use the same preprocessing workflow as in the previous question for the data. Challenge yourself and see if you can build a better model than LDA! (Source data: SydneyAirport_weather.csv)
3. Complete Exercise 14.5 in Hsieh’s book, including (b). Visualize the regression results by plotting the data and predictions alongside several important predictors. (Source data: bike_sharing_daily_data.csv & bike_sharing_Readme.txt)
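The LDA question reduces to a standard fit/score workflow once the preprocessing decisions are made. A minimal sketch on synthetic two-class stand-in data (the real exercise uses SydneyAirport_weather.csv; the feature construction and class structure here are assumptions for illustration only):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Synthetic stand-in: two Gaussian classes in two predictor dimensions.
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(100), np.ones(100)]

# Hold out a test split so the reported accuracy is out-of-sample.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0
)

lda = LinearDiscriminantAnalysis()
lda.fit(X_tr, y_tr)
print(f"test accuracy: {lda.score(X_te, y_te):.2f}")
```

The same split and scoring scaffold can then be reused unchanged for the "better than LDA" challenge in the following question, so the models are compared on identical held-out data.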