Assignments


Complete Exercise 4.1 in Hsieh’s book.
Complete Exercise 4.5 in Hsieh’s book. (Source data: SWE_Nino_Nina.csv)
Complete Exercise 4.6 in Hsieh’s book. (Source data: nino12_long_anom.csv & nino34_long_anom.csv)
For the SWE data in SWE_Nino_Nina.csv, calculate the 95% confidence interval (CI) of the median using bootstrapping. Be sure to avoid the percentile method.
Reproduce the figure for Q3 in the Pre-course Quiz. (Figure Link) Figure credit: Nicolas P. Rougier (2021).
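One alternative to the percentile method is the basic (pivotal) bootstrap, which reflects the bootstrap quantiles about the sample statistic. BCa, the default `method` of `scipy.stats.bootstrap`, is another option. The sketch below is a minimal NumPy version of the basic interval for the median. It assumes a 1-D array of SWE values. The gamma-distributed `swe` array is hypothetical stand-in data, not the contents of SWE_Nino_Nina.csv, and `basic_bootstrap_ci_median` is a hypothetical helper name.

```python
import numpy as np

def basic_bootstrap_ci_median(x, n_boot=10000, alpha=0.05, seed=0):
    """Basic (pivotal) bootstrap CI for the median.

    The percentile method would report the bootstrap quantiles directly;
    the basic method reflects them about the sample median:
    [2*m - q_hi, 2*m - q_lo].
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    # Resample with replacement: each row is one bootstrap sample of size len(x).
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot_medians = np.median(x[idx], axis=1)
    q_lo, q_hi = np.quantile(boot_medians, [alpha / 2, 1 - alpha / 2])
    return 2 * m - q_hi, 2 * m - q_lo

# Hypothetical stand-in for one column of SWE_Nino_Nina.csv
swe = np.random.default_rng(1).gamma(shape=4.0, scale=50.0, size=30)
lo, hi = basic_bootstrap_ci_median(swe)
print(f"median = {np.median(swe):.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```

If the file holds separate El Niño and La Niña columns, the same function can be applied to each column in turn.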

Complete Exercise 5.7 in Hsieh’s book. (Source data: Milwaukee_wind_direction_ozone.csv)
Visualize the data in SWE_tele.csv and make a few claims that your figures visually support.
Complete Exercise 5.8 in Hsieh’s book. (Source data: SWE_tele.csv)
Complete Exercise 5.9 in Hsieh’s book. (Source data: SWE_tele.csv)

Complete Exercise 6.5 in Hsieh’s book. Please use cross-validation to tune at least one model hyperparameter. (Source data: SWE_tele.csv)
Complete Exercise 8.1 in Hsieh’s book. Please tune the learning rate for the MLP NN model. I have generated the input data for you, which can be downloaded from this link (data_noise-*.csv).
Visualize the regression results of Exercise 8.1, at least for the case where the Gaussian noise has a standard deviation of 0.5 times that of \( y_{\textrm{signal}}\).
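A minimal sketch of k-fold cross-validation for tuning one hyperparameter, written in plain NumPy. For illustration only, it assumes the model is ridge regression and the hyperparameter is its penalty λ. The data are synthetic stand-ins for SWE_tele.csv, and the helper names `ridge_fit` and `kfold_cv_mse` are hypothetical. The same loop applies to tuning an MLP learning rate: replace the fit call and loop over candidate learning rates instead of λ values.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge regression; centring on training means leaves the intercept unpenalized.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def kfold_cv_mse(X, y, lam, k=5, seed=0):
    # Shuffle once, split into k folds, and average the validation MSE over folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w, b = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[val] @ w + b - y[val]) ** 2))
    return float(np.mean(errors))

# Hypothetical synthetic data standing in for the SWE_tele.csv predictors and target
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.5]) + rng.normal(scale=1.0, size=100)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: kfold_cv_mse(X, y, lam) for lam in lambdas}
best_lam = min(cv_scores, key=cv_scores.get)
print(cv_scores, "best lambda:", best_lam)
```

Select the candidate with the lowest mean validation error, then refit on all training data with that value before evaluating on any held-out test set.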

Complete Exercise 12.1 in Hsieh’s book. For subquestion (a), visualize the results to reproduce Figure 12.1(b). Make sure to label/mark correct and incorrect predictions. (Source data: forest_training.csv & forest_testing.csv)
Following the first question, use a support vector machine to classify the forest types in the given dataset. Feel free to choose either the one-versus-the-rest or the one-versus-one approach (and specify your choice). Train using the first two predictors, and compare the results with those of linear discriminant analysis by visualizing them in the same way.
Generate a synthetic signal with added noise, \(y = \sin x + 0.5 \times \mathcal{N}(0, 1)\), and sample 40 data points with \(x \in [0, 4\pi]\). Then use (a) ridge regression, (b) kernel ridge regression, and (c) Gaussian process regression to model the data, and visualize their predictions over \(x \in [0, 8\pi]\). Describe and justify your kernel selection and hyperparameter tuning process where necessary. Compare the results from the three regression methods.
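The data generation step and a kernel ridge regression fit can be sketched in NumPy as follows. The RBF kernel, `length_scale=1.0`, and `lam=0.1` are placeholder assumptions, not tuned values, and the function names are hypothetical. In scikit-learn, `sklearn.kernel_ridge.KernelRidge` and `sklearn.gaussian_process.GaussianProcessRegressor` provide library versions of (b) and (c).

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Gaussian (RBF) kernel matrix between 1-D input arrays a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def kernel_ridge_fit_predict(x_train, y_train, x_test, lam=0.1, length_scale=1.0):
    # Dual solution: alpha = (K + lam * I)^-1 y; prediction = K(x_test, x_train) @ alpha.
    K = rbf_kernel(x_train, x_train, length_scale)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    return rbf_kernel(x_test, x_train, length_scale) @ alpha

# y = sin(x) + 0.5 * N(0, 1), with 40 x values drawn uniformly from [0, 4*pi]
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 4 * np.pi, size=40))
y = np.sin(x) + 0.5 * rng.standard_normal(40)

# Predict over the extended range [0, 8*pi]
x_new = np.linspace(0.0, 8 * np.pi, 400)
y_pred = kernel_ridge_fit_predict(x, y, x_new, lam=0.1, length_scale=1.0)
```

Because the RBF kernel decays to zero far from the training inputs, these predictions relax toward zero beyond \(x = 4\pi\) instead of continuing to oscillate. Keep this behavior in mind when justifying your kernel choice. A periodic kernel such as scikit-learn's `ExpSineSquared` is one alternative worth considering. Plot `x`, `y`, and `y_pred` against `x_new` (e.g., with Matplotlib) for the visualization.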

Complete Exercise 12.5 in Hsieh’s book using a linear discriminant analysis (LDA) model. You will need to decide some details of how to preprocess the data; state these choices explicitly. (Source data: SydneyAirport_weather.csv)
Complete Exercise 14.3 in Hsieh’s book. Please preprocess the data using the same workflow as in the previous question. Challenge yourself and see if you can build a better model than LDA! (Source data: SydneyAirport_weather.csv)
Complete Exercise 14.5 in Hsieh’s book, including (b). Visualize the regression results by plotting the data and predictions alongside several important predictors. (Source data: bike_sharing_daily_data.csv & bike_sharing_Readme.txt)
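A minimal two-class LDA, fitted with a pooled covariance matrix and class priors, can be sketched in NumPy as below. It assumes the target has already been encoded as 0/1 labels and that missing values have been handled. The two Gaussian clusters are synthetic stand-ins for SydneyAirport_weather.csv, and `lda_fit` and `lda_predict` are hypothetical helper names. `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` is the library counterpart.

```python
import numpy as np

def lda_fit(X, y):
    # Two-class LDA with a shared (pooled) covariance matrix; y holds 0/1 labels.
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, mu1 - mu0)
    # The bias includes the log prior ratio, so unequal class sizes shift the boundary.
    b = -0.5 * (mu1 + mu0) @ w + np.log(n1 / n0)
    return w, b

def lda_predict(X, w, b):
    # Predict class 1 where the linear discriminant is positive.
    return (X @ w + b > 0).astype(int)

# Hypothetical synthetic data: two Gaussian clusters with identity covariance
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2))])
y = np.repeat([0, 1], 200)

w, b = lda_fit(X, y)
acc = np.mean(lda_predict(X, w, b) == y)
print(f"training accuracy: {acc:.3f}")
```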