ISYE 6501 Course homework assignment one solution

Typology: Assignments · 2023/2024

Uploaded on 06/15/2025 by daniel-rong

HW1
2024-08-25
Question 2.1
Describe a situation or problem from your job, everyday life, current events, etc., for which a classification
model would be appropriate. List some (up to 5) predictors that you might use.
Answer
Since my friends are fairly into reading, I sometimes have to pick a book as a gift for their birthdays. To narrow
down whether a friend will like a book, I check Goodreads for information such as the following, which would
make good predictors:
1. Whether the person has the book on their ‘To-read’ list (binary variable). If yes, this is likely
predictive of the book being a good choice.
2. The number of books the person has read in the genre of the book I have chosen (numerical
variable).
3. The number of books the person has read in genres similar to the book I have chosen (for example,
sci-fi books are likely similar to young-adult/action-adventure books).
4. Time since the person last finished a book, in months. The longer it has been, the more likely
the person will like the book.
5. The number of the person's friends who have read the book or have it on their ‘to-read’
list (numerical variable). The more friends who do, the more likely the person will like the
book.
Question 2.2
The files credit_card_data.txt (without headers) and credit_card_data-headers.txt (with headers) contain
a dataset with 654 data points, 6 continuous and 4 binary predictor variables. It has anonymized credit
card applications with a binary response variable (last column) indicating if the application was positive
or negative. The dataset is the “Credit Approval Data Set” from the UCI Machine Learning Repository
(https://archive.ics.uci.edu/ml/datasets/Credit+Approval) without the categorical variables and without
data points that have missing values.
1. Using the support vector machine function ksvm contained in the R package kernlab, find a good
classifier for this data. Show the equation of your classifier, and how well it classifies the data points
in the full data set. (Don’t worry about test/validation data yet; we’ll cover that topic soon.)
df <- read.table("C:/Users/tungh/OneDrive/Georgia Tech/ISYE6501/Module 2/hw1/data 2.2/credit_card_data-headers.txt",
header = TRUE)
head(df)
## A1 A2 A3 A8 A9 A10 A11 A12 A14 A15 R1
## 1 1 30.83 0.000 1.25 1 0 1 1 202 0 1
## 2 0 58.67 4.460 3.04 1 0 6 1 43 560 1



dim(df)

## [1] 654 11

We first run the starter code from the homework; with the default parameter C = 100 we get close to 86.4% accuracy.

We will also print the coefficients a1 through am and the intercept a0.

# install.packages('kernlab')
data <- as.matrix(df)
library("kernlab")

Warning: package ’kernlab’ was built under R version 4.4.

# call ksvm. Vanilladot is a simple linear kernel.
model <- ksvm(as.matrix(data[, 1:10]), data[, 11],
              type = "C-svc", kernel = "vanilladot",
              C = 100, scaled = TRUE)

Setting default kernel parameters

# calculate a1...am
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a

## A1 A2 A3 A8 A9

## A10 A11 A12 A14 A15

# calculate a0
a0 <- -model@b
a0

## [1] 0.

# see what the model predicts
pred <- predict(model, data[, 1:10])
pred

## [1] 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## [38] 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
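For reference, the overall accuracy of this linear model can be computed directly from the predictions above. As a sanity check of the classifier equation (an assumption worth verifying: since ksvm was fitted with scaled = TRUE, the coefficients live in scaled space, so a and a0 are applied to the scaled predictors):

# accuracy of the vanilladot model on the full data set
sum(pred == data[, 11]) / nrow(data)

# sanity check of the classifier equation a.x + a0 > 0,
# applied to scaled predictors because the model was fitted with scaled = TRUE
manual <- as.numeric(scale(data[, 1:10]) %*% a + a0 > 0)
sum(manual == pred) / length(pred)

If the scaling assumption is right, the manual decision rule should agree with predict() on essentially every row.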

With a degree-2 polynomial kernel (polydot with degree = 2) we get higher accuracy. We can write a function to tune C as well.

library(kernlab)

# Define the function to evaluate models with varying C values
evaluate_svm <- function(data, C_values = 10^seq(-3, 3, by = 1)) {
  best_accuracy <- 0
  best_C <- NA

  for (C_value in C_values) {
    # Train the SVM model with the current C value
    model <- ksvm(as.matrix(data[, 1:10]), data[, 11],
                  type = "C-svc", kernel = "polydot",
                  kpar = list(degree = 2), C = C_value, scaled = TRUE)

    # Make predictions
    pred <- predict(model, data[, 1:10])

    # Calculate accuracy
    accuracy <- sum(pred == data[, 11]) / nrow(data)

    # Print the accuracy for the current C value
    cat("C:", C_value, "Accuracy:", sprintf("%.2f%%", accuracy * 100), "\n")

    # Check if this is the best accuracy so far
    if (accuracy > best_accuracy) {
      best_accuracy <- accuracy
      best_C <- C_value
    }
  }

  # Return the best C value and corresponding accuracy
  cat("\nThe best C value:", best_C, "\nBest Accuracy:",
      sprintf("%.2f%%", best_accuracy * 100), "\n")
  return(list(best_C = best_C, best_accuracy = best_accuracy))
}

# Example usage
result <- evaluate_svm(data)

C: 0.001 Accuracy: 86.39%

C: 0.01 Accuracy: 86.70%

C: 0.1 Accuracy: 87.46%

C: 1 Accuracy: 88.07%

C: 10 Accuracy: 88.84%

C: 100 Accuracy: 89.30%

C: 1000 Accuracy: 88.38%

The best C value: 100

Best Accuracy: 89.30%
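Since the coarse log-scale search points at C = 100, an optional refinement (a sketch, reusing the evaluate_svm function defined above; the grid values are arbitrary) is to check a finer grid around that value:

# finer grid around the best coarse value C = 100
result_fine <- evaluate_svm(data, C_values = c(25, 50, 75, 100, 150, 200, 400))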

Next we try the radial basis function kernel, which the class reading discusses (https://pyml.sourceforge.net/doc/howto.pdf).

model.4 <- ksvm(as.matrix(data[, 1:10]), data[, 11],
                type = "C-svc", kernel = "rbfdot",
                C = 100, scaled = TRUE)
pred <- predict(model.4, data[, 1:10])
sum(pred == data[, 11]) / nrow(data)

## [1] 0.

It performs exceptionally well; I wonder if it overfits when we have unbalanced data (many examples of one class but few of the other).

pred

## [1] 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## [38] 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0

## [75] 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0

## [112] 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1

## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1

## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

## [297] 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

## [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## [482] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

## [519] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## [556] 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0

## [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

## [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Looking at the results, the model does not seem to predict all 1's or all 0's, so there is no obvious evidence of overfitting on unbalanced data. Without test and validation sets, though, we can't determine this completely.
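The held-out check just described could be sketched as follows (an illustrative 80/20 split; the seed is arbitrary): train the RBF model on 80% of the rows and score the remaining 20%.

set.seed(42)  # arbitrary seed for reproducibility
idx   <- sample(nrow(data), size = floor(0.8 * nrow(data)))
train <- data[idx, ]
test  <- data[-idx, ]

m <- ksvm(as.matrix(train[, 1:10]), train[, 11], type = "C-svc",
          kernel = "rbfdot", C = 100, scaled = TRUE)
p <- predict(m, test[, 1:10])
sum(p == test[, 11]) / nrow(test)  # held-out accuracy

A large gap between this held-out accuracy and the full-data accuracy above would be the overfitting signal the text is worried about.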

Based on this, I conclude that the Gaussian kernel performs best, followed by the degree-2 polynomial with C = 100 as the second-best model.
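The imbalance worry raised earlier is easy to check directly: table() on the response column shows how many 0s and 1s the data set contains (the commented-out table(df$R1) at the end of the script does the same thing).

# class distribution of the response: counts of 0s and 1s
table(df$R1)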

Question 2.2.

  1. Using the k-nearest-neighbors classification function kknn contained in the R kknn package, suggest a good value of k, and show how well it classifies the data points in the full data set. Don’t forget to scale the data (scale=TRUE in kknn).

# install.packages('kknn')
library("kknn")

Warning: package ’kknn’ was built under R version 4.4.

(Output of the k-tuning loop, truncated in this copy: for each k from 1 through 30 the function prints k, the distance metric (2, i.e. Euclidean), and the test-set accuracy; the printed accuracies fall between roughly 88% and 89%.)

Using the Euclidean distance, the optimal value of k seems to be around 13, with validation-set accuracy of about 89%. Note that we split the data 80%/20% into training and validation sets. Below we print the predictions for the validation data; they look fairly close to the predictions made by the SVM model.
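The evaluate_knn function itself is not shown in this excerpt. A plausible reconstruction, matching the printed output (80/20 split, Euclidean distance, scaled predictors, rounded fitted values as class labels), might look like the sketch below; the seed and the exact split mechanics are assumptions.

evaluate_knn <- function(df, k, print_pred = FALSE) {
  set.seed(1)                       # assumed seed, not from the original
  idx   <- sample(nrow(df), size = floor(0.8 * nrow(df)))
  train <- df[idx, ]
  test  <- df[-idx, ]

  # kknn fits on the training rows and predicts the test rows directly
  fit  <- kknn(R1 ~ ., train, test, k = k, distance = 2, scale = TRUE)
  pred <- round(fitted(fit))        # fitted values are averaged neighbor labels
  if (print_pred) print(as.factor(pred))

  acc <- sum(pred == test$R1) / nrow(test)
  cat("k:", k, "\nMetric (distance): 2\n")
  cat("Accuracy for the test set is", sprintf("%.2f%%", acc * 100), "\n")
  invisible(acc)
}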

evaluate_knn(df, k = 13, print_pred = TRUE)

## [1] 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1

## [38] 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

## [112] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

Levels: 0 1

k: 13

Metric (distance): 2

Accuracy for the test set is 89.

# table(df$R1)