Are the variables Length, Left, Right, Bottom, Top, and Diagonal together a good predictor of whether a bill is counterfeit or genuine?

Data set found on Kaggle: Swiss banknote counterfeit detection

Inspiration: Our group finds the problem of detecting counterfeit money very interesting. Moreover, the data are well suited to multivariate analysis.

The data consist of 200 observations on 7 variables:

Counterfeit: indicator random variable: 1 if counterfeit, 0 if genuine

Length: length of the bill from the left edge to the right edge in millimeters

Left: length of the left edge from bottom to top in millimeters

Right: length of the right edge from bottom to top in millimeters

Bottom: bottom margin width in millimeters

Top: top margin width in millimeters

Diagonal: length of the diagonal of the bill in millimeters

Response variable: Counterfeit

Predictor variables: Length, Left, Right, Bottom, Top, and Diagonal

Main Question: Are the variables Length, Left, Right, Bottom, Top, and Diagonal together a good predictor of whether a bill is counterfeit or genuine?

We plan to use inference for multivariate means to compare the sample mean vectors of counterfeit and genuine bills and to test whether the corresponding population mean vectors differ. We will also use PCA to reduce the number of variables if possible.

PCA analysis group: This group will use PCA to reduce the dimension of the data and, at the same time, look for potential correlations between the predictors.
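
As a rough illustration of the PCA step, the sketch below runs PCA on the correlation matrix of a data matrix whose rows are bills and whose columns are the six measurements. The function name `pca_correlation`, the choice of the correlation matrix rather than the covariance matrix, and the synthetic placeholder data are all assumptions made for illustration; the actual banknote measurements are not used here.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = bills, columns = measurements)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)            # p x p correlation matrix of the predictors
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()         # proportion of variance per component
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized data
    scores = Z @ eigvecs                        # principal component scores
    return R, eigvals, eigvecs, explained, scores

# Synthetic placeholder data (NOT the banknote data): 200 rows, 6 columns.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
X_demo[:, 1] += 0.8 * X_demo[:, 0]              # induce one correlated pair of columns

R, eigvals, eigvecs, explained, scores = pca_correlation(X_demo)
print("cumulative variance explained:", np.round(np.cumsum(explained), 3))
```

The off-diagonal entries of `R` show the pairwise correlations between predictors, and the cumulative proportions suggest how many components might be retained.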

Hypothesis testing group 1: This group will use hypothesis testing to compare the multivariate mean vectors of counterfeit and genuine bills.
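
One common way to compare two mean vectors is the two-sample Hotelling's T² test with a pooled covariance matrix. The proposal does not name a specific test, so this is only a sketch of one candidate. It assumes both groups are roughly multivariate normal with equal covariance matrices; the function name `hotelling_two_sample` and the synthetic placeholder data (including the group sizes) are hypothetical.

```python
import numpy as np

def hotelling_two_sample(X1, X2):
    """Two-sample Hotelling's T^2 statistic with pooled covariance.

    Returns (T2, F, df1, df2), where F = T2 * df2 / (p * (n1 + n2 - 2))
    follows an F(df1, df2) distribution under the null hypothesis of equal means.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)       # difference of sample mean vectors
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))   # sample covariance of group 1
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))   # sample covariance of group 2
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S_pooled, diff)
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = t2 * df2 / (p * (n1 + n2 - 2))
    return t2, f_stat, df1, df2

# Synthetic placeholder data (NOT the banknote data); group sizes are assumed.
rng = np.random.default_rng(0)
genuine_demo = rng.normal(size=(100, 6))
counterfeit_demo = rng.normal(loc=0.5, size=(100, 6))

t2, f_stat, df1, df2 = hotelling_two_sample(genuine_demo, counterfeit_demo)
print(f"T2 = {t2:.2f}, F = {f_stat:.2f} on ({df1}, {df2}) degrees of freedom")
```

If SciPy is available, the p-value can be obtained from the F distribution, for example with `scipy.stats.f.sf(f_stat, df1, df2)`.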

Hypothesis testing group 2: This group will use appropriate statistical techniques to make inferences about the population mean of each predictor variable for counterfeit and genuine bills.
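
One possible technique for this step is a set of simultaneous confidence intervals for the per-variable means, computed separately for each class. The sketch below uses Bonferroni-adjusted intervals with a large-sample normal quantile. This is an assumption rather than the group's chosen method (a t quantile or T²-based intervals would be alternatives), and the function name `bonferroni_mean_intervals` and the placeholder data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def bonferroni_mean_intervals(X, alpha=0.05):
    """Bonferroni-adjusted large-sample intervals for each column mean of X.

    Returns an array with one row per variable: (lower, mean, upper).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mean = X.mean(axis=0)
    se = X.std(axis=0, ddof=1) / np.sqrt(n)          # standard error of each mean
    z = NormalDist().inv_cdf(1 - alpha / (2 * p))    # split alpha across p intervals
    return np.column_stack([mean - z * se, mean, mean + z * se])

# Synthetic placeholder data (NOT the banknote data): one class, 6 variables.
rng = np.random.default_rng(0)
class_demo = rng.normal(loc=10.0, scale=2.0, size=(100, 6))

ci = bonferroni_mean_intervals(class_demo)
for lower, mean, upper in ci:
    print(f"mean = {mean:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Running the function once on the counterfeit bills and once on the genuine bills gives two sets of intervals that can be compared variable by variable.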

 
