In this practical, we will explore a real-world dataset and study two applications: predicting wine quality and predicting wine type (red or white) from chemical attributes.
Download and import the Wine Quality dataset from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality.
This dataset contains 11 physicochemical attributes for 1599 red wine samples (winequality-red.csv) and 4898 white wine samples (winequality-white.csv), as well as a quality score between 0 and 10 for each sample.
You can look at the winequality.names file for more details about the dataset.
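As a starting point for the import step, here is a minimal sketch of how the two files could be loaded with pandas. It assumes both CSV files have been downloaded into the working directory under the names given above. The helper name `load_wines` and the added `type` column are illustrative choices, not part of the dataset. The files from the repository use a semicolon as field separator, hence `sep=";"`.

```python
from pathlib import Path

import pandas as pd

# Assumed local file names (the files must be downloaded first).
RED_CSV = Path("winequality-red.csv")
WHITE_CSV = Path("winequality-white.csv")


def load_wines(red_source=RED_CSV, white_source=WHITE_CSV):
    """Read both CSV files and stack them into a single DataFrame.

    `load_wines` is a hypothetical helper written for this sketch.
    Each source may be a path or any file-like object accepted by pandas.
    """
    # The UCI files are semicolon-separated, not comma-separated.
    red = pd.read_csv(red_source, sep=";")
    white = pd.read_csv(white_source, sep=";")
    # Add a label column so the wine type survives the concatenation.
    red["type"] = "red"
    white["type"] = "white"
    return pd.concat([red, white], ignore_index=True)


# Only run the demo when the files are actually present.
if RED_CSV.exists() and WHITE_CSV.exists():
    wines = load_wines()
    print(wines.shape)
    print(wines["type"].value_counts())
```

If the files match the description above, the type counts should report 1599 red and 4898 white samples. Keeping the two sets in one DataFrame with a `type` label is convenient for the red-versus-white classification task, while the `quality` column serves as the target for the quality prediction task.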
What are the characteristics of the dataset?
The data covers “Vinho verde”, a wine from the Minho (north-west) region of Portugal.
Vinho verde accounts for 15% of total Portuguese wine production.
About 10% of this wine is exported (mostly white).
The data was collected between May 2004 and February 2007.
The data was collected by the official certification entity, the Commission of the Vinhos Verdes Region (CVRVV).
Only protected designation of origin samples were included.
To prepare for the tutorial, make sure you have the necessary libraries installed. Check that you can run the following code.
# Essential libraries for data representation
import pandas as pd
import numpy as np
# Display and plotting libraries
from IPython.display import display
import seaborn as sn
import matplotlib.pyplot as plt
# Statistics and counting utilities
from scipy import stats
from collections import Counter
# ML libraries
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import OneHotEncoder
# For dimension reduction
from sklearn.manifold import TSNE
from sklearn.manifold import Isomap
from sklearn.decomposition import PCA
# DNN libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential