In this practical, we will explore a real-world dataset and study two applications: predicting wine quality and predicting wine type (red or white) from chemical attributes.


Description of the dataset

Download and import the Wine Quality dataset from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality.

This dataset contains 11 physicochemical attributes for 1599 red wine samples (winequality-red.csv) and 4898 white wine samples (winequality-white.csv), together with a quality score between 0 and 10 for each sample.

You can look at the winequality.names file for more details about the dataset.

What are the characteristics of the dataset?
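One possible way to start answering this question is sketched below. It is only a starting point, not the expected solution: the helper names `load_wines` and `summarize` and the added `type` column are illustrative choices, and the code assumes the two CSV files use `;` as the field separator and contain a `quality` column.

```python
import pandas as pd


def load_wines(red_source, white_source):
    """Read the red and white files and stack them into one DataFrame.

    Assumption: both files are ';'-separated. The 'type' column is
    added here for illustration; it is not part of the original files.
    """
    red = pd.read_csv(red_source, sep=";")
    white = pd.read_csv(white_source, sep=";")
    red["type"] = "red"
    white["type"] = "white"
    return pd.concat([red, white], ignore_index=True)


def summarize(wines):
    """Collect a few basic characteristics of the combined dataset."""
    return {
        "shape": wines.shape,
        "missing_values": int(wines.isna().sum().sum()),
        "samples_per_type": wines["type"].value_counts().to_dict(),
        "quality_counts": wines["quality"].value_counts().sort_index().to_dict(),
    }


# Usage, assuming both files were downloaded to the working directory:
# wines = load_wines("winequality-red.csv", "winequality-white.csv")
# print(summarize(wines))
# print(wines.describe())
```

The summary gives the number of rows and columns, whether any values are missing, how many samples there are of each type, and how the quality scores are distributed, which already hints at class imbalance issues for the later tasks.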

[Figure: Região do Minho]



To prepare for the tutorial, make sure you have the necessary libraries installed. Check that you can run the following code.

#Essential libraries for data representation
import pandas as pd
import numpy as np

#Display and plotting libraries
from IPython.display import display
import seaborn as sn
import matplotlib.pyplot as plt

#Statistics libraries
from scipy import stats
from collections import Counter

#ML Libraries
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import OneHotEncoder

# For dimension reduction
from sklearn.manifold import TSNE
from sklearn.manifold import Isomap
from sklearn.decomposition import PCA

#DNN libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
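If one of the imports above fails, it can help to see all missing libraries at once rather than fixing them one error at a time. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list uses import names, which can differ from the names used to install a package (for example, `sklearn` is installed as `scikit-learn`).

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names that cannot be found."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# Top-level import names used in this practical
required = ["pandas", "numpy", "IPython", "seaborn", "matplotlib",
            "scipy", "sklearn", "tensorflow"]

missing = missing_packages(required)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

This only checks that each library can be located, not that its version is compatible with the rest of the practical.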

Task 1: Importing and Feature Engineering