Why Feature Selection?
Dealing with feature sets that contain thousands of features, or even more, is not always a good idea:
- The curse of dimensionality: As the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse.
- High dimensionality tends to make models more complex and difficult to interpret.
- High-dimensional data often leads models to overfit the training data.
CAUTION: Feature selection is not feature extraction! Feature selection keeps a subset of the original features, whereas feature extraction (e.g., PCA) constructs new features by combining the original ones.
Strategies for Feature Selection
Feature selection returns a subset of the original set of features. There are three main strategies:
- Filter methods: they select features using statistics computed independently of any model, such as the correlation or mutual information between each feature and the target variable.
- Wrapper methods: they capture interactions between features by recursively building models on different feature subsets and keeping the subset that yields the best-performing model.
- Embedded methods: they rely on machine learning models that rank and score features as a by-product of training, based on their importance (e.g., L1-regularized linear models or tree ensembles).
Example of Filter Methods
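A minimal sketch using scikit-learn's SelectKBest with mutual information as the scoring function; the breast-cancer dataset and the choice of k=10 are illustrative assumptions, not prescriptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Toy dataset: 569 samples, 30 numeric features (illustrative choice).
X, y = load_breast_cancer(return_X_y=True)

# Score each feature against the target with mutual information,
# independently of any downstream model, and keep the top 10.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (569, 30) -> (569, 10)
```

Because the scores are computed once, without training a predictive model, filter methods are cheap, but they cannot account for interactions between features.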
Example of Wrapper Methods
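A minimal sketch of Recursive Feature Elimination (RFE) in scikit-learn; the logistic-regression estimator and the target of 10 features are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# RFE repeatedly fits the estimator, drops the weakest feature
# (lowest absolute coefficient), and refits until 10 remain.
estimator = LogisticRegression(max_iter=5000)
rfe = RFE(estimator=estimator, n_features_to_select=10, step=1)
rfe.fit(X, y)

print("Selected features:", rfe.support_)   # boolean mask
print("Ranking (1 = kept):", rfe.ranking_)
```

Because a model is retrained at every step, wrapper methods are far more expensive than filters, but they do capture how features perform together; scikit-learn's RFECV variant additionally cross-validates to choose the number of features automatically.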
Example of Embedded Methods
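A minimal sketch of an embedded approach using a random forest's built-in feature importances together with SelectFromModel; the forest hyperparameters and the median threshold are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)

# The forest's impurity-based feature_importances_ come for free
# as a by-product of training, which is what makes this "embedded".
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Keep only the features whose importance exceeds the median importance.
selector = SelectFromModel(forest, threshold="median")
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)
```

An L1-regularized model (e.g., Lasso) used the same way is another common embedded choice, since the penalty drives the coefficients of uninformative features to exactly zero.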