Feature Importance

Feature Importance for regression

from sklearn.datasets import make_regression
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
# summarize the dataset
print(X.shape, y.shape)
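To sanity-check the importance scores that follow, it helps to know which columns are truly informative. The following is a minimal sketch that assumes the `coef=True` option of scikit-learn's `make_regression`, which returns the true coefficients as a third value. Called with the same arguments, it builds the same dataset, and the nonzero entries of `coef` mark the five informative features.

```python
# recover the ground-truth coefficients of the synthetic regression dataset
import numpy as np
from sklearn.datasets import make_regression

# coef=True makes make_regression also return the coefficients used to build y
X, y, coef = make_regression(n_samples=1000, n_features=10, n_informative=5,
                             random_state=1, coef=True)
# uninformative features have a coefficient of exactly zero
informative = np.flatnonzero(coef)
print('Informative features:', informative)
print('True coefficients:', np.round(coef, 3))
```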

# linear regression feature importance
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
# define the model
model = LinearRegression()
# fit the model
model.fit(X, y)
# get importance
importance = model.coef_
# summarize feature importance
for i,v in enumerate(importance):
	print('Feature: %0d, Score: %.5f' % (i,v))
# plot feature importance
pyplot.bar([x for x in range(len(importance))], importance)
pyplot.show()
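Using raw coefficients as importance scores assumes that the inputs share a common scale. That holds here because `make_regression` draws every feature from a standard normal distribution. The sketch below is an illustration and is not part of the original example. It multiplies the feature with the largest coefficient by 1000, which shrinks that coefficient by the same factor even though the feature's predictive content is unchanged. It then checks that standardizing with `StandardScaler` removes the effect.

```python
# sketch: linear-model coefficients depend on the units of each feature
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
coef_orig = LinearRegression().fit(X, y).coef_

# pick the feature with the largest coefficient and express it in different units
top = int(np.argmax(np.abs(coef_orig)))
X_rescaled = X.copy()
X_rescaled[:, top] *= 1000.0
coef_rescaled = LinearRegression().fit(X_rescaled, y).coef_
print('Feature %d: original %.5f, rescaled %.5f' % (top, coef_orig[top], coef_rescaled[top]))

# standardizing both versions puts every feature back on the same scale
coef_std = LinearRegression().fit(StandardScaler().fit_transform(X), y).coef_
coef_std_rescaled = LinearRegression().fit(StandardScaler().fit_transform(X_rescaled), y).coef_
print('Standardized coefficients match:', np.allclose(coef_std, coef_std_rescaled))
```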

Feature Importance for classification


# test classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)
# summarize the dataset
print(X.shape, y.shape)
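With `n_redundant=5`, half of the columns are linear combinations of the informative ones, so importance may be spread across correlated columns. Under the default `shuffle=True`, the columns are permuted and their roles are hidden. The sketch below deliberately deviates from the example above by setting `shuffle=False`. scikit-learn then places the informative features first and the redundant ones next, and a least-squares fit confirms that columns 5 to 9 are exact linear combinations of columns 0 to 4.

```python
# sketch: inspect the column layout of make_classification without shuffling
import numpy as np
from sklearn.datasets import make_classification

# shuffle=False keeps the documented order: informative columns, then redundant ones
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=1, shuffle=False)
informative, redundant = X[:, :5], X[:, 5:]

# solve informative @ B = redundant in the least-squares sense;
# a near-zero error means the redundant columns are exact linear combinations
B, residuals, rank, _ = np.linalg.lstsq(informative, redundant, rcond=None)
print('Max reconstruction error:', np.abs(informative @ B - redundant).max())
```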




Decision Tree Feature Importance
Decision tree algorithms like classification and regression trees (CART) offer importance scores based on the reduction in the criterion used to select split points, such as Gini impurity or entropy for classification and squared error for regression.
This same approach can be used for ensembles of decision trees, such as the random forest and stochastic gradient boosting algorithms.
Let’s take a look at a worked example of each.
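The impurity-reduction idea can be checked directly against a fitted tree. The following minimal sketch relies on scikit-learn's low-level `tree_` arrays (`children_left`, `children_right`, `feature`, `impurity`, `weighted_n_node_samples`). For every split node, it takes the weighted impurity of the node minus the weighted impurities of its two children and adds the difference to the split feature's total. It then normalizes the totals to sum to 1, which should reproduce `feature_importances_`.

```python
# sketch: recompute impurity-based importance from a fitted tree's internal arrays
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=1)
model = DecisionTreeClassifier(random_state=1).fit(X, y)
tree = model.tree_

manual = np.zeros(X.shape[1])
w = tree.weighted_n_node_samples
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf nodes have no split and contribute nothing
        continue
    # impurity decrease of this split, weighted by the samples reaching each node
    decrease = (w[node] * tree.impurity[node]
                - w[left] * tree.impurity[left]
                - w[right] * tree.impurity[right])
    manual[tree.feature[node]] += decrease

manual /= manual.sum()  # normalize so the scores sum to 1
print('Matches feature_importances_:', np.allclose(manual, model.feature_importances_))
```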
CART Feature Importance
We can use the CART algorithm for feature importance as implemented in scikit-learn by the DecisionTreeRegressor and DecisionTreeClassifier classes.
After being fit, the model provides a feature_importances_ property that can be accessed to retrieve the relative importance scores for each input feature.
Let’s take a look at an example of this for regression and classification.
CART Regression Feature Importance
The complete example of fitting a DecisionTreeRegressor and summarizing the calculated feature importance scores is listed below.
# decision tree for feature importance on a regression problem
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from matplotlib import pyplot
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
# define the model
model = DecisionTreeRegressor()
# fit the model
model.fit(X, y)
# get importance
importance = model.feature_importances_
# summarize feature importance
for i,v in enumerate(importance):
	print('Feature: %0d, Score: %.5f' % (i,v))
# plot feature importance
pyplot.bar([x for x in range(len(importance))], importance)
pyplot.show()
Running the example fits the model, then reports the importance score for each feature.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
The results suggest that perhaps three of the 10 features are important to prediction.
Feature: 0, Score: 0.00294
Feature: 1, Score: 0.00502
Feature: 2, Score: 0.00318
Feature: 3, Score: 0.00151
Feature: 4, Score: 0.51648
Feature: 5, Score: 0.43814
Feature: 6, Score: 0.02723
Feature: 7, Score: 0.00200
Feature: 8, Score: 0.00244
Feature: 9, Score: 0.00106
A bar chart is then created for the feature importance scores.
Bar Chart of DecisionTreeRegressor Feature Importance Scores
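Reading the printed scores by eye works for 10 features, but sorting helps when there are more. The short sketch below ranks the DecisionTreeRegressor scores from largest to smallest and flags features that score above the mean. The mean cutoff is a simple heuristic chosen for illustration, not a rule from the original example, and `random_state=1` is an added assumption for repeatability.

```python
# sketch: rank features by importance and flag those above the mean score
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
model = DecisionTreeRegressor(random_state=1).fit(X, y)
importance = model.feature_importances_

# argsort is ascending, so reverse it to put the most important feature first
ranking = np.argsort(importance)[::-1]
for i in ranking:
    print('Feature: %0d, Score: %.5f' % (i, importance[i]))

# scores are normalized to sum to 1, so the mean is 1 / n_features
selected = np.flatnonzero(importance > importance.mean())
print('Above-mean features:', selected)
```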
CART Classification Feature Importance
The complete example of fitting a DecisionTreeClassifier and summarizing the calculated feature importance scores is listed below.
# decision tree for feature importance on a classification problem
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from matplotlib import pyplot
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)
# define the model
model = DecisionTreeClassifier()
# fit the model
model.fit(X, y)
# get importance
importance = model.feature_importances_
# summarize feature importance
for i,v in enumerate(importance):
	print('Feature: %0d, Score: %.5f' % (i,v))
# plot feature importance
pyplot.bar([x for x in range(len(importance))], importance)
pyplot.show()
Running the example fits the model, then reports the importance score for each feature.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
The results suggest that perhaps four of the 10 features are important to prediction.
Feature: 0, Score: 0.01486
Feature: 1, Score: 0.01029
Feature: 2, Score: 0.18347
Feature: 3, Score: 0.30295
Feature: 4, Score: 0.08124
Feature: 5, Score: 0.00600
Feature: 6, Score: 0.19646
Feature: 7, Score: 0.02908
Feature: 8, Score: 0.12820
Feature: 9, Score: 0.04745
A bar chart is then created for the feature importance scores.
Bar Chart of DecisionTreeClassifier Feature Importance Scores
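One way to follow the advice to run the example a few times and compare the average outcome is to refit the DecisionTreeClassifier with several seeds and average its scores. This sketch assumes that changing only the model's `random_state` is an acceptable stand-in for independent reruns. Tie-breaking between equally good splits can differ between seeds, so the scores may change from run to run. The standard deviation reports that spread.

```python
# sketch: average DecisionTreeClassifier importances over several random seeds
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=1)
scores = []
for seed in range(10):
    # only the model's seed changes; the dataset stays fixed
    model = DecisionTreeClassifier(random_state=seed).fit(X, y)
    scores.append(model.feature_importances_)
scores = np.array(scores)  # shape (n_runs, n_features)

mean_score, std_score = scores.mean(axis=0), scores.std(axis=0)
for i, (m, s) in enumerate(zip(mean_score, std_score)):
    print('Feature: %0d, Mean: %.5f, Std: %.5f' % (i, m, s))
```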
Random Forest Feature Importance
We can use the Random Forest algorithm for feature importance as implemented in scikit-learn by the RandomForestRegressor and RandomForestClassifier classes.
After being fit, the model provides a feature_importances_ property that can be accessed to retrieve the relative importance scores for each input feature.
This approach can also be used with the bagging and extra trees algorithms.
Let’s take a look at an example of this for regression and classification.
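A forest's scores are built from those of its individual trees. This sketch assumes current scikit-learn behavior: each fitted tree in `estimators_` has its own `feature_importances_`, and the forest averages them and renormalizes the result to sum to 1. The choice of `n_estimators=50` is an assumption made only to keep the sketch quick. The final check compares the hand-computed average with the forest's attribute.

```python
# sketch: a forest's importance as the average of its trees' importances
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
model = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)

# one row of scores per tree in the ensemble
per_tree = np.array([tree.feature_importances_ for tree in model.estimators_])
averaged = per_tree.mean(axis=0)
averaged /= averaged.sum()  # renormalize so the scores sum to 1
print('Matches feature_importances_:', np.allclose(averaged, model.feature_importances_))
```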
Random Forest Regression Feature Importance
The complete example of fitting a RandomForestRegressor and summarizing the calculated feature importance scores is listed below.
# random forest for feature importance on a regression problem
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from matplotlib import pyplot
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
# define the model
model = RandomForestRegressor()
# fit the model
model.fit(X, y)
# get importance
importance = model.feature_importances_
# summarize feature importance
for i,v in enumerate(importance):
	print('Feature: %0d, Score: %.5f' % (i,v))
# plot feature importance
pyplot.bar([x for x in range(len(importance))], importance)
pyplot.show()
Running the example fits the model, then reports the importance score for each feature.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
The results suggest that perhaps two or three of the 10 features are important to prediction.
Feature: 0, Score: 0.00280
Feature: 1, Score: 0.00545
Feature: 2, Score: 0.00294
Feature: 3, Score: 0.00289
Feature: 4, Score: 0.52992
Feature: 5, Score: 0.42046
Feature: 6, Score: 0.02663
Feature: 7, Score: 0.00304
Feature: 8, Score: 0.00304
Feature: 9, Score: 0.00283
A bar chart is then created for the feature importance scores.
Bar Chart of RandomForestRegressor Feature Importance Scores
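The same scores are available from extra trees as well. This sketch swaps in scikit-learn's `ExtraTreesRegressor` on the same dataset, and only the model class changes. The `random_state=1` setting is an added assumption for repeatability.

```python
# sketch: the same feature_importances_ workflow with extremely randomized trees
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
model = ExtraTreesRegressor(random_state=1).fit(X, y)
importance = model.feature_importances_
for i, v in enumerate(importance):
    print('Feature: %0d, Score: %.5f' % (i, v))
```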
Random Forest Classification Feature Importance
The complete example of fitting a RandomForestClassifier and summarizing the calculated feature importance scores is listed below.
# random forest for feature importance on a classification problem
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from matplotlib import pyplot
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)
# define the model
model = RandomForestClassifier()
# fit the model
model.fit(X, y)
# get importance
importance = model.feature_importances_
# summarize feature importance
for i,v in enumerate(importance):
	print('Feature: %0d, Score: %.5f' % (i,v))
# plot feature importance
pyplot.bar([x for x in range(len(importance))], importance)
pyplot.show()
Running the example fits the model, then reports the importance score for each feature.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
The results suggest that perhaps two or three of the 10 features are important to prediction.
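The standard deviation of the per-tree importances in `estimators_` gives a rough measure of how stable each classification score is. This sketch follows a pattern used in scikit-learn's own examples and lists the features from most to least important, each with its spread. The `random_state=1` setting is an added assumption for repeatability.

```python
# sketch: spread of RandomForestClassifier importances across its trees
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=1)
model = RandomForestClassifier(random_state=1).fit(X, y)
importance = model.feature_importances_

# standard deviation of each feature's score across the individual trees
spread = np.std([tree.feature_importances_ for tree in model.estimators_], axis=0)
for i in np.argsort(importance)[::-1]:
    print('Feature: %0d, Score: %.5f +/- %.5f' % (i, importance[i], spread[i]))
```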

buddhareddydatascience
