Feature selection (variable selection) (dimensionality reduction)

2021. 5. 8. 03:17 · Data Science / Feature Selection

ㅇ Module imports

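import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import shap
from catboost import CatBoostClassifier
from sklearn.feature_selection import RFE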

1) Feature importance ################################

ㅇ Define a helper that plots feature importance

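def show_feature_importance(model, data):
    # model: a fitted estimator exposing feature_importances_
    # data: the DataFrame of features the model was trained on
    df = pd.DataFrame({'column': list(data.columns),
                       'score': list(model.feature_importances_)})
    # Horizontal bar chart: one bar per feature, length = importance score
    plt.figure(figsize=(20, 10))
    sns.barplot(x="score", y="column", data=df)
    plt.yticks(size=20)
    plt.xticks(size=20)
    plt.xlabel('score', size=20)
    plt.ylabel('column', size=20)
    plt.show()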

ㅇ Feature importance example
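A minimal sketch of such an example. The model (scikit-learn's RandomForestClassifier) and the synthetic dataset are stand-ins chosen for illustration, not the author's own model or data, and the column names f0..f7 are hypothetical. It builds the same column/score table the helper above uses and prints it instead of plotting.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 8 features, only 3 of which carry signal
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
data = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(data, y)

# Same column/score table that show_feature_importance builds, sorted for reading
importance_df = (pd.DataFrame({"column": data.columns,
                               "score": model.feature_importances_})
                 .sort_values("score", ascending=False)
                 .reset_index(drop=True))
print(importance_df)
```

Calling show_feature_importance(model, data) on the same objects would draw these scores as a bar chart.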

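2) RFE (recursive feature elimination) #########################################

# A backward-elimination method: start with every feature included, then refit the model repeatedly, removing the least important feature each round until the requested number of features remains.

ㅇ Example code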

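fs_model = CatBoostClassifier(max_depth=10)
# n_features_to_select must be passed by keyword in recent scikit-learn versions
rfe = RFE(fs_model, n_features_to_select=15)
fit = rfe.fit(X_train, y_train)
print("Num Features: ", fit.n_features_)
print("Selected Features: ", fit.support_)  # boolean mask over the input columns
print("Feature Ranking: ", fit.ranking_)    # 1 = selected; larger = eliminated earlier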
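To make the elimination loop described above concrete, here is a hand-rolled sketch of roughly what RFE does when it removes one feature per round. It is an illustration under assumptions, not scikit-learn's actual implementation: the RandomForestClassifier, the synthetic data, and the target of 4 features are all placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=0, random_state=0)

n_features_to_select = 4             # hypothetical target size
remaining = list(range(X.shape[1]))  # indices of features still in play

while len(remaining) > n_features_to_select:
    # Refit on the surviving features only
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[:, remaining], y)
    # Drop the single feature with the lowest importance in this round
    weakest = int(np.argmin(model.feature_importances_))
    removed = remaining.pop(weakest)
    print(f"removed feature {removed}, {len(remaining)} left")

print("Selected feature indices:", remaining)
```

Because importances are recomputed after every removal, a feature that looked weak next to a correlated partner can still survive once that partner is gone, which is the main difference from a single one-shot importance ranking.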

3) SHAP values

ㅇ Example code

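# lgb_model: a fitted LightGBM model; test_x: the test feature DataFrame
explainer = shap.TreeExplainer(lgb_model)    # create a SHAP explainer for the tree model
shap_values = explainer.shap_values(test_x)  # compute the SHAP values

shap.summary_plot(shap_values, test_x)       # summary plot of per-feature SHAP values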

4) permutation importance
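
Permutation importance measures how much a fitted model's score drops when the values of one feature are randomly shuffled, which breaks that feature's link to the target. A minimal sketch using scikit-learn's permutation_importance follows; the RandomForestClassifier and the synthetic data are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each column n_repeats times on held-out data and record the score drop
result = permutation_importance(model, X_valid, y_valid,
                                n_repeats=10, random_state=0)

# Features sorted from largest to smallest mean drop in accuracy
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {result.importances_mean[idx]:.4f} "
          f"+/- {result.importances_std[idx]:.4f}")
```

Scoring on a validation split rather than the training data keeps the ranking from rewarding features the model has merely memorized.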

5) PCA
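
Unlike the methods above, PCA does not pick a subset of the original columns; it projects them onto a smaller set of new axes (principal components). A minimal sketch with scikit-learn, assuming synthetic data and a 95% explained-variance target chosen purely for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

# PCA is sensitive to feature scale, so standardize first
X_scaled = StandardScaler().fit_transform(X)

# A float in (0, 1) keeps as many components as needed to explain that share of variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print("Original shape:", X.shape)
print("Reduced shape: ", X_reduced.shape)
print("Cumulative explained variance:", np.cumsum(pca.explained_variance_ratio_))
```

The reduced columns are linear combinations of the originals, so they are harder to interpret than a selected subset of named features.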
