python – how to work with sklearn pipeline, if features are already extracted?

Posted by: admin, February 24, 2020

Questions:

Hi, I’m learning about text classification, and I have a dataset like this one:

[image: sample of the dataset]

My question: suppose I split the dataset into a training set and a testing set and do the feature extraction separately for each (I’m working with word embeddings).

Is it correct to pass the extracted features (named feature_array_trainingset and feature_array_testingset) directly to the pipeline, like this:

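from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import Pipeline
from sklearn import svm

# The features are already extracted, so the pipeline only holds the classifier.
pipeline = Pipeline([('classifier', svm.SVC())])

# Fit on the training features and their labels.
pipeline.fit(feature_array_trainingset, train['Category'])

# Predict on the testing features.
predictions = pipeline.predict(feature_array_testingset)

# classification_report expects the true labels first, then the predictions.
print(classification_report(test['Category'], predictions))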

It returns a classification report, but I’m not sure whether this is the correct process.

Answers:
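
The approach in the question can be sketched as below, assuming the extracted features are 2-D NumPy arrays of shape (n_samples, n_features). A Pipeline works on whatever array-like input it receives, so passing pre-extracted features to it directly should be fine. The main thing to check is that both arrays come from the same embedding model, so that their columns mean the same thing. The random arrays, the label names, the helper fake_embeddings and the StandardScaler step are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-ins for the real labels (train['Category'], test['Category']).
labels_train = np.array(["sports", "politics"] * 20)
labels_test = np.array(["sports", "politics"] * 5)

rng = np.random.default_rng(0)


def fake_embeddings(labels, dim=8):
    """Hypothetical helper: random vectors shifted per class so the toy data is separable."""
    offsets = np.where(labels == "sports", 2.0, -2.0)[:, None]
    return rng.normal(size=(len(labels), dim)) + offsets


# Stand-ins for the pre-extracted embedding features.
feature_array_trainingset = fake_embeddings(labels_train)
feature_array_testingset = fake_embeddings(labels_test)

# The scaler is an optional extra step (an assumption, not in the question).
# Inside a Pipeline it is fitted on the training features only.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", SVC()),
])

pipeline.fit(feature_array_trainingset, labels_train)
predictions = pipeline.predict(feature_array_testingset)

# True labels go first, predictions second.
print(classification_report(labels_test, predictions))
```

With only a classifier in the Pipeline, the result is the same as calling SVC directly. The Pipeline starts to pay off once preprocessing steps are added, because fit and predict then apply them consistently to the training and testing features.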