import math

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

dataset = pd.read_csv('diabetes-data.csv')

# Zero is not a valid value for these columns; replace with the column mean
zero_not_accepted = ['Glucose', 'BloodPressure', 'SkinThickness', 'BMI', 'Insulin']
for column in zero_not_accepted:
    dataset[column] = dataset[column].replace(0, np.nan)
    mean = int(dataset[column].mean(skipna=True))
    dataset[column] = dataset[column].replace(np.nan, mean)

X = dataset.iloc[:, 0:8]
y = dataset.iloc[:, 8]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)
print(X_test)

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

math.sqrt(len(y_test))  # used this to pick n_neighbors
classifier = KNeighborsClassifier(n_neighbors=11, p=2, metric='euclidean')
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
My final confusion matrix is:

[[94 13]
 [15 32]]
This is where it gets confusing: if I calculate the F1 score manually, I get 0.8704, but in Python f1_score(y_test, y_pred) returns 0.6956. Can anyone please explain what the issue is?
Additional information: I printed classification_report(y_test, y_pred) and this is the output:
              precision    recall  f1-score   support

           0       0.86      0.88      0.87       107
           1       0.71      0.68      0.70        47

    accuracy                           0.82       154
   macro avg       0.79      0.78      0.78       154
weighted avg       0.82      0.82      0.82       154
Scikit-learn does not order the numbers in the confusion matrix the way you may expect from your books or lectures.

For scikit-learn, the layout of the matrix is:
[[TN FP]
 [FN TP]]

So here TN = 94, FP = 13, FN = 15, TP = 32, and:

F1 = 2*TP / (2*TP + FP + FN)
F1 = 2*32 / (2*32 + 13 + 15)
F1 = 0.6956

which is the correct answer.
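You can avoid reading positions off the matrix entirely: since confusion_matrix returns [[TN, FP], [FN, TP]], calling .ravel() on it unpacks the four counts in that row-major order. A minimal sketch with small made-up labels (not the diabetes data), showing that the manual F1 then agrees with f1_score:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Tiny hypothetical labels, just for illustration
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# confusion_matrix returns [[TN, FP], [FN, TP]];
# ravel() unpacks it in that row-major order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

manual_f1 = 2 * tp / (2 * tp + fp + fn)
print(manual_f1, f1_score(y_true, y_pred))  # both print 0.75
```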
You did the calculation as if the matrix numbers were ordered:

[[TP FP]
 [FN TN]]

F1 = 2*94 / (2*94 + 13 + 15) = 0.8704

which is wrong, because scikit-learn's matrix numbers are not in that order.
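Note that your 0.8704 is exactly the F1 of class 0 in your classification report (2*94 / (2*94 + 13 + 15) ≈ 0.87): mislabeling the matrix made you compute class 0's score. f1_score reports class 1 by default; its pos_label parameter selects which class counts as positive. A short sketch with made-up labels:

```python
from sklearn.metrics import f1_score

# Made-up labels, just to show the pos_label switch
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

print(f1_score(y_true, y_pred))               # F1 for class 1: 0.5
print(f1_score(y_true, y_pred, pos_label=0))  # F1 for class 0: 0.75
```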