ValueError: Found input variables with inconsistent numbers of samples: [143, 426]

By Jennifer, 2 years ago

How can I fix this error? ValueError: Found input variables with inconsistent numbers of samples: [143, 426]


#split the data set into independent (X) and dependent (Y) data sets
X = df.iloc[:,2:31].values
Y = df.iloc[:,1].values

#split the data set into 75% training and 25% testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

#scale the data (feature scaling)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_train = sc.fit_transform(X_test)

#Using Logistic Regression Algorithm to the Training Set

classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, Y_train)

The shapes of X_train and Y_train are:

X_train.shape
(143, 29)
Y_train.shape
(426,)

error msg:

ValueError Traceback (most recent call last) in ()
      2
      3 classifier = LogisticRegression(random_state = 0)
----> 4 classifier.fit(X_train, Y_train)
      5 #Using KNeighborsClassifier Method of neighbors class to use Nearest Neighbor algorithm
      6

2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    210     if len(uniques) > 1:
    211         raise ValueError("Found input variables with inconsistent numbers of"
--> 212                          " samples: %r" % [int(l) for l in lengths])
    213
    214

ValueError: Found input variables with inconsistent numbers of samples: [143, 426]

Python
Machine-learning
3 Answers
0

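You have a bug on line 11 of your snippet (the second scaling line), where you assign the scaled test set to X_train instead of X_test. That overwrites X_train with the 143 test rows while Y_train still has 426 rows, which is exactly the mismatch in the error message. Take a look at the corrected code below.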

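from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

#split the data set into independent (X) and dependent (Y) data sets
#(df is the DataFrame from the question)
X = df.iloc[:,2:31].values
Y = df.iloc[:,1].values

#split the data set into 75% training and 25% testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

#scale the data (feature scaling)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  #learn mean and std from the training set
X_test = sc.transform(X_test)        #apply the same mean and std to the test set

#Using Logistic Regression Algorithm to the Training Set
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, Y_train)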

Also, do not call fit_transform on X_test. Doing so refits the scaler on the test set, so the test data gets scaled with its own mean and standard deviation instead of the ones computed from X_train.
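Here is a small self-contained sketch of the same pattern that you can run without the original file. The data is a synthetic stand-in (an assumption, not the asker's dataset) with 569 rows and 29 features, chosen so the split sizes match the 426 and 143 from the question. The names X_train_scaled and X_test_scaled are only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the question's data (assumption): 569 rows, 29 features
rng = np.random.default_rng(0)
X = rng.normal(size=(569, 29))
Y = (X[:, 0] > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learn mean/std from training rows only
X_test_scaled = sc.transform(X_test)        # reuse the training mean/std

print(X_train_scaled.shape, Y_train.shape)  # (426, 29) (426,)
print(X_test_scaled.shape, Y_test.shape)    # (143, 29) (143,)

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train_scaled, Y_train)     # row counts match, so no ValueError
```

Writing the scaled arrays to new names is a design choice: it makes it harder to overwrite X_train by accident, which is what happened in the question.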

0
Gilbertcane

Sounds like your features and labels do not have the same number of rows. I faced a similar problem while fitting a regression model: the number of rows in X was not equal to the number of rows in y. In most cases, x holds your features and y is the target you want to predict. scikit-learn expects the feature array to be 2D, with shape (n_samples, n_features), so check the shape of x and, if it is 1D, convert it to 2D:


x = x.reshape(-1, 1)
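To see what the reshape does, here is a tiny sketch with a made-up four-element array (the values are an assumption for illustration). Note that reshape returns a new array and leaves the original unchanged, so the result has to be assigned.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # one feature, four samples, stored as 1D
print(x.shape)     # (4,)

x_2d = x.reshape(-1, 1)  # -1 lets NumPy infer the number of rows
print(x_2d.shape)  # (4, 1)
print(x.shape)     # (4,)  the original array is unchanged
```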


Also, you can run into this error if you remove rows containing nulls from X_train and y_train independently of each other. y_train probably has few or no nulls, while X_train probably has some. When a row is removed from X_train but the same row is not removed from y_train, the two arrays go out of sync and end up with different lengths. Instead, remove nulls before you separate X and y.
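A minimal sketch of dropping nulls first, using a toy DataFrame whose column names (label, f1, f2) are hypothetical. Because dropna runs on the whole frame, every row that is removed disappears from the features and the labels together.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one label column and two feature columns with gaps
df = pd.DataFrame({
    "label": [0, 1, 0, 1, 1],
    "f1": [1.0, np.nan, 3.0, 4.0, 5.0],
    "f2": [0.5, 0.1, np.nan, 0.7, 0.9],
})

clean = df.dropna()             # drop incomplete rows before splitting columns
X = clean[["f1", "f2"]].values  # features
y = clean["label"].values       # target, taken from the same surviving rows

print(X.shape, y.shape)  # (3, 2) (3,)
```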



3
Shashankshnau1993@gmail.com

The error "ValueError: Found input variables with inconsistent numbers of samples" is raised when you try to fit a machine learning model with input features and a target variable that have different numbers of rows (samples). The two numbers in the message, here [143, 426], are the row counts of the arrays you passed in.


To resolve this error, make sure the input features and the target variable have the same number of rows. Check the shape of each array right before the call to fit, and review your preprocessing steps to confirm that none of them drops or adds rows in only one of the two.
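One way to make that check routine is a small guard you call before fitting. The helper below, check_same_rows, is hypothetical (it is not part of scikit-learn), and the arrays are placeholders with the shapes from the question.

```python
import numpy as np

def check_same_rows(X, y):
    """Raise ValueError early if X and y disagree on the number of samples."""
    n_X, n_y = np.shape(X)[0], np.shape(y)[0]
    if n_X != n_y:
        raise ValueError(f"X has {n_X} rows but y has {n_y}")
    return n_X

X_ok = np.zeros((426, 29))   # placeholder with the expected training shape
X_bad = np.zeros((143, 29))  # placeholder with the shape from the question
y = np.zeros(426)

print(check_same_rows(X_ok, y))  # 426

try:
    check_same_rows(X_bad, y)
except ValueError as err:
    print(err)  # X has 143 rows but y has 426
```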


The error can also come from a bug in your code, such as a mistake in how you index or slice your data, or a result assigned to the wrong variable. In that case, review the code line by line and print the shapes after each step to find where the row counts diverge.


If you are unable to resolve the error after reviewing your data and code, it may be helpful to include more context and details about your specific situation in your question. This can help others better understand the problem and provide more specific guidance on how to resolve it.
