
Working Of Labelencoder In Sklearn

Say I have the following input feature: hotel_id = [1, 2, 3, 2, 3] This is a categorical feature with numeric values. If I give it to the model as it is, the model will treat it a

Solution 1:

The LabelEncoder is a way to encode class levels. In addition to the integer example you've included, consider the following example:

>>> from sklearn.preprocessing import LabelEncoder
>>> le = LabelEncoder()
>>>
>>> train = ["paris", "paris", "tokyo", "amsterdam"]
>>> test = ["tokyo", "tokyo", "paris"]
>>> le.fit(train).transform(test)
array([2, 2, 1]...)

What the LabelEncoder allows us to do, then, is to assign ordinal levels to categorical data. However, what you've noted is correct: namely, the [2, 2, 1] is treated as numeric data. This is a good candidate for using the OneHotEncoder for dummy variables (which I know you said you were hoping not to use).
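To see which integer each category receives, a minimal sketch (reusing the train list above and the hotel_id values from the question) can inspect classes_, where the fitted encoder stores the unique labels in sorted order:

```python
from sklearn.preprocessing import LabelEncoder

train = ["paris", "paris", "tokyo", "amsterdam"]

le = LabelEncoder().fit(train)

# classes_ holds the unique labels in sorted order; a label's position is its code.
mapping = {label: code for code, label in enumerate(le.classes_)}

# The same encoder class applied to the numeric ids from the question.
hotel_codes = LabelEncoder().fit_transform([1, 2, 3, 2, 3])
```

Here "amsterdam" maps to 0, "paris" to 1 and "tokyo" to 2, and the hotel ids are simply shifted to start at 0, so the result is still a single integer column.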

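Note that in older versions of scikit-learn (before 0.20) the LabelEncoder had to be used prior to one-hot encoding, as the OneHotEncoder could not handle string categories. Therefore, it is frequently seen as a precursor to one-hot encoding. From 0.20 onwards, the OneHotEncoder accepts string categories directly.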
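As a rough sketch of the one-hot route, assuming scikit-learn 0.20 or later (where OneHotEncoder takes integer or string categories directly), the hotel_id values from the question could be encoded like this. The variable names are only illustrative:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder expects a 2D array: one row per sample, one column per feature.
hotel_id = np.array([1, 2, 3, 2, 3]).reshape(-1, 1)

enc = OneHotEncoder(handle_unknown="ignore")

# fit_transform returns a sparse matrix by default; toarray() makes it dense.
one_hot = enc.fit_transform(hotel_id).toarray()

# One output column per distinct id, in sorted order.
categories = enc.categories_[0].tolist()
```

Each row has a single 1 in the column of its hotel id, so the model no longer sees an ordering between ids.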

Alternatively, it can encode your target into a usable array. If, for instance, train were your target for classification, you would need a LabelEncoder to use it as your y variable.
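A small sketch of that target-encoding use, with hypothetical toy features X and a 1-nearest-neighbour classifier chosen only because it is deterministic:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: one numeric feature per sample, string class labels.
X = [[0.0], [0.1], [5.0], [9.9]]
y = ["paris", "paris", "tokyo", "amsterdam"]

le = LabelEncoder()
y_encoded = le.fit_transform(y)  # string labels -> integer codes

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y_encoded)

# Predictions come back as integer codes; inverse_transform restores the names.
pred_codes = clf.predict([[0.2], [9.0]])
pred_labels = le.inverse_transform(pred_codes)
```

Keeping the fitted encoder around is what lets you map predictions back to the original class names.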

Solution 2:

If you are running a classification model then the labels are treated as classes and the order is ignored. You don't need to one-hot encode them.
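For the target labels, this can be checked with a quick sketch. The toy data and the choice of classifier are assumptions made for illustration; the labels are passed as plain strings with no encoding step:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one numeric feature per sample, string class labels.
X = [[0.0], [0.1], [5.0], [9.9]]
y = ["paris", "paris", "tokyo", "amsterdam"]

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# The classifier records the distinct classes and predicts class names.
classes = list(clf.classes_)
pred = list(clf.predict([[4.0], [0.2]]))
```

Note that this concerns the y passed to fit; a numeric column inside X is still read as a number.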

Solution 3:

One way to handle this problem is to convert the numbers into word labels with the inflect package.

I went through all the hotel ids and converted them into words, for example 1 -> 'one', 2 -> 'two', ... 99 -> 'ninety-nine'.

import inflect
p = inflect.engine()

def toNominal(df, column):
    # Replace each number in the column with its English word, in place.
    for index, row in df.iterrows():
        df.loc[index, column] = p.number_to_words(df.loc[index, column])

toNominal(df, 'hotel_id')
