
Python NLTK Code Snippet To Train A Classifier (Naive Bayes) Using Feature Frequency

Could anyone walk me through a code snippet that demonstrates how to train a Naive Bayes classifier using feature frequency rather than feature presence?

Solution 1:

The link you sent says this function is a feature extractor that simply checks whether each of these words is present in a given document.

Here is the whole code, with each line numbered (it assumes nltk and the movie_reviews corpus have already been imported):

1   all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
2   word_features = [w for w, _ in all_words.most_common(2000)]

3   def document_features(document):
4       document_words = set(document)
5       features = {}
6       for word in word_features:
7           features['contains(%s)' % word] = (word in document_words)
8       return features

Line 1 builds a frequency distribution over all the (lowercased) words in the corpus.

Line 2 takes the 2000 most frequent words.

Line 3 defines the function.

Line 4 converts the document (which I think must be a list of words) to a set.

Line 5 declares an empty dictionary.

Line 6 iterates over the 2000 most frequent words.

Line 7 adds a dictionary entry whose key is 'contains(theword)' and whose value is True if the word is present in the document and False otherwise.

Line 8 returns the dictionary, which shows whether or not the document contains each of the 2000 most frequent words.
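For comparison, a frequency-based variant of the same extractor could be sketched as follows. This is only an illustration: the name `document_count_features` and the `word_features` parameter are made up for this example, and it uses `collections.Counter` from the standard library so it runs without NLTK. Note that, as far as I know, NLTK's NaiveBayesClassifier treats every distinct feature value as a separate discrete label, so raw counts are not handled as numeric quantities.

```python
from collections import Counter

def document_count_features(document, word_features):
    """Map each tracked word to the number of times it occurs in the document.

    `document` is a list of word tokens; `word_features` is the list of
    words to track (both names are assumptions made for this sketch).
    """
    counts = Counter(w.lower() for w in document)
    # Counter returns 0 for words that never occur in the document.
    return {'count(%s)' % word: counts[word] for word in word_features}

doc = ["A", "great", "great", "movie"]
features = document_count_features(doc, ["great", "movie", "bad"])
print(features)  # {'count(great)': 2, 'count(movie)': 1, 'count(bad)': 0}
```

The only change from the presence-based version is that the value is a count instead of True/False.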

Does this answer your question?

Solution 2:

For training, create appropriate FreqDists that you can use to create ProbDists, which can then be passed in to the NaiveBayesClassifier. But the classification actually works on feature sets, whose values are treated as discrete labels (typically booleans), not as numeric frequencies. So if you want to classify based on a FreqDist, you'll have to implement your own classifier that does not use the NLTK feature sets.
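A rough idea of what "implement your own classifier" might look like is a small multinomial Naive Bayes over word counts. The sketch below is not NLTK's API: the class name `CountNaiveBayes`, its methods, and the toy training data are all invented for illustration, and it uses add-one (Laplace) smoothing as one common choice.

```python
import math
from collections import Counter, defaultdict

class CountNaiveBayes:
    """Tiny multinomial Naive Bayes over word counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # smoothing constant
        self.label_counts = Counter()             # documents seen per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        self.vocab = set()                        # every word seen in training

    def train(self, labeled_docs):
        for words, label in labeled_docs:
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def classify(self, words):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float('-inf')
        for label, n_docs in self.label_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            for word, count in Counter(words).items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + self.alpha) / denom
                # Each occurrence contributes, so frequency matters here.
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy data, only to show the calls.
train = [
    ("great great fun".split(), "pos"),
    ("fun and great acting".split(), "pos"),
    ("boring boring plot".split(), "neg"),
    ("bad and boring".split(), "neg"),
]
clf = CountNaiveBayes()
clf.train(train)
print(clf.classify("great great movie".split()))  # pos
```

Because the log-probability of a word is multiplied by its count, a word that appears twice pulls the score twice as hard, which is the difference from a presence-only feature.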

Solution 3:

Here's a method which will help you:

import nltk

def get_freq_letters(words):
    ''' Returns the frequency of letters '''
    # Count every alphabetic character, lowercased, across all words.
    fdist = nltk.FreqDist(char.lower() for word in words for char in word if char.isalpha())
    freq_letters = {}
    # items() replaces the Python 2-only iteritems().
    for key, value in fdist.items():
        freq_letters[key] = value
    return freq_letters
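To show what such a function returns without needing NLTK installed, here is a small stand-in built on `collections.Counter` (which `FreqDist` subclasses in NLTK 3). The name `letter_frequencies` and the sample input are assumptions for this example only.

```python
from collections import Counter

def letter_frequencies(words):
    """Count lowercase alphabetic characters across all words."""
    counts = Counter(ch.lower() for word in words for ch in word if ch.isalpha())
    return dict(counts)

freqs = letter_frequencies(["Naive", "Bayes!"])
print(freqs["a"])    # 2
print("!" in freqs)  # False, non-alphabetic characters are skipped
```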
