N-gram Frequency in Python with NLTK
I want to write a function that returns the frequency of each n-gram in a given text. Help please. I wrote this code for counting the frequency of 2-grams: from nltk i
Solution 1:
I don't see an expected output section, so I assume this is what you might need.
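import nltk

def compute_freq(sentence, n_value=2):
    # Split the sentence into word and punctuation tokens
    # (word_tokenize needs NLTK's Punkt tokenizer data to be downloaded)
    tokens = nltk.word_tokenize(sentence)
    # Build consecutive n-token tuples
    ngrams = nltk.ngrams(tokens, n_value)
    # Count how often each n-gram occurs
    ngram_fdist = nltk.FreqDist(ngrams)
    return ngram_fdist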
By default, this function returns the frequency distribution of bigrams. For example,
text = "This is an example sentence."
freq_dist = compute_freq(text)
Now freq_dist looks like this:
FreqDist({('is', 'an'): 1, ('example', 'sentence'): 1, ('an', 'example'): 1, ('This', 'is'): 1, ('sentence', '.'): 1})
From here you can print the keys and values like so:
for k, v in freq_dist.items():
    print(k, v)
('is', 'an') 1
('example', 'sentence') 1
('an', 'example') 1
('This', 'is') 1
('sentence', '.') 1
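Since FreqDist is a subclass of collections.Counter, the counts can be post-processed with the usual Counter and dict tools. The snippet below is only a sketch: it hard-codes the bigram counts from the output above into a plain Counter so that it runs without NLTK, and the names counts, total and relative are made up for illustration. With NLTK installed, freq_dist could be used in place of counts.

```python
from collections import Counter

# Bigram counts copied from the example output above (assumption: a plain
# Counter stands in for the FreqDist returned by compute_freq).
counts = Counter({
    ('This', 'is'): 1,
    ('is', 'an'): 1,
    ('an', 'example'): 1,
    ('example', 'sentence'): 1,
    ('sentence', '.'): 1,
})

# Total number of bigram occurrences
total = sum(counts.values())

# Relative frequency of each bigram (count divided by total)
relative = {ngram: count / total for ngram, count in counts.items()}

for ngram, share in relative.items():
    print(ngram, share)
```

FreqDist also provides N() for the total count and freq(sample) for the relative frequency of one sample, which cover the same ground directly.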
For anything other than bigrams, just change the 'n_value' argument when calling the function. For example,
freq_dist = compute_freq(text, n_value=3)  # gives the trigram distribution
Printing its items as before gives:
('example', 'sentence', '.') 1
('an', 'example', 'sentence') 1
('This', 'is', 'an') 1
('is', 'an', 'example') 1
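If installing NLTK is not an option, roughly the same counting can be sketched with the standard library alone. This is not the method from the answer above: compute_freq_plain is a hypothetical helper, and it assumes simple whitespace tokenization, so punctuation stays attached to words (unlike nltk.word_tokenize, which splits "sentence." into two tokens).

```python
from collections import Counter

def compute_freq_plain(sentence, n_value=2):
    """Count n-grams using whitespace tokenization (hypothetical helper, no NLTK)."""
    tokens = sentence.split()
    # Zipping n staggered views of the token list yields consecutive n-tuples
    ngrams = zip(*(tokens[i:] for i in range(n_value)))
    return Counter(ngrams)

freq = compute_freq_plain("to be or not to be", n_value=2)
# most_common() lists n-grams from most to least frequent
for ngram, count in freq.most_common():
    print(ngram, count)
```

Here the bigram ('to', 'be') is counted twice and every other bigram once. If the text has fewer than n_value tokens, the result is an empty Counter.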