python - Retrieving sentence strings from NLTK corpus
I have this dataset:
from nltk.corpus import gutenberg
emma = gutenberg.sents('austen-emma.txt')
It gives me the sentences as lists of words:
[[u'she',u'was',u'happy'],[u'it',u'was',u'her',u'own',u'good']]
but I want to get:
['she was happy','it was her own good']
You appear to be getting the correct output, according to the NLTK docs:
sents(fileids=None) Returns: the given file(s) as a list of sentences or utterances, each encoded as a list of word strings.
So you just need to turn each list of word strings into a space-separated sentence:
sentences = [" ".join(list_of_words) for list_of_words in emma]
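To try the join without downloading the corpus, here is a minimal sketch that runs on a small stand-in list. The names `sample_sents` and `join_sentences` are made up for illustration and are not part of NLTK; the sample data only mimics the shape that `gutenberg.sents()` returns.

```python
# Stand-in for gutenberg.sents('austen-emma.txt'): a list of sentences,
# each one a list of word strings. This is hypothetical sample data,
# not text taken from the real corpus.
sample_sents = [
    ["she", "was", "happy"],
    ["it", "was", "her", "own", "good"],
]


def join_sentences(sents):
    """Join each list of word strings into one space-separated string."""
    return [" ".join(words) for words in sents]


sentences = join_sentences(sample_sents)
print(sentences)
```

With NLTK installed and the corpus fetched via `nltk.download('gutenberg')`, the same function should accept `gutenberg.sents('austen-emma.txt')` directly. Note that the corpus reader returns punctuation as separate tokens, so a plain space join leaves a space before each punctuation mark.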