Question #9
Reading: Reading 4 Big Data Projects
PDF File: Reading 4 Big Data Projects.pdf
Page: 3
Status: Unattempted
Correct Answer: B
Question
The process of splitting a given text into separate words is best characterized as:
Answer Choices:
A. stemming
B. tokenization
Explanation
Text is treated as a collection of tokens, where a token is equivalent to a word.
Tokenization is the process of splitting a given text into separate tokens. A bag-of-words
(BOW) is the collection of distinct tokens drawn from all the texts in a sample dataset.
Stemming, by contrast, is the process of converting inflected word forms into their base word
(e.g., "connected" and "connecting" both reduce to the stem "connect").
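The distinction above can be sketched in a few lines of Python. This is a minimal illustration, not the reading's method: the regex-based tokenizer and the function names are illustrative assumptions, and a real NLP pipeline would use a library tokenizer and stemmer.

```python
import re

def tokenize(text):
    # Tokenization: split a text into separate lowercase word tokens.
    # (Illustrative regex tokenizer; real pipelines use library tokenizers.)
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(texts):
    # BOW: the distinct set of tokens across all texts in the dataset.
    bow = set()
    for text in texts:
        bow.update(tokenize(text))
    return bow

texts = ["The markets rallied", "The rally continued"]
print(tokenize(texts[0]))            # ['the', 'markets', 'rallied']
print(sorted(bag_of_words(texts)))   # ['continued', 'markets', 'rallied', 'rally', 'the']
```

Note that the BOW keeps only distinct tokens ("the" appears once), while tokenization preserves every word of a single text in order.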