Question #9

Reading: Reading 4 Big Data Projects

PDF File: Reading 4 Big Data Projects.pdf

Page: 3

Status: Unattempted

Correct Answer: B

Question
The process of splitting a given text into separate words is best characterized as:
Answer Choices:
A. stemming
B. tokenization
Explanation
A text is treated as a collection of tokens, where a token is equivalent to a word. Tokenization is the process of splitting a given text into separate tokens (choice B). A bag-of-words (BOW) is the set of distinct tokens from all the texts in a sample dataset. Stemming (choice A) is different: it converts inflected word forms into a base word, so that, for example, "connected" and "connecting" both reduce to the stem "connect".
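The distinction can be sketched in a few lines of Python. This is a toy illustration, not the curriculum's reference implementation: the regex tokenizer and the suffix-stripping stemmer below are simplified stand-ins (real pipelines use algorithms such as the Porter stemmer).

```python
import re

def tokenize(text):
    # Tokenization: split a text into separate lowercase word tokens
    return re.findall(r"[a-z']+", text.lower())

def stem(token):
    # Toy suffix-stripping stemmer (illustrative only, not Porter's algorithm):
    # convert an inflected word form toward a base word
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

text = "The analyst analyzed the texts"
tokens = tokenize(text)          # ['the', 'analyst', 'analyzed', 'the', 'texts']
bag_of_words = set(tokens)       # BOW: the distinct tokens, duplicates dropped
stems = [stem(t) for t in tokens]
```

Note how the three concepts separate cleanly: `tokenize` splits the text, `set(tokens)` collapses it into a bag-of-words, and `stem` normalizes each token afterward.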