Package co.cask.cdap.examples.wordcount

This package contains the WordCount sample Application that counts words and tracks the associations between words.

Package co.cask.cdap.examples.wordcount Description

This package contains the WordCount sample Application, which counts words and tracks the associations between words. It is a slightly modified version of the classic WordCount example.

The WordCount Application consists of:

1. A Stream named wordStream that receives strings of words to be counted.

2. A Flow named WordCounter that processes the strings from the Stream and calculates the word counts and other word statistics using four Flowlets:
   - The splitter splits the input string into words, then aggregates and persists global statistics;
   - The counter takes words as inputs, then calculates and persists per-word statistics;
   - The unique Flowlet calculates the number of unique words seen;
   - The associator stores word associations between all of the words in each input string.

3. A Service named ``RetrieveCounts`` that serves read requests for calculated statistics, word counts, and associations. It exposes these endpoints:
   - ``/stats`` returns the total number of words, the number of unique words, and the average word length;
   - ``/count/{word}`` returns the count of the specified word and its word associations, up to a specified limit (a pre-set limit of ten applies if none is specified);
   - ``/assoc/{word1}/{word2}`` returns the top associated words (those with the highest counts).

4. Four Datasets used by the Flow and Service to model, store, and serve the data:
   - A core Table named wordStats that tracks global word statistics;
   - A system KeyValueTable Dataset named wordCounts that counts the occurrences of each word;
   - A custom UniqueCountTable Dataset named uniqueCount that determines and counts the number of unique words seen;
   - A custom AssociationTable Dataset named wordAssocs that tracks associations between words.
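
The processing described above can be illustrated with a minimal plain-Java sketch. It does not use the CDAP API: the class name WordCountSketch and all of its methods are hypothetical, in-memory maps stand in for the four Datasets, and a single process method plays the roles of the splitter, counter, unique, and associator Flowlets. Splitting on whitespace and counting an association once per pair of distinct words in an input string are assumptions made for illustration; the actual Flowlets may tokenize and count differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the WordCount dataflow; not the CDAP API.
final class WordCountSketch {
    // Stand-ins for wordStats (global totals), wordCounts, and wordAssocs.
    private long totalWords;
    private long totalLength;
    private final Map<String, Long> wordCounts = new HashMap<>();
    private final Map<String, Map<String, Long>> wordAssocs = new HashMap<>();

    // Handles one input string, as the Flow would for one Stream event.
    void process(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] words = trimmed.split("\\s+"); // assumption: whitespace tokenizer
        for (String word : words) {
            totalWords++;                            // splitter: global statistics
            totalLength += word.length();
            wordCounts.merge(word, 1L, Long::sum);   // counter: per-word statistics
        }
        // associator: link every pair of distinct words in this input string
        Set<String> distinct = new LinkedHashSet<>(Arrays.asList(words));
        for (String a : distinct) {
            for (String b : distinct) {
                if (!a.equals(b)) {
                    wordAssocs.computeIfAbsent(a, k -> new HashMap<>())
                              .merge(b, 1L, Long::sum);
                }
            }
        }
    }

    long totalWords() { return totalWords; }

    // unique: here simply the number of distinct keys seen so far
    long uniqueWords() { return wordCounts.size(); }

    double averageLength() {
        return totalWords == 0 ? 0.0 : (double) totalLength / totalWords;
    }

    long count(String word) { return wordCounts.getOrDefault(word, 0L); }

    // Words most often seen with the given word, highest count first;
    // ties are broken alphabetically so the result is deterministic.
    List<String> topAssociations(String word, int limit) {
        return wordAssocs.getOrDefault(word, Collections.<String, Long>emptyMap())
            .entrySet().stream()
            .sorted((x, y) -> {
                int byCount = Long.compare(y.getValue(), x.getValue());
                return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
            })
            .limit(limit)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        WordCountSketch sketch = new WordCountSketch();
        sketch.process("hello world");
        sketch.process("hello cdap world hello");
        System.out.println("total=" + sketch.totalWords()
            + " unique=" + sketch.uniqueWords()
            + " avgLength=" + sketch.averageLength());
        System.out.println("count(hello)=" + sketch.count("hello")
            + " top=" + sketch.topAssociations("hello", 10));
    }
}
```

The three statistics accessors correspond loosely to what ``/stats`` reports, and count together with topAssociations corresponds loosely to ``/count/{word}`` with its limit of ten. In the real Application these values are persisted in Datasets so that the Flow and the Service can share them.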

Copyright © 2018 Cask Data, Inc. All rights reserved.