ngrams-freq-filter - filters out ngrams with low counts.
ngrams-freq-filter [-t THRESHOLD]
The ngrams-freq-filter utility reads ngrams produced
by the ngrams utility from standard input and filters out ngrams with counts
below a user-specified threshold. The output is written to standard output.
- -t THRESHOLD
- Specifies the count threshold. Ngrams with counts below
this number are not included in the output. Default value: 1.
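The threshold rule described above can be illustrated with a small awk-based stand-in. This is only a sketch, not the actual implementation: the function name filter_by_count is hypothetical, and the input is assumed to consist of lines of the form "COUNT TOKEN TOKEN ...", as in the example output on this page. Any line that has no ngram after the count is simply skipped here, so file-level header handling is not modeled.

```shell
# Hypothetical stand-in for the filtering rule of ngrams-freq-filter.
# Assumes each input line looks like "COUNT TOKEN TOKEN ...".
# $1 is the threshold; it defaults to 1, matching the documented default.
filter_by_count() {
    # Keep lines that have a count plus at least one token and whose
    # count is greater than or equal to the threshold.
    awk -v t="${1:-1}" 'NF >= 2 && $1 + 0 >= t'
}

# Usage: only ngrams with a count of at least 2 are kept.
printf '%s\n' '1 is a' '2 this is' '2 test </s>' | filter_by_count 2
```

With a threshold of 2, the line "1 is a" is dropped and the other two lines pass through unchanged. With the default threshold of 1, every ngram line that has a positive count is kept.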
- Command:
echo -e "this is a test\nthis is yet another test" | \
ngrams -n 2 | ngrams-freq-filter -t 2
- Output:
6
2 <s> this
2 test </s>
2 this is
Autocorpus was written by Maciej Pacula (maciej.pacula@gmail.com).
The project website is http://mpacula.com/autocorpus
autocorpus(7), ngrams(1), ngrams(5), ngrams-sort(1), sentences(1), tokenize(1), wiki-articles(1), wiki-textify(1)