<sum of counts>
C word1 ... wordn
C word1 ... wordn
...
where the words of an ngram are delimited by single spaces and the count C is separated from the first word by a tab character. Each ngram is printed on a separate line, and all ngrams are unique. The first line is always the sum of all ngram counts.
The special words <s> and </s> denote start and end of sentences, respectively.
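
As an illustration of the format described above, here is a minimal reading sketch in Python. It is not part of any tool that produces these files: the function name parse_ngram_counts and the sample data are hypothetical. It also assumes that the first line holds a bare integer and that blank lines can be skipped.

```python
from typing import Dict, Iterable, Tuple

Ngram = Tuple[str, ...]


def parse_ngram_counts(lines: Iterable[str]) -> Tuple[int, Dict[Ngram, int]]:
    """Parse lines in the count format: a total, then 'C<TAB>word1 ... wordn'."""
    it = iter(lines)
    # First line: the sum of all ngram counts (assumed to be a bare integer).
    total = int(next(it).strip())

    counts: Dict[Ngram, int] = {}
    for raw in it:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # assumption: tolerate blank lines
        # The count is separated from the words by the first tab character.
        count_field, tab, words_field = line.partition("\t")
        if not tab:
            raise ValueError(f"missing tab delimiter: {line!r}")
        # Words within an ngram are separated by spaces.
        ngram = tuple(words_field.split(" "))
        if ngram in counts:
            raise ValueError(f"duplicate ngram: {ngram!r}")
        counts[ngram] = int(count_field)

    # The header must equal the sum of the individual counts.
    if sum(counts.values()) != total:
        raise ValueError("first line does not match the sum of counts")
    return total, counts


# Hypothetical sample: two bigrams using the sentence boundary markers.
sample = ["5\n", "3\t<s> hello\n", "2\thello </s>\n"]
total, counts = parse_ngram_counts(sample)
```

The checks for duplicate ngrams and for the header sum mirror the two guarantees stated above; a reader that trusts its input could drop them.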