<sum of counts>
C word1 ... wordn
C word1 ... wordn
...
where words are separated by spaces and the count 'C' is separated from the words by a tab character. Each ngram is printed on a separate line, and all ngrams are unique. The first line is always the sum of all ngram counts.
The special words <s> and </s> denote start and end of sentences, respectively.
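
As an illustration of the format described above, the following is a minimal sketch of a reader in Python. The function name parse_ngram_counts and the consistency checks are assumptions made for this example; they are not part of any tool that produces these files.

```python
from typing import Dict, Iterable, Tuple

Ngram = Tuple[str, ...]


def parse_ngram_counts(lines: Iterable[str]) -> Tuple[int, Dict[Ngram, int]]:
    """Parse an ngram count listing (hypothetical helper, for illustration).

    The first line holds the sum of all counts; every following line holds
    a count, a tab character, and the space-separated words of one ngram.
    """
    it = iter(lines)
    total = int(next(it).strip())  # first line: sum of all ngram counts

    counts: Dict[Ngram, int] = {}
    for line in it:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate trailing blank lines
        count_str, words = line.split("\t", 1)  # count and words are tab-separated
        ngram = tuple(words.split(" "))         # words are space-separated
        if ngram in counts:
            raise ValueError(f"duplicate ngram: {words!r}")
        counts[ngram] = int(count_str)

    # Assumed sanity check: the header must equal the sum of the counts.
    if sum(counts.values()) != total:
        raise ValueError("sum of counts does not match the first line")
    return total, counts


if __name__ == "__main__":
    sample = ["6\n", "3\t<s> the\n", "2\tthe cat\n", "1\tcat </s>\n"]
    total, counts = parse_ngram_counts(sample)
    print(total, counts[("<s>", "the")])
```

The sentence markers <s> and </s> are treated as ordinary words here, so an ngram that starts a sentence simply has <s> as its first element.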