wiki-textify - transform MediaWiki markup into plaintext.
wiki-textify [-h, --ignore-headings]
wiki-textify reads MediaWiki markup from standard input and outputs
plaintext, ignoring all metadata, tables, comments and other formatting.
Each input article must be followed by a '\f' (form feed) character;
articles are delimited the same way in the output.
In the output, paragraphs and headings are separated by at least two
line breaks. Sentences within a paragraph are either on the same line or
separated by a single line break.
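Because articles in the output are delimited by form feeds, downstream tools can split the stream on that character. The following is a minimal sketch, not part of Autocorpus: count_articles is a hypothetical helper name, and the sample stream is simulated with printf so the snippet runs without wiki-textify installed.

```shell
# count_articles: hypothetical helper (an assumption, not an Autocorpus tool).
# Reads form-feed-delimited text on stdin and prints the number of
# articles that contain at least one non-whitespace character.
count_articles() {
    awk 'BEGIN { RS = "\f" } /[^ \t\n]/ { n++ } END { print n + 0 }'
}

# Simulated stream of two articles, each followed by a form feed.
printf 'First article.\n\fSecond article.\n\f' | count_articles
```

With real data, the same helper could be placed at the end of a pipeline, for example after wiki-textify reading some input file of your own.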
- -h, --ignore-headings
- If set, headings are omitted from the output.
- Command:
echo -e "==Test section==\n''This is a test'' of converting\
[markup|MediaWiki] markup into [plaintext]\n\f" | wiki-textify
- Output:
Test section
This is a test of converting MediaWiki markup into plaintext
^L
Autocorpus was written by Maciej Pacula (maciej.pacula@gmail.com).
The project website is http://mpacula.com/autocorpus
autocorpus(7), ngrams(1), ngrams(5), ngrams-freq-filter(1), ngrams-sort(1),
sentences(1), tokenize(1), wiki-articles(1)