Table of Contents

Name

wiki-textify - transform MediaWiki markup into plaintext.

Synopsis

wiki-textify [-h, --ignore-headings]

Description

wiki-textify takes MediaWiki input from standard input and outputs plaintext, ignoring all metadata, tables, comments and other formatting.

Each input article must be followed by a '\f' (form feed) character. Articles in the output are delimited the same way.
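As a minimal sketch of the input convention described above, the following shell function writes each of its arguments as one article, followed by a newline and a form feed. The name join_articles is hypothetical and is not part of Autocorpus; it only illustrates one way to build a correctly delimited stream.

```shell
# join_articles: hypothetical helper, not shipped with Autocorpus.
# Prints every argument as one article, each terminated by a
# newline and a form feed ('\f'), which is the delimiter that
# wiki-textify expects after every input article.
join_articles() {
    for article in "$@"; do
        printf '%s\n\f' "$article"
    done
}
```

Assuming wiki-textify is installed and on the PATH, the stream could then be piped into it, for example: join_articles "==One==" "==Two==" | wiki-textify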

In the output, paragraphs and headings are separated by at least two line breaks. Sentences within a paragraph are either on the same line or separated by a single line break.
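Because paragraphs and headings are separated by blank lines, the output can be consumed with awk's paragraph mode (an empty record separator). The sketch below counts such blocks; count_paragraphs is a hypothetical helper name, and it assumes the text of a single article on standard input, since it does not treat the form feed specially.

```shell
# count_paragraphs: hypothetical helper, not shipped with Autocorpus.
# Counts blank-line-separated blocks (paragraphs and headings) read
# from standard input. Setting RS to the empty string puts awk in
# paragraph mode, where any run of blank lines is one separator.
# Assumes the text of a single article.
count_paragraphs() {
    awk 'BEGIN { RS = "" } END { print NR }'
}
```

Single line breaks inside a paragraph do not start a new record, so a paragraph whose sentences are spread over several lines is still counted once.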

Options

-h, --ignore-headings
If set, headings will be ignored in the output.

Examples

Command:

echo -e "==Test section==\n''This is a test'' of converting\
 [markup|MediaWiki] markup into [plaintext]\n\f" | wiki-textify
Output:

Test section

This is a test of converting MediaWiki markup into plaintext ^L

Author

Autocorpus was written by Maciej Pacula (maciej.pacula@gmail.com).

The project website is http://mpacula.com/autocorpus

See Also

autocorpus(7), ngrams(1), ngrams(5), ngrams-freq-filter(1), ngrams-sort(1), sentences(1), tokenize(1), wiki-articles(1)

