2015-05-21
Developer / Project Head: Ruprecht von Waldenfels
Purpose/Version/Date: Simple web interface for querying (cwb-indexed) parallel corpora. git-commit: 29600cc, 21 May 2015
Platform/License: Linux/OSX, open source
License: GNU GPLv2+
Price/Availability: free
Programming Language(s): PHP, XSLT
Key features: ONLINE PARALLEL CONCORDANCER, CQP-QUERY SUPPORT, SIMPLE INTERFACE FOR 2-3 LANGUAGES, SUPPORT FOR SENTENCE- AND WORD-ALIGNMENT, IMPROVED INSTALLATION AND PRE-PROCESSING INSTRUCTIONS
Website: http://parasolcorpus.org (v2)
Website: Bitbucket Repository (v2)


The ParaVoz package provides a simple yet effective interface to a parallel corpus indexed with OpenCWB (http://cwb.sourceforge.net). It should work on any Linux machine with only minimal changes to the settings files to reflect local paths and language codes. All settings are found in the settings directory.

ParaVoz 2.0 extends (but does not replace) ParaVoz 1.0. It is more intuitive, but probably less suited to corpora with a large number of languages; it is best used with a corpus of two or three languages. Unlike ParaVoz 1.0, ParaVoz 2.0 encodes the parallel corpus as a single corpus file for each language, rather than one for each text in the corpus. ParaVoz 2.0 supports both sentence and word alignment.
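To illustrate what one corpus per language means for querying, the following is a minimal sketch of how a sentence-aligned CQP query could be assembled. It is not taken from the ParaVoz code base. The helper `aligned_query` and the corpus names `PARA_DE` and `PARA_PL` are hypothetical. The generated commands assume the standard CQP syntax for aligned corpora: an alignment constraint written as `:TARGET` after the source pattern, and `show +target` to display aligned regions.

```python
def aligned_query(source_corpus, source_pattern, target_corpus, target_pattern=None):
    """Build a CQP command sequence for querying a sentence-aligned corpus pair.

    All corpus names passed in are hypothetical examples; this helper is an
    illustration and is not part of ParaVoz.
    """
    # Activate the source-language corpus (CWB corpus IDs are upper case).
    commands = [f"{source_corpus.upper()};"]

    # Optionally require that the aligned target region matches a pattern.
    query = source_pattern
    if target_pattern is not None:
        query += f" :{target_corpus.upper()} {target_pattern}"
    commands.append(f"A = {query};")

    # Display the aligned target-language region with each hit, then list hits.
    commands.append(f"show +{target_corpus.lower()};")
    commands.append("cat A;")
    return "\n".join(commands)


if __name__ == "__main__":
    print(aligned_query("para_de", '[lemma="Haus"]', "para_pl", '[lemma="dom"]'))
```

With one corpus per language, a single query of this kind covers every text in the source language at once, instead of being repeated for each text.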

For ParaSol 2.0, see the demo at http://parasolcorpus.org/ParaVoz. For ParaSol 1.0, see the movie on the ParaSol website (http://parasolcorpus.org; movie at http://parasolcorpus.org/ParaSol_demo.mp4).

This web interface to CWB was initially written by Roland Meyer in 2006 for use with the ParaSol corpus (then the Regensburg Parallel Corpus) and has since been developed by successive authors. The JavaScript-based functionality was mainly added by Andreas Zeman, and XSLT support in the new modular interface mainly by Ruprecht von Waldenfels, who supervised the publication as open source. Part of the architecture is described in Waldenfels (2011). We thank the Center for the Study of Language and Society, University of Berne (http://www.csls.unibe.ch), for the financial support that enabled the publication of ParaVoz as open source at this stage.

ParaVoz 2.0 was then developed during work on a German-Polish parallel corpus supported by a grant from the Johannes Gutenberg University Mainz. It was written mostly by Michal Wozniak, with valuable input from Jan Machalica, under the supervision of Ruprecht von Waldenfels.

  • Roland Meyer, Ruprecht von Waldenfels, Michal Wozniak, Andreas Zeman (2006-2015): ParaVoz – a simple web interface for querying parallel corpora. Second Version. Bern, Regensburg, Berlin, Krakow.


  • Ruprecht von Waldenfels (2011): Recent Developments in ParaSol: Breadth for Depth and XSLT based web concordancing with CWB. In: Daniela Majchráková and Radovan Garabík (eds.): Natural Language Processing, Multilinguality. Proceedings of Slovko 2011, Bratislava: Tribun, 156-162. Available online.