Author Archives: Markus Killer

Random sample from text file Mac OSX

Quick step-by-step guide:
Get a random sample of 100 lines per text file on Mac OSX:

Steps 1 to 4 only have to be followed once per computer. After that, only steps 6 and 8 are needed (counting the bullet points below from the top).

  • Open a Terminal window.
  • Install the “Homebrew” package manager (this allows you to install additional Unix/Linux programmes on your Mac). Copy and paste the following line into the Terminal window (all one line):

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Source: http://brew.sh/ (for further documentation)

If it asks you to install the “Command Line Developer Tools”, say YES (this might take a while).

  • Wait for installation to finish, press RETURN and enter your password (the one you use to log on to your Mac).
  • Type: brew install coreutils
  • Extract the attached zip to your Desktop (make sure that the folder random_sample is visible on your Desktop and that there is a file called test.txt in it).
  • Go back to Terminal window and type: cd Desktop/random_sample
  • And now comes the actual shuffling bit: gshuf -n 2 test.txt

Instead of test.txt you can use your query results and instead of 2, you can enter the size of your sample.

  • If you want to save the sample into a new text file instead of just displaying it in the Terminal window, type: gshuf -n 2 test.txt > random_sample1.txt and the results will be saved in the file random_sample1.txt in the same folder. Feel free to adapt the filenames, but be aware that if you use a name twice, the existing file with that name will be overwritten.

Explanation of the different parts of the command:

gshuf -n SAMPLESIZE test.txt > out.txt

  • gshuf: shuffle command
  • -n SAMPLESIZE: sample size (display shuffled lines, up to the number specified by the -n switch)
  • test.txt: name of the file whose lines you want to shuffle
  • >: write output into a file
  • out.txt: name of the output file
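The parts of the command can be tried out end to end with a small sketch like the one below. It assumes a throwaway folder created with mktemp and a seven-line test.txt built on the spot (both are only for illustration), and it falls back to shuf on systems where gshuf is not installed, such as most Linux machines.

```shell
# Use gshuf (Homebrew coreutils on a Mac) if present, otherwise shuf (typical on Linux).
if command -v gshuf >/dev/null 2>&1; then
  SHUF=gshuf
else
  SHUF=shuf
fi

# Hypothetical working folder and input file, created only for this demonstration.
workdir=$(mktemp -d)
printf '%s\n' Aarau Basel Bern Luzern Olten "St. Gallen" Zürich > "$workdir/test.txt"

# Draw 2 random lines and write them into a new file.
"$SHUF" -n 2 "$workdir/test.txt" > "$workdir/out.txt"

# Count the lines in the output file to confirm the sample size.
sample_lines=$(wc -l < "$workdir/out.txt" | tr -d ' ')
echo "Lines in sample: $sample_lines"
```

Counting the lines with wc -l is a quick way to check that the sample has the size you asked for before you start working with it.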

An easy way to navigate to a particular folder: type cd [space] into the Terminal window, drag and drop the folder you want to work in from your Finder into the Terminal and press RETURN/ENTER.

Other basic folder/directory navigation from Terminal window:

Source: http://www.cheatography.com/davechild/cheat-sheets/linux-command-line/
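A minimal sketch of a few basic navigation commands (cd, pwd, ls), assuming a throwaway practice folder created with mktemp so that nothing on your Desktop is touched. The folder and file names are only for illustration and are not taken from the cheat sheet.

```shell
# Hypothetical practice folder, created only for this demonstration.
practice=$(mktemp -d)
mkdir -p "$practice/random_sample"

cd "$practice/random_sample"   # change into a folder
pwd                            # print the folder you are currently in
touch test.txt                 # create an empty file so there is something to list
ls                             # list the files in the current folder
cd ..                          # go up one level, into the parent folder
ls                             # lists the contents of the parent folder: random_sample
```

If you get lost, pwd always tells you where you are, and cd followed by a dragged-in folder takes you straight to where you want to be.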

Example:

Contents of test.txt (gshuf -n 2 test.txt prints two of these lines, picked at random):

line 1: Aarau
line 2: Basel
line 3: Bern
line 4: Luzern
line 5: Olten
line 6: St. Gallen
line 7: Zürich

Command for a sample of 100:

cd path_to_folder_with_file_you_want_to_shuffle

gshuf -n 100 results.txt > random_sample1.txt
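If you have several result files, the same command can be wrapped in a loop. The following is only a sketch under stated assumptions: the folder, the two 250-line input files and the samples subfolder are all hypothetical and created on the spot, and shuf is used as a fallback where gshuf is not installed.

```shell
# Use gshuf (Homebrew coreutils on a Mac) if present, otherwise shuf (typical on Linux).
if command -v gshuf >/dev/null 2>&1; then
  SHUF=gshuf
else
  SHUF=shuf
fi

# Hypothetical folder with two result files of 250 lines each (demonstration data only).
folder=$(mktemp -d)
seq 1 250 > "$folder/results_a.txt"
seq 1 250 > "$folder/results_b.txt"

# Samples go into a separate subfolder so they are not picked up by the *.txt pattern.
mkdir "$folder/samples"

# Take a random sample of 100 lines from every .txt file in the folder.
for file in "$folder"/*.txt; do
  name=$(basename "$file" .txt)
  "$SHUF" -n 100 "$file" > "$folder/samples/${name}_sample.txt"
done

ls "$folder/samples"
```

Writing the samples into their own subfolder also makes it harder to overwrite an earlier sample by accident.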

CQPweb tutorial (German)

Noah Bubenhofer’s CQPweb Tutorial (German)

Link to Noah Bubenhofer’s CQPweb Tutorial (German)

CQPweb

2024-11-06
Developer / Project Head: Andrew Hardie
Purpose/Version/Date: web interface to cwb; stable: 3.2.43; dev: 3.3.15; 6 November 2024
Platform/License: web-based (also on localhost); open source; License: GNU GPLv2+
Price/Availability: free
Programming Language(s): PHP, MySQL, Perl
Key features: SERVER INSTALLATION, MANAGE YOUR OWN CORPORA, WEB INTERFACE, CQP QUERIES
Websites:
  • CQPweb project page
  • CQPWeb SVN Repository
  • UCREL Lancaster Corpus Server (free access to a lot of corpus resources after registration, including the extended Brown family of corpora)
  • CQPweb at Beijing Foreign Studies University: large number of publicly accessible corpora (username: test, password: test)
  • CQPweb Video Tutorials
  • CQPwebInABox Video Tutorials

New release: ParaVoz2

ParaVoz2

2015-07-02
Developer / Project Head: Ruprecht von Waldenfels
Purpose/Version/Date: simple web interface for querying (cwb-indexed) parallel corpora; git-commit: 98bbe99; 2 July 2015
Platform/License: Linux/OSX; open source; License: GNU GPLv2+
Price/Availability: free
Programming Language(s): PHP, XSLT
Key features: ONLINE PARALLEL CONCORDANCER, CQP-QUERY SUPPORT, SIMPLE INTERFACE FOR 2-3 LANGUAGES, SUPPORT FOR SENTENCE- AND WORD-ALIGNMENT, IMPROVED INSTALLATION AND PRE-PROCESSING INSTRUCTIONS
Websites:
  • http://parasolcorpus.org (v2)
  • Bitbucket Repository (v2)

Release announcement:


> Date: Thu, 21 May 2015 14:41:13 +0200
> From: ruprecht.waldenfels _(at)_ gmx.net
> To: cwb _(at)_ sslmit.unibo.it
> Subject: [CWB] Interface for parallel corpora
>
> Dear colleagues,
>
> we would like to let you know that a new version of the ParaVoz corpus
> interface for parallel corpora hosted with CWB has been released.
> ParaVoz 2.0 has a user friendly interface, it features basic metadata
> management and supports word alignment.
>
> ParaVoz 2.0 extends (but not replaces) Paravoz 1.0; it is open-source
> and found here: https://bitbucket.org/rvwfels/paravoz2
>
> A demo version is found here: www.parasolcorpus.org/ParaVoz
>
> Best,
> Ruprecht von Waldenfels
> Michał Woźniak
>
> Institute of Polish, Polish Academy of Sciences, Cracow
> _______________________________________________
> CWB mailing list
> CWB _(at)_ sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb

CQPwebInABox

CQPwebInABox running on VMware Player

Excellent news! A couple of days ago, Andrew Hardie released a virtual machine with a preconfigured version of CQPweb installed:


> From: a.hardie(*at*)lancaster.ac.uk
> To: cwb(*at*)sslmit.unibo.it
> Date: Thu, 2 Apr 2015 05:20:33 +0000
> Subject: [CWB] Announcing CQPwebInABox
>
> Hi everybody,
>
> This is just a quick note to announce the availability of CQPwebInABox
> – a virtual machine image containing a pre-installed copy of CQPweb.
>
> This is designed to get beginners past the hump of having to install
> all the different components.
>
> The image (1.6GB) can be downloaded here:
> https://sourceforge.net/projects/cwb/files/CQPwebInABox/
>
>
> To run it, you will need to install VirtualBox (although I believe
> other virtualisation tools can also use the same file format, I haven’t
> yet tested this).
>
> You can get VirtualBox here:
> https://www.virtualbox.org/wiki/Downloads
> Then “import appliance” from the .ova download.
>
> The virtual machine runs Linux – however, I have set it up in such a
> way as to make the interface as similar to Windows as possible. So
> don’t fear the Linux!
>
> I will create some video tutorials & put them on YouTube as soon as I can.
>
> Feedback welcome.
>
> best
>
> Andrew.


VIEW (Visual Input Enhancement of the Web):

Instant colourization of grammar structures, instant learning activities from any page on the web. Supported languages: English, German, Spanish.

VIEW Toolbar – Firefox Add-on (screenshot taken on Linux Mint 64-bit; the extension is available on all major operating systems: Windows, Mac OSX and Linux)

Colourized prepositions in a Guardian Article published two hours ago … (2 clicks and 10 seconds away)!

… multiple-choice exercise on prepositions in the same article (2 clicks and 10 seconds away)!

New Release: NoSketchEngine

The SketchEngine development team has just released a new open-source version of their tools (bonito, manatee, finlib; see the download links in the source below), including the following highlights:

  • extended support for parallel corpora
  • support for virtual corpora
  • asynchronous query processing showing partial results as they are computed
  • corpus info page providing an overall overview of the corpus stats
  • lots of smaller enhancements in the functionality and usability of the user interface
  • lots of speed enhancements, both for run time (query evaluation) and compile time (corpus indexing)
  • lots of bugfixes

Source: http://nlp.fi.muni.cz/trac/noske/wiki/Downloads [accessed: 13/06/2014]