Overview of Python NLP Resources (see also community-edited list):
- NLTK (Natural Language Toolkit)
- spaCy (new, under intense development, promising commercial open-source software)
- SPLAT: Speech Processing & Linguistic Analysis Tool (a simple Python package and command-line tool for performing NLP, built on top of NLTK and Stanford CoreNLP. Current documentation is provided in the GitHub repository; updated documentation will be available at splat-library.org within the coming months.)
- TextBlob (Simplified text processing)
- textblob-de (German language support for TextBlob)
- pattern (fast POS-tagger/WordNet interface for English, Spanish, German, French, Italian and Dutch)
- DAWG / marisa-trie / datrie (more memory-efficient access to huge string data structures, an alternative to Python dictionaries)
- PyNLPl /ˈpaɪnˌæp(ə)l/ (computation of n-grams, frequency lists and distributions / language models, priority queues, beam search)
- CLAM (Computational Linguistics Application Mediator)
- jellyfish (approximate and phonetic matching of strings)
- Tools developed at Zurich University
  - TreeAligner (download page)
  - clevertagger (GitHub repository)
  - Bleualign (GitHub repository)
  - ParZu – The Zurich Dependency Parser for German (GitHub repository)
  - t2t-pipeline (project page on langui.ch)
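Several of the packages above deal with n-grams and frequency lists. As a rough illustration of what that means, here is a minimal sketch using only the standard library. It does not use the API of PyNLPl or any other package listed here; the helper names `ngrams` and `frequency_list` are made up for this example, and the whitespace tokenization is deliberately naive.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a list of tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(tokens, n=1):
    """Count how often each n-gram occurs in the token list."""
    return Counter(ngrams(tokens, n))


text = "the cat sat on the mat and the cat slept"
tokens = text.split()  # naive whitespace tokenization, for illustration only

bigram_counts = frequency_list(tokens, n=2)
print(bigram_counts.most_common(1))  # → [(('the', 'cat'), 2)]
```

For real corpora, a proper tokenizer (for example from NLTK or spaCy) would normally replace `text.split()`.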
Python NLP / Corpus Linguistics Blogs
Useful Python packages that are not included in the official distribution:
| packages | installation notes |
|---|---|
| | see My Python development environment on Xubuntu for Ubuntu/Xubuntu specific installation notes |
Python Tutorials / Blogs
- Jeff Knupp’s Blog
- Guru99.com
  - Python XML Parser Tutorial: Create & Read XML with Examples (tons of ads, but useful)
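For a quick impression of creating and reading XML in Python, here is a small sketch using the standard library's `xml.etree.ElementTree`. It is not taken from the tutorial linked above; the element and attribute names (`corpus`, `sentence`, `id`) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Create a small XML document in memory.
root = ET.Element("corpus")
for sent_id, text in enumerate(["Hello world.", "Second sentence."], start=1):
    sentence = ET.SubElement(root, "sentence", id=str(sent_id))
    sentence.text = text

# Serialize to a string (encoding="unicode" returns str instead of bytes).
xml_string = ET.tostring(root, encoding="unicode")

# Read it back and extract the id attribute and text of each sentence.
parsed = ET.fromstring(xml_string)
sentences = [(s.get("id"), s.text) for s in parsed.findall("sentence")]
print(sentences)  # → [('1', 'Hello world.'), ('2', 'Second sentence.')]
```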