Installation
This part of the documentation covers the installation of newspaper. The first step to using any software package is getting it properly installed.
Distribute & Pip
Installing newspaper is simple with pip. However, you will run into fixable issues if you install on Ubuntu without first installing the system libraries listed below.
If you are on Debian / Ubuntu, install using the following:
Python development version, needed for Python.h:
$ sudo apt-get install python-dev
lxml requirements:
$ sudo apt-get install libxml2-dev libxslt-dev
For PIL to recognize .jpg images:
$ sudo apt-get install libjpeg-dev zlib1g-dev libpng12-dev
Install the distribution via pip:
$ pip install newspaper
Download NLP related corpora:
$ curl https://raw.githubusercontent.com/codelucas/newspaper/master/download_corpora.py | python2.7
If you are on OSX, install using the following. You may use either Homebrew or MacPorts:
$ brew install libxml2 libxslt
$ brew install libtiff libjpeg webp little-cms2
$ pip install newspaper
$ curl https://raw.githubusercontent.com/codelucas/newspaper/master/download_corpora.py | python2.7
Otherwise, install with the following:
NOTE: You will still most likely need to install the following libraries via your package manager:
- PIL: libjpeg-dev, zlib1g-dev, libpng12-dev
- lxml: libxml2-dev, libxslt-dev
- Python development version: python-dev
Note that the Python3 package name is newspaper3k, while our Python2 package name is newspaper.
$ pip install newspaper3k
$ curl https://raw.githubusercontent.com/codelucas/newspaper/master/download_corpora.py | python3
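The package-name rule in the note above can be captured in a few lines of Python. This is only a minimal sketch for illustration: the helper name `pip_package_name` is hypothetical and is not part of newspaper.

```python
import sys


def pip_package_name(major_version=None):
    """Return the pip package name matching a Python major version.

    Hypothetical helper for illustration only; not part of newspaper.
    """
    if major_version is None:
        # Default to the interpreter running this script.
        major_version = sys.version_info[0]
    # Per the note above: newspaper3k on Python 3, newspaper on Python 2.
    return "newspaper3k" if major_version >= 3 else "newspaper"


if __name__ == "__main__":
    print("pip install " + pip_package_name())
```

Running the script with the interpreter you intend to use prints the matching pip command.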
Get the Code
Newspaper is actively developed on GitHub, where the code is always available.
You can clone the public repository:
$ git clone https://github.com/codelucas/newspaper.git
Once you have a copy of the source, you can embed it in your Python package, or install it into your site-packages easily:
$ pip install -r requirements.txt
$ python setup.py install
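One way to confirm that the install landed on your import path is a quick check from Python. The sketch below assumes Python 3.4 or later (for `importlib.util.find_spec`) and that the package is imported as `newspaper`; the helper name `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the import path.

    Hypothetical helper for illustration only; not part of newspaper.
    """
    # find_spec returns None when no such top-level module exists.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("newspaper"):
        print("newspaper is installed")
    else:
        print("newspaper was not found; check the install steps above")
```

This only checks that the module can be located; it does not import it or verify that the NLP corpora were downloaded.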
Feel free to give our testing suite a shot:
$ python tests/unit_tests.py