Newspaper: Article scraping & curation
======================================

Release v0.1.2. :ref:`(Installation) `.

Inspired by `requests`_ for its simplicity and powered by `lxml`_ for its speed.

"Newspaper is an amazing python library for extracting & curating articles."
-- `tweeted by`_ Kenneth Reitz, Author of `requests`_

"Newspaper delivers Instapaper style article extraction." -- `The Changelog`_

.. _`tweeted by`: https://twitter.com/kennethreitz/status/419520678862548992
.. _`The Changelog`: http://thechangelog.com/

**We support 10+ languages and everything is in unicode!**

.. code-block:: pycon

    >>> import newspaper
    >>> newspaper.languages()

    Your available languages are:
    input code      full name

    ar              Arabic
    ru              Russian
    nl              Dutch
    de              German
    en              English
    es              Spanish
    fr              French
    he              Hebrew
    it              Italian
    ko              Korean
    no              Norwegian
    pt              Portuguese
    sv              Swedish
    hu              Hungarian
    fi              Finnish
    da              Danish
    zh              Chinese
    id              Indonesian
    vi              Vietnamese

A Glance:
---------

.. code-block:: pycon

    >>> import newspaper

    >>> cnn_paper = newspaper.build('http://cnn.com')

    >>> for article in cnn_paper.articles:
    ...     print(article.url)
    u'http://www.cnn.com/2013/11/27/justice/tucson-arizona-captive-girls/'
    u'http://www.cnn.com/2013/12/11/us/texas-teen-dwi-wreck/index.html'
    ...

    >>> for category in cnn_paper.category_urls():
    ...     print(category)
    u'http://lifestyle.cnn.com'
    u'http://cnn.com/world'
    u'http://tech.cnn.com'
    ...

.. code-block:: pycon

    >>> article = cnn_paper.articles[0]

.. code-block:: pycon

    >>> article.download()

    >>> article.html
    u'