Python RSS Feed Guide

Introduction

This guide covers how to read RSS feeds with Python.

Did you know that you can use Python to read your RSS feeds?

We will break this down into the following sections:

  • Why Python is suitable for reading RSS feeds
  • How to parse and read RSS feeds
  • How to scrape RSS feeds

I have used this approach successfully in various projects, and it has saved me a lot of time and debugging trouble.

We will go point by point to get you up and running in less than 5 minutes. Some background programming knowledge in Python is helpful if you want to fine-tune our code, but you don’t need it to read your RSS feeds.

This complete guide should cover all your questions on using Python to read RSS feeds.

All code and examples of how to do this can be found in the Github link.

Why Use Python To Read RSS Feeds

In this section, I’d like to cover some reasons to access an RSS feed using Python. This may or may not apply to your use case, so feel free to skip this section if you know what you are doing and want to get into the coding part.

  • Python allows you to scrape the data and run analytics on it programmatically.
  • You can include this data as part of a viewer or news aggregator
  • You may want a text-based RSS reader, like I did, that you can customize dynamically since it runs on Python
  • You can set up alerting and other triggers based on conditions such as article keywords etc
  • You can automate some form of article classification based on tags and content

The list above is not exhaustive, but it explains why some people may want to parse RSS feeds programmatically using Python.

How To Set Up Python For RSS Feed Reading

The first step we will be doing is to set up the Python environment that we will use to run our application.

The commands below take care of this process for you.

$ virtualenv venv
created virtual environment CPython3.9.12.final.0-64 in 171ms
  creator CPython3Posix(dest=/Users/alex/code/unbiased/python-rss-feed/venv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/Users/alex/Library/Application Support/virtualenv)
    added seed packages: pip==22.1.2, setuptools==63.2.0, wheel==0.37.1
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

$ source venv/bin/activate

$ pip install -r requirements.txt
Collecting feedparser
  Using cached feedparser-6.0.10-py3-none-any.whl (81 kB)
Collecting sgmllib3k
  Using cached sgmllib3k-1.0.0-py3-none-any.whl
Installing collected packages: sgmllib3k, feedparser
Successfully installed feedparser-6.0.10 sgmllib3k-1.0.0
[notice] A new release of pip available: 22.1.2 -> 22.2.2
[notice] To update, run: pip install --upgrade pip

$ python -c "import feedparser; print(feedparser.__version__)"
6.0.10

$ cat requirements.txt
feedparser

There are three steps that we are following above:

  • Create and initialize a virtual environment
  • Install the Python dependencies from the requirements file (also in GitHub; its contents are printed by the cat command above)
  • Test to ensure our dependencies got installed and print out the version

In the terminal output above, all the steps were completed successfully, and we installed everything needed to run our Python RSS feed parser code.
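
If you would rather confirm the setup from inside Python, a minimal sketch along these lines could work. The function name missing_packages is hypothetical and not part of the repo, and it only checks that a module can be located, not that it behaves correctly.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


# "feedparser" is the only dependency listed in requirements.txt.
missing = missing_packages(["feedparser"])
print("Missing packages:", missing or "none")
```

An empty result suggests the virtual environment is active and the install step succeeded.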

How To Scrape RSS Feeds In Python

To scrape the RSS feed, we will use a library that can fetch the web data. Scraping is a process where you download information from the web and extract something meaningful that suits your needs. Since everyone can define meaningful information differently, I will cover different ways of doing this below so that we touch on the main things you will need from an RSS feed.

For this, we will leverage the library we set up earlier in the article, which is feedparser. Using feedparser, we will be able to analyze and parse the content. To keep this convenient, we have implemented a small wrapper around the library to help us perform such actions.

How To Read RSS Feeds In Python

The first step is to read the RSS feeds and implement the wrapper routines mentioned in the previous section. To do this, we will be leveraging feedparser. This will give us the following two abilities:

  • Download the RSS feed data
  • Convert the RSS feed data into a Python object

To do this, we implement a class called RSSHelper in a module named rss_helper, which can also be found in the GitHub repo.

import feedparser


class RSSHelper:
    """Small wrapper around feedparser for pulling fields out of a feed."""

    def get_rss_titles(self, rss_feed) -> list[str]:
        # parse() accepts a URL, a local file path, or a raw XML string
        feed = feedparser.parse(rss_feed)
        return [entry.title for entry in feed.entries]

    def get_rss_links(self, rss_feed) -> list[str]:
        feed = feedparser.parse(rss_feed)
        return [entry.link for entry in feed.entries]

    def get_rss_source(self, rss_feed) -> list[dict]:
        # Each source is a dictionary, not a string; entries that
        # have no source yield an empty dictionary instead of failing
        feed = feedparser.parse(rss_feed)
        return [entry.get("source", {}) for entry in feed.entries]

In the example above, we implement three helper methods which are all of similar nature:

  • get_rss_titles: This method parses the RSS feed and extracts the titles from it
  • get_rss_links: This method parses the RSS feed and extracts the links from it
  • get_rss_source: This method parses the RSS feed and extracts the source each entry came from

All three methods call the parse function from the feedparser library. It downloads the feed in the background and converts the XML into a dictionary-like Python object whose entries we can easily access. In later sections, we will write small scripts that use this wrapper to get the data we need.
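
To make the idea of turning XML into Python data more concrete, here is a rough sketch that uses only the standard library on a tiny, made-up feed. This is not how feedparser is implemented, and it ignores the many feed variations that feedparser handles for you; the name items_from_rss and the sample XML are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def items_from_rss(xml_text):
    """Turn each <item> element into a plain dict of its title and link."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


for item in items_from_rss(SAMPLE_RSS):
    print(item["title"], "->", item["link"])
```

Each item in the feed becomes one dictionary, which is roughly the shape of data our wrapper reads from feedparser's entries.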

Note that get_rss_titles and get_rss_links return a list of Python strings, while get_rss_source returns a list of dictionaries, so the data is easily accessible in our code either way.
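
One thing to be aware of is that each helper calls feedparser.parse on its own, so asking for titles, links, and sources fetches the feed three times. A possible variation, sketched below, parses once and collects all three fields together. It assumes entries behave like dictionaries (feedparser entries do support .get). The names collect_entries, fake_parse, and the parse parameter are hypothetical; in real use you would pass feedparser.parse as the second argument.

```python
from types import SimpleNamespace


def collect_entries(feed_location, parse):
    """Parse the feed once and return title, link, and source per entry.

    `parse` is any callable that returns an object with an `entries`
    list, for example feedparser.parse.
    """
    feed = parse(feed_location)
    return [
        {
            "title": entry.get("title"),
            "link": entry.get("link"),
            # Not every feed provides a source, so this may be None.
            "source": entry.get("source"),
        }
        for entry in feed.entries
    ]


def fake_parse(_location):
    """Stand-in parser with made-up data so the sketch runs offline."""
    return SimpleNamespace(
        entries=[
            {"title": "First story", "link": "https://example.com/first"},
        ]
    )


print(collect_entries("https://example.com/feed", fake_parse))
```

Passing the parser in as an argument also makes the function easy to try out without a network connection.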

How To Parse RSS Feeds In Python

The next step in the process is to parse the RSS feeds in Python. To do this, we will use the Yahoo Finance RSS feed in our code examples as a reference. The Yahoo Finance RSS feed aggregates financial news regarding stocks, crypto, and the market moves from different sources. This serves as a great example because we can demonstrate three things listed below when parsing the RSS feed:

  • Getting a list of titles of the financial news
  • Getting a list of links for that financial news
  • Finding the source for those titles

So let’s start implementing some code to demonstrate this.

How To Get RSS Feed Titles Using Python

First, we will demonstrate how to get the RSS feed titles using the Python wrappers we implemented earlier.

The code below uses the RSS Helper class we implemented to do this.

import pprint
from rss_helper import RSSHelper

rss_feed = 'https://finance.yahoo.com/news/rssindex'

rsh = RSSHelper()
titles = rsh.get_rss_titles(rss_feed)
pprint.pprint(titles)

  • The first step is to import our library and set the feed URL
  • Then we initialize an object of our RSS helper
  • We then invoke the get_rss_titles wrapper we wrote earlier to get a list of RSS titles
  • Finally, we print out the RSS titles from our RSS feed

Running this against the Yahoo Finance news feed produces the following output.

$ python ./python-get-rss-titles.py
['Warren Buffett Finally Throws In The Towel On 4 Lousy Stocks',
 'Mr. Big Short Michael Burry Makes a Shocking Decision',
 'Former California congressman T.J Cox arrested, charged with fraud',
 'President Biden forgives nearly $4 billion in student debt — what’s next?',
 "Michael Burry's Hedge Fund Added One Stock And Dumped All the Rest",
 'Seeking at Least 11% Dividend Yield? Analysts Suggest 2 Dividend Stocks to '
 'Buy',
 "IRS's RMD Rule Change Could Make Your Roth IRA More Valuable",
 'Afraid you missed the stock-market bottom? This research says curb your '
 'FOMO.',
 'Should I sell my house before prices really crash — or wait for the next big '
 'real estate boom?',
 '2 ‘Strong Buy’ Stocks Goldman Sachs Predicts Will Surge Over 40%',
 'Indian Billionaire’s Stock Holdings Worth Nearly $4 Billion in Focus '
....

As you can see above, this has returned a list of all the titles in the RSS feed. Again, this makes it easy for us to consume and iterate over using Python code.
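
As one example of iterating over that list, here is a minimal sketch of the keyword alerting idea mentioned earlier. The function name titles_matching and the chosen keywords are assumptions for illustration; the sample titles are copied from the output above.

```python
def titles_matching(titles, keywords):
    """Return the titles that contain any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        title
        for title in titles
        if any(keyword in title.lower() for keyword in lowered)
    ]


# Two titles copied from the sample output above.
sample_titles = [
    "Warren Buffett Finally Throws In The Towel On 4 Lousy Stocks",
    "IRS's RMD Rule Change Could Make Your Roth IRA More Valuable",
]
print(titles_matching(sample_titles, ["roth", "dividend"]))
```

This is a plain substring match, so a keyword like "ira" would also match inside longer words; a real alerting setup may want stricter matching.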

How To Get RSS Links Using Python

Next, we will get the RSS feed links using Python. The process is identical to before, and we will again leverage our RSS helper library. The code that implements this is shown below.

import pprint
from rss_helper import RSSHelper

rss_feed = 'https://finance.yahoo.com/news/rssindex'

rsh = RSSHelper()
links = rsh.get_rss_links(rss_feed)
pprint.pprint(links)

The only step that changes here is that instead of invoking the function that gets the RSS titles, we call the get_rss_links function to get the links. As before, the links come back as a list of strings we can process and show to the user.

If we were to execute the above code, we would get the following output:

$ python ./python-get-rss-links.py
['https://www.investors.com/etfs-and-funds/sectors/sp500-warren-buffett-finally-throws-in-the-towel-on-four-lousy-stocks/?src=A00220&yptr=yahoo',
 'https://www.thestreet.com/investing/mr-big-short-michael-burry-makes-a-shocking-decision?puc=yahoo&cm_ven=YAHOO&yptr=yahoo',
 'https://www.marketwatch.com/story/former-california-congressman-t-j-cox-arrested-charged-with-fraud-01660690370?siteid=yhoof2&yptr=yahoo',
 'https://finance.yahoo.com/news/president-biden-forgives-nearly-4-200000606.html',
 'https://finance.yahoo.com/news/traeger-weber-grill-replenishment-rate-172811716.html',
 'https://www.investors.com/market-trend/stock-market-today/dow-jones-futures-slide-ahead-of-fed-minutes-retail-sales-bbby-stock-surges/?src=A00220&yptr=yahoo',
 'https://finance.yahoo.com/news/irs-may-roth-ira-more-114022559.html',
 'https://www.marketwatch.com/story/im-in-a-very-lucky-position-i-will-receive-a-300-000-inheritance-should-i-pay-off-my-mortgage-or-invest-the-money-11660598294?siteid=yhoof2&yptr=yahoo',
 'https://finance.yahoo.com/news/sell-house-prices-crash-wait-202000848.html',
 'https://finance.yahoo.com/news/indian-billionaire-stock-holdings-worth-010000253.html',
 'https://finance.yahoo.com/news/2-strong-buy-stocks-goldman-215908413.html',
 'https://finance.yahoo.com/news/seeking-least-11-dividend-yield-131952447.html'
....

We have a list of all the URLs where the article resides. But what if we wanted to know what this content’s source is?
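
The links themselves already hint at where each article is hosted. As a rough sketch, the standard library can count how many links point at each host name. The function name count_domains is hypothetical, and the sample links are copied from the output above. Keep in mind that the host name may only tell you where an article is hosted, not who originally published it.

```python
from collections import Counter
from urllib.parse import urlparse


def count_domains(links):
    """Count how many links point at each host name."""
    return Counter(urlparse(link).netloc for link in links)


# Three links copied from the sample output above.
sample_links = [
    "https://www.marketwatch.com/story/former-california-congressman-t-j-cox-arrested-charged-with-fraud-01660690370?siteid=yhoof2&yptr=yahoo",
    "https://finance.yahoo.com/news/irs-may-roth-ira-more-114022559.html",
    "https://finance.yahoo.com/news/sell-house-prices-crash-wait-202000848.html",
]
for domain, count in count_domains(sample_links).most_common():
    print(domain, count)
```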

How To Get RSS Source Using Python

This section will demonstrate how to find the source of the content referenced in the articles above. To do this again, we will use the RSS helper library we implemented earlier. The code for this is shown below:

import pprint
from rss_helper import RSSHelper

rss_feed = 'https://finance.yahoo.com/news/rssindex'

rsh = RSSHelper()
sources = rsh.get_rss_source(rss_feed)
pprint.pprint(sources)

We follow the same process as in the previous two examples, but this time we call get_rss_source, which tells us which source each article came from. The return value is slightly different than before: instead of a list of strings, we get a list of dictionaries.

Let’s go ahead and execute the code and see a sample output of what the data contains in our result set.

$ python ./python-get-rss-source.py
[{'href': 'http://www.investors.com/', 'title': "Investor's Business Daily"},
 {'href': 'http://www.thestreet.com/', 'title': 'TheStreet.com'},
 {'href': 'http://www.marketwatch.com/', 'title': 'MarketWatch'},
 {'href': 'https://moneywise.com/', 'title': 'MoneyWise'},
 {'href': 'https://www.bloomberg.com/', 'title': 'Bloomberg'},
 {'href': 'https://www.tipranks.com/', 'title': 'TipRanks'},
 {'href': 'https://smartasset.com/', 'title': 'SmartAsset'},
 {'href': 'http://www.marketwatch.com/', 'title': 'MarketWatch'},
 {'href': 'https://moneywise.com/', 'title': 'MoneyWise'},
...

As shown above, two pieces of information come back for each source:

  • href: This is the URL of the original website that published the article
  • title: This is the name of that website (more like a description)

The examples above show that Yahoo Finance aggregates articles from sources such as:

  • MarketWatch
  • MoneyWise
  • Bloomberg
  • TheStreet.com

We need to note here that each feed could format its source differently, with more or less information. The RSS feed creator is responsible for maintaining this and giving you the necessary information. Most feeds follow the same layout fairly closely, but it may vary, which is why we return the entire dictionary so the code works with other RSS feeds without changes.

If you need to do more scraping and digging on the RSS feed sources, you will have to print them out and then go over the details to find out what each represents.
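
Since a source may carry more or less information depending on the feed, a defensive sketch like the one below could help when summarizing sources. The function name source_titles and the "unknown" fallback are assumptions; the sample data is copied from the output above, plus one empty source added to show the fallback.

```python
from collections import Counter


def source_titles(sources):
    """Return a readable name for each source, tolerating missing keys."""
    names = []
    for source in sources:
        # Some feeds may omit the source entirely or leave out a field.
        source = source or {}
        names.append(source.get("title") or source.get("href") or "unknown")
    return names


# Entries copied from the sample output above, plus one empty source
# added to show the fallback.
sample_sources = [
    {"href": "http://www.marketwatch.com/", "title": "MarketWatch"},
    {"href": "https://moneywise.com/", "title": "MoneyWise"},
    {"href": "http://www.marketwatch.com/", "title": "MarketWatch"},
    {},
]
print(Counter(source_titles(sample_sources)))
```

Counting the names this way gives a quick view of which publishers appear most often in a feed.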

Conclusion

We were able to go over this Python RSS Feed Guide successfully. Hopefully, I answered any questions you may have and helped you get started on your quest to read RSS feeds with Python.

Please drop me a cheer below if you found this helpful and think it may have helped you. I would appreciate it.

If you have any questions or comments, please post them below or send me a note on my Twitter. I check periodically and try to answer them in the order they come in. Also, if you have any corrections, please let me know, and I’ll update the article to fix any mistakes I made.

Would you consider using Python to read your RSS feeds?

I use this extensively for many projects when I want to scrape and aggregate information. Everyone can have different use cases for RSS, so I presented the methods here to help you get started quickly.

