NodeJS Guide For RSS Feeds

Introduction

This guide covers how to read RSS feeds with NodeJS.

Did you know that you can use Javascript/NodeJS to read your RSS feeds?

We will break this down into the following sections:

  • Why Javascript is suitable for reading RSS feeds
  • How to parse and read RSS feeds with NodeJS
  • How to get a list of titles and links from an RSS feed

I have used this successfully in various projects, and it works very well and has saved me a ton of trouble and time debugging things.

We will go point by point and get you up and running in less than 5 minutes. Some background programming knowledge in Javascript is helpful if you want to fine-tune the code, but you don’t need it to read your RSS feeds.

This complete guide should cover all your questions on using Javascript to read RSS feeds.

All code and examples of how to do this can be found in the Github link.

Why Use Javascript To Read RSS Feeds

In this section, I’d like to cover some reasons to access an RSS feed using Javascript. This may or may not apply to your use case, so feel free to skip this section if you know what you are doing and want to get into the coding part.

  • Javascript allows you to scrape the data and run analytics on it programmatically.
  • You can include this data as part of a viewer or news aggregator
  • You may want a text-based RSS reader, as I do, that you can customize dynamically since it runs on Javascript
  • You can set up alerting and other triggers based on conditions such as article keywords
  • You can automate some form of article classification based on tags and content

The list above is by no means exhaustive, but it explains why some people may want to parse RSS feeds programmatically using Javascript.

How To Setup Javascript For RSS Feed Reading

How To Install Node and Yarn

The first step is to set up the NodeJS environment that we will use to run our application. If you don’t have NodeJS and Yarn installed on your system, I recommend checking these links to get started:

If you have those installed in your system you can check the versions that you have by running the following commands:

$ node -v
v18.7.0

$ yarn -v
1.22.19

To ensure compatibility with this guide, I recommend using the versions listed above or newer. This will let you copy the code directly from the Git repo and this article.

How To Install Libraries For Parsing RSS feeds

The first step is to make a directory and initialize our Yarn project there. To do this we call the yarn init command and pass -2 so that the project uses the Yarn 2+ structure, which is faster and can parallelize package installation.

$ yarn init -2
➤ YN0000: Retrieving https://repo.yarnpkg.com/3.2.2/packages/yarnpkg-cli/bin/yarn.js
➤ YN0000: Saving the new release in .yarn/releases/yarn-3.2.2.cjs
➤ YN0000: Done in 0s 330ms
{
  name: 'nodejs-rss-reader',
  packageManager: 'yarn@3.2.2'
}

This initializes our repo and creates a baseline from which we can start installing the packages used in this project. The next step is to install the two packages we need:

  • Typescript
  • The rss-parser library

The command below installs both:

$ yarn add typescript rss-parser
➤ YN0000: ┌ Resolution step
➤ YN0000: └ Completed in 0s 269ms
➤ YN0000: ┌ Fetch step
➤ YN0013: │ sax@npm:1.2.4 can't be found in the cache and will be fetched from the remote registry
➤ YN0013: │ typescript@npm:4.7.4 can't be found in the cache and will be fetched from the remote registry
➤ YN0013: │ typescript@patch:typescript@npm%3A4.7.4#~builtin<compat/typescript>::version=4.7.4&hash=f456af can't be found in the cache and will be fetched from the disk
➤ YN0013: │ xml2js@npm:0.4.23 can't be found in the cache and will be fetched from the remote registry
➤ YN0013: │ xmlbuilder@npm:11.0.1 can't be found in the cache and will be fetched from the remote registry
➤ YN0000: └ Completed
➤ YN0000: ┌ Link step
➤ YN0000: └ Completed
➤ YN0000: Done in 0s 384ms

How To Install Development Dependencies For NodeJS

As seen above, we successfully installed both package dependencies. Now we can install the development dependencies for our app. The most important one is the Node type definitions (@types/node), so that code completion works in Visual Studio Code. Alongside it we will use ts-node, which compiles our Typescript files to Javascript on the fly and runs them.

This can be seen in the command below:

$ yarn add -D @types/node ts-node
➤ YN0000: ┌ Resolution step
➤ YN0000: └ Completed in 2s 565ms
➤ YN0000: ┌ Fetch step
➤ YN0013: │ diff@npm:4.0.2 can't be found in the cache and will be fetched from the remote registry
➤ YN0013: │ make-error@npm:1.3.6 can't be found in the cache and will be fetched from the remote registry
➤ YN0013: │ ts-node@npm:10.9.1 can't be found in the cache and will be fetched from the remote registry
➤ YN0013: │ v8-compile-cache-lib@npm:3.0.1 can't be found in the cache and will be fetched from the remote registry
➤ YN0013: │ yn@npm:3.1.1 can't be found in the cache and will be fetched from the remote registry
➤ YN0000: └ Completed in 0s 586ms
➤ YN0000: ┌ Link step
➤ YN0000: └ Completed
➤ YN0000: Done in 3s 179ms

How To Configure TSC For NodeJS

Finally, we need to initialize the Typescript compiler (tsc) configuration with some options that are good to have. The command below also defines our source and output directories.

$ yarn tsc --init --rootDir src --outDir ./bin --esModuleInterop --lib ES2022 --module commonjs --noImplicitAny true

Created a new tsconfig.json with:                                                                                                       TS
  target: es2016
  module: commonjs
  lib: es2022
  outDir: ./bin
  rootDir: src
  strict: true
  esModuleInterop: true
  skipLibCheck: true
  forceConsistentCasingInFileNames: true

Note that we also set the Typescript library version to ES2022, the latest at the time of writing; depending on when you read this, you can adjust it accordingly. The compiled output target defaults to ES2016, so it stays compatible with a wide range of Javascript runtimes.

Now that all of our dependencies and libraries are installed we can proceed into implementing some code.

How To Read RSS Feeds In JavaScript

The first step is to write the code that reads the RSS feed. We will leverage the rss-parser library that we installed previously. It gives us the following two abilities:

  • Download the RSS feed data
  • Parse the RSS feed data into a JavaScript object

To do this, we implement a function that returns the RSS feed as a promise. It can also be found in the Github repo.

The code that implements this is the following:

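import Parser from 'rss-parser';

// A single parser instance can be reused for every request
const parser = new Parser();
const yahooFinanceFeed = 'https://finance.yahoo.com/news/rssindex';

// Download the feed and resolve to its parsed JavaScript object
async function readRssFeed(): Promise<any> {
  return await parser.parseURL(yahooFinanceFeed);
}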

This works as follows:

  • First we import the rss-parser library that we installed as a dependency
  • Then we initialize a Parser instance
  • We define our feed URL; in this case we will use the Yahoo Finance RSS feed
  • Finally we define a function that downloads the feed by calling parseURL and returns a promise that resolves to an object version of the RSS feed
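
To make the shape of that object more concrete, here is a minimal sketch of what the promise resolves to. The FeedItem and ParsedFeed interfaces and the summarizeFeed helper are illustrative names of my own, they cover only a small subset of the fields, and the sample data is made up. A real feed can carry more or fewer fields.

```typescript
// Illustrative subset of a parsed feed; every item field is optional because
// feeds differ in what they provide.
interface FeedItem {
  title?: string;
  link?: string;
  pubDate?: string;
}

interface ParsedFeed {
  title?: string;
  items: FeedItem[];
}

// Made-up sample standing in for what readRssFeed() would resolve to.
const sampleFeed: ParsedFeed = {
  title: "Example Feed",
  items: [
    { title: "First article", link: "https://example.com/first" },
    { title: "Second article", link: "https://example.com/second" },
  ],
};

// Build a one-line summary from the feed title and its item count.
function summarizeFeed(feed: ParsedFeed): string {
  const name = feed.title ?? "(untitled feed)";
  return `${name}: ${feed.items.length} item(s)`;
}

console.log(summarizeFeed(sampleFeed));
```

Typing the result this way is optional; the article keeps any so the code stays short, at the cost of losing editor hints on the item fields.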

In the sections below we will build on this function to extract the following information:

  • RSS Links
  • RSS Titles

How To Get RSS Links In Javascript

The next function takes the object version of the RSS feed that we generated previously and transforms it into a list of links to the articles in the feed.

To do this we implement the code below, which builds upon what we did earlier by invoking the readRssFeed helper.

async function getRssLinks(): Promise<string[]> {
  let feed: any;
  const feed_links: string[] = [];

  // Download and parse the feed with the helper defined earlier
  feed = await readRssFeed();
  if (!feed) {
    console.log('Feed does not contain any data');
    return [];
  }

  // Collect the URL of every article in the feed
  feed.items.forEach((item: any) => {
    feed_links.push(item.link);
  });

  return feed_links;
}

Let’s go over what the code above does step by step:

  • First we define our function to return a Promise of an array of strings. If there are no results it simply returns an empty array, as shown on the third line, which is where we initialize it.
  • Once this is defined we call our helper function to get the RSS feed from the server, in this case Yahoo Finance
  • We check whether there are any results. If the feed is empty we simply return an empty list; otherwise we start processing the list item by item
  • For every item in the list we extract the link property, which holds the URL information that we need
  • Finally we return our list of URLs to the caller
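
One detail worth noting: the forEach loop in getRssLinks pushes item.link even when an item has no link, which would put undefined into the array. Below is a minimal sketch of the same extraction written as a pure function that skips such items. The extractLinks name and the LinkItem type are hypothetical, and the sample items are made up.

```typescript
// Hypothetical minimal item shape: only the field this helper reads.
interface LinkItem {
  link?: string;
}

// Collect the link of every item, skipping items with a missing or empty link.
function extractLinks(items: LinkItem[]): string[] {
  return items
    .map((item) => item.link)
    .filter((link): link is string => typeof link === "string" && link.length > 0);
}

// Made-up items: the second one has no link and is dropped.
const sampleLinks = extractLinks([
  { link: "https://example.com/a" },
  {},
  { link: "https://example.com/b" },
]);
console.log(sampleLinks);
```

Because the helper takes the items as an argument, it can be tried on hand-written data like this without downloading a feed.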

To test the code above we will try to execute it now and see what it will output for us:

$ yarn ts-node nodejs-rss-reader.ts
[
  'https://www.marketwatch.com/story/what-bidens-reported-student-debt-cancellation-plan-could-mean-for-borrowers-11661284989?siteid=yhoof2&yptr=yahoo',
  'https://www.investors.com/etfs-and-funds/sectors/sp500-warren-buffett-boldly-loads-up-on-his-very-best-stocks/?src=A00220&yptr=yahoo',
  'https://finance.yahoo.com/news/jpmorgan-sees-p-500-hitting-170200197.html',
  'https://finance.yahoo.com/news/top-economist-larry-summers-recommends-002940722.html',
  'https://finance.yahoo.com/news/zillow-fiasco-teach-homebuyers-sellers-223000709.html',
  'https://finance.yahoo.com/news/china-tears-down-tower-blocks-122948254.html',
....
]

Note that the code has successfully retrieved the list of URLs that were present in the RSS feed. Another thing to note is that, since our code is in Typescript, we need to use the ts-node package we installed earlier to run it, as shown in the first line of the output above.
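
The call that prints this list is not shown in this article. Below is a minimal sketch of what such an entry point could look like, assuming it sits at the bottom of nodejs-rss-reader.ts. The printList helper is a hypothetical name of my own; it accepts any function that returns a promise of strings, such as getRssLinks. The fakeLinks stand-in exists only so the sketch runs on its own without network access.

```typescript
// Hypothetical helper: await a getter such as getRssLinks, print its result,
// and report a failure instead of crashing on a rejected promise.
async function printList(getter: () => Promise<string[]>): Promise<string[]> {
  try {
    const values = await getter();
    console.log(values);
    return values;
  } catch (error) {
    console.error("Could not read the feed:", error);
    return [];
  }
}

// Stand-in for getRssLinks so this sketch runs without network access.
async function fakeLinks(): Promise<string[]> {
  return ["https://example.com/a", "https://example.com/b"];
}

// In the real file this would be: printList(getRssLinks);
const printed = printList(fakeLinks);
```

The try/catch matters because a failed download surfaces as a rejected promise rather than as an empty feed.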

How To Get RSS Titles In Javascript

This section demonstrates how to get the titles of the articles whose links we listed above. To do this we will again use the RSS helper function, which retrieves the RSS feed and converts it into a Javascript object.

The code for this is shown below:

async function getRssTitles(): Promise<string[]> {
  let feed: any;
  const feed_titles: string[] = [];

  // Download and parse the feed with the helper defined earlier
  feed = await readRssFeed();
  if (!feed) {
    console.log('Feed does not contain any data');
    return [];
  }

  // Collect the title of every article in the feed
  feed.items.forEach((item: any) => {
    feed_titles.push(item.title);
  });

  return feed_titles;
}

The code above works in much the same way as the RSS link code that we implemented earlier. The only difference is that when we add entries to the list we read the title property of each feed item rather than the link.

Similarly, this returns a Promise of an array of strings. Let’s go ahead and execute the code to see what results we get.

$ yarn ts-node nodejs-rss-reader.ts
[
  'What Biden’s reported student debt cancellation plan could mean for borrowers',
  'Warren Buffett Boldly Loads Up On 4 Of His Best Stocks',
  'JPMorgan sees the S&P 500 hitting 4,800 by the end of 2022 — here is the $100B catalyst that it believes in for the next 2-3 months',
  'Top economist Larry Summers recommends a way for Biden to forgive trillions in student debt—and it echoes what Sen. Elizabeth Warren says',
  'What the Zillow fiasco can teach homebuyers and sellers about property pricing',
  'China tears down tower blocks in effort to boost stalling economy',
  'Tesla Stock Splits 3-For-1: Is Now The Time To Buy?',
....
]

Similarly to before we are using the ts-node command to run our code. We successfully retrieve a list of article titles as shown above.
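
Titles and links usually need to stay paired, and calling getRssLinks and getRssTitles separately downloads the feed twice. Below is a minimal sketch that builds both from one list of items in a single pass. The toHeadlines function, the TitledItem and Headline types, and the "(untitled)" placeholder are hypothetical, and the sample items are made up.

```typescript
// Hypothetical shapes for this sketch.
interface TitledItem {
  title?: string;
  link?: string;
}

interface Headline {
  title: string;
  link: string;
}

// Pair each item's title with its link; items without a link are skipped,
// and a missing title falls back to a placeholder.
function toHeadlines(items: TitledItem[]): Headline[] {
  const headlines: Headline[] = [];
  for (const item of items) {
    if (!item.link) {
      continue;
    }
    headlines.push({ title: item.title ?? "(untitled)", link: item.link });
  }
  return headlines;
}

// Made-up items: the second has no link, the third has no title.
const sampleHeadlines = toHeadlines([
  { title: "First article", link: "https://example.com/first" },
  { title: "No link here" },
  { link: "https://example.com/untitled" },
]);
console.log(sampleHeadlines);
```

In the real script the items argument would be the items array of the object that readRssFeed resolves to.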

Note that each source could use a different format with more or less information. The RSS feed creator is responsible for maintaining the feed and giving you the necessary information. Most feeds follow the standard fields pretty closely, but they may differ, and this is why readRssFeed returns the entire parsed object, so the code works with other RSS feeds without changes.

If you need to do more scraping and digging on the RSS feed sources, you will have to print them out and then go over the details to find out what each represents.
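
One way to do that inspection without reading the whole dump is to list which fields the items actually carry. This is a minimal sketch; collectItemKeys is a hypothetical helper and the sample items are made up.

```typescript
// Return the sorted, de-duplicated set of property names found across all items.
function collectItemKeys(items: Array<Record<string, unknown>>): string[] {
  const keys = new Set<string>();
  for (const item of items) {
    for (const key of Object.keys(item)) {
      keys.add(key);
    }
  }
  return Array.from(keys).sort();
}

// Made-up items with partly different fields.
const sampleKeys = collectItemKeys([
  { title: "First article", link: "https://example.com/first" },
  { title: "Second article", pubDate: "Tue, 23 Aug 2022 20:00:00 GMT" },
]);
console.log(sampleKeys);
```

Running this on the items of two different feeds makes it easy to see which fields they share before writing code that depends on them.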

Conclusion

We went over this NodeJS Guide For RSS Feeds in an easy-to-follow manner with examples. Hopefully, I answered any questions you may have and helped you get started on your quest to read RSS feeds using JavaScript.

Please drop me a cheer below if you found this helpful and think it may have helped you. I would appreciate it.

If you have any questions or comments, please post them below or send me a note on my Twitter. I check periodically and try to answer them in the order they come in. Also, if you have any corrections, please let me know, and I’ll update the article to fix any mistakes I made.

Would you consider using NodeJS to read your RSS feeds?

I use this extensively for many projects when I want to scrape and aggregate information. Everyone can have different use cases for RSS, so I presented the methods here to help you get started quickly.
