rss

A Sequence that works as a web feed reader: it monitors websites for new content.

Description

  • Name: @scramjet/rss
  • Version: 0.23.0
  • Language: typescript
  • Author: iaforek
  • Tags: easy,RSS,feeds,Big Data,Data Producer

This sample reads a list of RSS feed URLs from the ./config/rss.json file. It then retrieves each feed and passes the links to a scraper, which reads the content of each URL. Next, keywords are matched against that content and each URL is given a score.

If the score is greater than 0, the URL is added to a list of links and short details are posted to a Slack channel.

Configuration

Pass SLACK_WEBHOOK_URL as an input parameter when starting the Sequence.
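Slack incoming webhooks accept an HTTP POST with a JSON body containing a `text` field. The sketch below shows one way the "short details" message could be built and sent. It is not taken from the sample's source: `ScoredLink`, `buildSlackPayload`, `postToSlack`, and the injected `HttpPost` function are hypothetical names, and the message layout is an assumption.

```typescript
// Hypothetical shape of a link that passed scoring; field names are assumptions.
interface ScoredLink {
  title: string;
  url: string;
  score: number;
}

// Any HTTP client can be adapted to this signature: it sends a JSON body
// with Content-Type: application/json and resolves to the HTTP status code.
type HttpPost = (url: string, jsonBody: string) => Promise<number>;

// Builds the JSON payload for a Slack incoming webhook ({ "text": "..." }).
function buildSlackPayload(link: ScoredLink): { text: string } {
  return { text: `${link.title} (score ${link.score})\n${link.url}` };
}

// Posts one link to the webhook and fails loudly on a non-2xx status.
async function postToSlack(
  webhookUrl: string,
  link: ScoredLink,
  post: HttpPost,
): Promise<void> {
  const status = await post(webhookUrl, JSON.stringify(buildSlackPayload(link)));
  if (status < 200 || status >= 300) {
    throw new Error(`Slack webhook responded with HTTP ${status}`);
  }
}
```

Passing the transport in as a parameter keeps the sketch independent of a particular HTTP client and makes the payload easy to check without touching the network.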

Keywords are configured in the ./config/keywords.json file. Give each keyword a word and a weight, as in this example:

{ "word": "serverless", "weight": 15 }
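With entries of that shape, the keyword matching step could look roughly like the sketch below. It assumes the file holds an array of such objects, that matching is case-insensitive, and that each keyword counts once per page. The sample's actual scoring rules may differ, and `scoreText` is a hypothetical helper name.

```typescript
// Shape of one keyword entry, following the example above.
interface Keyword {
  word: string;
  weight: number;
}

// Hypothetical helper: sums the weight of every keyword found in the text.
// Assumptions: case-insensitive substring match, each keyword counted once.
function scoreText(text: string, keywords: Keyword[]): number {
  const haystack = text.toLowerCase();
  return keywords.reduce(
    (score, { word, weight }) =>
      haystack.includes(word.toLowerCase()) ? score + weight : score,
    0,
  );
}

const keywords: Keyword[] = [
  { word: "serverless", weight: 15 },
  { word: "kubernetes", weight: 5 }, // hypothetical second entry
];

// A URL is kept only when its score is greater than 0.
const score = scoreText("Going Serverless in production", keywords);
console.log(score > 0 ? `keep link (score ${score})` : "skip link");
```

A page that mentions none of the configured words scores 0 and is dropped, which matches the "greater than 0" rule in the description.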

RSS feed URLs are configured in the ./config/rss.json file.

Running

❗ Remember to set up transform-hub locally or use the platform's environment for the Sequence deployment.

Open the terminal and run the following commands:

    # install dependencies
    npm install

    # transpile TS->JS to dist/
    npm run build

    # make a compressed package with Sequence
    si seq pack dist

    # send Sequence to transform hub, this will output Sequence ID
    si seq send dist.tar.gz

    # start a Sequence, this will output Instance ID. Provide slack webhook URL as input parameter
    si seq start - --args [\"SLACK_WEBHOOK_URL\"]

    # See output - actual output will be sent to slack channel
    si inst output -

    "GETTING RSS LINKS...\n"
    "GETTING RSS LINKS...\n"
    "GETTING RSS LINKS...\n"
    "GETTING RSS LINKS...\n"

    # Optional commands below:

    # Check console.log messages
    si inst stdout -

    # Check console.error messages
    si inst stderr -

Because the Sequence scrapes content on a regular basis, 429 Too Many Requests errors are to be expected. To mitigate the problem, increase the pause between requests.
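One way to apply that advice is sketched below: wait a fixed pause after every request and, when a 429 arrives, wait longer before retrying the same URL. The doubling (exponential backoff) and the use of a `Retry-After` value are additions for illustration, not something the sample is known to do. `PageResponse`, `FetchPage`, `pauseMs`, and `fetchAllPolitely` are hypothetical names, and the default timings are assumptions.

```typescript
// Minimal response shape the sketch needs; adapt any HTTP client to it.
interface PageResponse {
  status: number;
  retryAfterSeconds?: number; // parsed Retry-After header, if the server sent one
  body: string;
}

type FetchPage = (url: string) => Promise<PageResponse>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Pause before retrying after a 429: use Retry-After when present,
// otherwise double the base pause on each attempt, capped at maxMs.
function pauseMs(
  attempt: number,
  baseMs: number,
  maxMs: number,
  retryAfterSeconds?: number,
): number {
  if (retryAfterSeconds !== undefined) {
    return Math.min(retryAfterSeconds * 1000, maxMs);
  }
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Fetches URLs one at a time. Any status other than 429 is treated as final;
// a URL that still returns 429 after maxRetries retries is skipped.
async function fetchAllPolitely(
  urls: string[],
  fetchPage: FetchPage,
  baseMs = 1000,
  maxMs = 60000,
  maxRetries = 3,
): Promise<string[]> {
  const bodies: string[] = [];
  for (const url of urls) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      const response = await fetchPage(url);
      if (response.status !== 429) {
        bodies.push(response.body);
        await sleep(baseMs); // regular pause between requests
        break;
      }
      await sleep(pauseMs(attempt, baseMs, maxMs, response.retryAfterSeconds));
    }
  }
  return bodies;
}
```

Raising `baseMs` is the direct equivalent of "increase pause between requests"; the backoff only matters once a server has already started rejecting requests.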