scraping

The scraper takes a URL and a CSS ID selector as input parameters and returns the scraped data every second.

To test it, use the URL <https://www.timeanddate.com/worldclock/poland> and the ID #ct. The scraper connects to the website and reads (scrapes) the current time, then returns it as a stream. Because the URL and ID are parameters, other websites work too, for example the URL <https://time.is/> and the ID #clock.
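The flow described above (fetch the page, read the element with the given ID, emit its text, repeat) can be sketched roughly as follows. This is a minimal illustration and not the Sequence's actual source: the names `extractById`, `scrape` and `FetchHtml` are hypothetical, the regex-based extraction is a naive stand-in for a real DOM parser, and the page loader is passed in as a parameter so the sketch does not assume any particular HTTP or browser library.

```typescript
// Hypothetical sketch only; the real Sequence may be implemented differently.

// Loader for the page HTML, injected by the caller (assumption).
type FetchHtml = (url: string) => Promise<string>;

// Return the text of the first element whose id matches the "#id" selector,
// or null when no such element is found. Naive regex matching, not a DOM parser.
function extractById(html: string, idSelector: string): string | null {
    const id = idSelector.replace(/^#/, "");
    // Escape regex metacharacters that may appear in the id.
    const escaped = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const pattern = new RegExp(
        `<([a-zA-Z][\\w-]*)[^>]*\\bid=["']${escaped}["'][^>]*>([\\s\\S]*?)</\\1>`
    );
    const match = pattern.exec(html);
    if (!match) return null;
    // Drop nested tags and surrounding whitespace, keep only the text.
    return match[2].replace(/<[^>]*>/g, "").trim();
}

// Poll the page and yield the element text as a stream of lines.
// intervalMs defaults to one second, matching the description above.
async function* scrape(
    url: string,
    idSelector: string,
    fetchHtml: FetchHtml,
    intervalMs = 1000,
    limit = Infinity
): AsyncGenerator<string> {
    for (let i = 0; i < limit; i++) {
        const html = await fetchHtml(url);
        const text = extractById(html, idSelector);
        if (text !== null) yield text + "\n";
        await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
}
```

On Node.js 18 or newer, a loader such as `(u) => fetch(u).then((r) => r.text())` could be passed as `fetchHtml`. Pages that fill the element with client-side JavaScript would likely need a headless browser instead of a plain HTTP request.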

📽️ A video illustrating this sample is on our YouTube channel 👉 "How to scrape websites using Scramjet Transform Hub?". Give it a 👍 if you like it and subscribe to the channel to keep up to date with new videos.

Running

❗ Remember to set up transform-hub locally or use the platform's environment for the Sequence deployment.

Open the terminal and run the following commands:

# go to 'scraping' directory
cd typescript/scraping

# install dependencies
npm install

# transpile TS->JS and copy node_modules and package.json to dist/
npm run build

# deploy the Sequence from the dist/ directory, which contains transpiled code, package.json and node_modules
si seq deploy dist --args '["https://www.timeanddate.com/worldclock/poland", "#ct"]'

# See output
si inst output -

# Optional commands:

# Check console.log messages
si inst stdout -

# Check console.error messages
si inst stderr -

💡NOTE: The deploy command performs three actions at once: it packs, sends and starts the Sequence. It is equivalent to running these three commands separately:

si seq pack dist/ -o scraping.tar.gz    # compress the 'dist/' directory into a file named 'scraping.tar.gz'

si seq send scraping.tar.gz    # send compressed Sequence to STH, this will output Sequence ID

si seq start - --args '["https://www.timeanddate.com/worldclock/poland", "#ct"]'    # start the Sequence with arguments, this will output Instance ID

Output

$ si inst output -
06:10
06:15
06:20
06:25
06:31
06:36
06:41
06:46
06:51
06:56
(...)

Check out the source on GitHub
