Scramjet Cloud Platform Concept

Scramjet Cloud Platform is a Serverless End-to-End Distributed Data Transformation Platform working in the Transform-as-a-Service model. It can acquire, transform and process data on its own in real-time pipelines that span environments in different physical locations and multiple cloud infrastructure providers.

The platform is used for uploading and executing programs, connecting their outputs to inputs using Topics and, finally, accessing the outputs via a centrally available API. You don't need to set up servers, operating systems or other infrastructure: just write your program, package it, send it to us and let us know when you want it to start. We'll take care of the rest.

Scramjet Cloud Platform allows you to:

  • write and deploy simple long-running data processing programs called Sequences,
  • invoke Sequences with a simple CLI or programmatically via API or a set of client programs,
  • monitor and control running Sequences via one central, publicly available, secure API,
  • send and receive data produced and required by running Sequences,
  • share data between multiple Sequences by enclosing them in Spaces,
  • connect self-hosted servers to existing Data Spaces with minimal configuration,
  • create execution environments at multiple cloud providers with a click of a button.
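
To make the Sequence, Instance and Topic terms more concrete, here is a minimal in-memory sketch in Python. It assumes a Sequence can be modelled as an async generator that turns an input stream into an output stream; the `Topic` class and the `producer` and `consumer` functions are hypothetical stand-ins, not the Scramjet API. The sketch only illustrates one Instance's output feeding another Instance's input through a Topic.

```python
import asyncio
from typing import AsyncIterator, Iterable, List


class Topic:
    """Hypothetical channel carrying one Instance's output to another's input."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def publish(self, stream: AsyncIterator[str]) -> None:
        async for item in stream:
            await self._queue.put(item)
        await self._queue.put(None)  # end-of-stream marker

    async def subscribe(self) -> AsyncIterator[str]:
        while (item := await self._queue.get()) is not None:
            yield item


async def source(lines: Iterable[str]) -> AsyncIterator[str]:
    for line in lines:
        yield line


async def producer(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    # Hypothetical "Sequence A": normalizes text.
    async for line in stream:
        yield line.strip().lower()


async def consumer(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    # Hypothetical "Sequence B": drops empty lines and tags the rest.
    async for line in stream:
        if line:
            yield f"processed:{line}"


async def main(lines: Iterable[str]) -> List[str]:
    topic = Topic()
    # Run both "Instances" concurrently; the Topic links producer to consumer.
    publishing = asyncio.create_task(topic.publish(producer(source(lines))))
    results = [item async for item in consumer(topic.subscribe())]
    await publishing
    return results


print(asyncio.run(main(["  Hello", "", "WORLD "])))  # ['processed:hello', 'processed:world']
```

Because the two "Instances" run as concurrent tasks, the consumer processes each item as soon as the producer publishes it instead of waiting for the whole input.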

Concept diagram

The diagram below shows the data flow in Scramjet Cloud Platform. Any type of data (file, image, stream, movie, etc.) can be passed to the input of an Instance. Take a close look at the Instances in the diagram: three of them were started from the same Sequence ("SEQUENCE A" in the diagram), two of them with the same dataset. The fourth Instance was started from "SEQUENCE B", which uses an external API as a data input in its transformation logic.

As a result, a separate container with its own isolated, safe environment has been created for each Instance. This reflects the scalability of Scramjet Transform Hub (STH), the engine behind the platform, which is one of its key strengths.

Transformed data can be read from an Instance's output and then saved to a database or a file, passed to another Instance for further transformation, or used in any other way you choose.
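
The scenario in the diagram can be approximated with plain asyncio, as a sketch only. Each call to the hypothetical `sequence_a` plays the role of one Instance with its own local state (standing in for the separate container STH creates), and `fake_external_api` is a hypothetical placeholder for the external API used by "SEQUENCE B". Coroutines share one process, so this does not reproduce real container isolation.

```python
import asyncio


async def sequence_a(dataset, instance_name):
    # Local state lives only inside this call, like state inside one Instance.
    total = 0
    for value in dataset:
        total += value
        await asyncio.sleep(0)  # yield control, as real I/O would
    return instance_name, total


async def fake_external_api():
    # Hypothetical stand-in for the external API that "SEQUENCE B" reads.
    await asyncio.sleep(0)
    return [10, 20, 30]


async def sequence_b(instance_name):
    data = await fake_external_api()
    return instance_name, max(data)


async def main():
    dataset_1 = [1, 2, 3]
    dataset_2 = [4, 5]
    # Three runs of the same Sequence (two with the same dataset) plus one of B.
    results = await asyncio.gather(
        sequence_a(dataset_1, "A#1"),
        sequence_a(dataset_1, "A#2"),
        sequence_a(dataset_2, "A#3"),
        sequence_b("B#1"),
    )
    return dict(results)


print(asyncio.run(main()))  # {'A#1': 6, 'A#2': 6, 'A#3': 9, 'B#1': 30}
```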

Scramjet Concept Diagram

Glossary:

Inputs
  1. STH can handle any input that a Node.js or Python application can handle.
  2. You, as a developer, are free to process a variety of inputs in your Sequence applications, such as text, JSON, XML, SOAP, audio, video and more.
  3. Inputs can be:
    • provided to STH via its REST API,
    • consumed by the app from various local or remote sources, such as a stream, STDIN, a file, an API or a URL, or
    • generated by the app itself.
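
A Sequence can treat these input kinds uniformly if it reads from any iterable of lines. The sketch below assumes newline-delimited JSON input; `read_lines`, `generated_input` and `parse_json_records` are hypothetical helper names, and `io.StringIO` stands in for `sys.stdin` or an open file so the example runs without a terminal.

```python
import io
import json
from typing import Dict, Iterable, Iterator, List, TextIO


def generated_input(count: int) -> Iterator[str]:
    # Input "generated by the app itself".
    for i in range(count):
        yield json.dumps({"id": i})


def read_lines(source: TextIO) -> Iterator[str]:
    # Works the same for sys.stdin, an open file or an in-memory buffer.
    for line in source:
        line = line.strip()
        if line:
            yield line


def parse_json_records(lines: Iterable[str]) -> List[Dict]:
    return [json.loads(line) for line in lines]


buffer = io.StringIO('{"id": 1}\n\n{"id": 2}\n')  # stands in for STDIN or a file
print(parse_json_records(read_lines(buffer)))  # [{'id': 1}, {'id': 2}]
print(parse_json_records(generated_input(2)))  # [{'id': 0}, {'id': 1}]
```
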

Scramjet Cloud Platform - Sequence Flow

Scramjet Cloud Platform is responsible for maintaining and deploying Sequences, keeping them running and managing their lifecycle. It serves as the central processing and management unit and has the following major components:

  • Sequence - a user's program to be executed on STH. It contains the developer's code, which consists of one or more functions with lightweight business logic. Before it is sent to STH, it must be packed together with its dependencies into a package compressed in tar.gz format.

  • Instance - once a Sequence is started, STH creates a separate runtime environment for it and executes the Sequence code inside it. This running entity is called an Instance.

  • API & CLI - our Application Programming Interface and Command Line Interface allow for:

    • Data operations - sending input data and receiving output data
    • Management operations - managing STH itself and its entities, Sequences and Instances
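
Packing a Sequence directory into a tar.gz archive can be sketched with Python's standard `tarfile` module. The directory layout (`main.py` plus a `deps` folder) and the `pack_sequence` helper are hypothetical, and the exact files STH expects inside the package are not covered here; the STH CLI can help with actual deployment.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_sequence(source_dir: Path, output: Path) -> Path:
    """Compress a Sequence directory (code plus dependencies) into tar.gz."""
    with tarfile.open(output, "w:gz") as archive:
        for path in sorted(source_dir.rglob("*")):
            # Store paths relative to the Sequence root, not absolute paths.
            archive.add(path, arcname=path.relative_to(source_dir).as_posix(), recursive=False)
    return output


with tempfile.TemporaryDirectory() as tmp:
    seq_dir = Path(tmp) / "my-sequence"  # hypothetical Sequence directory
    (seq_dir / "deps").mkdir(parents=True)
    (seq_dir / "main.py").write_text("print('hello')\n")
    (seq_dir / "deps" / "helper.py").write_text("VALUE = 1\n")

    package = pack_sequence(seq_dir, Path(tmp) / "my-sequence.tar.gz")
    with tarfile.open(package, "r:gz") as archive:
        print(archive.getnames())  # ['deps', 'deps/helper.py', 'main.py']
```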

STH also exposes its own REST API for sending and receiving data and for managing Sequences, Instances and STH itself.

At the STH level, we also apply a set of algorithms to optimize and speed up data processing in Sequences. You can interact with STH using the dedicated STH CLI, which is helpful for deploying Sequences and interacting with running Instances.

STH is powered by Scramjet Framework, a fast, simple, functional reactive stream programming framework written on top of Node.js object streams.

Outputs

Output from the engine can be handled in several ways:

  • File - you can save your output to a local or a remote file
  • STDOUT - output can be directed to system STDOUT (STDERR is supported as well)
  • API - output can be consumed from the STH REST API
  • URL Request - you can write your app so that it sends requests to a URL, a webhook, etc.
  • Stream - output can be streamed to a particular destination

You can combine several of these: for example, you can send data to a remote system or URL and also save it locally.
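
Combining several outputs can be sketched as a fan-out over output sinks. In this minimal example, `fan_out` is a hypothetical helper, a local file and STDOUT are real sinks, and the `sent_to_remote` list is a placeholder for a remote system or URL (no network call is made).

```python
import sys
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def fan_out(records: Iterable[str], sinks: List[Callable[[str], object]]) -> int:
    """Send every record to each sink; return how many records were processed."""
    count = 0
    for record in records:
        for sink in sinks:
            sink(record)
        count += 1
    return count


sent_to_remote: List[str] = []  # hypothetical stand-in for a remote system/URL

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.txt"
    with out_path.open("w") as local_file:
        written = fan_out(
            ["a", "b"],
            [
                lambda r: local_file.write(r + "\n"),  # save locally
                lambda r: sys.stdout.write(r + "\n"),  # STDOUT
                sent_to_remote.append,                 # placeholder "remote" sink
            ],
        )
    print(written, out_path.read_text().splitlines(), sent_to_remote)
```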

Program Lifecycle

Scramjet Cloud Platform (SCP) can be treated both as a data processing engine and as an execution platform for multiple Sequences that run on the same platform and perform various data processing tasks.

SCP allows you to deploy and run multiple data processing apps called Sequences.

Sequences are user programs executed on STH

We named our apps "Sequences", but they are not just any apps: they specialize in efficient data processing. The term "Sequence" describes their nature well, as they process data through a sequence of chained functions. As a result, Sequences are usually concise, easy to write and powerful at the same time.
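
The chained-functions idea behind Sequences can be illustrated with Python async generators. This sketch uses hypothetical helpers (`from_iterable`, `afilter`, `amap`, `chain`) and does not use Scramjet Framework's own stream classes; it only shows each stage consuming the output of the previous one.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
U = TypeVar("U")


async def from_iterable(items: Iterable[T]) -> AsyncIterator[T]:
    for item in items:
        yield item


def amap(fn: Callable[[T], Awaitable[U]]):
    # Stage that applies an async function to every item.
    async def stage(stream: AsyncIterator[T]) -> AsyncIterator[U]:
        async for item in stream:
            yield await fn(item)
    return stage


def afilter(pred: Callable[[T], bool]):
    # Stage that drops items for which pred returns False.
    async def stage(stream: AsyncIterator[T]) -> AsyncIterator[T]:
        async for item in stream:
            if pred(item):
                yield item
    return stage


def chain(stream, *stages):
    # Feed the output of each stage into the next one.
    for stage in stages:
        stream = stage(stream)
    return stream


async def double(x: int) -> int:
    await asyncio.sleep(0)
    return x * 2


async def run(items):
    stream = chain(from_iterable(items), afilter(lambda x: x % 2 == 1), amap(double))
    return [item async for item in stream]


print(asyncio.run(run(range(6))))  # [2, 6, 10]
```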

Every time a Sequence is started, it runs as an Instance.

Instance is a running Sequence

An Instance (computing instance) is a Sequence that has been started and is currently being executed on SCP. SCP creates a separate runtime environment for it and executes the Sequence code inside that environment. Because a Sequence can be started multiple times (e.g. with different parameters), every Instance is literally an instance of a Sequence. An Instance can also process large amounts of data on the fly without losing persistence.
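
The one-to-many relationship between a Sequence and its Instances can be modelled as plain records. In this hypothetical sketch, `Sequence.start()` creates a new `Instance` with its own id, arguments and output each time it is called; the class names, fields and status values are assumptions for illustration, and the code runs synchronously rather than in a separate runtime environment.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

_ids = itertools.count(1)


@dataclass
class Instance:
    """Hypothetical record of one run of a Sequence."""
    instance_id: int
    args: Tuple
    status: str = "starting"
    output: List = field(default_factory=list)


@dataclass
class Sequence:
    """Hypothetical uploaded program; start() may be called many times."""
    name: str
    fn: Callable
    instances: List[Instance] = field(default_factory=list)

    def start(self, data, *args) -> Instance:
        instance = Instance(next(_ids), args)
        self.instances.append(instance)
        instance.status = "running"
        instance.output = list(self.fn(data, *args))
        instance.status = "finished"
        return instance


def scale(data, factor):
    for value in data:
        yield value * factor


seq = Sequence("scale", scale)
first = seq.start([1, 2], 10)
second = seq.start([1, 2], 100)  # same Sequence, different parameter
print(first.output, second.output, len(seq.instances))  # [10, 20] [100, 200] 2
```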

Both Sequences and Instances have their own API, which you can explore in the corresponding sections.

IFCA - Intelligent Function Composition Algorithms

A set of algorithms that use asynchronous functions under the hood to process streams of data in an efficient, optimized way.
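
The sketch below illustrates the general idea of running several asynchronous calls on a stream at once while keeping results in input order. It is not Scramjet's IFCA implementation; `ordered_concurrent_map` and its `max_parallel` limit are hypothetical names chosen for this example.

```python
import asyncio
from collections import deque
from typing import AsyncIterator, Awaitable, Callable, Deque, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


async def ordered_concurrent_map(
    items: Iterable[T],
    fn: Callable[[T], Awaitable[U]],
    max_parallel: int = 4,
) -> AsyncIterator[U]:
    """Keep up to max_parallel calls of fn in flight; yield results in input order."""
    in_flight: Deque[asyncio.Future] = deque()
    for item in items:
        in_flight.append(asyncio.ensure_future(fn(item)))
        if len(in_flight) >= max_parallel:
            # Await the oldest task first so output order matches input order.
            yield await in_flight.popleft()
    # Drain the tasks still pending once the input is exhausted.
    while in_flight:
        yield await in_flight.popleft()


async def slow_square(x: int) -> int:
    await asyncio.sleep(0.01 * (5 - x))  # later items finish sooner
    return x * x


async def main() -> List[int]:
    return [y async for y in ordered_concurrent_map(range(5), slow_square, max_parallel=3)]


print(asyncio.run(main()))  # [0, 1, 4, 9, 16]
```

Awaiting the oldest task first keeps the output order stable at the cost of some head-of-line blocking: a slow early item delays results that have already finished behind it.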

The Transform Hub engine powering Scramjet Cloud Platform can run programs based on Node.js and Python, which allows developers to benefit from the rich ecosystems, numerous packages and solutions provided by these vibrant communities.