
Scramjet in Python


⭐ Star us on GitHub — it motivates us a lot! 🚀


Tested with Python 3.8.10 and Ubuntu 20.04.


Installation

Scramjet Framework is available on PyPI. You can install it with a simple pip command:

```bash
pip install scramjet-framework-py
```

Quick start

Let's say we have a fruits.csv file like this:

orange,sweet,1
lemon,sour,2
pigface,salty,5
banana,sweet,3
cranberries,bitter,6

and we want to write the names of the sweet fruits to a separate file. To do this, write an async function like this:

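```python
import asyncio

from scramjet import streams


async def sweet_stream():
    with open("fruits.csv") as file_in, open("sweet.txt", "w") as file_out:
        await (
            streams.Stream
            .read_from(file_in)                            # read fruits.csv line by line
            .map(lambda line: line.split(','))             # "orange,sweet,1\n" -> ["orange", "sweet", "1\n"]
            .filter(lambda record: record[1] == "sweet")   # keep sweet fruits only
            .map(lambda record: f"{record[0]}\n")          # keep the name, one per line
            .write_to(file_out)                            # write the results to sweet.txt
        )


asyncio.run(sweet_stream())
```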

Output saved in sweet.txt:

orange
banana

And that's it!
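To see exactly what the pipeline computes, here is a plain, synchronous sketch of the same logic using only the standard library (no Scramjet involved). It reads the CSV contents from an in-memory io.StringIO instead of fruits.csv, and sweet_fruits is a hypothetical helper name used only for illustration.

```python
import io

# Same contents as fruits.csv above, held in memory for the sketch.
FRUITS_CSV = (
    "orange,sweet,1\n"
    "lemon,sour,2\n"
    "pigface,salty,5\n"
    "banana,sweet,3\n"
    "cranberries,bitter,6\n"
)


def sweet_fruits(file_in):
    """Hypothetical helper mirroring the map -> filter -> map steps above."""
    records = (line.split(",") for line in file_in)                 # map: line -> fields
    sweet = (record for record in records if record[1] == "sweet")  # filter
    return [f"{record[0]}\n" for record in sweet]                   # map: fields -> name line


with io.StringIO(FRUITS_CSV) as file_in:
    lines = sweet_fruits(file_in)

print("".join(lines), end="")  # prints "orange" and "banana" on separate lines
```

Iterating a text file object (or StringIO) yields lines with their trailing newline, which is why only the last field keeps the "\n" after split(",").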

Usage

The basic building block of Scramjet is the Stream class. It reads input in chunks, performs operations on these chunks, and produces an iterable output that can be collected or written to a target.

A stream is created with the read_from class method. It accepts as input any iterable or any object implementing a .read() method, and returns a Stream instance.
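The two kinds of input that read_from accepts can be illustrated with a small duck-typing sketch. iter_chunks below is a hypothetical helper, not part of Scramjet: it iterates plain iterables directly and calls .read(size) repeatedly on objects that only expose .read(). The chunk size of 4 and the choice to prefer iteration when an object supports both are assumptions of this sketch; Scramjet's own read_from may split input differently.

```python
from typing import Any, Iterator


def iter_chunks(source: Any, chunk_size: int = 4) -> Iterator[Any]:
    """Hypothetical helper: turn an iterable or a .read()-able object into chunks.

    Assumption: iterables are iterated directly; objects that are not iterable
    but expose .read(size) are read in fixed-size pieces until .read()
    returns an empty value. This is not Scramjet's implementation.
    """
    if hasattr(source, "__iter__"):
        yield from source
        return
    if hasattr(source, "read"):
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # '' or b'' signals end of input
                break
            yield chunk
        return
    raise TypeError("source must be iterable or implement .read()")


class OnlyReadable:
    """A minimal reader that exposes .read() but is not iterable."""

    def __init__(self, data: str) -> None:
        self._data = data
        self._pos = 0

    def read(self, size: int = -1) -> str:
        if size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


print(list(iter_chunks([1, 2, 3])))                   # [1, 2, 3]
print(list(iter_chunks(OnlyReadable("abcdefghij"))))  # ['abcd', 'efgh', 'ij']
```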

Transforming a stream:

  • map - transform each chunk in a stream using the specified function.
  • filter - keep only chunks for which the specified function evaluates to True.
  • flatmap - run the specified function on each chunk and emit each of its results as a separate chunk.
  • batch - convert a stream of chunks into a stream of lists of chunks.

Each of these methods returns the modified stream, so they can be chained like this: some_stream.map(...).filter(...).batch(...)
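To make the semantics of these operations concrete, here is a plain asyncio sketch built from async generators. It is not Scramjet's implementation: the function names (amap, afilter, aflatmap, abatch) are hypothetical, chaining is done by nesting calls rather than by method chaining, chunks are processed one at a time, and abatch assumes fixed-size batches, which may not match how Scramjet's batch decides where a batch ends.

```python
import asyncio


async def source(items):
    """Wrap a plain iterable as an async stream of chunks."""
    for item in items:
        yield item


async def amap(stream, func):
    """Hypothetical map: one output chunk per input chunk."""
    async for chunk in stream:
        yield func(chunk)


async def afilter(stream, pred):
    """Hypothetical filter: keep chunks for which pred(chunk) is truthy."""
    async for chunk in stream:
        if pred(chunk):
            yield chunk


async def aflatmap(stream, func):
    """Hypothetical flatmap: every item func returns becomes its own chunk."""
    async for chunk in stream:
        for result in func(chunk):
            yield result


async def abatch(stream, size):
    """Hypothetical batch: group chunks into lists of `size` (assumed policy)."""
    batch = []
    async for chunk in stream:
        batch.append(chunk)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, shorter batch
        yield batch


async def collect(stream):
    """Gather the whole stream into a list."""
    return [chunk async for chunk in stream]


async def main():
    stream = source(range(1, 7))                  # 1, 2, ..., 6
    stream = amap(stream, lambda x: x * 10)       # 10, 20, ..., 60
    stream = afilter(stream, lambda x: x != 30)   # drop 30
    stream = aflatmap(stream, lambda x: [x, -x])  # 10, -10, 20, -20, ...
    stream = abatch(stream, 4)                    # lists of up to 4 chunks
    return await collect(stream)


result = asyncio.run(main())
print(result)  # [[10, -10, 20, -20], [40, -40, 50, -50], [60, -60]]
```

Each stage only pulls from the previous one, so nothing is computed until collect starts iterating the last stage.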

Collecting data from the stream (asynchronous):

  • write_to - write all resulting stream chunks into a target.
  • to_list - return a list with all stream chunks.
  • reduce - combine all chunks using the specified function.
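Similarly, the three collectors can be sketched as coroutines that consume an async stream. These are hypothetical stand-ins, not Scramjet's API: write_to here accepts any object with a .write() method, and reduce requires an explicit initial value, which is an assumption of this sketch.

```python
import asyncio
import io


async def numbers():
    """A tiny async source stream."""
    for n in [3, 1, 4, 1, 5]:
        yield n


async def to_list(stream):
    """Hypothetical to_list: gather every chunk into a list."""
    return [chunk async for chunk in stream]


async def reduce(stream, func, initial):
    """Hypothetical reduce: fold chunks left to right, acc = func(acc, chunk)."""
    acc = initial
    async for chunk in stream:
        acc = func(acc, chunk)
    return acc


async def write_to(stream, target):
    """Hypothetical write_to: call target.write() once per chunk."""
    async for chunk in stream:
        target.write(chunk)


async def main():
    as_list = await to_list(numbers())
    total = await reduce(numbers(), lambda acc, x: acc + x, 0)
    out = io.StringIO()
    await write_to((f"{n}\n" async for n in numbers()), out)
    return as_list, total, out.getvalue()


as_list, total, written = asyncio.run(main())
print(as_list, total, repr(written))  # [3, 1, 4, 1, 5] 14 '3\n1\n4\n1\n5\n'
```

Each collector gets a fresh numbers() generator because a Python async generator can be iterated only once.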

Examples

You can find more examples, such as hello_datastream.py, in the Scramjet GitHub repository. They don't require any additional dependencies beyond the standard library, so you can run them simply with:

```bash
python hello_datastream.py
```

Requesting Features

Anything missing? Or maybe there is something that would make using Scramjet Framework much easier or more efficient? Don't hesitate to file a new feature request! We really appreciate all feedback.

Reporting bugs

If you have found a bug or inconsistent or confusing behavior, please file a new bug report.

Contributing

You can contribute to this project by giving us feedback (reporting bugs and requesting features) and also by writing code yourself!

The easiest way is to create a fork of this repository and then create a pull request with all your changes. In most cases, you should branch from and target the main branch.

Please refer to the Development Setup section on how to set up this project.

Development Setup

  1. Install a Python 3 interpreter on your computer. Refer to the official docs.

  2. Install the git version control system. Refer to the official docs.

  3. Clone this repository:

```bash
git clone git@github.com:scramjetorg/framework-python.git
```
  4. Create and activate a virtualenv:

```bash
sudo apt install python3-virtualenv
virtualenv -p python3 venv
source venv/bin/activate
```
  5. Check the Python version:

```bash
$ python --version
Python 3.8.10
```
  6. Install dependencies:

```bash
pip install -r dev-requirements.txt
```
  7. Run test cases (with the virtualenv activated):

```bash
pytest
```

💡 HINT: add a filename if you want to limit which tests are run.

  8. If you want to enable detailed debug logging, set one of the following environment variables:

```bash
PYFCA_DEBUG=1       # debug pyfca
DATASTREAM_DEBUG=1  # debug datastream
SCRAMJET_DEBUG=1    # debug both
```


Didn't find the information you needed?

Join our Scramjet Community on Discord, where you can get help from our engineers directly.