❗❗❗ THIS REPOSITORY IS NO LONGER UPDATED – THE JELLY PROTOCOL AND ITS IMPLEMENTATIONS ARE NOW DEVELOPED HERE: GITHUB, WEBSITE. ❗❗❗
This repository contains supplementary materials for the article "Efficient RDF Streaming for the Edge-Cloud Continuum". The publication has been accepted to the IEEE 8th World Forum on Internet of Things (preprint).
The extra directory contains a PDF file with additional materials that could not fit into the paper. Also attached are the LaTeX source and the plots in EPS format.
The superfast-jellyfish directory contains a Scala project that implements the method described in the paper.
Namespaces in the project:
- pl.ostrzyciel.superfast_jellyfish: the main namespace, contains some utility classes and a simple implementation of both a streaming server and a streaming client.
- convert: implementations for the encoder and decoder. These classes convert between the Protocol Buffers representation and Apache Jena's classes.
- stream: Akka Streams flows for integrating the encoder and decoder in streaming applications.
- benchmark: a set of classes that were used to conduct the experiments presented in the paper. It is recommended to run them with the attached scripts.
To run the benchmarks, it is recommended to use a full JAR assembly. You can create one using the sbt assembly command. In the paper, the experiments were run on GraalVM Community 22.1.
See also: sbt documentation, sbt-assembly.
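The build step can be sketched as follows. This is a minimal sketch that assumes sbt-assembly's default output location (target/scala-<version>/<name>-assembly-<version>.jar inside the project directory); the project may override this, in which case the path needs adjusting. The helper name find_assembly_jar is hypothetical and not part of the repository.

```shell
# Build step, run manually (needs sbt and a JDK on the PATH):
#   (cd superfast-jellyfish && sbt assembly)

# Print the most recently modified assembly JAR under a project directory.
# Assumes sbt-assembly's default output location and file naming.
find_assembly_jar() {
  ls -t "$1"/target/scala-*/*assembly*.jar 2>/dev/null | head -n 1
}

# Example: JAR="$(find_assembly_jar superfast-jellyfish)"
```

If no JAR is found, the function prints nothing, so an empty result means the build has not been run yet or the output location differs from the assumed default.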
The datasets used in this study can be downloaded from here. After downloading, keep the datasets as separate .nt.gz files.
The following datasets were used: identica, mix, wikipedia, aemet-1, migr_reschange, tour_cap_nuts3, aemet-2, petrol, flickr_10m, nevada_10m. The Flickr and Nevada datasets have to be trimmed to the first 10 million triples. To do this, decompress them, truncate them to the first 10 million lines, and compress them again.
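The trimming step can be done in a single pipeline, without writing the decompressed file to disk. The sketch below is one possible way to do it; the function name trim_dataset and the file names in the usage comment are assumptions, not names taken from the repository.

```shell
# trim_dataset INPUT OUTPUT [LINES]
# Keep only the first LINES lines (default: 10 million) of a gzipped file
# and write the result as a new gzipped file.
trim_dataset() {
  # head stops reading after LINES lines, which ends the decompressor early.
  # Avoid `set -o pipefail` here: the early exit of gzip would then be
  # reported as a failure even though the output is complete.
  gzip -dc "$1" | head -n "${3:-10000000}" | gzip > "$2"
}

# Example (input file name is a placeholder):
#   trim_dataset flickr.nt.gz flickr_10m.nt.gz
```

The optional third argument makes it easy to try the function on a small sample first.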
In the scripts directory you will find bash scripts that were used to run the experiments.
- set_netem.sh sets up network emulation needed for end-to-end tests.
- network_notes.md explains how to set this up.
- kafka.properties is the Kafka config that we've used. The important part is listeners: there need to be three of them, for different speeds (as emulated by netem).
- ser_des.sh: raw ser/des throughput. Arguments: [java executable] [path to Jelly's JAR] [base directory containing the datasets]
- size.sh: serialized size. Arguments: [java executable] [path to Jelly's JAR] [base directory containing the datasets]
- full_grpc.sh: end-to-end gRPC streaming throughput. Arguments: [java executable] [path to Jelly's JAR] [base directory containing the datasets] [port over which to communicate]
- kafka.sh: end-to-end Kafka streaming throughput. Arguments: [java executable] [path to Jelly's JAR] [base directory containing the datasets] [port for the producer]
- latency.sh: end-to-end streaming latency. Arguments: [java executable] [path to Jelly's JAR] [base directory containing the datasets] [port for gRPC] [port for the Kafka producer]
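Since the benchmarks run for a long time, it may be worth checking the three common arguments before launching a script. The sketch below is a hypothetical pre-flight check, not part of the repository. It assumes that each dataset is stored as <name>.nt.gz directly in the base directory; this naming is a guess based on the dataset list above, so adjust it if the scripts expect something different. All paths in the usage comment are placeholders.

```shell
# preflight JAVA_EXECUTABLE JAR_PATH DATASET_DIR
# Returns 0 if the Java executable, the JAR and all datasets are present.
preflight() {
  # The Java executable may be a name on the PATH or a full path.
  command -v "$1" >/dev/null 2>&1 || { echo "java executable not found: $1" >&2; return 1; }
  [ -f "$2" ] || { echo "JAR not found: $2" >&2; return 1; }
  # Assumed file naming: <dataset name>.nt.gz inside DATASET_DIR.
  for name in identica mix wikipedia aemet-1 migr_reschange tour_cap_nuts3 \
              aemet-2 petrol flickr_10m nevada_10m; do
    [ -f "$3/$name.nt.gz" ] || { echo "missing dataset: $3/$name.nt.gz" >&2; return 1; }
  done
  return 0
}

# Example (placeholder paths):
#   preflight java ./jelly-assembly.jar ./datasets && \
#     ./scripts/ser_des.sh java ./jelly-assembly.jar ./datasets
```

The same check applies to the other scripts, which take the same first three arguments followed by their port numbers.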
Keep in mind that a lot of these benchmarks take a long time to run (in total it takes several days to complete the suite).
The eda directory contains the Jupyter notebooks that were used for data analysis and generation of tables and figures. Additionally, the data.7z file contains all raw measurements that were gathered in this study.
We hope that the provided materials are enough for interested researchers to reproduce the experiments. However, in case of any issues, don't hesitate to contact us (contact details are in the paper).
The code in this repository is an early experimental version of Jelly. The current version of Jelly can be found here: GITHUB, WEBSITE.
The materials in this repository are licensed under the Apache 2.0 License. You can cite the materials using this DOI:
Piotr Sowiński (1, 2), Katarzyna Wasielewska-Michniewska (2), Maria Ganzha (1, 2), Wiesław Pawłowski (2, 3), Paweł Szmeja (2), Marcin Paprzycki (2)
- (1) Warsaw University of Technology
- (2) Systems Research Institute, Polish Academy of Sciences
- (3) Dept. of Mathematics, Physics, and Informatics, University of Gdańsk
This work is part of the ASSIST-IoT project that has received funding from the EU’s Horizon 2020 research and innovation programme under grant agreement No 957258.