1. Home
  2. Docs
  3. Getting Started with RAINBOW
  4. RAINBOW Analytics

RAINBOW Analytics

Rainbow Analytics with StreamSight

StreamSight is a query-driven framework for streaming analytics in edge computing. It supports users in composing analytic queries that are automatically translated and mapped to streaming operations suitable for running on distributed processing engines deployed in wide areas of coverage. Our reference implementation has been integrated with Apache Storm 2.2.0.

What does StreamSight do?

Data scientists and platform operators can compose complex analytic queries without any knowledge of the programmable model of the underlying distributed processing engine by solely using high-level directives and expressions.An example of such analytic query from the domain of intelligent transportation is the following:

insight = compute avg("_SYSTEM_CPU_VISIBLETOTAL" from (stream2), 10s) every 5s;

The above query computes for a 10 seconds sliding window the average CPU total segment with new datapoints considered every 5s, which is particularly useful for system administrators in order to detect CPU throttling.

Getting Started

Build StreamSight

  1. Make sure you’ve installed Docker on your machine.
  2. Build the cluster using the following command:
docker compose-up
  1. Create and submit the job using the following script
/analytics/streamsight/submit.sh
  • See this section for configuring StreamSight.
  • See this section for composing your queries.

StreamSight Architecture

The following figure depicts a high-level and abstract overview of the Fog Analytics Cycle. Users submit ad-hoc queries following the declarative query model and the system compiles these queries into low-level streaming components.

The next figure presents the Storm architecture when an analytics job is submitted. Each fog node is comprised of a set of Supervisor nodes responsible for the execution of analytics tasks. Zookeeper is responsible for the communication of Master Node (Nimbus) with the Supervisor nodes.

Insight Declaration Examples

In this section, we present a few examples for useful insights from monitoring a cloud infrastructure.

Example 1

A useful insight that many companies need to monitor and take decisions on that is CPU utilization.So the following insight returns the average CPU utilization of a service for 30 seconds every 10 seconds.

cpu_utilization = COPMUTE AVG("cpu_user" FROM (stream), 30s) + AVG("cpu_sys" FROM (stream), 30s) EVERY 10s;

Example 2

Next we present an insight for maximum number of HTTP Requests per Second for 10 seconds window which is computed every 30 seconds grouped by region. For DevOps engineers and developers, who works on web-based applications, the peak of traffic in a specific region can be a critical factor.

http_requests_per_seconds_by_region = COPMUTEMAX("requests_per_seconds" FROM (stream), 10s) BY "region" EVERY 30s;

Example 3

Below we present an example of a query that the execution depends on the condition on the right side. This is particularly important in the cases where analysts want to avoid collecting unnecessary results for values that are not important to them.

abnormal_temperature = compute avg("temperature" from (stream)) when > 100

Example 4

The example below presents a cumulative operation of the energy consumption which is accumulated every time there is a new tuple as input.

total_energy = compute sum("energy_consumption" from (stream))

Example 5

In the case that we need to calculate the overall cost as well we can multiply the above with the double value that represents the cost of the unit (0.002).

total_cost_energy = 0.002 * compute sum("energy_consumption" from (stream))

Configuration

ParameterDescription
Topology NameThe name of the StreamSight topology to be created
Output OperationThe operation to be applied for output results (Local print, Ignite exporter, Influx exporter)
Provider HostsThe hostnames of data providers
Provider PortThe port number of data providers
Consumer HostThe hostname of data consumer
Consumer PortThe port number of data consumer

Licence

The framework is open-sourced under the Apache 2.0 License base. The codebase of the framework is maintained by the authors for academic research and is therefore provided “as is”.

Was this article helpful to you? Yes No

How can we help?