The RAINBOW Data Management and Analytics Stack helps users manage and analyze, in real time, the vast amounts of monitoring data collected from the underlying fog resources, together with the performance indicators of deployed IoT applications. Service operators express their queries in RAINBOW’s query language and leave the RAINBOW Data Management and Analytics Stack to (i) translate and optimize the queries by eliminating unnecessary computation and data movement, (ii) submit the queries to the underlying engine and optimize their execution according to the user’s orchestration preferences (energy-aware, performance-aware, etc.), and (iii) ensure reliable, efficient, and timely data fetching through RAINBOW’s unified Data Storage interface (Storage Fabric).
In this blog post, we focus on RAINBOW’s Distributed Data Processing Service, which we built on top of Apache Storm. Our aim was not to implement yet another distributed data processing engine, but rather to design novel scheduling algorithms that are decoupled from the underlying engine and account for the unique conditions found in geo-distributed IoT environments. One of the most important factors that led RAINBOW to select Apache Storm as the underlying stream processing engine is how easy it makes customizing the assignment of processing tasks to nodes: RAINBOW introduces custom schedulers by implementing the Apache Storm scheduler interface, without resorting to source code refactoring. This allows RAINBOW to offer a Scheduler’s Repository with 4 task schedulers, from which users select the one that optimizes processing execution for their needs. The RAINBOW Scheduler’s Repository offers: (i) the BaselineScheduler, which maps tasks to fog nodes by adopting a fairness strategy based on round-robin allocation; (ii) the PerfScheduler, which takes into account both the heterogeneity of the underlying fog nodes’ resources and the network link performance to minimize stream processing latency; (iii) the PerfDQScheduler, which extends the performance scheduler to also consider data quality as an additional problem dimension; and (iv) the PerfEnergyScheduler, which explores tradeoffs between power levels and performance to avoid energy waste when running streaming analytics jobs. Next, we present a set of experiments, performed on a physical fog testbed, that compare the schedulers’ optimization strategies.
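To make the idea behind latency-aware placement more tangible, here is a minimal Python sketch of a toy cost model. It is not RAINBOW's PerfScheduler and does not use the Apache Storm API. It assumes a linear pipeline of operators, a per-node processing time, and a symmetric link latency that is paid whenever two consecutive operators sit on different nodes. The node names, timings, capacities, and the functions `pipeline_latency` and `best_placement` are all hypothetical, and the exhaustive search is only practical for very small deployments.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, Tuple

Placement = Tuple[str, ...]  # placement[i] is the node hosting operator i


def pipeline_latency(placement: Placement,
                     proc_ms: Dict[str, float],
                     link_ms: Dict[FrozenSet[str], float]) -> float:
    """Latency of a linear pipeline: processing time plus cross-node hops."""
    total = sum(proc_ms[node] for node in placement)
    for src, dst in zip(placement, placement[1:]):
        if src != dst:  # co-located operators pay no network cost
            total += link_ms[frozenset((src, dst))]
    return total


def best_placement(num_ops: int,
                   proc_ms: Dict[str, float],
                   link_ms: Dict[FrozenSet[str], float],
                   capacity: Dict[str, int]) -> Placement:
    """Exhaustively pick the feasible placement with the lowest latency.

    Raises ValueError if no placement respects the capacities.
    """
    nodes = sorted(proc_ms)
    feasible = (p for p in product(nodes, repeat=num_ops)
                if all(p.count(n) <= capacity[n] for n in nodes))
    return min(feasible, key=lambda p: pipeline_latency(p, proc_ms, link_ms))


# Hypothetical numbers: one fast node, two slow ones, 10 ms between any pair.
proc_ms = {"big": 2.0, "small_a": 6.0, "small_b": 6.0}
link_ms = {frozenset(pair): 10.0 for pair in combinations(proc_ms, 2)}
capacity = {"big": 3, "small_a": 2, "small_b": 2}

spread = ("big", "small_a", "small_b")  # what a round-robin policy would do
best = best_placement(3, proc_ms, link_ms, capacity)
print("round-robin:", spread, pipeline_latency(spread, proc_ms, link_ms))
print("latency-aware:", best, pipeline_latency(best, proc_ms, link_ms))
```

Under these made-up numbers, the search co-locates every operator on the fast node because each cross-node hop costs more than the processing time it could save. A production scheduler would replace the exhaustive search with a heuristic or a solver.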
Task placement optimization strategies
Below, we provide an example of a fog deployment running an analytics job composed of 5 RAINBOW queries, executed under 3 different schedulers. The schedulers under examination are the BaselineScheduler (denoted in the plots as default), the PerfScheduler (denoted in the plots as resource), and the PerfEnergyScheduler (denoted in the plots as energy). The deployment for all three experiment runs comprises 5 fog nodes: 1 Dell PowerEdge R610 server (email@example.comGHz with 12GB RAM) with power ranging from 70 to 200W (denoted as nc), and 4 Raspberry Pi 4 Model B boards (quad-core ARM Cortex-A72@1.5GHz with 4GB RAM) with power ranging from 4 to 8W (denoted as rp0-3). Of these, rp1 and rp3 are battery-powered. The left plot depicts the operators mapped to each fog node, while the two plots on the right depict the overall power consumption and latency of the two RAINBOW schedulers compared to the baseline.
With the baseline scheduler, tasks are allocated evenly across all fog nodes, irrespective of their resource capabilities. With the PerfScheduler, by contrast, all tasks are allocated to the powerful nc1, since its resources permit this, which minimizes communication latency. This has a direct positive effect on the net latency, which drops by 24ms. However, to carry out the required computations, nc1 operates at high power levels, which results in a 12% increase in net energy consumption. With the PerfEnergyScheduler, on the other hand, the tasks are shared between the two Raspberry Pis that are “plugged in”, which reduces net energy consumption by 18% compared to the baseline, while the net latency (as expected) increases by 18ms.
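The three placements described above can be mimicked with a few lines of Python. This is a toy sketch under stated assumptions, not the logic of the RAINBOW schedulers. The node names, power ceilings, and battery flags follow the testbed description, with the server labelled nc. The `capacity` values, the task names `q1` to `q5` (one task per query), and the three functions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List


@dataclass(frozen=True)
class FogNode:
    name: str
    capacity: int        # hypothetical: number of tasks the node can host
    max_power_w: float   # upper end of the power range quoted in the text
    battery_powered: bool


# Capacities are made-up values; everything else mirrors the testbed text.
TESTBED = [
    FogNode("nc", capacity=8, max_power_w=200.0, battery_powered=False),
    FogNode("rp0", capacity=3, max_power_w=8.0, battery_powered=False),
    FogNode("rp1", capacity=3, max_power_w=8.0, battery_powered=True),
    FogNode("rp2", capacity=3, max_power_w=8.0, battery_powered=False),
    FogNode("rp3", capacity=3, max_power_w=8.0, battery_powered=True),
]


def round_robin(tasks: List[str], nodes: List[FogNode]) -> Dict[str, str]:
    """Baseline-style policy: hand out tasks in turn, ignoring capabilities."""
    ring = cycle(nodes)
    return {task: next(ring).name for task in tasks}


def performance_first(tasks: List[str], nodes: List[FogNode]) -> Dict[str, str]:
    """Co-locate every task on the strongest node to avoid network hops."""
    strongest = max(nodes, key=lambda n: n.capacity)
    if len(tasks) > strongest.capacity:
        raise ValueError("sketch only covers jobs that fit on a single node")
    return {task: strongest.name for task in tasks}


def energy_first(tasks: List[str], nodes: List[FogNode]) -> Dict[str, str]:
    """Share tasks among the plugged-in nodes with the lowest power ceiling."""
    plugged_in = [n for n in nodes if not n.battery_powered]
    lowest = min(n.max_power_w for n in plugged_in)
    frugal = [n for n in plugged_in if n.max_power_w == lowest]
    return round_robin(tasks, frugal)  # capacity is not checked in this sketch


TASKS = [f"q{i}" for i in range(1, 6)]  # assumption: one task per query

for policy in (round_robin, performance_first, energy_first):
    print(policy.__name__, policy(TASKS, TESTBED))
```

With five tasks, the round-robin policy places one task on each node, the performance-first policy places all of them on nc, and the energy-first policy alternates between rp0 and rp2. These are the same shapes of placement as the ones reported for the three schedulers.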