Minimizing the Energy Footprint of Global Data Movement with GreenDataFlow

It is estimated that by 2022 the number of devices connected to the Internet will be four times the world population, and global IP traffic will reach 4.8 zettabytes per year. The growing number of users and rising data rates not only demand more network bandwidth and higher achievable data transfer throughput but also enlarge the energy footprint. The electricity consumed annually by global data movement is estimated at more than 200 terawatt-hours at the current rate, costing more than 40 billion US dollars per year. According to the same statistics, the US accounts for approximately 20% of this global data movement and of its energy footprint. This has prompted a considerable amount of work on power management and energy efficiency in hardware and software systems and, more recently, on power-aware networking.
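To put the figures above in proportion, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted in the paragraph (which are stated as lower bounds, so the results are rough); the variable names are mine.

```python
# Figures quoted in the paragraph above (all are "more than" / "approximately").
annual_energy_twh = 200   # global data movement, terawatt-hours per year
annual_cost_usd = 40e9    # US dollars per year
us_share = 0.20           # approximate US share of both

KWH_PER_TWH = 1e9

# Electricity price implied by the two global figures.
implied_price_per_kwh = annual_cost_usd / (annual_energy_twh * KWH_PER_TWH)

# US portion of the energy and of the cost.
us_energy_twh = annual_energy_twh * us_share
us_cost_usd = annual_cost_usd * us_share

print(f"Implied electricity price: ${implied_price_per_kwh:.2f} per kWh")
print(f"US share: {us_energy_twh:.0f} TWh, ${us_cost_usd / 1e9:.0f} billion per year")
```

In other words, the quoted totals imply a price of roughly 20 cents per kilowatt-hour, and a US share of about 40 terawatt-hours and 8 billion dollars per year.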

The majority of the existing work on power-aware networking focuses on reducing the power consumption of networking devices (e.g., routers, switches, and hubs). Various techniques have been suggested, such as putting idle sub-components (e.g., line cards) to sleep, adapting the rate at which switches forward packets to the traffic, putting Ethernet cards into low-power mode when there is no network traffic, developing architectures with programmable switches and switching layers that can incorporate different policies, and designing power-aware network protocols for energy-efficient routing.

The existing approaches suffer from the following drawbacks: (1) the solution is too costly (e.g., replacing all switches with energy-efficient ones); (2) the solution is impractical in the short term (e.g., replacing TCP with a more energy-efficient version); or (3) the solution penalizes performance while increasing energy efficiency (e.g., putting components to sleep while they are not in use).
The GreenDataFlow project, which is partially funded by NSF and IBM, proposes an innovative application-layer solution that is low cost, easy and practical to deploy, and does not penalize performance while increasing energy efficiency. This is a radically different approach to energy-efficient data transfers, which makes GreenDataFlow truly transformative.

Approximately 25% of the total electricity consumed during end-to-end data transfers is consumed at the end systems on a global (inter-continental) network; this share rises to 60% on a nation-wide network and to 90% on a local-area network. The ratio depends on the number of network devices (e.g., routers and switches) between the sender and receiver nodes and on how much power each device consumes. On any of these networks, decreasing the end-system power consumption would yield significant energy savings, considering that exabytes of data are moved and terawatt-hours of electricity are consumed in world-wide data movement every year. While achieving energy efficiency, GreenDataFlow aims to avoid penalizing performance, as empirical data suggests that performance directly impacts revenue. As an example, Google reported a 20% revenue loss in an experiment that increased the time to display search results by as little as 500 milliseconds, and Amazon reported a 1% decrease in sales for an additional delay of as little as 100 milliseconds.
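The dependence of the end-system share on path length can be illustrated with a very simple model: two end systems plus a chain of identical network devices. This is only a sketch. The wattages and device counts below are hypothetical values I picked so that the shares land on the 90%, 60%, and 25% figures quoted above; they are not measurements.

```python
def end_system_share(end_system_watts, device_watts, num_devices):
    """Fraction of total transfer power drawn by the two end systems.

    Assumes identical sender and receiver, and identical network devices,
    each contributing `device_watts` to this transfer (a simplification).
    """
    end_total = 2 * end_system_watts            # sender + receiver
    network_total = num_devices * device_watts  # all devices on the path
    return end_total / (end_total + network_total)


# Hypothetical numbers, chosen only for illustration.
scenarios = [("local-area", 1), ("nation-wide", 6), ("inter-continental", 27)]
for label, hops in scenarios:
    share = end_system_share(end_system_watts=90, device_watts=20, num_devices=hops)
    print(f"{label:18s} {hops:2d} devices -> end systems draw {share:.0%}")
```

The longer the path, the more the network devices dominate, which is why the end-system share is largest on a local-area network.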

GreenDataFlow proposes a novel approach to achieve highly energy-efficient end-to-end data transfers through application-layer protocol tuning and optimization. To achieve this goal, it investigates several key research questions: (1) it studies accurate prediction of the best combination of end-system and protocol parameters (e.g., CPU frequency scaling, multi-core scheduling, I/O block size, TCP buffer size, and the level of parallelism, concurrency, and pipelining) for optimal data transfer throughput under energy-efficiency constraints; (2) it investigates accurate prediction of the additional power that network devices consume as the data transfer rate on the active links increases, together with dynamic readjustment of the transfer rate to balance energy against performance; and (3) it investigates service level agreement (SLA) based energy-efficient transfer algorithms, which will help service providers execute transfers with the minimum energy that still delivers the performance level promised to the customer.
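To make the SLA-based idea in (3) concrete, here is a minimal sketch of one way such a selection could work: among candidate parameter settings, keep those whose predicted throughput meets the SLA, then choose the one with the lowest predicted energy. This is not the project's actual algorithm. The `Setting` type, the function names, and all the numbers are hypothetical, and a real system would obtain the predictions from models or measurements rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    concurrency: int         # number of simultaneous file transfers
    throughput_mbps: float   # predicted throughput at this setting
    power_watts: float       # predicted end-system power draw at this setting


def energy_joules(setting, data_megabits):
    """Energy = power x time, where time = data size / throughput."""
    return setting.power_watts * data_megabits / setting.throughput_mbps


def pick_setting(candidates, data_megabits, sla_mbps):
    """Return the lowest-energy setting that meets the SLA, or None if none does."""
    feasible = [s for s in candidates if s.throughput_mbps >= sla_mbps]
    if not feasible:
        return None
    return min(feasible, key=lambda s: energy_joules(s, data_megabits))


# Made-up predictions, for illustration only.
candidates = [
    Setting(concurrency=1, throughput_mbps=400, power_watts=60),
    Setting(concurrency=4, throughput_mbps=1200, power_watts=90),
    Setting(concurrency=8, throughput_mbps=1600, power_watts=150),
]
best = pick_setting(candidates, data_megabits=80_000, sla_mbps=1000)
print(best)
```

With these made-up numbers, the fastest setting is not the most energy-efficient one: the middle setting finishes a little later but draws much less power, so it meets the SLA with less total energy.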

I believe that the GreenDataFlow framework will fill an important gap in data transfer energy efficiency. The models, algorithms, and tools developed as part of the GreenDataFlow project will help increase performance and decrease power consumption during end-to-end data transfers, which should save the US economy many gigawatt-hours of energy and millions of dollars. The preliminary results already show that it can achieve up to 700% performance improvement while saving up to 60% of the energy at the end systems in certain cases, which is quite exciting.

