Data Intensive Scientific Thought

Posts

The Path to Sustainable AI -- Core Principles and Best Practices

September 12, 2024

Large-scale AI models are considerable consumers of computing resources and energy, leading to a significant carbon footprint on our planet. Researchers estimate that training a single natural language processing model can generate as much CO2e (carbon dioxide equivalent) as the annual emissions of 120 homes. AI workloads in data centers accounted for 15% of Google’s total electricity consumption -- 18.3 terawatt hours in 2021, which is comparable to the annual energy usage of the entire City of Atlanta. And this was well before the boom of generative AI technologies we have been witnessing over the last couple of years. Driven by the growing demands of large-scale data analytics and AI workloads, data centers are projected to consume 3–13% of global electricity by 2030 -- a significant increase from just 1% in 2010. The computational demands of cutting-edge AI models are increasing 1,000-fold every three years, and AI could account for 14% of the world’s total carbon...

Toward Sustainable Networking

March 07, 2024

Figure 1: Estimation of expected total annual energy consumption per IT industry in the period 2010–2030.

The plethora of data generated by scientific applications, the Internet of Things, social media, and e-commerce fuels large-scale data analytics systems. As a result, data transfer over the Internet has been increasing exponentially each year and has already exceeded the zettabyte scale. With the increased data generation rate, the carbon footprint of data movement is becoming a critical problem, especially for data centers and wired access networks. It is estimated that information and communication technologies will use between 8% and 21% of the world’s electricity by 2030. The estimated total annual energy consumption of the different IT industries in the period 2010–2030 is shown in Figure 1. The share of data centers and communication networks in the total IT power consumption is 69%. Among this share, the data transfers alone consume over a hun...

Spring'24 Seminar Course on Green Computing and Sustainability

February 22, 2024

This semester, I'm offering the second iteration of my seminar course on Green Computing and Sustainability. The papers we plan to discuss this semester are listed below. This time, our focus will be Sustainable AI and Sustainable Data Centers. If you have recommendations on relevant papers to add to the list for our future seminars, please let me know. Green and Sustainable Computing: Challenges and Opportunities Assessing ICT Global Emissions Footprint: Trends to 2040 & Recommendations The Sky is Not the Limit: Untapped Opportunities for Green Computing Sustainable Computing – Without the Hot Air The Odd One Out: Energy is not like Other Metrics Sustainable AI Green AI Towards a Methodology and Framework for AI Sustainability Metrics Do Deep Learning Frameworks Have Different Costs? Adaptive and Energy-Efficient Architectures for Machine Learning: Challenges and Opportunities Energy-Efficient Deep Learning Inference on Edge Dev...

A Vision for a National Data and Software Cyberinfrastructure

February 19, 2024

During my term as an NSF program director in the Office of Advanced Cyberinfrastructure from 2020 to 2022, I had the opportunity to lead the development of NSF’s Blueprint for a National Data and Software Cyberinfrastructure. This blueprint document is publicly available to the community and provides a forward-looking vision for a robust, secure, trusted, performant, scalable, and sustainable data and software cyberinfrastructure (Data and Software CI) ecosystem to enable and accelerate science and engineering research. This blueprint was prepared based on a comprehensive analysis of existing NSF programs and a wide range of input from the community via advisory bodies, requests for information (such as Data-Focused Cyberinfrastructure Needed to Support Future Data-Intensive Science and Engineering Research and Future Needs for Advanced Cyberinfrastructure to Support Science and Engineering Research), community...

Toward Sustainable Software for HPC, Cloud, and AI Workloads

February 08, 2024

Software applications are considerable consumers of computing resources, leading to a significant carbon footprint. Recent estimates suggest that Information Technology accounts for approximately 11% of global energy consumption. Projections indicate that by 2030, data centers alone could account for 3–13% of global electricity use, a stark increase from the 1% recorded in 2010, propelled by escalating demands and emerging trends like large-scale AI workloads. The software industry is responsible for about 3% of global carbon emissions, which is very close to the share associated with the aviation industry, and this figure could grow if left unaddressed. Researchers have estimated that training a single machine learning model for natural language processing may emit as much CO2e (carbon dioxide equivalent) as a small town with 120 homes would emit in a year. Another study reported that AI workloads in data centers made up 15% of Google’s total electricity consumption, which was 18...

Reading List for My Seminar Course on "Green Computing and Sustainability"

January 27, 2023

I will be teaching a seminar course on "Green Computing and Sustainability" in the Spring semester. The papers I plan to discuss in the course are listed below. I aim to narrow down the list later based on the interests of the students. Any suggestions are welcome. The Case for Green Computing and Sustainability Assessing ICT Global Emissions Footprint: Trends to 2040 & Recommendations The Sky is Not the Limit: Untapped Opportunities for Green Computing Sustainable Computing – Without the Hot Air Green Computing Approaches - A Survey Green Data Centers Metrics for Sustainability in Data Centers The Odd One Out: Energy is not like Other Metrics A Systematic Review on Effective Energy Utilization Management Strategies in Cloud Data Centers Enabling Sustainable Clouds: The Case for Virtualizing the Energy System Treehouse: A Case For Carbon-Aware Datacenter Software Energy-Efficient Hybrid Framework for Green Cloud Computing Green Networking Greener, Energy-Effi...

Minimizing the Energy Footprint of Global Data Movement with GreenDataFlow

April 11, 2020

It is estimated that the number of devices connected to the Internet will be four times as high as the world population in 2022, and the global IP traffic will reach 4.8 zettabytes per year. The increased number of users and data rates not only require increased network bandwidth and achievable data transfer throughput but also result in an increased energy footprint. The annual electricity consumed by the global data movement is estimated to be more than 200 terawatt-hours at the current rate, costing more than 40 billion US dollars per year. According to the same statistics, the share of the US in this global data movement and in its energy footprint is approximately 20%. This fact has resulted in a considerable amount of work focusing on power management and energy efficiency in hardware and software systems and more recently on power-aware networking. The majority of the existing work on power-aware networking focuses on reducing the power consumption on networking devices (i...