Hidden Costs of AI Q&A

At our recent SNIA Networking Storage Forum webinar, “Addressing the Hidden Costs of AI,” our expert team explored the impacts of AI, including sustainability and areas where there are potentially hidden technical and infrastructure costs. If you missed the live event, you can watch it on-demand in the SNIA Educational Library. Questions from the audience ranged from training Large Language Models to fundamental infrastructure changes from AI and more. Here are answers to the audience’s questions from our presenters.

Q: Do you have an idea of where the best tradeoff is for high IO speed cost and GPU working cost? Is it always best to spend maximum and get highest IO speed possible?

A: It depends on what you are trying to do. If you are training a Large Language Model (LLM), you’ll have a large collection of GPUs communicating with one another regularly (e.g., All-reduce) and doing so at throughput rates of up to 900GB/s per GPU! For this kind of use case, it makes sense to use the fastest network option available. Any money saved by using a cheaper, slightly less performant transport will be more than offset by the cost of GPUs that sit idle while waiting for data.

If you are more interested in fine-tuning an existing model or using Retrieval Augmented Generation (RAG), you won’t need quite as much network bandwidth and can choose a more economical connectivity option.
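The tradeoff described in this answer can be put into rough numbers. The sketch below is only an illustration: the function names and every figure in it (cluster size, hourly GPU cost, extra idle fraction, network savings) are hypothetical assumptions, not values from the webinar.

```python
def idle_gpu_cost(num_gpus, gpu_cost_per_hour, idle_fraction, hours):
    """Cost of GPU time spent waiting for data instead of computing."""
    return num_gpus * gpu_cost_per_hour * idle_fraction * hours


def cheaper_network_pays_off(network_savings, num_gpus, gpu_cost_per_hour,
                             extra_idle_fraction, hours):
    """True only if the money saved on the network exceeds the cost of the
    additional GPU idle time that the slower network causes."""
    extra_cost = idle_gpu_cost(num_gpus, gpu_cost_per_hour,
                               extra_idle_fraction, hours)
    return network_savings > extra_cost


if __name__ == "__main__":
    # Hypothetical training cluster: 1,024 GPUs at $2.00 per GPU-hour,
    # running for 720 hours, with GPUs idle an extra 5% of the time
    # because of the slower transport.
    extra = idle_gpu_cost(1024, 2.0, 0.05, 720)
    print(f"Extra idle-GPU cost: ${extra:,.0f}")
    print("Cheaper network worth it for $50,000 in savings?",
          cheaper_network_pays_off(50_000, 1024, 2.0, 0.05, 720))
```

With these assumed numbers, even a small increase in idle time costs more than the network savings, which is the situation the answer describes for large training jobs. For fine-tuning or RAG, the cluster is typically smaller and the idle penalty shrinks accordingly, so the comparison can tip the other way.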

It’s worth noting Read More

Throughput, IOPS, and Latency Q&A

Throughput, IOPS, and latency are three terms often referred to as storage performance metrics. But the exact definitions of these terms and how they differ can be confusing. That’s why the SNIA Networking Storage Forum (NSF) brought back our popular webinar series, “Everything You Wanted to Know About Storage, But Were Too Proud to Ask,” with a live webinar, “Everything You Wanted to Know About Throughput, IOPS, and Latency, But Were Too Proud to Ask.”

The live session was a hit, with over 850 views in the first 48 hours. If you missed the live event, you can watch it on-demand. Our audience asked several interesting questions; here are our answers to them.

Q: Discussing congestion and mechanisms at play in RoCEv2 (DCQCN and delay-change control) would be more interesting than legacy BB_credit handling in FC SAN… Read More

Here’s Everything You Wanted to Know About Throughput, IOPS, and Latency

Any discussion about storage systems is incomplete without the mention of Throughput, IOPS, and Latency. But what exactly do these terms mean, and why are they important? To answer these questions, the SNIA Networking Storage Forum (NSF) is bringing back our popular webinar series, “Everything You Wanted to Know About Storage, But Were Too Proud to Ask.”

Collectively, these three terms are often referred to as storage performance metrics. Performance can be defined as the effectiveness of a storage system to address I/O needs of an application or workload. Different application workloads have different I/O patterns, and with that arises different bottlenecks, so there is no “one-size fits all” in storage systems. These storage performance metrics help with storage solution design and selection based on application/workload demands.
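As a rough illustration of how the three metrics relate, the sketch below uses two standard relationships: throughput is IOPS multiplied by I/O size, and Little's Law ties IOPS and average latency to the number of I/Os in flight. The function names and the example workload are assumptions chosen for illustration, not figures from the webinar.

```python
def throughput_mb_per_s(iops, io_size_kib):
    """Throughput in MB/s (decimal megabytes) for a given IOPS and I/O size."""
    return iops * io_size_kib * 1024 / 1_000_000


def outstanding_ios(iops, avg_latency_s):
    """Little's Law: average I/Os in flight = arrival rate x time in system."""
    return iops * avg_latency_s


def max_iops(queue_depth, avg_latency_s):
    """Little's Law rearranged: IOPS achievable at a fixed queue depth."""
    return queue_depth / avg_latency_s


if __name__ == "__main__":
    # Hypothetical workload: 100,000 IOPS of 4 KiB I/Os at 0.5 ms latency.
    print(f"Throughput:      {throughput_mb_per_s(100_000, 4):.1f} MB/s")
    print(f"I/Os in flight:  {outstanding_ios(100_000, 0.0005):.0f}")
    # The same IOPS with 1 MiB I/Os is a very different throughput demand.
    print(f"Large-block:     {throughput_mb_per_s(100_000, 1024):.1f} MB/s")
```

The same IOPS figure can mean very different throughput depending on I/O size, which is one reason a single metric rarely describes a workload on its own.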

Join us on February 7, 2024, for “Everything You Wanted to Know About Throughput, IOPS, and Latency, But Were Too Proud to Ask.” In this webinar, we’ll cover: Read More

Accelerating Generative AI

Workloads using generative artificial intelligence trained on large language models are frequently throttled by insufficient resources (e.g., memory, storage, compute or network dataflow bottlenecks). If not identified and addressed, these dataflow bottlenecks can constrain Gen AI application performance well below optimal levels.

Given the compelling uses across natural language processing (NLP), video analytics, document resource development, image processing, image generation, and text generation, being able to run these workloads efficiently has become critical to many IT and industry segments. The resources that contribute to generative AI performance and efficiency include CPUs, DPUs, GPUs, FPGAs, plus memory and storage controllers. Read More

Addressing the Hidden Costs of AI

The latest buzz around generative AI ignores the massive costs to run and power the technology. Understanding what the sustainability and cost impacts of AI are and how to effectively address them will be the topic of our next SNIA Networking Storage Forum (NSF) webinar, “Addressing the Hidden Costs of AI.” On February 27, 2024, our SNIA experts will offer insights on the potentially hidden technical and infrastructure costs associated with generative AI. You’ll also learn best practices and potential solutions to be considered as they discuss: Read More

NVMe®/TCP Q&A

The SNIA Networking Storage Forum (NSF) had an outstanding response to our live webinar, “NVMe/TCP: Performance, Deployment, and Automation.” If you missed the session, you can watch it on-demand and download a copy of the presentation slides at the SNIA Educational Library. Our live audience gave the presentation a 4.9 rating on a scale of 1-5, and they asked a lot of detailed questions, which our presenter, Erik Smith, Vice Chair of SNIA NSF, has answered here.

Q: Does the Centralized Discovery Controller (CDC) layer also provide drive access control or is it simply for discovery of drives visible on the network?

A: As defined in TP8010, the CDC only provides transport layer discovery. In other words, the CDC will allow a host to discover transport layer information (IP, Port, NQN) about the subsystem ports (on the array) that each host has been allowed to communicate with. Provisioning storage volumes to a particular host is additional functionality that COULD be added to an implementation of the CDC. (For example, Dell has a CDC implementation that we refer to as SmartFabric Storage Software (SFSS).)
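The discovery behavior described in this answer can be pictured with a toy model. This is not the TP8010 protocol or any vendor's API: the class names, the NQNs, and the addresses are all hypothetical, and the model only shows the idea that a host learns transport layer details (IP, port, NQN) for the subsystem ports it has been allowed to reach, and nothing about volumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsystemPort:
    """Transport layer information a host needs to connect (hypothetical)."""
    ip: str
    port: int
    nqn: str


class ToyDiscoveryController:
    """Toy stand-in for a centralized discovery service (not a real API)."""

    def __init__(self):
        # host NQN -> set of subsystem ports that host may discover
        self._allowed = {}

    def allow(self, host_nqn, subsystem_port):
        self._allowed.setdefault(host_nqn, set()).add(subsystem_port)

    def discover(self, host_nqn):
        """Return only the subsystem ports this host is allowed to see.
        Volume provisioning is deliberately out of scope here."""
        ports = self._allowed.get(host_nqn, set())
        return sorted(ports, key=lambda p: (p.ip, p.port, p.nqn))


if __name__ == "__main__":
    cdc = ToyDiscoveryController()
    array_port = SubsystemPort("192.0.2.10", 4420,
                               "nqn.2014-08.org.example:array-1")
    cdc.allow("nqn.2014-08.org.example:host-a", array_port)

    print(cdc.discover("nqn.2014-08.org.example:host-a"))
    print(cdc.discover("nqn.2014-08.org.example:host-b"))  # nothing allowed
```

A host that has not been granted access gets an empty result, which mirrors the point that discovery is scoped per host while access to specific volumes is a separate function.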

Q: Can you provide some examples of companies that provide CDC and drive access control functionalities? Read More

Considerations and Options for NVMe/TCP Deployment

NVMe®/TCP has gained a lot of attention over the last several years due to its great performance characteristics and relatively low cost. Since its ratification in 2018, the NVMe/TCP protocol has been enhanced to add features such as Discovery Automation, Authentication and Secure Channels that make it more suitable for use in enterprise environments. Now as organizations evaluate their options and consider adopting NVMe/TCP for use in their environment, many find they need a bit more information before deciding how to move forward.

That’s why the SNIA Networking Storage Forum (NSF) is hosting a live webinar on July 19, 2023, “NVMe/TCP: Performance, Deployment and Automation,” where we will provide an overview of deployment considerations and options, and answer questions such as: Read More

Web 3.0 – The Future of Decentralized Storage

Decentralized storage is bridging the gap between Web 2.0 and Web 3.0, and its impact on enterprise storage is significant. The topic of decentralized storage and Web 3.0 will be the focus of an expert panel discussion the SNIA Networking Storage Forum is hosting on June 1, 2023, “Why Web 3.0 is Important to Enterprise Storage.”

In this webinar, we will provide an overview of enterprise decentralized storage and explain why it is more relevant now than ever before. We will delve into the benefits and demands of decentralized storage and discuss the evolution of on-premises, to cloud, to decentralized storage (cloud 2.0). We will also explore various use cases of decentralized storage, including its role in data privacy and security and the potential for decentralized applications (dApps) and blockchain technology. Read More

Live Panel: Sustainability in the Data Center

As our data-driven global economy continues to expand with new workloads such as proven digital assets and currency, artificial intelligence and advanced healthcare, our data centers continue to evolve with denser computational systems and increased data stores. This creates challenges for sustainable growth and managing costs.

On April 25, 2023, the SNIA Networking Storage Forum will explore this topic with a live webinar, “Sustainability in the Data Center Ecosystem.” We’ve convened a panel of experts who will cover a wide range of topics, including delivering more power efficiency per capacity, revolutionizing cooling to reduce heat, increasing system processing to enhance performance, infrastructure consolidation to reduce the physical and carbon footprint, and applying current and new metrics for carbon footprint and resource efficiency.

Beginning with a definition of sustainability, they will discuss: Read More

A Q&A on the Open Programmable Infrastructure (OPI) Project

Last month, the SNIA Networking Storage Forum hosted several experts leading the Open Programmable Infrastructure (OPI) project with a live webcast, “An Introduction to the OPI (Open Programmable Infrastructure) Project.” The project has been created to address a new class of cloud and datacenter infrastructure component. This new infrastructure element, often referred to as a Data Processing Unit (DPU), Infrastructure Processing Unit (IPU), or xPU as a general term, takes the form of a server-hosted PCIe add-in card or on-board chip(s), containing one or more ASICs or FPGAs, usually anchored around a single powerful SoC device.

Our OPI experts provided an introduction to the OPI Project and then explained lifecycle provisioning, API, use cases, proof of concept and developer platform. If you missed the live presentation, you can watch it on demand and download a PDF of the slides at the SNIA Educational Library. The attendees at the live session asked several interesting questions. Here are answers to them from our presenters.

Q. Are there any plans for OPI to use GraphQL for API definitions since GraphQL has a good development environment, better security, and a well-defined, typed, schema approach?

Read More