Cloudflare, Inc. (NYSE: NET), a prominent connectivity cloud company, has announced plans to deploy NVIDIA GPUs at the edge of its global network, along with NVIDIA Ethernet switches, bringing AI inference compute power closer to users worldwide. Cloudflare will also incorporate NVIDIA's full stack of inference software, including NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server, to boost the performance of AI applications such as large language models.

Effective immediately, Cloudflare is giving all of its customers access to local compute power, so they can deliver AI applications and services efficiently on compliant infrastructure. For the first time, organizations can run AI workloads at scale, and pay only for the compute they actually use, entirely through Cloudflare.
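
As an illustration of what this pay-per-use model could look like for a developer, the sketch below assumes Cloudflare's Workers AI serverless inference platform (launched alongside this GPU rollout, though not named in this announcement), its `env.AI.run()` binding, and an illustrative model identifier from the Workers AI catalog:

```ts
// Minimal Cloudflare Worker sketch: run LLM inference on Cloudflare's
// edge GPUs through the Workers AI binding. Assumes a wrangler.toml with:
//
//   [ai]
//   binding = "AI"
//
// The model identifier below is illustrative; any model from the
// Workers AI catalog could be substituted.
export interface Env {
  AI: { run(model: string, input: Record<string, unknown>): Promise<unknown> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const body = (await request.json().catch(() => ({}))) as { prompt?: string };
    const prompt = body.prompt ?? "What is edge inference?";

    // The request is served by the nearest GPU-equipped Cloudflare
    // location; billing is per inference request, not per reserved GPU.
    const result = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", { prompt });

    return Response.json(result);
  },
};
```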

AI inference is pivotal in shaping the end-user experience of AI applications and is expected to become the dominant type of AI workload, and demand for GPUs within organizations is substantial. With its network of data centers spanning more than 300 cities worldwide, Cloudflare can deliver fast, low-latency user experiences while complying with global data regulations.

Cloudflare is making it simple for organizations everywhere to deploy AI models, powered by NVIDIA GPUs, networking infrastructure, and inference software, without having to manage, scale, optimize, or secure the underlying infrastructure themselves.

Matthew Prince, CEO and co-founder of Cloudflare, emphasizes that AI inference on a network is a sweet spot for many businesses, offering cost-effective compute while keeping data close to users. By bringing NVIDIA's cutting-edge GPU technology to its global network, Cloudflare is making AI inference accessible and affordable on a global scale.

Ian Buck, Vice President of Hyperscale and HPC at NVIDIA, highlights the critical role of NVIDIA's inference platform in powering generative AI applications. With NVIDIA GPUs and AI software available on Cloudflare's network, businesses will be able to build responsive new customer experiences and drive innovation across industries.

Cloudflare’s deployment of NVIDIA GPUs to its global edge network enables:

Low-latency generative AI experiences for end users, with NVIDIA GPUs available for inference tasks in over 100 cities by the end of 2023 and nearly everywhere Cloudflare’s network reaches by the end of 2024.

Access to compute power close to where customer data resides, helping organizations anticipate compliance and regulatory requirements.

Affordable, pay-as-you-go compute power at scale, ensuring that businesses can access the latest AI innovations without the need for substantial upfront investments to reserve GPUs that might go unused.
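
To make the pay-as-you-go point concrete: under the same Workers AI assumption as the earlier sketch, a single inference can also be requested over Cloudflare's REST API, with each call billed individually. The account ID, API token, and model name below are placeholders:

```ts
// Invoke a model over HTTPS; each call is billed individually, so no
// GPU capacity has to be reserved up front. ACCOUNT_ID and API_TOKEN
// are placeholders for real credentials.
const ACCOUNT_ID = "your-account-id";
const API_TOKEN = "your-api-token";
const MODEL = "@cf/meta/llama-2-7b-chat-int8"; // illustrative model name

async function runInference(prompt: string): Promise<unknown> {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ prompt }),
    },
  );
  if (!res.ok) throw new Error(`Inference request failed: ${res.status}`);
  return res.json();
}

runInference("Summarize the benefits of edge inference.")
  .then(console.log)
  .catch(console.error);
```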