Will the next generation of Siri run largely on Google Cloud and Nvidia chips?

Ahmed Riaz May 31, 2026

At the Worldwide Developers Conference 2026, which begins with the keynote on June 8th, Apple will probably announce the next generation of Siri. It is expected to reach all users in the fall along with iOS 27. “The Information” now reports that Apple has still not fully resolved how to provide the immense computing power required. As early as the beginning of March 2026, a report emerged that Apple’s own AI infrastructure was inadequate for the new generation of Siri, which is based on Google Gemini.

As much as possible locally – but large models are problematic
Thanks to the “Apple Neural Engine”, which is built into A-series and M-series chips, iPhones, iPads and Macs can carry out some AI tasks locally and with high efficiency. However, large AI models such as Google Gemini or ChatGPT require immense amounts of memory, so running them locally is usually not feasible.

Although there are techniques for reducing the size of models, and Apple itself has acquired companies that specialize in exactly these methods, such approaches usually do not deliver satisfactory quality for a model like Google Gemini. For this reason, many requests to “Siri 2.0” will be processed by powerful servers in data centers rather than locally on devices.

According to the report, Apple is nevertheless said to be trying to run at least some of the AI features locally on the device. It is conceivable that Apple will run smaller, more specialized models derived from Google Gemini on the device, each covering a specific task area.

Google Cloud and Nvidia AI chips are meant to fix it
Apple originally wanted to run Google Gemini on its own server infrastructure, equipped with M2 Ultra chips. But the infrastructure, dubbed “Private Cloud Compute,” is apparently not sufficient: according to “The Information,” Apple is struggling with numerous performance issues there.

It was only in the last few weeks that the decision was made to rely on “Google Cloud” and Nvidia AI accelerators rather than on Apple’s own infrastructure. Apple wants to continue marketing the service under the name “Private Cloud Compute” while internally relying on Nvidia’s “Confidential Compute” to process user requests in a privacy-preserving manner. With this approach, requests are received and processed in encrypted form. According to Nvidia, this comes with only a very moderate loss of speed.
