As much as possible locally – but large models are problematic
Thanks to the “Apple Neural Engine” built into A-series and M-series chips, iPhones, iPads and Macs can carry out some AI tasks locally, and do so very efficiently. However, large AI models such as Google Gemini or those behind ChatGPT require immense amounts of memory, so running them on the device is usually not possible.
There are techniques for reducing the size of models, and Apple itself has acquired companies that specialize in exactly these methods, but such approaches usually do not deliver satisfactory quality for a model like Google Gemini. For this reason, many requests to “Siri 2.0” are processed by powerful servers in data centers rather than locally on the device.
According to the report, Apple is nevertheless said to be trying to run at least some of the AI features locally on the device. It is conceivable that Apple will run smaller, more specialized models locally, each derived from Google Gemini for a specific task area.
Google Cloud and Nvidia AI chips are supposed to fix it
Apple originally wanted to run Google Gemini on its own server infrastructure, which is equipped with M2 Ultra chips. But this infrastructure, dubbed “Private Cloud Compute”, is apparently not up to the task: according to “The Information”, Apple is struggling with numerous performance issues there.
Only in the last few weeks was the decision made to rely on “Google Cloud” and Nvidia AI accelerators instead of Apple's own infrastructure. Apple wants to continue marketing the service under the name “Private Cloud Compute”, but internally it will rely on Nvidia’s “Confidential Compute” to process user requests in a privacy-preserving manner. With this approach, requests are received and processed in encrypted form. According to Nvidia, this comes with only a very modest loss of speed.

