Apple showcases impressive on-device AI capabilities at WWDC 2024
Apple faced high expectations following Microsoft Build and Google I/O as it took the stage at its Worldwide Developers Conference (WWDC) 2024. The tech giant did not disappoint, demonstrating how generative AI will be integrated into the user experience across its devices. Notably, Apple showcased its ability to handle AI workloads on-device, leveraging its state-of-the-art processors and open research efforts. This allows Apple to provide high-quality, low-latency AI capabilities on its phones and computers, setting it apart from competitors.
The power of a 3-billion parameter model
During the Platforms State of the Union presentation, Apple revealed that it uses a 3-billion parameter model for its on-device AI. Apple does not explicitly name the base model, but it has released the OpenELM family of open language models, which includes a 3-billion parameter version. OpenELM is optimized for resource-constrained devices, with modifications that enhance model quality without increasing the parameter count. It is highly likely that the foundation model used on Apple devices is a specialized version of OpenELM-3B.
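For readers who want to experiment with the open release, the OpenELM checkpoints are published on Hugging Face. Here is a minimal sketch of loading the 3-billion parameter instruct variant, following the public model card (which pairs OpenELM with the Llama-2 tokenizer); this illustrates the research release, not whatever specialized build ships on Apple devices:

```python
# Minimal sketch: running the open OpenELM-3B-Instruct checkpoint.
# Model IDs follow Apple's Hugging Face model card; the prompt is made up.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-3B-Instruct", trust_remote_code=True
)
# OpenELM reuses the Llama-2 tokenizer (gated behind Meta's license)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "Summarize: Apple announced on-device generative AI at WWDC 2024."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```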
Licensed data and reinforcement learning
To train its foundation model, Apple employed a variety of techniques. The model was trained on 1.8 trillion tokens from open datasets, supplemented by licensed data obtained through partnerships: Apple has secured deals with Shutterstock for images and with major news and publishing organizations for text. The model was then fine-tuned for instruction-following using reinforcement learning from human feedback (RLHF), which teaches it to follow user instructions more reliably and improves the quality of its responses.
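Apple has not disclosed its exact fine-tuning recipe, but the standard RLHF setup optimizes a reward model's score while penalizing drift from the pre-trained reference model. Here is a toy sketch of that KL-regularized objective, with illustrative names and a made-up beta:

```python
# Illustrative RLHF objective sketch, NOT Apple's actual training code.
import torch

def rlhf_objective(reward, policy_logprobs, ref_logprobs, beta=0.1):
    """Sequence-level RLHF objective: reward-model score minus a KL
    penalty that keeps the tuned policy close to the reference model."""
    kl_estimate = (policy_logprobs - ref_logprobs).sum()  # log-ratio over the sequence
    return reward - beta * kl_estimate

# toy example: token log-probs for one 5-token response under both models
policy_lp = torch.tensor([-1.2, -0.8, -2.0, -1.5, -0.9])
ref_lp = torch.tensor([-1.0, -1.0, -1.9, -1.7, -1.1])
print(rlhf_objective(torch.tensor(0.7), policy_lp, ref_lp))
```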
Optimization techniques for resource efficiency
Apple has implemented several optimization techniques to enhance the capabilities of its models while keeping them resource-efficient. The foundation model uses "grouped query attention" (GQA), a technique developed by Google Research that speeds up inference without inflating memory and compute requirements. Apple also employs "palettization," which compresses the model's weights by using look-up tables and indices to group similar weights together. Finally, the company mentions "quantization," another compression technique that reduces the number of bits used per parameter.
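To make the look-up table idea concrete, here is a toy palettization sketch in NumPy: weights are clustered into a small palette (16 entries, roughly 4 bits per weight) and stored as indices into it. This is an illustrative reimplementation under those assumptions, not Apple's production pipeline (Core ML's coremltools provides the real tooling):

```python
# Toy weight palettization: replace each float weight with an index into
# a small shared palette (look-up table) of representative values.
import numpy as np

def palettize(weights, n_colors=16, iters=10):
    flat = weights.ravel()
    # initialize the palette from weight quantiles, then refine with 1-D
    # k-means: assign each weight to its nearest palette entry, then move
    # each entry to the mean of its assigned weights
    palette = np.quantile(flat, np.linspace(0, 1, n_colors))
    for _ in range(iters):
        idx = np.abs(flat[:, None] - palette[None, :]).argmin(axis=1)
        for c in range(n_colors):
            if np.any(idx == c):
                palette[c] = flat[idx == c].mean()
    return palette, idx.reshape(weights.shape).astype(np.uint8)

W = np.random.randn(256, 256).astype(np.float32)
palette, idx = palettize(W)        # 16 floats + one small index per weight
W_reconstructed = palette[idx]     # what the runtime sees at inference
print("mean abs error:", np.abs(W - W_reconstructed).mean())
```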
Customization through low-rank adaptation
Recognizing the limitations of a small language model, Apple's engineers have created fine-tuned versions of the foundation model for different tasks. To avoid storing multiple full copies of the model on-device, they use low-rank adaptation (LoRA) adapters. LoRA trains a small set of low-rank weight updates for a specific task and stores them in a compact adapter. At inference time, an adapter's weights are combined with the base model, allowing a single device to keep many LoRA adapters for different tasks. This approach lets Apple offer capabilities such as proofreading, summarization, and email replies without shipping a separate model for each.
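Here is a minimal LoRA sketch, assuming the usual formulation in which a frozen base weight W is augmented with a trainable low-rank product B·A; the rank, shapes, and scaling are illustrative rather than Apple's actual configuration:

```python
# Minimal LoRA layer: frozen base weights plus a tiny trainable update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # base weights stay frozen
        # low-rank update: delta_W = B @ A has far fewer parameters than W
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(4, 512))   # only A and B (the "adapter") train per task
```

Because only A and B differ per task, the device can keep one copy of the base model and swap in a tiny adapter for each feature.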
Preferred performance over other models
Apple’s on-device AI model has undergone human evaluation, in which raters preferred its outputs over those of models of equal or even larger size, including Gemma-2B, Mistral-7B, Phi-3-mini, and Gemma-7B. These results underscore the effectiveness and efficiency of Apple’s model and its focus on user experience.
Looking ahead
Apple’s on-device AI capabilities showcased at WWDC 2024 demonstrate the company’s ability to strike a balance between accuracy and user experience. By combining small models, optimization techniques, data partnerships, and specialized hardware, Apple has pushed the boundaries of what is possible in AI. As the technology is rolled out to users in the fall, it will be interesting to see how these impressive demos translate into real-world applications.