Apple researchers have made a breakthrough in the field of artificial intelligence (AI) by creating a system that can understand and reason about on-screen context. The system, called ReALM (Reference Resolution As Language Modeling), uses large language models to convert the complex task of reference resolution into a pure language-modeling problem. This allows the AI to understand ambiguous references to on-screen entities, as well as conversational and background context, leading to more natural interactions with voice assistants.
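To make the idea concrete, here is a minimal sketch of what "reference resolution as language modeling" looks like: candidate entities are serialized as numbered text options, and a model picks the one the user means. The keyword-overlap scorer below is a toy stand-in for the fine-tuned language model described in the paper, and all names are illustrative, not Apple's actual implementation.

```python
# Hedged sketch: reference resolution framed as a text-selection problem.
# Candidate entities (conversational or on-screen) become numbered options;
# a model then emits the identifier of the referenced entity.

def serialize_candidates(candidates):
    """Render candidate entities as the numbered options a prompt would carry."""
    return "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, 1))

def resolve_reference(query, candidates):
    """Toy resolver: pick the candidate sharing the most words with the query.
    In ReALM this choice is made by a fine-tuned LLM, not word overlap."""
    query_words = set(query.lower().split())
    scores = [len(query_words & set(c.lower().split())) for c in candidates]
    return scores.index(max(scores)) + 1  # 1-based option id

candidates = ["pharmacy on Main Street", "pharmacy near home", "dentist office"]
print(serialize_candidates(candidates))
print(resolve_reference("call the pharmacy near home", candidates))  # option 2
```

The key design point is that once everything is text, the same language model that handles conversation can also handle disambiguation, with no separate vision or pointer-network module.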
The ability to understand context, including references, is crucial for a conversational assistant, according to the team of Apple researchers. Enabling users to issue queries about what they see on their screen is an important step in ensuring a true hands-free experience with voice assistants. ReALM achieves substantial performance gains compared to existing methods, outperforming GPT-4 on the task.
One key innovation of ReALM is its ability to reconstruct the screen using parsed on-screen entities and their locations to generate a textual representation that captures the visual layout. By fine-tuning language models specifically for reference resolution, the researchers demonstrated that ReALM can handle screen-based references effectively.
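The screen-reconstruction step can be sketched as follows: parsed on-screen entities carry bounding-box coordinates, get grouped into rows top-to-bottom and ordered left-to-right within each row, and are rendered as plain text that preserves the visual layout. This is a hypothetical illustration of the technique; the class names, row-bucketing heuristic, and prompt wording are assumptions, not the paper's exact code.

```python
# Hedged sketch of ReALM-style screen serialization: entities with bounding
# boxes are rendered as a text layout a language model can read.
from dataclasses import dataclass

@dataclass
class Entity:
    id: int
    text: str
    x: float  # left edge of bounding box, normalized 0..1
    y: float  # top edge of bounding box, normalized 0..1

def render_screen(entities, row_height=0.1):
    """Bucket entities into rows by vertical position (top-to-bottom),
    then order each row left-to-right, producing a textual layout."""
    rows = {}
    for e in entities:
        rows.setdefault(round(e.y / row_height), []).append(e)
    lines = []
    for key in sorted(rows):
        row = sorted(rows[key], key=lambda e: e.x)
        lines.append("  ".join(f"[{e.id}] {e.text}" for e in row))
    return "\n".join(lines)

def build_prompt(screen_text, query):
    """Combine the rendered screen and the user query into one LM prompt."""
    return f"Screen:\n{screen_text}\nUser: {query}\nReferenced entity id:"

entities = [
    Entity(1, "Pizza Palace", 0.1, 0.2),
    Entity(2, "555-1234", 0.6, 0.2),
    Entity(3, "Call", 0.1, 0.8),
]
print(build_prompt(render_screen(entities), "dial the number at the top"))
```

Because the layout survives serialization, phrases like "the number at the top" remain resolvable even though the model never sees pixels.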
While the research shows promising results, there are limitations to relying solely on automated parsing of screens. Handling more complex visual references, such as distinguishing between multiple images, would likely require incorporating computer vision and multi-modal techniques.
Apple’s advancements in AI research are significant, even though the company has lagged its tech rivals in AI. Its research labs have made breakthroughs in areas like multimodal models, AI-powered animation tools, and building specialized AI on a budget. These advancements signal Apple’s commitment to making Siri and other products more conversant and context-aware.
However, Apple faces tough competition from companies like Google, Microsoft, Amazon, and OpenAI, which have aggressively productized generative AI across various domains. Apple’s late entry into the AI market puts it at a disadvantage, but its deep pockets, brand loyalty, elite engineering, and tightly integrated product portfolio give it a chance to catch up.
In June, at the Worldwide Developers Conference, Apple is expected to unveil a new large language model framework and an “Apple GPT” chatbot, showcasing its AI-powered features across its ecosystem. CEO Tim Cook has hinted at the company’s ongoing work in AI, indicating that Apple’s AI efforts are extensive.
As the battle for AI supremacy intensifies, Apple aims to ensure it has a hand in shaping the new age of ubiquitous and truly intelligent computing. The progress made in AI research, particularly in understanding screen context, brings Apple closer to achieving this goal.