Machine Learning & AI

Unofficial AI summary of the WWDC26 group lab. May contain mistakes.

Related Labs: Apple Intelligence, Coding Intelligence, Machine Learning & AI

How should developers decide between on-device Foundation models and private cloud compute?

  • Evaluate based on context size: on-device models have 4K context, while private cloud compute offers 32K.
  • Consider offline availability: on-device models are accessible without a network connection.
  • Private cloud compute supports reasoning capabilities, allowing models to "think" before responding.
  • Use the Evaluation framework for comparative testing by switching between on-device and private cloud compute models with minimal code changes.
  • Be prepared to fall back to on-device models if network connectivity is lost when using private cloud compute.

What factors influence latency when choosing between local and private cloud compute models?

  • On-device models eliminate network latency, but good network conditions may make server round-trip time negligible.
  • Tokens per second performance might not significantly differ between local and server models, surprisingly.
  • Utilize Instruments for detailed metrics like round-trip latency and token budgets to measure performance.

How can developers use both on-device and private cloud compute models within the same app or feature?

  • Dynamic Profiles is a new API designed to facilitate the use of multiple models together, enabling context sharing and handoffs.
  • This allows for scenarios where an on-device model might be used for initial routing, with more complex tasks delegated to a server model.

Why might an Xcode-connected local LLM using MLX server exhibit strange behavior like long delays and an IAM_N tag, and how can this be resolved?

  • This issue is common when integrating new language models and often relates to tokenizer configuration.
  • Verify that the tokenizer correctly recognizes the specific token causing the problem.
  • Configuration issues, particularly with chat templates, are a likely cause.
  • Posting on forums or GitHub issues with detailed observations is recommended for debugging.

How can local LLMs running on MLX LM or custom Core AI models be used as coding agents in Xcode to manipulate projects, beyond simple chat interactions?

  • Xcode's integration with language models primarily uses the Language Model Protocol and the Agent Client Protocol (ACP).
  • Both MLX and Core AI support using custom models.
  • Core AI is inference-only — for agent use developers would need to build or contribute an OpenAI-compatible server layer; MLX LM servers work out of the box.
  • Xcode can automatically detect MLX LM servers and allow selection of models, supporting both chat and ACP modes.

How can developers detect when on-device or private cloud compute models have been updated, especially if prompt behavior regresses?

  • On-device models are typically updated as part of OS updates (~twice per year, often in betas), giving developers a testing window before GA.
  • There is no runtime model-version identifier — OS version is the practical proxy for which model developers are running.
  • Avoid per-OS-version prompt if branches; instead rephrase prompts to be robust across versions.
  • The Evaluation framework is crucial for detecting model changes by running comparative evaluations against previous OS versions.
  • Model updates are generally intended to improve performance, not regress functionality.

Can Foundation Models run from widgets, app intents, or background tasks, and what are the considerations for rate limits and thermal rules?

  • On-device models can run in the background and in widgets.
  • Rate limiting may occur on iOS due to system conditions; requests might fail and should be retried.
  • The system manages on-device LLM resources, ensuring only one instance is in memory, but foreground tasks may be prioritized.
  • On iOS, background tasks may be deprioritized if resource-intensive foreground activities are occurring (e.g., games get priority over background LLM use).
  • Don't surface rate-limit errors to users in background contexts — handle and retry silently.
  • Rate limiting does not apply on macOS for the on-device foundation model.

How can developers create or adapt models for their own apps, data, and domains, such as generating stylized user profile images or adaptive app guides?

  • Start by customizing prompts with app-specific user data to generate personalized outputs.
  • Consider fine-tuning open-source models from platforms like Hugging Face if more advanced customization is needed.
  • User adaptation can often be achieved through prompt engineering and extracting relevant information, rather than full fine-tuning.
  • Leverage the Evaluation framework for "hill climbing," iteratively improving model quality through prompt changes, model swaps, or fine-tuning.
  • The Evaluation framework's sample generator can create synthetic text inputs for testing (and pipelines exist for diffusion models).
  • The Evaluation framework supports various model types, including diffusion models, and can use model judge evaluators for style and quality assessment.

What are the best use cases for the Foundation Models Framework, and which should developers avoid?

  • Ideal use cases include content extraction, content generation, and tasks leveraging image input.
  • Test image input during the iOS 27 beta and file feedback if the use cases fail.
  • LoRA-style adapters may be possible for text models but are not supported for diffusion models.
  • Avoid using Foundation Models for real-time processing that requires sub-millisecond latency, such as ranking list views or processing video frames per second.
  • For highly specialized, real-time tasks, consider domain-specific APIs like Vision or Speech Recognition.
  • If a task requires more language support than available in Foundation Models, specialized translation APIs might be a better fit.
  • Foundation Models are powerful for summarization, text extraction, grammar correction, and tone/style transformation, especially with the context window improvements from Private Cloud Compute.

Where should a beginner start to learn about AI/ML, and what resources are recommended?

  • Start with Swift Playgrounds for an introduction to ML using Core ML for image classification.
  • The Foundation Models Framework is easy to pick up for prompt engineering and tool additions.
  • Xcode now offers a one-click playground launch for quick experimentation with language models.
  • For deeper understanding, explore how LLMs work under the hood, including token generation and training.
  • Project-based learning, building an app or model, is highly effective.

Is it possible to import everything from PyTorch into Core AI or the Apple ecosystem, and what are the limitations?

  • Most PyTorch models that can be exported using PyTorch APIs will convert straightforwardly to Core AI.
  • Core AI supports all core A10 operations; custom operations may require custom lowering or wrapping.
  • For performance optimization, custom Metal kernels can be written.
  • MLX APIs are similar to PyTorch, making it easy to experiment with models from Hugging Face.
  • Many PyTorch models on Hugging Face have already been converted for MLX.

What is the best user experience when Foundation Models are unavailable (e.g., older devices, Apple Intelligence off, low battery), and how can developers detect availability and degrade gracefully?

  • Use the pre-warm API to load the on-device system language model into memory before user interaction, reducing initial latency.
  • An availability API allows checking if the model is supported on the device.
  • For unsupported devices, consider alternatives like bringing custom model via Core AI or relying on a server model.
  • If Apple Intelligence is off, guide users to enable it rather than showing an error.
  • If a button or feature relies on an unavailable model, it's better to hide it or provide a different experience to avoid user frustration.

How should developers migrate from Core ML to Core AI, and what is the recommended approach for supporting older devices?

  • Core ML is still supported and useful for models like decision trees or when microsecond performance is critical.
  • Core AI is the recommended path for modern generative AI models.
  • Consider the user base's OS update timeline for dropping support for older OS versions.
  • Implement a wrapper or use conditional logic to use Core AI on newer OS versions and Core ML on older ones.
  • If the app uses generative models, migrating to Core AI is recommended.

What is the maximum on-device local model size that can be used before considering private cloud compute, and what are the considerations for iOS and macOS?

  • On iOS, it's generally recommended to keep models below 2GB to avoid stressing the system and to be a good platform citizen.
  • On macOS, model size is more flexible and depends on available memory; profiling the app is key.
  • Consider available memory within the app and system-wide memory usage.
  • Quantization can reduce model size, but evaluate the trade-off between size and accuracy loss.
  • For LLMs, a 6-billion parameter model might be pushing the 2GB limit on iOS, depending on quantization.

How should developers decide when AI should assist the user versus when it should remain in the background?

  • Focus on the fundamental user experience to deliver; AI should enhance, not dictate, the experience.
  • Use AI tools strategically to fill functional gaps or provide specific benefits, rather than for the sake of using AI.
  • User delight and the overall experience are paramount.
  • Experimentation and evaluation can help determine if an AI feature is meaningful and beneficial to users.
  • AI should help users achieve their goals more quickly and efficiently without being intrusive or obstructive.

For older iPhones without Apple Intelligence and using a self-hosted LLM backend, is wrapping the backend in a custom language model provider the recommended path on iOS 27?

  • Yes, the new Language Model Protocol allows using any model, including custom server models, with a unified Swift API.
  • If the custom server model conforms to the OpenAI chat completions web request protocol, developers can use the open-source utilities package for integration.
  • This protocol enables seamless integration of system models and custom models within the same app, with conditional logic for model selection.