Coding Intelligence, Machine Learning & AI

Unofficial AI summary of the WWDC26 group lab. May contain mistakes.

Related Labs: Coding Intelligence for Beginners, Apple Intelligence, Machine Learning & AI

What are the roles of Core AI, Core ML, and MLX, and how should a beginner understand them?

  • Foundation Models Framework: The top-level framework for LLM use cases. It integrates with various inference backends and will be open-sourced very soon, tying together the Language Model Protocol ecosystem.
  • Core AI: For custom models (trained or downloaded). Offers SLAs and guarantees, suitable for applications.
  • MLX: For powerful, custom use cases, including distributed training and local AI.
  • Core ML: Primarily for traditional ML models like decision trees; new neural network work should use Core AI.
  • Recommendation: Start with Foundation Models for LLMs. For custom non-LLM tasks, use Core AI. Use MLX for advanced or custom training needs.

What is the on-device foundation model's context window in iOS 27, and is the input and output counted against one shared token budget?

  • The on-device context window is 4096 tokens.
  • This is a shared budget for both input and output. For example, 4000 input tokens leave only 96 for the response.
  • Private Cloud Compute (PCC) offers a larger 32K context window with the same shared budget principle.
  • For even larger context needs (up to 1 million tokens), consider third-party models via MLX or Core AI.
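
The shared-budget arithmetic above can be sketched in a few lines. This is an illustrative helper, not a framework API: the function name and constants are hypothetical, and the PCC constant assumes "32K" means 32,768 tokens.

```python
def remaining_output_tokens(context_window: int, input_tokens: int) -> int:
    """Tokens left for the response when input and output share one budget."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    # The response can never get a negative allowance, so clamp at zero.
    return max(context_window - input_tokens, 0)


ON_DEVICE_WINDOW = 4096   # on-device figure quoted in the lab summary
PCC_WINDOW = 32 * 1024    # assumption: "32K" read as 32,768 tokens

print(remaining_output_tokens(ON_DEVICE_WINDOW, 4000))  # 96
```

Checking this number before sending a request is one way to decide whether to trim the prompt or move the request to a model with a larger window.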

Can foundation models run inside background app refresh or background processing tasks, especially when the phone is locked or the app has been backgrounded?

  • Yes, foundation models can run in background tasks.
  • The system may rate-limit requests if it's busy, returning a rate-limited error. Developers should handle this by retrying later.
  • On iOS, background LLM use may be rate-limited or deprioritized when the system is busy.
  • On macOS, requests to the local foundation model are not rate-limited while the app is in the foreground.
  • Private Cloud Compute may experience rate limiting due to system load (system busy) or excessive requests (too many requests / quota exhaustion).
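
One way to handle the "retry later" advice is exponential backoff. The sketch below is language-neutral logic rather than the framework's API: `RateLimitedError` and `run_with_retry` are hypothetical names, and the delay values are arbitrary.

```python
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for the framework's rate-limited error."""


def run_with_retry(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request(); on RateLimitedError, wait and retry with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller reschedule the work
            sleep(base_delay * (2 ** attempt))


# Usage with a fake request that is rate-limited twice before succeeding.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitedError("system busy")
    return "summary text"


delays = []  # record the delays instead of actually sleeping
result = run_with_retry(flaky_request, sleep=delays.append)
print(result, delays)  # summary text [1.0, 2.0]
```

In a background task with a limited execution window, a small `max_attempts` plus rescheduling the task is likely a better fit than long in-process waits.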

For macOS 27, Apple Intelligence, what does "waitlist" mean, and are different models offered on the waitlist?

  • The "waitlist" applies only to Siri.
  • It does not apply to on-device Apple Intelligence features or Private Cloud Compute (PCC) language models.
  • The beta includes the AFM core advanced 20-billion-parameter model, which is used for voice features.

Can the Foundation Models Framework mix on-device models, private cloud compute, and third-party LLM providers within a single agentic flow, and what are the data privacy and attribution boundaries?

  • Yes, developers can mix these using the Dynamic Profiles API.
  • Baton Pass: A pattern where context is fully shared between models in a flow. Suitable for on-device or PCC models where privacy is not a concern.
  • Phone a Friend: A pattern where only the current query is sent to a third-party model, preserving privacy of previous interactions. This is akin to tool calling with an ephemeral session.
  • Privacy and attribution boundaries depend on the chosen pattern and the third-party provider's policies.
  • Profile modifiers can manage context window sizes between different models.
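
The difference between the two patterns comes down to what context the next model receives. The sketch below illustrates only that difference. `Session`, `baton_pass`, and `phone_a_friend` are hypothetical names for illustration and are not the Dynamic Profiles API.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical session: the messages a given model would receive."""
    model: str
    messages: list = field(default_factory=list)


def baton_pass(history, query, next_model):
    """Hand the full conversation plus the new query to the next model."""
    return Session(next_model, list(history) + [query])


def phone_a_friend(history, query, third_party_model):
    """Open an ephemeral session that carries only the current query.

    `history` is accepted but deliberately not forwarded.
    """
    return Session(third_party_model, [query])


history = ["user: my doctor's appointment is on Friday", "assistant: noted"]
query = "user: what is the capital of Australia?"

shared = baton_pass(history, query, "pcc")
isolated = phone_a_friend(history, query, "third-party")
print(len(shared.messages), len(isolated.messages))  # 3 1
```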

Who handles on-device speech personalization for user-specific names?

  • Consult the Speech framework documentation and the developer forums for how to configure speech personalization.
  • Speech recognition systems often include a separate personalization component that developers may be able to fine-tune.
  • Building a custom model to handle specific names and pronunciations is possible for supported languages.

How can coding agents learn a project's code style and architecture?

  • Leverage Source Code: Agents learn by observing existing code in the project.
  • Provide Examples: Use agent-specific files (e.g., AGENTS.md) to provide examples of code style. Keep these files concise to fit within context windows.
  • Reference Other Files: Link to style guides or documentation within the AGENTS.md file.
  • Document Learnings: Instruct agents to document their findings and assumptions about the codebase. This creates a knowledge base for future reference.
  • Search and Learning: Agents learn by searching for information. Providing access to relevant markdown files or documentation enhances their learning.
  • Xcode Integration: Xcode's documentation search can help agents learn new APIs, even if the model is older.
  • Agent Client Protocol (ACP): Using ACP with local LLM providers (like LM Studio) in Xcode 27 offers more capabilities than simple chat completions, including state management and file I/O.
  • Model Choice: Smaller local models may have limitations with very large codebases or complex reasoning.
  • Continuous Teaching: Developers need to continuously teach agents their style and project-specific preferences.
  • Resetting: When new models are released, consider starting with a clean slate to see what the new model can do without prior context.

What practical steps can teams take to integrate automated UI testing into their workflows on Apple platforms?

  • Unit Tests: Focus on testing small, independent kernels of functionality. Agents can help enumerate permutations and combinations for comprehensive unit tests.
  • Integration Tests: Introduce dependencies and run tests that are slightly longer and more expensive. Agents can help structure code for better testability.
  • UI Tests: Use these as a final check for the overall UI integration. Agents can now interact with the simulator to perform actions (tap, swipe, type) and analyze accessibility trees and screenshots to identify bugs.
  • Agent-driven UI Testing: In Xcode 27, agents can use the simulator to test and learn patterns, then write UI tests. They can also run for extended periods to find bugs.
  • Screenshots and Finalized UI: Providing screenshots of finalized UI can help agents understand and potentially recreate it.

Have there been any updates to Natural Language Processing and Apple Vision Kit, and what is the preferred method for image extraction now that Foundation Models support image attachments?

  • Vision Framework: Has significant updates, including segmentation models. Refer to the "What's New in Image Understanding" session.
  • Foundation Models vs. Vision Framework:
    • Vision Framework: Preferred for well-understood, structured tasks like barcode detection or OCR. It's optimized, efficient, and highly testable.
    • Foundation Models: Preferred for tasks requiring semantic understanding, natural language nuance, or when dealing with novel or complex image analysis that doesn't fit a predefined structure.
  • New Tools: Foundation Models now include tools like barcode readers and OCR, blurring the lines.
  • Use Case Driven: If a structured API exists (like Translate API for translation), use it. If the task is dynamic, requires stylistic changes, or involves natural language prompts, Foundation Models are more suitable.

On-device LLMs have limited token capacity. What are the best practices for managing prompt size, tool definitions, and context to avoid exceeding limits while maintaining high-quality responses?

  • Track Token Usage: Use new symbols in the Foundation Models Framework (iOS 26.4) to programmatically check context size and count tokens. Access response.usage to see input, output, and cached tokens.
  • Drop Tool Calls/Results: After a tool call is processed and a response is generated, consider dropping the tool call and its output from the transcript to save context.
  • Summarize History: Use the Summarize History modifier from the open-source Foundation Models Utilities repo to periodically compress older, less relevant transcript entries.
  • Compress or Drop Older Entries: Remove or compress entries that are no longer relevant to save context.
  • Profile Modifiers: Use these to manage context window sizes, especially when switching between models with different capacities.
  • Open Source Utilities: The Foundation Models Utilities repo provides building blocks applicable to any backend conforming to the Language Model Protocol.
  • Evaluations Framework: Crucial for determining optimal context management strategies (e.g., how often to summarize, what to condense) based on use case and model performance.
  • Configurable Prompts: Prompts used for summarization or other context management should be overwritable.
  • Break Down Tasks: Avoid giving a single session too many independent tasks; break them into multiple sessions to utilize context windows more effectively.
  • Model Architecture: Newer models often have more efficient attention mechanisms (e.g., sliding window, linear attention) that handle context better.
  • Configurable Reasoning: Some models offer configurable reasoning levels, allowing developers to trade reasoning depth against latency and context usage.
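
Two of the strategies above, dropping processed tool entries and summarizing older history, can be combined in one compaction step. This is a minimal sketch under stated assumptions: `Entry`, `compact`, and the word-based `count_tokens` are hypothetical stand-ins, not the framework's transcript types or the Summarize History modifier. The summarizer is passed in as a callable, so the summarization prompt stays overridable. The sketch assumes it runs between turns, after any tool call has already produced its response.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str  # "prompt", "response", "tool_call", "tool_output", or "summary"
    text: str


def count_tokens(text):
    # Crude stand-in: one token per whitespace-separated word.
    return len(text.split())


def drop_tool_entries(transcript):
    """Remove tool calls and their outputs once they have been consumed."""
    return [e for e in transcript if e.kind not in ("tool_call", "tool_output")]


def compact(transcript, budget, summarize, keep_recent=2):
    """Drop tool entries, then fold older entries into one summary if over budget."""
    entries = drop_tool_entries(transcript)
    total = sum(count_tokens(e.text) for e in entries)
    if total <= budget or len(entries) <= keep_recent:
        return entries
    split = len(entries) - keep_recent
    older, recent = entries[:split], entries[split:]
    return [Entry("summary", summarize(older))] + recent


def first_words(entries):
    # Toy summarizer: keep the first three words of each entry.
    return " / ".join(" ".join(e.text.split()[:3]) for e in entries)


transcript = [
    Entry("prompt", "Plan a three day trip to Kyoto in April"),
    Entry("tool_call", "search_hotels city=Kyoto"),
    Entry("tool_output", "Hotel A, Hotel B, Hotel C"),
    Entry("response", "Here is a three day itinerary with hotel options"),
    Entry("prompt", "Add a tea ceremony on day two"),
    Entry("response", "Added a tea ceremony on the morning of day two"),
]

compacted = compact(transcript, budget=20, summarize=first_words)
print([e.kind for e in compacted])  # ['summary', 'prompt', 'response']
```

A single pass does not guarantee the result fits the budget. How often to compact and how much to keep verbatim are the kind of choices an evaluation set can help settle.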

Can Foundation Models guardrails be prevented from refusing emotionally intense but legitimate journal entries, and how can guardrail refusals be detected separately from other errors?

  • Permissive Content Transformations: For system language models, enabling this setting during guardrail initialization prevents refusals on emotionally charged input.
  • Model Refusal vs. Guardrail Error:
    • Model Refusal: The model itself declines to elaborate or expand on content due to its alignment training. This is a model response, not a guardrail error.
    • Guardrail Error: A separate moderation model flags input or output as problematic. This can be caught and handled separately.
  • Third-Party Models: Guardrails and these specific refusal mechanisms do not apply when using custom models.
  • Feedback: If guardrails are not behaving as expected, file feedback. Explore custom model options if Apple's models don't meet specific needs.
  • Improvements: Guardrails have been significantly improved with more training data to reduce false positives.
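
The distinction above suggests three separate code paths: a guardrail error arrives as an exception, a model refusal arrives as an ordinary response, and everything else is a generic failure. The sketch below shows that branching with hypothetical types. `GuardrailError`, `Refusal`, and `classify_outcome` are illustrative names, not the framework's actual error cases.

```python
class GuardrailError(Exception):
    """Hypothetical: raised when the moderation model flags input or output."""


class Refusal:
    """Hypothetical: a normal response in which the model declines to continue."""

    def __init__(self, explanation):
        self.explanation = explanation


def classify_outcome(generate, prompt):
    """Return (label, detail) so callers can react to each case separately."""
    try:
        result = generate(prompt)
    except GuardrailError as err:
        return ("guardrail", str(err))
    except Exception as err:  # any other failure, e.g. rate limiting
        return ("error", str(err))
    if isinstance(result, Refusal):
        return ("refusal", result.explanation)
    return ("ok", result)


def flagged(prompt):
    raise GuardrailError("input flagged")


print(classify_outcome(flagged, "journal entry"))
# ('guardrail', 'input flagged')
```

A journaling app might show a neutral message for the guardrail case, surface the model's own explanation for a refusal, and retry or log the remaining errors.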

What is Apple's guiding philosophy or approach to AI evaluation?

  • Evaluation-Driven Development: Start with evaluation from the beginning, treating it as the "living specification" of the feature.
  • Formative Assessment: Evaluations are for learning and improvement, not just checking what's already learned. They identify weak spots.
  • Comprehensive Test Cases: Include core use cases, headroom for growth, and edge cases in evaluations.
  • Tools and Frameworks: Apple provides tools and a framework to make evaluation easy, from curated datasets to synthetic data generation.
  • Iterative Improvement: Use evaluation results to refine models and configurations, comparing different approaches to "hill climb" the feature's performance.
  • Confidence in Choices: Evaluation helps developers confidently select the best models and configurations for their specific use cases.
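
The hill-climbing loop described above can be reduced to scoring each candidate configuration against the same test cases and keeping the best one. This is a toy sketch, not Apple's evaluation framework: `evaluate` and `pick_best` are hypothetical names, and simple string functions stand in for model configurations.

```python
def evaluate(candidate, cases, score):
    """Mean score of candidate(input) over (input, expected) cases."""
    if not cases:
        raise ValueError("cases must not be empty")
    return sum(score(candidate(x), expected) for x, expected in cases) / len(cases)


def pick_best(candidates, cases, score):
    """Score every candidate on the same cases and return the winner."""
    results = {name: evaluate(fn, cases, score) for name, fn in candidates.items()}
    best = max(results, key=results.get)
    return best, results


def exact_match(actual, expected):
    return 1.0 if actual == expected else 0.0


# The cases act as the living specification: core uses plus an edge case.
cases = [
    ("hello world", "Hello World"),  # core use case
    ("swift", "Swift"),              # core use case
    ("", ""),                        # edge case: empty input
]

candidates = {
    "capitalize": str.capitalize,  # stand-in for configuration A
    "title": str.title,            # stand-in for configuration B
}

best, results = pick_best(candidates, cases, exact_match)
print(best)  # title
```

Adding a failing case first and then changing the configuration until it passes mirrors the evaluation-driven workflow the lab describes.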

Is it possible for models used by different apps on iPhone to be shared across apps to save storage space?

  • No, direct sharing of models across unrelated apps is not possible.
  • Complexity: Sharing models is complicated due to resource contention and the need to control access.
  • App Groups: Model caching in Core AI can be shared if apps belong to the same app group.
  • Use Case Specific Optimizations: Models are often optimized (quantized) differently for specific use cases, making a universal shared model impractical.
  • Sandboxing: iOS app sandboxing prevents apps from accessing each other's downloaded models directly.
  • Developer Ownership: Model weights can be shared between apps owned by the same developer within an app group, but the runtime loading of models in memory is not shared.