Hey everybody, welcome to the Machine Learning and AI Group Lab. I'm Shashank, I'm part of the evangelism team here at Apple. And today I'm joined by a fantastic panel of experts from across our machine learning framework teams. This has been an amazing WWDC, a lot of exciting machine learning and AI announcements. We had the new Core AI framework, new evaluations framework, a slew of exciting updates to the foundation models framework, and distributed training and inference with MLX and so much more. So to quick things off, we'll start off with a quick round of introductions. And as each of you introduce yourself, please share one thing you are most excited about in your area and one thing you think the developers should be most excited about. How about we start with you, Tao? Hi, my name is Tao. I'm from the Core AI team. I'm super excited to be here today. One thing I'm super excited about this year is the new Core AI models, open sourced on GitHub. You can find out a curated set of ready-to-use model exporting recipes and reusable Python primitive to bring new models to Core AI, as well as Swift runtime utilities for you to integrate those models in your app. Last but not the least, there is a set of agentic skills that you can leverage using your favorite coding agent. Exciting. Awesome. Hi, I'm Marcus. I'm a manager on the evaluations framework team. I'm super excited about being able to release this framework to you all and being able to let you guys get started with actually evaluating your ML-powered feature. But the thing I think that I don't think everybody knows about yet is we've also shipped an update in Xcode, which allows you to get a report back of your evaluation runs. I'm personally really excited about this because it's been really, really helpful in terms of getting people to actually increase the scores of their evaluations using really rich data and really rich feedback. Also, as a little bit of a selfish note, I wrote my first program in Xcode. Now it feels really, really cool to be able to actually contribute something to the thing that I learned to program on. Hi, I'm Michael. I'm also from the CoreAI team. Tao stole one of my exciting things, but I'll repeat it. I think it's really exciting now that we're-- along with our frameworks and open source tools, We also have skills to put your AI agent. I think that's really transformed how a lot of people worked, especially me, not very familiar with PyTorch, but I can quickly go and ask and have a conversation about, hey, help me make a plan of how to export this model or what would be a good strategy for testing it and verifying things are working. But I'll put a separate plug in for-- we have a more advanced feature in Core AI this year about being able to compile your models ahead of time, which really helps with large, complex models, particularly on iOS so we can sort of get the load or first cold start experience to be much quicker. Awesome. I'm Louis. I'm on the Foundation Models Framework team. And yeah, there's so much new stuff this year. But I think the new image input support that we have on the on-device model and also the new one on private cloud compute is definitely one of my favorite features. All right. I'm Ronan from the MLX team. And a couple of things which are exciting with MLX this year, I would say, is the GPU neural accelerator support that we added for the M5. That's really amazing in terms of performance. And then the 5.5 RDMA support, which allows us to run frontier models of our cluster of Macs. It's really, truly amazing if you want to do advanced tasks on a couple of Macs. Really cool. I'm sure we'll get to all of these through your questions. In addition to this panel, thank you for introducing yourself. We have a whole team of experts behind the scenes triaging all your questions, so please keep them coming. We'll try to get to as many as we can today. If you don't have code-specific questions or we can't get your questions for some reason, I encourage you to head over to developer.apple.com/forums and ask your questions there. We'll continue to answer them throughout this week and in the future as well. If you do have a bug report or a feature request, feedback assistant.apple.com is your friend. Please go there and share the extra information that we would need to help you. And we can keep this discussion more applicable to a broader audience. All right, with that, let's dive into your questions. Here's a question from Phanteks. How should developers decide whether a task is better handled by on-device foundation models versus private cloud compute? Looks like we are starting off with you, Louis. LOUIS VALER: All right, yeah. I mean, yeah, when you're deciding between any models, I would say doing that based on evaluations definitely is the way to go about it. But just to spell out the concrete differences right between the on-device built-in system model we have and the new server model this year that you have access to on private cloud compute. One of the major differences is, of course, context size. So on-device, you have 4K. With the server model on private cloud, you have 32K. So depending on what kind of feature you're building, that can influence your decision. The other thing is, of course, that with the on-device model, it's available offline. That's what you need for your use case. Obviously, that's a benefit. And the other thing with the server model is that it supports reasoning. It's a new capability with the server model on private cloud compute where you can set the level of reasoning to let the model really think to itself before it generates a response. So again, depending on what kind of feature you're building, that might help a lot. But in terms of making that decision based on data, honestly, with the evaluation framework-- I think a lot of what I would consider in this situation is actually being able to do comparative evaluation. So what you want to do is take your model feature. What's really lovely about the Foundation Models Framework stuff, right, is that you can sort of switch between on-device and private cloud compute with a simple line of code, right, the competitive statement. So therefore, what you can do is basically, like, take your model feature, maybe it's a service class of some sort, instantiate that one with private cloud compute, one with the on-device model, and then kind of run them against the same set of evaluators. That'll allow you to actually, like, run them as a suite of tests, And then when that runs, you can go to the report and actually check. We've actually also built in some really nice comparison tools into Xcode that actually allow you to actually see side by side how those things stack up against one another based on your evaluations. So this is actually a really great way to kind of figure that out. And then as you kind of chop and change and make small tweaks, you'll be able to actually understand, like, in what cases do these things work better for one another? And what you might also consider is that maybe if you find an instance where the private cloud compute is the better option, it's still knowing that if you don't have a network connection, you have to fall back to on-device, just also how well your feature will perform in those scenarios. FRANCESC CAMPOY: Pretty cool. Thanks, Marcus. I'm curious, actually, in terms of latency, do you have a take? If my app was low latency, should I go with the local model or the PCC model? MARCUS WALTERS: Yeah, that's interesting. Yeah, it comes down to measuring it, presumably. But it's true that with the on-device model, you don't have the network latency. But if your network conditions are good, then typically the round trip to the server is not the expensive part necessarily. So in terms of tokens per second that you're getting out of it, that might not really influence the decision too much, surprisingly. That's cool. I will plug in that the instruments this year, Foundation Mods instruments, has been significantly improved with a whole slew of extra metrics. Things like round-trip latency, token budgets, there's many more, right? Yeah, exactly. All of this you can measure and while using evaluations framework to figure out what works best for your application in terms of snappiness for the user and stuff like that. Awesome. One small thing you mentioned, Marcus, which I thought was interesting is you said can use if-else to switch between private and on-device. Louis, we have dynamic profiles now. Right, yeah. Very quickly, do you want to just share? Yeah, actually, because the question here is asking, how do I decide between the two models? And maybe you want to use both. And you could use both in your app for different-- well, within the same feature, there might be things where you start maybe with the on-device model for some routing, and then you only use a server model for certain more complex things within that feature. And yeah, dynamic profile is this new API that we're adding this year to let you more easily use models together while sharing context and doing handoffs, all sorts of stuff. We have a really great video about that as well this year to learn all about that. So yeah. Yeah, thanks. There is a video called Agent Experience with Dynamic Profiles, something along those lines, where there are examples, and we talk about how to use these. So be sure to check those out. Great. Next question, this is from Pichaya. I successfully set up agent mode in Xcode to local LLM using open code ACP with config model to MLX community QEN 3.5, 4 billion MLX 8-bit version using MLX server. But the way Xcode communicates to model still acts weird long time and has the IAM_N tag. What to config more? Did I wrongly set the chat template? Ronan, if you want to take it, we don't have an Xcode representation in this panel today, but we can probably talk to the MLX side of things, and you're probably on the right direction with the chat template. Yeah. But I think we do see this kind of ending token when we're bringing up the Corel language model, similar as the MLX language model, which is a foundation model. language model protocol conformance with package. So when you see such kind of things, generally we look into the tokenizer. Look at your configuration of your model, see if this token is actually correctly recognized. Maybe there is something that we could configure on the MLX language model side. But that's a very common issue when you're bringing new language models. Yeah. So I would say that we have actually a lot of good experience using open code on its own with this type of language model. So it should work in practice, right? Now, maybe it's a configuration issue. I don't know. I know there are several ways as well to communicate with language models from Xcode. So there, you know, you know better than me. I mean, I think you have like the chat mode, which is the kind of all way, I believe. And the SAP mode, which is the way to go. So it should work in some way. So maybe the best would be to post in forums and try to debug that, you know? Is there any sort of debugging stuff that they should be able to pull for that that's helpful in this case? Or is it just kind of saying, like, hey, this is what I'm seeing is enough? I would say what I'm seeing is enough because we know that it should work with Open Word, right? We have both MLX language model and the Corea language model open sourced. So you could pose your question there, and then someone should be able to help on the GitHub. Yeah, yeah. Thank you. And we have a session this year, MLX, where we demonstrate, the presenter demonstrates plugging using Xcode with OpenCode specifically. We do use a QN model. I'm not sure if there's a specific model. But it does seem like a configuration related to templates. So GitHub issues is a good place to share this, or forums too, and we can get you taken care of. Thank you. Now, this question is, is it possible to use local LLM running on MLX LM using downloaded model, or the one we created ourselves, trained ourselves using Core AI to be coding agent that can manipulate the project more, not just a chat one on Xcode. How do I do that? I think there are a couple of different questions hidden in this question and maybe a few corrections. Maybe we start with Core AI, which is purely an inference framework. CHRIS BROADFOOT: Sure. I mean, I think this question I'm getting in terms of-- is a question if they're trying to integrate it into Xcode. Xcode, yeah. CHRIS BROADFOOT: Right? And so there, we'd probably refer to you as sort of on the Xcode and ACP protocol in terms integration. And the other one is, can you provide your own model? And you sort of pointed out in the question, both for MLX and Corea, yes, you can provide your own model. Try and understand the aspect of the question of manipulate the project more. I think that's more of the capability of the model and what it has access to, which may, again, if you're going through Xcode, is through the Xcode. I think it relates to the chat mode you know, versus the agentic, you know, way. And, you know, I would say the agentic way is the way to go. Yes, I think that's what we've heard from the Xcode team in general. Yeah, exactly. That's my understanding as well. Yeah, I think I misunderstood the question as saying training with CoreA, but the reference is it's trained, and we want to bring it in, right? Yeah. And, yeah, we've shown examples in sessions where if you run an MLX LM server, Xcode can automatically detect all the models on device. And you can pick one. And this year, we have chat mode and the ACP mode. We have some demos and examples in the session. And I believe those examples will be on GitHub. So yeah, do check them out when you get a chance. Anything else from the panel? OK. I'm curious, actually, in terms of like plugging core models, I mean, I guess it differs a bit from-- So Quora will provide the core inference, just like MLX. There isn't a server component on top of Quora that we have open like you have for the MLX one, but you'd have to write that conformance to the protocol right now. Somebody has to contribute or have an OpenAI-compatible language server layer on top. You're interested in that? Please make an issue on GitHub or contribute it. Or contribute, right? Maybe you want to build a language server that other apps can use and developers have options, right? Options are good. Sounds good. Next question. This is from Drobinin. Apple updates the on-device and PCC models silently and tune prompts regress when that happens. Is there a model version identify exposed at runtime so my evaluation suite can detect a model change and get releases on it? what's the recommended way to know model changed? Yeah. Well, yeah, I mean, the best thing you can do right now is check based on the OS version, right? Because with the on-device model, that's obviously part of the OS. The on-device model really only updates as part of an OS update. So it's not like we push out an updated model that silently would affect your device without updating the OS, right? At least you have that control, right? With the server model, it's more complicated, of course. Now, what I would also say is that, again, honestly, going back to evaluations, where assume that these models will update over time, both on-device and server. So whatever prompting you're doing, that yes, over time, it's going to change the behavior that you get. And that with evaluations, you sort of gain back that control. Yeah, exactly. When a new OS update comes out, if you can rerun your evaluation and see how it affects your problem. part of your test suite. - If you have this, again, this comparative situation that I was talking about before, you could run it and then look at a run that ran on the old version of the OS and the new version of the OS and see if there's a notable difference, right? And then kind of from that, you can consider how you want to proceed. Maybe you prompt engineer for a different version of the OS or add a tool or something, whatever makes the most sense for you from there, but the idea is that, yeah, you should be able to detect it and kind of create a plan. And generally, our prompting advice is to not get too hyper-specific with specific words. So you don't really want to end up in a situation where you have any sort of if statements where if you're running on, say, 26.0, that uses a slightly different sentence than on 26.4, than on 27.0. Ideally, you can just rephrase your prompt or be more specific where it sort of gives you the quality you want across the versions. I had a related question, actually. do you plan to do updates more like on minor versions or like major versions? Yeah, with the on-device model, like looking at the past year, for example, we had two updates. Oh, I see. So typically, so like twice a year, typically, that we have a big update. Okay. I think that's a good question because the updates happen in the beta releases. Right. So you all have a chance to test it during the beta period. Yep. And you make sure that using evaluations, it's behaving as you want. So once the general availability happens, then you expect the bulk of the audience to actually use the apps for it to work. Yeah, like for example, now we're in the beta period. Right, exactly. 27.0, so definitely try it out for your use case. Or if you have new use cases, even with the image input support routing this year, we're looking for that feedback. Like we evaluate the model, of course, before we release it in beta. But it's hard to know even how you developers, how you use the model. So please tell us if you have a use case that's maybe not working as you expected with the model during the beta period. I also want to emphasize the positive part of it. So the question is in terms of regressing. So the advantage of the model updating over time is it can get better. So that also is a trade-off between having it fixed. You also discover in valuations positive things. Right, right. Yeah, let's be clear. We updated to make it better. Yes. Great. All right, next question. This is from Eric's questions. Can foundation models run from widgets, app intents, or background tasks? We want AI classification at capture time, even when the app isn't in the foreground. Are there rate limits or thermal rules we should design around? Yeah, with the on-device model, it can run in the background and in widgets. But keep in mind that based on the system conditions, because there's so much else that can be going on on your phone, of course, that you may be rate limited for the on-device model. And really, that just means that you should try again later, right? Because if something else, I mean, terminals or whatever is going on in the device, right, that the request may fail, and it will throw a specific error that you can catch with an API for that. So you know if it was for that specific reason. And then you can just try again later, either in the background or in the foreground. Right. And don't show the error to the user. Handle it. If it's in the background, maybe that's difficult. Right, right. So actually, technically, apps compete to access to the on-device language model, right? Well, with the on-device LLM, it's sort of managed by the system, where if two apps want to use the on-device model, it's only in memory once, right? Like, the system is very smart in managing those resources. But do you give priority to the foreground task? Yeah, like if something else expensive is happening, like someone's playing a game, for example, that's resource intensive, that's going to take precedence. Because obviously the user has more control over what they're doing in the foreground. They want to play that game. They want the optimal experience there. If it just so happens that in the background an app is trying to do something with the LLM, that feels more like it could happen later or it's deprioritized. Makes sense. Louis, is there a nuance there on what platform? macOS versus iOS? Actually, yeah, good point. On macOS, that rate limiting does not apply at all. Foreground or background? Yes, exactly. It's really just on iOS if it's running in the background. I think it still respects the sort of quality of service setting in terms of your requests of where they're going. The system overall will respect, right, it's going to prioritize user interactive threads or QoS in terms of when you're doing that over, let's say, background or utility. Definitely. Even, I think, on macOS, but on macOS, it's not going to, I guess you're saying, it's not going to rate limit you. Right. Yeah. Exactly. Thanks. Great. Good discussion. This question is from Ants Crashing. How can developers create or adapt models to their own apps, data, and domain? Ideas I have are creating user profile image from existing images, but with a style theme like cartoonized, et cetera. Or an app, how to guide that adapts as new app functionality is created. Best resources for exploring this. Sounds like an opportunity to bring in a custom model. Tao or Michael? Yeah, I think I can take that. It sounds like the latter part of this question, you're trying to bring in an image generation model of some sort to generate this user profile images on the fly. I think that if I were to speculate what you're trying to do, maybe the best way is to use the foundation model to basically bringing the user data of your app into your prompt and generate some sort of like a per user profile capture from those app usage from that user. And then use those prompt to generate the image using your for your image, use a profile image. So if you really want to go one step beyond even bringing other more powerful image generation models, you can consider even fine tuning an open source model from one of these open source Hugging Face or GitHub, and then bring that model into your app. CHRIS BROADFOOT: Yeah, I think in this world of these powerful foundation models, user adaptation does not necessarily require for you to fully fine tune or adapt. So I'd say that would be the extreme case. The question is, can you achieve yours with sort of customizing or extracting information that will be useful to prompt either the on-device model or a custom model in terms of bringing in that sort of customization? Yeah. Yeah, I totally agree with that. I would try to make it happen with the tools which are provided with the OS first. And if that doesn't work out, then go. Yeah, cool thing with the local models is that data stays in the device, and you have access to with user consent, other personal data across the Apple apps ecosystem. And then what I think some people refer to it as in-context learning. Like you provide it to the model. But if there is a reason for you to fine-tune and train your own model, you can. MLX is a great place to start. And-- Yeah. Obviously, we support fine-tuning and training recipes for all sorts of models. I mean, many models are supported by the open source community, actually. So if, you know, that's basically the end of the spectrum route, you know, compared to prompt engineering, I would say. Right, right. But, yeah, it's another way. And if you do that, then, you know, it's actually easy, you know, to make the model available in CoreAID and just run it, you know. Yeah. One of the other things that, like, I also want to throw in here is that this is also a really great chance to do what we call hill climbing, which is this process of using your evaluations to sort of progressively increase the quality of your generations, right? Because as you start this process, you're going to be like, I wrote this prompt, and this prompt kind of got me here, but it didn't quite get me exactly what I was hoping for. And so maybe you'll make a small change to your prompt or add additional tools or maybe swap out a different model or maybe you go to actually go down the fine-tuning route and actually fine-tune a model, right? And you want to see how those changes kind of like over time affect these things. And one of the great things is that like, even if you are doing this with image generation, the evaluations framework is built up to handle any of these sort of options, right? You can bring any model to it. And then from that, you could use like a model, like a model judge evaluator sort of say like, hey, is this image that got generated in the style that I expect, right? Is it kind of vibrant and cartoony if that's kind of what you're aiming for or kind of whatever else have you. So this is also a really great opportunity just to kind of like use those evaluation scores to kind of like guide you in the process of like making these small incremental changes in a way that like you know is getting you toward where you want to be. - Yeah, 'cause one of the harder questions that I've heard people ask is more like knowing the size of the model that you need for your task. So let's say you are going with a custom model that you want to run with core AI on device, right? How do you know the size of the model that you need for example, for this for like generating profile pictures? Yeah, so in that respect, actually, there is maybe another alternative, which is like training a very small adapter per user. And actually, I'm not super knowledgeable, but I remember there is a feature to support adapters with like... The LoRa adapters, yeah, but that's not for diffusion models, right? Oh, I see. So, yeah. Okay, yeah. But still, you know, you could imagine it on top of like the text part of the foundational model. if you extract some features beforehand. I don't know how. Right. I think with all of them, I guess I have a question for you, Marcus, in terms of evaluation. So there's the model judge. But in terms of how do they simulate user data, is there a recommended way for them to generate data that they would say, oh, let me-- can we use the foundation model framework to generate the evaluation data? It's like, oh, pretend you're a user who's into nature and other things. Like, generate content so that I can test, based on that content, do I generate a prompt? - Yeah, so in the evaluations framework is a type that we call the sample generator. And so it is built for taking any text-based input and producing kind of synthetic style output. So you can bring any model to that and then you can go ahead and just say, you know, I want this level of output, this many of them with this kind of variety, and it will go ahead and generate that for you. Now, it is text-based, so the limitations kind of do kind of limit around, like you can't have diffusion style output. But I think you could build a really interesting pipeline where you use the sample generator to generate these incoming prompts to your diffusion model and then kind of run the diffusion model to sort of generate a number of these sample images. And then you sort of, if you're like, yeah, this is a good one and this prompt is good, then you maybe save that prompt and image pair as your input for your evaluation and then kind of continue to move and you have this kind of working pipeline of kind of like a large, diverse data set. That's basically what I was wondering about. Is it possible to do that? That's great. - It's very interesting. So the evaluation framework supports not just language models, so you can have diffusion models in there. - Yeah, it's built with the idea that any, we built it with the idea around any system with any sort of randomness can be evaluated through this thing, right? So if you want to go as like, you know, that's a diffusion model, so you want to build like a linear regression model, you want to do, I think you and I were talking the other day about upscaling. If you wanted to do GPU upscaling, you could do it with that. As long as you have a way to turn that result of your model generation into a Boolean or a scalar number that you can actually evaluate against and do some statistical analysis over, we support it. That's pretty cool. Thanks, thanks. Next question from Phanteks. What types of use cases are currently best suited for Foundation Models Framework, and which use cases should developers avoid? Louis? Yeah, I mean, it's a large language model, so it can do so much, right? Right, right. Whether it's content extraction or content generation, and especially with the image input that we added this year, there's so much you can do with it. It's -- I feel like people come up with new use cases that we have -- Yeah, yeah. So in terms of, I mean, the only limitations I would say, besides your imagination, would be more literally like, you know, it cannot generate images as output, for example. That's when you need a diffusion model, you would like either, you know, bring your own or, yeah. Right, right. So there's that. But other than that, like, yeah, it's a large language model. So yeah. Yeah, use it for all the things. I'll give you some examples. So I think one thing we would consider is like, is your use case, right? If you're processing real-time video data, I'm guessing you're not going to want to call of the foundation model, framework per frame, even if it's analyzing the image. If you need something that is like sub-milliseconds because you're ranking 1,000 entries in a list view or something like that, you can probably go with simpler models and maybe the foundation model or that full power of that LLM is not really where you want to go first. Maybe you can prototype with it, but sort of like I think of latency, real time, I think there are categories of use cases Where you you you know, it's it is such a powerful model, but it it it takes a little bit of time We're not wondering, you know, we're not sub millisecond. I'm assuming Yeah, definitely for a response maybe for a token, but Exactly, yeah, also Apple has these other domain specific API like a speech recognition So a good point could be used for maybe suited for your use case as well in line with the real-time processing So yeah, definitely try all these models. Yeah. Yeah, I think the vision ones are a great example I think there's a great session here about vision updates there and how it interacts with the foundation model. And the foundation model using vision as a tool for very specific tasks. But again, those specific tools are super highly optimized task specific ones, which can be used in different contexts in which maybe the foundation model is not the appropriate context. So like the real time video processing, trajectory tracking of a ball, zooming across the screen, things like that. Yeah, I think a lot of people overlook the high-level frameworks we have for high-level APIs, like vision, speech, translation. You can use Foundation Models framework to translate, but we have an entire suite of APIs there for translation, which has more language support than the Foundation Models. So depending on your use case, you might want to choose a very specialized API that's available versus prompting a language model. The other minor thing is probably we saw app developers build amazing experiences, as you mentioned, like summarization or text extraction and maybe grammar correction and then like transformation, tone, style, all of those great stuff. Maybe if you were hitting a context window limit in the past, PCC alleviates some of those use cases, right? Yeah. So, yeah, I think the question here is more like when do you want to use an LLM? Right. Oh, in general, yeah. Some people would have you believe for everything. Maybe in the future you can be running it real time on a camera feed. So right now. Just other machine learning technology. One model to rule them all, right? Someday, someday. Great discussion. Thank you. Next question is from Natasha Prabhu. Where should a beginner start if they want to learn about AI ML? Are there any particular resources you would recommend? Wow. - I'm gonna jump on this one. - Yes please. - In a past life I was an engineer on the Swift Playgrounds team. And so I'm gonna put a plug in for Swift Playgrounds. One of the pieces of content that we actually made was an intro to ML, which used CoreML to build a image classifier so that you could play the game of rock, paper, scissors by actually like throwing out your hand to do the like hand motions. And then we talked about sort of like, how do you build a data set? How do you build a diverse data set? And then actually, how do you then go ahead and create that model and then use that model inside of your app? So starting with something small like that can kind of give you an idea of if you want to go down the full model training route, there's a way in there. But the other great thing is that the Foundation Models Framework is so easy to kind of pick up today that you can kind of just get started right away by just sort of making calls to the language model and getting kind of into prompt engineering and tool additions and sort of all that stuff. to actually teach you the fundamentals of how to work with these technology. And then as you get further and further into it, you kind of peel the onion back and then get into things like maybe you download a whole bunch of weights off of Hugging Face and get started with Core AI or MLX and do something much more involved and think about how to break down those sorts of problems. So that's my shout-out for my previous job. Playgrounds. No, playgrounds, I think this year in Xcode, we added a very easy way to start Xcode and directly launch a playground with one click. Previously, you had to create a project and create a playground final, and you click, and it provides all the scaffolding required to just session, language model session, session.respond, and you're off to a start. I would definitely encourage people to go deeper, right? And yes, we have a great API that abstracts and makes things super easy, but it can be super rewarding to learn more about these large language models work under the hood? How do they generate that text, like token by token, things like constraint decoding, the decoding loop, how they're trained? Yeah, exactly. I was wondering what the question is referring to. Is it like using AI/ML? Or is it actually learning how those things are actually working, right? So for later, I mean, indeed, so you can start from-- there are many tutorials out there with PyTorch, MNX or whatnot right and oh you there are many courses available online as well right but in general I feel it's easier to start with like some with some concrete examples as we are mentioning right like so you know start with a concrete example or something quite simple there are many tutorials online actually that's a great segue the sponsor of this podcast is yeah I think in my past I would have turned towards you know like great tutorials and learning you know but I do think like nowadays getting your hands dirty and it's the best and also even with an AI agent assisting you yeah and asking questions and being like telling it look I told you lost me I I don't know about this model optimization or like step back uh so I think yeah it's it's amazing all the tools you have available to you today to find learning at your pace yeah like if you're new to - If you're Python, if you're new to PyTorch. - Yeah, yeah. - The barrier has never been lower, right? - Exactly, yeah. I mean, I totally agree with the edge and stuff. To learn new things, it's amazing. Like, you can coerate with, you know, code prototype or whatever, you know. So it's very easy to build things from scratch, basically. - I think build an app. Have an idea, build an app. And I think-- - Or build a model. - Build a model, build an app. I think some people refer to it as project-based learning, But ultimately, you don't start with the math. You start with trying to achieve something, and you slowly open the layer. I would not start with matrix multiplication. That's not the exciting part. I mean, it's powered by that. Yeah, yeah, yeah. You hurt my heart. Yeah. Exactly. Great. Yeah. So, yeah, a lot of resources, I think. Just get started. Everyone has to start somewhere. So you start building something, and passion will drive you to the depths of machine learning theory, if that's where you want to go. Awesome. You can also go with an internship with us as well. Yeah? Yeah? No, I know. There are positions open that-- If you want to consider. Yeah. All right, next question. Is everything we can do in PyTorch be imported to Core AI or Apple ecosystem? What's possible? What's not? Ooh, I think this straddles sort of both Core AI with importing PyTorch stuff, but also familiarity with MLX and PyTorch in being able to experiment. So maybe we'll start with Core AI. JOHN MCCUTCHAN: We'll start with PyTorch. So basically, in general, if you can express your model in PyTorch and it is exportable by the PyTorch APIs, it will convert straightforward to Core AI. Now, that's not for everything. There's always new operations being added and new techniques. So if it can't be exported, then you have to look at the PyTorch APIs, or why it can't be exported. Core AI supports all the core A10 ops, which is an internal detail if you know about PyTorch. And so if it can be expressible in those, Core AI will be able to do it. Now, you may come across some new specific particular op, custom op written that they have that may not lower all the way down, but you can provide a custom lowering in that case. You can sort of just wrap that op in a little custom lowering and explain it to Core AI in Torch operations, which will lower. But a common thing also is that if you also want to consider performance, you can also wrap that even in a custom op or a kernel. So you can even write a custom metal kernel, which may take a bunch of ops you've wrapped together and you say, look, yes, it could convert, but maybe the performance is not there. So you kind of have those level of flexibility. But again, majority of ones you see out there, and maybe you can ask how, like in the query models one, I don't know how many of those models, I'm certain the majority of them don't require any custom query. No, we don't, yeah. You just straight out export it. Yeah, I think it's mainly on those frontier where people are experimenting with, you know, lots of people love different attention mechanisms and things like that, where even if they do convert, you know, there may be some performance implications. And I think, you know, for MLX, a lot of experience there with writing custom kernels and optimizing those. So the question says, Korea or the Apple ecosystem. So I wonder if Apple's ecosystem-- one definition of that is you want to put this in your app, where it deployed. The other possible interpretation could be that, hey, you want to run this on your Mac and experiment, learn ML. I know MLX APIs are relatively similar to PyTorch. Yeah, I mean, if, you know, I mean, by the way, I've heard Torch 20 years ago, right? So in that respect, indeed. Ron and one of the co-authors of Torch. Indeed, they are quite similar. So if you are used to PyTorch, you know, you will feel quite at home, you know, when it comes to the neural network package. You know, FMNX is very, very much, you know, similar to the PyTorch one. we we have like different modern approaches for the core fmnx but you know let's not talk about that there uh so you know uh if you know by torches should be fine uh i would add that uh many people have actually converted on hugging face you know by pytorch model into mx models right so it's actually often straightforward to just you know experiment with uh models which are already available in Python because they have already converted on Hugging Face. And MLX LM supports many models from, you know, like more than 10,000 models from Hugging Face out of the box. So if you want to just experiment, it's very easy. Right. If you just want to run a model server, you could convert and maybe, like the earlier question, how do you connect your open code, you know, to a model server to, yeah, a lot of different ways to experiment across the Apple ecosystem, from Mac to running servers to integrating into your apps. Yeah. And AMS. Yeah, I was going to also just say, in terms of these one representation and another, is another place to make a pitch for a lot of these AI agents and skills. Because the APIs are very similar, you can even just directly port it into and directly author it in either the core AI set of things or in MLX. So a lot of stuff is like translation version is so much easier and so what I often do myself you know just exporting the weights and then you know it's quite easy to just reload them in whatever framework you want yeah effectively yeah and so you can learn about how LMS work and things like that they have some regular structure for many of these models right and so yeah sometimes you just you can basically write the description of the model particularly for deploying on an Apple's platform right most of the time you're most interested in running the inference aspect of right And so a lot of the PyTorch models and things you'll find also have all of the training aspects inside of it. And so in a sense, sometimes there are aspects to inference which are not part of the training graph. So things like key value caching and things like that that have to get sort of injected anyway. And sort of some of these AI tools I'm advocating for can help you do that translation. In addition to, we have examples online in these open source repos, and many of the people in the community have already done these things and established those patterns. And in that respect, the AI tools can help you writing custom kernels as well, by the way. That's true. Yeah. One of the harder parts of the job is when something's not supported and you have to write a custom kernel, I think, hasn't. Of course, I think you have to understand some level of metal programming, but it's a good start, I think, right? To get stuff working. Exactly. Definitely helps. Good discussion. Thank you. This question is from Eric's questions again. What's the best UX when foundation models are unavailable? Older device, Apple intelligence is off, low battery. Is there, I mean, valid concerns, right? Is there an official way to detect availability early and degrade gracefully, and we can pre-warm the model to cut first request latency? Well, there's a couple of questions in there. Right. Well, just start with the last one. People are trying to sneak in multiple questions. But all good. We are here to answer your questions. Yeah, totally. I mean, to start with the last question that was there, like, yes, we have a pre-warm API where if you are using the on-device system language model on an Apple intelligence device, if you know ahead of time sort of when the user is going to, maybe your view appears and you know that there's a button there that often triggers the model, you can pre-warm the on-device model, which really means loading it into memory by the system before the user even interacts with that button, right? So that's a good signal that your app can give to reduce that initial latency. Now, for the prior questions there about if it's, well, yes, we have an availability API, that too, so you can check that. So you can know if the model is not supported on the device it's running on, like if it's an older device, right? Now, in those cases, it's a good question, like in terms of core AI, like what kind of alternatives or fallbacks you can offer. Would you bring your own model maybe to give some of that feature set on an older device? Or would you rely on a server model? I don't know what the best fallback. I think it depends on their use case, right? We'd have to know more about their use case and their main concerns of the thing. So if it's network connectivity and you need the PCC model, there's a question of, well, if you needed the private cloud compute model, you probably did an evaluation and found you need a pretty large model. And so maybe having another on-device model or core may not satisfy your use case. And so, yeah, maybe there you have to find some, use the language model protocol and have other server-side models. Good point, actually, that. So even if you're running on a device that doesn't support Apple intelligence, you can still use the foundation models framework API this year, right? Because we added that protocol that you mentioned, the language model protocol, to let you plug in any model you want. So cloud, one of the popular-- Server, on-device with Core, MLX, right? Any model you want really on-device or server. And that way, you can still take advantage of the great API we have, even if it's not with the built-in system model, if that's not available for whatever reason. Right. I think there's a nuance here about UX, right? And we've discussed this in sessions last year, and also some of the code-alongs we've done, wherein, depending on the availability API's response, you want to show different things to the user. Yeah. Right? If one of the things is if Apple Intelligence is off, then rather than saying error, you can say, hey, go switch on Apple or guide your users to consider turning on Apple Intelligence. So that's like gracefully handling those things. Older device, then either you handle it under the hood or advise the user. I think there are ways to sort of not pass on the pain to the user, stuff that you can handle from a -- - But if you don't have a fallback and you know the button is not gonna work, it's better to hide the button than to have the user tab the button and just get an error. - Or provide a different experience in the app wherein the user doesn't feel like something's missing. - Yeah. - Right? They feel like it's the experience they are getting. So, yeah, the availability API is awesome. I think we've discussed this in some of the sessions, and we may have expanded some of the options this year too. But yeah, do check out the documentation. There are a lot of options for you to handle this gracefully in your app experience. Good discussion. Next question. This is from CryWoff. How should we migrate from Core ML to Core AI? We have to wait a year or two until most users have updated their devices to iOS 27? Or is there a way to do this quicker? It's just a two-letter difference, so it shouldn't be-- Yeah. I'll first start off with like, you don't have to, like, it's not a call to migrate off of CoreAI, I mean CoreML. CoreML works. It is supported. It will be there and supports some models that CoreAI does not, like decision trees and others, if you really want sub-millisecond, microsecond pieces. Those still models are still useful. And so, but if you're interested in the features of CoreAI, and CoreAI is where we're investing for sort of modern AI generative models, and you want to consider that transition, you can consider the kind of things you would do with any sort of new API or framework that's only available in the latest OSs. So you can decide what's the time frame from you in which your user base has moved on and you're willing to drop support. Or these are both asset-based APIs in the sense that the core model asset-- you can bring down a different model asset and use a different API under the hood, depending on which OS you are on. The APIs are simple enough that it's only a few line difference. Or you putting a wrapper on top of it wouldn't be too difficult. So that would be my recommendation, is to consider the models that you're actually doing. Again, it's not a call to action to be like, stop using QuarML. Like, QuarML's still great and it's still supported. And so you can make that transition when it makes sense for you and when it makes sense for your users. The depth of QuarML is greatly exaggerated. I would add one thing. If you are currently in your app, have some generative model experience, definitely consider using Core AI. If you have that on your Core ML stack, using Core AI is the recommended way. Yes. Going forward, we're going to recommend that for those experiences. Awesome. Thank you. Hopefully that puts you at ease if you are worried or concerned. Next question is from Catastrophe Zero. What's the maximum on-device local model size we can bring into our app before we should begin considering private cloud compute? HENRY SINGH: Yeah, I think the real question there is sort of what's-- well, it probably depends on the device. But what is the true limit, I guess, for-- let's say I want to bring an LLM with Core AI to my app? HENRY SINGH: Yeah, so a couple of things to consider, right, is thinking about what you can actually fit in your app. And you can look at your-- I think there's some OS process available memory. There's some call you can check out, how much available memory do I have in my app to use? And you can also look at instruments and a bunch of other tools we have to see how much memory a particular model uses. That said, that is you finding out and thinking of everything being around your app, which I think everyone has great apps, and I think some of them you feel like have the whole system. But you also want to be a good platform citizen and think about what the experience is in terms of that user of switching away from your app launching the camera and stuff, which we will preference over the app when the user is indicating I want to use something else. So we generally recommend on iOS to try to keep your models kind of below 2 gigabytes, if possible, particularly on those. You're starting to stress the system a lot. Again, it depends on your use case. And you can request in some places to have higher memory limits. So 2 gigabytes refers to iOS, I guess? Yeah, iOS. So is there an iOS. On macOS, yeah, it's sort of, you know, just like any macOS, it's available memory. You need to profile your app and see what you're on. iOS is, you know, a bit more constrained environment in terms of thinking about the whole system experience. So, yeah, you have to look at that collectively, I guess. So, yeah, there's no hard and fast rule. That thing I just gave you was sort of a rule of thumb off the cuff in terms of if you look at that, how much available memory you have, right, you know, generally you'll see that's how much is available. available for your entire app. So that includes also the models you're loading into memory, or the graphical assets you're putting, or your game rendering engine, independent of AI and ML considerations. And then you have always the option of quantizing more. Yes. But then I would compare-- Then you're going to go to evaluations in terms of-- A bigger model quantized. It depends on your application, I would say. We have a session this year, CoreEI session, where we show quantization, accuracy loss, and then quantizing parts of the workflow to recover the accuracy. And I think there's a little bit of experimentation to decide how much can you quantize and how much accuracy loss is tolerable. And in some cases, it's almost minimal, right? I think that combined with evaluation sort of get you to a place where -- to two bit like and you want to stay under the two gigabyte limit yeah you could get to like six billion yeah it's pushing it a little bit like you get around that like model size for an LLM right and depending on if it's like image generation you know segmentation model or language model I think the tolerance to quantization tends to write yeah different yeah and against Arona's point right we're giving a lot of guidance on iOS like on macOS is a lot large variety of the Mac platform in terms of configurations of that hardware. And so you can also consider the size of your model being dependent on the device and available memory that is there. And so you can scale your workloads or scale your models for application. So you don't need to always be using the same model you are on your Mac mini as your Mac studio, for example. And on four Mac studios... You have multiple Mac studios. We have a session this year We show how you can connect for Mac Studios using a thunderbolt fire and then you can run a trillion parameter model Yeah, so an app extreme right? Some of the latest we can't do that with iPhones Do you need to anyway? Yeah, I don't think we see both some double five already okay on that Until like game boy link like you get with all your friends You just link with a cable. Trillion parameters. Great. Moving on. This question is from Ahmed Zia. How do you decide when AI should help the user versus when it should stay out of the way? A bit of a philosophical question here. You can answer this question. I will summarize the discussion here based on all your individual inputs. But I don't know. I think broadly, as an app developer, you have to go back to the fundamental experience you want to deliver to the user. You're not doing AI for the sake of AI. Yes, these tools exist, but so have other tools existed. You don't use every framework in the ecosystem. So if there is a specific type of experience you want to deliver, whether it is, I don't know, something that creative, then maybe a language model can be used. If you want to do grammar correction, yeah, maybe sure you can use a model behind the scenes. But maybe your experience doesn't need to have AI-powered use cases. You don't even need to advertise that you're using AI, right? Ultimately, the delight of the user experience, I think that's what matters. You as the creator of the app and the experience you want to deliver should guide the tools that you use. If there's a gap in the functionality, then it turns out that the AI model can help you, then sure. Anyway, that's my stuff. Like one of the things I think we all do as developers is just try things and see what sticks, right? There's nothing more fun than kind of like a side quest activity that is like, I'm just gonna try this thing, right? And you get through that process and you then can kind of make that decision once you've sort of seen it in action and say, yeah, this is good and I like this or this isn't really quite gonna be as like meaningful to my users I think it could be. Now, of course, there is like, if you wanna take it the full way and go like full scientific on it, you could build an evaluation that could tell you that, right? like, you know, one of my favorite things to do as a developer is like build user stories, right? I really enjoy thinking about like, who's gonna pick this up, who's gonna use this, and who's gonna do what with it? And then kind of building like a series of like data points that kind of align to those use cases. And then from there, you kind of can work backwards and say like, okay, this is resulting in like quality scores, you know, of this number, or like my, you know, maybe they're really high, and that's great. And maybe it's like, that's actually a really good sign that this is really doing something useful for the large complement of users that I think will use this. Or maybe it's the opposite, where you're just like, oh my god, these scores are so terrible, and this isn't really worth a lot of the time. And so maybe this is just, you know, it was a really fun exercise. I learned a lot about how to use these technologies and these frameworks, and that's really cool. And maybe I'll use it for the next project that I do. But today it's just, you know, it was a fun little side quest activity. There's a lot of ways you can I mean what we're doing of course is we're trying to integrate AI into our system, right? So it's it's really sort of the little things like in the keyboard and you know Your normal use of everything in you know in iOS on your iPhone or right any any device That's you know try to just enrich that and just make people like get from point A to point B quicker Right right the help of AI and not necessarily like distract or like yeah, you should get out of the way as much Yeah, was that the question? I guess it was saying should it stay out of it? I think if you're like, I don't think anything should be in the way of what the user is trying to do. So I think just even the wording of the question, I feel like the answer is kind of in there, is that if it's getting in the way of a user experience, then it's probably not the right answer right now. Yeah, the entry points to AI is, I think, a relatively new discussion, wherein with the advent of like chat bot experiences right the entry point tends to be ai like the first question you ask but a lot of experiences that are part of the os have been there working under the hood like you said the dictation or a lot of things i think ultimately is sort of it it shouldn't be blocking you from doing something it should be helping the experiences And when I talk to our design friends, that's when I realized that we are all technologists and we work bottom up. You're like, hey, here's all these cool APIs. I want to use them, right? But user experience matters so much more. What kind of experience are you delivering to your app users? Yeah. I mean, for me, AI is a tool, right? Right. So use the right tool for your task, right? Do I need a hammer or jackhammer for a name? It depends. If you only have a hammer, if you only have a hammer, it looks like a name. Good discussion. Thank you. Thanks, everyone. This question is from Triumphy. We support older iPhones without Apple intelligence. We have a self-hosted LLM backend. Cool. Cool was me saying that. Not in the question. Okay. On iOS 27, is wrapping the backend in a custom language model provider, the recommended path. So one language model session, tools, generable, transcripts of all devices, any pitfalls versus system models. Yeah, exactly. So the new language model protocol that we introduced this year really lets you use any model with the same unified Swift API for if you're prompting or doing tool calling, getting generable output. So if you have your own custom server model that you're using right now, then I think the first question to ask would be, does that conform to the typical OpenAI chat completions web request protocol? And if so, we actually have an open source utilities package this year that has an example -- or not just an example, but has an implementation of a type that conforms to that protocol that sort of matches that typical web request format. So you could use that as is, I think, for a lot of server-hosted models that you might be using already. And yeah, I mean, we also have MLX and core AI packages. So I think, yeah, I would say look into the protocol, right? It really lets you use any model you want without having to necessarily rewrite the other parts of your code about how you're prompting the model. Right, right. And especially if in the same app you're using, maybe the system language model, but also a different model, then again, I think we mentioned it before, where it can just be an if statement. It's just an if statement for deciding which model to use, perhaps depending on the platform availability, but then all of your other code stays the same, and that's really nice. Right. We are at the end of the group lab session today, but I want to sort of summarize. I know. Having so much fun. Yeah. It's been an exciting week for sure, with so many good announcements. I think sort of what you said at the end is one of my highlights, which is the new Foundation MOLS framework sort of standardizing the API surface for developers. It's so much easier now to be able to try different models, whether it's using Core AI conformance, the package, or the MLX one, and bringing your own models. You should be testing everything as we've been evaluating. I promise it's fun. Yeah, yeah. I mean-- I have a good time doing it. Yeah, you should be integrating this into your test suites and making sure, ultimately, I think the end user, people using their apps and their experiences, what should be guiding everything. So, awesome. Well, that brings us to the end of today's group lab. Thank you all for joining us. We hope this was useful. I also want to say thanks to our awesome panelists here and all the folks working behind the scenes triaging your questions. So thank you, thank you. A few things I'll reiterate here. As I mentioned earlier, if we didn't get your questions, and I know there were so many here we didn't get to, please head over to the forums at developer.apple.com. We can continue our discussions there. And bug reports, feature requests, again, feedback assistant.apple.com is your friend. We really do appreciate all the things you filed. So thank you. And speaking of feedback, you should also receive an email with a survey link to let us know about your experience at WWDC. We'd love to incorporate these feedback for future events. Thanks again for joining us, and we hope you had a wonderful WWDC this year. Thank you. Like and subscribe. - All right, I can still see you.