Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

LanguageModelSession always returns very lengthy responses
No matter what, the LanguageModelSession always returns very lengthy / verbose responses. I set the maximumResponseTokens option to various small numbers but it doesn't appear to have any effect. I've even used this instructions format to keep responses between 3-8 words but it returns multiple paragraphs. Is there a way to manage LLM response length? Thanks.
3
0
329
Sep ’25
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
0
0
536
3w
get error with xcode beta3 :decodingFailure(FoundationModels.LanguageModelSession.GenerationError.Context
@Generable enum Breakfast { case waffles case pancakes case bagels case eggs } do { let session = LanguageModelSession() let userInput = "I want something sweet." let prompt = "Pick the ideal breakfast for request: (userInput)" let response = try await session.respond(to: prompt,generating: Breakfast.self) print(response.content) } catch let error { print(error) } i want to test the @Generable demo but get error with below:decodingFailure(FoundationModels.LanguageModelSession.GenerationError.Context(debugDescription: "Failed to convert text into into GeneratedContent\nText: waffles", underlyingErrors: [Swift.DecodingError.dataCorrupted(Swift.DecodingError.Context(codingPath: [], debugDescription: "The given data was not valid JSON.", underlyingError: Optional(Error Domain=NSCocoaErrorDomain Code=3840 "Unexpected character 'w' around line 1, column 1." UserInfo={NSJSONSerializationErrorIndex=0, NSDebugDescription=Unexpected character 'w' around line 1, column 1.})))]))
1
0
138
Jul ’25
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3 To: Metal Developer Relations Hello, I am reporting a repeatable numerical saturation point encountered during sustained recursive high-order differential workloads on the Apple M3 (16 GB unified memory) using the JAX Metal backend. Workload Characteristics: Large-scale vector projections across multi-dimensional industrial datasets Repeated high-order finite-difference calculations Heavy use of jax.grad and lax.cond inside long-running loops Observation: Under these conditions, the Metal/MPS backend consistently enters a terminal quantization lock where outputs saturate at a fixed scalar value (2.0000), followed by system-wide NaN propagation. This appears to be a precision-limited boundary in the JAX-Metal bridge when handling high-order operations with cubic time-scale denominators. have identified the specific threshold where recursive high-order tensor derivatives exceed the numerical resolution of 32-bit consumer architectures, necessitating a migration to a dedicated 64-bit industrial stack. I have prepared a minimal synthetic test script (randomized vectors only, no proprietary logic) that reliably reproduces the allocator fragmentation and saturation behavior. Let me know if your team would like the telemetry for XLA/MPS optimization purposes. Best regards, Alex Severson Architect, QuantumPulse AI
0
0
226
Mar ’26
Can MPSGraphExecutable automatically leverage Apple Neural Engine (ANE) for inference?
Hi, I'm currently using Metal Performance Shaders Graph (MPSGraphExecutable) to run neural network inference operations as part of a metal rendering pipeline. I also tried to profile the usage of neural engine when running inference using MPSGraphExecutable but the graph shows no sign of neural engine usage. However, when I used the coreML model inspection tool in xcode and run performance report, it was able to use ANE. Does MPSGraphExecutable automatically utilize the Apple Neural Engine (ANE) when running inference operations, or does it only execute on GPU? My model (Core ML Package) was converted from a pytouch model using coremltools with ML program type and support iOS17.0+. Any insights or documentation references would be greatly appreciated!
0
0
491
Nov ’25
CoreML Instrument Testing Native Clawbot using FM.SyML & OAIC & Diffusion
After running performance test on my CoreML qwen3 vision, I appreciated the update where results were viewable... ON Mac it mentions Ios18 and im not sure if or how to change.. that bottle neck lead to rebuilding CoreML view. I woke up and realized I have all the pieces together... and ended up with a swift package working demo of Clawbot.. the current issue is Im trying to use gguf 3b to code it.. I have become well aware that everything I create using the big models, they soon become the default themes /layouts for everyone else simply asking for this or that (I appoligise) so here I am asking (while looking to schedule meet with dev) if its possible to speak with anyone about th 1000s of Apple Intelligence PCC, Xcode, and vision reports and feedback ive sent , in terms of just general ways I can work more efficiently without the crash... ive already build a TUI for MLX but the tools for coreML while seems promising are not intuitive, but the vision format instruction was nice to see. Anyway my question is:
0
0
95
Feb ’26
SpeechAnalyzer / AssetInventory and preinstalled assets
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system. Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
1
0
274
Jul ’25
ImagePlayground: Programmatic Creation Error
Hardware: Macbook Pro M4 Nov 2024 Software: macOS Tahoe 26.0 & xcode 26.0 Apple Intelligence is activated and the Image playground macOS app works Running the following on xcode throws ImagePlayground.ImageCreator.Error.creationFailed Any suggestions on how to make this work? import Foundation import ImagePlayground Task { let creator = try await ImageCreator() guard let style = creator.availableStyles.first else { print("No styles available") exit(1) } let images = creator.images( for: [.text("A cat wearing mittens.")], style: style, limit: 1) for try await image in images { print("Generated image: \(image)") } exit(0) } RunLoop.main.run()
0
0
330
Sep ’25
Custom keypoint detection model through vision api
Hi there, I have a custom keypoint detection model and want to use it via vision's CoremlRequest API. Here's some complication for input and output: For input My model expect 512x512 a image. Which would be resized and padded from a 1920x1080 frame. I use the .scaleToFit option, but can I also specify the color used for padding? For output: My model output a CoreMLFeatureValueObservation, can I have it output in a format vision recognizes? such as joints/keypoints If my model is able to output in a format vision recognizes, would it take care to restoring the coordinates back to the original frame? (undo the padding) If not, how do I restore it from .scaletofit option? Best,
1
0
935
Oct ’25
Context window 90% of adapter model full after single user prompt
I have been able to train an adapter on Google's Colaboratory. I am able to start a LanguageModelSession and load it with my adapter. The problem is that after one simple prompt, the context window is 90% full. If I start the session without the adapter, the same simple prompt consumes only 1% of the context window. Has anyone encountered this? I asked Claude AI and it seems to think that my training script needs adjusting. Grok on the other hand is (wrongly, I tried) convinced that I just need to tweak some parameters of LanguageModelSession or SystemLanguageModel. Thanks for any tips.
13
0
3.2k
Feb ’26
Threading issues when using debugger
Hi, I am modifying the sample camera app that is here: https://aninterestingwebsite.com/tutorials/sample-apps/capturingphotos-camerapreview ... In the processPreviewImages, I am using the Vision APIs to generate a segmentation mask for a person/object, then compositing that person onto a different background (with some other filtering). The filtering and compositing is done via CoreImage. At the end, I convert the CIImage to a CGImage then to a SwiftUI Image. When I run it on my iPhone, it works fine, and has not crashed. When I run it on the iPhone with the debugger, it crashes within a few seconds with: EXC_BAD_ACCESS in libRPAC.dylib`std::__1::__hash_table<std::__1::__hash_value_type<long, qos_info_t>, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::hash, std::__1::equal_to, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::equal_to, std::__1::hash, true>, std::__1::allocator<std::__1::__hash_value_type<long, qos_info_t>>>::__emplace_unique_key_args<long, std::__1::piecewise_construct_t const&, std::__1::tuple<long const&>, std::__1::tuple<>>: It had previously been working fine with the debugger, so I'm not sure what has changed. Is there a difference in how the Vision APIs are executed if the debugger is attached vs. not?
1
0
418
Jan ’26
ANE Error with Statefu Model: "Unable to compute prediction" when State Tensor width is not 32-aligned
Hi everyone, I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15). The Issue: A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode. Error Message in Xcode UI: "There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model." Observations: Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE. Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE. This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment. Reproduction Code (PyTorch + coremltools): import torch.nn as nn import coremltools as ct import numpy as np class RNN_Stateful(nn.Module): def __init__(self, hidden_shape): super(RNN_Stateful, self).__init__() # Simple conv to update state self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1) self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1) self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16)) def forward(self, imgs): self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1)) return self.conv2(self.hidden_state) # h=480, w=255 causes ANE failure. w=256 works. b, ch, h, w = 1, 3, 480, 255 model = RNN_Stateful((b, ch, h, w)).eval() traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w)) mlmodel = ct.convert( traced_model, inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)], outputs=[ct.TensorType(name="output", dtype=np.float16)], states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")], minimum_deployment_target=ct.target.iOS18, convert_to="mlprogram" ) mlmodel.save("rnn_stateful.mlpackage") Steps to see the error: Open the generated .mlpackage in Xcode 16.0+. Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac). The report will fail to generate with the error mentioned above. Environment: OS: macOS 15.2 Xcode: 16.3 Hardware: M4 Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime? Any insights or workarounds (other than manual padding) would be appreciated.
0
0
478
Dec ’25
Is there anywhere to get precompiled WhisperKit models for Swift?
If try to dynamically load WhipserKit's models, as in below, the download never occurs. No error or anything. And at the same time I can still get to the huggingface.co hosting site without any headaches, so it's not a blocking issue. let config = WhisperKitConfig( model: "openai_whisper-large-v3", modelRepo: "argmaxinc/whisperkit-coreml" ) So I have to default to the tiny model as seen below. I have tried so many ways, using ChatGPT and others, to build the models on my Mac, but too many failures, because I have never dealt with builds like that before. Are there any hosting sites that have the models (small, medium, large) already built where I can download them and just bundle them into my project? Wasted quite a large amount of time trying to get this done. import Foundation import WhisperKit @MainActor class WhisperLoader: ObservableObject { var pipe: WhisperKit? init() { Task { await self.initializeWhisper() } } private func initializeWhisper() async { do { Logging.shared.logLevel = .debug Logging.shared.loggingCallback = { message in print("[WhisperKit] \(message)") } let pipe = try await WhisperKit() // defaults to "tiny" self.pipe = pipe print("initialized. Model state: \(pipe.modelState)") guard let audioURL = Bundle.main.url(forResource: "44pf", withExtension: "wav") else { fatalError("not in bundle") } let result = try await pipe.transcribe(audioPath: audioURL.path) print("result: \(result)") } catch { print("Error: \(error)") } } }
0
0
122
Jun ’25
Does ImageRequestHandler(data:) include depth data from AVCapturePhoto?
Hi all, I'm capturing a photo using AVCapturePhotoOutput, and I've set: let photoSettings = AVCapturePhotoSettings() photoSettings.isDepthDataDeliveryEnabled = true Then I create the handler like this: let data = photo.fileDataRepresentation() let handler = try ImageRequestHandler(data: data, orientation: .right) Now I’m wondering: If depth data delivery is enabled, is it actually included and used when I pass the Data to ImageRequestHandler? Or do I need to explicitly pass the depth data using the other initializer? let handler = try ImageRequestHandler( cvPixelBuffer: photo.pixelBuffer!, depthData: photo.depthData, orientation: .right ) In short: Does ImageRequestHandler(data:) make use of embedded depth info from AVCapturePhoto.fileDataRepresentation() — or is the pixel buffer + explicit depth data required? Thanks for any clarification!
1
0
282
Jul ’25
Getting FoundationsModel running in Simulator
I have a mac (M4, MacBook Pro) running Tahoe 26.0 beta. I am running Xcode beta. I can run code that uses the LLM in a #Preview { }. But when I try to run the same code in the simulator, I get the 'device not ready' error and I see the following in the Settings app. Is there anything I can do to get the simulator to past this point and allowing me to test on it with Apple's LLM?
3
0
393
Jul ’25
LanguageModelSession always returns very lengthy responses
No matter what, the LanguageModelSession always returns very lengthy / verbose responses. I set the maximumResponseTokens option to various small numbers but it doesn't appear to have any effect. I've even used this instructions format to keep responses between 3-8 words but it returns multiple paragraphs. Is there a way to manage LLM response length? Thanks.
Replies
3
Boosts
0
Views
329
Activity
Sep ’25
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
Replies
0
Boosts
0
Views
536
Activity
3w
get error with xcode beta3 :decodingFailure(FoundationModels.LanguageModelSession.GenerationError.Context
@Generable enum Breakfast { case waffles case pancakes case bagels case eggs } do { let session = LanguageModelSession() let userInput = "I want something sweet." let prompt = "Pick the ideal breakfast for request: (userInput)" let response = try await session.respond(to: prompt,generating: Breakfast.self) print(response.content) } catch let error { print(error) } i want to test the @Generable demo but get error with below:decodingFailure(FoundationModels.LanguageModelSession.GenerationError.Context(debugDescription: "Failed to convert text into into GeneratedContent\nText: waffles", underlyingErrors: [Swift.DecodingError.dataCorrupted(Swift.DecodingError.Context(codingPath: [], debugDescription: "The given data was not valid JSON.", underlyingError: Optional(Error Domain=NSCocoaErrorDomain Code=3840 "Unexpected character 'w' around line 1, column 1." UserInfo={NSJSONSerializationErrorIndex=0, NSDebugDescription=Unexpected character 'w' around line 1, column 1.})))]))
Replies
1
Boosts
0
Views
138
Activity
Jul ’25
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3 To: Metal Developer Relations Hello, I am reporting a repeatable numerical saturation point encountered during sustained recursive high-order differential workloads on the Apple M3 (16 GB unified memory) using the JAX Metal backend. Workload Characteristics: Large-scale vector projections across multi-dimensional industrial datasets Repeated high-order finite-difference calculations Heavy use of jax.grad and lax.cond inside long-running loops Observation: Under these conditions, the Metal/MPS backend consistently enters a terminal quantization lock where outputs saturate at a fixed scalar value (2.0000), followed by system-wide NaN propagation. This appears to be a precision-limited boundary in the JAX-Metal bridge when handling high-order operations with cubic time-scale denominators. have identified the specific threshold where recursive high-order tensor derivatives exceed the numerical resolution of 32-bit consumer architectures, necessitating a migration to a dedicated 64-bit industrial stack. I have prepared a minimal synthetic test script (randomized vectors only, no proprietary logic) that reliably reproduces the allocator fragmentation and saturation behavior. Let me know if your team would like the telemetry for XLA/MPS optimization purposes. Best regards, Alex Severson Architect, QuantumPulse AI
Replies
0
Boosts
0
Views
226
Activity
Mar ’26
Can MPSGraphExecutable automatically leverage Apple Neural Engine (ANE) for inference?
Hi, I'm currently using Metal Performance Shaders Graph (MPSGraphExecutable) to run neural network inference operations as part of a metal rendering pipeline. I also tried to profile the usage of neural engine when running inference using MPSGraphExecutable but the graph shows no sign of neural engine usage. However, when I used the coreML model inspection tool in xcode and run performance report, it was able to use ANE. Does MPSGraphExecutable automatically utilize the Apple Neural Engine (ANE) when running inference operations, or does it only execute on GPU? My model (Core ML Package) was converted from a pytouch model using coremltools with ML program type and support iOS17.0+. Any insights or documentation references would be greatly appreciated!
Replies
0
Boosts
0
Views
491
Activity
Nov ’25
face and body detection in the Vision framework a local model or a cloud model?
Is the face and body detection service in the Vision framework a local model or a cloud model? Is there a performance report? https://aninterestingwebsite.com/documentation/vision
Replies
1
Boosts
0
Views
505
Activity
Sep ’25
Unable to use ChatGPT in Xcode
When I use ChatGPT in Xcode, the following error is displayed: It was working fine before, but suddenly it became like this, without changing any configuration. Why?
Replies
2
Boosts
0
Views
378
Activity
Jul ’25
CoreML Instrument Testing Native Clawbot using FM.SyML & OAIC & Diffusion
After running performance test on my CoreML qwen3 vision, I appreciated the update where results were viewable... ON Mac it mentions Ios18 and im not sure if or how to change.. that bottle neck lead to rebuilding CoreML view. I woke up and realized I have all the pieces together... and ended up with a swift package working demo of Clawbot.. the current issue is Im trying to use gguf 3b to code it.. I have become well aware that everything I create using the big models, they soon become the default themes /layouts for everyone else simply asking for this or that (I appoligise) so here I am asking (while looking to schedule meet with dev) if its possible to speak with anyone about th 1000s of Apple Intelligence PCC, Xcode, and vision reports and feedback ive sent , in terms of just general ways I can work more efficiently without the crash... ive already build a TUI for MLX but the tools for coreML while seems promising are not intuitive, but the vision format instruction was nice to see. Anyway my question is:
Replies
0
Boosts
0
Views
95
Activity
Feb ’26
SpeechAnalyzer / AssetInventory and preinstalled assets
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system. Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
Replies
1
Boosts
0
Views
274
Activity
Jul ’25
ImagePlayground: Programmatic Creation Error
Hardware: Macbook Pro M4 Nov 2024 Software: macOS Tahoe 26.0 & xcode 26.0 Apple Intelligence is activated and the Image playground macOS app works Running the following on xcode throws ImagePlayground.ImageCreator.Error.creationFailed Any suggestions on how to make this work? import Foundation import ImagePlayground Task { let creator = try await ImageCreator() guard let style = creator.availableStyles.first else { print("No styles available") exit(1) } let images = creator.images( for: [.text("A cat wearing mittens.")], style: style, limit: 1) for try await image in images { print("Generated image: \(image)") } exit(0) } RunLoop.main.run()
Replies
0
Boosts
0
Views
330
Activity
Sep ’25
Custom keypoint detection model through vision api
Hi there, I have a custom keypoint detection model and want to use it via vision's CoremlRequest API. Here's some complication for input and output: For input My model expect 512x512 a image. Which would be resized and padded from a 1920x1080 frame. I use the .scaleToFit option, but can I also specify the color used for padding? For output: My model output a CoreMLFeatureValueObservation, can I have it output in a format vision recognizes? such as joints/keypoints If my model is able to output in a format vision recognizes, would it take care to restoring the coordinates back to the original frame? (undo the padding) If not, how do I restore it from .scaletofit option? Best,
Replies
1
Boosts
0
Views
935
Activity
Oct ’25
Context window 90% of adapter model full after single user prompt
I have been able to train an adapter on Google's Colaboratory. I am able to start a LanguageModelSession and load it with my adapter. The problem is that after one simple prompt, the context window is 90% full. If I start the session without the adapter, the same simple prompt consumes only 1% of the context window. Has anyone encountered this? I asked Claude AI and it seems to think that my training script needs adjusting. Grok on the other hand is (wrongly, I tried) convinced that I just need to tweak some parameters of LanguageModelSession or SystemLanguageModel. Thanks for any tips.
Replies
13
Boosts
0
Views
3.2k
Activity
Feb ’26
Threading issues when using debugger
Hi, I am modifying the sample camera app that is here: https://aninterestingwebsite.com/tutorials/sample-apps/capturingphotos-camerapreview ... In the processPreviewImages, I am using the Vision APIs to generate a segmentation mask for a person/object, then compositing that person onto a different background (with some other filtering). The filtering and compositing is done via CoreImage. At the end, I convert the CIImage to a CGImage then to a SwiftUI Image. When I run it on my iPhone, it works fine, and has not crashed. When I run it on the iPhone with the debugger, it crashes within a few seconds with: EXC_BAD_ACCESS in libRPAC.dylib`std::__1::__hash_table<std::__1::__hash_value_type<long, qos_info_t>, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::hash, std::__1::equal_to, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::equal_to, std::__1::hash, true>, std::__1::allocator<std::__1::__hash_value_type<long, qos_info_t>>>::__emplace_unique_key_args<long, std::__1::piecewise_construct_t const&, std::__1::tuple<long const&>, std::__1::tuple<>>: It had previously been working fine with the debugger, so I'm not sure what has changed. Is there a difference in how the Vision APIs are executed if the debugger is attached vs. not?
Replies
1
Boosts
0
Views
418
Activity
Jan ’26
Image playground stuck
Got new iPhone Boxing Day all works bar image playground uninstalled/reinstalled turns ai on/off still stuck
Replies
1
Boosts
0
Views
519
Activity
Dec ’25
ANE Error with Statefu Model: "Unable to compute prediction" when State Tensor width is not 32-aligned
Hi everyone, I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15). The Issue: A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode. Error Message in Xcode UI: "There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model." Observations: Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE. Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE. This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment. Reproduction Code (PyTorch + coremltools): import torch.nn as nn import coremltools as ct import numpy as np class RNN_Stateful(nn.Module): def __init__(self, hidden_shape): super(RNN_Stateful, self).__init__() # Simple conv to update state self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1) self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1) self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16)) def forward(self, imgs): self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1)) return self.conv2(self.hidden_state) # h=480, w=255 causes ANE failure. w=256 works. b, ch, h, w = 1, 3, 480, 255 model = RNN_Stateful((b, ch, h, w)).eval() traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w)) mlmodel = ct.convert( traced_model, inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)], outputs=[ct.TensorType(name="output", dtype=np.float16)], states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")], minimum_deployment_target=ct.target.iOS18, convert_to="mlprogram" ) mlmodel.save("rnn_stateful.mlpackage") Steps to see the error: Open the generated .mlpackage in Xcode 16.0+. Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac). The report will fail to generate with the error mentioned above. Environment: OS: macOS 15.2 Xcode: 16.3 Hardware: M4 Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime? Any insights or workarounds (other than manual padding) would be appreciated.
Replies
0
Boosts
0
Views
478
Activity
Dec ’25
Any Recommandation for a Image Enhance and Denoise Model
I'm really not familiar with ML, but I need a model that can enhance and denoise 4k video stream at 30fps. I have tried to search latest papers but they all have very complex structure, and I don't think I can convert them to mlmodel. So can anyone give me any recommandation for such models? If there is an existing mlmodel, that would be great!
Replies
0
Boosts
0
Views
262
Activity
Oct ’25
Is there anywhere to get precompiled WhisperKit models for Swift?
If try to dynamically load WhipserKit's models, as in below, the download never occurs. No error or anything. And at the same time I can still get to the huggingface.co hosting site without any headaches, so it's not a blocking issue. let config = WhisperKitConfig( model: "openai_whisper-large-v3", modelRepo: "argmaxinc/whisperkit-coreml" ) So I have to default to the tiny model as seen below. I have tried so many ways, using ChatGPT and others, to build the models on my Mac, but too many failures, because I have never dealt with builds like that before. Are there any hosting sites that have the models (small, medium, large) already built where I can download them and just bundle them into my project? Wasted quite a large amount of time trying to get this done. import Foundation import WhisperKit @MainActor class WhisperLoader: ObservableObject { var pipe: WhisperKit? init() { Task { await self.initializeWhisper() } } private func initializeWhisper() async { do { Logging.shared.logLevel = .debug Logging.shared.loggingCallback = { message in print("[WhisperKit] \(message)") } let pipe = try await WhisperKit() // defaults to "tiny" self.pipe = pipe print("initialized. Model state: \(pipe.modelState)") guard let audioURL = Bundle.main.url(forResource: "44pf", withExtension: "wav") else { fatalError("not in bundle") } let result = try await pipe.transcribe(audioPath: audioURL.path) print("result: \(result)") } catch { print("Error: \(error)") } } }
Replies
0
Boosts
0
Views
122
Activity
Jun ’25
How to get access to VisionPro cameras?
Access to VisionPro cameras is required for a research project. The project is on mixed reality software development for healthcare applications in dentistry.
Replies
1
Boosts
0
Views
614
Activity
Jul ’25
Does ImageRequestHandler(data:) include depth data from AVCapturePhoto?
Hi all, I'm capturing a photo using AVCapturePhotoOutput, and I've set: let photoSettings = AVCapturePhotoSettings() photoSettings.isDepthDataDeliveryEnabled = true Then I create the handler like this: let data = photo.fileDataRepresentation() let handler = try ImageRequestHandler(data: data, orientation: .right) Now I’m wondering: If depth data delivery is enabled, is it actually included and used when I pass the Data to ImageRequestHandler? Or do I need to explicitly pass the depth data using the other initializer? let handler = try ImageRequestHandler( cvPixelBuffer: photo.pixelBuffer!, depthData: photo.depthData, orientation: .right ) In short: Does ImageRequestHandler(data:) make use of embedded depth info from AVCapturePhoto.fileDataRepresentation() — or is the pixel buffer + explicit depth data required? Thanks for any clarification!
Replies
1
Boosts
0
Views
282
Activity
Jul ’25
Getting FoundationsModel running in Simulator
I have a mac (M4, MacBook Pro) running Tahoe 26.0 beta. I am running Xcode beta. I can run code that uses the LLM in a #Preview { }. But when I try to run the same code in the simulator, I get the 'device not ready' error and I see the following in the Settings app. Is there anything I can do to get the simulator to past this point and allowing me to test on it with Apple's LLM?
Replies
3
Boosts
0
Views
393
Activity
Jul ’25