According to ChatGPT (again):
The fact that djay Pro (by Algoriddim) manages to do stem separation across a wide range of devices, including older smartphones from 2016, is due to a combination of smart optimizations and dynamic adaptation to available hardware.
Here’s a breakdown:
1. Use of adaptive / fallback AI models
Yes, they are almost certainly using multiple versions of models with varying complexity and quality levels.
The more powerful the device, the larger and more precise the model used.
The less powerful, the lighter and more approximate the algorithm.
So:
- An iPhone 14 Pro Max likely uses a heavier, high-quality model, similar to Demucs v4 or Stems 2.0
- A 2016 Android phone may use a compressed or simplified model, possibly doing only basic separation (e.g., vocals vs instrumental)
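Open-source models show the same tiering: Deezer’s Spleeter ships 2-, 4-, and 5-stem models, Meta’s Demucs comes in several variants, and OpenAI’s Whisper (a speech-recognition model, not a stem separator) is published in several sizes (tiny, base, medium, etc.).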
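The tier selection described above can be sketched roughly as follows. This is only an illustration of the idea: the `DeviceProfile` fields, the tier names, and every threshold are assumptions made up for the example, not anything known about djay Pro's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical capability summary; a real app would query the OS."""
    ram_gb: float
    has_npu: bool


@dataclass(frozen=True)
class ModelTier:
    name: str          # illustrative label, not a real djay Pro model name
    stems: int         # number of stems the tier outputs
    min_ram_gb: float  # assumed memory floor for this tier
    needs_npu: bool    # assumed accelerator requirement


# Ordered from heaviest to lightest; all thresholds are invented for illustration.
TIERS = [
    ModelTier("large-hybrid", stems=4, min_ram_gb=6.0, needs_npu=True),
    ModelTier("medium", stems=4, min_ram_gb=3.0, needs_npu=False),
    ModelTier("tiny-int8", stems=2, min_ram_gb=0.0, needs_npu=False),
]


def pick_tier(device: DeviceProfile) -> ModelTier:
    """Return the heaviest tier the device satisfies, else the lightest one."""
    for tier in TIERS:
        enough_ram = device.ram_gb >= tier.min_ram_gb
        accelerator_ok = device.has_npu or not tier.needs_npu
        if enough_ram and accelerator_ok:
            return tier
    return TIERS[-1]
```

For example, `pick_tier(DeviceProfile(ram_gb=2, has_npu=False))` falls through to the lightest tier, while a device with 8 GB of RAM and an NPU gets the heaviest one.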
2. Quantization, pruning, and optimization for CPU/NPU
To run on less powerful phones:
- Models are usually quantized to INT8, or even INT4 for extremely lightweight versions
- They are pruned (removing unnecessary parameters)
- They’re optimized for:
  - ARM CPUs
  - NPUs when available (e.g., Apple Neural Engine, Snapdragon Hexagon DSP)
  - Mobile GPUs via Metal (iOS) or Vulkan (Android)
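To make the quantization and pruning points concrete, here is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization and simple magnitude pruning. It is a toy on a flat list of weights, assuming the most basic variant of each technique; real toolchains work per layer or per channel and usually calibrate or fine-tune afterwards. The function names are my own.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> (int8-range ints, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the signed 8-bit range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


def prune_by_magnitude(weights, fraction):
    """Zero out the smallest-magnitude `fraction` of the weights."""
    k = int(len(weights) * fraction)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned
```

Each weight then takes 1 byte instead of 4 (for float32), at the cost of a rounding error of at most half a quantization step per weight.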
3. “Good enough” real-time stem separation
djay Pro’s goal is not scientific perfection, but a smooth user experience.
They likely:
- Use post-processing tricks to hide imperfections
- Implement temporal caching (e.g., pre-processing the first few seconds)
- Add dynamic crossfades or EQ to give the illusion of better separation
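As an illustration of the crossfade idea, here is a small sketch of an equal-power crossfade, which could be used to blend from the original mix into a separated stem instead of switching abruptly. Whether djay Pro does anything like this is pure speculation; the function names and the choice of an equal-power curve are assumptions for the example.

```python
import math


def equal_power_gains(position):
    """Return (fade_out, fade_in) gains for a crossfade position in [0, 1]."""
    p = min(1.0, max(0.0, position))
    return math.cos(p * math.pi / 2), math.sin(p * math.pi / 2)


def crossfade(a, b):
    """Blend equal-length sample lists from a to b across their full length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = []
    for i in range(n):
        # Position runs from 0.0 at the first sample to 1.0 at the last.
        pos = i / (n - 1) if n > 1 else 1.0
        g_out, g_in = equal_power_gains(pos)
        out.append(a[i] * g_out + b[i] * g_in)
    return out
```

The squared gains always sum to 1, so the perceived loudness stays roughly constant during the transition, which helps mask the moment the separated (and possibly artifact-laden) signal takes over.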
4. Hybrid processing: local + cloud (optional)
Some apps (like Moises.ai or VirtualDJ in offline analysis mode) offer two modes:
- Local: fast, lightweight processing with decent quality
- Cloud: offloads the task to powerful servers for high-end results
Even though djay Pro appears to run everything locally, they could still:
Hypothetical multi-tier model (simplified)
| Target Device | Model Type | Separation |
|---|---|---|
| iPhone 6s (2015) | tiny-unet-int8 | Vocals vs rest |
| iPhone XR (2018) | spleeter-lite | 2–4 stems, medium |
| iPad Pro M1 | demucs v4 hybrid | 5 stems, high quality |
In summary:
Yes, djay Pro most likely uses an adaptive architecture with:
- Compressed or simplified models depending on the device
- Smart hardware utilization (NPU, CPU, GPU)
- Perceptual optimizations that make the output seem better than it may technically be
This flexibility allows them to run well on devices from 2016 all the way to current M1/M2 iPads.
Edit from me: That said, for a sampler-type device like an MPC, choosing an RK3588 may be appropriate in that you’re only processing relatively short samples (10s, 20s, 30s, etc.) and you don’t necessarily need on-the-fly separation.
But for DJ equipment where you expect on-the-fly separation as soon as the track is loaded, with track lengths of several minutes, that’s a completely different matter.
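A quick back-of-the-envelope sketch of why the two cases differ. The real-time factor used below (audio seconds processed per wall-clock second) is a placeholder I picked for the arithmetic, not a measured RK3588 figure.

```python
def processing_seconds(audio_seconds, realtime_factor):
    """Wall-clock time to separate a clip at a given real-time factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


def can_keep_up(realtime_factor):
    """On-the-fly separation needs at least 1 s of audio processed per second."""
    return realtime_factor >= 1.0


# Assumed factor of 2.0, purely for illustration:
# processing_seconds(20, 2.0) gives 10.0 s for a 20 s sample,
# processing_seconds(300, 2.0) gives 150.0 s for a 5 minute track.
```

Waiting 10 s for a sample on an MPC-style device is tolerable; waiting 150 s after loading a track on a DJ deck is not, unless the separation can be streamed chunk by chunk, which requires the factor to stay above 1 the whole time.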