2 posts tagged "MLX"
2026
Porting WeDLM to MLX
Tencent's WeDLM paper reports a 3x speedup over vLLM using diffusion-style parallel decoding: multiple tokens are generated per forward pass while KV cache compatibility is preserved. This post documents an independent port of WeDLM to Apple's MLX framework, covering the architecture decisions, performance optimizations, and implementation details from a week of development on an M4 Max.
[... 6,963 words]
2025
Porting Meta SAM-3D to Apple Silicon: Custom Metal Kernels and Memory Magic
TL;DR: This post documents the process of porting Meta's SAM-3D Objects (a 12GB foundation model for single-image 3D reconstruction) from CUDA/Linux to Apple Silicon macOS. The work involved rebuilding sparse convolution backends, implementing custom Metal compute shaders, and engineering a sequential model loading strategy that reduced peak memory from 61GB to 17GB.
[... 2,804 words]