Friday, 2 January 2026

Porting WeDLM to MLX

Tencent's WeDLM paper reports a 3x speedup over vLLM using diffusion-style parallel decoding: multiple tokens are generated per forward pass while the model stays compatible with a standard KV cache. This post documents an independent port of WeDLM to Apple's MLX framework. It covers the architecture decisions, performance optimizations, and implementation details from a week of development on an M4 Max.

[... 6,963 words]

2:50 am / ML, Apple-Silicon, MLX

Zimeng Xiong's Weblog
