Zimeng Xiong's Weblog

2 posts tagged "MLX"

2026

Porting WeDLM to MLX

Tencent's WeDLM paper reports a 3x speedup over vLLM using diffusion-style parallel decoding: multiple tokens are generated per forward pass while KV cache compatibility is maintained. This post documents an independent port of WeDLM to Apple's MLX framework, covering the architecture decisions, performance optimizations, and implementation details from a week of development on an M4 Max.

[... 6,963 words]

2025

Porting Meta SAM-3D to Apple Silicon: Custom Metal Kernels and Memory Magic

TL;DR: This post documents the process of porting Meta's SAM-3D Objects (a 12 GB foundation model for single-image 3D reconstruction) from CUDA/Linux to macOS on Apple Silicon. The work involved rebuilding sparse convolution backends, implementing custom Metal compute shaders, and engineering a sequential model loading strategy that reduced peak memory from 61 GB to 17 GB.

[... 2,804 words]