
    Scaling Model Training with FSDP, PyTorch, and Ray

    Web Content

    fsdp internals, distributed training, pytorch ray, deepspeed, voice cloning, qwen3-tts
    February 6, 2026
    Source

    Summary

    This document provides a comprehensive deep dive into Fully Sharded Data Parallelism (FSDP), PyTorch's native implementation of ZeRO-3, for scaling large model training. It explains FSDP's internal mechanisms, memory efficiency, and communication costs through a step-by-step walkthrough of a training iteration.

    FSDP shards model parameters, gradients, and optimizer states across all GPUs, reducing per-GPU memory from 32 GB (DDP) to 8 GB (FSDP with 4 GPUs) for an 8B-parameter model. This is achieved through vertical partitioning of the model into "units" and horizontal sharding of each unit's entities across devices. The FSDP training process proceeds as follows:

    1) Initial setup: split the dataset and shard the model.
    2) Forward pass: All-Gather the parameters for each unit, compute in parallel, save activations, then Reshard.
    3) Backward pass: All-Gather the parameters, compute local gradients, Reduce-Scatter the gradients, free activations.
    4) Optimizer step: independent local updates of each shard on each GPU.

    FSDP2 offers improvements such as per-parameter sharding and native DTensor support. The document demonstrates FSDP implementation using PyTorch FSDP2 and Ray Train, covering model definition (a Vision Transformer), sharding, distributed checkpointing (DCP, for parallel I/O and automatic resharding), and the training loop. It also introduces DeepSpeed as an alternative, configurable via JSON for the different ZeRO stages.

    Finally, a real-world project fine-tunes the 1.7B-parameter Qwen3-TTS model for voice cloning. The pipeline includes data processing (Whisper transcription, Ray Data), audio code extraction (Qwen3-TTS-Tokenizer, 12 Hz, 16 codebooks), distributed SFT training (freezing most parameters, training the talker, conditioning on speaker embeddings), and inference. The Qwen3-TTS talker has 847,234,560 trainable parameters.
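    As a sanity check on the memory figures above, the parameter-storage arithmetic can be sketched in a few lines. This is a simplification that tracks fp32 parameter bytes only; gradients, optimizer states, and activations add to both the DDP and FSDP columns, and the helper name is illustrative, not from the source:

    ```python
    def param_bytes_per_gpu(n_params: int, bytes_per_param: int = 4,
                            n_gpus: int = 1, sharded: bool = False) -> int:
        """Parameter memory per GPU: DDP replicates the full model on every
        GPU, while FSDP shards it evenly across all GPUs."""
        total = n_params * bytes_per_param
        return total // n_gpus if sharded else total

    GB = 10**9
    n = 8 * 10**9  # an 8B-parameter model in fp32 (4 bytes per parameter)

    ddp = param_bytes_per_gpu(n)                            # full replica
    fsdp = param_bytes_per_gpu(n, n_gpus=4, sharded=True)   # 1/4 shard

    print(ddp // GB, "GB per GPU under DDP")    # 32 GB
    print(fsdp // GB, "GB per GPU under FSDP")  # 8 GB
    ```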
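    The All-Gather and Reduce-Scatter steps in the walkthrough can be illustrated with a toy pure-Python simulation. No real collectives are involved; plain lists stand in for per-rank parameter shards and gradients, and the function names mirror the collective semantics rather than any PyTorch API:

    ```python
    from typing import List

    def all_gather(shards: List[List[float]]) -> List[float]:
        # Forward/backward pass of a unit: every rank receives the
        # concatenation of all shards, i.e. the full unit parameters.
        return [x for shard in shards for x in shard]

    def reduce_scatter(per_rank_grads: List[List[float]],
                       rank: int, world: int) -> List[float]:
        # Backward pass: gradients are summed element-wise across ranks,
        # and each rank keeps only the slice matching its parameter shard.
        summed = [sum(col) for col in zip(*per_rank_grads)]
        shard_len = len(summed) // world
        return summed[rank * shard_len:(rank + 1) * shard_len]

    # Two ranks, each owning half of a 4-parameter unit.
    shards = [[1.0, 2.0], [3.0, 4.0]]
    full = all_gather(shards)                     # [1.0, 2.0, 3.0, 4.0]

    # Each rank computed full local gradients; after Reduce-Scatter each
    # rank holds only the summed gradient slice for its own shard.
    grads = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
    g0 = reduce_scatter(grads, rank=0, world=2)   # [3.0, 3.0]
    ```

    After this, the optimizer step needs no communication at all: each rank updates only the shard it owns, which is why step 4 above is purely local.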
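    Since the document mentions DeepSpeed's JSON-driven ZeRO configuration, a minimal sketch of a ZeRO stage 3 config may be useful. It is written here as a Python dict serialized to the JSON form DeepSpeed reads; the field names follow DeepSpeed's documented schema, but the values are illustrative, not taken from the source:

    ```python
    import json

    # Minimal DeepSpeed configuration sketch for ZeRO stage 3, the stage
    # equivalent to FSDP's full sharding of parameters, gradients, and
    # optimizer states. Values are placeholders.
    ds_config = {
        "train_batch_size": 32,
        "zero_optimization": {
            "stage": 3,            # 1: optimizer states; 2: + gradients; 3: + parameters
            "overlap_comm": True,  # overlap collectives with computation
        },
        "bf16": {"enabled": True},
    }

    # DeepSpeed is typically handed this as a JSON file at initialization.
    print(json.dumps(ds_config, indent=2))
    ```

    Switching between ZeRO stages is then a one-line config change, which is the main ergonomic difference from FSDP's code-level sharding calls.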
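    The audio token budget implied by the tokenizer figures (12 Hz frame rate, 16 codebooks per frame) works out as below; the helper name is hypothetical:

    ```python
    def num_audio_codes(duration_s: float, frame_rate_hz: int = 12,
                        n_codebooks: int = 16) -> int:
        # Codec tokens for a clip: frames (duration x frame rate),
        # times the number of codebook entries per frame.
        frames = int(duration_s * frame_rate_hz)
        return frames * n_codebooks

    # A 10-second clip: 120 frames x 16 codebooks = 1920 codes.
    print(num_audio_codes(10.0))
    ```

    At 192 codes per second of audio, sequence lengths during SFT stay modest even for minute-long clips, which helps explain why a 1.7B model is trainable with only the talker unfrozen.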
