LLM TRAINING CONSOLE

POWERED BY GOOGLE CLOUD TPU v5p PODS
STATUS: Google for Startups Applicant - Cloud Credits Pending
Accelerator Pod
Topology 4x4x4 (64 Chips)
Pod State ACTIVE TRAIN
MXU Utilization 0%
HBM Bandwidth 0 GB/s
Training Progress
Dataset HPLT, C4, OSCAR, Wikipedia, YouTube, SlimPajama, FineWeb-Edu
Optimizer AdamW (β1=0.9, β2=0.95)
TOKENS / SEC 0
EST. EPOCH END --:--
Epoch 3/10 Progress 0%
Loss & Perplexity
Global Step: 14205
Training Loss 2.4012
Validation Loss 2.4500
Teacher Loss 2.1005
Perplexity 11.03
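The perplexity figure above appears to follow from the training loss. A minimal sketch, assuming the console reports a mean per-token cross-entropy loss in nats and defines perplexity as its exponential (the console does not state this; the function name `perplexity` is hypothetical):

```python
import math


def perplexity(mean_token_loss: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity.

    Assumption: the loss is a natural-log cross-entropy averaged over tokens.
    """
    return math.exp(mean_token_loss)


if __name__ == "__main__":
    # Loss values copied from the console readout above.
    for name, loss in [("training", 2.4012), ("validation", 2.4500), ("teacher", 2.1005)]:
        print(f"{name}: loss={loss:.4f} perplexity={perplexity(loss):.2f}")
```

Under this assumption the training loss of 2.4012 gives a perplexity of about 11.04, close to the displayed 11.03; the small gap may come from truncation or from the two values being sampled at slightly different steps.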
Training Daemon (Master-0)
> Initializing TPU Mesh (4x4x4)... DONE
> Loading Sharded Checkpoint v2.4... DONE
> Resuming training from step 14200...
Kernel / Hardware Events