Search: Paralives

Implementing Falcon-H1 Hybrid Architecture in NVIDIA Megatron Core | NVIDIA Technical Blog

…In Megatron Core (Megatron-LM), TII contributed: the foundational ParallelHybridLayer, a layer that runs Mamba and attention in parallel and sums their outputs; the updated layer allocation logic that introduces the PARALLEL…

Mar 9, 2026 · Mireille Fares

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core | NVIDIA Technical Blog

By Kunlun Li, Tailai Ma, Parth Mannan, Sophia Yang, Guohao Wu and…

Jan 28, 2026 · Kunlun Li

Accelerating Long-Context Model Training in JAX and XLA | NVIDIA Technical Blog

…Context parallelism and ring attention: Context parallelism (CP) is a parallelization strategy designed specifically for handling long sequences in transformer models. Unlike data parallelism, which splits the batch, or tensor parallelism, which…

Feb 3, 2026 · Sevin Fide Varoglu

Scaling Biomolecular Modeling Using Context Parallelism in NVIDIA BioNeMo | NVIDIA Technical Blog

By Dejun Lin, Kyle Tretina, Roy Tal and Neha Tadimeti

Apr 28, 2026 · Dejun Lin

NVIDIA Megatron Core

…Explore Features and Benefits of NVIDIA Megatron-Core. Parallelism Techniques: The Megatron Core library offers advanced model parallelism techniques, including tensor, sequence, pipeline, context, and MoE expert parallelism, for large-scale training…

Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel | NVIDIA Technical Blog

By Fan Yu, Tong Liu and Kai Sun

Feb 2, 2026 · Fan Yu

Deploying Disaggregated LLM Inference Workloads on Kubernetes | NVIDIA Technical Blog

…Scaling disaggregated inference pipelines involves per-role and per-tensor-parallel-group scaling, with application-level autoscalers like NVIDIA Dynamo and llm-d's workload variant autoscaler maintaining optimal ratios across roles based…

Mar 23, 2026 · Anish Maddipoti

Using Accelerated Computing to Live-Steer Scientific Experiments at Massive Research Facilities | NVIDIA Technical Blog

…Both facilities move beyond batch analysis, favoring modular, highly parallel pipelines that execute reliably regardless of experiment size. Data movement, transformation, and extraction are automated to the degree that human oversight is…

Feb 10, 2026 · Quynh L. Nguyen

NVIDIA Isaac Lab-Arena

…You can also run large-scale, GPU-accelerated, parallel evaluations. It’s built on NVIDIA Isaac Lab, enabling rapid prototyping across diverse embodiments, objects, and environments without complex system building. This makes…

Scaling Autonomous AI Agents and Workloads with NVIDIA DGX Spark | NVIDIA Technical Blog

…As additional nodes are added, this overhead becomes increasingly dominant, limiting scaling efficiency. Parallelism for AI agents: Inference at scale. Tensor parallelism enables efficient inference sharing across multiple nodes to fit the…

Mar 16, 2026 · Allen Bourgoyne
