2026-01-21

1/21/2026, 12:00:00 AM ~ 1/22/2026, 12:00:00 AM (UTC) Recent Announcements Amazon SageMaker HyperPod introduces enhanced lifecycle scripts debugging Amazon SageMaker HyperPod now provides enhanced troubleshooting capabilities for lifecycle scripts, making it easier to identify and resolve issues during cluster node provisioning. SageMaker HyperPod helps you provision resilient clusters for running AI/ML workloads and developing state-of-the-art models such as large language models (LLMs), diffusion models, and foundation models (FMs). When lifecycle scripts encounter issues during cluster creation or node operations, you now receive detailed error messages that include the specific CloudWatch log group and log stream names where you can find execution logs for lifecycle scripts....
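Once an error message names the log group and log stream, the execution logs can be read programmatically. The following is a minimal sketch, not part of the announcement: it assumes a client object that exposes the CloudWatch Logs `get_log_events` operation (for example, one created with `boto3.client("logs")`), and the helper name `iter_log_messages` is hypothetical.

```python
from typing import Any, Iterator


def iter_log_messages(logs_client: Any, log_group: str, log_stream: str) -> Iterator[str]:
    """Yield every message in one CloudWatch log stream, oldest first.

    logs_client is assumed to expose get_log_events with the CloudWatch Logs
    request/response shape (e.g. a boto3 "logs" client).
    """
    token = None
    while True:
        kwargs = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            "startFromHead": True,  # read from the beginning of the stream
        }
        if token is not None:
            kwargs["nextToken"] = token
        page = logs_client.get_log_events(**kwargs)
        for event in page.get("events", []):
            yield event["message"]
        next_token = page.get("nextForwardToken")
        # The end of the stream is reached when the service returns
        # the same forward token that was passed in.
        if next_token is None or next_token == token:
            break
        token = next_token
```

To use it, pass the log group and log stream names copied from the error message, then print or search the yielded lines for the failing lifecycle script step.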

January 21, 2026

2026-01-12

1/12/2026, 12:00:00 AM ~ 1/13/2026, 12:00:00 AM (UTC) Recent Announcements Amazon SageMaker HyperPod now validates service quotas before creating clusters on console The Amazon SageMaker HyperPod console now validates service quotas for your AWS account before initiating cluster creation, so you can confirm that sufficient quota is available before provisioning begins. SageMaker HyperPod helps you provision resilient clusters for running AI/ML workloads and developing state-of-the-art models such as large language models (LLMs), diffusion models, and foundation models (FMs)....

January 12, 2026

2025-12-19

12/19/2025, 12:00:00 AM ~ 12/22/2025, 12:00:00 AM (UTC) Recent Announcements Amazon Application Recovery Controller Region switch now supports three new capabilities Amazon Application Recovery Controller (ARC) Region switch allows you to orchestrate the specific steps to switch your multi-Region applications to operate out of another AWS Region and achieve a bounded recovery time in the event of a Regional impairment to your applications. Region switch saves hours of engineering effort and eliminates the operational overhead previously required to complete failover steps, create custom dashboards, and manually gather evidence of a successful recovery for applications across your organization that are hosted in multiple AWS accounts....

December 19, 2025

2025-12-16

12/16/2025, 12:00:00 AM ~ 12/17/2025, 12:00:00 AM (UTC) Recent Announcements AWS IoT Device Management Commands now supports dynamic payloads AWS IoT Device Management commands now support dynamic payloads, enabling developers to create reusable command templates with placeholders that are replaced with different values during command execution. The enhancement also includes parameter validation rules that verify parameter values conform to specified criteria before execution. With this update, developers can set a command's parameters at runtime, making the command easier to reuse....
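The idea of a reusable template with placeholders and pre-execution validation can be illustrated generically. The sketch below is not the AWS IoT API: the `${name}` placeholder syntax (borrowed from Python's `string.Template`), the `render_payload` helper, and the rule format (one predicate per parameter) are all assumptions made for illustration, since the announcement does not specify them.

```python
from string import Template
from typing import Any, Callable, Dict


def render_payload(
    template: str,
    params: Dict[str, Any],
    rules: Dict[str, Callable[[Any], bool]],
) -> str:
    """Validate params against rules, then fill the template's placeholders.

    Hypothetical helper: the placeholder syntax and rule format are
    illustrative assumptions, not the service's actual formats.
    """
    for name, check in rules.items():
        if name not in params:
            raise ValueError(f"missing parameter: {name}")
        if not check(params[name]):
            raise ValueError(f"invalid value for {name}: {params[name]!r}")
    # substitute() raises KeyError if the template names a parameter
    # that was not supplied.
    return Template(template).substitute({k: str(v) for k, v in params.items()})


# One template, reused with different runtime values.
SET_TEMP = '{"action": "set_temp", "target": ${target}}'
RULES = {"target": lambda v: isinstance(v, int) and 10 <= v <= 30}

payload = render_payload(SET_TEMP, {"target": 21}, RULES)
```

Validating before substitution means an out-of-range value is rejected before any payload is built, which mirrors the "before execution" ordering described above.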

December 16, 2025

2025-12-03

12/3/2025, 12:00:00 AM ~ 12/4/2025, 12:00:00 AM (UTC) Recent Announcements Amazon SageMaker HyperPod now supports checkpointless training Amazon SageMaker HyperPod now supports checkpointless training, a new foundation model training capability that reduces the need for a checkpoint-based, job-level restart for fault recovery. Checkpointless training maintains forward training momentum despite failures, reducing recovery time from hours to minutes. This represents a fundamental shift from traditional checkpoint-based recovery, where failures require pausing the entire training cluster, diagnosing issues manually, and restoring from saved checkpoints, a process that can leave expensive AI accelerators idle for hours and waste your organization's compute....

December 3, 2025