CloudSoda Release 25.10

Release 25.10 speeds up file scanning and reporting and adds flexibility to data governance workflows. It also introduces deeper integration with PowerScale quotas, expands our global connectivity options, and accelerates onboarding with a new bulk provisioning tool.

Highlights

Accelerated Windows Scanning: We’ve optimized how the platform interacts with Windows file systems, resulting in scan speeds up to 2x faster by reducing system call overhead.
Rapid Resource Onboarding: A new CLI-based bulk provisioning tool allows administrators to configure storages, agents, and scopes via CSV, replacing manual clicking with automated, idempotent execution.
Decoupled Duplicate Detection: Duplicate detection is now a standalone, schedulable job. This allows for dedicated storage analysis without tying up resources during standard indexing scans.
Real-Time PowerScale Quotas: A new API provides near real-time tracking of PowerScale quota metrics, offering better visibility into storage limits and usage.
Metadata Preservation: Enhanced data fidelity options now allow you to preserve object metadata when transferring between buckets across AWS, Google Cloud, and Azure.

New Features

Data Orchestration

Bulk Resource Provisioning (CLI): We have introduced a new command-line tool designed to streamline onboarding and large-scale configuration.
- CSV-Driven Setup: Users can now provision multiple storages, accessors, agents, and scope assignments in a single operation by uploading a defined CSV file.
- Idempotent & Safe: The tool checks whether each resource already exists before attempting to create it, so repeated runs do not produce duplicates or duplication errors. It also includes validation logic specific to SMB accessors to ensure connection integrity.
- Operational Velocity: This feature reduces the risk of manual configuration errors and speeds up the deployment of complex multi-storage environments.
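The check-before-create behavior described above can be pictured with a minimal Python sketch. This is an illustration only, not the CloudSoda tool itself: the CSV columns (`kind`, `name`, `target`), the `provision` function, and the in-memory `existing` set standing in for platform state are all hypothetical.

```python
import csv
import io

# Hypothetical CSV layout; the real column names are defined by the CloudSoda tool.
SAMPLE_CSV = """kind,name,target
storage,nas-01,smb://nas-01/projects
storage,s3-archive,s3://archive-bucket
agent,agent-east,nas-01
storage,nas-01,smb://nas-01/projects
"""


def provision(rows, existing):
    """Create each resource only if it is not already present.

    `existing` is a set of (kind, name) pairs standing in for the
    platform's current state. Returns (created, skipped) lists.
    """
    created, skipped = [], []
    for row in rows:
        key = (row["kind"], row["name"])
        if key in existing:
            skipped.append(key)  # already present: leave it untouched
            continue
        existing.add(key)  # stand-in for the real "create" call
        created.append(key)
    return created, skipped


def run(csv_text, existing):
    """Parse the CSV text and provision every row against `existing`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return provision(rows, existing)


state = set()
first = run(SAMPLE_CSV, state)   # creates three resources, skips the repeated row
second = run(SAMPLE_CSV, state)  # re-running the same file creates nothing
```

Because every row is checked against existing state first, running the same file twice leaves the environment unchanged, which is the property that makes the operation safe to retry.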
Metadata Preservation Control (API Only): Added a specific job option to preserve metadata stored on objects during transfers between S3, GCS, and Azure buckets. This ensures critical context travels with your data during migrations.
PowerScale Quota Visibility (API Only): Users can now access a read-only interface for PowerScale quota metrics via GraphQL. This includes support for filtering by path and thresholds, providing immediate insight into storage consumption versus limits.
Agent Network Diagnostics: Added a ping command to the Agent CLI. This allows administrators to test network connectivity and peer-to-peer mesh status directly from the agent for faster troubleshooting. For example: soda ping <orgid>.agent.cloudsoda.io
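To illustrate the path and threshold filtering mentioned for PowerScale quota metrics, here is a small client-side sketch in Python. It does not call the GraphQL API, whose schema is not described in these notes; the `Quota` record, its field names, and the `filter_quotas` helper are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    """Hypothetical quota record; field names are assumptions."""
    path: str              # directory the quota applies to
    usage_bytes: int       # current consumption
    hard_limit_bytes: int  # configured limit


def filter_quotas(quotas, path_prefix="/", min_usage_ratio=0.0):
    """Return (path, usage ratio) for quotas under a path and at or above a threshold."""
    prefix = path_prefix.rstrip("/") + "/"
    selected = []
    for q in quotas:
        # Compare with trailing slashes so "/ifs/projects2" does not match "/ifs/projects".
        under_prefix = (q.path.rstrip("/") + "/").startswith(prefix)
        ratio = q.usage_bytes / q.hard_limit_bytes if q.hard_limit_bytes else 0.0
        if under_prefix and ratio >= min_usage_ratio:
            selected.append((q.path, round(ratio, 2)))
    return selected


GIB = 2 ** 30
sample = [
    Quota("/ifs/projects/alpha", 900 * GIB, 1000 * GIB),
    Quota("/ifs/projects/beta", 400 * GIB, 1000 * GIB),
    Quota("/ifs/projects2/gamma", 990 * GIB, 1000 * GIB),
    Quota("/ifs/home", 950 * GIB, 1000 * GIB),
]

# Quotas under /ifs/projects that have consumed at least 80% of their limit.
near_limit = filter_quotas(sample, "/ifs/projects", 0.8)
```

In this sample, only `/ifs/projects/alpha` is both under the requested path and above the 80% threshold.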

Data Intelligence

Standalone Duplicate Detection Jobs: Duplicate detection has been separated from standard scan jobs. You can now configure reports to run duplicate detection against specific storage targets independently, improving workflow flexibility and report accuracy.
Enhanced Search Filters:
- Time-Based Filtering: Added an Indexed Date filter to search results, helping users identify files based on when they were last processed by the system.
- Folder Picker: A new interactive folder tree in the Search section allows for precise selection of starting paths, eliminating the need to type paths manually.

Feature Spotlight: Duplicate Detection Redesign

We have fully redesigned the duplicate detection feature. This new architecture transforms duplicate detection into a targeted, scalable reporting tool, giving you precise control over data hygiene and cost optimization.

Key Capabilities:

Custom Scheduling & Control: A new "Report" job type (found under Intelligence > Schedules) allows you to run duplicate detection independently of standard indexing scans.
Targeted Scope: You can now define exactly which storages or folders are analyzed. Compare on-prem vs. cloud, or check specific project folders without scanning the entire ecosystem.
Resource Protection: New job blocking logic prevents resource conflicts, ensuring report generation doesn’t impact ongoing operations.
Fault Tolerance: The system is now resilient to application restarts, ensuring long-running reports complete successfully.

Why the Change? Previously, duplicate detection was tied to scan jobs, which limited flexibility and could result in "polluted" results (e.g., flagging necessary backups as duplicates). The new engine separates detection from indexing, allowing for specific use cases like:

Deduplication within a single project.
Comparison between specific on-prem and cloud volumes.
Excluding specific "safe" zones from analysis.
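The use cases above come down to choosing which paths take part in the comparison. The following Python sketch shows the general idea with content hashing over an in-memory file map. It is not CloudSoda's detection engine; `find_duplicates`, the prefix-based scoping, and the sample paths are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files, include_prefixes, exclude_prefixes=()):
    """Group in-scope files that share identical content.

    `files` maps path -> bytes and stands in for indexed file content.
    Returns a sorted list of duplicate groups, each a sorted list of paths.
    """
    def in_scope(path):
        included = any(path.startswith(p) for p in include_prefixes)
        excluded = any(path.startswith(p) for p in exclude_prefixes)
        return included and not excluded

    by_digest = defaultdict(list)
    for path, content in files.items():
        if in_scope(path):
            by_digest[hashlib.sha256(content).hexdigest()].append(path)
    # Only digests seen more than once represent duplicates.
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)


files = {
    "/onprem/proj/a.mov": b"frame-data",
    "/cloud/proj/a.mov": b"frame-data",
    "/onprem/backup/a.mov": b"frame-data",
    "/onprem/proj/b.txt": b"notes",
}

# Compare on-prem and cloud copies while treating the backup area as a "safe" zone.
scoped = find_duplicates(files, ["/onprem/", "/cloud/"], ["/onprem/backup/"])
# The same comparison without the exclusion also flags the backup copy.
unscoped = find_duplicates(files, ["/onprem/", "/cloud/"])
```

With the backup folder excluded, the report lists only the on-prem and cloud copies of `a.mov`; without the exclusion, the backup is flagged as well, which is the kind of "polluted" result described above.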

Note: With this update, the legacy "isDuplicate" column and filters have been removed from the standard Search and File Management interfaces to prevent misleading data views.

Improvements & Enhancements

Performance & Reliability

Windows Scanning Optimization: Removed unnecessary metadata retrieval steps (inodes) during Windows file scans, significantly reducing kernel overhead and accelerating scan completion times.
Duplicate Detection Accuracy and Speed: Improved the underlying duplicate detection logic to eliminate false positives and better manage memory during large-scale analysis, resulting in dramatically faster processing of large file sets.
Resilient Indexing: Implemented smarter retry logic (exponential back-off) when the search index is under heavy load, preventing temporary congestion from failing entire scan jobs.
UI Responsiveness: Major updates to form handling (Policies, Jobs, Catalogs) have eliminated UI lag when editing complex configurations.
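For readers unfamiliar with the exponential back-off mentioned under Resilient Indexing, the pattern can be sketched in Python as follows. This is a generic illustration rather than the platform's implementation: the `IndexBusy` exception, the `with_backoff` helper, and the delay values are assumptions.

```python
import time


class IndexBusy(Exception):
    """Hypothetical error raised when the search index rejects work under load."""


def with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call `operation`, doubling the wait after each IndexBusy failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except IndexBusy:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(min(max_delay, base_delay * 2 ** attempt))


# Demo: an operation that fails twice under load, then succeeds.
calls = {"count": 0}


def flaky_index_write():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise IndexBusy("index under heavy load")
    return "indexed"


delays = []  # record waits instead of actually sleeping
result = with_backoff(flaky_index_write, sleep=delays.append)
```

In the demo the call succeeds on the third attempt after waits of 0.5 and 1.0 seconds, so a short burst of congestion delays the job instead of failing it. Production implementations often add random jitter to the delay so that many clients do not retry at the same instant.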

Global Connectivity & Infrastructure

Expanded Azure Support: Added support for the Belgium Central region.
New AWS Direct Connect Locations: Added support for Digital Realty MAD3 (Madrid, Spain) and DataBank LAS1 (Las Vegas, USA).
Kernel Validation: The Agent now validates the Linux kernel version at startup to ensure compatibility and prevent runtime errors on unsupported legacy systems.
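A startup kernel check of the kind described under Kernel Validation might look like the Python sketch below. The minimum version shown is a placeholder, since these notes do not state the actual requirement, and the function names are hypothetical.

```python
import platform
import re

# Placeholder minimum (major, minor); the Agent's real requirement is not stated here.
MIN_KERNEL = (4, 18)


def parse_kernel(release):
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supported(release, minimum=MIN_KERNEL):
    """Return True if the release is at or above the minimum version."""
    return parse_kernel(release) >= minimum


def check_at_startup():
    """Exit early with a clear message on an unsupported Linux kernel."""
    if platform.system() != "Linux":
        return  # the check only applies to Linux hosts
    release = platform.release()
    if not kernel_supported(release):
        raise SystemExit(f"unsupported kernel {release}; minimum is {MIN_KERNEL}")
```

Failing at startup with an explicit message is easier to diagnose than a runtime error that surfaces later, when a missing kernel feature is first used.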

Security & Governance

Scoped Project Visibility: Projects and Reports are now strictly pre-filtered based on a user's authorized storage access. Users will no longer see references to resources containing storage IDs they do not have permission to view.
Localization: Currency display in the Data Intelligence module now automatically adapts to the user’s browser locale settings.

Bug Fixes

Data Orchestration

AWS Custom URLs: Fixed an issue where Agents were defaulting to standard regional endpoints instead of configured custom API URLs for AWS storage.
Dry Run Accuracy: Resolved an issue preventing dry runs from executing correctly when conflict rules were configured.
Network Discovery: Fixed an issue where certain IPv4 addresses were omitted during network interface enumeration, ensuring more reliable peer-to-peer connectivity.
Wasabi Compatibility: Addressed a schema validation issue regarding storage usage reporting for specific Wasabi storage classes.

Data Intelligence

Scanner Stability: Fixed an issue where scanner processes could hang indefinitely without updating their status.
Orphaned Indices: Resolved a case where scanning storage could occasionally leave behind unaliased indices.
Report Filters: Corrected field mapping in the Report section to ensure filters apply correctly to generated data.
Job Management: Fixed a UI issue that occurred when navigating away from a job page immediately after cancelling a task.

Important Notices

Reports Page Deprecation: The "Reports" page in the Data Orchestration module is deprecated and will be removed in a future release. We are consolidating reporting features into the Data Intelligence module to reduce redundancy. If this change impacts any part of your current data workflow, please reach out to our support team immediately.

Data Orchestration Rules Toggle Persistence Deprecation: The ability to save the enabled/disabled status of individual rules under the rules tabs for Quick Transfer or Policy movement jobs in the Data Orchestration module is deprecated and will be removed in a future release. The toggle itself will remain available in the UI for one-time testing and validation during job creation, but its status will no longer be saved with the job. This change is part of an ongoing initiative to streamline and standardize the platform experience. If this change impacts any part of your current data workflow, please reach out to our support team immediately.
