CloudSoda Release 26.4

CloudSoda 26.4 introduces Catalogs (Preview), a new way to stream curated file metadata from your storage into the data platforms your teams already use. It also brings a richer Activity log, license-driven feature entitlements, and a large round of scanner reliability fixes and platform hardening across both CloudSoda Data Intelligence and CloudSoda Data Orchestration.

Highlights

Catalogs (Preview) — turn your unstructured storage into a live, queryable index inside Snowflake, Databricks, BigQuery, or any platform that reads from Kafka. Define what to include, set a refresh schedule, and CloudSoda streams the metadata for you.
License-based feature entitlements — Intelligence and Catalog capabilities now switch on automatically based on your license.
Smarter Activity log — errors and warnings now carry diagnostic codes, summaries, and expandable detail so you can see exactly what went wrong, and where.

New Features

Catalogs (Preview)

Your unstructured data lives in storage. The questions you want to answer about it live in your data platform. Catalogs close that gap.

A Catalog is a rule that selects files across your on-premises and cloud storage and continuously streams their metadata — name, path, size, owner, timestamps, storage tier, and more — into Kafka. From Kafka, that metadata flows into the analytics platform of your choice: Snowflake, Databricks, BigQuery, or any Kafka-aware destination. The result is a live index of your file estate inside the platform where your data engineers and analysts already work, so they can join unstructured-data metadata against their structured data without ever moving the files themselves.
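
As a rough illustration of what a consumer on the Kafka side might do with such a stream, the sketch below parses JSON-encoded metadata records and totals bytes per storage tier. The record layout and field names (`name`, `path`, `size_bytes`, `owner`, `modified`, `storage_tier`) are assumptions made for this example; these notes do not specify the actual message schema. In a real pipeline the messages would come from a Kafka consumer rather than an in-memory list.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    # Hypothetical field names; the real message schema may differ.
    name: str
    path: str
    size_bytes: int
    owner: str
    modified: str  # ISO 8601 timestamp, kept as text for simplicity
    storage_tier: str


def parse_record(raw: str) -> FileMetadata:
    """Decode one JSON message into a FileMetadata object."""
    doc = json.loads(raw)
    return FileMetadata(
        name=doc["name"],
        path=doc["path"],
        size_bytes=int(doc["size_bytes"]),
        owner=doc["owner"],
        modified=doc["modified"],
        storage_tier=doc["storage_tier"],
    )


def bytes_per_tier(records) -> dict:
    """Sum file sizes grouped by storage tier."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec.storage_tier] += rec.size_bytes
    return dict(totals)


# Stand-in for messages read from a Kafka topic (illustrative data only).
SAMPLE_MESSAGES = [
    json.dumps({"name": "a.dcm", "path": "/imaging/a.dcm", "size_bytes": 1024,
                "owner": "alice", "modified": "2025-01-01T00:00:00Z",
                "storage_tier": "hot"}),
    json.dumps({"name": "b.dcm", "path": "/imaging/b.dcm", "size_bytes": 2048,
                "owner": "bob", "modified": "2025-01-02T00:00:00Z",
                "storage_tier": "hot"}),
    json.dumps({"name": "c.mov", "path": "/media/c.mov", "size_bytes": 4096,
                "owner": "carol", "modified": "2025-01-03T00:00:00Z",
                "storage_tier": "archive"}),
]
```

Calling `bytes_per_tier(parse_record(m) for m in SAMPLE_MESSAGES)` gives a per-tier byte total for the sample data.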

Typical uses:

Correlate file metadata with structured business data already in your warehouse (for example, joining medical-imaging files to project records, or media assets to production schedules).
Build a queryable, always-current catalog of a multi-petabyte file estate in Snowflake or BigQuery.
Feed downstream pipelines and lakehouse tables from a single, governed metadata stream.
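
The first use case above, joining file metadata to business records, can be sketched with an in-memory SQLite database standing in for the warehouse. The table names, columns, and the `project_code` join key are hypothetical; how file metadata maps to a business key (a path convention, a tag, a lookup table) depends on your own data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a warehouse with two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE file_metadata (path TEXT, size_bytes INTEGER, project_code TEXT);
        CREATE TABLE projects (project_code TEXT PRIMARY KEY, title TEXT);
        """
    )
    # Illustrative rows only; project_code is an assumed join key.
    conn.executemany(
        "INSERT INTO file_metadata VALUES (?, ?, ?)",
        [
            ("/imaging/P-001/scan1.dcm", 2000, "P-001"),
            ("/imaging/P-001/scan2.dcm", 1500, "P-001"),
            ("/imaging/P-002/scan1.dcm", 1200, "P-002"),
        ],
    )
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?)",
        [("P-001", "Cardiac MRI study"), ("P-002", "Knee CT study")],
    )
    return conn


def storage_by_project(conn: sqlite3.Connection) -> list:
    """Join file metadata to project records and total storage per project."""
    return conn.execute(
        """
        SELECT p.title, COUNT(*) AS files, SUM(f.size_bytes) AS total_bytes
        FROM file_metadata AS f
        JOIN projects AS p ON p.project_code = f.project_code
        GROUP BY p.title
        ORDER BY total_bytes DESC
        """
    ).fetchall()
```

The same query shape should carry over to Snowflake, Databricks, or BigQuery once the metadata stream has landed in a table there; only the files' metadata is joined, never the files themselves.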

What's in this release:

Create-catalog flow with storage selection and validation, plus a tabbed detail page for overview, publishing, and schedule settings.
Recurring ingest schedules built with a new visual schedule builder, with upcoming runs shown in the catalog detail view. Catalogs can also be triggered on demand.
Enable / disable a catalog at any time.
Live statistics — catalog size and file count update automatically after each successful ingest, with a "Last Ingested" timestamp.
Metadata streaming to a secured, per-tenant messaging channel (dedicated credentials and access control), with channels cleaned up automatically when a catalog is removed.
Catalog permissions — new CATALOG_MANAGE and CATALOG_USE permissions give you fine-grained control over who can create and use catalogs.

Catalogs is available as a Preview feature and is not enabled by default. If you'd like to try it, contact your account representative.

License-Based Feature Entitlements

CloudSoda now reads enabled capabilities directly from your license:

Intelligence and Catalog features are shown or hidden automatically based on what your license includes.
Entitlements are applied consistently across the UI and API, with no manual configuration required.
Existing accounts are bootstrapped from their current license on upgrade.
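
Conceptually, license-driven gating looks something like the sketch below: one set of licensed features is consulted wherever a capability is shown or served. The `License` class, the feature keys, and the navigation items are all hypothetical and only illustrate the idea; they are not CloudSoda's actual license format or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class License:
    # Hypothetical structure: a set of feature keys granted by the license.
    features: frozenset = field(default_factory=frozenset)


def is_enabled(license_: License, feature: str) -> bool:
    """Single check that both a UI and an API layer could share."""
    return feature in license_.features


def visible_nav_items(license_: License) -> list:
    """Return navigation labels, hiding items whose feature is not licensed."""
    # (label, required feature key or None when always available)
    nav = [
        ("Storage", None),
        ("Intelligence", "intelligence"),
        ("Catalogs", "catalog"),
    ]
    return [
        label
        for label, required in nav
        if required is None or is_enabled(license_, required)
    ]
```

Routing every check through one function is what keeps the UI and API consistent: neither layer carries its own switch that could drift from the license.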

Activity Log Diagnostics

The Activity log is now far more informative:

Error and warning entries carry a code, summary, and diagnostic detail, with a collapsible view that keeps the log readable while letting you drill into the full diagnostic when needed.
Agent attachment failures now produce clear error entries explaining why an attachment is broken.
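
A minimal sketch of how such an entry could be modeled and rendered, collapsed by default and expanded on demand. The `LogEntry` fields mirror the code, summary, and detail described above, but the class itself, the rendering format, and the example code `EX-1001` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    severity: str      # "error" or "warning"
    code: str          # diagnostic code (the example value used here is made up)
    summary: str       # one-line description shown in the collapsed view
    detail: str = ""   # full diagnostic text, shown only when expanded


def render(entry: LogEntry, expanded: bool = False) -> str:
    """Render one entry; the detail is included only when expanded."""
    line = f"[{entry.severity.upper()}] {entry.code}: {entry.summary}"
    if expanded and entry.detail:
        indented = "\n".join("    " + row for row in entry.detail.splitlines())
        return line + "\n" + indented
    return line


EXAMPLE = LogEntry(
    severity="error",
    code="EX-1001",  # hypothetical code, not a real CloudSoda diagnostic
    summary="Agent attachment failed",
    detail="agent: agent-7\nreason: storage mount not reachable",
)
```

`render(EXAMPLE)` yields a single summary line, while `render(EXAMPLE, expanded=True)` appends the indented diagnostic beneath it.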

Improvements

CloudSoda Data Intelligence

Job status filter cards for quickly filtering jobs by status, plus new duplicate-file filtering.
Smarter task scheduling — task blocking is now limited to genuinely conflicting work (for example, a scan vs. a report on the same data) instead of blocking unrelated tasks.
Dedicated snap-diff setup phase that decouples snapshot/changelist creation from file scanning for faster, more predictable runs.
More resilient jobs — the scan-job watcher now reconnects after a full-system restart, job cancellation reliably targets only running tasks, and the job result panel now shows human-readable numbers.
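
The narrower task blocking described above can be pictured as a conflict check on task kind and target data. This is a sketch under assumptions: the `Task` shape, the dataset names, and the exact set of conflicting kinds are hypothetical, and the real scheduler's rules may differ.

```python
from dataclasses import dataclass

# Pairs of task kinds assumed to conflict when they touch the same data.
# Treating two scans of the same data as conflicting is also an assumption.
CONFLICTING_KINDS = {
    frozenset({"scan", "report"}),
    frozenset({"scan"}),
}


@dataclass(frozen=True)
class Task:
    kind: str      # e.g. "scan" or "report"
    dataset: str   # identifier of the data the task works on


def conflicts(a: Task, b: Task) -> bool:
    """Two tasks conflict only if they share a dataset and their kinds clash."""
    return a.dataset == b.dataset and frozenset({a.kind, b.kind}) in CONFLICTING_KINDS


def runnable(pending: list, running: list) -> list:
    """Pending tasks that do not conflict with any running task."""
    return [t for t in pending if not any(conflicts(t, r) for r in running)]
```

With a scan running on one dataset, a report on that same dataset waits, while scans and reports on other datasets proceed.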

Scanner & Data Movement

Adaptive throughput limiter for storage-protocol scanning to better match each target's capacity.
More tunable indexing — increased and configurable OpenSearch/Elasticsearch bulk-indexer workers, longer retry/back-off windows for transient storage errors, and configurable job-log retention.
Clearer S3 errors — credential-chain failures and InvalidObjectState responses are now surfaced as actionable file-level errors.
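
These notes do not say which algorithm the adaptive throughput limiter uses. One common way to build such a limiter is additive-increase/multiplicative-decrease (AIMD): raise the allowed rate slowly while the target keeps up, and cut it sharply when the target signals overload. The sketch below shows that general technique; the class name and all parameter values are illustrative assumptions.

```python
class AimdLimiter:
    """Additive-increase/multiplicative-decrease rate controller (illustrative)."""

    def __init__(self, initial=100.0, floor=10.0, ceiling=1000.0,
                 step=10.0, backoff=0.5):
        self.rate = float(initial)   # current allowed operations per second
        self.floor = float(floor)    # never drop below this rate
        self.ceiling = float(ceiling)  # never exceed this rate
        self.step = float(step)      # added after each healthy interval
        self.backoff = float(backoff)  # multiplier applied after an overload signal

    def record(self, ok: bool) -> float:
        """Update the rate from one observation and return the new rate.

        ok=True means the target kept up during the last interval;
        ok=False means it signalled overload (throttling, timeouts, etc.).
        """
        if ok:
            self.rate = min(self.ceiling, self.rate + self.step)
        else:
            self.rate = max(self.floor, self.rate * self.backoff)
        return self.rate
```

A caller would feed each interval's outcome to `record` and pace its next batch of storage requests at the returned rate.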

User Experience

Refreshed design system: updated color and icon tokens, consistent focus states, and tightened spacing across forms and dialogs.
New recurring-schedule builder with a visual popover, replacing the previous third-party control.
Clearer forms and tables — long values truncate with a tooltip instead of overflowing, and field-level validation errors now appear directly on forms (login, customer creation, and others) instead of as generic messages.
Streamlined detail pages — primary actions are promoted out of the context menu into dedicated, easier-to-reach positions, and a consolidated storage Settings tab brings storage configuration (including the Ignore section) into one place.
Accessibility and internationalization: accessible labels and error associations on form fields, an always-present live region for alerts, improved keyboard focus indicators, and locale-aware formatting of numbers and durations.

Bug Fixes

Scanning & Storage

Windows scanner reliability: fixed directory reparse points causing infinite loops and corrupt scans, intermittent errors from file handles opened without synchronous I/O, logging failures from a missing log directory and double-initialized logger, a phantom child emitted for a drive root that left the rollup empty, and file mode bits not respecting the read-only attribute.
SMB/NFS scanning: fixed orphaned rollups when a stat call fails mid-scan.
S3 and S3-compatible storage: fixed a scan crash on object keys with invalid percent-encoding (such keys are now skipped and surfaced as errors), and multipart uploads failing against storage that does not support checksums.
Scan correctness: fixed exclusion rules being silently bypassed, and scan jobs becoming stuck in the enrichment phase when OpenSearch was unavailable.
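
The skip-and-report handling of badly encoded S3 keys can be sketched as follows, assuming keys arrive percent-encoded (as S3 list responses do when URL encoding is requested). The function names and error messages are hypothetical. Python's `urllib.parse.unquote` passes malformed escapes through unchanged, so the sketch checks for them explicitly before decoding.

```python
import re
from urllib.parse import unquote

# A '%' that is not followed by exactly two hex digits is a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def decode_key(encoded: str) -> str:
    """Strictly percent-decode one key, raising ValueError on bad input."""
    if _BAD_ESCAPE.search(encoded):
        raise ValueError(f"invalid percent-encoding in key: {encoded!r}")
    # errors="strict" raises UnicodeDecodeError (a ValueError subclass)
    # when the decoded bytes are not valid UTF-8.
    return unquote(encoded, errors="strict")


def split_keys(encoded_keys):
    """Decode what can be decoded; collect the rest as (key, reason) errors."""
    decoded, skipped = [], []
    for key in encoded_keys:
        try:
            decoded.append(decode_key(key))
        except ValueError as exc:
            skipped.append((key, str(exc)))
    return decoded, skipped
```

One bad key then costs a single file-level error entry instead of the whole scan.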

Platform, UI & Other

UI fixes: fixed the file-browser "select all" checkbox doing nothing, policy/ignore-rule edits not appearing until a forced refresh, byte values dropping decimal precision when converted, a PowerScale password edit forcing an unnecessary username change, and popper-based inputs rendering behind modals on Chrome/Linux.
Access control: fixed an identity-bridge issue that removed managers from any roles the identity bridge managed.
Data Intelligence: fixed project-cost percentages not showing for new values, the schedule "scan all" button not applying to all storage, and a duplicate-key crash during partition sub-range generation.
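
The byte-value precision fix listed under UI fixes concerns a familiar class of bug: converting with integer division discards the fractional part (1536 bytes becomes "1 KiB"). The sketch below shows a conversion that keeps decimals. It is an illustration only; the use of binary (1024-based) units and two decimal places are assumptions, not a description of the product's formatting.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def format_bytes(n: int, decimals: int = 2) -> str:
    """Format a byte count with binary units, keeping fractional precision."""
    value = float(n)
    unit = 0
    # Use true division so the fractional part survives each step.
    while value >= 1024 and unit < len(UNITS) - 1:
        value /= 1024
        unit += 1
    if unit == 0:
        return f"{int(value)} B"  # whole bytes need no decimals
    return f"{value:.{decimals}f} {UNITS[unit]}"


def format_bytes_truncating(n: int) -> str:
    """The lossy variant, shown for contrast: floor division drops decimals."""
    unit = 0
    while n >= 1024 and unit < len(UNITS) - 1:
        n //= 1024
        unit += 1
    return f"{n} {UNITS[unit]}"
```

For 1536 bytes the first function reports one and a half kibibytes, while the truncating variant rounds down to a whole kibibyte.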