CloudSoda's innovative data management software offers seamless communication between on-premises and cloud storage environments through two key components: the Controller and the Agent.
The Controller can be deployed on-premises, within the customer's cloud, or hosted by CloudSoda for an additional fee. The Controller contains the user interface (UI) and the application programming interface (API), and acts as the data movement manager for the Agents. Additionally, the Controller is responsible for tracking and reporting all data-related activity.
The Agent can be installed on a laptop, server or virtual machine, and it must have a direct network connection to the Controller. Multiple Agents can connect with one another over the LAN and WAN, creating a mesh network that allows data transfers between Agents when a single Agent is unable to see both the source and target storage.
This paper provides an overview of how the Controller and Agents handle data integrity and data transfer, and describes the software security and firewall settings for the CloudSoda application.
CloudSoda is responsible for moving and copying data, and supports both file-to-file and file-to-object transactions. To ensure data integrity during transfers, CloudSoda employs two distinct methods, one for each transaction type.
When a file-to-file data transfer is initiated, an MD5 checksum (1) is generated on the source file. The file is transferred to a temporary file on the target system to avoid overwriting an existing file. An MD5 checksum is then generated on the target temp file and the two checksums are compared. If they match, CloudSoda renames the temp file and updates the attributes on the target file, ensuring a complete transfer without data corruption. If the checksums do not match, the data transfer operation is retried (up to three times) before being marked as failed.
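The file-to-file sequence described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CloudSoda's implementation: the function names, the `.part` suffix and the reading of "up to three times" as three attempts in total are all hypothetical choices made for the example.

```python
import hashlib
import os
import shutil
import tempfile

MAX_ATTEMPTS = 3  # assumption: "up to three times" is read as three attempts in total


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(source, target, max_attempts=MAX_ATTEMPTS):
    """Copy source to target through a temp file, comparing MD5s before the rename."""
    source_md5 = md5_of_file(source)
    target_dir = os.path.dirname(os.path.abspath(target))
    for _ in range(max_attempts):
        # The temp file lives next to the target so an existing target is
        # never overwritten by a partial or corrupted copy.
        fd, temp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
        os.close(fd)
        try:
            shutil.copyfile(source, temp_path)
            if md5_of_file(temp_path) == source_md5:
                shutil.copystat(source, temp_path)  # carry over timestamps and mode
                os.replace(temp_path, target)       # rename over the target
                return source_md5
        finally:
            # Clean up the temp file if it was not renamed into place.
            if os.path.exists(temp_path):
                os.remove(temp_path)
    raise OSError(f"checksum mismatch after {max_attempts} attempts: {source}")
```

Calling `verified_copy("a.bin", "b.bin")` returns the verified MD5 on success and raises once the attempts are exhausted, which corresponds to the transfer being marked as failed.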
When a file is transferred to an S3 object, the same transfer process is attempted, but the validation is based on the S3 ETag (2) mechanism. AWS or S3-based object storage generates an ETag when the file upload completes. CloudSoda validates the resulting object ETag by calculating the expected value from the source data and the number of parts used in the upload. If the ETag does not match or the upload fails, CloudSoda retries the transfer operation (up to 40 times) with an exponential backoff before marking it as failed. When an object is downloaded, the object's ETag is compared to the local temp file's MD5 before CloudSoda completes the object-to-file transfer. This comparison ensures that the files and objects are identical and that no data corruption occurs during transfers.
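To make the ETag calculation concrete, the sketch below computes the value that S3 is commonly observed to return: the plain MD5 for a single-request upload, and the MD5 of the concatenated per-part MD5 digests followed by `-<part count>` for a multipart upload. This generally holds for objects that are unencrypted or use SSE-S3; other encryption modes produce ETags that are not MD5-based. The function names, the part size and the backoff base and cap are assumptions made for illustration; the paper states only the retry limit of 40.

```python
import hashlib


def expected_etag(data, part_size=None):
    """Return the ETag expected for `data`.

    part_size=None models a single-request upload (plain MD5).
    Otherwise the data is split into parts of `part_size` bytes, as in a
    multipart upload, and the ETag is derived from the per-part digests.
    """
    if part_size is None:
        return hashlib.md5(data).hexdigest()
    # max(len(data), 1) ensures an empty payload still yields one (empty) part.
    parts = [data[i:i + part_size] for i in range(0, max(len(data), 1), part_size)]
    combined = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(combined).hexdigest()}-{len(parts)}"


def backoff_delays(max_retries=40, base_seconds=0.5, cap_seconds=60.0):
    """Exponentially growing delays, capped; base and cap are illustrative only."""
    return [min(cap_seconds, base_seconds * 2 ** attempt) for attempt in range(max_retries)]
```

Because the multipart value depends on where the part boundaries fall, the uploader has to reuse the same part size when it recomputes the ETag, which is why the number of parts is part of the validation.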
For Google Cloud Storage, the data transfer validation process is identical to S3, but it uses a CRC32C hash instead of an ETag. Azure Blob differs in that it provides a hash only for single-part uploads. For Azure Blob, CloudSoda validates the upload by verifying that the correct number of bytes is written to the blob container. At the end of the upload, CloudSoda sets the MD5 of the object as metadata, making it available for future use.
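For readers unfamiliar with CRC32C, the following sketch shows a table-driven, pure-Python version of the checksum (Castagnoli polynomial) together with the base64 encoding of the big-endian value, which is the form in which Google Cloud Storage reports an object's `crc32c`. It is intended only to illustrate the comparison; a production transfer tool would more likely use a hardware-accelerated library, and the function names here are hypothetical.

```python
import base64

CASTAGNOLI_POLY_REFLECTED = 0x82F63B78  # bit-reflected form of the CRC-32C polynomial


def _build_table():
    """Precompute the 256-entry lookup table for byte-wise CRC32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CASTAGNOLI_POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data, crc=0):
    """Return the CRC32C of `data`; pass a previous result as `crc` to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data):
    """Encode the checksum as base64 over its 4 big-endian bytes."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")
```

The optional `crc` argument lets the checksum be accumulated chunk by chunk while a large file is streamed, so the source does not have to be read a second time for validation.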
The CloudSoda Controller orchestrates data movement while the Agent performs the data movement. The Agent can be internal or external to the Controller. An internal Agent supports mounting NFS/SMB shares directly. While it is possible for an external Agent to mount NFS/SMB storage, it is not supported because it can result in unexpected behavior. Therefore, we recommend that if an external agent needs to access data over NFS/SMB, the end user mount the storage directly on the host.
When data is moved via SMB/NFS, it is not encrypted by default, because these protocols do not enable encryption unless configured to do so. For cloud or object transfers, CloudSoda uses the SDK provided by the cloud vendor. For AWS and Azure, the SDK transfers the data using TLS v1.2 (4) (5) (6) over the WAN to ensure that data is encrypted in flight. For Google Cloud Storage, the Google SDK uses TLS 1.3 to transfer data into its object storage (7) (8) (9).
An external Agent can be installed on Linux, Windows or macOS. This Agent can access any files that the host operating system can access, including local volumes, direct-attached storage or SMB/NFS-mounted storage.
Software Access and Firewalls
The CloudSoda Controller can be deployed on-premises, in a customer cloud or in a cloud hosted by CloudSoda. Figure 1 depicts the ports and software packages that CloudSoda uses when communicating with file-based storage, cloud provider storage, CloudSoda Agents, and our managed software control plane.
Figure 1 - CloudSoda Network Diagram
The CloudSoda Controller uses ports and network connections to perform four primary tasks:
- Administration: This involves deploying, upgrading, and monitoring CloudSoda.
- Data Movement: The Controller facilitates file/object-based data movement through the Agent or SMB/NFS.
- Management: This includes facilitating the creation of the mesh network, as well as upgrading and monitoring the status of the Agents.
- Web UI and API: The Controller offers both a graphical user interface and an application programming interface for utilizing the platform.
Administering CloudSoda offers many benefits to customers, such as easy installation, real-time software patching and updates, and simplified troubleshooting. Additionally, the system gathers vital information about the health of CloudSoda software and hardware to help maintain optimal performance. No customer data is collected during this process.
If you use the Controller's internal Agent for NFS/SMB, then the Controller needs access to all the ports indicated in Figure 1 to perform file operations successfully.
In contrast, when accessing cloud storage, CloudSoda does not require any firewall exceptions. CloudSoda uses ports 80/443 to establish a connection with cloud storage targets. Once the connection is established, all data sent to cloud storage targets is encrypted, as described above.
CloudSoda uses Agents to move or copy data. To communicate with the CloudSoda Controller, an Agent requires two outbound UDP ports, 7498 and 7499. All data sent between the Agent and the Controller is encrypted. The Agent can be installed on Linux, Windows or macOS operating systems and it can access any files accessible to the host operating system, including local volumes, direct-attached storage or SMB/NFS-mounted storage.
Figure 2 - Controller with Agents
To use CloudSoda's web UI, port 443 must be open. CloudSoda can also be reached on port 80, but the connection is upgraded to SSL on port 443.
To use CloudSoda's API, port 443 must be open to access the interface.
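When troubleshooting access to the web UI or API, a quick TCP reachability check from the client machine can confirm whether port 443 is open along the network path. The sketch below is a generic helper, not a CloudSoda tool, and the host name in the usage comment is a placeholder. It applies only to TCP; the Agent's UDP ports cannot be verified with a simple connect, because UDP has no handshake.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example usage (hypothetical host name):
# tcp_port_open("controller.example.internal", 443)
```

A `False` result does not identify where the traffic is blocked; it only indicates that the TCP handshake did not complete from the machine running the check.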
CloudSoda Agent
An Agent can be installed on Linux, Windows, or macOS operating systems. An Agent can access any files that the host operating system can access, including local volumes, direct-attached storage or SMB/NFS-mounted storage. For public and private clouds, an Agent can access object storage including S3, GCP Cloud Storage, Azure Blob, and other S3-like storage.
Figure 3 - Data Transfers via CloudSoda Control Plane and Agent Mesh
NOTE: The CloudSoda internal Agent cannot participate in the Agent mesh; therefore, transferring data to another Agent is not supported.
Ports: TCP 80, 443 for container pulls
Ports: TCP 80, 443 for the control plane
Ports: TCP 80, 443 for monitoring
Ports: TCP 443, 10516, 10255, 10250 and UDP 123 for Datadog monitoring