CloudSoda's innovative data management software offers seamless communication between on-premises and cloud storage environments through two key components: the Controller and the Agent.
The Controller can be deployed on-premises, within the customer's cloud, or hosted by CloudSoda for an additional fee. The Controller contains the user interface (UI) and the application programming interface (API), and acts as the data movement manager for the Agents. Additionally, the Controller is responsible for tracking and reporting all data-related activity.
The Agent can be installed on a laptop, server or virtual machine, and it must have a direct network connection to the Controller. Multiple Agents can connect with one another over the LAN and WAN, creating a mesh network that allows data transfers between Agents when a single Agent is unable to see both the source and target storage.
This paper provides an overview of how the Controller and Agents manage data integrity and data transfer, and describes the software security and firewall settings for the CloudSoda application.
Data Integrity
CloudSoda moves and copies data, supporting both file-to-file and file-to-object transactions. To ensure data integrity during transfers, CloudSoda employs two distinct methods.
When a file-to-file data transfer is initiated, an MD5 checksum (1) is generated on the source file. The file is transferred to a temporary file on the target system to avoid overwriting an existing file. An MD5 checksum is then generated on the target temp file and the two checksums are compared. If they match, CloudSoda renames the temp file to the target name and updates its attributes, ensuring a complete transfer without data corruption. If the checksums do not match, the data transfer operation is retried (up to three times) before being marked as failed.
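The file-to-file flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CloudSoda's implementation: the function names `md5_of_file` and `verified_copy` are hypothetical, "up to three times" is read here as three attempts in total, and the copy itself is a local `shutil.copyfile` standing in for the real transfer.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(source, target, max_attempts=3):
    """Copy source to target through a temp file, checking MD5 before the rename.

    max_attempts=3 is an assumption based on the retry count in the text.
    """
    source_md5 = md5_of_file(source)
    target_dir = os.path.dirname(os.path.abspath(target))
    for _ in range(max_attempts):
        # Write to a temp file next to the target so that an existing target
        # is never replaced by an unverified copy.
        fd, temp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
        os.close(fd)
        try:
            shutil.copyfile(source, temp_path)
            if md5_of_file(temp_path) == source_md5:
                shutil.copystat(source, temp_path)  # carry over timestamps and mode
                os.replace(temp_path, target)       # rename into place
                return source_md5
        finally:
            # Clean up the temp file if it was not renamed into place.
            if os.path.exists(temp_path):
                os.remove(temp_path)
    raise IOError(f"checksum mismatch after {max_attempts} attempts: {source}")
```

Creating the temp file in the target directory keeps the final rename on the same filesystem, which is what allows `os.replace` to swap the file in as a single step.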
When a file is transferred to an S3 object, the same transfer process is attempted, but the validation is based on the S3 ETag (2) mechanism. AWS or S3-based object storage generates an ETag when the upload completes. CloudSoda validates the resulting object ETag by calculating the expected value from the source data and the number of parts used in the upload. If the ETag does not match or the upload fails, CloudSoda retries the transfer operation (up to 40 times) with an exponential backoff before marking it as failed. When an object is downloaded, the object's ETag is compared to the local temp file's MD5 before CloudSoda completes the object-to-file transfer. This comparison ensures that the files and objects are identical and that no data corruption occurred during the transfer.
For Google Cloud Storage, the data transfer validation process is identical to S3, but it uses a CRC32C hash instead of an ETag. Azure Blob is different in that it does not provide a hash unless the upload is a single-part upload. CloudSoda validates the upload for Azure Blob by verifying that the correct number of bytes is written to the blob container. At the end of the upload, CloudSoda sets the MD5 of the object as metadata, making it available for future use.
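As an illustration of how an expected ETag can be derived from the source data and the part count, the sketch below follows the commonly observed S3 convention: a single-part upload yields the hex MD5 of the data, and a multipart upload yields the MD5 of the concatenated binary part digests followed by a dash and the number of parts. This convention applies to objects that are not encrypted with SSE-KMS or SSE-C. The function names and the backoff parameters (`base`, `cap`) are assumptions made for this example; the text specifies only the 40-retry limit and that the backoff is exponential.

```python
import hashlib


def expected_etag(data, part_size=None):
    """Return the ETag S3 is commonly observed to report for this upload.

    part_size=None models a single-part upload; otherwise the data is split
    into parts of part_size bytes, as in a multipart upload.
    """
    if part_size is None:
        return hashlib.md5(data).hexdigest()
    part_digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def backoff_delays(max_attempts=40, base=0.5, cap=60.0):
    """Yield one capped exponential delay (in seconds) per retry attempt.

    base and cap are illustrative values, not CloudSoda settings.
    """
    for attempt in range(max_attempts):
        yield min(cap, base * 2 ** attempt)
```

A caller would compare `expected_etag(...)` with the ETag returned by the object store (after stripping the surrounding quotes) and sleep for the next value from `backoff_delays()` before each retry.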
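For readers unfamiliar with CRC32C, the following is a small pure-Python sketch of the checksum (the Castagnoli variant of CRC-32) and of the base64 form in which Google Cloud Storage reports it in object metadata. It is written for clarity, not speed, and is not taken from CloudSoda or the Google SDK; the function names are hypothetical.

```python
import base64
import struct

_POLY = 0x82F63B78  # reflected form of the Castagnoli polynomial


def _build_table():
    """Precompute the 256-entry lookup table for byte-at-a-time CRC32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data, crc=0):
    """Return the CRC32C of data; pass a previous result as crc to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def gcs_crc32c_string(data):
    """Return the CRC32C as base64 of its big-endian bytes, the form used by GCS."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


# Standard check value for CRC32C.
assert crc32c(b"123456789") == 0xE3069283
```

Because `crc32c` accepts a running value, a large file can be hashed chunk by chunk while it is being read for upload, and the final value compared with the hash the object store reports.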
Data Movement
The CloudSoda Controller orchestrates data movement while the Agent performs the data movement.
When moving data via SMB/NFS, note that these protocols do not encrypt data by default. For cloud or object transfers, CloudSoda uses the cloud vendor's provided SDK. For AWS and Azure, the SDK transfers the data using TLS v1.2 (4) (5) (6) over the WAN to ensure that data is encrypted in flight. For Google Cloud Storage, the Google SDK uses TLS 1.3 (7) (8) (9) to transfer data into Google's object storage.
An Agent can be installed on Linux, Windows or macOS. The Agent can access any files that the host operating system can access, including local volumes, direct-attached storage or SMB/NFS-mounted storage. The Agent can also use an SMB accessor to access SMB mounts directly.
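To confirm independently which TLS version a given storage endpoint negotiates, a check along the following lines can be run from the host where an Agent is installed. This is a sketch using Python's standard `ssl` module, not the vendor SDKs that CloudSoda uses; the function names are hypothetical, and the host passed in is whatever endpoint you want to test.

```python
import socket
import ssl


def make_tls_context(minimum=ssl.TLSVersion.TLSv1_2):
    """Build a client context that refuses anything older than the given TLS version."""
    context = ssl.create_default_context()  # verifies certificates and hostnames
    context.minimum_version = minimum
    return context


def negotiated_tls_version(host, port=443, timeout=5.0):
    """Connect to host:port and return the negotiated protocol name."""
    context = make_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_tls_version` with a storage endpoint's hostname returns a string such as `TLSv1.2` or `TLSv1.3`, or raises an `ssl.SSLError` if the server cannot meet the minimum version.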
Software Access and Firewalls
The CloudSoda Controller can be deployed on-premises, in a customer cloud or in a cloud hosted by CloudSoda. Figure 1 depicts the ports and software packages that CloudSoda uses when communicating with external services. A detailed list of the ports and services can be found in Appendix A.
Figure 1 - CloudSoda Network Diagram
The CloudSoda Controller uses ports and network connections to perform four primary tasks:
- Administration: This involves deploying, upgrading, and monitoring CloudSoda.
- Data Movement: The Controller facilitates file/object-based data movement through the Agent.
- Management: This includes facilitating the creation of the mesh network, as well as upgrading and monitoring the status of the Agents.
- Web UI and API: The Controller offers both a graphical user interface and an application programming interface for utilizing the platform.
Administration access offers many benefits to customers, such as easy installation, real-time software patching and updates, and simplified troubleshooting. Additionally, the system gathers vital information about the health of CloudSoda software and hardware to help maintain optimal performance. No customer data is collected during this process.
Data Movement
If you use a CloudSoda Agent for NFS/SMB, then the Agent needs access to all the ports indicated in Figure 2 to perform file operations successfully.
In contrast, when accessing cloud storage, CloudSoda does not require any firewall exceptions. CloudSoda uses ports 80/443 to establish a connection with cloud storage targets. After the connection is established, all data sent to cloud storage targets is encrypted, as described above.
Communication
CloudSoda uses Agents to move or copy data. To communicate with the CloudSoda Controller, an Agent requires two outbound UDP ports, 7498 and 7499. All data sent between the Agent and the Controller is encrypted. The Agent can be installed on Linux, Windows or macOS operating systems, and it can access any files accessible to the host operating system, including local volumes, direct-attached storage or SMB/NFS-mounted storage.
Figure 2 - Controller with Agents
Web UI
To use CloudSoda's web UI, port 443 must be open. CloudSoda can also be reached via port 80, but the connection is then upgraded to HTTPS (port 443).
API
To use CloudSoda's API, port 443 must be open to access the interface.
CloudSoda Agent
An Agent can be installed on Linux, Windows, or macOS operating systems. An Agent can access any files that the host operating system can access, including local volumes, direct-attached storage or SMB/NFS-mounted storage. For public and private clouds, an Agent can access object storage including S3, Google Cloud Storage, Azure Blob, and other S3-like storage.
Figure 3 - Data Transfers via CloudSoda Control Plane and Agent Mesh
References
(1) https://en.wikipedia.org/wiki/MD5
(2) https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/
(3) https://aws.amazon.com/sdk-for-go/
(4) https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/security.html
(5) https://github.com/Azure/azure-storage-blob-go
(6) https://docs.microsoft.com/en-us/azure/storage/common/transport-layer-security-configure-client-version?tabs=powershell
(7) https://pkg.go.dev/cloud.google.com/go/storage
(8) https://cloud.google.com/storage/docs/gsutil/addlhelp/SecurityandPrivacyConsiderations#transport-layer-security
(9) https://cloud.google.com/blog/products/networking/tls-1-3-is-now-on-by-default-for-google-cloud-services
Appendix A
The following is a detailed list of ports and cloud endpoints. Because cloud-based services use load balancing and region-based routing, all of the IP addresses are dynamic.
Ports: TCP 80, 443 for pulling container images.
CloudSoda pulls images from the following locations.
us-west1-docker.pkg.dev
registry-1.docker.io
Ports: UDP 53 and TCP 443 for DNS services.
URLs:
https://route53.amazonaws.com
https://acme-v02.api.letsencrypt.org/directory
Ports: TCP 80, 443, 3080 for the control plane.
URL: https://cloudsoda.teleport.sh
Port: TCP 50051 for the CloudSoda location and price book services.
URLs:
https://compass.cloudsoda.io
https://books.cloudsoda.io
Ports: TCP 80, 443 for exception monitoring.
URL: https://api.rollbar.com
Ports: TCP 443, 10516, 10255, 10250 and UDP 123 for Datadog monitoring.
URLs:
trace.agent.datadoghq.com
process.datadoghq.com
agent-intake.logs.datadoghq.com
agent-http-intake.logs.datadoghq.com
orchestrator.datadoghq.com
app.datadoghq.com
*.agent.datadoghq.com (10)
The following endpoints are necessary for the install only. However, we recommend temporarily allowing all outbound HTTPS traffic during the install to ensure a smooth installation.
https://get.k3s.io
https://k3s-ci-builds.s3.amazonaws.com
https://update.k3s.io
https://cloudsoda.teleport.sh
https://charts.releases.teleport.dev
https://cannery.cloudsoda.io
https://github.com
https://raw.githubusercontent.com
https://api.github.com
https://cloudsodainitscripts-usw2-p-1.s3.amazonaws.com
(10) https://docs.datadoghq.com/agent/guide/network/?tab=agentv6v7
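Before an install, it can be useful to confirm that the firewall allows outbound connections to the install-only endpoints listed above. The following is a minimal pre-check sketch, not a CloudSoda tool: it only tests whether a TCP connection can be opened on the port implied by each URL, and the function names are hypothetical.

```python
import socket
from urllib.parse import urlparse

# Install-only endpoints copied from the list above.
INSTALL_ENDPOINTS = [
    "https://get.k3s.io",
    "https://k3s-ci-builds.s3.amazonaws.com",
    "https://update.k3s.io",
    "https://cloudsoda.teleport.sh",
    "https://charts.releases.teleport.dev",
    "https://cannery.cloudsoda.io",
    "https://github.com",
    "https://raw.githubusercontent.com",
    "https://api.github.com",
    "https://cloudsodainitscripts-usw2-p-1.s3.amazonaws.com",
]


def host_and_port(url):
    """Return (hostname, port) for a URL, defaulting to 443 for https and 80 otherwise."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(urls, probe=can_connect):
    """Return a dict mapping each URL to True or False reachability."""
    return {url: probe(*host_and_port(url)) for url in urls}
```

Running `check_endpoints(INSTALL_ENDPOINTS)` on the target host returns a dictionary in which any `False` entry points to an endpoint that the firewall or DNS is currently blocking.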