Intro
CloudSoda's innovative data management software offers seamless communication between on-premises and cloud storage environments through two key components: the Controller and the Agent. The Controller can be deployed on-premises, within the customer's cloud, or hosted by CloudSoda for an additional fee. The Controller contains the user interface (UI) and the application programming interface (API), and acts as the data movement manager for the Agents. The Controller is also responsible for tracking and reporting all data activities. The Agent is installed on a laptop, server, or virtual machine and must have a direct network connection to the Controller. Agents can also connect with each other over the LAN and WAN, creating a mesh network that allows data transfer between Agents when a single Agent is unable to see both the source and target storage.
This document provides an overview of data integrity, data transfer, software security, and firewall settings for the CloudSoda application.
Data Integrity
CloudSoda is responsible for moving and copying data, whether file-to-file or file-to-object. To ensure data integrity during transfers, CloudSoda employs two distinct methods.
First, when a file-to-file transfer is initiated, an MD5 checksum (1) is generated on the source file. The file is then transferred to a temporary file on the target system to avoid overwriting an existing file. An MD5 checksum is generated on the target temp file, and the two checksums are compared. If they match, CloudSoda renames the temp file and updates the attributes on the target file, ensuring a complete transfer without data corruption. If they do not match, the operation is retried up to three times before being marked as failed.
Second, when a file is transferred to an S3 object, the same process is attempted, but the validation is based on the S3 ETag (2) mechanism. AWS or S3-based object storage generates the ETag when the upload completes, and CloudSoda validates the resulting object ETag by calculating the expected value from the source data and the number of parts used in the upload. If the ETag does not match or the upload fails, CloudSoda retries up to 40 times with an exponential backoff before marking the operation as failed. When an object is downloaded, the object's ETag is compared to the local temp file's MD5 before CloudSoda completes the object-to-file transfer. This ensures that the files and objects are identical and that no data corruption occurs during transfers.
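The two validation methods described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CloudSoda's implementation: the function names (`md5_of`, `verified_copy`, `multipart_etag`), the `.tmp` suffix, and the part size are hypothetical. The multipart ETag calculation follows the commonly observed S3 behavior (MD5 of the concatenated per-part MD5 digests, a dash, then the part count), which generally applies to uploads that are not encrypted with SSE-KMS or SSE-C; treat it as an assumption rather than a guaranteed contract.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(src, dst, max_retries=3):
    """Copy src to a temp file beside dst, compare MD5s, then rename into place."""
    src_md5 = md5_of(src)
    tmp = dst + ".tmp"  # hypothetical temp-file naming, chosen for this sketch
    for _ in range(1 + max_retries):  # first attempt plus the retries
        shutil.copyfile(src, tmp)
        if md5_of(tmp) == src_md5:
            shutil.copystat(src, tmp)  # carry over timestamps and mode bits
            os.replace(tmp, dst)       # rename only after the checksums match
            return src_md5
    os.remove(tmp)
    raise IOError(f"checksum mismatch after {max_retries} retries: {src}")


def multipart_etag(data, part_size):
    """Expected S3-style ETag for data uploaded in parts of part_size bytes."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    if len(parts) <= 1:
        # Assumes a plain single PUT; a one-part multipart upload would
        # instead carry a "-1" suffix.
        return hashlib.md5(data).hexdigest()
    joined = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(joined).hexdigest()}-{len(parts)}"
```

Because the multipart value depends on the part boundaries, the same part size used for the upload has to be used when computing the expected ETag.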
For Google Cloud Storage, the validation process is identical to S3, but it uses a CRC32C hash instead of an ETag. Azure Blob differs in that it does not provide a hash unless the upload is single-part. CloudSoda therefore validates an Azure Blob upload by verifying that the correct number of bytes was written to the blob container. At the end of the upload, CloudSoda sets the MD5 of the object as metadata, making it available for future use.
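For readers unfamiliar with CRC32C, the sketch below computes it in pure Python using the reflected Castagnoli polynomial and encodes the result as base64 of the big-endian 32-bit value, which is the form Cloud Storage reports for an object. The function names are hypothetical, and a production client would more likely rely on an optimized library; this is only meant to show what is being compared.

```python
import base64
import struct

CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def _build_table():
    """Precompute the 256-entry lookup table for byte-at-a-time CRC32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data, crc=0):
    """Return the CRC32C of data; pass a previous result to continue a stream."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def gcs_crc32c_string(data):
    """Base64 of the big-endian CRC32C, for comparison with object metadata."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")
```

The optional `crc` argument lets a large file be hashed chunk by chunk while it is being read for upload.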
Data Movement
The CloudSoda Controller orchestrates the data movement while the Agent performs the data movement. The Agent can be internal or external to the Controller. The internal Agent supports mounting NFS/SMB shares directly. While it is possible for an external Agent to mount NFS/SMB storage, it is not supported because it can result in unexpected behavior. Therefore, if an external Agent needs to access data over NFS/SMB, it is recommended that the end user mount the storage directly on the host.
There is no encryption when moving data via the SMB/NFS protocols, as they do not support it by default. For cloud or object transfers, CloudSoda uses the SDK provided by the cloud vendor. The AWS and Azure SDKs transfer the data using TLS v1.2 (4) (5) (6) over the WAN to ensure data is encrypted in flight. The Google SDK uses TLS 1.3 to transfer data into Google Cloud Storage. (7) (8) (9)
The external Agent can be installed on Linux, Windows, or macOS. The Agent has the potential to access any files that the host operating system can access, which includes local volumes, Direct Attached Storage, or SMB/NFS mounted storage.
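The vendor SDKs negotiate TLS on their own, so nothing needs to be configured for the behavior described above. For anyone who wants to confirm which TLS version a given storage endpoint negotiates from a particular host, the following Python sketch may help. The function names are hypothetical, and the endpoint to test is up to the reader; nothing here is part of CloudSoda.

```python
import socket
import ssl


def make_tls_context(minimum=ssl.TLSVersion.TLSv1_2):
    """Build a client context that refuses anything older than `minimum`."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames
    ctx.minimum_version = minimum
    return ctx


def negotiated_version(host, port=443, timeout=5.0):
    """Connect to host:port and return the negotiated protocol, e.g. 'TLSv1.3'."""
    ctx = make_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_version` against a storage endpoint requires outbound access on port 443 from the host where it runs.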
Software Access and Firewalls
The CloudSoda Controller can be deployed on-premises, in a customer cloud, or in a cloud hosted by CloudSoda. Figure 1 shows the ports and software packages CloudSoda uses when talking to file-based storage, cloud provider storage, CloudSoda Agents, and our managed software control plane.
Figure 1
The CloudSoda Controller uses ports and connections to perform four primary tasks:
- Administration: This involves deploying, upgrading, and monitoring CloudSoda.
- Data Movement: The Controller facilitates file/object-based data movement through the Agent or SMB/NFS.
- Management: This includes facilitating the creation of the mesh network, as well as upgrading and monitoring the status of the Agents.
- Web UI and API: The Controller offers both a graphical user interface and an application programming interface for utilizing the platform.
Administration:
Administering CloudSoda offers many benefits to customers, such as easy installation, real-time software patching and updates, and simplified troubleshooting. Administration also requires gathering information about the health of CloudSoda software and hardware. No customer data is collected during this process.
Data Movement:
If you use the Controller's internal Agent for NFS/SMB, the Controller needs access to all the ports indicated on the diagram to perform file operations successfully.
In contrast, when accessing cloud storage, CloudSoda does not require any firewall exceptions. CloudSoda uses ports 80/443 to establish a connection with cloud storage targets. After the connection is established, all data sent to cloud storage targets is encrypted as described in the Data Movement section above.
Communication:
CloudSoda uses Agents to move or copy data. To communicate with the CloudSoda Controller, the Agent requires two outbound UDP ports: 7498 and 7499. The Agent can be installed on Linux, Windows, or macOS operating systems and can access any files accessible to the host operating system, including local volumes, Direct Attached Storage, or SMB/NFS-mounted storage.
Figure 2
Web UI:
To use CloudSoda's web UI, port 443 must be open. CloudSoda can also be reached via port 80, but the connection is upgraded to SSL on port 443.
API:
To use CloudSoda's API, port 443 must be open to access the interface.
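A quick way to confirm that the TCP ports required for the web UI and API are reachable from a client machine is a plain connection test, sketched below in Python. The function name is hypothetical and the Controller hostname in the comment is a placeholder. This check applies only to TCP ports; the Agent's UDP ports cannot be verified this way because UDP is connectionless.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example (placeholder hostname, replace with your Controller's address):
#   tcp_port_open("controller.example.com", 443)
```

A `False` result does not say which device blocked the connection, only that the port could not be reached from this host.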
CloudSoda Agent
The Agent can be installed on Linux, Windows, or macOS operating systems. The Agent has the potential to access any files that the host operating system can access, including local volumes, Direct Attached Storage, or SMB/NFS mounted storage. For public and private clouds, the Agent can access object storage including S3, GCP Cloud Storage, Azure Blob, and other S3-like storage.
CloudSoda agent-to-agent transfer is initiated when the source and target storage are on disparate Agents. To facilitate this transfer, an Agent mesh is created in which all the Agents attempt to connect with each other. To connect, the Agents need UDP ports 1024-65535 open for both ingress and egress. The Agents establish the mesh by using every available network interface, including LAN routes, VPN tunnels, and the WAN via the CloudSoda control plane. For an Agent to leverage the CloudSoda control plane, it must be able to access https://controlplane.sna.cloudsoda.io/. Without access to the CloudSoda control plane, transfers that pass through a NAT or bridge secured private networks are not possible. See Figure 3.
NOTE: The CloudSoda internal Agent cannot participate in the Agent mesh; therefore, transferring data from it to another Agent is not supported.
References
(1) https://en.wikipedia.org/wiki/MD5
(2) https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/
(3) https://aws.amazon.com/sdk-for-go/
(4) https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/security.html
(5) https://github.com/Azure/azure-storage-blob-go
(6) https://docs.microsoft.com/en-us/azure/storage/common/transport-layer-security-configure-client-version?tabs=powershell
(7) https://pkg.go.dev/cloud.google.com/go/storage
(8) https://cloud.google.com/storage/docs/gsutil/addlhelp/SecurityandPrivacyConsiderations#transport-layer-security
(9) https://cloud.google.com/blog/products/networking/tls-1-3-is-now-on-by-default-for-google-cloud-services
Appendix A
Ports: TCP 80, 443 for container pulls
GitLab:
- registry.gitlab.com
Docker Hub:
- auth.docker.io
- index.dockerhub.io
- dockerhub.io
- quay.io
- cdn.quay.io
- index.docker.io
Ports: TCP 80, 443 for the control plane
Rancher:
- rnch-prd-usw2-1.cloudsoda.io
Ports: TCP 80, 443 for monitoring
Rollbar:
- 35.184.69.251
- 35.201.93.97
- 35.201.81.77
Ports: TCP 443, 10516, 10255, 10250, UDP 123 for monitoring
Datadog:
- trace.agent.datadoghq.com
- process.datadoghq.com
- agent-intake.logs.datadoghq.com
- agent-http-intake.logs.datadoghq.com
- orchestrator.datadoghq.com
- app.datadoghq.com
- *.agent.datadoghq.com (10)
(10) https://docs.datadoghq.com/agent/guide/network/?tab=agentv6v7