This article outlines an architectural framework to help design your CloudSoda deployment. It includes reference architectures for:
- CloudSoda Agents
- CloudSoda Controller
- Data Intelligence Controller (both single-instance and clustered deployments)
For specific use cases, collaborate with the CloudSoda team to develop a custom architectural solution tailored to your requirements.
CloudSoda Agents
To install and host CloudSoda Agents, servers and VMs must meet the following hardware and OS requirements. If the servers or VMs do not meet these specifications, CloudSoda Agents may not function properly. These requirements serve as architectural guidelines; actual needs depend on:
- Dataset size
- Storage type
- Network speed involved in the data transfer
| CloudSoda Agent | Minimum | Small | Recommended | Large |
| --- | --- | --- | --- | --- |
| CPU | 2 Cores | 8 Cores | 16 Cores | 48 Cores |
| Memory | 4 GB | 16 GB | 32 GB | 96 GB |
| Storage | 200 GB | 200 GB | 200 GB | 200 GB |
| Estimated Performance | 500 Mbps-1 Gbps | 1-3 Gbps | 3-10 Gbps | 10+ Gbps |

Agent OS Support Matrix

| OS | Supported versions | Architecture |
| --- | --- | --- |
| Linux | RHEL 9.2 and later, Ubuntu 24.04 and later | x86/arm |
| macOS | 14 Sonoma or later | Intel/Apple Silicon |
| Windows | Windows 11 and later, Windows Server 2019 and later | x86 |
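
As a rough illustration of how the Agent sizing table above might be applied, the following Python sketch picks the smallest tier whose estimated performance range covers a target throughput. The names `AgentTier`, `AGENT_TIERS`, and `pick_agent_tier` are hypothetical and not part of any CloudSoda tooling. Treating the shared range boundaries (1, 3, and 10 Gbps) as belonging to the smaller tier is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentTier:
    name: str
    cores: int
    memory_gb: int
    storage_gb: int
    # Upper end of the estimated performance range in Gbps; None means open-ended.
    max_gbps: Optional[float]


# Values copied from the Agent sizing table; they are guidelines, not guarantees.
AGENT_TIERS = [
    AgentTier("Minimum", 2, 4, 200, 1.0),
    AgentTier("Small", 8, 16, 200, 3.0),
    AgentTier("Recommended", 16, 32, 200, 10.0),
    AgentTier("Large", 48, 96, 200, None),
]


def pick_agent_tier(target_gbps: float) -> AgentTier:
    """Return the smallest tier whose estimated range covers target_gbps."""
    if target_gbps <= 0:
        raise ValueError("target_gbps must be positive")
    for tier in AGENT_TIERS:
        # Assumption: a target exactly on a shared boundary maps to the smaller tier.
        if tier.max_gbps is None or target_gbps <= tier.max_gbps:
            return tier
    raise AssertionError("unreachable: the last tier is open-ended")


if __name__ == "__main__":
    tier = pick_agent_tier(5.0)
    print(f"{tier.name}: {tier.cores} cores, {tier.memory_gb} GB memory")
```

Because actual throughput also depends on dataset size, storage type, and network speed, the result is best read as a starting point for sizing rather than a prediction.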
Things to Know
- The CloudSoda Agent supports data movement across various hardware architectures, including laptops, servers, virtual machines, and other platforms.
- Data transfer speeds depend on factors such as latency, the number of files, and file sizes. Performance also scales with the number of available cores: more cores result in faster processing.
- For cloud transfers, we recommend maintaining a 2:1 memory-to-core ratio. Additionally, when transferring data to or from the cloud or an object store, using a processor with an integrated SHA extension can enhance performance by approximately 30%.
- For a list of compatible processors, please refer to the following links:
CloudSoda Controller (Single instance)
To install and host the CloudSoda Controller, servers and VMs must meet the hardware and OS requirements outlined below. Available cloud templates are also provided.
If your servers or VMs do not meet these specifications, the CloudSoda Controller may not function as expected. Keep in mind that these requirements serve as architectural guidelines; actual needs may vary depending on your specific CloudSoda deployment.
| CloudSoda Controller | Non-Production/Sandbox | Minimum | Recommended |
| --- | --- | --- | --- |
| CPU | 8 Cores | 16 Cores | 32 Cores |
| Memory | 16 GB | 32 GB | 64 GB |
| Storage (SSD) | 200 GB (500 MB/s) (Single Partition) | 750 GB (1000 MB/s) (Single Partition) | 2 TB (1000 MB/s) (Single Partition) |

| Cloud | Cloud template |
| --- | --- |
| AWS | c6a.4xlarge, GP3 (Throughput: 1000 MB/s, IOPS: 16000) |
| Azure | Standard_D16as_v5 |
| GCP | c2d-standard-16 |

Controller OS Support Matrix

| OS | Supported versions |
| --- | --- |
| Linux | Ubuntu 24.04 LTS Server, RHEL 9.2+ and similar |
Things to Know
- In a production environment, the CloudSoda Controller requires a minimum of 750 GB of storage on the primary partition.
- The Controller should be assigned a single IP address, excluding the loopback address.
- For deployments handling over 100 million files or performing hourly scans, additional storage on the controller node and/or additional hosts may be necessary. For further guidance, contact the CloudSoda team.
- For CloudSoda deployments with over 250 million files, CloudSoda can perform an assessment, but performance and available features may be limited.
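
The single-instance Controller requirements above can be turned into a simple pre-installation check. The sketch below is illustrative only: `CONTROLLER_TIERS`, `controller_shortfalls`, and `file_count_notes` are hypothetical names, the host values are supplied by hand rather than detected, and 2 TB is treated as 2000 GB as an assumption.

```python
from typing import Dict, List

# Values copied from the single-instance Controller table.
# Assumption: 2 TB is expressed as 2000 GB (decimal units).
CONTROLLER_TIERS: Dict[str, Dict[str, int]] = {
    "Non-Production/Sandbox": {"cores": 8, "memory_gb": 16, "storage_gb": 200},
    "Minimum": {"cores": 16, "memory_gb": 32, "storage_gb": 750},
    "Recommended": {"cores": 32, "memory_gb": 64, "storage_gb": 2000},
}


def controller_shortfalls(host: Dict[str, int], tier: str = "Minimum") -> List[str]:
    """List every resource in `host` that falls below the chosen tier."""
    required = CONTROLLER_TIERS[tier]
    problems = []
    for key, needed in required.items():
        have = host.get(key, 0)  # a missing value is treated as zero
        if have < needed:
            problems.append(f"{key}: have {have}, need at least {needed}")
    return problems


def file_count_notes(file_count: int) -> List[str]:
    """Return the guidance from 'Things to Know' that applies to a file count."""
    notes = []
    if file_count > 100_000_000:
        notes.append("Over 100 million files: additional controller storage "
                     "and/or additional hosts may be necessary.")
    if file_count > 250_000_000:
        notes.append("Over 250 million files: performance and available "
                     "features may be limited on a single instance.")
    return notes


if __name__ == "__main__":
    host = {"cores": 16, "memory_gb": 32, "storage_gb": 500}
    for line in controller_shortfalls(host) + file_count_notes(300_000_000):
        print(line)
```

The check only compares capacities. Disk throughput, the single-partition layout, and the single IP address requirement still need to be verified separately.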
CloudSoda Controller (Clustered Deployment)
For environments with over 250 million files or where high redundancy is essential, we recommend a cluster deployment.
To install and host a cluster deployment, servers and virtual machines must meet the hardware and OS requirements outlined below. If these specifications are not met, the Data Intelligence Controller may not function as expected.
Note that these requirements serve as architectural guidelines; actual needs may vary based on your specific CloudSoda deployment. OpenSearch performance depends heavily on underlying disk I/O and internode latency. All nodes in the cluster must be colocated in the same physical location so that clustered services do not degrade because of added latency. For cloud-hosted environments, we recommend using the provider's OpenSearch service.
| CloudSoda Controller | Controller | OpenSearch node |
| --- | --- | --- |
| CPU | 16 Cores | 16 Cores |
| Memory | 32 GB | 64 GB |
| Storage | 1 TB SSD (500 MB/s) | 1.5 TB SSD (1000 MB/s) |
| Cloud Volume Spec / Provisioned IOPS | GP3, 3,000 | io2 or better, 20,000~30,000 |
| OS | Ubuntu 24.04 LTS Server, RHEL 9.2 and similar | Ubuntu 24.04 LTS Server, RHEL 9.2 and similar |
| AWS EC2 instance type example | c6a.4xlarge | r6g.xlarge.search for hosted OpenSearch; i3 or i4i family for EC2-based |
| Number of VMs (500 million to 1 billion files and folders) | 1 | 3 |
| Number of VMs (1 billion to 1.5 billion files and folders) | 1 | 4 |
| Number of VMs (1.5 billion to 2 billion files and folders) | 1 | 5 |
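
The VM counts in the table above can be expressed as a small lookup. The following Python sketch is only an illustration; `CLUSTER_BANDS` and `opensearch_node_count` are hypothetical names. The table does not cover counts below 500 million or above 2 billion, so the function returns `None` there instead of guessing. Assigning a count that sits exactly on a shared boundary to the smaller band is an assumption.

```python
from typing import Optional

# The table lists one Controller VM for every band.
CONTROLLER_VMS = 1

# (lowest count, highest count, OpenSearch VMs), copied from the cluster table.
CLUSTER_BANDS = [
    (500_000_000, 1_000_000_000, 3),
    (1_000_000_000, 1_500_000_000, 4),
    (1_500_000_000, 2_000_000_000, 5),
]


def opensearch_node_count(files_and_folders: int) -> Optional[int]:
    """Return the OpenSearch VM count from the table, or None if not covered."""
    for low, high, nodes in CLUSTER_BANDS:
        # Assumption: a count on a shared boundary falls into the smaller band,
        # because the first matching band wins.
        if low <= files_and_folders <= high:
            return nodes
    return None  # outside the published bands; ask the CloudSoda team


if __name__ == "__main__":
    for count in (300_000_000, 750_000_000, 1_200_000_000, 1_800_000_000):
        nodes = opensearch_node_count(count)
        label = "not covered by the table" if nodes is None else f"{nodes} OpenSearch VMs"
        print(f"{count:,} items: {CONTROLLER_VMS} Controller VM, {label}")
```

Returning `None` for the uncovered ranges keeps the gap between 250 million and 500 million files visible rather than silently extrapolating from the table.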