Reference Architecture Guide – CloudSoda

This article outlines an architectural framework to help design your CloudSoda deployment. It includes reference architectures for:

CloudSoda Agents
CloudSoda Controller
Data Intelligence Controller (both single-instance and clustered deployments)

For specific use cases, collaborate with the CloudSoda team to develop a custom architectural solution tailored to your requirements.

CloudSoda Agents

To install and host CloudSoda Agents, servers and VMs must meet the following hardware and OS requirements. If the servers or VMs do not meet these specifications, CloudSoda Agents may not function properly. These requirements serve as architectural guidelines; actual needs depend on:

Dataset size
Storage type
Network speed involved in the data transfer

CloudSoda Agent
	Minimum	Small	Recommended	Large
CPU	2 Cores	8 Cores	16 Cores	48 Cores
Memory	4 GB	16 GB	32 GB	96 GB
Storage	200 GB	200 GB	200 GB	200 GB
Estimated Performance	500 Mbps-1 Gbps	1-3 Gbps	3-10 Gbps	10+ Gbps
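
As a rough illustration of how the sizing table above might be applied, the sketch below picks the smallest Agent tier whose estimated performance range covers a target throughput. The function name `pick_agent_tier` and the choice to treat the upper end of each range as inclusive are assumptions made for this example; they are not part of any CloudSoda tooling, and real throughput still depends on dataset size, storage type, and network speed.

```python
# Tiers copied from the CloudSoda Agent table: (name, cores, memory_gb, max_gbps).
# max_gbps is the upper end of each "Estimated Performance" range;
# None marks the open-ended "10+ Gbps" tier.
AGENT_TIERS = [
    ("Minimum", 2, 4, 1.0),
    ("Small", 8, 16, 3.0),
    ("Recommended", 16, 32, 10.0),
    ("Large", 48, 96, None),
]

STORAGE_GB = 200  # identical for every tier in the table


def pick_agent_tier(target_gbps: float) -> dict:
    """Return the smallest tier whose estimated range covers target_gbps.

    Assumption: a target exactly on a boundary (for example 1 Gbps)
    maps to the smaller of the two adjacent tiers.
    """
    if target_gbps <= 0:
        raise ValueError("target_gbps must be positive")
    for name, cores, memory_gb, max_gbps in AGENT_TIERS:
        if max_gbps is None or target_gbps <= max_gbps:
            return {
                "tier": name,
                "cores": cores,
                "memory_gb": memory_gb,
                "storage_gb": STORAGE_GB,
            }
    raise AssertionError("unreachable: the last tier is open-ended")


if __name__ == "__main__":
    print(pick_agent_tier(2.5))
```

A 2.5 Gbps target falls in the 1-3 Gbps range and therefore maps to the Small tier (8 cores, 16 GB).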

Agent OS Support Matrix
Linux	RHEL 9.2 and later, Ubuntu 24.04 and later	x86/arm
macOS	14 Sonoma or later	Intel/Apple Silicon
Windows	Windows 11 and later, Windows Server 2019 and later	x86

Things to Know

The CloudSoda Agent supports data movement across various hardware architectures, including laptops, servers, virtual machines, and other platforms.
Data transfer speeds depend on factors such as latency, the number of files, and file sizes. Performance is also directly influenced by the number of available cores: more cores result in faster processing.
For cloud transfers, we recommend maintaining a 2:1 memory-to-core ratio. Additionally, when transferring data to or from the cloud or an object store, using a processor with an integrated SHA extension can enhance performance by approximately 30%.
For a list of compatible processors, please refer to the following links:
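
The two processor recommendations above can be checked on a candidate host with a short script. The sketch below is illustrative only: it interprets the 2:1 memory-to-core ratio as at least 2 GB of memory per core (which matches every tier in the Agent table), and it detects SHA support by reading Linux `/proc/cpuinfo`, where x86 kernels report the `sha_ni` flag and 64-bit Arm kernels report `sha1` and `sha2`. The function names are hypothetical, and the check does not apply to macOS or Windows hosts.

```python
from pathlib import Path


def meets_memory_ratio(cores: int, memory_gb: float, ratio: float = 2.0) -> bool:
    """True if the host has at least `ratio` GB of memory per core.

    Assumption: "2:1 memory-to-core" means 2 GB of memory per core.
    """
    return memory_gb >= ratio * cores


def has_sha_extension(cpuinfo_text: str) -> bool:
    """Look for SHA instruction flags in Linux /proc/cpuinfo text.

    x86 lists flags on "flags" lines; Arm lists them on "Features" lines.
    """
    sha_flags = {"sha_ni", "sha1", "sha2"}
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("flags", "features"):
            if sha_flags & set(value.split()):
                return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        print("SHA extension detected:", has_sha_extension(cpuinfo.read_text()))
    print("16 cores / 32 GB meets ratio:", meets_memory_ratio(16, 32))
```

Passing the file contents in as a string keeps the parsing logic easy to test against sample output collected from other machines.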

CloudSoda Controller (Single instance)

To install and host the CloudSoda Controller, servers and VMs must meet the hardware and OS requirements outlined below. Available cloud templates are also provided.

If your servers or VMs do not meet these specifications, the CloudSoda Controller may not function as expected. Keep in mind that these requirements serve as architectural guidelines; actual needs may vary depending on your specific CloudSoda deployment.

CloudSoda Controller
	Non-Production/Sandbox	Minimum	Recommended
CPU	8 Cores	16 Cores	32 Cores
Memory	16 GB	32 GB	64 GB
Storage (SSD)	200 GB (500 MB/s) (single partition)	750 GB (1000 MB/s) (single partition)	2 TB (1000 MB/s) (single partition)

Cloud Templates
AWS	c6a.4xlarge (GP3 volume, throughput: 1000 MB/s, IOPS: 16000)
Azure	Standard_D16as_v5
GCP	c2d-standard-16

Controller OS Support Matrix
Linux	Ubuntu 24.04 LTS Server, RHEL 9.2+ and similar

Things to Know

In a production environment, the CloudSoda Controller requires a minimum of 750 GB of storage on the primary partition.

The Controller should be assigned a single IP address, excluding the loopback address.
For deployments handling over 100 million files or performing hourly scans, additional storage on the Controller node and/or an additional host may be necessary. For further guidance, please contact the CloudSoda team.
For CloudSoda deployments with over 250 million files, a single-instance Controller can perform an assessment, but performance and available features may be limited.
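
The file-count thresholds above can be expressed as a small planning helper. This is a minimal sketch, assuming the 100 million and 250 million figures are treated as strict "more than" limits; the function name `single_instance_notes` is hypothetical and the returned strings simply paraphrase the notes in this section.

```python
from typing import List

EXTRA_CAPACITY_THRESHOLD = 100_000_000   # files; from "over 100 million files"
LIMITED_FEATURE_THRESHOLD = 250_000_000  # files; from "over 250 million files"


def single_instance_notes(file_count: int, hourly_scans: bool = False) -> List[str]:
    """Return the planning notes that apply to a single-instance Controller."""
    if file_count < 0:
        raise ValueError("file_count must not be negative")
    notes = []
    if file_count > EXTRA_CAPACITY_THRESHOLD or hourly_scans:
        notes.append(
            "Additional Controller storage and/or an additional host may be "
            "necessary; contact the CloudSoda team."
        )
    if file_count > LIMITED_FEATURE_THRESHOLD:
        notes.append(
            "Assessment is possible, but performance and available features "
            "may be limited."
        )
    return notes


if __name__ == "__main__":
    for note in single_instance_notes(300_000_000):
        print("-", note)
```

An empty list means none of the notes in this section apply to the planned deployment.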

CloudSoda Controller (Clustered Deployment)

For environments with over 250 million files or where high redundancy is essential, we recommend a cluster deployment.

To install and host a cluster deployment, servers and virtual machines must meet the hardware and OS requirements outlined below. If these specifications are not met, the Data Intelligence Controller may not function as expected.

Please note that these requirements serve as architectural guidelines; actual needs may vary based on your specific CloudSoda deployment. OpenSearch performance is heavily dependent on underlying disk I/O and inter-node latency. All nodes in the cluster must be colocated in the same physical location so that clustered services do not degrade because of added latency. For cloud-hosted environments, we recommend using the provider's OpenSearch service.

CloudSoda Controller
	Controller	OpenSearch node
CPU	16 Cores	16 Cores
Memory	32 GB	64 GB
Storage	1 TB SSD (500 MB/s)	1.5 TB SSD (1000 MB/s)
Cloud Volume Spec / Provisioned IOPS	GP3, 3,000	io2 or better, 20,000-30,000
OS	Ubuntu 24.04 LTS Server, RHEL 9.2 and similar	Ubuntu 24.04 LTS Server, RHEL 9.2 and similar
AWS EC2 instance type example	c6a.4xlarge	r6g.xlarge.search for hosted OpenSearch, i3 or i4i family for EC2-based
Number of VMs (500 million to 1 billion files and folders)	1	3
Number of VMs (1 billion to 1.5 billion files and folders)	1	4
Number of VMs (1.5 billion to 2 billion files and folders)	1	5
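
To show how the VM counts in the table above might be looked up programmatically, here is a minimal sketch. The function name `cluster_vm_counts` is hypothetical, and treating each range's upper bound as inclusive is an assumption, since adjacent rows in the table share their boundary values. Counts outside the 500 million to 2 billion range are not covered by the table, so the function returns `None` for them rather than guessing.

```python
from typing import Dict, Optional

# (lower bound exclusive, upper bound inclusive, OpenSearch nodes),
# copied from the "Number of VMs" rows of the table.
CLUSTER_SIZING = [
    (500_000_000, 1_000_000_000, 3),
    (1_000_000_000, 1_500_000_000, 4),
    (1_500_000_000, 2_000_000_000, 5),
]

CONTROLLER_VMS = 1  # the table lists one Controller VM for every range


def cluster_vm_counts(files_and_folders: int) -> Optional[Dict[str, int]]:
    """Return VM counts for a clustered deployment, or None if out of range."""
    # Assumption: exactly 500 million maps to the first row.
    if files_and_folders == CLUSTER_SIZING[0][0]:
        return {"controller_vms": CONTROLLER_VMS, "opensearch_nodes": 3}
    for lower, upper, nodes in CLUSTER_SIZING:
        if lower < files_and_folders <= upper:
            return {"controller_vms": CONTROLLER_VMS, "opensearch_nodes": nodes}
    return None  # not covered by the table; consult the CloudSoda team


if __name__ == "__main__":
    print(cluster_vm_counts(1_200_000_000))
```

For 1.2 billion files and folders, the lookup returns one Controller VM and four OpenSearch nodes.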
