Google Cloud Anthos¶
You can run CVEDIA-RT in a GKE cluster using Anthos, Google's hybrid cloud management platform.
Anthos can deploy CVEDIA-RT pods at scale with GPU acceleration. CVEDIA-RT runs headless, so you will need to use our REST API to interact with it.
CVEDIA-RT supports any NVIDIA GPU instance, but you should use instances with a single GPU attached. Models for the specific platform you're using will be downloaded on the fly.
AMD GPUs
AMD GPU acceleration is not supported; CVEDIA-RT will fall back to CPU inference.
Artifact Setup¶
Before you can run anything, you need to copy the CVEDIA-RT Docker image to an Artifact Registry repository. You can find the latest CVEDIA-RT Docker image on our release page or directly on Docker Hub.
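One way to copy the image is to pull it, retag it for Artifact Registry, and push it. The sketch below only builds and prints those `docker` commands. The region, project, repository, and image names are placeholders, and `SOURCE_IMAGE:TAG` stands for whatever image reference the release page or Docker Hub lists; none of these values come from the CVEDIA-RT documentation.

```python
import subprocess


def artifact_registry_ref(location, project, repository, image, tag):
    """Build an Artifact Registry Docker reference:
    LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG
    """
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


def mirror_commands(source_ref, target_ref):
    """Return the docker commands that copy source_ref to target_ref."""
    return [
        ["docker", "pull", source_ref],
        ["docker", "tag", source_ref, target_ref],
        ["docker", "push", target_ref],
    ]


def mirror(source_ref, target_ref, dry_run=True):
    """Print the commands (dry run) or execute them one by one."""
    for cmd in mirror_commands(source_ref, target_ref):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    target = artifact_registry_ref(
        "us-central1", "my-project", "my-repo", "cvedia-rt", "latest"
    )
    mirror("SOURCE_IMAGE:TAG", target, dry_run=True)
```

Before pushing for real (`dry_run=False`), Docker must be authenticated against the registry host, for example with `gcloud auth configure-docker us-central1-docker.pkg.dev` for the region assumed above.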
Cluster Setup¶
If you don't yet have a GKE cluster, Anthos supports two types of cluster: autopilot and standard.
When running on autopilot, you won't have fine-grained control over which machines execute your tasks, meaning they will all be CPU-based instances.
When running on standard, you can control your nodes and node pools, allowing you to create much faster GPU instances; however, costs are managed less automatically.
Creating an autopilot GKE cluster¶
- Open Anthos
- Click Configure GKE Autopilot
- Set to public
- Click Create
Creating a standard GKE cluster¶
This type of cluster is recommended if you want to run on GPUs, although prices are significantly higher. You might need to request GPU quota; read more in the GKE GPUs how-to guide.
- Open Anthos
- Click Configure GKE Standard
- Click Node Pools -> default-pool -> nodes
- In Machine Configuration, select GPU
- Select a GPU type that best suits you; CVEDIA-RT will run on any NVIDIA GPU
- Number of GPUs should always be 1
- Click Create
Creating a Workload¶
- Open Anthos
- Click Workloads in the sidebar
- Click + Deploy
- Select the CVEDIA-RT image from your Artifact Registry
- Click + Add environment variable, add name: RUN_UI value: 0
- Click Done
- Click Continue
- Select a Cluster
- Click Deploy
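If you prefer a manifest to the console, the same workload can be described as a Kubernetes Deployment. This is a minimal sketch, not an official CVEDIA manifest: the workload name `cvedia-rt`, the `app` label, and container port 80 (inferred from the target port used when exposing the workload) are assumptions. `RUN_UI=0` comes from the steps above, and `nvidia.com/gpu` is the standard Kubernetes resource name for NVIDIA GPUs.

```python
import json


def cvedia_rt_deployment(image, name="cvedia-rt", replicas=1, gpus=1):
    """Return a Deployment manifest (as a dict) for a headless CVEDIA-RT pod.

    name, the "app" label and containerPort 80 are illustrative assumptions.
    """
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        # Environment variable from the console steps; values must be strings.
        "env": [{"name": "RUN_UI", "value": "0"}],
        "ports": [{"containerPort": 80}],
    }
    if gpus:
        # Request a single NVIDIA GPU per pod (standard clusters with GPU nodes).
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image reference; use the one in your Artifact Registry.
    image = "us-central1-docker.pkg.dev/my-project/my-repo/cvedia-rt:latest"
    print(json.dumps(cvedia_rt_deployment(image), indent=2))
```

Because `kubectl` accepts JSON, the printed manifest can be piped to `kubectl apply -f -`. Pass `gpus=0` for a CPU-only cluster.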
Exposing a Workload¶
Once your workload is running, you will need to expose it before you can access it.
- Open Anthos
- Click Workloads in the sidebar
- Click your running CVEDIA-RT workload instance
- In the top header, click Actions -> Expose
- Set port 80, target port 80 and Service type Load Balancer
- Click Expose
Once the cluster applies your changes, you will be able to access CVEDIA-RT's REST API using the public IP reported by the load balancer in the workload deployment details screen.
Session Affinity¶
CVEDIA-RT solutions usually require a consistent time and frame order for their tracking systems to work. When you run CVEDIA-RT with multiple replicas, each replica has its own isolated state. Without session affinity, consecutive requests from the same client can land on different replicas, so tracking will be poor and CVEDIA-RT's output may be nonsensical.
To avoid this problem, you should enable sessionAffinity, which ensures that a client always talks to the same workload instance. You can read more at ingress features.
Alternatively, for small deployments, you can run CVEDIA-RT with a single replica, so all traffic gets routed to the same instance.
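As an illustration, session affinity can be set directly on the Kubernetes Service that exposes the workload. The sketch below builds a LoadBalancer Service with `sessionAffinity: ClientIP`, which pins each client IP to one pod. It is one possible approach rather than the documented CVEDIA setup: the Service name and the `app` selector label are assumptions, and affinity configured through GKE Ingress features uses different settings.

```python
import json


def cvedia_rt_service(name="cvedia-rt", port=80, target_port=80,
                      affinity_timeout_s=10800):
    """Return a LoadBalancer Service manifest with client-IP session affinity.

    name and the "app" selector label are illustrative assumptions; the
    selector must match the labels on your CVEDIA-RT pods.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
            # Route every request from a given client IP to the same pod.
            "sessionAffinity": "ClientIP",
            "sessionAffinityConfig": {
                "clientIP": {"timeoutSeconds": affinity_timeout_s}
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(cvedia_rt_service(), indent=2))
```

Client-IP affinity works per source address, so several cameras or clients behind one NAT address will all be routed to the same replica.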
Notes¶
The CVEDIA-RT container automatically provides health metrics back to the cluster. On top of that, you can use the API /status to query the instance metrics, which makes scaling easier.
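A minimal sketch of polling /status from Python follows. It assumes the endpoint is served over plain HTTP on the exposed port 80 and needs no authentication. The response format is not described here, so the body is returned as raw text. The IP address is a placeholder from the documentation range.

```python
import urllib.request


def status_url(host, port=80, scheme="http"):
    """Build the URL of the /status endpoint for a given host."""
    return f"{scheme}://{host}:{port}/status"


def fetch_status(host, port=80, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch /status and return the response body as text.

    opener is injectable so the function can be exercised without a network.
    """
    with opener(status_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder: replace with the public IP reported by the load balancer.
    print(fetch_status("203.0.113.10"))
```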