Welcome to Cloudtuner documentation!

About Cloudtuner

Cloudtuner is a platform that optimizes cloud costs and performance for any workload, providing effective cloud cost management for all types of organizations.

Cloudtuner's primary capabilities

 

  • Pool Transparency

    Pool breakdown and transparency across all business units, users, projects and individual cloud services. The customer sets soft and hard limits.

  • Simple multicloud asset provisioning workflow via dashboard or API

    A single Cloudtuner dashboard is used to manage hybrid clouds along with the virtual machines, volumes and network settings. Custom and mandatory tagging is supported. Custom TTL rules can be created to clear out workloads according to customer’s policies.

  • Flexible alerts and notifications

    Custom notifications about all related events, e.g. immediate workload volume increase, pool forecast overdraft or unusual behavior.

  • Swift user creation and business units distribution

    Step-by-step guides for configuring user settings, organization subdivisions, pools and quotas assignments.

  • VM Power Schedules

    The Power Schedule feature for cloud instances enables users to automate the management of virtual machine (VM) states by scheduling their start and stop times.

  • Optimal utilization of Reserved Instances, Savings Plans, and Spot Instances

    A visualization of your compute usage covered by RIs and SPs versus your uncovered usage. By analyzing this data, you can identify potential savings by increasing your RI/SP coverage where beneficial.

  • S3 duplicate object finder

    The S3 Duplicate Finder is designed to help optimize your AWS S3 storage usage by identifying and managing duplicate objects across your buckets.

More features are just around the corner and will be introduced shortly!

How it works

Cloudtuner is dedicated to improving the cloud usage experience without actively interfering with processes in your environment: it requires only Read-Only rights for the connected cloud account, which serves as the main Data Source for all recommendations.

The following data is used:

  1. Billing info – all details related to cloud expenses.
  2. State of resources (for actively discoverable types) in the cloud – necessary for applying Constraints like TTL and Expense limit as well as for Cloudtuner’s Recommendation Engine.
  3. Monitoring data from the cloud used for identifying underutilized instances.

Naturally, every cloud platform differs in the way the above data is obtained.

Alibaba

1. Billing information is acquired via the Billing API.

2. Cloud’s Monitor service is used as the source of all monitoring data.

3. Resource discovery is performed via the Discovery API.

Refer to this guide for more details.

Amazon Web Services (AWS)

1. Billing information is retrieved from the Data Exports located in a designated S3 bucket in the cloud.
2. Resource discovery is performed via the Discovery API.

3. Amazon CloudWatch is used as the source of monitoring data.

Refer to this guide for more details.

Azure

1. Billing information is acquired via the Billing API.
2. Resource discovery is performed via the Discovery API.

3. Cloud’s Monitoring service is used as the source of all monitoring data.

Refer to this guide for more details.

GCP

1. Billing information is retrieved from the BigQuery service.

2. Cloud’s Monitoring service is used as the source of all monitoring data.

3. Resource discovery is performed via the Discovery API.

Refer to this guide for more details.

Kubernetes

To enable cost management and FinOps capabilities for your Kubernetes cluster, you need to deploy a software component that collects information about running pods and converts it into cost metrics. Refer to this guide for more details.

This software is open-source, free of malware, and requires read-only access to Kubernetes metadata and performance metrics. Please review the code and find a detailed description on GitHub.

menu

  • Home. Find your organization’s current spending and projected expenses for the upcoming month. Details about Home page.

  • Recommendations. Get practical information about suggesting services and features available in Cloudtuner. Find a brief description and other pertinent information in the cards to help you assess the situation quickly. How to use recommendation cards.

  • Resources. Observe the expenses for all resources across all connected clouds within the organization for the selected period. Organize and categorize resources based on your specific requirements.

  • Pools. View pools with limits or projected expenses that require your attention. Manage pools using the available features on this page.

  • Shared Environments. This feature enables you to focus more on application logic rather than infrastructure management. Use this page to book cloud environments for specific periods to allow multiple users or applications to access and use the same underlying infrastructure.

menu_therest

  • FinOps. View a breakdown chart that visualizes your expenses over time, get a visual or analytical representation of costs, and become familiar with the concepts of FinOps and MLOps.

  • MLOps. Keep track of your tasks, machine learning models, and datasets. Define hyperparameters and automatically run your training code on the Hypertuning page. Share specific artifacts with your team members.

  • Policies. Identify and respond to unusual patterns or deviations from normal behavior, control costs and manage resources efficiently, implement robust tagging policies, and manage the resource lifecycle and automated power on/off schedules effectively.

  • Sandbox. Evaluate CPU and memory usage by k8s resources, view applied recommendations, and compare instance pricing across different clouds.

  • System. Assign roles to users for resource management, integrate with various services, and view events.

Organization selector#

organization_field

If the user is a member of several organizations registered in Cloudtuner, clicking the Organization field → Organization overview in the top right corner of the page redirects to a new view containing a matrix of key information on each organization.

Organizations that require attention and optimization are marked in red.

organization_overview

Documentation, product tour, and profile buttons

On the right of the Organization selector, find the Documentation, Product tour, and Community documentation buttons.

documentation – get detailed documentation about the product.

product_tour – get the product tour.

community_documentation – gives a brief description of each page. Turn the community documentation panel on or off as desired.

profile – log out from the solution.

Home Page

The Home page provides quick access to the most popular pages of the solution.

main

For convenience, each section has a goto_button button that opens the main page of that section. Use it to get more detailed information.

In most tables presented in the sections, the rows are clickable, which gives quick access to information about an item.

The page is divided into 7 sections:

  • Organization expenses. See the total expenses of the previous month, the expenses of the current month, and this month's forecast. A red line on the chart shows the expense limit.

    organization_expenses

  • Top resource expenses for last 30 days. Keep an eye on the resources with the highest expenses. In addition, open Perspectives or go to the resources via the buttons near the caption.

    top_resource_expences

  • Recommendations. Find summary cards with possible monthly savings and expenses grouped into the cost, security, and critical categories in the Recommendations section. Click a card's caption to get details on the Recommendations page.

    recommendations

  • Policy violations. Pay special attention to the status field. If it is red, the policy is violated.

    policy_violations

  • Pools requiring attention. Navigate between tabs in the Pools requiring attention section to see the “Exceeded limit” or “Forecast overspend” pools. Use the buttons in the Actions column to see the resource list or open it in the cost explorer.

    policy_violations

  • Tasks. View the status and metrics of the last run.

    tasks

  • Models. Keep track of your machine learning models.

    models

Recommendations

OptScale features a set of automated tools for ongoing optimization of registered Data Sources. The section is intended to help maintain awareness of the less apparent deficiencies of the infrastructure like configuration flaws and security risks.

Force check and check time

At the moment, a check-up is performed every 3 hours, and the results are reflected in the Recommendations hub, which is accessible via the left sidebar of the main page. A user with the Organization manager role can initiate a Force check that immediately runs the Data Sources’ evaluation sequence.

Alternatively, find the last and next check time in the Summary Cards.

recommendations_hub

Filtering

There are two types of recommendations featured in OptScale: savings and security. Additionally, there is an option to view critical, non-empty, and all recommendations. Refine the recommendations using the Categories filter if necessary. By default, all recommendations are displayed.

categories

In addition to filtering by categories, the page allows selection of Data Sources and Applicable services.

Summary cards

Use Summary Cards to get an at-a-glance overview of important details, such as total expenses, check time, savings, duplicates found, etc.

This clickme icon shows that you can click on the Summary Card to get detailed information.

summary_cards

Color schema and recommendation cards

Pay special attention to the color schema of the page.

pink color indicates a critical situation, signaling that you should focus on the card and its information.

pale-green color signifies that there are no recommendations or that potential savings are zero.

pale-orange color appears when there are items requiring attention.

In all other situations, the card is white.

Find detailed instructions on how to use recommendation cards on our website.

Settings

Adjusting parameters is beneficial as it allows for fine-tuning and optimizing results to better meet specific goals or conditions. The set of parameters varies depending on the card, so set the values according to your needs.

Note

New settings will be applied on the next recommendations check.

settings

Security Recommendations

Inactive IAM users

Users that have been inactive for more than 90 days may be considered obsolete and subject to deletion: such accounts pose a security risk to the organization because they can be compromised and become access points for malicious users.

The list of inactive users can be downloaded in the json format for subsequent automated processing with the help of Cleanup Scripts.

IAM_user

The number of days is a custom parameter. Use settings to change it.
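
As a cross-check outside of OptScale, the last-activity data for IAM users can also be pulled with the AWS CLI (a minimal sketch; the user name and access key ID are placeholders):

# list IAM users with their last console sign-in date
aws iam list-users --query 'Users[].[UserName,PasswordLastUsed]' --output table

# inspect access-key activity for a specific user (placeholders)
aws iam list-access-keys --user-name <user_name> --query 'AccessKeyMetadata[].AccessKeyId' --output text
aws iam get-access-key-last-used --access-key-id <access_key_id>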

Instances with insecure Security Groups settings

A security check that scans resources for network vulnerabilities and provides a list of instances that are exposed to RDP/SSH attacks.

Insecure ports and permissions:

  • port tcp/22
  • port tcp/3389
  • all inbound traffic

with one of:

  • CidrIp: 0.0.0.0/0
  • CidrIpv6: ::/0

AWS

  • Describe regions: ec2.describe_regions()
  • Describe instances: ec2.describe_instances()
  • Describe security groups: ec2.describe_security_groups()
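
For AWS, a similar spot check can be sketched with the AWS CLI using security-group filters that mirror the port/CIDR criteria above (a minimal sketch):

# security groups that open SSH (tcp/22) to the whole internet
aws ec2 describe-security-groups \
    --filters Name=ip-permission.from-port,Values=22 \
              Name=ip-permission.protocol,Values=tcp \
              Name=ip-permission.cidr,Values=0.0.0.0/0 \
    --query 'SecurityGroups[].[GroupId,GroupName]' --output table
# repeat with Values=3389 to cover RDP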

Azure

  • Describe instances: compute.virtual_machines.list_all()
  • Describe security groups: network.network_security_groups.list_all()

Note

Network interfaces without associated security groups are skipped.

The list of insecure SGs can be downloaded in the json format for subsequent automated processing.

insecure_sg

IAM users with unused console access

This list contains active IAM users that have console access enabled but have not used it for more than 90 days. Consider revoking console access to increase security.

The parameter in bold is a custom parameter. Use settings to change it.

Public S3 buckets

The S3 buckets in the list are public. Please ensure that the buckets use the correct policies and are not publicly accessible unless explicitly required.

Savings Optimization Recommendations

Abandoned Instances

Instances with average CPU consumption below 5% and network traffic below 1000 bytes/s for the last 7 days.

We recommend terminating these instances to reduce expenses.

The parameters in bold are custom parameters. Use settings to change them.
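
To double-check a particular instance against the CPU criterion, its average CPU utilization can be pulled from CloudWatch (a minimal AWS CLI sketch; the instance ID and time range are placeholders):

aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=<instance_id> \
    --start-time 2024-01-01T00:00:00Z --end-time 2024-01-08T00:00:00Z \
    --period 86400 --statistics Average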

Obsolete Images

Images that have not been used for a while might be subject to deletion, which would unlock the underlying snapshots.

Selection criteria:

  • the image was created more than 1 week ago,
  • no instances have been created from or related to this image in the past 7 days.

The list of obsolete images can be downloaded in the json format for subsequent automated processing with the help of Cleanup Scripts.

obsolete_images

The parameter in bold is a custom parameter. Use settings to change it.

Obsolete Snapshots

Redundant and old snapshots will save up on storage expenses if deleted. The list of snapshots can be downloaded from OptScale in JSON format to be used in further implementations like clean-up scripts and maintenance procedures.

Selection criteria:

  • its source EBS volume does not exist anymore,
  • there are no AMIs created from this EBS snapshot,
  • it has not been used for volume creation for the last 4 days.

The list of obsolete snapshots can be downloaded in the json format for subsequent automated processing with the help of Cleanup Scripts.

obsolete_snapshots

The parameter in bold is a custom parameter. Use settings to change it.

Obsolete IPs

Obsolete IPs can be tracked for Alibaba, Azure and AWS clouds.

Selection criteria:

  • IP was created more than 7 days ago,
  • IP has not been used during last 7 days,
  • it costs money to be kept.

The parameter in bold is a custom parameter. Use settings to change it.

Not attached volumes

Notification about volumes that have not been attached for more than one day. Such volumes are considered forgotten or no longer relevant; deleting them may be advised.

The list of unattached volumes can be downloaded in the json format for subsequent automated processing with the help of Cleanup Scripts.

unattached_volumes

Underutilized instances

This recommendation detects underutilized instances in AWS and Azure and suggests more suitable flavors for these machines.

An instance is considered underutilized if:

  • it is active.
  • it exists for more than 3 days.
  • its CPU metric average for past 3 days is less than 80%.

underutilized

The parameter in bold is a custom parameter. Use settings to change it.

Reserved Instances opportunities

This card contains instances that:

  • are active.
  • have been sustained compute consumers for more than 90 days.
  • have not been covered with Reserved Instances or Saving Plans.

For such instances, it is recommended to consider purchasing Reserved Instances to save on compute usage.

Check RI/SP Coverage to see the detailed breakdown of current reservations.

The parameter in bold is a custom parameter. Use settings to change it.

Abandoned Kinesis Streams

Kinesis Streams with provisioned Shard capacity that have not performed operations in the last 7 days are listed on this card.

Consider removing them to reduce expenses.

The parameter in bold is a custom parameter. Use settings to change it.

Instances with migration opportunities

This card shows opportunities to migrate instances if OptScale detects that the same instance type is cheaper in a geographically close region (within the same continent).

Some of your active instances may cost less in another nearby region with the same specifications. Consider migrating them to the recommended region to reduce expenses.

Instances for shutdown

Some of your instances have an inactivity pattern which allows you to set up an on/off power schedule (average CPU consumption is less than 5% and network traffic below 1000 bytes/s for the last 14 days). Consider creating a power schedule to reduce expenses.

The parameters in bold are custom parameters. Use settings to change them.

Instances with Spot (Preemptible) opportunities

This card lists instances that:

  • were running during the last 3 days,
  • existed for less than 6 hours,
  • were not created as Spot (or Preemptible) Instances.

Consider using Spot (Preemptible) Instances.

To change the check period click SETTINGS on the card.

short_living_instances

The parameters in bold are custom parameters. Use settings to change them.

Underutilized RDS Instances

An underutilized RDS instance is one whose average CPU consumption has been less than 80% for the last 3 days.

OptScale detects such active RDS instances and lists them on the card.

Consider switching to the recommended size from the same family to reduce expenses.

Change the Rightsizing strategy by clicking on the Settings on the card.

rightsizing_strategy

The parameters in bold are custom parameters. Use settings to change them.

Obsolete Snapshot Chains

Some snapshot chains have no source volumes, no images created from their snapshots, and have not been used for volume creation for the last 3 days. Consider deleting them to save on snapshot storage expenses.

Change the check period in the card’s Settings.

The parameter in bold is a custom parameter. Use settings to change it.

Instances with Subscription opportunities

The instances in the list are active and have been identified as sustained compute consumers (for more than 90 days) but are not covered by Subscription or Savings Plans.

Consider purchasing Subscriptions to reduce compute costs.

Change the check period in the card’s Settings.

The parameter in bold is a custom parameter. Use settings to change it.

Not deallocated Instances

Detects inactive, non-deallocated machines that have not been running for more than 1 day but are still billed by the cloud.

The list of non-deallocated VMs can be downloaded in the json format for subsequent automated processing with the help of Cleanup Scripts.

Change the check period in the card’s Settings.

Non-deallocated

The parameter in bold is a custom parameter. Use settings to change it.

Instances eligible for generation upgrade

Upgrade older generation instances to the latest generation within the same family.

Abandoned Amazon S3 buckets

A bucket is considered abandoned if, for the last 7 days, its average data size has been less than 1024 megabytes, the number of Tier 1 requests has been less than 100, and the number of GET requests has been less than 2000.

It is recommended to delete such buckets to reduce expenses.

Change the check period and other parameters in the card’s Settings.

The parameters in bold are custom parameters. Use settings to change them.

Clean-up Scripts based on Recommendations

Below are the instructions on how to use the clean-up scripts found in the Recommendations section.

Note

The script will attempt to delete all resources that are recommended for deletion (based on the downloadable json file) and will not fail on errors. Upon its completion, a summary will be generated containing a list of deleted resources, a list of non-existing (already deleted) resources and a list of resources that could not be deleted due to other reasons.

Alibaba

Requirements

  1. Aliyun CLI – Installation guide.

  2. jq – a package that allows processing JSON in bash scripts. Download page.

Action plan

1. Install the requirements on a machine running Linux OS.

2. Sign in to aliyun:

aliyun configure --mode AK --profile test
Configuring profile 'test' in 'AK' authenticate mode...
Access Key Id []: 
Access Key Secret []: 
Default Region Id []: ap-southeast-1
Default Output Format [json]: json (Only support json)
Default Language [zh|en] en: en
Saving profile[test] ...Done.
Configure Done!!!
..............888888888888888888888 ........=8888888888888888888D=..............
...........88888888888888888888888 ..........D8888888888888888888888I...........
.........,8888888888888ZI: ...........................=Z88D8888888888D..........
.........+88888888 ..........................................88888888D..........
.........+88888888 .......Welcome to use Alibaba Cloud.......O8888888D..........
.........+88888888 ............. ************* ..............O8888888D..........
.........+88888888 .... Command Line Interface(Reloaded) ....O8888888D..........
.........+88888888...........................................88888888D..........
..........D888888888888DO+. ..........................?ND888888888888D..........
...........O8888888888888888888888...........D8888888888888888888888=...........
............ .:D8888888888888888888.........78888888888888888888O ..............

3. Configure timeouts:

aliyun configure set --read-timeout 20  --connect-timeout 20 --retry-count 3

4. Run the script: bash <script_name> <path to recommendation json file>.

5. Alternatively, run the script in the Alibaba Cloud Shell:

  • Open the console → Online Linux Shell.

  • Copy the script and the recommendation json file via the Upload/Download files button.

  • Run script as follows: bash <script_name> <path to recommendation json file>. Use absolute paths or perform cd before execution.

AWS Source

Requirements

  1. AWS CLI v2 – Official Amazon User Guide.

  2. jq – a package that allows processing JSON in bash scripts. Download page from the developer.

Action plan

1. Install the requirements on a machine running Linux OS.

2. Configure the AWS Command Line Interface: run the aws configure command. For more info, please refer to the corresponding section of the AWS User Guide.

3. Download the script from the corresponding subsection of the OptScale’s Recommendations page.

script_download

4. From the same page, download the json file containing a list of all resources that are recommended for deletion.

5. Run the script as follows: bash <script_name> <path to json file>
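
A complete run might look like the following (the script and json file names are hypothetical examples):

aws configure                                   # enter Access Key, Secret and default region
bash cleanup_obsolete_snapshots.sh ./obsolete_snapshots.json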

Azure Source

Requirements

  1. Azure CLI – Official Microsoft User Guide.
  2. jq – a package that allows processing JSON in bash scripts. Download page.

Action plan

  1. Install the requirements on a machine running Linux OS.
  2. Sign in with the Azure cli.
  3. Download the script from the corresponding subsection of the OptScale’s Recommendations page.
  4. From the same page, download the json file containing a list of all resources that are recommended for deletion.
  5. Run the script as follows: bash <script_name> <path to json file>.

Action plan when using Azure Shell

  1. Open Azure shell.
  2. Download the script and the json file from the corresponding subsection of the OptScale’s Recommendations page.
  3. Copy these files via the Upload/Download files button. The files will be placed in /usr/csuser/clouddrive.
  4. Run the script as follows: bash <script_name> <path to json file> using absolute paths or navigate to the necessary folder before executing.

GCP Source

Requirements

  1. gcloud CLI – gcloud CLI overview.

  2. jq – a package that allows processing JSON in bash scripts. Download page.

Action plan

1. Install requirements on a machine running Linux OS.

2. Configure gcloud: run gcloud init. See more info.

3. Run script bash <script_name> <path to recommendation json file>.

Archived Recommendations

If a recommendation has been applied or has become irrelevant, it will be moved to the archive. To view it, click the Archive button in the top-right corner of the Recommendations page.

archive

Alibaba

To track a new Alibaba Data Source in your OptScale account, please select the Alibaba Cloud tab at the Data Source Connection step during the initial configuration or later on in the Settings section of the main page.

connect_alibaba

Name#

In the first field, you can specify any preferred name to be used for this Data Source in OptScale.

Alibaba Cloud Access key ID#

The Cloud Access key ID is a unique string that is assigned to your RAM user.

Attention

Programmatic Access must be enabled for this user to support access through the API.

To find it:

user_access_key

Note

The Cloud Access key ID can also be found in the AccessKey.csv file downloaded from Alibaba during access key creation.

Access Secret#

Secret access key for the RAM user can be found in the AccessKey.csv file downloaded from the console during access key creation. Information about the secret will not be accessible through the UI after it has been created.

(Optional) New RAM User Creation with Secret#

Alternatively, you can create a separate user in your Alibaba cloud account for OptScale to operate with.

To do this:

create_user

  • Provide a unique name for the new RAM user and enable Programmatic access by checking the corresponding box.
  • Copy and save Access Key ID and Secret or download the AccessKey.csv file to your computer.

key_and_secret

Required Policy#

The account must have the following single permission to support OptScale: Read-only access to Alibaba Cloud services.

To add it:

  • Click on the Add Permissions button to the right of your RAM user.
  • Find ReadOnlyAccess in the list and click on it. It will appear in the Selected section.
  • Click OK.

add_permission

 

AWS#

Root account – Data Export already configured#

OptScale supports the AWS Organizations service that allows linking several Data Sources in order to centrally manage data of multiple users while receiving all billing exports within a single invoice. The Root account (payer) will be the only one having access to collective data related to cloud spending. When registering this type of profile in OptScale, the user is given an option for Data Exports to be detected automatically.

Warning

When you connect the root account but do not connect the linked accounts, all expenses from the unconnected linked accounts will be ignored, even if they exist in the data export file. To retrieve expenses from both linked and root accounts, connect all AWS accounts (not just the root). OptScale ignores data from unconnected linked accounts.

To track a new AWS Data Source in your OptScale account, please select the AWS Root Account tab at the Data Source Connection step during the initial configuration.

root_account

Automated import of billing data#

Step 1. Having Data Exports configured for your cloud account is the main prerequisite in order to proceed with the remaining actions. If Data Export hasn’t been configured, refer to the Root Account – Data Export not configured yet section.

Step 2. Update bucket policy

  • Navigate to the Permissions tab of your AWS S3 bucket and select Bucket Policy.
  • Replace <bucket_name> with the name of the bucket.
  • Replace <AWS account ID> with the AWS Account ID (12 digits without “-”):
{
  "Version": "2012-10-17", 
  "Statement": [
      {
          "Sid": "EnableAWSDataExportsToWriteToS3AndCheckPolicy",
          "Effect": "Allow",
          "Principal": {
              "Service": [
                  "billingreports.amazonaws.com",
                  "bcm-data-exports.amazonaws.com"
              ]
          },
          "Action": [
              "s3:PutObject",
              "s3:GetBucketPolicy"
          ],
          "Resource": [
              "arn:aws:s3:::<bucketname>/*",
              "arn:aws:s3:::<bucketname>"
          ],
          "Condition": {
              "StringLike": {
                  "aws:SourceAccount": "<AWS account ID>",
                  "aws:SourceArn": [
                      "arn:aws:cur:us-east-1:<AWS account ID>:definition/*",
                      "arn:aws:bcm-data-exports:us-east-1:<AWS account ID>:export/*"
                  ]
              }
          }
      }
  ]
}

billing_policy1

Step 3. Create user policy for read only access

  • Go to Identity and Access Management (IAM) → Policies.
  • Create a new user policy for read only access to the bucket (<bucket_name> must be replaced in policy):
{
   "Version": "2012-10-17",
   "Statement": [
    {
        "Sid": "ReportDefinition",
        "Effect": "Allow",
        "Action": [
            "cur:DescribeReportDefinitions"
            ],
            "Resource": "*"
    },
    {
        "Sid": "GetObject",
        "Effect": "Allow",
        "Action": [
            "s3:GetObject"
        ],
            "Resource": "arn:aws:s3:::<bucket_name>/*"
    },
    {
        "Sid": "BucketOperations",
        "Effect": "Allow",
        "Action": [
            "s3:ListBucket",
            "s3:GetBucketLocation"
        ],
        "Resource": "arn:aws:s3:::<bucket_name>"
    }
   ]
}  

billing_policy2

create_policy

Step 4. Create user and grant policies

  • Go to Identity and Access Management (IAM) → Users to create a new user.

add_user

  • Attach the created policy to the user:

policy_attach

  • Confirm creation of the user.
  • Create access key for user (Identity and Access Management (IAM) → Users → Select the created user → Create access key):

create_access_key

  • Download or copy Access key and Secret access key. Use these keys when connecting a Data Source in OptScale as the AWS Access Key ID and AWS Secret Access Key, respectively (at step 5).

retrieve_access_key
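
If you prefer the command line, the same user setup can be sketched with the AWS CLI (the user and policy names are illustrative assumptions):

# illustrative names; adjust to your environment
aws iam create-user --user-name optscale-billing-reader
aws iam attach-user-policy \
    --user-name optscale-billing-reader \
    --policy-arn arn:aws:iam::<AWS account ID>:policy/<policy_name>
aws iam create-access-key --user-name optscale-billing-reader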

Step 5. Create Data Source in OptScale

  • Go to OptScale.
  • Register as a new user.
  • Log in as a registered user.
  • Create a Data Source.

    • Provide user credentials (see screenshot above for more details): AWS Access key ID, AWS Secret access key.
    • Select Export type: AWS Billing and Cost Management → Data Exports → find the report configured earlier → Export type.
    • Select Connect only to data in bucket.
    • Provide Data Export parameters:

      • Export Name: AWS Billing and Cost Management → Data Exports table → Export name.
      • Export S3 Bucket Name: AWS Billing and Cost Management → Data Exports table → S3 bucket.

      cloud_account

    • Export path: AWS Billing and Cost Management → Data Exports table → click on Export name → Edit → Data export storage settings → S3 destination → last folder name (without “/”)

    delivery_and_storage_option

connect_data_source

  • After creating a Data Source, you will need to wait for the export to be generated by AWS and uploaded to OptScale according to the schedule (performed on an hourly basis).

Discover resources#

OptScale needs to have permissions configured in AWS for the user Data Source in order to correctly discover resources and display them under a respective section of the dashboard for the associated employee.

Make sure to include the following policy in order for OptScale to be able to parse EC2 resources data:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OptScaleOperations",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketPolicyStatus",
                "s3:GetBucketTagging",
                "iam:GetAccessKeyLastUsed",
                "cloudwatch:GetMetricStatistics",
                "s3:GetBucketAcl",
                "ec2:Describe*",
                "s3:ListAllMyBuckets",
                "iam:ListUsers",
                "s3:GetBucketLocation",
                "iam:GetLoginProfile",
                "cur:DescribeReportDefinitions",
                "iam:ListAccessKeys"
            ],
            "Resource": "*"
        }
    ]
}
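
One way to create and attach this policy from the command line is sketched below (the file, policy, and user names are illustrative assumptions):

# save the JSON above as discovery-policy.json (name is illustrative)
aws iam create-policy \
    --policy-name OptScaleDiscovery \
    --policy-document file://discovery-policy.json
aws iam attach-user-policy \
    --user-name <data_source_user> \
    --policy-arn arn:aws:iam::<AWS account ID>:policy/OptScaleDiscovery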

Your AWS Data Source should now be ready for integration with OptScale! Please contact our Support Team at support@hystax.com if you have any questions regarding the described configuration flow.

Root account – Data Export not configured yet#

OptScale supports the AWS Organizations service that allows linking several Data Sources in order to centrally manage data of multiple users while receiving all billing reports within a single invoice. The Root account (payer) will be the only one having access to collective data related to cloud spending. When registering this type of profile in OptScale, the user is given an option for Data Exports to be created automatically.

Warning

When you connect the root account but do not connect the linked accounts, all expenses from the unconnected linked accounts will be ignored, even if they exist in the data export file. To retrieve expenses from both linked and root accounts, connect all AWS accounts (not just the root). OptScale ignores data from unconnected linked accounts.

To track a new AWS Data Source in your OptScale account, please select the AWS Root Account tab at the Data Source Connection step during the initial configuration.

root_account_no_data_export

Automated creation of billing bucket and Data Export#

Step 1. Create user policy for bucket and export creation access.

  • Go to Identity and Access Management (IAM) → Policies. Create a new policy for fully automatic configuration (both bucket and export are created) (<bucket_name> must be replaced in policy)
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReportDefinition",
            "Effect": "Allow",
            "Action": [
                "cur:DescribeReportDefinitions",
                "cur:PutReportDefinition"
            ],
                "Resource": "*"

        },
        {
            "Sid": "CreateCurExportsInDataExports",
            "Effect": "Allow",
            "Action": [
                "bcm-data-exports:ListExports",
                "bcm-data-exports:GetExport",
                "bcm-data-exports:CreateExport"
            ],
            "Resource": "*"
        },
        {
            "Sid": "CreateBucket",
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket"
            ],
            "Resource": "*"
        },
        {
            "Sid": "GetObject",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::<bucket_name>/*"
        },
        {
            "Sid": "BucketOperations",
            "Effect": "Allow",
            "Action": [
                "s3:PutBucketPolicy",
                "s3:ListBucket",
        "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<bucket_name>"
        }
    ]
}

review_and_create

Step 2. Create user and grant policies

  • Go to Identity and Access Management (IAM) → Users to create a new user.

    specify_user_details

  • Attach the created policy to the user:

    set_permissions

  • Confirm creation of the user.

  • Create access key for user (Identity and Access Management (IAM) → Users → Created user → Create access key):

    create_access_key

  • Download or copy the Access key and Secret access key. Use these credentials when creating a Data Source connection in OptScale.

    create_access_key

    Enter the Access key into the AWS Access Key ID field and the Secret access key into the AWS Secret Access Key field (at step 3).

Step 3. Create Data Source in OptScale:

  • Go to OptScale.

  • Register as a new user.

  • Log in as a registered user.

  • Create a Data Source:

    • Provide user credentials copied on the previous step. Enter the Access key into the AWS Access Key ID field and the Secret access key into the AWS Secret Access Key field.

    • Select Export type.

    • Select Create new Data Export.

    • Provide the parameters with which the bucket and Data Export will be created: Export Name, Export S3 Bucket Name (<bucket_name> from the user policy in step 1), and Export path prefix.

connect_aws

Note

Specify the bucket in the “Export S3 Bucket Name” field if it already exists. OptScale will then create the report and store it in the bucket using the specified prefix.

  • After creating a Data Source, you will need to wait for AWS to generate the export and upload it to OptScale according to the schedule (approximately one day).

Warning

AWS updates or creates a new export file once a day. If the export file is not placed in the specified bucket under the specified prefix, the export will fail with an error.

status_failed

Discover Resources#

OptScale needs to have permissions configured in AWS for the user Data Source in order to correctly discover resources and display them under a respective section of the dashboard for the associated employee.

Make sure to include the following policy in order for OptScale to be able to parse EC2 resources data:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OptScaleOperations",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketPolicyStatus",
                "s3:GetBucketTagging",
                "iam:GetAccessKeyLastUsed",
                "cloudwatch:GetMetricStatistics",
                "s3:GetBucketAcl",
                "ec2:Describe*",
                "s3:ListAllMyBuckets",
                "iam:ListUsers",
                "s3:GetBucketLocation",
                "iam:GetLoginProfile",
                "cur:DescribeReportDefinitions",
                "iam:ListAccessKeys"
            ],
            "Resource": "*"
        }
    ]
}

Your AWS Data Source should now be ready for integration with OptScale! Please contact our Support Team at support@hystax.com if you have any questions regarding the described configuration flow.

Create Data Export#

Note

Creating a Data Export is only available for the Root cloud account (payer), while all its Linked accounts will be centrally managed and receive their billing data through the main account’s invoice.

In order to utilize automatic / manual billing data import in OptScale, first, you need to create a Data Export in AWS. Please refer to their official documentation to become acquainted with the guidelines for Data Exports.

  • Navigate to AWS Billing & Cost Management → Data Exports.

  • Create a new Data Export.

Standard#

Step 1. Export type

  • Select Standard data export export type.

Step 2. Export name

  • Input export name.

Step 3. Data table content settings:

  • Select CUR 2.0.

  • Select Include resource IDs checkbox.

  • Choose the time granularity for how you want the line items in the export to be aggregated.

Step 4. Data export delivery options:

  • Pick Overwrite existing data export file.

  • Select compression type.

Step 5. Data export storage setting:

  • Create a new or use an existing bucket for the export.

  • Enter the S3 path prefix that you want prepended to the name of your Data Export.

Step 6. Review

  • Confirm export creation. The Data Export will be prepared by AWS within 24 hours.

Legacy CUR Export#

Step 1. Export type

  • Select Legacy CUR export (CUR) export type.

Step 2. Export name

  • Input export name.

Step 3. Export content

  • Select Include resource IDs and Refresh automatically checkboxes.

Step 4. Data export delivery options:

  • Choose the time granularity for how you want the line items in the export to be aggregated.

  • Pick Overwrite existing report.

  • Select compression type.

Step 5: Data export storage setting:

  • Create a new or use an existing bucket for the export.

  • Enter the S3 path prefix that you want prepended to the name of your Data Export.

Step 6. Review

  • Confirm export creation. The Data Export will be prepared by AWS within 24 hours.

When it’s done, follow the steps from the section Root account – Data Export already configured.

Linked#

OptScale supports the AWS Organizations service that allows linking several Data Sources in order to centrally manage data of multiple users while receiving all billing exports within a single invoice.

Selecting the AWS Linked tab makes the registration flow easier by eliminating the option to input bucket information for billing purposes, since this will be received through the root account, whose user will then be able to distribute periodic reports individually if intended by the company management. In this case, only an Access key and Secret access key are required.

linked_account

Note

If you only specify an AWS Linked account without providing credentials for the main one, OptScale will not be able to import any billing data.

Use Connect to create a Data Source in OptScale. If some of the provided values are invalid, an error message will indicate a failure to connect.

Discover Resources#

OptScale needs to have permissions configured in AWS for the user Data Source in order to correctly discover resources and display them under a respective section of the dashboard for the associated employee.

Make sure to include the following policy in order for OptScale to be able to parse EC2 resources data:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OptScaleOperations",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketPolicyStatus",
                "s3:GetBucketTagging",
                "iam:GetAccessKeyLastUsed",
                "cloudwatch:GetMetricStatistics",
                "s3:GetBucketAcl",
                "ec2:Describe*",
                "s3:ListAllMyBuckets",
                "iam:ListUsers",
                "s3:GetBucketLocation",
                "iam:GetLoginProfile",
                "cur:DescribeReportDefinitions",
                "iam:ListAccessKeys"
            ],
            "Resource": "*"
        }
    ]
}

Your AWS Data Source should now be ready for integration with OptScale! Please contact our Support Team at support@hystax.com if you have any questions regarding the described configuration flow.

Migrating from CUR to CUR 2.0#

The information on this page can be useful if an AWS Data Source (Legacy CUR export schema) has already been connected and you want to configure CUR 2.0 data and update the AWS Data Source.

A new bucket is required#

Create a new Data Export with CUR 2.0 schema. Navigate to AWS Billing & Cost Management → Data Exports page.

Step 1. Export type

  • Select Standard data export export type.

Step 2. Export name

  • Input export name. The content of the Export name field will be required when updating an AWS Data Source in OptScale.

aws_export_cur2_01

Step 3. Data table content settings:

  • Select CUR 2.0.
  • Select Include resource IDs checkbox.
  • Choose the time granularity for how you want the line items in the export to be aggregated.

aws_export_cur2_03

Step 4. Data export delivery options:

  • Pick Overwrite existing data export file.
  • Select compression type.

Step 5. Data export storage setting

  • Configure a new bucket. The content of the S3 path prefix and S3 bucket name fields will be required when updating an AWS Data Source in OptScale.

aws_export_cur2_04

Step 6. Review

  • Confirm export creation. The Data Export will be prepared by AWS within 24 hours.

Click on the existing AWS Data Source on the Data Source page. The page with detailed information opens. Click the UPDATE CREDENTIALS button to update the Data Source credentials. Switch on Update Data Export parameters to update info about the billing bucket.

aws_migrate_cur1_cur2_07

Select the Standard data export (CUR 2.0) export type. Enter the Export name from the first step as Export name, the S3 bucket name as Export Amazon S3 bucket name, and the S3 path prefix as Export path prefix.

Save and wait for a new export to import!

The bucket already exists#

Use this case if you have already connected an AWS Data Source (on Legacy CUR export schema) and want to configure CUR 2.0 data into the same bucket.

Create a new Data Export with CUR 2.0 schema. Navigate to AWS Billing & Cost Management → Data Exports page.

Step 1. Export type

  • Select Standard data export export type.

Step 2. Export name

  • Input export name. The content of the Export name field will be required when updating an AWS Data Source in OptScale.

aws_export_cur2_01

Step 3. Data table content settings:

  • Select CUR 2.0.
  • Select Include resource IDs checkbox.
  • Choose the time granularity for how you want the line items in the export to be aggregated.

aws_export_cur2_03

Step 4. Data export delivery options:

  • Pick Overwrite existing data export file.
  • Select compression type.

Step 5: Data export storage setting:

  • Select an existing bucket in the Data export storage settings section.

    aws_export_cur2_05

  • Input NEW S3 path prefix.

    aws_export_cur2_06

Click on the existing AWS Data Source on the Data Source page. The page with detailed information opens.

Click the UPDATE CREDENTIALS button to update the Data Source credentials. Switch on Update Data Export parameters to update info about the billing bucket.

aws_migrate_cur1_cur2_08

Select Standard data export (CUR 2.0) export type and update Export name and Export path prefix fields, save and wait for a new export to import!

Azure#

Subscription#

To track a new Azure Data Source in your OptScale account, please select the Azure Subscription tab at the Data Source Connection step during the initial configuration or later on in the Settings section of the main page.

connect_azure

Name#

In the first field, you can specify any preferred name to be assigned to this Data Source in OptScale.

Subscription ID#

The Subscription ID is a unique string that identifies your Azure subscription. To find it:

  • Log in to the Microsoft Azure Portal.
  • Search for Subscriptions to access a list of all subscriptions associated with your Azure account. The list will include a subscription ID for each one.
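
If you use the Azure CLI, the same ID can be retrieved from a terminal (a minimal sketch):

# show the ID of the currently selected subscription
az account show --query id --output tsv
# or list all subscriptions available to your account
az account list --output table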

When OptScale is programmatically signing in to Azure, it needs to pass a tenant ID and an application ID along with a secret, which is an authentication key.

Application (client) ID#

Application (client) ID has to be generated manually in Azure to allow API communication with OptScale:

  • Access the Azure Active Directory and navigate to App registrations
  • Click on + New registration, provide a name, e.g. OptScale, and then click Register at the bottom of the page
  • A new Application ID will become available (as in the screenshot below)

app_registration

Attention

Once you have registered an Application, it is essential to explicitly grant it permissions in a form of a Role assignment to work with the current Azure subscription.

To perform a Role assignment, from the Azure home page navigate to Subscriptions and select the one you have provisioned to be linked to OptScale.

After being redirected to its dashboard, click Access control (IAM) in the left navigation bar, then go to the Role assignments tab and click +Add → Add role assignment.

access_control

You will be prompted to input the Role, which has to be Reader, in the first field. The second one can be left unchanged. The third field should contain the name of a registered application from the previous steps, e.g. OptScale. Click Save to add the role assignment.

add_role
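
For reference, the app registration and the Reader role assignment can also be sketched with the Azure CLI (the application name follows the OptScale example above; the IDs are placeholders):

# register the application and create a service principal for it
az ad app create --display-name OptScale
az ad sp create --id <application_client_id>
# grant the Reader role on the subscription
az role assignment create \
    --assignee <application_client_id> \
    --role Reader \
    --scope /subscriptions/<subscription_id>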

Directory (tenant) ID#

Directory (tenant) ID is a globally unique identifier (GUID) that is different from your organization name or domain. Its value is easily accessible in the overview of the application that has been added in the previous steps via App registrations.

Go to Home → App registrations → e.g. OptScale → Overview → Directory (tenant) ID

tenant_ID

Secret#

Secret should be created within the newly registered application:

  • Go to the App registrations, click on your application, e.g. OptScale
  • Select Certificates & Secrets in the left navigation bar and add a + New client secret

Attention

Secret’s value will be hidden shortly after its creation. Make sure to copy it in a safe place.

create_secret
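
A client secret can also be created with the Azure CLI (a sketch; the ID placeholder is the Application (client) ID from the previous step, and the secret name is illustrative):

az ad app credential reset --id <application_client_id> --append --display-name optscale-secret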

Once the required fields are filled out, you can click Connect to validate the information. Once you have connected to the account, the data will be pulled from the source shortly afterwards and become available in the UI.

Your Azure Data Source account should now be ready for integration with OptScale! Please contact our Support Team at support@hystax.com if you have any questions regarding the described configuration flow.

Tenant#

To track a new Azure tenant Data Source in your OptScale account, please select the Azure tenant tab at the Data Source Connection step during the initial configuration or later on in the Settings section of the main page.

connect_azure_tenant

Name#

In the first field, you can specify any preferred name to be assigned to this Data Source in OptScale.

Application (client) ID#

Application (client) ID has to be generated manually in Azure to allow API communication with OptScale:

  • Access the Azure Active Directory and navigate to App registrations
  • Click on + New registration, provide a name, e.g. OptScale, and then click Register at the bottom of the page
  • A new Application ID will become available (as in the screenshot below)

app_registration

Attention

Once you have registered an Application, it is essential to explicitly grant it permissions in a form of a Role assignment to work with the current Azure subscription.

To perform a role assignment, from the Azure home page navigate to Subscriptions and select the ones you have provisioned to be linked to OptScale.

After being redirected to its dashboard, click Access control (IAM) in the left navigation bar, then go to the Role assignments tab and click +Add → Add role assignment.

access_control

You will be prompted to input the Role, which has to be Reader, in the first field. The second one can be left unchanged. The third field should contain the name of a registered application from the previous steps, e.g. OptScale. Click Save to add the role assignment.

add_role

Directory (tenant) ID#

Directory (tenant) ID is a globally unique identifier (GUID) that is different from your organization name or domain. Its value is easily accessible in the overview of the application that has been added in the previous steps via App registrations.

Go to Home → App registrations → e.g. OptScale → Overview → Directory (tenant) ID

tenant_ID

Secret#

Secret should be created within the newly registered application:

  • Go to the App registrations, click on your application, e.g. OptScale
  • Select Certificates & Secrets in the left navigation bar and add a + New client secret

Attention

Secret’s value will be hidden shortly after its creation. Make sure to copy it in a safe place.

create_secret

Once the required fields are filled out, you can click Connect to validate the information. Once you have connected to the account, the data will be pulled from the source shortly afterwards and become available in the UI.

GCP#

Google Cloud#

Enable Billing export#

Please follow the official GCP guide to enable billing data export – Set up Cloud Billing data export to BigQuery | Google Cloud.

As a result, you should have a new table in your BigQuery project. Note the names of the dataset and the table. You will need them later when connecting your cloud account to OptScale.

gcp_billing_tables
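
Once the export is active, you can confirm that the billing table exists with the bq command-line tool (a minimal sketch; the project and dataset names are placeholders):

bq ls --project_id <your_project_id> <billing_dataset>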

Prepare a role for OptScale#

With a CLI command#

Run the following command in GCP CLI:

gcloud iam roles create optscale_connection_role \
    --project=<your_project_id> \
    --permissions=bigquery.jobs.create,bigquery.tables.getData,compute.addresses.list,compute.addresses.setLabels,compute.disks.list,compute.disks.setLabels,compute.firewalls.list,compute.globalAddresses.list,compute.instances.list,compute.instances.setLabels,compute.images.list,compute.images.setLabels,compute.machineTypes.get,compute.machineTypes.list,compute.networks.list,compute.regions.list,compute.snapshots.list,compute.snapshots.setLabels,compute.zones.list,iam.serviceAccounts.list,monitoring.timeSeries.list,storage.buckets.get,storage.buckets.getIamPolicy,storage.buckets.list,storage.buckets.update

Via Google Cloud console#

1. Go to Roles page and click Create Role.

2. Give the role any name and description.

3. Add the following permissions:

  • bigquery.jobs.create
  • bigquery.tables.getData
  • compute.addresses.list
  • compute.addresses.setLabels
  • compute.disks.list
  • compute.disks.setLabels
  • compute.firewalls.list
  • compute.globalAddresses.list
  • compute.instances.list
  • compute.instances.setLabels
  • compute.images.list
  • compute.images.setLabels
  • compute.machineTypes.get
  • compute.machineTypes.list
  • compute.networks.list
  • compute.regions.list
  • compute.snapshots.list
  • compute.snapshots.setLabels
  • compute.zones.list
  • iam.serviceAccounts.list
  • monitoring.timeSeries.list
  • storage.buckets.get
  • storage.buckets.getIamPolicy
  • storage.buckets.list
  • storage.buckets.update

Create service account#

Official documentation on service accounts – Service accounts | IAM Documentation | Google Cloud.

  1. Go to Service accounts page and click Create Service Account
  2. Give it any name and click Create and Continue.
  3. Specify the role that you created earlier and click Continue and then Done

Generate API key for your service account#

  1. Find your service account in the service accounts list and click on its name to go to service account details page.
  2. Go to Keys tab.
  3. Click Add key -> Create new key
  4. Service account API key will be downloaded as a .json file. You will need this file on the next stage when connecting your cloud account to OptScale.
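
The console steps above can also be sketched with gcloud (the service account name is illustrative; the role ID is the one created earlier):

# create the service account (name is illustrative)
gcloud iam service-accounts create optscale-connection \
    --display-name "OptScale connection" --project <your_project_id>
# bind the custom role created earlier to the service account
gcloud projects add-iam-policy-binding <your_project_id> \
    --member "serviceAccount:optscale-connection@<your_project_id>.iam.gserviceaccount.com" \
    --role "projects/<your_project_id>/roles/optscale_connection_role"
# generate and download the JSON key
gcloud iam service-accounts keys create key.json \
    --iam-account optscale-connection@<your_project_id>.iam.gserviceaccount.com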

Connect Data Source in OptScale#

Use the newly downloaded service account credentials json file with the billing dataset details to connect your GCP cloud account.

gcp_connection_form

Google Cloud tenant#

Prepare a role for OptScale#

With a CLI command#

Run the following command in GCP CLI:

gcloud iam roles create optscale_connection_role \
    --project=<your_project_id> \
    --permissions=bigquery.jobs.create,bigquery.tables.getData,compute.addresses.list,compute.addresses.setLabels,compute.disks.list,compute.disks.setLabels,compute.firewalls.list,compute.globalAddresses.list,compute.instances.list,compute.instances.setLabels,compute.images.list,compute.images.setLabels,compute.machineTypes.get,compute.machineTypes.list,compute.networks.list,compute.regions.list,compute.snapshots.list,compute.snapshots.setLabels,compute.zones.list,iam.serviceAccounts.list,monitoring.timeSeries.list,storage.buckets.get,storage.buckets.getIamPolicy,storage.buckets.list,storage.buckets.update

Via Google Cloud console#

1. Go to Roles page and click Create Role.

2. Give the role any name and description.

3. Add the following permissions:

  • bigquery.jobs.create
  • bigquery.tables.getData
  • compute.addresses.list
  • compute.addresses.setLabels
  • compute.disks.list
  • compute.disks.setLabels
  • compute.firewalls.list
  • compute.globalAddresses.list
  • compute.instances.list
  • compute.instances.setLabels
  • compute.images.list
  • compute.images.setLabels
  • compute.machineTypes.get
  • compute.machineTypes.list
  • compute.networks.list
  • compute.regions.list
  • compute.snapshots.list
  • compute.snapshots.setLabels
  • compute.zones.list
  • iam.serviceAccounts.list
  • monitoring.timeSeries.list
  • storage.buckets.get
  • storage.buckets.getIamPolicy
  • storage.buckets.list
  • storage.buckets.update

Create service account#

Official documentation on service accounts – Service accounts | IAM Documentation | Google Cloud.

  1. Go to Service accounts page and click Create Service Account.

  2. Give it any name and click Create and Continue.

  3. Specify the role that you created earlier and click Continue and then Done.

Grant access#

For each project that needs to be added to the tenant, go to the IAM & Admin section in the Google Cloud Console, select IAM, and press the GRANT ACCESS button. Add the created service account and assign the created role to it.
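
With gcloud, the same grant can be repeated per project (a sketch; the project ID and service account e-mail are placeholders, and the role ID is the one created earlier):

gcloud projects add-iam-policy-binding <project_id> \
    --member "serviceAccount:<service_account_email>" \
    --role "projects/<project_id>/roles/optscale_connection_role"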

Generate API key for your service account#

  1. Find your service account in the service accounts list and click on its name to go to service account details page.

  2. Go to Keys tab.

  3. Click Add key -> Create new key.

  4. Service account API key will be downloaded as a .json file. You will need this file on the next stage when connecting your cloud account to OptScale.

Connect a Data Source in OptScale#

To track a new Google Cloud tenant Data Source in your OptScale account, please select the Google Cloud tenant tab at the Data Source Connection step during the initial configuration or later on in the Settings section of the main page.

connect_gcp_tenant

Kubernetes#

To track a new Kubernetes cluster Data Source in your OptScale account, please select the Kubernetes tab on the Data Source Connection page.

connect_kubernetes

  • Name – Specify any preferred name to be used for this data source in OptScale.
  • User – Specify a user for the cost metrics collector to use when pushing data to this data source.
  • Password – Specify a password for the cost metrics collector to use when pushing data to this data source.

Use Connect to create a Data Source in OptScale.

Click on the newly created Data Source on the Data Sources page. The page with detailed information appears.

kubernetes_clickon

Use the KUBERNETES INTEGRATION button or the instructions link to get the instructions for installing the software that collects information about running pods and converts it into cost metrics.

kubernetes_instuctions

Software installation on a cluster#

To get cost metrics, download and install the Helm chart on the Kubernetes cluster. The Helm chart collects Kubernetes resource information and shares it with the OptScale FinOps project. Install one release per cluster.

1. Download Hystax repo#

Use this command to add the Hystax repo:

helm repo add hystax https://hystax.github.io/helm-charts

2. Install Helm Chart#

The instructions differ depending on whether the Kubernetes Data Source is connected on my.optscale.com or on OptScale deployed from open source. In both cases, the instructions shown are adapted to the selected Data Source and your OptScale deployment. Simply copy and paste the command, replacing the <password_specified_during_data_source_connection> placeholder with the user's password. A generic sketch of the install command is shown below.
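
For reference, a generic install sketch follows. The release name, chart name, and values here are placeholders rather than the exact published ones; run helm search repo hystax to list the charts actually available, and copy the exact command from the instructions generated for your Data Source.

# List the charts published in the Hystax repository
helm search repo hystax

# Generic install pattern; replace <chart_name> and the values with those shown
# in the generated instructions, including the data source user and password
helm install optscale-collector hystax/<chart_name> \
    --set <values_from_generated_instructions>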

My.optscale.com#

kubernetes_chart_installation_prod

Note

Specify the user’s password instead of the <password_specified_during_data_source_connection> phrase.

Warning

Please await the completion of the metric generation process, which typically requires approximately one hour.

Open Source OptScale#

kubernetes_instuctions_list_install

kubernetes_chart_installation

Alibaba

To track a new Alibaba data source in the OptScale account, a RAM user is required.

Before connecting, ensure that the user has the ReadOnly access permission and has access to an AccessKey pair. If these conditions are not met, go to the Configure Alibaba user section before proceeding to Connect to Cloudtuner.

Configure Alibaba user

Before connecting, the user must be configured first. If you already have a RAM user, skip to Step 2:

1. (Optional) Follow the official Alibaba user guide to create a new RAM user.

2. Ensure the user has ReadOnly access permission. To add this permission, follow the steps 1-3 of the Method 1 of the official Alibaba user guide.

3. An AccessKey pair is required for the connection. Locate a previously saved pair or create a new one.

Attention

The AccessKey Secret value is hidden shortly after creation. Be sure to copy and store it in a safe place.

Now, your Alibaba Data Source account is ready for integration with Cloudtuner!

Connect to Cloudtuner

Go to Cloudtuner → Data Sources → click the Add button → select Alibaba Cloud.

connect_alibaba

Fill in the fields: enter the AccessKey ID and AccessKey Secret copied from the cloud into the Alibaba Cloud Access key ID and Alibaba Cloud Secret access key fields. Click the Connect button.

Please contact our Support Team if you have any questions regarding the described configuration flow.

AWS

Root account – Data Export already configured#

OptScale supports the AWS Organizations service, which allows linking several Data Sources in order to centrally manage the data of multiple users while receiving all billing exports within a single invoice. The Root account (payer) is the only one with access to the collective cloud spending data. When registering this type of profile in OptScale, the user is given an option for Data Exports to be detected automatically.

Warning

When you connect the root account but do not connect the linked accounts, all expenses from the unconnected linked accounts will be ignored, even if they exist in the data export file. To retrieve expenses from both linked and root accounts, connect all AWS accounts (not just the root). OptScale ignores data from unconnected linked accounts.

To track a new AWS Data Source in your OptScale account, please select the AWS Root Account tab at the Data Source Connection step during the initial configuration.

root_account

Automated import of billing data#

Step 1. Having Data Exports configured for your cloud account is the main prerequisite in order to proceed with the remaining actions. If Data Export hasn’t been configured, refer to the Root Account – Data Export not configured yet section.

Step 2. Update bucket policy

  • Navigate to the Permissions tab of your AWS S3 bucket and select Bucket Policy.
  • Replace <bucket_name> with the name of the bucket.
  • Replace <AWS account ID> with the AWS Account ID (12 digits without “-”):
{
  "Version": "2012-10-17", 
  "Statement": [
      {
          "Sid": "EnableAWSDataExportsToWriteToS3AndCheckPolicy",
          "Effect": "Allow",
          "Principal": {
              "Service": [
                  "billingreports.amazonaws.com",
                  "bcm-data-exports.amazonaws.com"
              ]
          },
          "Action": [
              "s3:PutObject",
              "s3:GetBucketPolicy"
          ],
          "Resource": [
              "arn:aws:s3:::<bucketname>/*",
              "arn:aws:s3:::<bucketname>"
          ],
          "Condition": {
              "StringLike": {
                  "aws:SourceAccount": "<AWS account ID>",
                  "aws:SourceArn": [
                      "arn:aws:cur:us-east-1:<AWS account ID>:definition/*",
                      "arn:aws:bcm-data-exports:us-east-1:<AWS account ID>:export/*"
                  ]
              }
          }
      }
  ]
}

billing_policy1
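
If you prefer the AWS CLI to the console, the policy above, saved locally as a file (the name bucket-policy.json is just an example), can be applied with a command like this:

aws s3api put-bucket-policy --bucket <bucket_name> --policy file://bucket-policy.json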

Step 3. Create user policy for read only access

  • Go to Identity and Access Management (IAM) → Policies.
  • Create a new user policy for read only access to the bucket (<bucket_name> must be replaced in policy):
{
   "Version": "2012-10-17",
   "Statement": [
    {
        "Sid": "ReportDefinition",
        "Effect": "Allow",
        "Action": [
            "cur:DescribeReportDefinitions"
            ],
            "Resource": "*"
    },
    {
        "Sid": "GetObject",
        "Effect": "Allow",
        "Action": [
            "s3:GetObject"
        ],
            "Resource": "arn:aws:s3:::<bucket_name>/*"
    },
    {
        "Sid": "BucketOperations",
        "Effect": "Allow",
        "Action": [
            "s3:ListBucket",
            "s3:GetBucketLocation"
        ],
        "Resource": "arn:aws:s3:::<bucket_name>"
    }
   ]
}  

billing_policy2

create_policy
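
Alternatively, the read-only policy above can be created from the AWS CLI; the policy name and file name below are examples, not required values:

aws iam create-policy --policy-name OptScaleBillingReadOnly \
    --policy-document file://readonly-bucket-policy.json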

Step 4. Create user and grant policies

  • Go to Identity and Access Management (IAM) → Users to create a new user.

add_user

  • Attach the created policy to the user:

policy_attach

  • Confirm creation of the user.
  • Create access key for user (Identity and Access Management (IAM) → Users → Select the created user → Create access key):

create_access_key

  • Download or copy Access key and Secret access key. Use these keys when connecting a Data Source in OptScale as the AWS Access Key ID and AWS Secret Access Key, respectively (at step 5).

retrieve_access_key
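
The same user setup can be sketched with the AWS CLI; the user name is an example, and the policy ARN must point to the policy created in step 3 (replace <AWS account ID> with your 12-digit account ID):

# Create the user
aws iam create-user --user-name optscale-billing-reader

# Attach the read-only policy created earlier
aws iam attach-user-policy --user-name optscale-billing-reader \
    --policy-arn arn:aws:iam::<AWS account ID>:policy/OptScaleBillingReadOnly

# Create the access key pair; store the output securely
aws iam create-access-key --user-name optscale-billing-reader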

Step 5. Create Data Source in OptScale

  • Go to OptScale.
  • Register as a new user.
  • Log in as a registered user.
  • Create a Data Source.

    • Provide user credentials (see screenshot above for more details): AWS Access key ID, AWS Secret access key.
    • Select Export type: AWS Billing and Cost Management → Data Exports → find the report configured earlier → Export type.
    • Select Connect only to data in bucket.
    • Provide Data Export parameters:

      • Export Name: AWS Billing and Cost Management → Data Exports table → Export name.
      • Export S3 Bucket Name: AWS Billing and Cost Management → Data Exports table → S3 bucket.

      cloud_account

    • Export path: AWS Billing and Cost Management → Data Exports table → Click on Export name → Edit → Data export storage settings → S3 destination → last folder name (without “/”)

    delivery_and_storage_option

connect_data_source

  • After creating a Data Source, you will need to wait for the export to be generated by AWS and uploaded to OptScale according to the schedule (performed on an hourly basis).

Discover resources#

OptScale needs to have permissions configured in AWS for the user Data Source in order to correctly discover resources and display them under a respective section of the dashboard for the associated employee.

Make sure to include the following policy in order for OptScale to be able to parse EC2 resources data:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OptScaleOperations",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketPolicyStatus",
                "s3:GetBucketTagging",
                "iam:GetAccessKeyLastUsed",
                "cloudwatch:GetMetricStatistics",
                "s3:GetBucketAcl",
                "ec2:Describe*",
                "s3:ListAllMyBuckets",
                "iam:ListUsers",
                "s3:GetBucketLocation",
                "iam:GetLoginProfile",
                "cur:DescribeReportDefinitions",
                "iam:ListAccessKeys"
            ],
            "Resource": "*"
        }
    ]
}
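
To grant these permissions from the CLI instead of the console, the policy above can be created from a local file and attached to the same user; the names below are examples:

aws iam create-policy --policy-name OptScaleDiscovery \
    --policy-document file://optscale-discovery-policy.json
aws iam attach-user-policy --user-name optscale-billing-reader \
    --policy-arn arn:aws:iam::<AWS account ID>:policy/OptScaleDiscovery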

Your AWS Data Source should now be ready for integration with OptScale! Please contact our Support Team at support@hystax.com if you have any questions regarding the described configuration flow.

Root account – Data Export not configured yet#

OptScale supports the AWS Organizations service, which allows linking several Data Sources in order to centrally manage the data of multiple users while receiving all billing reports within a single invoice. The Root account (payer) is the only one with access to the collective cloud spending data. When registering this type of profile in OptScale, the user is given an option for Data Exports to be created automatically.

Warning

When you connect the root account but do not connect the linked accounts, all expenses from the unconnected linked accounts will be ignored, even if they exist in the data export file. To retrieve expenses from both linked and root accounts, connect all AWS accounts (not just the root). OptScale ignores data from unconnected linked accounts.

To track a new AWS Data Source in your OptScale account, please select the AWS Root Account tab at the Data Source Connection step during the initial configuration.

root_account_no_data_export

Automated creation of billing bucket and Data Export#

Step 1. Create user policy for bucket and export creation access.

  • Go to Identity and Access Management (IAM) → Policies. Create a new policy for fully automatic configuration (both bucket and export are created) (<bucket_name> must be replaced in policy)
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReportDefinition",
            "Effect": "Allow",
            "Action": [
                "cur:DescribeReportDefinitions",
                "cur:PutReportDefinition"
            ],
                "Resource": "*"

        },
        {
            "Sid": "CreateCurExportsInDataExports",
            "Effect": "Allow",
            "Action": [
                "bcm-data-exports:ListExports",
                "bcm-data-exports:GetExport",
                "bcm-data-exports:CreateExport"
            ],
            "Resource": "*"
        },
        {
            "Sid": "CreateBucket",
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket"
            ],
            "Resource": "*"
        },
        {
            "Sid": "GetObject",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::<bucket_name>/*"
        },
        {
            "Sid": "BucketOperations",
            "Effect": "Allow",
            "Action": [
                "s3:PutBucketPolicy",
                "s3:ListBucket",
        "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<bucket_name>"
        }
    ]
}

review_and_create

Step 2. Create user and grant policies

  • Go to Identity and Access Management (IAM) → Users to create a new user.

    specify_user_details

  • Attach the created policy to the user:

    set_permissions

  • Confirm creation of the user.

  • Create access key for user (Identity and Access Management (IAM) → Users → Created user → Create access key):

    create_access_key

  • Download or copy the Access key and Secret access key. Use these credentials when creating a Data Source connection in OptScale.

    create_access_key

    Enter the Access key into the AWS Access Key ID field and the Secret access key into the AWS Secret Access Key field (at step 3).

Step 3. Create Data Source in OptScale:

  • Go to OptScale.

  • Register as a new user.

  • Log in as a registered user.

  • Create a Data Source:

    • Provide user credentials copied on the previous step. Enter the Access key into the AWS Access Key ID field and the Secret access key into the AWS Secret Access Key field.

    • Select Export type.

    • Select Create new Data Export.

    • Provide the parameters with which the bucket and Data Export will be created: Export Name, Export S3 Bucket Name (<bucket_name> from the user policy in step 1), and Export path prefix.

connect_aws

Note

Specify the bucket in the “Export S3 Bucket Name” field if it already exists. OptScale will then create the report and store it in the bucket using the specified prefix.

  • After creating a Data Source, you will need to wait for AWS to generate the export and upload it to OptScale according to the schedule (approximately one day).

Warning

AWS updates or creates a new export file once a day. If the export file is not placed in the specified bucket under the specified prefix, the export will fail with an error.

status_failed

Discover Resources#

OptScale needs to have permissions configured in AWS for the user Data Source in order to correctly discover resources and display them under a respective section of the dashboard for the associated employee.

Make sure to include the following policy in order for OptScale to be able to parse EC2 resources data:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OptScaleOperations",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketPolicyStatus",
                "s3:GetBucketTagging",
                "iam:GetAccessKeyLastUsed",
                "cloudwatch:GetMetricStatistics",
                "s3:GetBucketAcl",
                "ec2:Describe*",
                "s3:ListAllMyBuckets",
                "iam:ListUsers",
                "s3:GetBucketLocation",
                "iam:GetLoginProfile",
                "cur:DescribeReportDefinitions",
                "iam:ListAccessKeys"
            ],
            "Resource": "*"
        }
    ]
}

Your AWS Data Source should now be ready for integration with OptScale! Please contact our Support Team at support@hystax.com if you have any questions regarding the described configuration flow.

Create Data Export#

Note

Creating a Data Export is only available for the Root cloud account (payer), while all its Linked accounts will be centrally managed and receive their billing data through the main account’s invoice.

To use automatic or manual billing data import in OptScale, you first need to create a Data Export in AWS. Please refer to the official AWS documentation to become acquainted with the guidelines for Data Exports.

  • Navigate to AWS Billing & Cost Management → Data Exports.

  • Create a new Data Export.

Standard#

Step 1. Export type

  • Select Standard data export export type.

Step 2. Export name

  • Input export name.

Step 3. Data table content settings:

  • Select CUR 2.0.

  • Select Include resource IDs checkbox.

  • Choose the time granularity for how you want the line items in the export to be aggregated.

Step 4. Data export delivery options:

  • Pick Overwrite existing data export file.

  • Select compression type.

Step 5. Data export storage setting:

  • Create a new or use an existing bucket for the export.

  • Enter the S3 path prefix that you want prepended to the name of your Data Export.

Step 6. Review

  • Confirm export creation. The Data Export will be prepared by AWS within 24 hours.

Legacy CUR Export#

Step 1. Export type

  • Select Legacy CUR export (CUR) export type.

Step 2. Export name

  • Input export name.

Step 3. Export content

  • Select Include resource IDs and Refresh automatically checkboxes.

Step 4. Data export delivery options:

  • Choose the time granularity for how you want the line items in the export to be aggregated.

  • Pick Overwrite existing report.

  • Select compression type.

Step 5: Data export storage setting:

  • Create a new or use an existing bucket for the export.

  • Enter the S3 path prefix that you want prepended to the name of your Data Export.

Step 6. Review

  • Confirm export creation. The Data Export will be prepared by AWS within 24 hours. A CLI sketch for creating this export is shown below.
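
For reference, a Legacy CUR export with the settings above can also be created with the AWS CLI. This is a sketch under assumptions: the report name is illustrative, and the bucket, prefix, region, granularity, and compression must match your actual setup.

aws cur put-report-definition --region us-east-1 --report-definition '{
    "ReportName": "optscale-legacy-cur",
    "TimeUnit": "HOURLY",
    "Format": "textORcsv",
    "Compression": "GZIP",
    "AdditionalSchemaElements": ["RESOURCES"],
    "S3Bucket": "<bucket_name>",
    "S3Prefix": "<prefix>",
    "S3Region": "us-east-1",
    "RefreshClosedReports": true,
    "ReportVersioning": "OVERWRITE_REPORT"
}'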

When it’s done, follow the steps from the Root account – Data Export already configured section.

Linked#

OptScale supports the AWS Organizations service that allows linking several Data Sources in order to centrally manage data of multiple users while receiving all billing exports within a single invoice.

Selecting the AWS Linked tab makes the registration flow easier by eliminating the option to input bucket information for billing purposes, since this information is received through the root account, whose user can then distribute periodic reports individually if intended by the company management. In this case, only the Access key and Secret access key are required.

linked_account

Note

If you only specify an AWS Linked account without providing credentials for the main one, OptScale will not be able to import any billing data.

Use Connect to create a Data Source in OptScale. If some of the provided values are invalid, an error message will indicate a failure to connect.

Discover Resources#

OptScale needs to have permissions configured in AWS for the user Data Source in order to correctly discover resources and display them under a respective section of the dashboard for the associated employee.

Make sure to include the following policy in order for OptScale to be able to parse EC2 resources data:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OptScaleOperations",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketPolicyStatus",
                "s3:GetBucketTagging",
                "iam:GetAccessKeyLastUsed",
                "cloudwatch:GetMetricStatistics",
                "s3:GetBucketAcl",
                "ec2:Describe*",
                "s3:ListAllMyBuckets",
                "iam:ListUsers",
                "s3:GetBucketLocation",
                "iam:GetLoginProfile",
                "cur:DescribeReportDefinitions",
                "iam:ListAccessKeys"
            ],
            "Resource": "*"
        }
    ]
}

Your AWS Data Source should now be ready for integration with OptScale! Please contact our Support Team at support@hystax.com if you have any questions regarding the described configuration flow.

Migrating from CUR to CUR 2.0#

The information on this page can be useful if an AWS Data Source (Legacy CUR export schema) has already been connected and you want to configure CUR 2.0 data and update the AWS Data Source.

A new bucket is required#

Create a new Data Export with CUR 2.0 schema. Navigate to AWS Billing & Cost Management → Data Exports page.

Step 1. Export type

  • Select Standard data export export type.

Step 2. Export name

  • Input export name. The content of the Export name field will be required when updating an AWS Data Source in OptScale.

aws_export_cur2_01

Step 3. Data table content settings:

  • Select CUR 2.0.
  • Select Include resource IDs checkbox.
  • Choose the time granularity for how you want the line items in the export to be aggregated.

aws_export_cur2_03

Step 4. Data export delivery options:

  • Pick Overwrite existing data export file.
  • Select compression type.

Step 5. Data export storage setting

  • Configure a new bucket. The content of the S3 path prefix and S3 bucket name fields will be required when updating an AWS Data Source in OptScale.

aws_export_cur2_04

Step 6. Review

  • Confirm export creation. The Data Export will be prepared by AWS within 24 hours.

Click on the existing AWS Data Source on the Data Source page. The page with detailed information opens. Click the UPDATE CREDENTIALS button to update the Data Source credentials. Switch on Update Data Export parameters to update info about the billing bucket.

aws_migrate_cur1_cur2_07

Select the Standard data export (CUR 2.0) export type. Enter the Export name from the first step as Export name, the S3 bucket name as Export Amazon S3 bucket name, and the S3 path prefix as Export path prefix.

Save and wait for a new export to import!

The bucket already exists#

Use this case if you have already connected an AWS Data Source (on Legacy CUR export schema) and want to configure CUR 2.0 data into the same bucket.

Create a new Data Export with CUR 2.0 schema. Navigate to AWS Billing & Cost Management → Data Exports page.

Step 1. Export type

  • Select Standard data export export type.

Step 2. Export name

  • Input export name. The content of the Export name field will be required when updating an AWS Data Source in OptScale.

aws_export_cur2_01

Step 3. Data table content settings:

  • Select CUR 2.0.
  • Select Include resource IDs checkbox.
  • Choose the time granularity for how you want the line items in the export to be aggregated.

aws_export_cur2_03

Step 4. Data export delivery options:

  • Pick Overwrite existing data export file.
  • Select compression type.

Step 5: Data export storage setting:

  • Select an existing bucket in the Data export storage settings section.

    aws_export_cur2_05

  • Input NEW S3 path prefix.

    aws_export_cur2_06

Click on the existing AWS Data Source on the Data Source page. The page with detailed information opens.

Click the UPDATE CREDENTIALS button to update the Data Source credentials. Switch on Update Data Export parameters to update info about the billing bucket.

aws_migrate_cur1_cur2_08

Select the Standard data export (CUR 2.0) export type, update the Export name and Export path prefix fields, then save and wait for the new export to be imported!

Azure#

Subscription#

To track a new Azure Data Source in your OptScale account, please select the Azure Subscription tab at the Data Source Connection step during the initial configuration or later on in the Settings section of the main page.

connect_azure

Name#

In the first field, you can specify any preferred name to be assigned to this Data Source in OptScale.

Subscription ID#

The Subscription ID is a unique string that identifies your Azure subscription. To find it:

  • Log in to the Microsoft Azure Portal.
  • Search for Subscriptions to access a list of all subscriptions associated with your Azure account. The list will include a subscription ID for each one.

When OptScale programmatically signs in to Azure, it needs to pass a tenant ID and an application ID along with a secret, which is an authentication key.

Application (client) ID#

Application (client) ID has to be generated manually in Azure to allow API communication with OptScale:

  • Access the Azure Active Directory and navigate to App registrations.
  • Click + New registration, provide a name, e.g. OptScale, and then click Register at the bottom of the page.
  • A new Application ID will become available (as in the screenshot below).

app_registration

Attention

Once you have registered an Application, it is essential to explicitly grant it permissions in the form of a Role assignment to work with the current Azure subscription.

To perform a Role assignment, from the Azure home page navigate to Subscriptions and select the one you have provisioned to be linked to OptScale.

After being redirected to its dashboard, click Access control (IAM) in the left navigation bar, then go to the Role assignments tab and click +Add → Add role assignment.

access_control

You will be prompted to input the Role, which has to be Reader, in the first field. The second one can be left unchanged. The third field should contain the name of a registered application from the previous steps, e.g. OptScale. Click Save to add the role assignment.

add_role

Directory (tenant) ID#

Directory (tenant) ID is a globally unique identifier (GUID) that is different from your organization name or domain. Its value is easily accessible in the overview of the application that has been added in the previous steps via App registrations.

Go to Home → App registrations → e.g. OptScale → Overview → Directory (tenant) ID

tenant_ID

Secret#

Secret should be created within the newly registered application:

  • Go to the App registrations, click on your application, e.g. OptScale
  • Select Certificates & Secrets in the left navigation bar and add a + New client secret

Attention

Secret’s value will be hidden shortly after its creation. Make sure to copy it in a safe place.

create_secret
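
As an alternative to the portal steps above, the Azure CLI can register the application, assign it the Reader role on the subscription, and issue a client secret in one step. This is a sketch; in the command's JSON output, the appId, password, and tenant fields correspond to the Application (client) ID, Secret, and Directory (tenant) ID described on this page:

az ad sp create-for-rbac --name OptScale --role Reader \
    --scopes /subscriptions/<subscription_id>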

Once the required fields are filled out, you can click Connect to validate the information. Once you have connected to the account, the data will be pulled from the source shortly afterwards and become available in the UI.

Your Azure Data Source account should now be ready for integration with OptScale! Please contact our Support Team at support@hystax.com if you have any questions regarding the described configuration flow.

Tenant#

To track a new Azure tenant Data Source in your OptScale account, please select the Azure tenant tab at the Data Source Connection step during the initial configuration or later on in the Settings section of the main page.

connect_azure_tenant

Name#

In the first field, you can specify any preferred name to be assigned to this Data Source in OptScale.

Application (client) ID#

Application (client) ID has to be generated manually in Azure to allow API communication with OptScale:

  • Access the Azure Active Directory and navigate to App registrations.
  • Click + New registration, provide a name, e.g. OptScale, and then click Register at the bottom of the page.
  • A new Application ID will become available (as in the screenshot below).

app_registration

Attention

Once you have registered an Application, it is essential to explicitly grant it permissions in the form of a Role assignment to work with the current Azure subscription.

To perform a role assignment, from the Azure home page navigate to Subscriptions and select the ones you have provisioned to be linked to OptScale.

After being redirected to a subscription’s dashboard, click Access control (IAM) in the left navigation bar, then go to the Role assignments tab and click +Add → Add role assignment.

access_control

You will be prompted to input the Role, which has to be Reader, in the first field. The second one can be left unchanged. The third field should contain the name of a registered application from the previous steps, e.g. OptScale. Click Save to add the role assignment.

add_role

Directory (tenant) ID#

Directory (tenant) ID is a globally unique identifier (GUID) that is different from your organization name or domain. Its value is easily accessible in the overview of the application that has been added in the previous steps via App registrations.

Go to Home → App registrations → e.g. OptScale → Overview → Directory (tenant) ID

tenant_ID

Secret#

Secret should be created within the newly registered application:

  • Go to the App registrations, click on your application, e.g. OptScale
  • Select Certificates & Secrets in the left navigation bar and add a + New client secret

Attention

Secret’s value will be hidden shortly after its creation. Make sure to copy it in a safe place.

create_secret

Once the required fields are filled out, you can click Connect to validate the information. Once you have connected to the account, the data will be pulled from the source shortly afterwards and become available in the UI.

GCP#

Google Cloud#

Enable Billing export#

Please follow the official GCP guide to enable billing data export – Set up Cloud Billing data export to BigQuery | Google Cloud.

As a result, you should have a new table in your BigQuery project. Note the names of the dataset and the table; you will need them later when connecting your cloud account to OptScale.

gcp_billing_tables

Prepare a role for OptScale#

With a CLI command#

Run the following command in GCP CLI:

gcloud iam roles create optscale_connection_role --project=hystaxcom \
    --permissions=bigquery.jobs.create,bigquery.tables.getData,compute.addresses.list,\
compute.addresses.setLabels,compute.disks.list,compute.disks.setLabels,compute.firewalls.list,\
compute.globalAddresses.list,compute.instances.list,compute.instances.setLabels,compute.images.list,\
compute.images.setLabels,compute.machineTypes.get,compute.machineTypes.list,compute.networks.list,\
compute.regions.list,compute.snapshots.list,compute.snapshots.setLabels,compute.zones.list,\
iam.serviceAccounts.list,monitoring.timeSeries.list,storage.buckets.get,storage.buckets.getIamPolicy,\
storage.buckets.list,storage.buckets.update

Via Google Cloud console#

1. Go to Roles page and click Create Role.

2. Give the role any name and description.

3. Add the following permissions:

  • bigquery.jobs.create
  • bigquery.tables.getData
  • compute.addresses.list
  • compute.addresses.setLabels
  • compute.disks.list
  • compute.disks.setLabels
  • compute.firewalls.list
  • compute.globalAddresses.list
  • compute.instances.list
  • compute.instances.setLabels
  • compute.images.list
  • compute.images.setLabels
  • compute.machineTypes.get
  • compute.machineTypes.list
  • compute.networks.list
  • compute.regions.list
  • compute.snapshots.list
  • compute.snapshots.setLabels
  • compute.zones.list
  • iam.serviceAccounts.list
  • monitoring.timeSeries.list
  • storage.buckets.get
  • storage.buckets.getIamPolicy
  • storage.buckets.list
  • storage.buckets.update

Create service account#

Official documentation on service accounts – Service accounts | IAM Documentation | Google Cloud.

  1. Go to Service accounts page and click Create Service Account
  2. Give it any name and click Create and Continue.
  3. Specify the role that you created earlier and click Continue and then Done

Generate API key for your service account#

  1. Find your service account in the service accounts list and click on its name to go to service account details page.
  2. Go to Keys tab.
  3. Click Add key -> Create new key
  4. The service account API key will be downloaded as a .json file. You will need this file at the next step, when connecting your cloud account to OptScale. A gcloud CLI alternative to these console steps is sketched below.
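
If you prefer the command line, the console steps above can be approximated with gcloud. This is a minimal sketch: the account name optscale-connection, the key file name key.json, and the <your_project_id> placeholder are illustrative, and the custom role ID must match the role you created earlier (optscale_connection_role in the CLI example above).

# Create the service account (the name is an example)
gcloud iam service-accounts create optscale-connection --project=<your_project_id>

# Grant the custom role to the service account on the project
gcloud projects add-iam-policy-binding <your_project_id> \
    --member="serviceAccount:optscale-connection@<your_project_id>.iam.gserviceaccount.com" \
    --role="projects/<your_project_id>/roles/optscale_connection_role"

# Generate and download a JSON key for the account
gcloud iam service-accounts keys create key.json \
    --iam-account=optscale-connection@<your_project_id>.iam.gserviceaccount.com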

Connect Data Source in OptScale#

Use the newly downloaded service account credentials json file with the billing dataset details to connect your GCP cloud account.

gcp_connection_form

Google Cloud tenant#

Prepare a role for OptScale#

With a CLI command#

Run the following command in GCP CLI:

gcloud iam roles create optscale_connection_role --project=hystaxcom \
    --permissions=bigquery.jobs.create,bigquery.tables.getData,compute.addresses.list,\
compute.addresses.setLabels,compute.disks.list,compute.disks.setLabels,compute.firewalls.list,\
compute.globalAddresses.list,compute.instances.list,compute.instances.setLabels,compute.images.list,\
compute.images.setLabels,compute.machineTypes.get,compute.machineTypes.list,compute.networks.list,\
compute.regions.list,compute.snapshots.list,compute.snapshots.setLabels,compute.zones.list,\
iam.serviceAccounts.list,monitoring.timeSeries.list,storage.buckets.get,storage.buckets.getIamPolicy,\
storage.buckets.list,storage.buckets.update

Via Google Cloud console#

1. Go to Roles page and click Create Role.

2. Give the role any name and description.

3. Add the following permissions:

  • bigquery.jobs.create
  • bigquery.tables.getData
  • compute.addresses.list
  • compute.addresses.setLabels
  • compute.disks.list
  • compute.disks.setLabels
  • compute.firewalls.list
  • compute.globalAddresses.list
  • compute.instances.list
  • compute.instances.setLabels
  • compute.images.list
  • compute.images.setLabels
  • compute.machineTypes.get
  • compute.machineTypes.list
  • compute.networks.list
  • compute.regions.list
  • compute.snapshots.list
  • compute.snapshots.setLabels
  • compute.zones.list
  • iam.serviceAccounts.list
  • monitoring.timeSeries.list
  • storage.buckets.get
  • storage.buckets.getIamPolicy
  • storage.buckets.list
  • storage.buckets.update

Create service account#

Official documentation on service accounts – Service accounts | IAM Documentation | Google Cloud.

  1. Go to Service accounts page and click Create Service Account.

  2. Give it any name and click Create and Continue.

  3. Specify the role that you created earlier and click Continue and then Done.

Grant access#

For each project that needs to be added to the tenant, go to the IAM & Admin section in the Google Cloud Console, select IAM, and press the GRANT ACCESS button. Add the created service account and assign the created role to it.

Generate API key for your service account#

  1. Find your service account in the service accounts list and click on its name to go to service account details page.

  2. Go to Keys tab.

  3. Click Add key -> Create new key.

  4. The service account API key will be downloaded as a .json file. You will need this file at the next step, when connecting your cloud account to OptScale.

Connect a Data Source in OptScale#

To track a new Google Cloud tenant Data Source in your OptScale account, please select the Google Cloud tenant tab at the Data Source Connection step during the initial configuration or later on in the Settings section of the main page.

connect_gcp_tenant

Kubernetes#

To track a new Kubernetes cluster Data Source in your OptScale account, please select the Kubernetes tab on the Data Source Connection page.

connect_kubernetes

  • Name – Specify any preferred name to be used for this data source in OptScale.
  • User – Specify a user for the cost metrics collector to use when pushing data to this data source.
  • Password – Specify a password for the cost metrics collector to use when pushing data to this data source.

Use Connect to create a Data Source in OptScale.

Click on the newly created Data Source on the Data Sources page. The page with detailed information appears.

kubernetes_clickon

Use the KUBERNETES INTEGRATION button or the instructions link to get the instructions for installing the software that collects information about running pods and converts it into cost metrics.

kubernetes_instuctions

Software installation on a cluster#

To get cost metrics, download and install the Helm chart on the Kubernetes cluster. The Helm chart collects Kubernetes resource information and shares it with the OptScale FinOps project. Install one release per cluster.

1. Download Hystax repo#

Use this command to add the Hystax repo:

helm repo add hystax https://hystax.github.io/helm-charts

2. Install Helm Chart#

The instructions differ depending on whether the Kubernetes Data Source is connected on my.optscale.com or on OptScale deployed from open source. In both cases, the instructions shown are adapted to the selected Data Source and your OptScale deployment. Simply copy and paste the command, replacing the <password_specified_during_data_source_connection> placeholder with the user's password. A generic sketch of the install command is shown below.
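
For reference, a generic install sketch follows. The release name, chart name, and values here are placeholders rather than the exact published ones; run helm search repo hystax to list the charts actually available, and copy the exact command from the instructions generated for your Data Source.

# List the charts published in the Hystax repository
helm search repo hystax

# Generic install pattern; replace <chart_name> and the values with those shown
# in the generated instructions, including the data source user and password
helm install optscale-collector hystax/<chart_name> \
    --set <values_from_generated_instructions>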

My.optscale.com#

kubernetes_chart_installation_prod

Note

Specify the user’s password instead of the <password_specified_during_data_source_connection> phrase.

Warning

Please await the completion of the metric generation process, which typically requires approximately one hour.

Open Source OptScale#

kubernetes_instuctions_list_install

kubernetes_chart_installation

Clusters

Oftentimes a set of cloud resources that serve a common purpose can be viewed as a single entity from the user perspective. In many cases, resources are not intended to run independently and are created/terminated simultaneously while sharing an attribute like tag value that connects them together. Such groups of resources can be merged into clusters making them stand-alone virtual resources in OptScale for better management options.

Sub-resources are added into clusters automatically based on cluster type definitions that are created by the user with the Organization Manager role.

To set up a new cluster, go to Configure cluster types in the right-hand corner of the Resources page and click the Add button on the Cluster Types page.

main_cluster

Please define a new cluster type name (it will be used as the type of the created cluster resources) and a clusterization tag key. Once created, this cluster type will be applied to newly discovered resources. Existing resources can be clusterized using the Re-apply cluster types action.

A definition consists of the following two parameters:

  • cluster type name – must be unique within the organization.

  • tag key – a common parameter to be used for consolidating sub-resources.

add_cluster

Cluster type definitions are automatically applied to resources upon their discovery in OptScale (through billing or direct discovery) for consolidation even before any existing assignment rules take effect.

The list of cluster types is prioritized to avoid any conflicts. A new cluster type is placed at the bottom of the list so it does not affect existing clusters and can be prioritized manually later on.

Clicking on the Re-apply cluster types button triggers a new resource allocation sequence that uses the current order of cluster types as a rule.

A cluster type can be deleted from the same page. Upon deletion, all clusters of this type are disassembled as well.

In the Resources section, items that are part of a cluster are marked with the cluster_icon_small symbol to distinguish them among usual resources.

cluster_icon

Clicking on their names will bring up the Cluster Details page that lists all included sub-resources, their total expenses, and constraints that are applied throughout the cluster.

Note

When a resource becomes part of a cluster, it can no longer be reassigned separately or receive an individual constraint in OptScale.

 

 

Shared Environments

OptScale Shared Environments provide a single point to manage your production and test environments. These resources are meant to be acquired for exclusive use by an organization member or a team member only temporarily and then released, making them available again. The common use cases are release stands and demo environments, where the availability of a range of components (e.g. for deploying some software version) has to be controlled in order to eliminate accidental breaking or loss of interconnections.

Depending on your cloud platform, there are two ways to create a Shared Environment in the product:
  • if you have a connected Cloud Account, you can easily mark the resources as Shared Environments
  • or you can simply add a new Shared Environment, not related to any connected Cloud Account, from the Shared Environments page.

Mark existing cloud resources as Shared Environments#

Initially, when an Organization is added to OptScale, the section is empty; new Shared Environments have to be created manually by a user with the Organization Manager role from the scope of shareable resources, which can only include Instances or OptScale clusters.

To be able to manage your cloud resource as an environment, select the resource on the Resources page and click Mark as Shared Environment.

mark_environment

The marked environment will appear on the Shared Environments page. The resource is now available for booking, and you can proceed with using webhooks and integrating with your CI/CD.

Create new Shared Environment#

To create a new Environment in OptScale select Shared Environments at the left sidebar of the main page and click Add.

add_environment

You will be prompted to input a Name and a Resource type, set whether SSH access is required, and specify additional properties for a new Shared Environment. By default, the solution asks you to add the Description, IP, and Software. Use the Add Property button if you want to add extra properties.

create_env

A newly created entity will be added to this section and listed in a table providing information about the Pool it belongs to, its Status (which can be In use, Available, or Unavailable), Upcoming bookings, related Software, and Jira tickets (optional).

list_env

Action buttons#

The following action buttons allow you to control an existing environment:

book_button – a Shared Environment can be booked for a selected period of time.

book_menu

release_button – a booked Shared Environment can be released to make it available again. A notification will be sent to the corresponding tenant user.

deactivate_button – a scope of resources can be deactivated temporarily to prohibit it from being booked.

delete_button – deletion of the Shared Environment.

Detailed instructions on how to organize access to shared resources using OptScale can be found on our website.

Resource Assignment

Strong resource assignment improves efficiency, reduces costs, enhances accountability, and helps projects stay on track, all of which contribute to higher overall success rates in organizational and project management contexts.

Newly created resources are distributed among pools based on assignment rules. If they have certain tags, they will immediately be assigned to the appropriate pool as soon as they are first discovered. If they do not belong to any pool, they will be assigned to the Data Source pool. The Data Source pool is created when the Data Source is connected.

OptScale allows you to manage assignment rules by viewing, creating, editing, and changing their priority.

Assignment Rules Table#

Go to the Pools page and click on the Configure Assignment Rules button.

poolspage

Assignment Rules is a centralized interface for viewing and managing all existing assignment rules in the system. This page provides a list of rules displayed in a tabular format for easy navigation and interaction. Users can view the status of each rule and take appropriate actions directly from this page.

assignment_rules

All assignment rules are displayed in a table, allowing easy browsing and management. The general actions Add and Re-apply Ruleset are placed above the table. Search functionality is available based on criteria such as Name, Assigned to, Conditions, and Priority. For ease of navigation, the page supports pagination, allowing you to view assignment rules in manageable chunks. All data columns are sortable.

Each row represents a specific rule and includes details such as:

  • a descriptive name for the rule,
  • the pool and owner to whom the resource is assigned,
  • a summary of the conditions that trigger the rule,
  • the priority used to apply assignment rules to resources,
  • actions.

Use the Actions column buttons to manage rules:

prioritize – prioritize – sets the rule priority to the highest. The priority of the other rules will be decreased by one

promote – promote – increases rule priority by 1. The rule will be swapped with the previous rule.

demote – demote – decreases rule priority by 1. The rule will be swapped with the rule behind it.

deprioritize – deprioritize – assigns a rule with the lowest priority across the given organization. All other affected rules will be updated.

edit – edit – allows modification of an existing rule. Adjust conditions and assignment details, ensuring that the rule remains accurate and up to date with changing requirements. A rule can be enabled/disabled by ticking the Active checkbox. Active rules are marked with green dots next to their names, inactive ones are marked with grey dots.

delete – delete – removes the rule from the list.

Note

OptScale will assign newly detected resources automatically according to the rules listed above. Rules are evaluated according to the priority. You can also force re-apply them for the whole organization or a specific pool if you need to reflect allocation policies changes immediately.

Add Assignment Rule#

There are two ways to add an Assignment Rule: through the Assignment Rules page or the Resource page. On the Resource page, you get a specialized version of the Add Automatic Resource Assignment Rule Form with pre-filled fields based on your resource information.

Assignment Rules page#

The Add Automatic Resource Assignment Rule Form is used to define new assignment rules for automatic resource allocation based on specified conditions. The form is divided into sections to collect relevant information for creating an assignment rule, including the rule’s name, status, conditions, and assignment targets.

add_rule

The list of available Conditions:

  • Name/ID starts with (string value) – Matches resources where the name or ID begins with the specified value.
  • Name/ID ends with (string value) – Matches resources where the name or ID ends with the specified value.
  • Name/ID is (string value) – Matches resources with an exact name or ID match.
  • Name/ID contains (string value) – Matches resources where the name or ID contains the specified substring.
  • Tag is (key-value pair) – Matches resources with a tag that matches both the specified key and value.
  • Tag exists (key-value pair) – Matches resources that have a tag with the specified key.
  • Tag value starts with (key-value pair) – Matches resources with a tag value starting with the specified value for the given key.
  • Source is (selected row) – Matches resources from a selected data source. Options are presented in a dropdown for selecting the source.
  • Resource type is (string value) – Matches resources where the resource type contains the specified value.
  • Region is (string value) – Matches resources where the region contains the specified value.

Input a Name for the new automatic resource assignment rule, as well as the Conditions that have to be fulfilled for the rule to become applicable. Add and remove conditions to suit your needs. As a result, a matching resource will be included in the selected Target Pool and assigned to an Owner.

A newly created rule is always prioritized across the organization and is put at the top of the list for the discovered resources to be checked against its conditions first. If these are not satisfied, a resource will be checked against the remaining rules in descending order until one is found applicable.

Resources page#

This form is a specialized version of the Add Automatic Resource Assignment Rule Form from the Assignment Rules page. The page is accessible by clicking the Add Assignment Rule button directly from a specific resource’s details page.

add_rule_resources

Its purpose is to simplify the creation of assignment rules for a particular resource by pre-filling relevant fields with the resource’s existing data.

add_rule_resources

Add Conditions according to your needs, and specify a Name, Target Pool, and Owner.

Re-apply Ruleset#

reapply – Re-apply Ruleset – initiates a new check of the already assigned resources against the current ruleset. Resources will be reorganized accordingly even if they were explicitly assigned otherwise before. This feature helps manage assignments, especially when rules are edited, new ones are added, or the priority of existing rules is changed.

Note

The re-check process takes some time.

To start the process, click the Re-apply Rules button. A side modal opens.

reapply_ruleset

Select whether you want to re-apply to the entire organization or a specific pool. In the second selection, specify a pool and, if necessary, enable the With sub-pools checkbox. The Run button starts the process and closes the modal, while the request continues to run in the background. The Cancel button simply closes the modal without performing any actions.

Note

Re-apply options are available if the user has Organization Manager or Root Pool Manager permissions.

This feature provides quick access to the assignment rules that apply specifically to a given pool. It helps users understand how resources are being assigned and managed within the pool, ensuring transparency and simplifying rule verification and modification by centralizing relevant rules in one place.

To find a list of assignment rules associated with the selected pool, open the detailed page of the desired pool from the Pools page.

reapply_ruleset

Each rule entry includes the NameOwner, and Conditions of the rule, providing a clear overview of the rule configurations.

The Conditions column offers a detailed summary of the conditions defined for each rule, including condition types such as Name/ID contains, Tag is, or Source is.

Click the See all assignment rules link at the bottom of the section to navigate to the main assignment rules listing page, where all rules across different pools can be managed.

Resources constraints & Pool constraint policies

To address the ever-dynamic cloud infrastructure where resources are created and deleted continuously, OptScale introduces a set of tools to help limit the related expenses and the lifetime of individual assets. This feature is implemented in the form of constraints that a user can set for a specific resource or generally for a Pool.

There are two constraint types that can be set:

  • TTL – time to live; a resource should not live longer than the specified period. For a resource, specify a date and time. For a pool, input an integer between 1 and 720 hours.
  • Daily expense limit – resource spending should not exceed the specified amount in dollars. Input an integer; the minimum is $1, and 0 means unlimited.

When OptScale discovers active resources in the connected source, it checks that they don’t violate any existing Pool constraints that were applied as policies before.

When a resource hits a constraint, both the manager and the owner of the resource are alerted via email. If a resource is unassigned, alerts are sent to the organization managers. On the Pools page, an exclamation mark will appear next to the pool name.

Note

As of now, OptScale provides solely notifications about violated constraints and does not interact with the connected source itself to perform any constraint-related adjustments.

Resources constraints#

Navigate to the desired asset by selecting the appropriate resource on the Resources page.

resources

To assign constraints to resources, go to the Constraints tab on the selected resource’s page.

On the Constraints tab, use the slider to enable/disable the current setting. Click on the pencil image below the constraint’s name and fill in the values for TTL or the Daily expense limit in the fields.

constraint_resources

If a resource doesn’t have a specific constraint set, it inherits the policies from its Pool. However, the resource owner or manager can override an existing Pool constraint policy for an individual resource by issuing a custom constraint for any given asset.

Pool constraint policies#

This is a higher-level setting that allows implementing policies for entire Pools instead of single resources. A manager can enforce constraints that apply to all resources in the Pool, while custom resource-specific constraints can still exist and override the general policy.

Click on Pools in the left sidebar and choose a Pool group or its sub Pool.

constraint_budget

Click on the pencil image next to a constraint’s name and fill in the values for TTL or the Daily expense limit in the empty fields. Use the slider to enable/disable the current setting.

Note

A constraint will not be visible if the related resource has already been deleted from OptScale or if a resource has been tracked only by imported billing data.

Pool deletion#

The Pool structure can be changed by deleting unnecessary Pools via the dedicated section of the main page. This option is not available for Pools that have sub-Pools – the latter have to be deleted first.

Note

An employee should have the Manager role in the parent of the Pool that they want to delete.

Use the delete button from the Actions menu to delete a Pool.

budget_actions

Warning

This action is irreversible. The solution will ask for confirmation before deletion.

delete_budget

When a Pool is deleted:

  • all resources are reassigned to its parent Pool;
  • all rules that used to point to this Pool are redirected to its root.

Integrations

Google Calendar#

Integrating Google Calendar to display shareable resource booking intervals as events allows users to view and manage availability in real time. Each booked interval appears as a Google Calendar event, making it easy to see open and reserved slots at a glance. This setup enables streamlined scheduling, letting users quickly check and share resource availability with others through a familiar calendar interface.

Follow these steps to connect Google Calendar to OptScale:

Prepare your Google Calendar#

1. Create or choose one of the existing secondary calendars in your Google Calendar.

2. Share it with the OptScale service account calendar-service@optscale.iam.gserviceaccount.com. To do this:

  • Select the calendar

  • Open the Calendar Settings

  • Navigate to the Share with specific people or groups section

  • Click the Add people and groups button

google_add_people

  • Add the email address: calendar-service@optscale.iam.gserviceaccount.com and select the Make changes to events permission.

google_make_changes

Note

If you have an open-source OptScale, please use the client_email specified in the user_template.yml file instead of calendar-service@optscale.iam.gserviceaccount.com.

3. Copy the Calendar ID from the Integrate calendar section.

google_calendar_id

Connect the Calendar to OptScale#

1. Open the Integrations page and click the CONNECT CALENDAR button

2. Paste the Calendar ID into the opened side modal

google_connect

3. Click the Connect button to view your Shared Environment schedules directly in the Google Calendar.

Slack App#

Slack has become a popular communication tool that brings Managers, DevOps and Engineering team members together in their everyday tasks. OptScale can be integrated into Slack as an application to provide a range of notifications, monitoring and management options in a familiar interface to engage everyone in a more efficient FinOps strategy without delays that are often caused by the necessity to access several platforms.

To add our app to your Workspace in Slack and connect it to your OptScale account:

1. Access OptScale’s UI (https://my.optscale.com/, by default).

2. Log in as the user that you want to assign the Slack app to. (Re-login as a preferred user to get Slack notifications depending on the Organization Role).

3. Go to <optscale_url>/slacker/v2/install (https://my.optscale.com/slacker/v2/install).

4. Click on the Add to Slack button.

5. Click Allow on the next page to add the permissions and be redirected to the Slack desktop app or its browser version.

allow_slack

6. Once the application is installed in Slack, you will see a greeting message which includes a link to authorize the app in OptScale, so please follow it.

authorize_link

Your OptScale account has been connected to the Slack app!

app_connected

If the user is a member of several organizations in OptScale, the next step is to choose a default organization.

default_organization

Listing resources#

To get a list of resources that have been assigned to you in the current organization, select OptScale in the Apps section and type in “resources”.

list_resources

By clicking the Details button to the right of any presented resource, you will receive a new message containing its full name, region, pool, and owner, as well as the amount of related expenses and constraints, with the option to modify the resource’s current TTL via the Update TTL button.

update_ttl

Additionally, each message in the app contains a link to OptScale web console or the corresponding Resource page for a quick access to the main portal.

Listing environments#

To get a list of existing shareable resources (first 10 by name), select OptScale in the Apps section and type in “envs”.

list_envs

Listing organization#

To get a list of your organizations, select OptScale in the Apps section and type in “org”.

list_org

The active organization is labeled “Active”. To select another organization as active, click the Choose button to the organization’s right.

Notifications & alerts#

All team members can receive alerts about occurrences and tasks in the cloud that require additional attention through the designated notification channel.

Below are the instructions on how to include OptScale alerts into your Notifications channel in Slack:

1. Create a Notifications channel in Slack if you do not have one already

2. Access the Channel details

3. Select the Integrations tab

channel_details

add_apps

4. Select OptScale from the list of apps in your workspace. You will receive the following message

optscale_bot

Now that the app is connected to the Notification channel, you can create additional alerts for your team:

Note

Only users with the Manager Role can modify and update notifications.

1. Select OptScale in the Apps section and type in “alerts”.

2. You will receive a list of existing alerts, from which you can Delete or Add alert. Choose the latter option to add a new one.

alerts

3. In the new window, select the desired pool that should be tracked, the threshold limit to trigger the alert and the target channel where the members should be notified and click Add.

add_alert

The task is complete! You have added a new alert for your team members to help control your Organization’s cloud expenses.

Quickstart

MLOps, or Machine Learning Operations, is a set of practices that aims to streamline and optimize the lifecycle of machine learning models. It integrates elements from machine learning, DevOps, and data engineering to enhance the efficiency and effectiveness of deploying, monitoring, and maintaining ML models in production environments. MLOps enables developers to streamline the machine learning development process from experimentation to production, including automating the machine learning pipeline from data collection and model training to deployment and monitoring. This automation helps reduce manual errors and improve efficiency. Our team added this feature to OptScale; it can be found in the MLOps section, which provides everything you need to successfully work with machine learning models.

Use the Community documentation to get a brief description of each page.

There are two essential concepts in OptScale MLOps: tasks and runs:

  • run is a single execution of your training code; a new run entry appears in OptScale for each execution. A run can include, for example, training a model on a given dataset using specific parameters and algorithms. Each run records the settings, data, results, and metrics, allowing researchers and developers to track model performance and compare different approaches.

  • task allows you to group several runs into one entity so that they can be conveniently viewed.

Follow this sequence of steps to successfully work with MLOps:

  1. Create a task
  2. Create metrics
  3. Assign metrics to the task
  4. Integrate your training code with OptScale.

Tutorial#

Step 1. Prepare task and metrics#

Create a Task#

Note

Only organizational managers can create tasks.

To add a new Task, go to the Tasks page and click the Add button to set up and profile the Task you’d like to manage.

1000

When creating a new Task in OptScale, specify: Task name, key, description, owner, and tracked metrics.

1010

Note

Please note that the task key is an immutable value and is used inside the init command.

Note

If your organization does not have any metrics yet, the Metrics block will be empty. You can always add them to the task later (see Assign Metrics to the Task).

Create Metrics#

Note

Only organizational managers can create metrics.

To add a new Metric, go to the Metrics page and click the Add button.

1020

When adding a Metric in OptScale, you need to specify the following details:

1040

  • Name: This is the name that will display in the interface.
  • Key: A unique metric identifier, used in commands like send.

  • Tendency: Choose either “Less is better” or “More is better” to define the desired trend.

  • Target value: The target value you aim to achieve for this metric.
  • Aggregate function: This function will apply if multiple values are recorded for the metric within the same second.

Note

The key is an immutable value.

These settings help ensure that metrics are tracked consistently and accurately across your tasks and runs.

Assign Metrics to the Task#

To assign a metric to a Task in OptScale:

1. Open the Task by going to the Tasks page and clicking on the Task name

2. Click the Configure button

3. Select the Metrics tab

4. Add the desired metrics.

1080

This allows you to configure and assign specific metrics to the Task for tracking and analysis.

Step 2. Integrate your training code with OptScale#

Note

Please note that the command parameter values and file references in the examples might be fictitious. Use your own values and references.

Install optscale_arcee#

optscale_arcee is a Python package that integrates ML tasks with OptScale by automatically collecting executor metadata from the cloud and processing stats.

It requires Python 3.7+ to run. You can install optscale_arcee from PyPI on the instance where the training will occur. If it’s not already installed, use:

pip install optscale_arcee

This setup ensures that optscale_arcee can collect and report relevant metrics and metadata during ML tasks.

Commands and examples#

Find examples of specialized code prepared by our team at https://github.com/hystax/optscale_arcee/tree/main/examples.

To view all available commands:

1. Go to the Tasks page of the MLOps section of the menu

2. Click the Task name you want to run

3. Click the Profiling Integration button of the opened page.

1090

The Profiling Integrations side modal provides a comprehensive list of commands available for profiling and integration with OptScale.

1095

Please pay special attention to the import and initialization commands. They are required.

Import#

Note

This command is required.

Import the optscale_arcee module into your training code as follows:

import optscale_arcee as arcee

Initialization#

Note

This command is required.

To initialize an optscale_arcee collector, you need to provide both a profiling token and a task key for which you want to collect data. This ensures that the data collected is associated with the correct Task and can be accurately monitored and analyzed in OptScale.

The profiling token is shared across the organization, meaning it is common to all users and tasks within that organization. The task key is specified when creating a Task.

Find the profiling token and task key in the Profiling Integration side modal. To access it, click the task name in the list on the Tasks page, then click the Profiling Integration button.

initialization_01

To initialize the collector using a context manager, use the following code snippet:

with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
    # some code

Alternatively, to get more control over error catching and execution finishing, you can initialize the collector using a corresponding method. Note that this method will require you to manually handle errors or terminate arcee execution using the error and finish methods.

arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
# some code
arcee.finish()
# or in case of error
arcee.error()
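
For example, a minimal sketch of the manual approach that wraps the training code in a try/except block might look like this (the try/except structure is an assumption; only the init, finish, and error methods come from the Profiling Integration modal):

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector without a context manager
arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY")
try:
    # some training code
    ...
    # Mark the run as finished on success
    arcee.finish()
except Exception:
    # Mark the run as failed and re-raise the original error
    arcee.error()
    raise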

This information is provided in the Profiling Integration side modal. You can simply copy the command. Please note that the command changes depending on the selected task.

OptScale from open-source

If you are using OptScale from the open-source version, you need to specify the endpoint_url parameter in the init method. The address you provide must be publicly accessible to ensure proper communication between the optscale_arcee collector and the OptScale server.

with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY", endpoint_url="https://public_ip:443/arcee/v2"):
  # some code

Send metrics#

To send metrics, use the send method with the following parameter:

  • data (dict, required): a dictionary of metric names and their respective values (note that metric data values should be numeric).
arcee.send({ "metric_key_1": value_1, "metric_key_2": value_2 })

Example:

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Send metric
  arcee.send({ "accuracy": 71.44, "loss": 0.37 })

Note

In the interface, you will see only those metrics that are attached to the task (see Assign Metrics to the Task).

Finish task run#

To finish a run, use the finish method

arcee.finish()

Fail task run#

To fail a run, use the error method

arcee.error()

Execute instrumented training code#

Run your script with the command:

python training_code.py

If the script runs successfully, the corresponding run will appear with a Running status under the associated Task in OptScale.

This indicates that optscale_arcee is actively tracking and sending metrics during execution.

Step 3. Extended settings. Commands and examples#

Add hyperparameters#

To add hyperparameters, use the hyperparam method with the following parameters:

  • key (str, required): the hyperparameter name.
  • value (str | number, required): the hyperparameter value.
arcee.hyperparam(key, value)

Note

Unlike metrics, hyperparameters do not need to be created first.

Example:

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Add hyperparam
  arcee.hyperparam("epochs", "10")

Tag task run#

To tag a run, use the tag method with the following parameters:

  • key (str, required): the tag name.
  • value (str | number, required): the tag value.
arcee.tag(key, value)

Example:

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Add run tags
  arcee.tag("purpose", "testing")
  arcee.tag("code_commit", "commit_id")

Add milestone#

To add a milestone, use the milestone method with the following parameter:

  • name (str, required): the milestone name.
arcee.milestone(name)

Example:

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
    # Download training data
    # Add milestone with name "Download training data"
    arcee.milestone("Download training data")
    ...
    # Download test data
    # Add milestone with name "Download test data"
    arcee.milestone("Download test data")
    ...

Add stage#

To add a stage, use the stage method with the following parameter:

  • name (str, required): the stage name.
arcee.stage(name)

Example:

# Import the optscale_arcee module
import optscale_arcee as arcee
...
# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
    # Download training data
    # Add stage with name "preparing"
    arcee.stage("preparing")
    ...

Log datasets#

To log a dataset, use the dataset method with the following parameters:

  • path (str, required): the dataset path
  • name (str, optional): the dataset name
  • description (str, optional): the dataset description
  • labels (list, optional): the dataset labels.
arcee.dataset(path, name, description, labels)

This method works regardless of whether the dataset already exists in OptScale.

Let’s take a look at the possible scenarios.

Log a dataset known to OptScale#

Follow the instructions to add a new Dataset (see How to create a dataset in the Tips section below).

To log the dataset, use the path from the Datasets page as the path parameter in the dataset(path) method.

existing_dataset_03

Example:

In the example we’ll log the dataset shown on the screenshot with the name “100 flowers“ and the path “https://s3.amazonaws.com/ml-bucket/flowers_231021.csv“.

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Log existing dataset
  arcee.dataset("https://s3.amazonaws.com/ml-bucket/flowers_231021.csv")

Log a dataset unknown to OptScale#

If the dataset is unknown to OptScale, simply specify the desired path as a parameter in dataset(path), and the dataset will be registered automatically.
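
For instance, a minimal sketch of logging a dataset that has not been added to OptScale yet might look like this (the path, name, description, and label below are hypothetical):

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Log a dataset that is not yet registered in OptScale;
  # it will be created automatically with this path
  arcee.dataset("https://s3.amazonaws.com/ml-bucket/new_flowers.csv",
                name="New flowers",
                description="Dataset registered automatically from a run",
                labels=["example"])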

Update the dataset information on the Datasets page by clicking the Edit icon.

Create models and set additional model parameters#

Create models#

To create a model, use the model method with the following parameters:

  • key (str, required): the unique model key
  • path (str, optional): the run model path
arcee.model(key, path)

Note

In OptScale, if a specified model does not exist, it will typically be created automatically. However, you also have the option to create a model in advance manually via the OptScale interface. To do this, you can refer to the instructions in the OptScale documentation.

Example:

We have created the model with the name “Iris model prod“ and the key = “iris_model_prod“

creating_models_03

To create a version of the “Iris model prod“ model:

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Create “Iris model prod“ model version 
  arcee.model("iris_model_prod", "https://s3.amazonaws.com/ml-bucket/flowers_231021.pkl")

Each model can have multiple versions. When a new model is added, it starts at version “1”. If additional models are registered under the same model key, the version number increments automatically with each new registration.
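
As an illustration of this behavior, a sketch using the same assumed key as above could look like the following (the two snippets represent two separate runs; the second path is hypothetical):

# Import the optscale_arcee module
import optscale_arcee as arcee

# First run: registers the "iris_model_prod" key, producing version 1
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  arcee.model("iris_model_prod", "https://s3.amazonaws.com/ml-bucket/flowers_231021.pkl")

# A later run with the same key registers a new version, producing version 2
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  arcee.model("iris_model_prod", "https://s3.amazonaws.com/ml-bucket/flowers_231105.pkl")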

Set additional parameters for the model#

Model version#

To set a custom model version, use the model_version method with the following parameter:

  • version (str, required): the version name.
arcee.model_version(version)
Model version alias#

Model version aliases allow you to assign a mutable, named reference to a particular version of a registered model. Each alias is unique within the model’s scope; if an alias is already in use, it is reassigned to the newly specified version.

To set a model version alias, use the model_version_alias method with the following parameter:

  • alias (str, required): the alias name.
arcee.model_version_alias(alias)
Tag#

To add tags to a model version, use the model_version_tag method with the following parameters:

  • key (str, required): the tag name.
  • value (str, required): the tag value.
arcee.model_version_tag(key, value)

Example:

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Create “Iris model prod“ model version 
  arcee.model("iris_model_prod", "https://s3.amazonaws.com/ml-bucket/flowers_231021.pkl")
  # Set custom version name
  arcee.model_version("My custom version")
  # Set model version alias
  arcee.model_version_alias("winner")
  # Set model version tag
  arcee.model_version_tag("env", "staging")

Create artifacts#

To create an artifact, use the artifact method with the following parameters:

  • path (str, required): the run artifact path.
  • name (str, optional): the artifact name.
  • description (str, optional): the artifact description.
  • tags (dict, optional): the artifact tags.
arcee.artifact(path, name, description, tags)

Example:

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Create Accuracy line chart artifact
  arcee.artifact("https://s3/ml-bucket/artifacts/AccuracyChart.png",
                 name="Accuracy line chart",
                 description="The dependence of accuracy on time",
                 tags={"env": "staging"})

Set artifact tag#

To add a tag to an artifact, use the artifact_tag method with the following parameters:

  • path (str, required): the run artifact path.
  • key (str, required): the tag name.
  • value (str, required): the tag value.
arcee.artifact_tag(path, key, value)

Example:

# Import the optscale_arcee module
import optscale_arcee as arcee

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Add artifact tag
  arcee.artifact_tag("https://s3/ml-bucket/artifacts/AccuracyChart.png",
                   "env"
, "staging demo")

 


Tips#

How to create a dataset#

1. Open the Datasets page and click the Add button

existing_dataset_01

existing_dataset_02

2. When adding a dataset, specify:

  • Path – the dataset path, providing information about the dataset’s source. Used in the dataset method.

  • Name – the name of the dataset, e.g. "iris_data"

  • Timespan from/ Timespan to – the dataset validity time

  • Description – the dataset description

  • Labels – the dataset labels.

How to create a model#

1. Open the Models page and click the Add button

creating_models_01

creating_models_02

2. Specify in the Add Model page:

  • Name – the name with which the model will be displayed in the interface

  • Key – the model identifier which is used in the model command

  • Description – the model description

  • Tags – the model tags.

Tasks page#

A task is the primary unit of organization and access control for runs; all runs belong to a task. Tasks let you visualize, search for, and compare runs and access run metadata for analysis. Find already created tasks on the Tasks page of the MLOps section of the menu.

mlops_005

The Tasks page contains a list of profiled tasks, including their key, status, goals, expenses, and output metrics from the most recent run.

mlops_010

You can also set filters and take action. The Profiling Integration action gives you complete instructions on successfully launching code on your instance to get the result.

Use Manage Metrics to add or edit metrics. Here, you can define a set of metrics against which every module should be evaluated. You can also set a target value and tendency for each captured metric.

The Executors button opens the Executors list, showing the compute resources used to run your ML activities.

You can see detailed information about a task by clicking on its name.

mlops_020

mlops_030

Summary cards contain information about the last run status, last run duration, lifetime cost (the cost of all runs in the task), and recommendations (recommendations for instances used as executors).

mlops_040

Let’s take a closer look at the Tracked Metrics section.

mlops_050

Tracked Metrics are a graphical representation of all runs plus specific numbers for the last run. The green circle indicates that the metric has reached the target value according to its tendency, while the brown one indicates that it has not. Next to each metric, the last run value / target value is shown, followed by a line of the metric’s values across all task runs.

green_circle

brown_circle

The Last run executor shows on which instance the last run was executed.

mlops_060

If the executor is known to OptScale, you will receive complete information about it and its expenses. Please note that the supported clouds are AWS, Azure, Alibaba, and GCP. If the executor is unknown to OptScale, some information about it will be unavailable, and the section will look as shown below:

mlops_070

To edit the task, use the Configure button in the Actions menu in the right-hand corner of the page.

mlops_080

How to view training results#

The results of the training code are displayed in OptScale. Find them in the Home or MLOps section. Use the Community documentation to get a brief description of each page.

Information on the latest run can be found in four places:

  • Home page → Tasks section. View a table displaying the status, execution time, and metric values of the last run for recently used tasks.

how_to_view_010

  • MLOps → Tasks section. Focus on the Last Run, Last Run Duration, and Metrics columns.

how_to_view_020

  • MLOps → Tasks → click on a task → Overview tab. Observe Tracked metrics – a graphical representation of all runs and specific numbers for the last run. Additionally, find information about the instance on which the last run was executed.

how_to_view_025

  • MLOps → Tasks → click on a task → navigate to the Runs tab. Find information about all runs on this page. The last run details are at the top of the table.

Task details page#

On the Overview tab, beneath the description field, you can view the status of the metrics, including whether they reached their goals, the total number of runs, and the launch time of the last successful run. The Last Run Cost and details about the Last Run Executor are also displayed.

how_to_view_030

Find a detailed description of the page in the how-tos section of our website.

Runs Tab#

Detailed statistics for each run are available in the Runs tab. The information is presented in two formats: graphical and tabular. The graph can be filtered, and you can select metrics and other parameters to display. Hovering over the graph reveals data in a tooltip. The table offers comprehensive details for each run.

how_to_view_035

Click on a run to view its Metrics, Datasets, Hyperparameters, Tags, and Charts. Artifacts and Executors are displayed in separate tabs, providing details about their location and associated costs. You can also find detailed information about the script execution, including its launch source (Git status), the executed command (command), and the console output (logs).

how_to_view_040

Model Version Tab#

The table on the Model Version tab displays all versions of models from task runs, along with their parameters. Each row is clickable and provides access to detailed information.

how_to_view_050

Leaderboards Tab#

Compare groups of task runs that are organized by hyperparameters and tags to determine the optimal launch parameters. This section also includes convenient functionality for comparing groups of runs.

Recommendations Tab#

On the Recommendations tab, you can view recommendations for cloud instances used during training. Recommendations are available for instances used in the past 7 days.

Executors Tab#

The table displays all executors for every task run.

MLOps Hypertuning flow

OptScale allows you to launch runs with a different set of hyperparameters on cloud instances linked to AWS Data Source. This functionality helps optimize your machine learning workflows by enabling you to test different configurations in parallel, utilizing the cloud’s scalability for faster experimentation.

Note

A separate instance is launched for each run.

Warning

OptScale launches cloud instances with the configuration defined for your cloud in the solution interface. When the run is completed, the instance will be destroyed.

To use the hyperparameter tuning feature, create a template, then a runset, and launch it.

Note

The AmazonEC2FullAccess policy is required for the user specified in the AWS Data Source connection to use Hypertuning.

First Launch#

Step 1. Create Runset Template#

Since all runs must belong to tasks, you need to create a task first. Don’t forget to create and attach relevant metrics to the task to effectively track performance.

Once the task and metrics are created, it’s time to create the runset templates.

Go to the Hypertuning page and click on the ADD button.

Note

Only organizational managers can create runset templates.

010_add_runset_template

Specify the template information and runset settings.

020_add_runset_template_form

Tasks: a set of tasks to which the created runs belong.

Data Sources: select a cloud where the instances are launched and the training takes place.

Regions: select a region to launch instances, such as us-east-1.

Instance types: select the type of instances on which runs are executed, such as p3, m5, t3, or t2.

Maximum runset budget: specify the maximum cost for all launched instances during the execution of runs.

Note

If the budget is exceeded, the execution of the runs will be interrupted. They will acquire an ‘Aborted’ status, and the instances will be deleted.

Resource name-prefix: specify a prefix for all launched instances.

Tags for created resources: specify tags for all launched instances.

Hyperparameters: specify the name and the environment variable for each learning parameter. For example, epochs = int(os.getenv('EPOCHS', 5)), where epochs is the name of the hyperparameter and EPOCHS is the environment variable.

Note

Hyperparameter values are set when launching the runset.
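
As a rough sketch, the training code launched by the runset could read the value assigned to the environment variable and report it as a hyperparameter (the EPOCHS name and default value below follow the example above and are assumptions):

import os

# Import the optscale_arcee module
import optscale_arcee as arcee

# Read the hyperparameter value set by the runset via the environment variable
epochs = int(os.getenv("EPOCHS", 5))

# Initialize the collector using a context manager
with arcee.init("YOUR-PROFILING-TOKEN", "YOUR-TASK-KEY"):
  # Record the hyperparameter on the run
  arcee.hyperparam("epochs", epochs)
  # ... training loop using `epochs` ...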

Step 2. Create and Launch Runset#

When creating a runset, a single value must be specified for each template parameter: Tasks, Data Sources, Regions, and Instance Types. It is also necessary to set specific values for the hyperparameters and define abort conditions.

Click on the name of the created template.

030

Click the LAUNCH button.

040

Fill in the fields and click Launch.

Note

If you have already launched runsets for this template, press Fill from latest launch to automatically fill in all fields with data from the most recent runset.

Warning

If you select an instance type that is not supported in the chosen region, an error occurs.

045_runset_configuration

Request Spot instances: switch on to execute runs on Spot instances.

Max attempts: specify the number of attempts to use a Spot instance before switching back to Pay-as-you-Go.

Hyperparameter: enter a comma-separated list of hyperparameter values you want to try with this runset.

Note

The number of runs created is equal to the number of hyperparameter combinations.

Example 1: You have the EPOCHS hyperparameter with values 5, 7, 10, and 20. In this case, 4 runs will be created. In the first run, the EPOCHS parameter will be 5; in the second run, it will be 7; and so on.

Example 2: You have two hyperparameters: EPOCHS (with values 2, 3, and 5) and STATE (with values 3 and 4). In this case, 6 runs will be created. In the first run, the EPOCHS parameter will be 2 and the STATE parameter will be 3; in the second run, the EPOCHS parameter will be 3 and the STATE parameter will be 3; and so on.
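
The number of runs corresponds to the Cartesian product of the hyperparameter value lists. A quick way to check this in Python, using the values from Example 2, is shown below:

from itertools import product

epochs_values = [2, 3, 5]
state_values = [3, 4]

# Each combination corresponds to one run in the runset
combinations = list(product(epochs_values, state_values))
print(len(combinations))  # 6 runs
for epochs, state in combinations:
    print(f"EPOCHS={epochs}, STATE={state}")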

Commands to execute: specify the commands to run on each executor. They can include setup steps, data preprocessing tasks, task execution processes, or any other operations required for your ML runs. For example,

sudo pip install torchvision==0.13.0
wget https://hystax-eu-fra.s3.eu-central-1.amazonaws.com/linear_learn.py -O /home/ubuntu/linear_learn.py
python3 /home/ubuntu/linear_learn.py

Abort conditions: conditions that, when met, cause the runset to be completed even if not all runs have finished.

  • Abort runset when projected expenses exceed: turn on and enter the expense limit if the execution of the runs should be interrupted when the budget is exceeded. If the condition is met, the runs acquire an ‘Aborted’ status, and the instances are deleted.

  • Abort individual run if its duration exceeds: turn on and enter the time in minutes if the execution of a run should be interrupted when the time limit is reached. If the condition is met, the run acquires an ‘Aborted’ status, and the instance is deleted.

  • Abort runset when one of the runs reaches task goals: turn on if the runset should be interrupted after the completion of a run in which all metrics have reached the set goals.

View Results#

To view the results, click on the template on the Hypertuning page.

Observe configuration details and all runsets launched for the template, along with brief information.

050_template_details

Find summary cards displaying total run counts, last runset expenses, and total expenses information.

Note

Expense values are pre-calculated based on the price of the selected instance type and update as soon as the cloud provider sends billing data.

It is easy to get detailed information about each runset. Just click the runset name in the Runsets table.

060_runsets_list

Runset details, the correlations chart, runset run information, and information about raised instances can be found here.

070_runsets_page

Correlations chart and Runs tab#

The Correlations chart shows the values of hyperparameters and metrics for all created runs. Find all the runset runs under the Runs tab.

080_correlations

Click the gear 090_settings_button icon next to the Correlations caption of the section to choose which parameters to display.

The ability to select and highlight specific axis values helps filter the data. Click on individual tick marks or drag across multiple tick marks for an in-depth exploration of the correlations between hyperparameters and metrics in the runset.

100_filter

The content of the Runs table depends on the chart and changes when the chart data is updated or filtered.

To clear the selection, press the CLEAR FILTERS button.

Executors tab#

All instances created in the cloud are shown on this tab. Pay special attention to the Status field, as it updates constantly depending on the state of the instance in the cloud.

110_executors_tab

Brief status descriptions:

Terminated – the instance was successfully deleted in the cloud after the successful completion of the run.

Error – the instance was deleted as a result of an abort condition, execution error, or manual stop of the runset. Reasons:

  • Destroy flag is set – one of the runset runs reached its goals. The runset is completed, and all unfinished runs are marked as Aborted.

  • Duration exceeded – the run executed on this instance lasted longer than indicated in the corresponding abort condition.

Setting up SMTP

SMTP is the standard protocol for transmitting email messages over the internet. Configure SMTP to enable OptScale to send smart notifications to users about events and actions related to cloud cost optimization in their infrastructure.

Note

This instruction applies only to custom OptScale deployments.

To set up SMTP on a custom OptScale deployment:

1. Fill in these fields in overlay/user_template.yml (optscale/optscale-deploy/overlay/user_template.yml):

# SMTP server and credentials used for sending emails
smtp:
  server:
  email:
  login:
  port:
  password:
  protocol:
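
As a reference, a filled-in sketch with purely illustrative values might look like the following (the server, credentials, port, and protocol value are placeholders; substitute the settings of your own mail provider):

# Example values only – replace with your provider's settings
smtp:
  server: smtp.example.com
  email: noreply@example.com
  login: noreply@example.com
  port: 587
  password: <your-smtp-password>
  protocol: tls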

2. Restart the cluster with the updated user_template.yml:

./runkube.py --with-elk  -o overlay/user_template.yml -- <deployment name> <version>

3. If emails are still missing after SMTP is configured, check for errors in Kibana using the query container_name: *heraldengine*

 

Kibana logs#

This guide is intended to help users navigate Kibana to view logs when encountering issues. By following the outlined steps, you’ll be able to efficiently access and analyze log data, enabling you to identify and troubleshoot potential problems.

Note

This instruction applies only to custom OptScale deployments.

  1. Open http://cluster_ip:30081
  2. Use the username and password from your own user_template.yml file (optscale/optscale-deploy/overlay/user_template.yml)
  3. Go to the Discover tab
  4. Filter the data by container name: name: *<container_name>*
  5. Add a message filter.

Note

If Kibana prompts you to create an index pattern, create it with the following parameters: kibana_index_pattern

Troubleshooting

If the address http://cluster_ip:30081 is not available, ensure that you used the --with-elk flag during the cluster installation step (GitHub – hystax/optscale) and that the elk-0 pod is in the running state.

Slack integration#

Slack has become a popular communication tool that brings Managers, DevOps and Engineering team members together in their daily tasks. You can connect your Slack account with OptScale to access OptScale’s functionality directly from Slack.

Note

This instruction applies only to custom OptScale deployments.

To prepare your OptScale cluster for Slack integration, follow these instructions: optscale/slacker/README.md
