Implement an Alibaba Cloud billing export aligned with the FOCUS specification, with the objective of producing a reliable, query-ready dataset suitable for analytics, reporting, and cost governance use cases.
NOTE: The latest FOCUS version as of the publication date of this Paper is 1.3, with Alibaba Cloud supporting 1.0 in invitational preview. Access to the FOCUS export may need to be requested. Results may vary depending on specification versioning and upstream data completeness (see Alibaba Cloud FOCUS 1.0 field differences).
Prior to using the FOCUS export, billing data was available only as raw daily CSV files stored in Amazon S3. These files were organized in a deeply nested time-based folder structure and included complex fields such as JSON-encoded tags. While the source data was technically complete, it was not easily consumable at scale. Schemas were weakly typed, tags were embedded as strings, and queries required significant preprocessing.
The goal of this effort was to transform these raw exports into an incremental, normalized dataset that could be efficiently queried using Amazon Athena and integrated into downstream cost analytics workflows.
Several structural challenges needed to be addressed to operationalize the data.
Although the exported billing files were delivered as CSV, some fields (particularly tags) contained JSON objects with key-value pairs such as:
These fields are frequently critical for FinOps analysis but are not directly usable in SQL without transformation.
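The transformation this implies can be sketched in a few lines of Python using only the standard library. The tag keys and values below are hypothetical and purely illustrative of how a JSON-encoded tags field is turned into an addressable map:

```python
import json

# Hypothetical tags value as it appears inside a CSV field:
# a JSON object serialized into a single string.
raw_tags = '{"team": "platform", "environment": "production", "cost-center": "cc-1024"}'

def parse_tags(raw: str) -> dict[str, str]:
    """Parse a JSON-encoded tags string into a flat key/value map.

    Returns an empty dict for missing or malformed input so that
    downstream code never has to branch on parse failures.
    """
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    # Keep only dict payloads; coerce keys and values to strings.
    return {str(k): str(v) for k, v in parsed.items()} if isinstance(parsed, dict) else {}

tags = parse_tags(raw_tags)
print(tags["team"])  # individual tags become directly addressable
```

Handling malformed input by returning an empty map (rather than raising) is a deliberate choice here: a single bad row should not abort a batch transformation.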
New billing files were written daily to S3, requiring a solution that could:
The dataset also needed to remain:
This ensured that the resulting dataset could be used consistently across analytics environments and reporting tools, while acknowledging that full FOCUS conformance depends on upstream data completeness.
The first step involved configuring the Alibaba Cloud billing export based on the official procedure for Billing FOCUS Export.
This process automatically generates an OSS bucket containing daily exports of FOCUS billing data.
Each export includes compressed CSV files representing billing line items.
To automate ingestion of this data into a FinOps data lake, a Function Compute service (Alibaba Cloud's serverless compute offering, comparable to AWS Lambda) was implemented within Alibaba Cloud.
These credentials must allow the function to write data into the destination S3 bucket.
Security note: In a production deployment, consider using a dedicated IAM user with least-privilege permissions (write-only to the target S3 prefix), enabling key rotation, and evaluating cross-cloud identity federation as an alternative to long-lived access keys.
The function is configured to execute automatically whenever new export files are created.
Exported files are stored in a raw S3 bucket, which serves as the base dataset for downstream processing.
The transformation pipeline was implemented using:
The design focused on four core principles:
Using a Glue Visual ETL job, the pipeline performs several transformation steps.
The ETL job recursively reads billing files from the raw S3 bucket.
CSV parsing is configured with:
This prevents column parsing errors caused by embedded commas within fields.
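The failure mode that quote-aware parsing avoids can be reproduced with Python's standard csv module. The row below is illustrative, not taken from a real export:

```python
import csv
import io

# A billing row in which one field contains an embedded comma.
# A naive split(",") misaligns the columns; a quote-aware parser does not.
line = 'ecs,"Elastic Compute Service, pay-as-you-go",12.34\n'

naive = line.strip().split(",")                               # 4 pieces: columns misaligned
parsed = next(csv.reader(io.StringIO(line), quotechar='"'))   # 3 fields: comma preserved inside the quoted field

print(len(naive), len(parsed))
```

The same principle applies regardless of the parsing engine: the quote character must be declared so that delimiters inside quoted fields are treated as data, not column boundaries.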
An ApplyMapping step converts relevant columns into strongly typed fields such as:
Enforcing types early prevents silent schema drift and ensures consistency across downstream analytics systems.
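A minimal sketch of the same typing discipline in plain Python follows; the column names are hypothetical, and the Glue job performs the equivalent step with ApplyMapping rather than this code:

```python
from datetime import date, datetime
from decimal import Decimal

def type_row(raw: dict[str, str]) -> dict:
    """Cast string-typed CSV values into strongly typed fields.

    Monetary amounts use Decimal (never float) so that rounding
    drift cannot accumulate when costs are aggregated downstream.
    """
    return {
        "billed_cost": Decimal(raw["billed_cost"]),
        "charge_period_start": datetime.strptime(raw["charge_period_start"], "%Y-%m-%d").date(),
        "service_name": raw["service_name"],
    }

row = type_row({
    "billed_cost": "12.340000",
    "charge_period_start": "2026-01-15",
    "service_name": "ecs",
})
```

Because the casts run at ingestion time, a malformed value fails loudly in the pipeline rather than silently propagating a string where analytics code expects a number or a date.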
The S3 folder structure already encodes temporal metadata. The pipeline extracts partition columns directly from the file path, including:
This allows natural partitioning without modifying the source files.
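The extraction step can be sketched with a regular expression over the object key. The key layout below is a hypothetical example of a nested time-based structure, not the exact layout of the export:

```python
import re

# Hypothetical raw-bucket key with a nested time-based layout.
key = "raw/focus/2026/01/15/billing-00001.csv.gz"

PARTITION_RE = re.compile(r"/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/")

def partitions_from_key(key: str) -> dict[str, str]:
    """Derive partition columns (year, month, day) from the S3 key,
    leaving the source file itself untouched."""
    m = PARTITION_RE.search(key)
    if m is None:
        raise ValueError(f"no date components in key: {key}")
    return m.groupdict()

print(partitions_from_key(key))  # {'year': '2026', 'month': '01', 'day': '15'}
```

Keeping the partition values as zero-padded strings matches how Hive-style partition columns are typically stored and avoids surprises when the values are later compared against path segments.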
The tags column is parsed from a JSON string into a map<string,string> structure.
From this map, common tag attributes are extracted into normalized fields such as:
These fields become directly queryable in SQL while preserving the full tag map for flexibility.
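The promotion of selected tags into dedicated columns might look like the following sketch. The tag keys team and environment are hypothetical examples of "common tag attributes"; note that the full map is kept alongside the promoted columns:

```python
def normalize_tags(tag_map: dict[str, str]) -> dict:
    """Promote frequently queried tag keys to dedicated columns while
    preserving the complete map for ad hoc analysis."""
    return {
        "tag_team": tag_map.get("team"),               # None when the tag is absent
        "tag_environment": tag_map.get("environment"),
        "tags": tag_map,                               # full map retained for flexibility
    }

record = normalize_tags({"team": "platform", "environment": "production", "owner": "alice"})
```

Using .get() rather than indexing means untagged resources yield NULL in the normalized columns instead of failing the job, which is usually the desired behavior for cost allocation queries.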
The transformed dataset is written in Parquet format with Snappy compression and partitioned by:
This significantly reduces query scan costs and improves performance in Athena.
The AWS Glue job automatically updates the AWS Glue Data Catalog, ensuring schema definitions remain synchronized with the transformed dataset.
Once the normalized Parquet dataset is written to S3, it is made queryable through Amazon Athena.
Instead of relying on schema-on-read approaches or ad hoc definitions, an explicit Athena table is created and registered in the AWS Glue Data Catalog.
The table definition reflects:
Explicit table definition ensures that:
The table references the normalized S3 location, and new data is appended incrementally.
Partitions are discovered dynamically rather than hard-coded in the table definition.
To synchronize metadata with the underlying storage layout, a scheduled maintenance job runs the command:
MSCK REPAIR TABLE db.focus_alibaba_normalized;
This command updates Athena metadata to include newly created partitions.
By separating data ingestion, transformation, and metadata management, the pipeline maintains a clear and robust architecture.
Glue workflows and triggers coordinate the execution of these stages.
Although the normalized dataset is queryable directly, exposing the base table to analysts would still introduce unnecessary provider-specific complexity.
To address this, an additional abstraction layer was implemented using curated SQL views in Athena.
The first layer includes views tailored specifically for Alibaba Cloud billing data. These views:
This layer shields consumers from the complexity of the underlying schema.
A second layer of views aligns datasets across multiple providers, including:
These views expose a consistent, FOCUS-structured schema, allowing datasets from multiple providers to be combined without additional transformations. Note that the unified view inherits any upstream data gaps; columns such as ContractedCost and PricingQuantity will contain null values for Alibaba rows until the upstream export populates them.
This abstraction layer serves as the contract between data engineering and analytics, providing a stable and documented interface.
Business intelligence tools can connect directly to Athena and query these views without needing to understand ingestion pipelines, file formats, or provider-specific semantics.
The resulting pipeline produces a FOCUS-aligned billing dataset that:
What began as a collection of raw CSV billing exports has evolved into an automated, production-ready billing export pipeline designed to support FinOps analysis and decision-making. Full FOCUS conformance will depend on continued improvements to the upstream Alibaba Cloud export.
During early validation of the dataset, several fields were identified that may require review or adjustment in the upstream export. These observations were shared with the Alibaba Cloud team during the initial testing phase. Until these gaps are resolved upstream, the pipeline’s output will contain null or unexpected values in the affected columns, limiting certain analytics use cases.
For a complete list of known field differences, see Alibaba Cloud FOCUS 1.0 Preview Field Differences.
The following fields currently appear to contain numeric references rather than descriptive values:
Note: ProviderName and PublisherName were deprecated in FOCUS 1.3, replaced by ServiceProviderName. Pipelines targeting future FOCUS versions should plan for this change.
The following fields were not present in the exported dataset and may require verification:
The following fields were present in the exported schema but contained only null values. This list reflects the author’s export as of January 2026; consult the field differences page for the latest status.
Mandatory, nulls not allowed (hard conformance violations):
Mandatory, nulls conditionally allowed:
Conditional:
Observed by the author but not listed in Alibaba’s published conformance gaps (may require further verification):
The following issues are documented by Alibaba Cloud but not directly observed in the pipeline testing described above:
NOTE: The author reached out to the Alibaba Cloud team with this feedback as of January 2026.
Please get in touch with additional feedback or observations from other implementations. Feedback helps the FOCUS Maintainers and Steering Committee improve the underlying schema and export functionality.
We’d like to thank the following for their work on this Paper: