
Meter ingestion options for high throughput metering use cases

Lior Mechlovich, Daniel Elman

August 30, 2022

Amberflo was built with the highest-throughput use cases in mind (think billions of events per week). To accomplish this, we employ a serverless architecture that auto-scales to ensure consistent performance regardless of load. Still, for some use cases where the event volume reaches into the billions per customer (such as messaging or API monetization), the generic event-based pricing model becomes infeasible, even with volume-based discounting in place.

To ingest events, Amberflo offers a variety of options depending on the customer’s infrastructure and organizational preferences. The primary method we recommend is using our SDKs, as these can be integrated closest to the source (where events are generated in the client system) and deliver the lowest possible latency. Customers can also connect directly to our API endpoints without using SDKs, either because their backend language is not supported or because they prefer to interact with cloud services such as S3 directly. Beyond those methods, Amberflo can ingest meter data from a CSV file, from a third-party logging or monitoring solution, or from a cloud storage solution like S3. If a customer has an existing meter data pipeline, we can work with them to direct those events to Amberflo.

Amberflo SDKs are built to operate asynchronously with automatic batching and flushing. When an event is generated by a client system and handed to the Amberflo SDK, it is not immediately sent to Amberflo for aggregation; instead, events are buffered and sent in batches at preconfigured time intervals (both batch size and send interval are configurable). After a batch is sent, the buffer is flushed and new events can be received.
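To make the batching behavior concrete, here is a minimal sketch of a buffer that flushes either when the batch size is reached or when the send interval elapses. This is illustrative only, not the actual Amberflo SDK; the class name, constructor parameters, and event field names (such as meterApiName and meterValue) are assumptions for the example.

```python
import threading
import time

class MeterBuffer:
    """Illustrative event buffer: accumulates meter events and flushes them
    as a batch when either the batch size or the send interval is reached.
    The real SDKs handle this (plus retries and delivery) internally."""

    def __init__(self, send_batch, batch_size=100, interval_seconds=1.0):
        self.send_batch = send_batch          # callback that ships a batch upstream
        self.batch_size = batch_size          # configurable batch size
        self.interval_seconds = interval_seconds  # configurable send interval
        self._events = []
        self._lock = threading.Lock()
        self._last_flush = time.monotonic()

    def record(self, meter_name, customer_id, value, dimensions=None):
        event = {
            "meterApiName": meter_name,
            "customerId": customer_id,
            "meterValue": value,
            "meterTimeInMillis": int(time.time() * 1000),
            "dimensions": dimensions or {},
        }
        with self._lock:
            self._events.append(event)
            due = (len(self._events) >= self.batch_size
                   or time.monotonic() - self._last_flush >= self.interval_seconds)
            if due:
                self._flush_locked()

    def _flush_locked(self):
        if self._events:
            self.send_batch(self._events)
            self._events = []  # flush the buffer so new events can be received
        self._last_flush = time.monotonic()

    def flush(self):
        """Force a flush, e.g. on shutdown, so no buffered events are lost."""
        with self._lock:
            self._flush_locked()
```

In this sketch, nothing leaves the process until a batch boundary is hit, which is what keeps per-event overhead low on the client side.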

Aggregate and Squash at the Source (Non-Distributed)

For use cases demanding ultra-high-volume metering, such as large-scale messaging or API monetization, the standard SDK-based ingestion method may not be feasible: despite the batching, each event is still received and aggregated by Amberflo on a one-to-one basis, and even with volume-based discounting this can quickly become cost-prohibitive. Instead, we recommend squashing these batches by aggregating similar events at the source before ingestion to Amberflo.

For example, consider a batch of 1,000 events (10 seconds’ worth at a standard batch size of 100 per second). This batch can be squashed at the source by aggregating similar meters (similar meaning meters associated with the same customer and having the same dimensions) and reporting the aggregate totals to Amberflo. Suppose that in that batch of 1,000 events (representing API calls), only 5 customers accessed the API; then the batch would be ingested as just 5 events, one per customer, each recording the number of calls that customer made to the API over those 10 seconds (with the totals across all 5 customers adding up to 1,000).
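The squashing step above can be sketched as a simple group-and-sum over (customer, dimensions) keys. This is a hedged example, not library code; the field names mirror the shape of a typical meter event but are assumptions here.

```python
from collections import defaultdict

def squash(events):
    """Aggregate a batch of meter events at the source: events for the same
    customer with identical dimensions are merged into a single event whose
    value is the sum of the originals."""
    totals = defaultdict(float)
    for e in events:
        # Dimensions are sorted into a tuple so dict ordering never splits a group.
        key = (e["customerId"], tuple(sorted(e.get("dimensions", {}).items())))
        totals[key] += e["meterValue"]
    return [
        {"customerId": customer, "dimensions": dict(dims), "meterValue": total}
        for (customer, dims), total in totals.items()
    ]
```

With the 1,000-event batch from the example, 5 distinct customers yield exactly 5 squashed events whose values sum back to 1,000.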

Pre-Aggregation in a Distributed Context

In a distributed system, meter events are generated at multiple sources (such as different servers or IoT devices), so an additional step is needed: unifying and staging the data in an intermediate location such as S3. There must be a way to collect all the events that occurred over a given time period, regardless of where they originated, so we recommend staging all events in S3 before aggregating or ingesting into Amberflo. After joining the data across locations, similar events can be aggregated as described above, with the aggregate totals reported to Amberflo at configurable intervals.

As an example, consider a factory with four assets (machines), each completing 100 operations per second. Every 10 seconds, batches of events from Assets A, B, C, and D are sent to S3, each event indicating the success or failure of one operation. There, the events are squashed further by aggregating all of the successes and failures and reporting the totals to Amberflo as two events (a success total and a failure total), rather than reporting each individual success and failure.
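The factory example can be sketched as follows. Each staged batch stands in for one S3 object written by one asset over the interval; the function name, event shape, and meter name are illustrative assumptions, and in practice the batches would be read from S3 (e.g. with boto3) rather than passed in memory.

```python
def aggregate_staged_batches(staged_batches):
    """Join the event batches staged by all assets for one interval and squash
    them into two aggregate events: a factory-wide success total and a
    factory-wide failure total."""
    successes = 0
    failures = 0
    for batch in staged_batches:          # one batch per asset per interval
        for event in batch:
            if event["status"] == "success":
                successes += 1
            else:
                failures += 1
    # Only two events are reported upstream instead of thousands.
    return [
        {"meterApiName": "operations", "dimensions": {"status": "success"}, "meterValue": successes},
        {"meterApiName": "operations", "dimensions": {"status": "failure"}, "meterValue": failures},
    ]
```

At 100 operations per second per asset, a 10-second interval across four assets collapses 4,000 raw events into just two aggregates.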

Controlling Reporting Granularity

The trade-off of this approach is a slight loss of granularity: since aggregated values are reported to Amberflo, the individual event values cannot be recovered if they are needed at a later date. Continuing the example above, you would be able to see the failure rate across Assets A–D (updated every 10 seconds), but you would not be able to see how each asset performed individually. That said, by configuring the batch sizes and reporting interval, you retain some control over the level of granularity.
