Data Analytics Cost Reduction

As daily CSV-formatted data accumulated, the cost of Athena queries kept rising, both for ad hoc analytics and for the QuickSight dashboards embedded in an Amplify web application to share analytics insights. Athena charges by the amount of data scanned in the S3 objects holding the data, so every query became more expensive as the CSV data grew. The daily cost also varied with the number of Athena queries the analysts ran that day.
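To make the pricing model concrete, here is a minimal sketch of how a daily Athena bill scales with data scanned and query count. The $5-per-TB rate, the 10 MB per-query minimum, the rounding up to whole megabytes, and the use of binary units are assumptions for illustration and should be checked against current Athena pricing for your region; the function names are hypothetical.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4
# Assumed per-query billing minimum; verify against current Athena pricing.
MIN_BYTES_PER_QUERY = 10 * MB


def athena_query_cost(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Estimate the USD cost of one query from the bytes it scans."""
    # Round up to a whole megabyte, then apply the assumed minimum.
    billable = max(math.ceil(bytes_scanned / MB) * MB, MIN_BYTES_PER_QUERY)
    return billable / TB * price_per_tb


def daily_cost(bytes_per_query: int, queries_per_day: int,
               price_per_tb: float = 5.0) -> float:
    """Estimate one day's cost when every query scans the same amount."""
    return queries_per_day * athena_query_cost(bytes_per_query, price_per_tb)
```

Under this model the bill is proportional to both the bytes scanned per query and the number of queries, which is why the cost tracked analyst activity and why shrinking the scanned data lowers the cost of every query at once.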

To reduce Athena costs going forward, an AWS Glue ETL job was run daily to transform that day's data from CSV to compressed Apache Parquet format and add the Parquet data as partitioned S3 data for Athena. Because Parquet is a columnar format, Athena reads only the columns a query references rather than every full row, which greatly reduces the data scanned by its SQL queries.
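The partitioning step can be sketched as follows. This is not the original Glue job; it only illustrates one common layout, assuming Hive-style `year=/month=/day=` partitions keyed on the load date. The bucket, prefix, and table names are hypothetical placeholders, and the helper functions are invented for this example.

```python
from datetime import date


def partition_prefix(bucket: str, table_prefix: str, day: date) -> str:
    """Build the S3 prefix where one day's Parquet files would be written."""
    return (
        f"s3://{bucket}/{table_prefix}/"
        f"year={day:%Y}/month={day:%m}/day={day:%d}/"
    )


def add_partition_sql(table: str, location: str, day: date) -> str:
    """Build the Athena DDL that registers one day's partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{day:%Y}', month='{day:%m}', day='{day:%d}') "
        f"LOCATION '{location}'"
    )


# Hypothetical names, for illustration only.
day = date(2024, 1, 5)
location = partition_prefix("example-bucket", "events_parquet", day)
statement = add_partition_sql("analytics.events_parquet", location, day)
```

With a layout like this, a query that filters on the partition columns lets Athena skip the other days' objects entirely, which compounds the savings from the columnar format.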

The chart below shows two phases of cost reduction. The first, and easiest, came from switching the Athena queries run by the QuickSight SPICE scheduled refreshes over to the Parquet data. These SPICE datasets feed the QuickSight dashboards embedded in the Amplify web application.

The second cost reduction came from advertising the alternate Parquet-formatted database to the data analysts. They switched quickly once they confirmed that it returned the same query results in much less time, which let them explore the data faster and more efficiently while also reducing the cost to the company.

[Chart: Data Analytics Cost Reduction]