Setup Zuora Connector for AWS Athena
Configure AWS Athena destination within the Zuora Connector
To set up your AWS Athena connector, you must first configure your AWS Athena destination within the Zuora Connector. Follow the steps below to set up AWS IAM credentials, manage permissions, configure bucket access, and make other necessary configurations to ensure a seamless data transfer process from Zuora to AWS Athena.
Prerequisites
By default, Athena authentication uses role-based access. You must have the trust policy pre-populated with the data-syncing service's identifier to grant access. It should look similar to the following JSON object with a proper service account identifier:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal" : { "AWS": "arn:aws:iam:123456789012:sample_role" }, "Action": "sts:AssumeRole", } ] }
Step 1: Create a destination bucket, service policy, and role
Create Athena target bucket
Follow these steps to create a bucket to be used for staging data before transferring to a destination.
- Navigate to the S3 service page.
- Click Create bucket.
- Enter a Bucket name, select an AWS Region, and modify any of the default settings as desired.
Object Ownership can be set to "ACLs disabled" and Block Public Access settings for this bucket can be set to "Block all public access" as recommended by AWS. Make note of the Bucket name and AWS Region.
- Click Create bucket.
Create Athena access policy
- Navigate to the IAM service page > click on the Policies navigation tab > click Create policy.
- Click the JSON tab, and paste the following policy:
- Make sure to replace
ACCOUNT_ID
,WORKGROUP
,BUCKET_NAME
andSCHEMA
with your account information.WORKGROUP
should be primary unless otherwise specified during connection configuration.BUCKET
should refer to the bucket created in the previous step.SCHEMA
used below does not need to be created ahead of time. If it does not exist, it will be created automatically before transferring data.-
{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowAthenaAccess", "Effect": "Allow", "Action": [ "athena:GetQueryResults", "athena:StartQueryExecution", "athena:StartSession", "athena:GetDatabase", "athena:GetDataCatalog", "athena:GetWorkGroup", "athena:GetTableMetadata", "athena:GetQueryExecution" ], "Resource": [ "arn:aws:athena:*:ACCOUNT_ID:workgroup/WORKGROUP" ] }, { "Sid": "AllowGlueAccessToDestinationDatabaseAndTables", "Effect": "Allow", "Action": [ "glue:GetDatabases", "glue:GetDatabase", "glue:GetTables", "glue:GetTable", "glue:GetPartitions", "glue:CreateTable", "glue:CreateDatabase", "glue:UpdateTable", "glue:DeleteTable" ], "Resource": [ "arn:aws:glue:*:ACCOUNT_ID:catalog", "arn:aws:glue:*:ACCOUNT_ID:database/SCHEMA", "arn:aws:glue:*:ACCOUNT_ID:database/default", "arn:aws:glue:*:ACCOUNT_ID:table/SCHEMA/*" ] }, { "Sid": "AllowS3AccessToBucket", "Effect": "Allow", "Action": [ "s3:PutObject", "s3:ListBucket", "s3:GetBucketLocation", "s3:GetObject", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::BUCKET_NAME", "arn:aws:s3:::BUCKET_NAME/*" ] } ] }
- Make sure to replace
- Click through to the Review step. Choose a name for the policy, for example,
transfer-service-policy
(this will be referenced in the next step), add a description, and click Create policy.
Athena vs. S3 permissionsBecause Athena uses S3 as the underlying storage layer, the Resource access requested in the policy is scoped down via resource-specific permissions in the S3 actions.
|
Schema vs. DatabaseDuring destination onboarding, you will be asked to provide both a "schema" and a "database". Though those are mostly synonymous in Athena, they are used for two different purposes here:
|
Create role
- Navigate to the IAM service page.
- Navigate to the Roles navigation tab > click Create role.
- Select Custom trust policy and paste the provided trust policy (from the prerequisite) to allow AssumeRole access to this role > click Next.
- Add the permissions policy created above, and click Next.
- Enter a Role name, for example,
transfer-
role
and click Create role. - Once successfully created, search for the created role in the Roles list, click the role name, and make a note of the ARN value.
Alternative authentication method: AWS User with HMAC Access Key ID & Secret Access Key
Role-based authentication is the preferred authentication mode for Athena based on AWS recommendations, however, HMAC Access Key ID and Secret Access Key is an alternative authentication method that can be used if preferred.
- Navigate to the IAM service page.
- Navigate to the Users navigation tab, and click Add users.
- Enter a User name for the service, for example,
transfer-
service
and click Next.- Under Select AWS access type, select the Access key - Programatic access option. Click Next: Permissions.
- Click the Attach existing policies directly option, and search for the name of the policy created in the previous step. Select the policy, and click Next: Tags.
- Click Next: Review and click Create user.
- In the Success screen, record the Access key ID and the Secret access key.
Step 2: Add your destination
- After completing the initial setup, share your
bucket name
with your Zuora representative who will create a connection link for you. - Using the connection link shared with you by Zuora, you can securely input your Google Cloud Storage details, including the
query results location
,schema
,workgroup
,bucket name
,bucket region
, choose auth method and input required credentials, finalizing the setup for the connection. - After you fill in all the required AWS Athena details through the provided link and test the connection, saving the destination will kickstart the onboarding process and begin transferring data.
Verification and Data Transfer
Once data synchronization is configured with the AWS Athena connector, the data is automatically transferred to the specified AWS Athena environment. This enables direct access and querying of the data within AWS services or through compatible data analysis tools. You can seamlessly manage and analyze this data across your AWS ecosystem.
Format of Transferred Data
Data transferred to the AWS S3 destination for AWS Athena will be loaded as Apache Parquet files in Apache Hive style partitions. The data organization structure is detailed below:
<bucket_name>/<folder_name>/<model_name>/dt=<transfer_date>/<file_part>_<transfer_timestamp>.parquet
where:
<bucket_name>
and<folder_name>
are provided during destination configuration.<model_name>
is the name of the data model being transferred (this is equivalent to a table name in relational data destinations).<transfer_date>
and<transfer_timestamp>
are generated at transfer time and based on the transfer's start time.<transfer_date>
is of the form2006-01-01
, while<transfer_timestamp>
is of the form20060102150405
.<file_part>
is a monotonically increasing integer for a given timestamp, and does not carry any special meaning.
This structure allows for efficient storage and querying of data within AWS Athena, providing seamless access for analytics and data processing tasks.
What are Apache Hive style partitions and Apache Parquet file format?
|