Commit 938aadf8 authored by Cameron McFarland

Updating my notes in the README.

parent e55f8c73
# Terraform for the AWS SnowPlow Pipeline
Locally pull down the docker images we think we'll need.
* docker pull
* docker pull
* docker pull
This configuration uses the following AWS services to host SnowPlow. There
may be more in use, but these are the primary services.
1. EC2 (Auto Scaling Groups, Launch Configurations, ELB, Target Groups,
Security Groups)
1. Kinesis
1. DynamoDB
1. IAM (Policies and Roles)
1. S3
1. VPC (Subnets, VPC, Internet Gateways, Routes, Routing Tables)
If we are going to use docker images, we'll need to make our own images since
we need to put our config in there. There just isn't a supported method to
tunnel configs in via environment variables (at least not easily).
Need to tag them and put them into the two ECR repositories.
## Terraform Version
This environment is using Terraform version 0.12.0. The configuration for the
S3 remote state file is a little different. It is found at the top of the file.
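For reference, an S3 remote state block in Terraform 0.12 might look roughly like the sketch below. The bucket name and key are hypothetical placeholders, and the region is an assumption taken from the `us-east-1` suffix on the DynamoDB table names; none of these values come from the actual configuration.

```hcl
terraform {
  # The README states 0.12.0; this constraint is an assumption.
  required_version = ">= 0.12.0"

  backend "s3" {
    bucket = "example-terraform-state"    # hypothetical bucket name
    key    = "snowplow/terraform.tfstate" # hypothetical state file path
    region = "us-east-1"                  # assumed region
  }
}
```

Backend blocks cannot use variables or interpolation, so these values have to be literals (or be supplied with `-backend-config` at `terraform init` time).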
Need to build a VPC for this entire thing.
There is also a subnet. Probably should make three, in different availability zones.
Also need an internet gateway? And routing tables?
Going to probably need an elastic IP for the snowplow endpoint.
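The networking pieces in these notes could be sketched as follows. This is only an illustration: the CIDR ranges, resource names, and zone names are all assumptions, and it uses one subnet per availability zone because every subnet of a VPC lives in that VPC's single region.

```hcl
# Hypothetical zone list; adjust to the region actually in use.
locals {
  azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

resource "aws_vpc" "snowplow" {
  cidr_block = "10.0.0.0/16" # assumed address range
}

# The internet gateway gives the VPC a path to the public internet.
resource "aws_internet_gateway" "snowplow" {
  vpc_id = aws_vpc.snowplow.id
}

# One subnet per zone, each a /24 carved out of the VPC range.
resource "aws_subnet" "snowplow" {
  count             = length(local.azs)
  vpc_id            = aws_vpc.snowplow.id
  availability_zone = local.azs[count.index]
  cidr_block        = cidrsubnet(aws_vpc.snowplow.cidr_block, 8, count.index)
}

# Default route sends outbound traffic through the internet gateway.
resource "aws_route_table" "snowplow" {
  vpc_id = aws_vpc.snowplow.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.snowplow.id
  }
}

resource "aws_route_table_association" "snowplow" {
  count          = length(local.azs)
  subnet_id      = aws_subnet.snowplow[count.index].id
  route_table_id = aws_route_table.snowplow.id
}
```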
## Design Document
If you want to know more about the SnowPlow infrastructure, please consult
the design document. XXX
## SnowPlow Installs and Configs
There are three types of SnowPlow nodes (Collectors, Enrichers, and S3Loaders)
and they are all configured and installed via user-data in the launch
## Kinesis Streams
Kinesis is how SnowPlow hands off data from collector to enricher to s3loader.
* snowplow-raw-good
* snowplow-raw-bad
* snowplow-enriched-good
* snowplow-enriched-bad
* snowplow-s3loader-bad
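A minimal sketch of how the five streams above could be declared. The shard count of 1 and the resource and local names are assumptions. It uses `count` rather than resource-level `for_each`, since `for_each` on resources only arrived in a later 0.12 point release.

```hcl
locals {
  # Stream names as listed in this README.
  stream_names = [
    "snowplow-raw-good",
    "snowplow-raw-bad",
    "snowplow-enriched-good",
    "snowplow-enriched-bad",
    "snowplow-s3loader-bad",
  ]
}

resource "aws_kinesis_stream" "snowplow" {
  count       = length(local.stream_names)
  name        = local.stream_names[count.index]
  shard_count = 1 # assumed starting size; scale up as traffic requires
}
```

One trade-off with `count` is that removing a name from the middle of the list shifts the indexes of the streams after it, which makes Terraform want to recreate them.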
## DynamoDB
The enricher and s3loader nodes use DynamoDB to track Kinesis state. Normally
Terraform would allocate these tables, but the setup did not seem to work
properly unless the nodes created the tables themselves. Therefore, access to
the tables is controlled by roles and policies, but the tables are managed by
the SnowPlow nodes that need them. If a table needs to be created, the nodes
will do that on their own.
* SnowplowEnrich-gitlab-us-east-1
* SnowplowS3Loader-gitlab-us-east-1
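Since the nodes create their own tables, the policy has to allow table creation as well as reads and writes. The sketch below shows what that might look like for the enricher. The action list is an assumption based on what Kinesis consumer applications typically need, and the policy name is hypothetical.

```hcl
data "aws_iam_policy_document" "enricher_dynamodb" {
  statement {
    effect = "Allow"

    # Assumed set of actions: create the state table if it is missing,
    # then read and update checkpoint rows in it.
    actions = [
      "dynamodb:CreateTable",
      "dynamodb:DescribeTable",
      "dynamodb:Scan",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:UpdateItem",
      "dynamodb:DeleteItem",
    ]

    # Scoped to the enricher's table only; the account ID is wildcarded.
    resources = [
      "arn:aws:dynamodb:us-east-1:*:table/SnowplowEnrich-gitlab-us-east-1",
    ]
  }
}

resource "aws_iam_policy" "enricher_dynamodb" {
  name   = "snowplow-enricher-dynamodb" # hypothetical name
  policy = data.aws_iam_policy_document.enricher_dynamodb.json
}
```

An equivalent policy for the s3loader would point at `SnowplowS3Loader-gitlab-us-east-1` instead.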
Going to probably need an elastic IP for the snowplow endpoint. Revisit this!
IAM Policies and Roles:
Need roles/policies to allow proper access to the collectors, enrichers and s3 loaders.
ECS Tasks need an IAM role to allow access to things? Yes.
Making a bunch of policies is great, but they were for EC2, not ECS.
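The EC2 versus ECS difference mostly comes down to the trust policy on the role. A rough sketch for the EC2 case, with hypothetical role and profile names:

```hcl
# Trust policy: which AWS service is allowed to assume the role.
data "aws_iam_policy_document" "ec2_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type = "Service"
      # EC2 instances use this principal. An ECS task role would use
      # "ecs-tasks.amazonaws.com" here instead.
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "collector" {
  name               = "snowplow-collector" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.ec2_assume.json
}

# EC2 attaches roles through an instance profile, which the launch
# configuration then references.
resource "aws_iam_instance_profile" "collector" {
  name = "snowplow-collector" # hypothetical name
  role = aws_iam_role.collector.name
}
```

The permission policies themselves can be attached to either kind of role unchanged; only the trusted principal and the attachment mechanism differ.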
Kinesis Streams:
These tables will be made by the enricher and s3loader respectively. With proper roles/policies in place, there is no need to make these with TF.
Volumes and config?
One way is to bake the config into the docker image, but then configs are not easy to update.
Mount an EBS volume? Not sure how this works yet.
The default config (of which there isn't one in the pre-built images) does not support env variables.
Switch to EC2 and a dynamic LB? Why? Why not?
I made two kinesis streams with 1 shard each to start with: snowplow-good and snowplow-bad.
OMG it worked.