# Terraform for the AWS SnowPlow Pipeline

This configuration uses the following AWS services to host SnowPlow. There
may be more in use, but these are the primary services.
1. EC2 (Auto Scaling Groups, Launch Configurations, ELB, Target Groups,
  Security Groups)
1. Kinesis
1. DynamoDB
1. IAM (Policies and Roles)
1. S3
1. VPC (Subnets, VPC, Internet Gateways, Routes, Routing Tables)

## Design Document
If you want to know more about the SnowPlow infrastructure, please consult
the design document. XXX

## SnowPlow Installs and Configs
There are three types of SnowPlow nodes (Collectors, Enrichers, and S3Loaders)
and they are all configured and installed via user-data in the launch
configurations.
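
As a rough illustration of the user-data approach, a launch configuration for one node type might look like the sketch below. The resource name, the `ami_id` and `instance_type` variables, and the `collector-user-data.sh` script are hypothetical placeholders, not names taken from this repository.

```hcl
# Hypothetical inputs; the real configuration may define these differently.
variable "ami_id" {
  type = string
}

variable "instance_type" {
  type = string
}

resource "aws_launch_configuration" "snowplow_collector" {
  name_prefix   = "snowplow-collector-"
  image_id      = var.ami_id
  instance_type = var.instance_type

  # The script installs and configures the node on first boot.
  # "collector-user-data.sh" is an assumed file name.
  user_data = file("${path.module}/collector-user-data.sh")

  # Create the replacement launch configuration before removing the old one,
  # so the auto-scaling group always has a valid configuration to point at.
  lifecycle {
    create_before_destroy = true
  }
}
```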

## Kinesis Streams
Kinesis is how SnowPlow hands off data from Collector to Enricher to S3Loader.
The following streams are used:
* snowplow-raw-good
* snowplow-raw-bad
* snowplow-enriched-good
* snowplow-enriched-bad
* snowplow-s3loader-bad
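
The streams listed above could be declared in one resource block, as in this sketch. It assumes a Terraform version that supports `for_each` on resources (0.12.6 or later). The shard count is a placeholder, not the value this configuration actually uses.

```hcl
locals {
  # Stream names as listed in this README.
  snowplow_streams = [
    "snowplow-raw-good",
    "snowplow-raw-bad",
    "snowplow-enriched-good",
    "snowplow-enriched-bad",
    "snowplow-s3loader-bad",
  ]
}

resource "aws_kinesis_stream" "snowplow" {
  for_each = toset(local.snowplow_streams)

  name        = each.value
  shard_count = 1 # assumed value; size each stream for its expected traffic
}
```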

## DynamoDB
The Enricher and S3Loader nodes use DynamoDB to track Kinesis state. Normally
Terraform would allocate these tables, but things did not seem to work properly
unless the nodes created the tables themselves. Therefore, Terraform controls
access to the tables through roles and policies, while the SnowPlow nodes that
need the tables manage them. If a table needs to be created, the nodes do that
on their own.
* SnowplowEnrich-gitlab-us-east-1
* SnowplowS3Loader-gitlab-us-east-1
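
Since Terraform only controls access here, a policy for the Enricher table might look like the following sketch. The action list is an assumption, based on what a Kinesis consumer typically needs to create and maintain its own state table. The policy name is hypothetical, and the region in the ARN is inferred from the table name.

```hcl
data "aws_iam_policy_document" "enricher_dynamodb" {
  statement {
    effect = "Allow"

    # Assumed set of actions: enough for the node to create the table
    # if it is missing and then read and write its state rows.
    actions = [
      "dynamodb:CreateTable",
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:UpdateItem",
      "dynamodb:DeleteItem",
      "dynamodb:Scan",
    ]

    # Scoped to the single table; no aws_dynamodb_table resource is declared,
    # because the node creates the table itself.
    resources = [
      "arn:aws:dynamodb:us-east-1:*:table/SnowplowEnrich-gitlab-us-east-1",
    ]
  }
}

resource "aws_iam_policy" "enricher_dynamodb" {
  name   = "snowplow-enricher-dynamodb" # hypothetical name
  policy = data.aws_iam_policy_document.enricher_dynamodb.json
}
```

A matching policy for `SnowplowS3Loader-gitlab-us-east-1` would follow the same shape.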

## Launch Config Changes and Production Instances
Updates to the launch config apply only to new instances coming up in the
auto-scaling group. Existing EC2 instances are not changed, so you will have
to rotate them manually to have them replaced.

## Runbook Material

### Logs
Logs for any node should be in **/snowplow/logs**.

### SSL Certificate
The certificate is referenced by its ARN in AWS. We're not going to put the
private key in TF, so it has to remain an ARN reference.
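
One way to keep the certificate outside Terraform is to pass its ARN in as a variable and attach it to an HTTPS listener, as sketched below. All variable and resource names are hypothetical, and the sketch assumes a load balancer type that uses `aws_lb_listener`. The actual configuration may wire the certificate differently.

```hcl
# Hypothetical inputs; only ARNs are passed in, never key material.
variable "ssl_certificate_arn" {
  type        = string
  description = "ARN of the certificate that already exists in AWS"
}

variable "load_balancer_arn" {
  type = string
}

variable "target_group_arn" {
  type = string
}

resource "aws_lb_listener" "collector_https" {
  load_balancer_arn = var.load_balancer_arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.ssl_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = var.target_group_arn
  }
}
```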

Health check: `curl http://localhost:8000/health` should print `OK`.

Debugging: add the JVM option `-Dorg.slf4j.simpleLogger.defaultLogLevel=debug`.

Testing an event: `curl http://34.227.92.217:8000/i\?e\=pv`

S3Loader:
"I realize now my folly: the app name needs to be different between the enricher and loader ergo the 2 dynamoDB tables were conflicting. Everything makes so much sense now…"

Last steps:
* Need to get firehose provisioned. It's weird.
* Are we using the right SSH key?
* Did we clean up everything we made for testing?