Everyone, these days, talks about the cloud. Deep learning, AI, serverless infrastructure, big data, yadda yadda yadda. Industry types string these buzzwords together and present them as magical formulas for guaranteed success, but who really understands what all the hype is about? What are the benefits of all this shiny tech and, more importantly, how the heck do you go about implementing this stuff in such a way that you reap the benefits without hurting your product or organisation?
We decided to attend AWS re:Invent 2016 in Vegas to get a better idea. We set off with a clear strategy based on our technical objectives and AWS experience, but with an open mind and an eagerness to discover new use-cases, ideas and experiences from companies that had been around the track before us. So what did we learn?
All in or not?
One question we asked ourselves before going to Vegas, and one you might want to ask yourself too, is: all in or not? By this, we mean: are you willing, as an organisation, to be fully dependent on Amazon Web Services as the only option for certain parts of your infrastructure? The sooner you make this decision, the easier your architectural choices will be. For instance: should we use Amazon Kinesis Streams, or run our own Kafka cluster hosted on EC2 with EBS volumes, ready to move to GCP or Azure at the flick of a switch? If you need this kind of fallback, or have a very specific case that no AWS managed service can answer (and you can’t create an abstraction to support a future fallback), you should play your cards more conservatively. If not, do yourself a favour and focus on your products instead of your DevOps. Remember, there is only one way to beat the House.
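To make the "abstraction to support a future fallback" idea a bit more concrete, here is a minimal sketch of what it could look like in Python. It is not something we saw at re:Invent: the `EventStream` interface, the adapter classes and `track_signup` are names made up for illustration. The only real library calls are `put_record` on a boto3 Kinesis client and `send` on a kafka-python `KafkaProducer`, and both clients are assumed to be created elsewhere and passed in.

```python
import json
from typing import Any, Dict, List, Protocol, Tuple


class EventStream(Protocol):
    """Hypothetical interface the application codes against."""

    def publish(self, key: str, event: Dict[str, Any]) -> None: ...


class KinesisStream:
    """Adapter around a boto3 Kinesis client supplied by the caller."""

    def __init__(self, client: Any, stream_name: str) -> None:
        self._client = client
        self._stream_name = stream_name

    def publish(self, key: str, event: Dict[str, Any]) -> None:
        self._client.put_record(
            StreamName=self._stream_name,
            Data=json.dumps(event).encode("utf-8"),
            PartitionKey=key,
        )


class KafkaStream:
    """Adapter around a kafka-python KafkaProducer supplied by the caller."""

    def __init__(self, producer: Any, topic: str) -> None:
        self._producer = producer
        self._topic = topic

    def publish(self, key: str, event: Dict[str, Any]) -> None:
        # Assumes the producer was built without serializers, so bytes are sent.
        self._producer.send(
            self._topic,
            key=key.encode("utf-8"),
            value=json.dumps(event).encode("utf-8"),
        )


class InMemoryStream:
    """Stand-in backend for tests and local runs; keeps events in a list."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, Dict[str, Any]]] = []

    def publish(self, key: str, event: Dict[str, Any]) -> None:
        self.records.append((key, event))


def track_signup(stream: EventStream, user_id: str) -> None:
    """Application code: knows about EventStream, not about Kinesis or Kafka."""
    stream.publish(user_id, {"type": "signup", "user_id": user_id})
```

The point of the design is that swapping providers means writing one more adapter rather than touching every call site. It does not make the move free: delivery guarantees, ordering and throughput limits differ between backends and still leak through any interface this thin.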
After our first full day at re:Invent, the message was clear: AWS is going hard on the full serverless propaganda. But what does serverless mean? Werner Vogels (Amazon CTO) describes the different levels of virtualization, starting with VMs (your typical EC2 instances), followed by containers (like Docker on ECS), and it would seem that we’ve now come full circle with the final form of virtualization: serverless computing. A serverless service model is fully managed, eliminating the need to deploy, scale and supervise platform resources yourself. The likes of Lambda, Kinesis, API Gateway, DynamoDB, SQS and SNS are all serverless services. So what’s the catch? Remember “All in or not?” Well, if you decided against going all in, you might want to take a step back and try to implement a less AWS-centric solution. Pricing models might also not fit all use-cases, so be careful.
Data lake™ and metadata
Searching in your data lake is costly and listing files on S3 takes ages. Yup, we feel your pain. AWS just launched Athena, a service that lets you run SQL queries directly against the data in your S3 buckets. Yay! But, wait, what if you don’t have a precise schema for all these files? That’s why you need to organise all your data prior to dumping it into S3. You need metadata for each file, stored somewhere easily accessible, so that you can run clean queries against it before retrieving the precise files you need.
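As a rough illustration of "metadata first, files second", here is a small Python sketch. Everything in it is an assumption made for the example: the `file_metadata` table, its columns and the helper functions are invented, and an in-memory SQLite database stands in for whatever store you would actually use. The idea is only that you register each file when you write it to S3, then query the catalogue to get the exact keys instead of listing the bucket.

```python
import sqlite3
from typing import List


def create_catalog() -> sqlite3.Connection:
    """Create an in-memory catalogue; a real one would live in a shared store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE file_metadata (
            s3_key     TEXT PRIMARY KEY,
            source     TEXT NOT NULL,
            event_date TEXT NOT NULL,  -- ISO date, e.g. 2016-11-28
            format     TEXT NOT NULL,
            size_bytes INTEGER NOT NULL
        )
        """
    )
    return conn


def register_file(
    conn: sqlite3.Connection,
    s3_key: str,
    source: str,
    event_date: str,
    fmt: str,
    size_bytes: int,
) -> None:
    """Record one file's metadata at the moment it is written to S3."""
    conn.execute(
        "INSERT INTO file_metadata VALUES (?, ?, ?, ?, ?)",
        (s3_key, source, event_date, fmt, size_bytes),
    )


def find_keys(
    conn: sqlite3.Connection, source: str, start_date: str, end_date: str
) -> List[str]:
    """Return the S3 keys for one source within an inclusive date range."""
    rows = conn.execute(
        "SELECT s3_key FROM file_metadata "
        "WHERE source = ? AND event_date BETWEEN ? AND ? "
        "ORDER BY s3_key",
        (source, start_date, end_date),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    catalog = create_catalog()
    register_file(catalog, "clicks/2016/11/28/a.json", "clicks", "2016-11-28", "json", 1024)
    register_file(catalog, "clicks/2016/11/29/b.json", "clicks", "2016-11-29", "json", 2048)
    register_file(catalog, "orders/2016/11/28/c.csv", "orders", "2016-11-28", "csv", 512)
    print(find_keys(catalog, "clicks", "2016-11-28", "2016-11-28"))
```

Only the keys returned by `find_keys` would then be fetched from S3. Dates are stored as ISO strings so that plain string comparison sorts them correctly.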
Where’s my Kappa?
re:Invent ’16 made no mention of Kappa architecture... No one talked about real-time stream processing other than for certain cases of real-time bidding. There are countless ways to go about batch processing, and it feels like all of them were discussed at re:Invent. So if that’s what you’re looking for, you will be well served. Maybe next year stream processing will gain traction (fingers crossed)?
The silent heroes
One thing that was obvious to us during the convention is how some services are so ubiquitous that they are mentioned only in passing. For us, this indicated that we could likely consider these services as safe choices as we build and prepare to roll out our own data platform. S3, EC2, ECS, EBS, SQS, SNS, Route 53, API Gateway: these services were central components in most case studies shown, yet had almost no airtime when it came to discussing the specifics. The audience seemed to have no complaints, suggesting that these tools were mature and simple enough to use that their role was shrugged off as being self-evident.
As a closing remark, re:Invent is an intense convention. There’s a lot of running around and fighting for the best seats in the most popular talks, but in the end we can safely say it was an enriching experience. We learned a lot about what other companies were doing and where they felt their failures and successes lay. We also felt, despite the obvious echo-chamber effect, that it was a good confirmation process for us as we prepare to go all in. Would we attend again? Gladly!