From charlesreid1

Revision as of 00:04, 12 September 2017

Notes for the Google Cloud Data Engineer (GCDE) certification.

The following list is based on the sample case study for the GCDE certification exam: https://cloud.google.com/certification/guides/data-engineer/casestudy-flowlogistic

The case study focuses on a logistics company tracking orders and shipments via rail, truck, aircraft, and ships.

Goals and Motivation

Goals:

  • Implement a real-time inventory tracking system that tracks shipment locations
  • Perform data analytics on order and shipment logs (structured/unstructured data) to make decisions about deploying resources, targeting customers, and expanding into markets
  • Predict delays in shipments

Requirements:

  • Reliable, reproducible environment that scales
  • Aggregated data in centralized data lake
  • Historical data used to perform predictive analytics on future shipments
  • Accurate tracking of worldwide shipments (proprietary technology)
  • Improvement of business agility and speed of innovation via rapid provisioning of new resources
  • Analysis and optimization for performance in the cloud
  • Migration to cloud, if all other requirements met

Deeper reasoning:

  • Inability to upgrade infrastructure hampering growth and efficiency
  • Ineffective at moving data around
  • Need to better understand where/who customers are, what they are shipping
  • IT is too busy managing infrastructure to organize data/build analytics/implement tracking technology
  • Penalties for late shipments and deliveries mean that knowing where shipments are correlates directly with profitability and the bottom line

Technology Stack

Databases:

  • SQL DB storing user data, static data
  • Cassandra DB storing metadata, tracking messages
  • Kafka servers for tracking-message aggregation and batch insert
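
A rough sketch of what "aggregation and batch insert" of tracking messages could look like, in plain Python with no Kafka client involved. The message fields (shipment_id, mode, location) and both function names are made up for illustration; the case study does not specify a message format or how the batches are written.

```python
from collections import defaultdict


def aggregate_tracking_messages(messages):
    """Group raw tracking messages by shipment ID, preserving arrival order."""
    grouped = defaultdict(list)
    for msg in messages:
        grouped[msg["shipment_id"]].append(msg)
    return dict(grouped)


def to_batches(rows, batch_size):
    """Split rows into lists of at most batch_size items, ready for a batch insert."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


# Hypothetical tracking messages; the field names are assumptions.
messages = [
    {"shipment_id": "S1", "mode": "rail", "location": "Chicago"},
    {"shipment_id": "S2", "mode": "truck", "location": "Dallas"},
    {"shipment_id": "S1", "mode": "rail", "location": "Denver"},
]

grouped = aggregate_tracking_messages(messages)
batches = to_batches(messages, batch_size=2)
```

Each entry in batches would then be handed to the database in a single write rather than one write per message.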

Applications:

  • Customer frontend, middleware for orders and customs
  • Tomcat for Java services
  • Nginx for static content
  • Batch servers (?)

Storage:

  • iSCSI (Internet Small Computer Systems Interface) storage for VM hosts
  • Fibre Channel storage area network (SAN) for SQL server storage
  • NAS (network-attached storage) for image storage, logs, and backups

Analytics:

  • Hadoop/Spark servers
  • Core data lake
  • Data analysis workloads
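
As a toy example of the kind of data analysis workload listed above, the sketch below computes the average delay per transport mode from shipment records. The record layout (mode, delay_hours) and the function name are assumptions for illustration only; a real workload would run on Hadoop/Spark against the data lake rather than on an in-memory list.

```python
from collections import defaultdict


def average_delay_by_mode(shipments):
    """Return the mean delay in hours for each transport mode."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in shipments:
        totals[s["mode"]] += s["delay_hours"]
        counts[s["mode"]] += 1
    return {mode: totals[mode] / counts[mode] for mode in totals}


# Hypothetical shipment log entries; the field names are assumptions.
shipments = [
    {"mode": "rail", "delay_hours": 4.0},
    {"mode": "rail", "delay_hours": 2.0},
    {"mode": "truck", "delay_hours": 1.0},
]

averages = average_delay_by_mode(shipments)
```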

Miscellaneous servers:

  • Jenkins
  • Monitoring of servers
  • Bastion hosts
  • Security scanners
  • Billing software