From charlesreid1


Revision as of 23:57, 11 September 2017

Notes for the Google Cloud Data Engineer certification.

Technology stack

The following list is based on the sample case study for the GCDE certification exam: https://cloud.google.com/certification/guides/data-engineer/casestudy-flowlogistic

The case study focuses on a logistics company tracking orders and shipments via rail, truck, aircraft, and ships.

Goals:

  • Implement a real-time inventory tracking system that tracks the location of shipments
  • Perform data analytics on order and shipment logs (structured/unstructured data) to make decisions about deploying resources, targeting customers, and expanding into markets
  • Predict delays in shipments

Requirements:

  • Reliable, reproducible environment that scales
  • Aggregated data in centralized data lake
  • Historical data used to perform predictive analytics on future shipments
  • Accurate tracking of worldwide shipments (proprietary technology)
  • Improvement of business agility and speed of innovation via rapid provisioning of new resources
  • Analysis and optimization for performance in the cloud
  • Migration to the cloud, if all other requirements are met

Data center description:

Databases:

  • SQL DB storing user data, static data
  • Cassandra DB storing metadata, tracking messages
  • Kafka servers for aggregating tracking messages and batch-inserting them
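
To make "message aggregation and batch insert" concrete, here is a minimal sketch in plain Python. It does not use a Kafka client, and the message fields (`shipment_id`, `location`) and both function names are hypothetical, chosen only for illustration; the case study does not specify a message format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Message = Dict[str, str]  # hypothetical shape: {"shipment_id": ..., "location": ...}


def batch_messages(messages: Iterable[Message], batch_size: int) -> Iterator[List[Message]]:
    """Yield lists of at most batch_size messages, preserving arrival order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Message] = []
    for msg in messages:
        batch.append(msg)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def aggregate_by_shipment(batch: Iterable[Message]) -> Dict[str, List[str]]:
    """Group the locations reported in a batch by shipment ID."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for msg in batch:
        grouped[msg["shipment_id"]].append(msg["location"])
    return dict(grouped)
```

In a real pipeline each batch would be written to the database in a single insert rather than one row at a time, which is presumably the point of batching the tracking messages.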

Applications:

  • Customer frontend, middleware for orders and customs
  • Tomcat for Java services
  • Nginx for static content
  • Batch servers (?)

Storage:

  • iSCSI (Internet Small Computer Systems Interface) storage for VM hosts
  • Fibre Channel storage area network (SAN) for SQL Server storage
  • NAS (network attached storage) for image storage, logs, and backups

Analytics:

  • Hadoop/Spark servers
  • Core data lake
  • Data analysis workloads

Miscellaneous servers:

  • Jenkins
  • Monitoring of servers
  • Bastion hosts
  • Security scanners
  • Billing software