From charlesreid1


Revision as of 01:08, 12 September 2017

Notes for the Google Cloud Data Engineer (GCDE) certification.

The following list is based on the sample case study for the GCDE certification exam: https://cloud.google.com/certification/guides/data-engineer/casestudy-flowlogistic

The case study focuses on a logistics company that tracks orders and shipments moved by rail, truck, aircraft, and ship.

See also the tutorials, guides, resources, and use cases at https://cloud.google.com/solutions/

Goals and Motivation

Goals:

  • Implement a real-time inventory tracking system that indicates the location of each load
  • Perform data analytics on order and shipment logs (structured/unstructured data) to make decisions about deploying resources, targeting customers, and expanding into markets
  • Predict delays in shipments

Requirements:

  • Reliable, reproducible environment that scales
  • Aggregated data in centralized data lake
  • Historical data used to perform predictive analytics on future shipments
  • Accurate tracking of worldwide shipments (proprietary technology)
  • Improvement of business agility and speed of innovation via rapid provisioning of new resources
  • Analysis and optimization for performance in the cloud
  • Migration to cloud, if all other requirements met

Deeper reasoning:

  • Inability to upgrade infrastructure hampering growth and efficiency
  • Efficient at moving shipments, but inefficient at moving data around
  • Need to better understand where/who customers are, what they are shipping
  • IT is too busy managing infrastructure to organize data/build analytics/implement tracking technology
  • Penalties for late shipments and deliveries mean that on-time performance directly affects profitability and the bottom line

Technology Stack

Databases:

  • SQL DB storing user data, static data
  • Cassandra DB storing metadata, tracking messages
  • Kafka servers for aggregating tracking messages and batch-inserting them
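
The "aggregation and batch insert" role of the Kafka servers can be pictured with a small sketch. This is not Flowlogistic's code and does not use the Kafka API; it is plain Python, assuming a hypothetical TrackingBatcher class and a hypothetical sink callable standing in for whatever performs the database insert.

```python
from typing import Callable, Dict, List


class TrackingBatcher:
    """Buffers tracking messages and hands them to a sink in fixed-size batches.

    Hypothetical helper for illustration only; a real pipeline would rely on
    the message broker's own batching rather than an in-memory list.
    """

    def __init__(self, batch_size: int, sink: Callable[[List[Dict]], None]) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.sink = sink  # assumed to perform the batch insert
        self._buffer: List[Dict] = []

    def add(self, message: Dict) -> None:
        """Aggregate one message; flush automatically when the batch is full."""
        self._buffer.append(message)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered messages as one batch and start a new buffer."""
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []


# Usage: collect batches in a list instead of writing to a database.
inserted_batches: List[List[Dict]] = []
batcher = TrackingBatcher(batch_size=2, sink=inserted_batches.append)
for shipment_id in range(5):
    batcher.add({"shipment_id": shipment_id})
batcher.flush()  # push the final partial batch
```

Batching trades a little latency for far fewer writes, which is the usual reason to aggregate messages before inserting them.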

Applications:

  • Customer frontend, middleware for orders and customs
  • Tomcat for Java services
  • Nginx for static content
  • Batch servers (?)

Storage:

  • iSCSI (Internet Small Computer Systems Interface) storage for VM hosts
  • Fibre Channel storage area network (SAN) for SQL Server storage
  • NAS (network-attached storage) for image storage, logs, and backups

Analytics:

  • Hadoop/Spark servers
  • Core data lake
  • Data analysis workloads

Miscellaneous servers:

  • Jenkins
  • Monitoring of servers
  • Bastion hosts
  • Security scanners
  • Billing software

Note: Google publishes a full list of Google Cloud products, which helps when working out which product fits which technology: https://cloud.google.com/products/
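
As a study aid, the stack above can be paired with candidate Google Cloud products in a small lookup table. The pairings below are my own assumptions based on commonly cited equivalents, not something stated in the case study, and the names CANDIDATE_GCP_PRODUCTS and candidates are made up for this sketch. Check each pairing against the product list before relying on it.

```python
from typing import Dict, List

# Assumed pairings of on-premises components with candidate managed services.
# These are starting points for study, not answers taken from the case study.
CANDIDATE_GCP_PRODUCTS: Dict[str, List[str]] = {
    "SQL DB": ["Cloud SQL", "Cloud Spanner"],
    "Cassandra": ["Cloud Bigtable"],
    "Kafka": ["Cloud Pub/Sub"],
    "Hadoop/Spark": ["Cloud Dataproc"],
    "Data analysis workloads": ["BigQuery"],
    "NAS": ["Cloud Storage"],
}


def candidates(component: str) -> List[str]:
    """Return the candidate products for a component, or an empty list if unmapped."""
    return CANDIDATE_GCP_PRODUCTS.get(component, [])


# Usage: print the table as a quick review sheet.
for component, products in CANDIDATE_GCP_PRODUCTS.items():
    print(f"{component}: {', '.join(products)}")
```

Components left out of the table (Jenkins, bastion hosts, and so on) return an empty list, which marks them as still needing a decision.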