From charlesreid1

Revision as of 22:37, 12 September 2017

Notes for the Google Cloud Data Engineer (GCDE) certification.

The following list is based on the sample case study for the GCDE certification exam: [1]

The case study focuses on a logistics company tracking orders and shipments via rail, truck, aircraft, and ships.

Also see the tutorials/guides/resources/use cases here: https://cloud.google.com/solutions/

Goals and Motivation

Goals:

  • Implement a real-time inventory tracking system that reports shipment locations
  • Perform data analytics on order and shipment logs (structured/unstructured data) to make decisions about deploying resources, targeting customers, and expanding into markets
  • Predict delays in shipments

Requirements:

  • Reliable, reproducible environment that scales
  • Aggregated data in centralized data lake
  • Historical data used to perform predictive analytics on future shipments
  • Accurate tracking of worldwide shipments (proprietary technology)
  • Improvement of business agility and speed of innovation via rapid provisioning of new resources
  • Analysis and optimization for performance in the cloud
  • Migration to cloud, if all other requirements met

Deeper reasoning:

  • Inability to upgrade infrastructure hampering growth and efficiency
  • Ineffective at moving data around
  • Need to better understand where/who customers are, what they are shipping
  • IT is too busy managing infrastructure to organize data/build analytics/implement tracking technology
  • Penalties for late shipments and deliveries tie profitability and the bottom line directly to knowing where shipments are

Technology Stack

Databases:

  • SQL DB storing user data, static data
  • Cassandra DB storing metadata, tracking messages
  • Kafka servers for tracking-message aggregation and batch insert
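
The "aggregation and batch insert" role in the list above can be illustrated with a small buffering pattern. This is only a sketch under assumptions: the message fields (`shipment_id`, `location`), the `BatchAggregator` class, and the `sink` callback are hypothetical names, and the sink stands in for whatever actually writes a batch to the tracking database. It does not use the Kafka or Cassandra client libraries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical shape of a tracking message; the field names are assumptions.
TrackingMessage = Dict[str, str]


@dataclass
class BatchAggregator:
    """Buffers tracking messages and hands them to a sink in fixed-size batches."""

    batch_size: int
    # Stand-in for a real batch insert into the tracking database.
    sink: Callable[[List[TrackingMessage]], None]
    _buffer: List[TrackingMessage] = field(default_factory=list)

    def add(self, message: TrackingMessage) -> None:
        """Buffer one message and flush once the batch is full."""
        self._buffer.append(message)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered messages to the sink as one batch."""
        if self._buffer:
            self.sink(list(self._buffer))
            self._buffer.clear()


def demo_batches() -> List[List[TrackingMessage]]:
    """Feed five messages through a batch size of two and collect the batches."""
    batches: List[List[TrackingMessage]] = []
    aggregator = BatchAggregator(batch_size=2, sink=batches.append)
    for i in range(5):
        aggregator.add({"shipment_id": f"S{i}", "location": "in transit"})
    aggregator.flush()  # push the final partial batch
    return batches
```

Batching trades a little latency for fewer, larger writes, which is presumably why the existing stack aggregates tracking messages before inserting them.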

Applications:

  • Customer frontend, middleware for orders and customs
  • Tomcat for Java services
  • Nginx for static content
  • Batch servers (?)

Storage:

  • iSCSI (Internet Small Computer Systems Interface) for VM hosts
  • Fibre Channel network for SQL server storage
  • NAS (network attached storage) for image storage, logs, and backups

Analytics:

  • Hadoop/Spark servers
  • Core data lake
  • Data analysis workloads

Miscellaneous servers:

  • Jenkins
  • Monitoring of servers
  • Bastion hosts
  • Security scanners
  • Billing software

Using Google Cloud

Databases:

  • MySQL: Google Cloud offers the Cloud SQL service, which lets you allocate a dedicated instance to run a MySQL (or PostgreSQL) server.
  • Cassandra: Google Cloud Launcher has several pre-configured solutions for different packages, including one for Cassandra.
  • Kafka: as with Cassandra, preconfigured Kafka instances are available through the Google Cloud Launcher.
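
For quick review, the mapping in the list above can be captured as a small lookup table. This is a minimal sketch: the dictionary contents restate these notes, while the `GCP_OPTIONS` and `gcp_option` names are hypothetical and not part of any Google Cloud API.

```python
# Mapping restated from the notes: existing component -> Google Cloud option.
GCP_OPTIONS = {
    "mysql": "Cloud SQL (MySQL or PostgreSQL instance)",
    "cassandra": "Cloud Launcher preconfigured Cassandra solution",
    "kafka": "Cloud Launcher preconfigured Kafka solution",
}


def gcp_option(component: str) -> str:
    """Return the Google Cloud option noted for a component of the existing stack."""
    key = component.strip().lower()
    return GCP_OPTIONS.get(key, "not covered in these notes; check the product list")
```

Components outside the database tier (for example Hadoop/Spark) fall through to the default message, since these notes do not yet map them.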


Note: Google publishes a full list of its Cloud products, which helps in working out which product covers which technology.

List of Google Cloud products: https://cloud.google.com/products/

List of Google Cloud Launcher preconfigured machines: https://console.cloud.google.com/launcher