From charlesreid1

* Tutorials/Guides/Resources for all of Google Cloud: https://cloud.google.com/solutions/


==Goals and Motivation==


Goals:
* Implement real-time inventory tracking system that tracks locations
* Perform data analytics on order and shipment logs (structured/unstructured data) to make decisions about deploying resources, targeting customers, and expanding into markets
* Predict delays in shipments


Requirements:
* Reliable, reproducible environment that scales
* Aggregated data in centralized data lake
* Historical data used to perform predictive analytics on future shipments
* Accurate tracking of worldwide shipments (proprietary technology)
* Improvement of business agility and speed of innovation via rapid provisioning of new resources
* Analysis and optimization for performance in the cloud
* Migration to cloud, if all other requirements met
Deeper reasoning:
* Inability to upgrade infrastructure hampering growth and efficiency
* Ineffective at moving data around
* Need to better understand where/who customers are, what they are shipping
* IT is too busy managing infrastructure to organize data/build analytics/implement tracking technology
* Penalties for late shipments and deliveries mean that accurate shipment tracking directly affects profitability and the bottom line
==Technology Stack==
Databases:
* SQL DB storing user data, static data
* [[Cassandra]] DB storing metadata, tracking messages
* [[Kafka]] servers for aggregating tracking messages and batch-inserting them
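
The aggregation and batch-insert role of the Kafka tier can be sketched roughly as follows. This is a minimal illustration in plain Python, not the case study's actual pipeline: the message shape (<code>shipment_id</code>, <code>ts</code>, <code>location</code>), the function names, and the batch size are all assumptions, and no real Kafka or Cassandra client is used.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical message shape (an assumption, not from the case study):
# {"shipment_id": str, "ts": int, "location": str}
Message = Dict[str, Any]


def batch_messages(messages: Iterable[Message], batch_size: int) -> Iterator[List[Message]]:
    """Yield lists of at most batch_size messages, preserving arrival order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Message] = []
    for msg in messages:
        batch.append(msg)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield batch


def latest_by_shipment(batch: List[Message]) -> Dict[str, Message]:
    """Aggregate a batch down to the newest message per shipment."""
    latest: Dict[str, Message] = {}
    for msg in batch:
        key = msg["shipment_id"]
        if key not in latest or msg["ts"] > latest[key]["ts"]:
            latest[key] = msg
    return latest
```

Each batch yielded by <code>batch_messages</code> would then be written to the tracking store in a single insert, instead of one write per message.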
Applications:
* Customer frontend, middleware for orders and customs
* [[Tomcat]] for Java services
* [[Nginx]] for static content
* Batch servers (?)
Storage:
* iSCSI (Internet Small Computer Systems Interface) storage for VM hosts
* Fibre Channel network for SQL server storage
* NAS (network attached storage) for image storage, logs, and backups
Analytics:
* [[Hadoop]]/[[Spark]] servers
* Core data lake
* Data analysis workloads
Miscellaneous servers:
* [[Jenkins]]
* Monitoring of servers
* Bastion hosts
* Security scanners
* Billing software
==Using Google Cloud==
Databases:
* MySQL: Google Cloud offers the Cloud SQL service, which lets you allocate a specific compute instance to run a MySQL (or PostgreSQL) server.
** See [[MySQL]]
** See [[Google Cloud/MySQL]]
* Cassandra: Google Cloud Launcher has several pre-configured solutions for different packages, including one for Cassandra.
** See [[Cassandra]]
** See [[Google Cloud/Cassandra]]
* Kafka: as with Cassandra, preconfigured Kafka instances are available through the Google Cloud Launcher.
** See [[Kafka]]
** See [[Google Cloud/Kafka]]
Note: Google Cloud publishes a full list of its products, which helps in working out which product replaces which existing technology.
List of Google Cloud products: https://cloud.google.com/products/
List of Google Cloud Launcher preconfigured machines: https://console.cloud.google.com/launcher
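
The mapping from the existing database tier to Google Cloud can be kept as a small lookup table while working through the rest of the stack. The sketch below records only the three options named in these notes; the function names are hypothetical, and components such as Tomcat or Nginx are deliberately left unmapped because the notes do not cover them yet.

```python
from typing import Dict, Iterable, List, Optional

# Options taken from the notes above; only these three components are covered.
CLOUD_OPTIONS: Dict[str, str] = {
    "mysql": "Cloud SQL",
    "cassandra": "Google Cloud Launcher (preconfigured Cassandra)",
    "kafka": "Google Cloud Launcher (preconfigured Kafka)",
}


def cloud_option(component: str) -> Optional[str]:
    """Return the Google Cloud option noted for a component, or None if unmapped."""
    return CLOUD_OPTIONS.get(component.strip().lower())


def unmapped(components: Iterable[str]) -> List[str]:
    """List the components that still need a Google Cloud counterpart."""
    return [c for c in components if cloud_option(c) is None]
```

Calling <code>unmapped</code> on the full technology stack gives a to-do list of the components that still have to be matched against the product list.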





Revision as of 17:32, 15 September 2017

Notes for Google Cloud Data Engineer (GCDE) certification. See GCDE.

Links:

Case Study

Google Cloud/Case Study