From charlesreid1

Revision as of 17:49, 15 September 2017

Notes for the Google Cloud Data Engineer (GCDE) certification. See GCDE.

Links:

Case Study

The GCDEC page gives an example case study that shows how different parts of Google Cloud Platform come together in a scenario a real company might face. The case study focuses on a logistics company that delivers packages and already tracks deliveries with its own servers, software, and other infrastructure. The company wants to improve its computational infrastructure by moving parts of it to the cloud, and to add the ability to predict late shipments.

Google Cloud/Case Study

Google Cloud Services

Notes on the various parts of Google Cloud Platform and the services available on it.

Introduction

Google Cloud for Big Data

  • MapReduce
  • Spark
  • BigQuery
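MapReduce structures a computation as a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that combines each group. The following is a single-process Python sketch of that model using the classic word-count example. On Google Cloud a job like this would run on a Hadoop or Spark cluster (for example Dataproc) rather than in one process; the function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit a (word, 1) pair for every whitespace-separated token.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Combine each group; for word count, sum the 1s.
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
```

Because each map call only sees one document and each reduce call only sees one key, both phases can be spread across many machines; only the shuffle requires moving data between them.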

Usage scenarios

Foundations

Compute and Storage

Data ingestion

Data storage

Federated analysis

Compute Engine

Cloud Storage

Data Analytics

Cloud SQL - relational database
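Cloud SQL hosts managed MySQL or PostgreSQL databases, so the data is queried with ordinary SQL. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in for a relational database, to show the kind of join the case study's late-shipment question implies. The table and column names are hypothetical, and timestamps are stored as ISO-style text so that string comparison matches time order.

```python
import sqlite3


def late_shipments(conn):
    # Shipments whose actual delivery time is after the promised time.
    query = """
        SELECT s.id
        FROM shipments AS s
        JOIN deliveries AS d ON d.shipment_id = s.id
        WHERE d.delivered_at > s.promised_by
        ORDER BY s.id
    """
    return [row[0] for row in conn.execute(query)]


def demo():
    # In-memory database with two hypothetical tables and sample rows.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shipments (id INTEGER PRIMARY KEY, promised_by TEXT);
        CREATE TABLE deliveries (
            shipment_id INTEGER REFERENCES shipments(id),
            delivered_at TEXT
        );
        INSERT INTO shipments VALUES
            (1, '2017-09-15 17:00'), (2, '2017-09-15 18:00');
        INSERT INTO deliveries VALUES
            (1, '2017-09-15 17:49'), (2, '2017-09-15 17:30');
    """)
    return late_shipments(conn)


if __name__ == "__main__":
    print(demo())
```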

Dataproc for machine learning

  • Bigtop ecosystem:
      • Pig
      • Spark
      • Hive
      • Hadoop

Scaling Data Analysis

(Transformational use cases)

Datalab

Datastore

Bigtable (fast random access, tradeoffs between consistency and availability)
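Bigtable keeps rows sorted lexicographically by row key, so fast access comes from two operations: a point lookup on one key, and a range scan over keys that share a prefix (those rows are stored next to each other). The in-memory class below models only that ordering; it is not the Bigtable client API, and the `package#timestamp` key format is a hypothetical example of row-key design.

```python
import bisect


class SortedRowStore:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, columns):
        # Insert the key in sorted position the first time it is seen.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        # Point lookup by exact row key; None if the row does not exist.
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        # Rows sharing a prefix are contiguous in sorted order, so a scan is
        # one binary search followed by a sequential read.
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result


if __name__ == "__main__":
    store = SortedRowStore()
    store.put("pkg1#2017-09-15T17:00", {"status": "in_transit"})
    store.put("pkg1#2017-09-15T17:49", {"status": "delivered"})
    store.put("pkg2#2017-09-15T17:00", {"status": "late"})
    print([key for key, _ in store.scan_prefix("pkg1#")])
```

The `#` delimiter matters: scanning `pkg1#` does not pick up rows for a package such as `pkg10`, which a bare `pkg1` prefix would.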

BigQuery (query petabytes in seconds)

TensorFlow (distributed in the cloud over very large data sets)

Demand forecasting with machine learning
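Demand forecasting fits a model to past demand and extrapolates it forward. As a deliberately simple baseline (not the machine-learning approach these notes refer to), the sketch below fits a least-squares linear trend y = a + b·t to a demand history indexed t = 0..n-1 and projects it over a chosen horizon. The function names are hypothetical; a real ML model would add features such as day of week or weather.

```python
def fit_linear_trend(history):
    """Least-squares fit of y = a + b*t for t = 0..n-1; returns (a, b)."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    # Slope = covariance(t, y) / variance(t).
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(history, horizon):
    # Predict the next `horizon` values, at t = n, n+1, ...
    a, b = fit_linear_trend(history)
    n = len(history)
    return [a + b * (n + h) for h in range(horizon)]


if __name__ == "__main__":
    print(forecast([10, 12, 14, 16], 2))
```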

Data Processing Architectures

Pub/Sub
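Pub/Sub decouples producers from consumers: publishers send messages to a topic, and each subscription attached to that topic receives its own copy of every message published after the subscription was created. The in-memory sketch below models only that fan-out; it is not the Cloud Pub/Sub client library, and the class and method names are hypothetical.

```python
from collections import deque


class Subscription:
    def __init__(self, name):
        self.name = name
        self.backlog = deque()  # messages waiting to be pulled

    def pull(self, max_messages=10):
        # Return up to max_messages queued messages, oldest first.
        batch = []
        while self.backlog and len(batch) < max_messages:
            batch.append(self.backlog.popleft())
        return batch


class Topic:
    def __init__(self, name):
        self.name = name
        self._subscriptions = []

    def subscribe(self, sub_name):
        sub = Subscription(sub_name)
        self._subscriptions.append(sub)
        return sub

    def publish(self, data):
        # Fan-out: every existing subscription gets its own copy.
        for sub in self._subscriptions:
            sub.backlog.append(data)


if __name__ == "__main__":
    topic = Topic("shipment-events")
    dashboard = topic.subscribe("dashboard")
    predictor = topic.subscribe("late-predictor")
    topic.publish(b"pkg1 scanned")
    print(dashboard.pull(), predictor.pull())
```

The real service additionally requires subscribers to acknowledge messages and redelivers any that are not acknowledged in time (at-least-once delivery); that bookkeeping is left out here.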

Dataflow
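Dataflow executes Apache Beam pipelines, and a common streaming step is to group events into fixed, non-overlapping time windows before aggregating them (Beam provides this as `FixedWindows`). The function below sketches only the window-assignment and per-key counting logic in plain Python, assuming integer timestamps in seconds; it is not Beam code, and the `(timestamp, key)` event format is hypothetical.

```python
from collections import defaultdict


def fixed_window_counts(events, window_size):
    """Count events per (window_start, key) using tumbling windows.

    events: iterable of (timestamp_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Each timestamp belongs to exactly one window [start, start + size).
        window_start = ts - (ts % window_size)
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [(0, "late"), (30, "late"), (59, "on_time"), (60, "late"), (125, "late")]
    print(fixed_window_counts(events, 60))
```

In a real streaming pipeline the runner also has to decide when a window is complete despite late-arriving data; that is handled by watermarks and triggers, which this sketch does not model.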