|
|
| (5 intermediate revisions by the same user not shown) |
| Line 1: |
| ==Review== | | ==Review Notes Pages== |
|
| |
|
| Review in preparation for interview:
| | [[Google Cloud/Scientific Data Processing]] - working through the scientific data processing Qwiklab |
| * Components of workflow and open source tools for each step
| |
| * Highlight each step with a data engineering repository
| |
| * Individual services offered on the cloud - know the idea behind each, e.g., why there are so many database solutions
| |
| * What specific challenges, software, and workflows do genomics researchers face or use?
| |
|
| |
|
| ===Review Process===
| | Review page: [[Google Cloud/Review]] |
|
| |
|
| Case study
| | ==Data Engineering Scenarios Review== |
| * Start by reviewing the logistics company case study
| |
| * https://charlesreid1.com/wiki/Google_Cloud/Case_Study
| |
|
| |
|
| Software tools
| | Project 1: [[2018/January/Data Engineering/Scientific Data Processing]] |
| * Basic software technologies: storage, databases, distributed computation, GPUs vs CPUs, Docker/containerization
| |
| * https://charlesreid1.com/wiki/Google_Cloud
| |
| * Google Cloud Genomics
| |
|
| |
|
| Software Quality Assurance
| | Project 2: [[2018/January/Data Engineering/Big Data Text Processing]] |
| * GitHub Pages/10 things list (time machine)
| |
| | |
| GCDEC Review:
| |
| * 1 - https://charlesreid1.com/wiki/GCDEC/Fundamentals/Notes
| |
| * 2 - https://charlesreid1.com/wiki/GCDEC/Unstructured_Data/Notes
| |
| * 3a - https://charlesreid1.com/wiki/GCDEC/BigQuery/Notes
| |
| * 3b - https://charlesreid1.com/wiki/GCDEC/Dataflow/Notes
| |
| * 4a - https://charlesreid1.com/wiki/GCDEC/Building_Tensorflow/Notes
| |
| * 4b - https://charlesreid1.com/wiki/GCDEC/Deploying_Tensorflow/Notes
| |
| * 4c - https://charlesreid1.com/wiki/GCDEC/Engineering_Tensorflow/Notes
| |
| * 5 - https://charlesreid1.com/w/index.php?title=GCDEC/Streaming/Notes&action=edit&redlink=1
| |
| | |
| ===Examples===
| |
| | |
| Google Codelabs:
| |
| * https://codelabs.developers.google.com/
| |
| * Kubernetes and Container Engine - https://codelabs.developers.google.com/codelabs/cloud-compute-kubernetes/index.html?index=..%2F..%2Findex#0
| |
| * Process Astronomy Data to Generate Images - https://codelabs.developers.google.com/codelabs/cloud-compute-the-cosmos/index.html?index=..%2F..%2Findex#0
| |
| * Kubernetes for Java apps - https://codelabs.developers.google.com/codelabs/cloud-springboot-kubernetes/index.html?index=..%2F..%2Findex#0
| |
| * Google Cloud Storage - https://codelabs.developers.google.com/codelabs/es003l-storage/index.html?index=..%2F..%2Findex
| |
| * Campaign finance with BigQuery - https://codelabs.developers.google.com/codelabs/cloud-bq-campaign-finance/index.html?index=..%2F..%2Findex#0
| |
| * Text processing with big data - https://codelabs.developers.google.com/codelabs/cloud-dataflow-starter/index.html?index=..%2F..%2Findex#0
| |
| * Recommendations ML - https://codelabs.developers.google.com/codelabs/cloud-accelerate-dataproc/index.html?index=..%2F..%2Findex#0
| |
| * Spark + OpenCV - https://codelabs.developers.google.com/codelabs/cloud-dataproc-opencv/index.html?index=..%2F..%2Findex
| |
| * Speech to Text - https://codelabs.developers.google.com/codelabs/cloud-speech-intro/index.html?index=..%2F..%2Findex#0
| |
| * Translate Text - https://codelabs.developers.google.com/codelabs/cloud-translation-intro/index.html?index=..%2F..%2Findex#0
| |
| | |
| Google Qwiklabs:
| |
| * Google Cloud Platform essentials - https://google.qwiklabs.com/quests/23?locale=en
| |
| * Scientific data processing - https://google.qwiklabs.com/quests/28?locale=en
| |
| * Data engineering - https://google.qwiklabs.com/quests/25?locale=en
| |
|
| |
|
| | Project 3: [[2018/January/Data Engineering/Cosmos]] |
|
| |
|
| | ==Flags== |
|
| |
|
| [[Category:Google Cloud]] | | [[Category:Google Cloud]] |
| [[Category:Data Engineering]] | | [[Category:Data Engineering]] |