<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://charlesreid1.com/w/index.php?action=history&amp;feed=atom&amp;title=GCDE%2FOutline_of_Topics</id>
	<title>GCDE/Outline of Topics - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://charlesreid1.com/w/index.php?action=history&amp;feed=atom&amp;title=GCDE%2FOutline_of_Topics"/>
	<link rel="alternate" type="text/html" href="https://charlesreid1.com/w/index.php?title=GCDE/Outline_of_Topics&amp;action=history"/>
	<updated>2026-06-20T06:41:57Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.12</generator>
	<entry>
		<id>https://charlesreid1.com/w/index.php?title=GCDE/Outline_of_Topics&amp;diff=21777&amp;oldid=prev</id>
		<title>Admin: Created page with &quot;=Outline=  The Google Cloud Data Engineering Certification exam guide is pretty hefty. The entire contents are given here: https://cloud.google.com/certification/guides/data-e...&quot;</title>
		<link rel="alternate" type="text/html" href="https://charlesreid1.com/w/index.php?title=GCDE/Outline_of_Topics&amp;diff=21777&amp;oldid=prev"/>
		<updated>2017-10-18T06:41:53Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;=Outline=  The Google Cloud Data Engineering Certification exam guide is pretty hefty. The entire contents are given here: https://cloud.google.com/certification/guides/data-e...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;=Outline=&lt;br /&gt;
&lt;br /&gt;
The Google Cloud Data Engineering Certification exam guide is pretty hefty. The entire contents are given here: https://cloud.google.com/certification/guides/data-engineer/#sample-case-study&lt;br /&gt;
&lt;br /&gt;
This page will contain some notes on the different sections of the exam guide, based on the Coursera course and my own experience.&lt;br /&gt;
&lt;br /&gt;
==Section 1: Designing Data Processing Systems==&lt;br /&gt;
&lt;br /&gt;
Main topics:&lt;br /&gt;
* Design of flexible data representations&lt;br /&gt;
* Design of data pipelines&lt;br /&gt;
* Design of data processing architecture&lt;br /&gt;
&lt;br /&gt;
Here are some specific considerations in this category:&lt;br /&gt;
* Future advances in data technology&lt;br /&gt;
* Changes to business requirements&lt;br /&gt;
* Current state, potential future states&lt;br /&gt;
* Potential future migrations&lt;br /&gt;
* Tradeoffs&lt;br /&gt;
* Availability&lt;br /&gt;
* Distributed systems&lt;br /&gt;
* Designing data schema&lt;br /&gt;
&lt;br /&gt;
==Section 2: Building and Maintaining Data==&lt;br /&gt;
&lt;br /&gt;
Focus: data structures and databases&lt;br /&gt;
&lt;br /&gt;
Main topics:&lt;br /&gt;
* Flexible data representations&lt;br /&gt;
* Data pipelines&lt;br /&gt;
* Data processing infrastructure&lt;br /&gt;
&lt;br /&gt;
Considerations in this category:&lt;br /&gt;
* Data cleaning&lt;br /&gt;
* Batch vs. streaming data processing&lt;br /&gt;
* Transformation of data&lt;br /&gt;
* Acquisition of data&lt;br /&gt;
* Importing data&lt;br /&gt;
* Quality control of data&lt;br /&gt;
* New data sources&lt;br /&gt;
* Resources needed for data processing&lt;br /&gt;
* Monitoring of pipelines&lt;br /&gt;
* Adjustment of pipelines&lt;br /&gt;
* Quality control of pipelines&lt;br /&gt;
&lt;br /&gt;
==Section 3: Data Analysis and Machine Learning==&lt;br /&gt;
&lt;br /&gt;
Main topics:&lt;br /&gt;
* Data analysis&lt;br /&gt;
* Machine learning&lt;br /&gt;
* Deploying machine learning models&lt;br /&gt;
&lt;br /&gt;
Considerations in this section include:&lt;br /&gt;
* Collecting data&lt;br /&gt;
* Visualizing data&lt;br /&gt;
* Reducing the dimension of data&lt;br /&gt;
* Cleaning and normalizing data&lt;br /&gt;
* Defining what &amp;quot;success&amp;quot; means&lt;br /&gt;
* Defining other metrics&lt;br /&gt;
* Feature selection&lt;br /&gt;
* Algorithm selection&lt;br /&gt;
* Model debugging&lt;br /&gt;
* Cost vs. performance&lt;br /&gt;
* Online learning&lt;br /&gt;
&lt;br /&gt;
==Section 4: Modeling Business Processes==&lt;br /&gt;
&lt;br /&gt;
Main topics:&lt;br /&gt;
* Transforming business requirements into data representations&lt;br /&gt;
* Optimizing data representations, infrastructure, performance, cost&lt;br /&gt;
&lt;br /&gt;
Considerations in this topic are:&lt;br /&gt;
* Working with business people&lt;br /&gt;
* Working with users&lt;br /&gt;
* Getting business requirements&lt;br /&gt;
* Knowing the scale of resources required&lt;br /&gt;
* Knowing what data cleaning to do&lt;br /&gt;
* How to implement high-performance algorithms&lt;br /&gt;
* Common sources of error (e.g., selection bias)&lt;br /&gt;
* How to remove error&lt;br /&gt;
&lt;br /&gt;
==Section 5: Reliability==&lt;br /&gt;
&lt;br /&gt;
Main topics are:&lt;br /&gt;
* Quality control&lt;br /&gt;
* Assessment, troubleshooting and improvement of infrastructure&lt;br /&gt;
* Assessment, troubleshooting and improvement of models&lt;br /&gt;
* Recovering data&lt;br /&gt;
&lt;br /&gt;
This includes knowing and doing the following:&lt;br /&gt;
* Verification of data&lt;br /&gt;
* Test suites&lt;br /&gt;
* Pipeline monitoring&lt;br /&gt;
* Planning for fault-tolerance&lt;br /&gt;
* Planning for execution on failure (retroactive analysis, re-running failed jobs)&lt;br /&gt;
* Stress testing&lt;br /&gt;
* Planning for failure&lt;br /&gt;
&lt;br /&gt;
==Section 6: Visualizing Data==&lt;br /&gt;
&lt;br /&gt;
Main topics are:&lt;br /&gt;
* Building data viz tools&lt;br /&gt;
* Publishing data&lt;br /&gt;
* Reporting on data&lt;br /&gt;
&lt;br /&gt;
Considerations:&lt;br /&gt;
* Automating visualization and report generation&lt;br /&gt;
* Supporting decision-making&lt;br /&gt;
* Summarizing data&lt;br /&gt;
* Reporting on data fidelity, data trackability, data integrity&lt;br /&gt;
&lt;br /&gt;
==Section 7: Design for Security and Compliance==&lt;br /&gt;
&lt;br /&gt;
This is the section I&amp;#039;m least familiar with.&lt;br /&gt;
&lt;br /&gt;
Main topics:&lt;br /&gt;
* Secure data infrastructure&lt;br /&gt;
* Legal compliance in data handling&lt;br /&gt;
&lt;br /&gt;
Specifically:&lt;br /&gt;
* Identity/access management (IAM)&lt;br /&gt;
* Data security&lt;br /&gt;
* Performing penetration testing&lt;br /&gt;
* Need-to-know/separation of responsibility&lt;br /&gt;
* Implementing proper security controls&lt;br /&gt;
* Knowing relevant legislation&lt;br /&gt;
* Preparation for audits&lt;br /&gt;
&lt;br /&gt;
Relevant legislation includes:&lt;br /&gt;
* HIPAA (Health Insurance Portability and Accountability Act)&lt;br /&gt;
* COPPA (Children&amp;#039;s Online Privacy Protection Act)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Flags=&lt;br /&gt;
&lt;br /&gt;
[[Category:Data Engineering]]&lt;br /&gt;
[[Category:Google Cloud]]&lt;br /&gt;
[[Category:ML]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
</feed>