CSC 142/Chapter 6: Difference between revisions
From charlesreid1
Latest revision as of 00:42, 30 September 2016
Chapter 6: File Processing
Sections:
6.1 File reading basics
6.2 Token based processing
6.3 Line based processing
6.4 Advanced file processing
6.5 Case study: zip code lookup
Chapter 3 focused on a scanner for user input. Chapter 6 focuses on a scanner for file reading.
Many intro programming classes see this as a complicated topic, and Java doesn't make it easy. It's awkward, but it's manageable.
We will also explore exceptions related to file processing.
(Python makes this a dream.)
with open('data.txt','r') as f:
    lines = f.readlines()
Done.
Section 6.1: File Reading Basics
6.1 Definitions
Definitions:
- File
- File extension
- Binary
- ASCII
- Checked exception
- Throws clause
6.1 Material
Examples of the deluge of data available:
- Landmark-project (earthquakes, pollution, baseball, history, weather, etc)
- Gutenberg - see ciphertexts
- ncbi.nlm.nih.gov - biological/genomic data
- IMDB
- Fedstats.gov
- US census
- World bank
- CIA world factbook
Files and file objects:
- Data stored on computer as files
- Files have extensions
- Files can be stored as text, or as binary
- To deal with a file, use a File object
- This provides various methods
- Java API lookup/reference
- Note: we aren't constructing a NEW FILE, we're constructing a new object to represent an existing file
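A minimal sketch of the last point, assuming a file name like hamlet.txt (the class name FileInfoDemo and the describe helper are made up for illustration). Constructing the File object creates nothing on disk; it only lets us ask questions about the path:

```java
import java.io.File;

class FileInfoDemo {
    // Summarize what a File object knows about the path it represents.
    // Constructing the File does not create anything on disk.
    static String describe(File f) {
        if (!f.exists()) {
            return f.getName() + ": does not exist";
        }
        return f.getName() + ": " + f.length() + " bytes, readable=" + f.canRead();
    }

    public static void main(String[] args) {
        File f = new File("hamlet.txt"); // hypothetical file name
        System.out.println(describe(f));
    }
}
```

exists(), getName(), length(), and canRead() are all standard File methods; the Java API reference lists the rest.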
Reading files with scanner:
- Useful methods of File objects: (see list)
- File object is like a pipe: doesn't care much about what kind of fluid flowing thru, or where it comes from
- File object is the delivery system
- You can then pass the File object into a scanner
- Again, scanner is like nozzle at end of pipe - does not care much about File type or details of File object, just like nozzle doesn't care about type of fluid
- Need to deal with potential problems; file not there
- Checked exception - like "check" in chess
- Must be dealt with (can't just say, ignore and keep going)
- To satisfy the compiler, either catch the exception or declare it with a throws clause on the header of the method that contains the risky code
Throws clause: diapers for your code
More on throws/catch clauses:
- You're anticipating a particular kind of mess
- Like an if statement, for exceptions
- If we see this kind of exception, catch it this particular way
public static void main (String[] args) throws FileNotFoundException {
...
}
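The alternative to declaring the exception with throws is to catch it. A rough sketch, assuming we just want the first token of a file (CatchDemo and firstToken are hypothetical names, not from the textbook):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class CatchDemo {
    // Returns the first token in the named file, "" if the file is empty,
    // or null if the file cannot be opened.
    static String firstToken(String fileName) {
        try {
            Scanner input = new Scanner(new File(fileName));
            String token = input.hasNext() ? input.next() : "";
            input.close();
            return token;
        } catch (FileNotFoundException e) {
            // The anticipated problem is handled here instead of crashing
            return null;
        }
    }

    public static void main(String[] args) {
        String token = firstToken("hamlet.txt"); // file name used in these notes
        System.out.println(token == null ? "Could not open file" : "First token: " + token);
    }
}
```

Because the exception is caught inside firstToken, neither firstToken nor main needs a throws clause.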
Other exceptions:
- If you reach the end of a file, then ask for more
- NoSuchElementException
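A small sketch of both sides of this, using a Scanner over a String so it runs without any file (an assumption for the demo; a file Scanner behaves the same way). The class and method names are made up:

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

class EndOfInputDemo {
    // Counts tokens safely: hasNext() is checked before every next()
    static int countTokens(Scanner input) {
        int count = 0;
        while (input.hasNext()) {
            input.next();
            count++;
        }
        return count;
    }

    // Returns true once asking for a token past the end throws
    static boolean readingPastEndThrows(Scanner input) {
        try {
            while (true) {
                input.next(); // no hasNext() guard, so this eventually fails
            }
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(countTokens(new Scanner("to be or not to be"))); // 6
        System.out.println(readingPastEndThrows(new Scanner("to be")));     // true
    }
}
```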
A word on the correct way:
Scanner input = new Scanner(new File("hamlet.txt"));
versus the incorrect way:
Scanner input = new Scanner("hamlet.txt");
(The latter compiles, but the Scanner reads the literal text "hamlet.txt" as its input instead of opening the file)
NOTE: This is overloading in action (the Scanner constructor accepts multiple parameter types, including File and String)
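To see what the String version actually does, here is a quick sketch (ScannerSourceDemo and firstTokenOfText are hypothetical names). No file is involved at all:

```java
import java.util.Scanner;

class ScannerSourceDemo {
    // Builds a Scanner on the text itself (NOT on a file with that name)
    // and returns the first token it finds.
    static String firstTokenOfText(String text) {
        Scanner input = new Scanner(text);
        return input.next();
    }

    public static void main(String[] args) {
        // Prints hamlet.txt, because the String is the input
        System.out.println(firstTokenOfText("hamlet.txt"));
    }
}
```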
Section 6.2: Token-Based Processing
6.2 Definitions
Definitions:
- Token-based processing
- Input cursor
- Consuming input
- File path
- Current directory
6.2 Material
Token - a single chunk of letters or character data
- Usually WORDS separated by SPACES
- But could also be NUMBERS separated by COMMAS
- Or, other stuff...
Example: file with 5 numbers
- Read in the first 5 numbers
- Cumulative sum of first 5 numbers
- don't forget the throws
Output:
- Program outputs sum as 337.19999999 instead of 337.2 (floating-point roundoff)
Utilize scanner functions:
- Scanners have next() and nextDouble() and etc to read next values
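A sketch of the cumulative sum, plus one way to tidy up the roundoff when printing. The numbers in main are placeholders, not the textbook's data, and the Scanner reads from a String so the example runs on its own; Locale.US is an added assumption so that "1.5" parses the same on every machine:

```java
import java.util.Locale;
import java.util.Scanner;

class CumulativeSum {
    // Reads the next n numbers from the scanner and returns their sum
    static double sumFirst(Scanner input, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += input.nextDouble();
        }
        return sum;
    }

    // Rounds to one decimal place for display only
    static String oneDecimal(double value) {
        return String.format(Locale.US, "%.1f", value);
    }

    public static void main(String[] args) {
        // Placeholder data; with a file: new Scanner(new File("numbers.dat"))
        Scanner input = new Scanner("0.1 0.2 0.3 0.4 0.5").useLocale(Locale.US);
        double sum = sumFirst(input, 5);
        System.out.println("raw sum     = " + sum);
        System.out.println("rounded sum = " + oneDecimal(sum));
        System.out.println(oneDecimal(337.19999999)); // 337.2
    }
}
```

The stored value is unchanged; only the printed text is rounded.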
Structure of files:
- Computer sees a one-dimensional stream of characters: everything else is our own invention (e.g., a line break is just another character, \n, so the computer doesn't see lines as such)
- Scanner handles details of, e.g., what to do when it gets to a newline char or a number char
Exceptions from wrong data type:
- InputMismatchException
- Pay close attention to errors: not clear, but provide you with hints
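Calling nextDouble() when the next token is a word throws InputMismatchException. One way to avoid that, sketched here with made-up names, is to ask hasNextDouble() first and skip anything that isn't a number:

```java
import java.util.Locale;
import java.util.Scanner;

class MismatchDemo {
    // Adds up every numeric token, skipping tokens that are not numbers
    static double sumNumbers(Scanner input) {
        double sum = 0.0;
        while (input.hasNext()) {
            if (input.hasNextDouble()) {
                sum += input.nextDouble();
            } else {
                input.next(); // consume the non-numeric token and move on
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("1 two 3").useLocale(Locale.US);
        System.out.println(sumNumbers(input)); // 4.0
    }
}
```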
Moving through a file:
- Computer sees 1D stream of text
- Can't jump around - like a VCR tape
- So, current location/position is important (input cursor)
- Cursor moves down one char at a time
- Scanner handles details:
- nextFloat() knows what to look for
- advances cursor to next word
Scanner object info:
- repeated calls on the same Scanner don't reset the cursor
- one scanner --> one File, one position
- processTokens(input,2) --> first 2 tokens
- processTokens(input,3) --> processes tokens 3, 4, and 5 (not 1, 2, 3)
etc.
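A sketch of what a processTokens method could look like (this version is a guess at its shape, written to return the tokens so the cursor behavior is easy to see):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CursorDemo {
    // Hypothetical processTokens: reads the NEXT n tokens from wherever
    // the scanner's cursor currently is
    static List<String> processTokens(Scanner input, int n) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            tokens.add(input.next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("one two three four five six");
        System.out.println(processTokens(input, 2)); // [one, two]
        System.out.println(processTokens(input, 3)); // [three, four, five]
    }
}
```

The second call picks up where the first one stopped, because both calls share one Scanner and therefore one cursor.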
Paths and directories:
- Organization of files: uses directory structure
- Root directory: top level (C:\ or /)
- If no path specified, look in current directory (where Java is being run from)
- If full path is specified, look for the file
- If relative path, specific location to look, starting from current directory
- Slashes: can use C:\\Windows\\ etc or can use C:/Windows/etc
Example: 2 scanners
- One scanner for user input
- One scanner for file
- Scanner handles backslashes in a typed-in path just fine; escaping (\\) is only needed in string literals in source code (again - abstract away details, just take care of it)
Example: Complex input file
- Multiple columns, 1st column name, remainder numeric
- File processing will use while loops
- Identify things you do want to generalize
- What things do you know ahead of time - things you DON'T want to generalize
- Things you may not know ahead of time (e.g., number of columns) - generalize
- Example: we know number of columns... we don't know number of lines.
- NOTE: This example does a poor job of explaining this distinction, WHAT to abstract and WHEN
- Use while loop (while hasNextDouble()) to get column data
- Better way to pose this problem:
- Present a general scenario: (STRING) (SET OF AT LEAST 1 NUMBER)
- Each line has some data, so find the totals for each line
- THEN, you can raise the question: how many numbers?
- Does each row have same number of numbers?
- If so, line-based processing
- Otherwise, token-based processing
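A token-based sketch of the (STRING) (NUMBERS) scenario, assuming each record is a one-word name followed by any count of numbers. The names and hours below are placeholders, and HoursTotals is a made-up class name:

```java
import java.util.Locale;
import java.util.Scanner;

class HoursTotals {
    // For each name, totals the numbers that follow it.
    // The inner loop stops at the next non-numeric token (the next name).
    static String totals(Scanner input) {
        StringBuilder report = new StringBuilder();
        while (input.hasNext()) {
            String name = input.next();
            double sum = 0.0;
            while (input.hasNextDouble()) {
                sum += input.nextDouble();
            }
            report.append(name).append(" ").append(sum).append("\n");
        }
        return report.toString();
    }

    public static void main(String[] args) {
        // Placeholder data; rows do not need the same number of columns
        Scanner input = new Scanner("Erica 7.5 8.5 10.25\nErin 10.5 11.5\n")
                .useLocale(Locale.US);
        System.out.print(totals(input));
    }
}
```

Note the weakness: this only works because a name can't be mistaken for a number. Put a numeric ID in front of each name and the inner loop would swallow it.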
Section 6.3: Line Based Processing
6.3 Material
Line by line:
- Rather than deal with tokens, may want to deal with lines
- Scanner has nextLine() and hasNextLine() methods
- Example: use toUpperCase() to turn each line of a file into uppercase
- Choice of lines vs tokens also depends on whether whitespace is important (example: poem, vs CSV file)
String scanners, line/token combos:
- Can combine line and token parsing
- Example: modify employees file so now it has an employee ID number out in front
- Need to deal more gracefully with this change
- Pseudocode
for each line of file:
    split into tokens
    for each token in line:
        token 1 = xxx, token 2 = yyy, etc
Generalizing:
- Here again, we ask: what can we generalize, and what do we assume we always know?
- We can generalize the column layout: the thing that changes is the number of employees or who the employees are
- One monster scanner for the whole file, line by line
- Lots of mini scanners, 1 scanner per line, to turn the line string into tokens for processing
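The monster-scanner/mini-scanner idea, sketched for lines of the assumed form ID NAME HOURS HOURS ... (the data and the class name LineTokenCombo are placeholders):

```java
import java.util.Locale;
import java.util.Scanner;

class LineTokenCombo {
    // Mini scanner: turns ONE line into tokens (ID, name, then hours)
    static String processLine(String line) {
        Scanner tokens = new Scanner(line).useLocale(Locale.US);
        int id = tokens.nextInt();
        String name = tokens.next();
        double sum = 0.0;
        while (tokens.hasNextDouble()) {
            sum += tokens.nextDouble();
        }
        return name + " (ID " + id + ") total = " + sum;
    }

    // Monster scanner: walks the whole input one line at a time
    static String processFile(Scanner input) {
        StringBuilder report = new StringBuilder();
        while (input.hasNextLine()) {
            String line = input.nextLine();
            if (!line.trim().isEmpty()) { // skip blank lines
                report.append(processLine(line)).append("\n");
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("101 Erica 7.5 8.5\n783 Lee 8 8 8\n");
        System.out.print(processFile(input));
    }
}
```

Since each line gets its own Scanner, the numeric ID at the front can't be confused with the hours of the previous employee.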
Section 6.4: Advanced File Processing
6.4 Definitions
Definitions:
- Boilerplate code
6.4 Material
Output files with PrintStream:
- Can read from files, can also write files
- Just like with scanner, we create a File object first
- Then we create a PrintStream object
- println() prints line to file
Example: Hello File
- Modify hello world to write to file
Example: remove whitespace
- Read in tokens, print them out with correct whitespace
Generalization: syntax "output.println()" and "System.out.println()" look similar because they are similar
We can tie the PrintStream object to the output console, or to a file, and it all works the same
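A Hello File sketch that makes the point concrete: one method, handed either System.out or a file-backed PrintStream. The class name HelloFile is hypothetical, and a temporary file stands in for a named output file so the demo cleans up after itself:

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;

class HelloFile {
    // Works the same whether out is the console or a file
    static void greet(PrintStream out) {
        out.println("Hello, file!");
    }

    public static void main(String[] args) throws IOException {
        greet(System.out); // console

        File f = File.createTempFile("hello", ".txt"); // File object first
        PrintStream output = new PrintStream(f);       // then the PrintStream
        greet(output);                                 // same call, different destination
        output.close();

        System.out.println("Wrote " + f.length() + " bytes to " + f.getName());
        f.delete();
    }
}
```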
Error handling/ensuring files readable:
- Particularly when taking input, important to ensure we can operate on files before actually operating on files
- If user inputs invalid filename, could crash program - or could just ask again
- Half-fencepost design, ask for input, then check and ask for input again if invalid
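A rough sketch of the ask-then-ask-again loop. The prompt text and the names FilePrompt/promptForFile are made up, and main uses a scripted "user" (a Scanner over a String) so the demo is repeatable; a real program would pass new Scanner(System.in) instead:

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Scanner;

class FilePrompt {
    // Ask once up front, then keep asking while the answer is unusable
    static File promptForFile(Scanner console, PrintStream out) {
        out.print("input file name? ");
        File f = new File(console.nextLine());
        while (!f.canRead()) {
            out.println("File not found. Try again.");
            out.print("input file name? ");
            f = new File(console.nextLine());
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        // Scripted answers: one bad name, then a file that really exists
        File real = File.createTempFile("demo", ".txt");
        Scanner console = new Scanner("no_such_file.txt\n" + real.getPath() + "\n");
        File chosen = promptForFile(console, System.out);
        System.out.println();
        System.out.println("Using " + chosen.getName());
        real.delete();
    }
}
```

Checking canRead() before building the file Scanner means the FileNotFoundException never gets a chance to crash the program.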
Section 6.5: Zip Code Lookup Case Study
6.5 Material
(Dating algorithms??? Really? Social justice.)
Introduce the problem, with background
- File contains data of form:
- 3 lines per city
ZIP
CITY, STATE
LAT LONG
Program should do the following:
- Introduce program
- Ask for user input
- Find coordinates for target zip code
- Display nearby zip codes
Break up the problem: start with last 2 steps (hardest)
How to find coordinates, for a given zip code?
- Step 1: find it (loop through file, 3 lines at a time)
- Step 2: return it (return the line with lat/long on it)
- All code into self-contained method
- Also deal with exceptions (zip not found)
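A sketch of the lookup step, not the textbook's exact code. It assumes the input is a whole number of 3-line records and signals "zip not found" by returning null; the sample records in main are invented:

```java
import java.util.Scanner;

class ZipFind {
    // Returns the "LAT LONG" line for the target zip, or null if not found.
    // Assumes every record is exactly 3 lines: zip / city, state / lat long.
    static String find(Scanner input, String target) {
        while (input.hasNextLine()) {
            String zip = input.nextLine();
            input.nextLine();                      // city, state (not needed here)
            String coordinates = input.nextLine();
            if (zip.equals(target)) {
                return coordinates;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Invented sample records, not real zip code data
        String data = "00001\nAlphaville, AA\n10.0 20.0\n"
                    + "00002\nBetatown, BB\n10.5 20.5\n";
        System.out.println(find(new Scanner(data), "00002")); // 10.5 20.5
        System.out.println(find(new Scanner(data), "99999")); // null
    }
}
```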
How to find neighbor coordinates?
- Step 1: find it (same approach: loop through, 3 lines at a time)
- Step 2: print it
Define new method, show matches, using found lat1/long1
- For each city, determine lat/long
- Compute distance from target
- If within the distance threshold, print whatever info we need to print
Final program structure:
- main() method, asks for input, gets file scanners
- giveIntro() method
- find() method to find target zip
- showMatches() method to find matches nearby
- distance() method to calculate the distance between lat1/long1 and lat2/long2
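One possible distance() method, sketched with the haversine formula. This is a swapped-in choice and may not be the formula the textbook uses; the Earth radius value and the isNearby helper are assumptions too:

```java
class ZipDistance {
    // Approximate mean Earth radius in miles (assumed value)
    static final double EARTH_RADIUS_MILES = 3958.8;

    // Great-circle distance in miles between two points given in degrees,
    // computed with the haversine formula
    static double distance(double lat1, double long1, double lat2, double long2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(long2 - long1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2)
                 * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return EARTH_RADIUS_MILES * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    // The threshold test that showMatches() would apply to each city
    static boolean isNearby(double lat1, double long1,
                            double lat2, double long2, double miles) {
        return distance(lat1, long1, lat2, long2) <= miles;
    }

    public static void main(String[] args) {
        // Invented coordinates, just to exercise the method
        System.out.println(distance(10.0, 20.0, 10.5, 20.5));
        System.out.println(isNearby(10.0, 20.0, 10.5, 20.5, 100.0));
    }
}
```

Keeping distance() separate from the file-reading code means it can be checked on its own with known points before it is wired into showMatches().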
Chapter 6 Summary
Deliverables
File reading:
- Purpose of scanners
- Tokens vs lines, when to use
- Syntax required, file object
- Exceptions
Moving through files: tokenization
- One scanner = 1 file and 1 cursor
- Paths/directories
- Complex input files and strategies
- nextDouble() etc
Moving through files: line-based processing
- Lines vs. tokens: when
- Line/token combo: parse line-by-line with one scanner, parse each line with another
- Pseudocode
Advanced file processing
- file writing
- general principle: System.out is one kind of device, files are another
Case study:
- Breaking up complexity
- Don't get overwhelmed! Start simple, with tasks you know how to do
- Especially at beginning, hardest part is knowing what is possible
- Java API, while overwhelming, can help with that!
Chapter 6 Homework
HW Questions
(Recommended) Self-check problems: #3, #8, #11, #12, #17, #18
(Required) Exercises: #8, #12, #15
(Required) Projects: #2
HW Details
Self-check:
- 3, 8 - correct scanner syntax
- 11 - finding mistakes
- 12 - reading tokens from input file with scanner
- 17 - take a line of text and put a box around it
- 18 - object used to write to output files, and methods available
Exercises:
- 8 - double space
- 12 - html tag strip
- 15 - read file of heads/tails and compute statistics
Projects:
- 2 - file diff utility
Chapter 6 Code
Lecture Code
PublicSchools - tokenize CSV input file using a scanner
- using csv data about public school location/information from Seattle Open Data: https://data.seattle.gov/
- Token scanner only
- CSV as first example
- Extract particular field from each row to print it out
- Could use a Scanner... but that's pretty inflexible. Kind of like, first-thing-that-you-grab-for.
- Better: ask what we really want to do... we want to tokenize strings... so see if Java standard library provides a class for that
- Turns out, it does: StringTokenizer [https://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html]
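import java.util.StringTokenizer;

// Default delimiters are whitespace, so this prints one word per line
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}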
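For the CSV case, StringTokenizer also takes a delimiter string. A minimal sketch of pulling one field out of a row (CsvField and field are hypothetical names, and the row is made up rather than taken from the Seattle data):

```java
import java.util.StringTokenizer;

class CsvField {
    // Returns the field at the given 0-based index,
    // or null if the row has too few fields
    static String field(String row, int index) {
        StringTokenizer st = new StringTokenizer(row, ","); // split on commas
        String token = null;
        for (int i = 0; i <= index; i++) {
            if (!st.hasMoreTokens()) {
                return null;
            }
            token = st.nextToken();
        }
        return token;
    }

    public static void main(String[] args) {
        String row = "Example Elementary,123 Main St,00001"; // made-up row
        System.out.println(field(row, 1)); // 123 Main St
    }
}
```

Caveats: StringTokenizer treats consecutive commas as one delimiter, so empty fields are silently skipped, and it knows nothing about quoted fields that contain commas. Fine for a first CSV example, not for messy real-world data.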
ZipCode - zip code search and finding
- Reges and Stepp
- Read property from file
- Read other properties from file
- Compare to original property
- Conditionally print out
Worksheet Code
Public School zipcode
- Utilizing City of Seattle data about public schools
- Utilizing code in textbook - zip code case study
- Given a public school, or an integer index, what are nearby schools
Chapter 6 Goodies
Puzzle 6
Affine cipher ax+b, gcd modular arithmetic
Puzzles/Crypto Level 1/Puzzle 6
Profiles
Claude Shannon
- Information entropy, signals, ciphers
Flags
CSC 142 - Intro to Programming I, South Seattle College.