<?php if ( file_exists("../booktop.php") ) {
require_once "../booktop.php";
ob_start();
}?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>-</title>
<style>
html {
color: #1a1a1a;
background-color: #fdfdfd;
}
body {
margin: 0 auto;
max-width: 36em;
padding-left: 50px;
padding-right: 50px;
padding-top: 50px;
padding-bottom: 50px;
hyphens: auto;
overflow-wrap: break-word;
text-rendering: optimizeLegibility;
font-kerning: normal;
}
@media (max-width: 600px) {
body {
font-size: 0.9em;
padding: 12px;
}
h1 {
font-size: 1.8em;
}
}
@media print {
html {
background-color: white;
}
body {
background-color: transparent;
color: black;
font-size: 12pt;
}
p, h2, h3 {
orphans: 3;
widows: 3;
}
h2, h3, h4 {
page-break-after: avoid;
}
}
p {
margin: 1em 0;
}
a {
color: #1a1a1a;
}
a:visited {
color: #1a1a1a;
}
img {
max-width: 100%;
}
svg {
height: auto;
max-width: 100%;
}
h1, h2, h3, h4, h5, h6 {
margin-top: 1.4em;
}
h5, h6 {
font-size: 1em;
font-style: italic;
}
h6 {
font-weight: normal;
}
ol, ul {
padding-left: 1.7em;
margin-top: 1em;
}
li > ol, li > ul {
margin-top: 0;
}
blockquote {
margin: 1em 0 1em 1.7em;
padding-left: 1em;
border-left: 2px solid #e6e6e6;
color: #606060;
}
code {
font-family: Menlo, Monaco, Consolas, 'Lucida Console', monospace;
font-size: 85%;
margin: 0;
hyphens: manual;
}
pre {
margin: 1em 0;
overflow: auto;
}
pre code {
padding: 0;
overflow: visible;
overflow-wrap: normal;
}
.sourceCode {
background-color: transparent;
overflow: visible;
}
hr {
background-color: #1a1a1a;
border: none;
height: 1px;
margin: 1em 0;
}
table {
margin: 1em 0;
border-collapse: collapse;
width: 100%;
overflow-x: auto;
display: block;
font-variant-numeric: lining-nums tabular-nums;
}
table caption {
margin-bottom: 0.75em;
}
tbody {
margin-top: 0.5em;
border-top: 1px solid #1a1a1a;
border-bottom: 1px solid #1a1a1a;
}
th {
border-top: 1px solid #1a1a1a;
padding: 0.25em 0.5em 0.25em 0.5em;
}
td {
padding: 0.125em 0.5em 0.25em 0.5em;
}
header {
margin-bottom: 4em;
text-align: center;
}
#TOC li {
list-style: none;
}
#TOC ul {
padding-left: 1.3em;
}
#TOC > ul {
padding-left: 0;
}
#TOC a:not(:hover) {
text-decoration: none;
}
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
/* The extra [class] is a hack that increases specificity enough to
override a similar rule in reveal.js */
ul.task-list[class]{list-style: none;}
ul.task-list li input[type="checkbox"] {
font-size: inherit;
width: 0.8em;
margin: 0 0.8em 0.2em -1.6em;
vertical-align: middle;
}
.display.math{display: block; text-align: center; margin: 0.5rem auto;}
</style>
</head>
<body>
<h1 id="files">Files</h1>
<h2 id="persistence">Persistence</h2>
<p>So far, we have learned how to write programs and communicate our
intentions to the <em>Central Processing Unit</em> using conditional
execution, functions, and iterations. We have learned how to create and
use data structures in the <em>Main Memory</em>. The CPU and memory are
where our software works and runs. It is where all of the “thinking”
happens.</p>
<p>But if you recall from our hardware architecture discussions, once
the power is turned off, anything stored in either the CPU or main
memory is erased. So up to now, our programs have just been transient
fun exercises to learn Python.</p>
<figure>
<img src="../images/arch.svg" alt="Secondary Memory" style="height: 2.5in;"/>
<figcaption>
Secondary Memory
</figcaption>
</figure>
<p>In this chapter, we start to work with <em>Secondary Memory</em> (or
files). Secondary memory is not erased when the power is turned off. In
the case of a USB flash drive, the data we write from our programs can
even be removed from the system and transported to another system.</p>
<p>We will primarily focus on reading and writing text files such as
those we create in a text editor. Later we will see how to work with
database files which are binary files, specifically designed to be read
and written through database software.</p>
<h2 id="opening-files">Opening files</h2>
<p>When we want to read or write a file (say on your hard drive), we
first must <em>open</em> the file. Opening the file communicates with
your operating system, which knows where the data for each file is
stored. When you open a file, you are asking the operating system to
find the file by name and make sure the file exists. In this example, we
open the file <em>mbox.txt</em>, which should be stored in the same
folder that you are in when you start Python. You can download this file
from <a
href="http://www.py4e.com/code3/mbox.txt">www.py4e.com/code3/mbox.txt</a>.</p>
<pre class="python"><code>>>> fhand = open('mbox.txt')
>>> print(fhand)
<_io.TextIOWrapper name='mbox.txt' mode='r' encoding='cp1252'></code></pre>
<p>If the <code>open</code> is successful, the operating system returns
us a <em>file handle</em>. The file handle is not the actual data
contained in the file, but instead it is a “handle” that we can use to
read the data. You are given a handle if the requested file exists and
you have the proper permissions to read the file.</p>
<figure>
<img src="../images/handle.svg" alt="A File Handle" style="height: 2.0in;"/>
<figcaption>
A File Handle
</figcaption>
</figure>
<p>If the file does not exist, <code>open</code> will fail with a
traceback and you will not get a handle to access the contents of the
file:</p>
<pre class="python"><code>>>> fhand = open('stuff.txt')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'stuff.txt'</code></pre>
<p>Later we will use <code>try</code> and <code>except</code> to deal
more gracefully with the situation where we attempt to open a file that
does not exist.</p>
<h2 id="text-files-and-lines">Text files and lines</h2>
<p>A text file can be thought of as a sequence of lines, much like a
Python string can be thought of as a sequence of characters. For
example, this is a sample of a text file which records mail activity
from various individuals in an open source project development team:</p>
<pre><code>From [email protected] Sat Jan 5 09:14:16 2008
Return-Path: <[email protected]>
Date: Sat, 5 Jan 2008 09:12:18 -0500
From: [email protected]
Subject: [sakai] svn commit: r39772 - content/branches/
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772
...</code></pre>
<p>The entire file of mail interactions is available from</p>
<p><a
href="http://www.py4e.com/code3/mbox.txt">www.py4e.com/code3/mbox.txt</a></p>
<p>and a shortened version of the file is available from</p>
<p><a
href="http://www.py4e.com/code3/mbox-short.txt">www.py4e.com/code3/mbox-short.txt</a></p>
<p>These files are in a standard format for a file containing multiple
mail messages. The lines which start with “From ” (the word followed by
a space) separate the messages
and the lines which start with “From:” are part of the messages. For
more information about the mbox format, see <a
href="https://en.wikipedia.org/wiki/Mbox"
class="uri">https://en.wikipedia.org/wiki/Mbox</a>.</p>
<p>To break the file into lines, there is a special character that
represents the “end of the line” called the <em>newline</em>
character.</p>
<p>In Python, we represent the <em>newline</em> character as a
backslash-n in string constants. Even though this looks like two
characters, it is actually a single character. When we look at the
variable by entering “stuff” in the interpreter, it shows us the
<code>\n</code> in the string, but when we use <code>print</code> to
show the string, we see the string broken into two lines by the newline
character.</p>
<pre class="python"><code>>>> stuff = 'Hello\nWorld!'
>>> stuff
'Hello\nWorld!'
>>> print(stuff)
Hello
World!
>>> stuff = 'X\nY'
>>> print(stuff)
X
Y
>>> len(stuff)
3</code></pre>
<p>You can also see that the length of the string <code>X\nY</code> is
<em>three</em> characters because the newline character is a single
character.</p>
<p>So when we look at the lines in a file, we need to <em>imagine</em>
that there is a special invisible character called the newline at the
end of each line that marks the end of the line.</p>
<p>So the newline character separates the characters in the file into
lines.</p>
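<p>As a small illustration of this idea, the sketch below splits a
string on the newline character. The sample text is made up for the
example and is not taken from <em>mbox.txt</em>.</p>
<pre class="python"><code># A small in-memory stand-in for the contents of a text file
# (hypothetical sample text).
text = 'first line\nsecond line\nthird line\n'

# Splitting on the newline character recovers the individual lines.
# The final empty string appears because the text ends with a newline.
lines = text.split('\n')
print(lines)

# The newline counts as one character: ten visible characters plus '\n'.
first = text[:11]
print(repr(first), len(first))</code></pre>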
<h2 id="reading-files">Reading files</h2>
<p>While the <em>file handle</em> does not contain the data for the
file, it is quite easy to construct a <code>for</code> loop to read
through and count each of the lines in a file:</p>
<pre class="python"><code>fhand = open('mbox-short.txt')
count = 0
for line in fhand:
    count = count + 1
print('Line Count:', count)
# Code: https://www.py4e.com/code3/open.py</code></pre>
<p>We can use the file handle as the sequence in our <code>for</code>
loop. Our <code>for</code> loop simply counts the number of lines in the
file, and the final <code>print</code> call displays that count. The
rough translation of the <code>for</code> loop into English is, “for
each line in the file represented by the file handle, add one to the
<code>count</code> variable.”</p>
<p>The reason that the <code>open</code> function does not read the
entire file is that the file might be quite large with many gigabytes of
data. A call to <code>open</code> takes roughly the same amount of time
regardless of the size of the file. The <code>for</code> loop actually
causes the data to be read from the file.</p>
<p>When the file is read using a <code>for</code> loop in this manner,
Python takes care of splitting the data in the file into separate lines
using the newline character. Python reads each line through the newline
and includes the newline as the last character in the <code>line</code>
variable for each iteration of the <code>for</code> loop.</p>
<p>Because the <code>for</code> loop reads the data one line at a time,
it can efficiently read and count the lines in very large files without
running out of main memory to store the data. The above program can
count the lines in any size file using very little memory since each
line is read, counted, and then discarded.</p>
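<p>The following sketch shows the same line-by-line pattern on a tiny
file that the program creates for itself, so it can run without
downloading anything. The file name <code>sample.txt</code> and its
contents are assumptions made for illustration. It also records the
length of the longest line to show that each line includes its trailing
newline.</p>
<pre class="python"><code>import os
import tempfile

# Create a tiny sample file so the sketch runs on its own
# (the file name and contents are made up for illustration).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
fout = open(path, 'w')
fout.write('alpha\nbeta\ngamma\n')
fout.close()

count = 0
longest = 0
fhand = open(path)
for line in fhand:
    # Only one line is held in memory at a time, and each line
    # still carries its trailing newline character.
    count = count + 1
    longest = max(longest, len(line))
fhand.close()

print('Line Count:', count)
print('Longest line (including newline):', longest)</code></pre>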
<p>If you know the file is relatively small compared to the size of your
main memory, you can read the whole file into one string using the
<code>read</code> method on the file handle.</p>
<pre class="python"><code>>>> fhand = open('mbox-short.txt')
>>> inp = fhand.read()
>>> print(len(inp))
94626
>>> print(inp[:20])
From stephen.marquar</code></pre>
<p>In this example, the entire contents (all 94,626 characters) of the
file <em>mbox-short.txt</em> are read directly into the variable
<code>inp</code>. We use string slicing to print out the first 20
characters of the string data stored in <code>inp</code>.</p>
<p>When the file is read in this manner, all the characters including
all of the lines and newline characters are one big string in the
variable <code>inp</code>. It is a good idea to store the output of
<code>read</code> as a variable because each call to <code>read</code>
exhausts the resource:</p>
<pre class="python"><code>>>> fhand = open('mbox-short.txt')
>>> print(len(fhand.read()))
94626
>>> print(len(fhand.read()))
0</code></pre>
<p>Remember that reading a whole file with the <code>read</code> method
should only be done if the file data will fit comfortably in the main
memory of your computer. If the file is too large to fit in main memory,
you should write your program to read the file in chunks using a
<code>for</code> or <code>while</code> loop.</p>
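<p>One possible way to read a file in chunks is to pass a size to
<code>read</code> inside a <code>while</code> loop. This is only a
sketch: the chunk size of 1000 characters and the file name
<code>big.txt</code> are arbitrary choices for illustration.</p>
<pre class="python"><code>import os
import tempfile

# Build a sample file of 2500 characters (made up for illustration).
path = os.path.join(tempfile.mkdtemp(), 'big.txt')
fout = open(path, 'w')
fout.write('x' * 2500)
fout.close()

CHUNK_SIZE = 1000   # characters per read; an arbitrary choice
total = 0
chunks = 0
fhand = open(path)
while True:
    chunk = fhand.read(CHUNK_SIZE)
    if chunk == '':
        # An empty string means we have reached the end of the file.
        break
    total = total + len(chunk)
    chunks = chunks + 1
fhand.close()

print('Read', total, 'characters in', chunks, 'chunks')</code></pre>
<p>Only one chunk is held in memory at a time, so the same loop works
for files much larger than main memory.</p>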
<h2 id="searching-through-a-file">Searching through a file</h2>
<p>When you are searching through data in a file, it is a very common
pattern to read through a file, ignoring most of the lines and only
processing lines which meet a particular condition. We can combine the
pattern for reading a file with string methods to build simple search
mechanisms.</p>
<p>For example, if we wanted to read a file and only print out lines
which started with the prefix “From:”, we could use the string method
<em>startswith</em> to select only those lines with the desired
prefix:</p>
<pre class="python"><code>fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:'):
        print(line)
# Code: https://www.py4e.com/code3/search1.py</code></pre>
<p>When this program runs, we get the following output:</p>
<pre><code>From: [email protected]

From: [email protected]

From: [email protected]

From: [email protected]

...</code></pre>
<p>The output looks great since the only lines we are seeing are those
which start with “From:”, but why are we seeing the extra blank lines?
This is due to that invisible <em>newline</em> character. Each of the
lines ends with a newline, so the <code>print</code> function prints
the string in the variable <em>line</em> which includes a newline and
then <code>print</code> adds <em>another</em> newline, resulting in the
double spacing effect we see.</p>
<p>We could use string slicing to print all but the last character, but
a simpler approach is to use the <em>rstrip</em> method, which strips
whitespace from the right side of a string, as follows:</p>
<pre class="python"><code>fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:'):
        print(line)
# Code: https://www.py4e.com/code3/search2.py</code></pre>
<p>When this program runs, we get the following output:</p>
<pre><code>From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
From: [email protected]
...</code></pre>
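<p>To compare the two ways of removing the newline mentioned earlier in
this section (slicing off the last character versus calling
<code>rstrip</code>), here is a short sketch. The address in the sample
lines is made up for illustration.</p>
<pre class="python"><code>line = 'From: someone@example.com\n'   # made-up address for illustration

sliced = line[:-1]         # drops exactly one character, whatever it is
stripped = line.rstrip()   # drops all trailing whitespace

print(repr(sliced))
print(repr(stripped))

# The two approaches differ when a line has no trailing newline,
# which can happen on the last line of a file.
last = 'From: someone@example.com'
last_sliced = last[:-1]         # loses the final 'm'
last_stripped = last.rstrip()   # unchanged
print(repr(last_sliced))
print(repr(last_stripped))</code></pre>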
<p>As your file processing programs get more complicated, you may want
to structure your search loops using <code>continue</code>. The basic
idea of the search loop is that you are looking for “interesting” lines
and effectively skipping “uninteresting” lines. And then when we find an
interesting line, we do something with that line.</p>
<p>We can structure the loop to follow the pattern of skipping
uninteresting lines as follows:</p>
<pre class="python"><code>fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    # Skip 'uninteresting lines'
    if not line.startswith('From:'):
        continue
    # Process our 'interesting' line
    print(line)
# Code: https://www.py4e.com/code3/search3.py</code></pre>
<p>The output of the program is the same. In English, the uninteresting
lines are those which do not start with “From:”, which we skip using
<code>continue</code>. For the “interesting” lines (i.e., those that
start with “From:”) we perform the processing.</p>
<p>We can use the <code>find</code> string method to simulate a text
editor search that finds lines where the search string is anywhere in
the line. Since <code>find</code> looks for an occurrence of a string
within another string and either returns the position of the string or
-1 if the string was not found, we can write the following loop to show
lines which contain the string “@uct.ac.za” (i.e., they come from the
University of Cape Town in South Africa):</p>
<pre class="python"><code>fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.find('@uct.ac.za') == -1: continue
    print(line)
# Code: https://www.py4e.com/code3/search4.py</code></pre>
<p>Which produces the following output:</p>
<pre><code>From [email protected] Sat Jan 5 09:14:16 2008
X-Authentication-Warning: set sender to [email protected] using -f
From: [email protected]
Author: [email protected]
From [email protected] Fri Jan 4 07:02:32 2008
X-Authentication-Warning: set sender to [email protected] using -f
From: [email protected]
Author: [email protected]
...</code></pre>
<p>Here we also use the contracted form of the <code>if</code> statement
where we put the <code>continue</code> on the same line as the
<code>if</code>. This contracted form of the <code>if</code> functions
the same as if the <code>continue</code> were on the next line and
indented.</p>
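<p>The return values of <code>find</code> described in this section can
be checked with a short sketch. The sample line and its address are made
up for illustration.</p>
<pre class="python"><code>line = 'From: someone@uct.ac.za'   # made-up address for illustration

pos_match = line.find('@uct.ac.za')   # position where the match starts
pos_missing = line.find('@umich.edu') # -1 because it is not present
pos_start = line.find('From')         # a match at the very start

print(pos_match, pos_missing, pos_start)

# Position 0 is a valid match, so we compare against -1 explicitly
# rather than treating the result as true or false.
found = pos_start != -1
print(found)</code></pre>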
<h2 id="letting-the-user-choose-the-file-name">Letting the user choose
the file name</h2>
<p>We really do not want to have to edit our Python code every time we
want to process a different file. It would be more usable to ask the
user to enter the file name string each time the program runs so they
can use our program on different files without changing the Python
code.</p>
<p>This is quite simple to do by reading the file name from the user
using <code>input</code> as follows:</p>
<pre class="python"><code>fname = input('Enter the file name: ')
fhand = open(fname)
count = 0
for line in fhand:
    if line.startswith('Subject:'):
        count = count + 1
print('There were', count, 'subject lines in', fname)
# Code: https://www.py4e.com/code3/search6.py</code></pre>
<p>We read the file name from the user and place it in a variable named
<code>fname</code> and open that file. Now we can run the program
repeatedly on different files.</p>
<pre><code>python search6.py
Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt
python search6.py
Enter the file name: mbox-short.txt
There were 27 subject lines in mbox-short.txt</code></pre>
<p>Before peeking at the next section, take a look at the above program
and ask yourself, “What could possibly go wrong here?” or “What might
our friendly user do that would cause our nice little program to
ungracefully exit with a traceback, making us look not-so-cool in the
eyes of our users?”</p>
<h2 id="using-try-except-and-open">Using <code>try, except,</code> and
<code>open</code></h2>
<p>I told you not to peek. This is your last chance.</p>
<p>What if our user types something that is not a file name?</p>
<pre><code>python search6.py
Enter the file name: missing.txt
Traceback (most recent call last):
File "search6.py", line 2, in <module>
fhand = open(fname)
FileNotFoundError: [Errno 2] No such file or directory: 'missing.txt'
python search6.py
Enter the file name: na na boo boo
Traceback (most recent call last):
File "search6.py", line 2, in <module>
fhand = open(fname)
FileNotFoundError: [Errno 2] No such file or directory: 'na na boo boo'</code></pre>
<p>Do not laugh. Users will eventually do every possible thing they can
do to break your programs, either mistakenly or with malicious intent.
As a matter of fact, an important part of any software development team
is a person or group called <em>Quality Assurance</em> (or QA for short)
whose very job it is to do the craziest things possible in an attempt to
break the software that the programmer has created.</p>
<p>The QA team is responsible for finding the flaws in programs before
we have delivered the program to the end users who may be purchasing the
software or paying our salary to write the software. So the QA team is
the programmer’s best friend.</p>
<p>So now that we see the flaw in the program, we can elegantly fix it
using the <code>try</code>/<code>except</code> structure. We need to
assume that the <code>open</code> call might fail and add recovery code
when the <code>open</code> fails as follows:</p>
<pre class="python"><code>fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
count = 0
for line in fhand:
    if line.startswith('Subject:'):
        count = count + 1
print('There were', count, 'subject lines in', fname)
# Code: https://www.py4e.com/code3/search7.py</code></pre>
<p>The <code>exit</code> function terminates the program. It is a
function that we call that never returns. Now when our user (or QA team)
types in silliness or bad file names, we “catch” them and recover
gracefully:</p>
<pre><code>python search7.py
Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt
python search7.py
Enter the file name: na na boo boo
File cannot be opened: na na boo boo</code></pre>
<p>Protecting the <code>open</code> call is a good example of the proper
use of <code>try</code> and <code>except</code> in a Python program. We
use the term “Pythonic” when we are doing something the “Python way”. We
might say that the above example is the Pythonic way to open a file.</p>
<p>Once you become more skilled in Python, you can engage in repartee
with other Python programmers to decide which of two equivalent
solutions to a problem is “more Pythonic”. The goal to be “more
Pythonic” captures the notion that programming is part engineering and
part art. We are not always interested in just making something work, we
also want our solution to be elegant and to be appreciated as elegant by
our peers.</p>
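<p>One variation on the <code>try</code>/<code>except</code> pattern
from this section is to name the kind of error we expect, so that
unrelated mistakes are not silently caught. The sketch below is an
assumption-laden alternative, not the version used in this chapter: the
helper name <code>count_subject_lines</code> is hypothetical, it catches
<code>OSError</code> (the built-in category that includes
<code>FileNotFoundError</code>), and it returns <code>None</code> instead
of calling <code>exit</code> so it can be tried without prompting for
input.</p>
<pre class="python"><code>import os
import tempfile

def count_subject_lines(fname):
    # Hypothetical helper: returns the number of subject lines,
    # or None if the file cannot be opened.
    try:
        fhand = open(fname)
    except OSError:
        print('File cannot be opened:', fname)
        return None
    count = 0
    for line in fhand:
        if line.startswith('Subject:'):
            count = count + 1
    fhand.close()
    return count

# Create a small sample file (contents made up for illustration).
path = os.path.join(tempfile.mkdtemp(), 'mail.txt')
fout = open(path, 'w')
fout.write('Subject: hello\nFrom: a\nSubject: again\n')
fout.close()

good = count_subject_lines(path)
bad = count_subject_lines('na na boo boo')
print(good, bad)</code></pre>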
<h2 id="writing-files">Writing files</h2>
<p>To write a file, you have to open it with mode “w” as a second
parameter:</p>
<pre class="python"><code>>>> fout = open('output.txt', 'w')
>>> print(fout)
<_io.TextIOWrapper name='output.txt' mode='w' encoding='cp1252'></code></pre>
<p>If the file already exists, opening it in write mode clears out the
old data and starts fresh, so be careful! If the file doesn’t exist, a
new one is created.</p>
<p>The <code>write</code> method of the file handle object puts data
into the file, returning the number of characters written. By default,
files are opened in text mode, which writes (and reads) strings.</p>
<pre class="python"><code>>>> line1 = "This here's the wattle,\n"
>>> fout.write(line1)
24</code></pre>
<p>The file object keeps track of where it is, so if you call
<code>write</code> again, it adds the new data to the end.</p>
<p>We must make sure to manage the ends of lines as we write to the file
by explicitly inserting the newline character when we want to end a
line. The <code>print</code> function automatically appends a newline,
but the <code>write</code> method does not add the newline
automatically.</p>
<pre class="python"><code>>>> line2 = 'the emblem of our land.\n'
>>> fout.write(line2)
24</code></pre>
<p>When you are done writing, you have to close the file to make sure
that the last bit of data is physically written to the disk so it will
not be lost if the power goes off.</p>
<pre class="python"><code>>>> fout.close()</code></pre>
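<p>Putting the steps of this section together, the sketch below writes
the two lines, closes the file, and then reopens it to read the data
back. It writes into a temporary folder (an assumption made so the
sketch does not overwrite any file of yours).</p>
<pre class="python"><code>import os
import tempfile

# Write into a temporary folder so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

fout = open(path, 'w')
n1 = fout.write("This here's the wattle,\n")
n2 = fout.write('the emblem of our land.\n')
fout.close()

# Reopen the file for reading to confirm what was stored.
fhand = open(path)
contents = fhand.read()
fhand.close()

print(n1, n2)
print(contents)</code></pre>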
<p>We could close the files which we open for read as well, but we can
be a little sloppy if we are only opening a few files since Python makes
sure that all open files are closed when the program ends. When we are
writing files, we want to explicitly close the files so as to leave
nothing to chance.</p>
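<p>Python also offers the <code>with</code> statement, which closes the
file for us when the indented block ends. It is not used elsewhere in
this chapter, so treat the following as an optional sketch. The file
name and location are assumptions made for illustration.</p>
<pre class="python"><code>import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# The file is closed automatically when the indented block ends,
# even if an error occurs inside it.
with open(path, 'w') as fout:
    fout.write('the emblem of our land.\n')

print(fout.closed)

with open(path) as fhand:
    for line in fhand:
        print(line.rstrip())

print(fhand.closed)</code></pre>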
<h2 id="debugging">Debugging</h2>
<p>When you are reading and writing files, you might run into problems
with whitespace. These errors can be hard to debug because spaces, tabs,
and newlines are normally invisible:</p>
<pre class="python"><code>>>> s = '1 2\t 3\n 4'
>>> print(s)
1 2      3
 4</code></pre>
<p>The built-in function <code>repr</code> can help. It takes any object
as an argument and returns a string representation of the object. For
strings, it represents whitespace characters with backslash
sequences:</p>
<pre class="python"><code>>>> print(repr(s))
'1 2\t 3\n 4'</code></pre>
<p>This can be helpful for debugging.</p>
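<p>The same trick works while looping through a file. In this sketch,
<code>io.StringIO</code> stands in for an open text file so the example
is self-contained, and the contents are made up for illustration.</p>
<pre class="python"><code>import io

# io.StringIO behaves like an open text file held in memory;
# the contents are made up for illustration.
fhand = io.StringIO('name\tvalue \n total\n')

shown = []
for line in fhand:
    # repr makes the tab, the trailing space, the leading space,
    # and the newline visible.
    shown.append(repr(line))
    print(repr(line))</code></pre>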
<p>One other problem you might run into is that different systems use
different characters to indicate the end of a line. Some systems use a
newline, represented <code>\n</code>. Others use a return character,
represented <code>\r</code>. Some use both. If you move files between
different systems, these inconsistencies might cause problems.</p>
<p>For most systems, there are applications to convert from one format
to another. You can find them (and read more about this issue) at <a
href="https://wikipedia.org/wiki/Newline">https://www.wikipedia.org/wiki/Newline</a>.
Or, of course, you could write one yourself.</p>
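<p>As a rough idea of what such a converter might look like, the sketch
below rewrites return-plus-newline and bare return line endings as plain
newlines. The function name <code>to_unix_newlines</code> and the file
names are hypothetical. Passing <code>newline=''</code> to
<code>open</code> asks Python not to translate line endings itself, so we
can see and change them.</p>
<pre class="python"><code>import os
import tempfile

def to_unix_newlines(text):
    # Replace the two-character ending first so that '\r\n'
    # does not turn into two newlines.
    return text.replace('\r\n', '\n').replace('\r', '\n')

folder = tempfile.mkdtemp()
src = os.path.join(folder, 'windows.txt')   # hypothetical file names
dst = os.path.join(folder, 'unix.txt')

# newline='' tells open() not to translate line endings.
fout = open(src, 'w', newline='')
fout.write('one\r\ntwo\r\n')
fout.close()

fhand = open(src, newline='')
original = fhand.read()
fhand.close()

converted = to_unix_newlines(original)
fout = open(dst, 'w', newline='')
fout.write(converted)
fout.close()

print(repr(original))
print(repr(converted))</code></pre>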
<h2 id="glossary">Glossary</h2>
<dl>
<dt>catch</dt>
<dd>
To prevent an exception from terminating a program using the
<code>try</code> and <code>except</code> statements.
</dd>
<dt>newline</dt>
<dd>
A special character used in files and strings to indicate the end of a
line.
</dd>
<dt>Pythonic</dt>
<dd>
A technique that works elegantly in Python. “Using try and except is the
<em>Pythonic</em> way to recover from missing files”.
</dd>
<dt>Quality Assurance</dt>
<dd>
A person or team focused on ensuring the overall quality of a software
product. QA is often involved in testing a product and identifying
problems before the product is released.
</dd>
<dt>text file</dt>
<dd>
A sequence of characters stored in permanent storage like a hard drive.
</dd>
</dl>
<h2 id="exercises">Exercises</h2>
<p><strong>Exercise 1:</strong> Write a program to read through a file
and print the contents of the file (line by line) all in upper case.
Executing the program will look as follows:</p>
<pre><code>python shout.py
Enter a file name: mbox-short.txt
FROM [email protected] SAT JAN 5 09:14:16 2008
RETURN-PATH: <[email protected]>
RECEIVED: FROM MURDER (MAIL.UMICH.EDU [141.211.14.90])
BY FRANKENSTEIN.MAIL.UMICH.EDU (CYRUS V2.3.8) WITH LMTPA;
SAT, 05 JAN 2008 09:14:16 -0500</code></pre>
<p>You can download the file from <a
href="http://www.py4e.com/code3/mbox-short.txt">www.py4e.com/code3/mbox-short.txt</a></p>
<p><strong>Exercise 2:</strong> Write a program to prompt for a file
name, and then read through the file and look for lines of the form:</p>
<pre><code>X-DSPAM-Confidence: 0.8475</code></pre>
<p>When you encounter a line that starts with “X-DSPAM-Confidence:” pull
apart the line to extract the floating-point number on the line. Count
these lines and then compute the total of the spam confidence values
from these lines. When you reach the end of the file, print out the
average spam confidence.</p>
<pre><code>Enter the file name: mbox.txt
Average spam confidence: 0.894128046745
Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519</code></pre>
<p>Test your program on the <em>mbox.txt</em> and <em>mbox-short.txt</em>
files.</p>
<p><strong>Exercise 3:</strong></p>
<p>Sometimes when programmers get bored or want to have a bit of fun,
they add a harmless <em>Easter Egg</em> to their program. Modify the
program that prompts the user for the file name so that it prints a
funny message when the user types in the exact file name “na na boo
boo”. The program should behave normally for all other file names,
whether or not the file exists. Here is a sample execution of the program:</p>
<pre><code>python egg.py
Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt
python egg.py
Enter the file name: missing.tyxt
File cannot be opened: missing.tyxt
python egg.py
Enter the file name: na na boo boo
NA NA BOO BOO TO YOU - You have been punk'd!</code></pre>
<p>We are not encouraging you to put Easter Eggs in your programs; this
is just an exercise.</p>
</body>
</html>
<?php if ( file_exists("../bookfoot.php") ) {
$HTML_FILE = basename(__FILE__);
$HTML = ob_get_contents();
ob_end_clean();
require_once "../bookfoot.php";
}?>