
Commit 81e85d3

adding class notes and beginning of unicode lab
1 parent cd036a2 commit 81e85d3

4 files changed, +140 -0 lines changed
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
Notes - 12/9/2014
====================

Unicode and the Persistence of Serialization
---

Projects due at the end of this week (Friday)!

Everything is bytes <-- if it's stored on a disk or sent over a network, it's bytes

Unicode makes it easier to deal with the text that lives in those bytes

Unicode used to fit everything into a two-byte integer (65,536 chars); it has since outgrown that

Variety of encodings -> an encoding is a way of going between the canonical identity of a character (its code point) and how it's stored as bytes
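
To make the idea of "many encodings for one character" concrete, here is a minimal sketch. It assumes Python 3, where `str` plays the role of the Py2 `unicode` object, and the helper name `encoded_forms` is made up for this example.

```python
def encoded_forms(char, encodings=("utf-8", "utf-16-be", "utf-32-be")):
    """Return {encoding name: bytes} for one character."""
    return {name: char.encode(name) for name in encodings}


# U+00E9 is LATIN SMALL LETTER E WITH ACUTE: one character, several byte forms.
forms = encoded_forms(u"\u00e9")
for name, raw in sorted(forms.items()):
    print(name, len(raw), repr(raw))
```

The character is the same in every row; only the stored bytes (and how many of them there are) change with the encoding.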

Py2 strings are sequences of bytes - unicode strings are sequences of platonic characters

Platonic characters cannot be written to disk or sent over a network until they are encoded as bytes
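
A small sketch of why characters need an encoding before they can be stored. `io.BytesIO` stands in for a file or socket here, which is an assumption made only to keep the example self-contained.

```python
import io

text = u"\u2654 king"      # text: a sequence of characters
buffer = io.BytesIO()      # stands in for a disk file or socket: bytes only

try:
    buffer.write(text)     # characters have no byte form yet
except TypeError as exc:
    error = exc
else:
    error = None

buffer.write(text.encode("utf-8"))   # encoding picks a concrete byte form
stored = buffer.getvalue()
print(repr(error), repr(stored))
```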


Python 2 has both str and unicode

Two ways to work with binary data:

str (bytes() is just another name for it in Py2) and bytearray
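
The practical difference between the two is mutability. A short sketch, using a `b''` literal so it reads the same in Python 2 and Python 3:

```python
raw = b"abc"              # immutable byte string (plain str in Py2)
buf = bytearray(raw)      # mutable copy of the same bytes

buf[0] = ord("A")         # in-place edit works on a bytearray
buf.extend(b"!")

try:
    raw[0] = ord("A")     # byte strings do not support item assignment
    immutable = False
except TypeError:
    immutable = True

print(bytes(buf), immutable)
```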

In Python 3, bytes and strings are completely different!

The unicode object lets you work with characters - it has all the same methods as the string object

Encoding is converting from a unicode object to bytes

Decoding is converting from bytes to a unicode object
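
A round trip through both directions, as a hedged sketch (the sample phrase is borrowed from the lab text; the variable names are illustrative). It also shows what happens when the bytes are decoded with the wrong codec.

```python
text = u"\u00e7a ne me fait pas mal"      # starts with c-cedilla

data = text.encode("utf-8")               # encoding: characters -> bytes
round_trip = data.decode("utf-8")         # decoding: bytes -> characters

# The wrong codec either fails outright or silently produces mojibake.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

mojibake = data.decode("latin-1")         # no error, but the wrong characters
print(round_trip == text, ascii_ok, mojibake != text)
```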


import codecs
# encoding and decoding stuff

codecs.encode()
codecs.decode()
codecs.open() # better to use io.open
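
A sketch of those calls in use. The temporary file path is an assumption made so the example can run anywhere; it is not part of the lab.

```python
import codecs
import io
import os
import tempfile

text = u"Vitrum edere possum; mihi non nocet. \u2654"

# Module-level helpers, equivalent to text.encode(...) and data.decode(...)
data = codecs.encode(text, "utf-8")
same_text = codecs.decode(data, "utf-8")

# io.open takes an encoding and does the decode/encode at the file boundary.
path = os.path.join(tempfile.mkdtemp(), "glass.txt")   # throwaway location
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)
with io.open(path, "r", encoding="utf-8") as f:
    read_back = f.read()

print(same_text == text, read_back == text)
```

`io.open` is the same function as the built-in `open` in Python 3, which is one reason to prefer it over `codecs.open` in code that has to run on both versions.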

Use Unicode in your source files - declare the encoding at the top:

#-*- coding: utf-8 -*-
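
A minimal sketch of a source file that relies on the declaration. Python 2 refuses to compile a file containing non-ASCII literals without it; Python 3 already assumes UTF-8. The variable names are illustrative.

```python
# -*- coding: utf-8 -*-
phrase = u"Québécois"                    # non-ASCII literal typed directly in the source

n_chars = len(phrase)                    # counted in characters
n_bytes = len(phrase.encode("utf-8"))    # counted in bytes once encoded
print(n_chars, n_bytes)
```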

The Trick in Using Unicode - Be Consistent:

Always unicode, never Python byte strings (str)

Do the decoding when you input your data

Decode on input

Encode on output
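
The decode-on-input, encode-on-output rule can be sketched as a single function with bytes at both edges and text in the middle. The name `shout` and the sample phrase are made up for this example.

```python
def shout(raw_bytes, encoding="utf-8"):
    """Decode at the edge, work on text, encode at the other edge."""
    text = raw_bytes.decode(encoding)     # decode on input
    text = text.upper()                   # all processing happens on characters
    return text.encode(encoding)          # encode on output


incoming = u"je peux manger du verre, \u00e7a ne me fait pas mal".encode("utf-8")
outgoing = shout(incoming)

# Working on the raw bytes instead only touches the ASCII letters.
naive = incoming.upper()
print(outgoing != naive)
```

Uppercasing the bytes directly leaves the two-byte c-cedilla untouched, which is the kind of quiet bug the rule is meant to prevent.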


Get the default encoding - sys.getdefaultencoding()
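
For reference, a quick check of that value. It is the codec the interpreter falls back on for implicit conversions between text and bytes.

```python
import sys

default_codec = sys.getdefaultencoding()
print(default_codec)   # 'ascii' on a stock Python 2, 'utf-8' on Python 3
```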


from __future__ import unicode_literals # <---- after this import, u'' is assumed for every string literal in the module!

- be aware that you can still get Python 2 byte strings from other places...
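
Because byte strings can still arrive from other modules, files, or sockets, one defensive pattern is to normalize values at the point where they enter your code. This is only a sketch; the helper name `to_text` and the UTF-8 default are assumptions, not something from the class.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it arrived as bytes."""
    if isinstance(value, bytes):      # bytes is the same type as str in Py2
        return value.decode(encoding)
    return value


from_elsewhere = b"caf\xc3\xa9"       # e.g. handed over by a library that returns bytes
already_text = u"caf\u00e9"
print(to_text(from_elsewhere) == to_text(already_text))
```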

JSON Requires UTF-8!
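
A hedged sketch of producing and consuming UTF-8 JSON with the standard `json` module, assuming Python 3. The `record` dictionary is sample data invented for the example.

```python
import json

record = {u"phrase": u"P\u00f2di manjar de veire"}

# ensure_ascii=False keeps the real characters; the explicit encode makes it UTF-8.
payload = json.dumps(record, ensure_ascii=False).encode("utf-8")
restored = json.loads(payload.decode("utf-8"))

# The default (ensure_ascii=True) sidesteps the question with \uXXXX escapes.
escaped = json.dumps(record)
print(restored == record, escaped)
```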

In Python 3, all strings are unicode

Py3 has two distinct concepts:

text - uses str object
binary data - uses bytes
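
A short Python 3 sketch of how strictly the two are kept apart (this one does not behave the same way on Python 2, where the mix is converted implicitly):

```python
text = "glass"             # str: text
data = b"glass"            # bytes: binary data

same = (text == data)      # the two types never compare equal

try:
    text + data            # mixing them is an error, not an implicit conversion
    mixed = True
except TypeError:
    mixed = False

joined = text + data.decode("ascii")   # convert explicitly, then combine
print(same, mixed, joined)
```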
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
I Can Eat Glass:

And from the sublime to the ridiculous, here is a certain phrase in an assortment of languages:

Sanskrit: काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥

Sanskrit (standard transcription): kācaṃ śaknomyattum; nopahinasti mām.

Classical Greek: ὕαλον ϕαγεῖν δύναμαι· τοῦτο οὔ με βλάπτει.

Greek (monotonic): Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.

Greek (polytonic): Μπορῶ νὰ φάω σπασμένα γυαλιὰ χωρὶς νὰ πάθω τίποτα.

Latin: Vitrum edere possum; mihi non nocet.

Old French: Je puis mangier del voirre. Ne me nuit.

French: Je peux manger du verre, ça ne me fait pas mal.

Provençal / Occitan: Pòdi manjar de veire, me nafrariá pas.

Québécois: J'peux manger d'la vitre, ça m'fa pas mal.
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
#!/usr/bin/python
from __future__ import print_function

import io

path = '/home/schuyler/PythonFolder/session10/ICanEatGlass.utf81.txt'

unicode_string = u'bananas'

unicode_chess_piece = u'\u2654'  # WHITE CHESS KING

print(unicode_chess_piece)

#print(unicode_string)

# 'rw' is not a valid mode; 'r+' opens the file for reading and writing.
# io.open decodes on read and encodes on write, so no manual .encode() is needed.
with io.open(path, 'r+', encoding='utf-8') as f:
    print(f.read())

    f.write(unicode_chess_piece)  # the position is at end of file, so this appends

    f.seek(0)  # rewind; otherwise the second read() returns an empty string
    print(f.read())
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
#!/usr/bin/python
from __future__ import print_function

import io

path = '/home/schuyler/PythonFolder/session10/ICanEatGlass.utf81.txt'

unicode_string = u'bananas'

unicode_chess_piece = u'\u2654'  # WHITE CHESS KING

print(unicode_chess_piece)

#print(unicode_string)

# 'rw' is not a valid mode; 'r+' opens the file for reading and writing.
# io.open decodes on read and encodes on write, so no manual .encode() is needed.
with io.open(path, 'r+', encoding='utf-8') as f:
    print(f.read())

    f.write(unicode_chess_piece)  # the position is at end of file, so this appends

    f.seek(0)  # rewind; otherwise the second read() returns an empty string
    print(f.read())
