Skip to content

Commit 64d109e

Browse files
committed
Significantly improved internal encoding heuristics and support.
* Default Encoding.default_internal to UTF-8 * Eliminated the use of file-wide magic comments to coerce code evaluated inside the file * Read templates as BINARY, use default_external or template-wide magic comments inside the Template to set the initial encoding * This means that template handlers in Ruby 1.9 will receive Strings encoded in default_internal (UTF-8 by default) * Create a better Exception for encoding issues, and use it when the template source has bytes that are not compatible with the specified encoding * Allow template handlers to opt-into handling BINARY. If they do so, they need to do some of their own manual encoding work * Added a "Configuration Gotchas" section to the intro Rails Guide instructing users to use UTF-8 for everything * Use config.encoding= in Ruby 1.8, and raise if a value that is an invalid $KCODE value is used Also: * Fixed a few tests that were assert() rather than assert_equal() and were caught by Minitest requiring a String for the message * Fixed a test where an assert_select was misformed, also caught by Minitest being more restrictive * Fixed a test where a Rack response was returning a String rather than an Enumerable
1 parent af0d1a8 commit 64d109e

File tree

19 files changed

+417
-54
lines changed

19 files changed

+417
-54
lines changed

actionpack/lib/action_view.rb

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -51,15 +51,17 @@ module ActionView
5151

5252
autoload :MissingTemplate, 'action_view/template/error'
5353
autoload :ActionViewError, 'action_view/template/error'
54-
autoload :TemplateError, 'action_view/template/error'
54+
autoload :EncodingError, 'action_view/template/error'
55+
autoload :TemplateError, 'action_view/template/error'
56+
autoload :WrongEncodingError, 'action_view/template/error'
5557

5658
autoload :TemplateHandler, 'action_view/template'
5759
autoload :TemplateHandlers, 'action_view/template'
5860
end
5961

6062
autoload :TestCase, 'action_view/test_case'
6163

62-
ENCODING_FLAG = "#.*coding[:=]\s*(\S+)[ \t]*"
64+
ENCODING_FLAG = '#.*coding[:=]\s*(\S+)[ \t]*'
6365
end
6466

6567
require 'active_support/i18n'

actionpack/lib/action_view/template.rb

Lines changed: 170 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,89 @@
1-
# encoding: utf-8
2-
# This is so that templates compiled in this file are UTF-8
31
require 'active_support/core_ext/array/wrap'
42
require 'active_support/core_ext/object/blank'
3+
require 'active_support/core_ext/kernel/singleton_class'
54

65
module ActionView
76
class Template
87
extend ActiveSupport::Autoload
98

9+
# === Encodings in ActionView::Template
10+
#
11+
# ActionView::Template is one of a few sources of potential
12+
# encoding issues in Rails. This is because the source for
13+
# templates are usually read from disk, and Ruby (like most
14+
# encoding-aware programming languages) assumes that the
15+
# String retrieved through File IO is encoded in the
16+
# <tt>default_external</tt> encoding. In Rails, the default
17+
# <tt>default_external</tt> encoding is UTF-8.
18+
#
19+
# As a result, if a user saves their template as ISO-8859-1
20+
# (for instance, using a non-Unicode-aware text editor),
21+
# and uses characters outside of the ASCII range, their
22+
# users will see diamonds with question marks in them in
23+
# the browser.
24+
#
25+
# To mitigate this problem, we use a few strategies:
26+
# 1. If the source is not valid UTF-8, we raise an exception
27+
# when the template is compiled to alert the user
28+
# to the problem.
29+
# 2. The user can specify the encoding using Ruby-style
30+
# encoding comments in any template engine. If such
31+
# a comment is supplied, Rails will apply that encoding
32+
# to the resulting compiled source returned by the
33+
# template handler.
34+
# 3. In all cases, we transcode the resulting String to
35+
# the <tt>default_internal</tt> encoding (which defaults
36+
# to UTF-8).
37+
#
38+
# This means that other parts of Rails can always assume
39+
# that templates are encoded in UTF-8, even if the original
40+
# source of the template was not UTF-8.
41+
#
42+
# From a user's perspective, the easiest thing to do is
43+
# to save your templates as UTF-8. If you do this, you
44+
# do not need to do anything else for things to "just work".
45+
#
46+
# === Instructions for template handlers
47+
#
48+
# The easiest thing for you to do is to simply ignore
49+
# encodings. Rails will hand you the template source
50+
# as the default_internal (generally UTF-8), raising
51+
# an exception for the user before sending the template
52+
# to you if it could not determine the original encoding.
53+
#
54+
# For the greatest simplicity, you can support only
55+
# UTF-8 as the <tt>default_internal</tt>. This means
56+
# that from the perspective of your handler, the
57+
# entire pipeline is just UTF-8.
58+
#
59+
# === Advanced: Handlers with alternate metadata sources
60+
#
61+
# If you want to provide an alternate mechanism for
62+
# specifying encodings (like ERB does via <%# encoding: ... %>),
63+
# you may indicate that you are willing to accept
64+
# BINARY data by implementing <tt>self.accepts_binary?</tt>
65+
# on your handler.
66+
#
67+
# If you do, Rails will not raise an exception if
68+
# the template's encoding could not be determined,
69+
# assuming that you have another mechanism for
70+
# making the determination.
71+
#
72+
# In this case, make sure you return a String from
73+
# your handler encoded in the default_internal. Since
74+
# you are handling out-of-band metadata, you are
75+
# also responsible for alerting the user to any
76+
# problems with converting the user's data to
77+
# the default_internal.
78+
#
79+
# To do so, simply raise the raise WrongEncodingError
80+
# as follows:
81+
#
82+
# raise WrongEncodingError.new(
83+
# problematic_string,
84+
# expected_encoding
85+
# )
86+
1087
eager_autoload do
1188
autoload :Error
1289
autoload :Handler
@@ -16,26 +93,22 @@ class Template
1693

1794
extend Template::Handlers
1895

19-
attr_reader :source, :identifier, :handler, :virtual_path, :formats
96+
attr_reader :source, :identifier, :handler, :virtual_path, :formats,
97+
:original_encoding
2098

21-
Finalizer = proc do |method_name|
99+
Finalizer = proc do |method_name, mod|
22100
proc do
23-
ActionView::CompiledTemplates.module_eval do
101+
mod.module_eval do
24102
remove_possible_method method_name
25103
end
26104
end
27105
end
28106

29107
def initialize(source, identifier, handler, details)
30-
if source.encoding_aware? && source =~ %r{\A#{ENCODING_FLAG}}
31-
# don't snip off the \n to preserve line numbers
32-
source.sub!(/\A[^\n]*/, '')
33-
source.force_encoding($1).encode
34-
end
35-
36-
@source = source
37-
@identifier = identifier
38-
@handler = handler
108+
@source = source
109+
@identifier = identifier
110+
@handler = handler
111+
@original_encoding = nil
39112

40113
@virtual_path = details[:virtual_path]
41114
@method_names = {}
@@ -48,15 +121,21 @@ def render(view, locals, &block)
48121
# Notice that we use a bang in this instrumentation because you don't want to
49122
# consume this in production. This is only slow if it's being listened to.
50123
ActiveSupport::Notifications.instrument("!render_template.action_view", :virtual_path => @virtual_path) do
51-
method_name = compile(locals, view)
124+
if view.is_a?(ActionView::CompiledTemplates)
125+
mod = ActionView::CompiledTemplates
126+
else
127+
mod = view.singleton_class
128+
end
129+
130+
method_name = compile(locals, view, mod)
52131
view.send(method_name, locals, &block)
53132
end
54133
rescue Exception => e
55134
if e.is_a?(Template::Error)
56135
e.sub_template_of(self)
57136
raise e
58137
else
59-
raise Template::Error.new(self, view.assigns, e)
138+
raise Template::Error.new(self, view.respond_to?(:assigns) ? view.assigns : {}, e)
60139
end
61140
end
62141

@@ -81,37 +160,97 @@ def inspect
81160
end
82161

83162
private
84-
def compile(locals, view)
163+
# Among other things, this method is responsible for properly setting
164+
# the encoding of the source. Until this point, we assume that the
165+
# source is BINARY data. If no additional information is supplied,
166+
# we assume the encoding is the same as Encoding.default_external.
167+
#
168+
# The user can also specify the encoding via a comment on the first
169+
# line of the template (# encoding: NAME-OF-ENCODING). This will work
170+
# with any template engine, as we process out the encoding comment
171+
# before passing the source on to the template engine, leaving a
172+
# blank line in its stead.
173+
#
174+
# Note that after we figure out the correct encoding, we then
175+
# encode the source into Encoding.default_internal. In general,
176+
# this means that templates will be UTF-8 inside of Rails,
177+
# regardless of the original source encoding.
178+
def compile(locals, view, mod)
85179
method_name = build_method_name(locals)
86180
return method_name if view.respond_to?(method_name)
87181

88182
locals_code = locals.keys.map! { |key| "#{key} = local_assigns[:#{key}];" }.join
89183

90-
code = @handler.call(self)
91-
if code.sub!(/\A(#.*coding.*)\n/, '')
92-
encoding_comment = $1
93-
elsif defined?(Encoding) && Encoding.respond_to?(:default_external)
94-
encoding_comment = "#coding:#{Encoding.default_external}"
184+
if source.encoding_aware?
185+
if source.sub!(/\A#{ENCODING_FLAG}/, '')
186+
encoding = $1
187+
else
188+
encoding = Encoding.default_external
189+
end
190+
191+
# Tag the source with the default external encoding
192+
# or the encoding specified in the file
193+
source.force_encoding(encoding)
194+
195+
# If the original encoding is BINARY, the actual
196+
# encoding is either stored out-of-band (such as
197+
# in ERB <%# %> style magic comments) or missing.
198+
# This is also true if the original encoding is
199+
# something other than BINARY, but it's invalid.
200+
if source.encoding != Encoding::BINARY && source.valid_encoding?
201+
source.encode!
202+
# If the assumed encoding is incorrect, check to
203+
# see whether the handler accepts BINARY. If it
204+
# does, it has another mechanism for determining
205+
# the true encoding of the String.
206+
elsif @handler.respond_to?(:accepts_binary?) && @handler.accepts_binary?
207+
source.force_encoding(Encoding::BINARY)
208+
# If the handler does not accept BINARY, the
209+
# assumed encoding (either the default_external,
210+
# or the explicit encoding specified by the user)
211+
# is incorrect. We raise an exception here.
212+
else
213+
raise WrongEncodingError.new(source, encoding)
214+
end
215+
216+
# Don't validate the encoding yet -- the handler
217+
# may treat the String as raw bytes and extract
218+
# the encoding some other way
95219
end
96220

221+
code = @handler.call(self)
222+
97223
source = <<-end_src
98224
def #{method_name}(local_assigns)
99-
_old_virtual_path, @_virtual_path = @_virtual_path, #{@virtual_path.inspect};_old_output_buffer = output_buffer;#{locals_code};#{code}
225+
_old_virtual_path, @_virtual_path = @_virtual_path, #{@virtual_path.inspect};_old_output_buffer = @output_buffer;#{locals_code};#{code}
100226
ensure
101-
@_virtual_path, self.output_buffer = _old_virtual_path, _old_output_buffer
227+
@_virtual_path, @output_buffer = _old_virtual_path, _old_output_buffer
102228
end
103229
end_src
104230

105-
if encoding_comment
106-
source = "#{encoding_comment}\n#{source}"
107-
line = -1
108-
else
109-
line = 0
231+
if source.encoding_aware?
232+
# Handlers should return their source Strings in either the
233+
# default_internal or BINARY. If the handler returns a BINARY
234+
# String, we assume its encoding is the one we determined
235+
# earlier, and encode the resulting source in the default_internal.
236+
if source.encoding == Encoding::BINARY
237+
source.force_encoding(Encoding.default_internal)
238+
end
239+
240+
# In case we get back a String from a handler that is not in
241+
# BINARY or the default_internal, encode it to the default_internal
242+
source.encode!
243+
244+
# Now, validate that the source we got back from the template
245+
# handler is valid in the default_internal
246+
unless source.valid_encoding?
247+
raise WrongEncodingError.new(@source, Encoding.default_internal)
248+
end
110249
end
111250

112251
begin
113-
ActionView::CompiledTemplates.module_eval(source, identifier, line)
114-
ObjectSpace.define_finalizer(self, Finalizer[method_name])
252+
mod.module_eval(source, identifier, 0)
253+
ObjectSpace.define_finalizer(self, Finalizer[method_name, mod])
115254

116255
method_name
117256
rescue Exception => e # errors from template code

actionpack/lib/action_view/template/error.rb

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,24 @@ module ActionView
44
class ActionViewError < StandardError #:nodoc:
55
end
66

7+
class EncodingError < StandardError #:nodoc:
8+
end
9+
10+
class WrongEncodingError < EncodingError #:nodoc:
11+
def initialize(string, encoding)
12+
@string, @encoding = string, encoding
13+
end
14+
15+
def message
16+
"Your template was not saved as valid #{@encoding}. Please " \
17+
"either specify #{@encoding} as the encoding for your template " \
18+
"in your text editor, or mark the template with its " \
19+
"encoding by inserting the following as the first line " \
20+
"of the template:\n\n# encoding: <name of correct encoding>.\n\n" \
21+
"The source of your template was:\n\n#{@string}"
22+
end
23+
end
24+
725
class MissingTemplate < ActionViewError #:nodoc:
826
attr_reader :path
927

actionpack/lib/action_view/template/handlers/erb.rb

Lines changed: 42 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,11 @@
55

66
module ActionView
77
class OutputBuffer < ActiveSupport::SafeBuffer
8+
def initialize(*)
9+
super
10+
encode!
11+
end
12+
813
def <<(value)
914
super(value.to_s)
1015
end
@@ -72,16 +77,50 @@ class ERB < Handler
7277
cattr_accessor :erb_implementation
7378
self.erb_implementation = Erubis
7479

75-
ENCODING_TAG = Regexp.new("\A(<%#{ENCODING_FLAG}-?%>)[ \t]*")
80+
ENCODING_TAG = Regexp.new("\\A(<%#{ENCODING_FLAG}-?%>)[ \\t]*")
81+
82+
def self.accepts_binary?
83+
true
84+
end
7685

7786
def compile(template)
78-
erb = template.source.gsub(ENCODING_TAG, '')
87+
if template.source.encoding_aware?
88+
# Even though Rails has given us a String tagged with the
89+
# default_internal encoding (likely UTF-8), it is possible
90+
# that the String is actually encoded using a different
91+
# encoding, specified via an ERB magic comment. If the
92+
# String is not actually UTF-8, the regular expression
93+
# engine will (correctly) raise an exception. For now,
94+
# we'll reset the String to BINARY so we can run regular
95+
# expressions against it
96+
template_source = template.source.dup.force_encoding("BINARY")
97+
98+
# Erubis does not have direct support for encodings.
99+
# As a result, we will extract the ERB-style magic
100+
# comment, give the String to Erubis as BINARY data,
101+
# and then tag the resulting String with the extracted
102+
# encoding later
103+
erb = template_source.gsub(ENCODING_TAG, '')
104+
encoding = $2
105+
106+
if !encoding && (template.source.encoding == Encoding::BINARY)
107+
raise WrongEncodingError.new(template_source, Encoding.default_external)
108+
end
109+
end
110+
79111
result = self.class.erb_implementation.new(
80112
erb,
81113
:trim => (self.class.erb_trim_mode == "-")
82114
).src
83115

84-
result = "#{$2}\n#{result}" if $2
116+
# If an encoding tag was found, tag the String
117+
# we're returning with that encoding. Otherwise,
118+
# return a BINARY String, which is what ERB
119+
# returns. Note that if a magic comment was
120+
# not specified, we will return the data to
121+
# Rails as BINARY, which will then use its
122+
# own encoding logic to create a UTF-8 String.
123+
result = "\n#{result}".force_encoding(encoding).encode if encoding
85124
result
86125
end
87126
end

actionpack/lib/action_view/template/resolver.rb

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,10 @@ def query(path, exts, formats)
7070

7171
Dir[query].reject { |p| File.directory?(p) }.map do |p|
7272
handler, format = extract_handler_and_format(p, formats)
73-
Template.new(File.read(p), File.expand_path(p), handler,
73+
74+
contents = File.open(p, "rb") {|io| io.read }
75+
76+
Template.new(contents, File.expand_path(p), handler,
7477
:virtual_path => path, :format => format)
7578
end
7679
end

actionpack/test/abstract_unit.rb

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,10 @@
1212

1313
ENV['TMPDIR'] = File.join(File.dirname(__FILE__), 'tmp')
1414

15+
if defined?(Encoding.default_internal)
16+
Encoding.default_internal = "UTF-8"
17+
end
18+
1519
require 'test/unit'
1620
require 'abstract_controller'
1721
require 'action_controller'

0 commit comments

Comments
 (0)