Re: default charset confusion

Date: Mon, 12 Mar 2012 10:05:03 +0000
Subject: Re: default charset confusion
Groups: php.internals
Hi

I think the following PHP 5.4.0 NEWS entry is misleading.

  . Changed default value of "default_charset" php.ini option from ISO-8859-1 to
    UTF-8. (Rasmus)

I thought default_charset had become UTF-8, so I was expecting the
following HTTP header.

Content-Type: text/html; charset=UTF-8

However, I got no charset at all (the 'charset=UTF-8' part was missing).
So I looked at the source and found this line in SAPI.h (line 293):

#define SAPI_DEFAULT_CHARSET        ""

This empty string should be "UTF-8", shouldn't it?

BTW, an empty charset in the HTTP header does not mean the default will
be ISO-8859-1; it lets the browser guess which encoding is used.
Guessing the encoding may cause XSS under certain conditions.
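
To illustrate why an empty default leaves the header without a charset, here is a minimal sketch in C. It is not the actual SAPI code: build_content_type() and SKETCH_DEFAULT_CHARSET are hypothetical names, and the only assumption is that the charset parameter is appended when the configured value is non-empty.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the SAPI_DEFAULT_CHARSET value quoted above. */
#define SKETCH_DEFAULT_CHARSET ""

/*
 * Hypothetical helper, not part of PHP's API: writes a Content-Type value
 * into buf. The "; charset=..." parameter is appended only when charset is
 * non-NULL and non-empty. Returns what snprintf() returns.
 */
static int build_content_type(char *buf, size_t size,
                              const char *mimetype, const char *charset)
{
    assert(buf != NULL && size > 0 && mimetype != NULL);

    if (charset != NULL && charset[0] != '\0') {
        return snprintf(buf, size, "%s; charset=%s", mimetype, charset);
    }
    /* Empty default: no charset parameter, so the browser has to guess. */
    return snprintf(buf, size, "%s", mimetype);
}
```

With SKETCH_DEFAULT_CHARSET as the charset argument, the result is just "text/html"; passing "UTF-8" instead gives the header I was expecting.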


Anyway, I was curious, so I checked ext/standard/html.c and found:

/* {{{ entity_charset determine_charset
 * returns the charset identifier based on current locale or a hint.
 * defaults to UTF-8 */
static enum entity_charset determine_charset(char *charset_hint TSRMLS_DC)
{
	int i;
	enum entity_charset charset = cs_utf_8;
	int len = 0;
	const zend_encoding *zenc;

	/* Default is now UTF-8 */
	if (charset_hint == NULL)
		return cs_utf_8;


There are 2 problems.

 - php.ini's default_charset should be UTF-8.
 - determine_charset() should not blindly default to UTF-8 when there
   is no hint.

The old htmlentities/htmlspecialchars actually determined the charset from
default_charset/mbstring.internal_encoding/etc. I think the old behavior
is better than the current one.

How about making determine_charset() behave as it did in 5.3 and setting
SAPI_DEFAULT_CHARSET to "UTF-8"?

Then PHP will behave as NEWS says: the htmlentities/htmlspecialchars
default encoding becomes 'UTF-8', and users will have control over the
default htmlentities/htmlspecialchars encoding.
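
The lookup proposed here could be sketched roughly as follows. This is only an illustration, not the 5.3 code: struct charset_sources, is_set() and resolve_charset() are hypothetical names, and the order (explicit hint, then default_charset, then mbstring.internal_encoding, then "UTF-8") is an assumption taken from the order in which the settings are listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the ini settings mentioned in this message. */
struct charset_sources {
    const char *default_charset;            /* php.ini default_charset */
    const char *mbstring_internal_encoding; /* mbstring.internal_encoding */
};

/* A setting counts as "set" only when it is non-NULL and non-empty. */
static int is_set(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/*
 * Hypothetical resolver: an explicit hint wins, then the ini settings are
 * consulted in the assumed order, and "UTF-8" is used only as a last resort.
 */
static const char *resolve_charset(const char *hint,
                                   const struct charset_sources *src)
{
    assert(src != NULL);

    if (is_set(hint)) {
        return hint;
    }
    if (is_set(src->default_charset)) {
        return src->default_charset;
    }
    if (is_set(src->mbstring_internal_encoding)) {
        return src->mbstring_internal_encoding;
    }
    return "UTF-8";
}
```

With a chain like this, a site that sets default_charset to something other than UTF-8 would get that value when no hint is given, while an unconfigured installation would still end up with UTF-8.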

Regards,

--
Yasuo Ohgaki
[email protected]

