<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Where&#039;s Walden? &#187; regular expression</title>
	<atom:link href="http://whereswalden.com/tag/regular-expression/feed/" rel="self" type="application/rss+xml" />
	<link>http://whereswalden.com</link>
	<description>Mozilla, politics, economics, law, backpacking, cycling, and other random desiderata</description>
	<lastBuildDate>Wed, 25 Jan 2012 18:17:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>JavaScript change in Firefox 5 (not 4), and in other browsers: regular expressions can&#8217;t be called like functions</title>
		<link>http://whereswalden.com/2011/03/06/javascript-change-in-firefox-5-not-4-and-in-other-browsers-regular-expressions-cant-be-called-like-functions/</link>
		<comments>http://whereswalden.com/2011/03/06/javascript-change-in-firefox-5-not-4-and-in-other-browsers-regular-expressions-cant-be-called-like-functions/#comments</comments>
		<pubDate>Mon, 07 Mar 2011 01:44:08 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[callable]]></category>
		<category><![CDATA[ecmascript]]></category>
		<category><![CDATA[es5]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[js]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[regular expression]]></category>
		<category><![CDATA[spidermonkey]]></category>

		<guid isPermaLink="false">http://whereswalden.com/?p=2829</guid>
		<description><![CDATA[Callable regular expressions Way back in the day when Netscape implemented regular expressions in JavaScript, it made them callable. If you slapped an argument list after a regular expression, it&#8217;d act as if you called RegExp.prototype.exec on it with the provided arguments. var r = /abc/, res; res = r("abc"); assert(res.length === 1); res = [...]]]></description>
			<content:encoded><![CDATA[<h2>Callable regular expressions</h2>
<p>Way back in the day when Netscape implemented regular expressions in JavaScript, it made them callable.  If you slapped an argument list after a regular expression, it&#8217;d act as if you called <a href="https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/RegExp/exec"><code>RegExp.prototype.exec</code></a> on it with the provided arguments.</p>
<pre class="code" data-language="javascript">
var r = /abc/, res;

res = r("abc");
assert(res.length === 1);

res = r("def");
assert(res === null);
</pre>
<p>Why?  Beats me.  I&#8217;d have thought <code>.exec</code> was easy enough to type and clearer to boot, myself.  Hopefully readers familiar with the history can explain in comments.</p>
<h2>Problems</h2>
<p>Callable regular expressions present one immediate problem to a &#8220;naive&#8221; implementation: their behavior with <a href="https://developer.mozilla.org/en/JavaScript/Reference/Operators/Special/typeof"><code>typeof</code></a>.  According to ECMAScript, the <code>typeof</code> for any object which is callable should be <code>"function"</code>, and Netscape and Mozilla for a long time faithfully implemented this.  This tended to cause <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=61911">much confusion</a> in practice, so browsers that implemented callable regular expressions eventually changed <code>typeof</code> to arguably &#8220;lie&#8221; for regular expressions and return <code>"object"</code>.  In SpiderMonkey the &#8220;fix&#8221; was an utterly inelegant hack which distinguished callables as either regular expressions or not, to determine <code>typeof</code> behavior.</p>
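<p>As a rough sketch of what <code>typeof</code> reports in an engine where regular expressions are not callable (the variable names are illustrative only, and <code>console.log</code> stands in for <code>print</code>), with a check that does not depend on <code>typeof</code> at all:</p>

```javascript
// Assumes an engine in which regular expressions are not callable.
var re = /abc/;
var fn = function () {};

var typeOfRegExp = typeof re;    // "object"
var typeOfFunction = typeof fn;  // "function"

// A check that sidesteps typeof entirely: the object's class name.
var classOfRegExp = Object.prototype.toString.call(re);  // "[object RegExp]"

console.log(typeOfRegExp, typeOfFunction, classOfRegExp);
```

<p>Code that must also run in engines with callable regular expressions may be safer relying on the class-name check than on <code>typeof</code>.</p>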
<p>Past this, callable regular expressions complicate implementing callability and optimizations of it.  Implementations supporting getters and setters (once purely as an extension, now standardized in <abbr title="ECMAScript 5th edition">ES5</abbr>) must consider the case where the getter or setter is a regular expression and do something appropriate.  And of course they must handle regular old calls, unqualified (<code>/a/()</code>) and qualified (<code>({ p: /a/ }).p()</code>) both.  Mozilla&#8217;s had a solid trickle of bugs involving callable regular expressions, almost always filed as a result of <a href="http://squarefree.com/">Jesse</a>&#8217;s evil fuzzers (and not due to actual sites breaking).</p>
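<p>For comparison, here is a minimal sketch of how those cases behave once regular expressions are not callable, assuming an ES5-or-later engine.  The variable names are made up for the example, and <code>console.log</code> stands in for <code>print</code>:</p>

```javascript
// Assumes an ES5-or-later engine in which regular expressions are not callable.
var getterError = null;
try {
  // ES5 requires a getter to be callable (or undefined), so this throws.
  Object.defineProperty({}, "p", { get: /a/ });
} catch (e) {
  getterError = e;
}

var callError = null;
try {
  // Calling a regular expression stored in a property throws as well.
  ({ p: /a/ }).p();
} catch (e) {
  callError = e;
}

console.log(getterError instanceof TypeError, callError instanceof TypeError);
```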
<p>It&#8217;s also hard to justify callable regular expressions as an extension.  While ECMAScript explicitly permits extensions, it generally prefers extensions to be new methods or properties of existing objects.  Regular expression callability is neither of these: instead it&#8217;s adding an internal hook to regular expressions to make them callable.  This might not technically be contrary to the spec, but it goes against its spirit.</p>
<h2>Regular expressions won&#8217;t be callable in Firefox 5</h2>
<p>No one&#8217;s ever really used callable regular expressions.  They&#8217;re non-standard, not all browsers implement them, and they unnecessarily complicate implementations.  So, in concert with other browser engines like <a href="https://bugs.webkit.org/show_bug.cgi?id=28285">WebKit</a>, we&#8217;re making regular expressions non-callable in Firefox 5.  (Regular expressions are callable in Firefox 4, but of course don&#8217;t rely on this.)</p>
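<p>Any code that does call a regular expression can be ported by spelling out <code>exec</code>.  The following is a sketch of the earlier example rewritten that way; the <code>try</code> block assumes an engine where regular expressions are not callable, and <code>console.log</code> stands in for <code>print</code>:</p>

```javascript
var r = /abc/, res;

// Portable spelling of the old r("abc"): call exec explicitly.
res = r.exec("abc");
var matchedLength = res.length;   // 1: the whole match, no capture groups

res = r.exec("def");
var missed = res === null;        // true: no match

// In engines without callable regular expressions, r("abc") throws.
var threw = false;
try {
  r("abc");
} catch (e) {
  threw = e instanceof TypeError;
}

console.log(matchedLength, missed, threw);
```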
<p>You can experiment with a version of Firefox with these changes by downloading a <a href="http://nightly.mozilla.org/js-preview.html">TraceMonkey nightly build</a>.  Trunk&#8217;s still locked down for Firefox 4, so it won&#8217;t pick up the change until Firefox 4 branches and trunk reopens for changes targeted at the next release.  (Don’t forget to <a href="http://support.mozilla.com/en-US/kb/Managing+profiles">use the profile manager</a> if you want to keep the settings you use with your primary Firefox installation pristine.)</p>
]]></content:encoded>
			<wfw:commentRss>http://whereswalden.com/2011/03/06/javascript-change-in-firefox-5-not-4-and-in-other-browsers-regular-expressions-cant-be-called-like-functions/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>More ES5 backwards-incompatible changes: regular expressions now evaluate to a new object, not the same object, each time they&#8217;re encountered</title>
		<link>http://whereswalden.com/2010/01/15/more-es5-incompatible-changes-regular-expressions-now-evaluate-to-a-new-object-not-the-same-object-each-time-theyre-encountered/</link>
		<comments>http://whereswalden.com/2010/01/15/more-es5-incompatible-changes-regular-expressions-now-evaluate-to-a-new-object-not-the-same-object-each-time-theyre-encountered/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 16:37:36 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ecma-262]]></category>
		<category><![CDATA[ecmascript]]></category>
		<category><![CDATA[es5]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[js]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[regular expression]]></category>
		<category><![CDATA[spidermonkey]]></category>

		<guid isPermaLink="false">http://whereswalden.com/?p=1265</guid>
		<description><![CDATA[(preemptive clarification: coming in Firefox 3.7 and not Firefox 3.6, which is to say, a good half year away from now rather than Real Soon Now) Disjunction: is /foo/ the same object, or a new object, each time it&#8217;s evaluated in ES3? According to ECMA-262 3rd edition, what should this code print? function getRegEx() { [...]]]></description>
			<content:encoded><![CDATA[<p>(preemptive clarification: coming in <strong>Firefox 3.7</strong> and <em>not</em> Firefox 3.6, which is to say, a good half year away from now rather than Real Soon Now)</p>
<h2>Disjunction: is <code>/foo/</code> the same object, or a new object, each time it&#8217;s evaluated in <abbr title="ECMA-262 3rd edition">ES3</abbr>?</h2>
<p>According to ECMA-262 3rd edition, what should this code print?</p>
<pre class="code" data-language="javascript">
function getRegEx() { return /regex/; }
print("getRegEx() === getRegEx(): " + (getRegEx() === getRegEx()));
</pre>
<p>The answer depends upon this question: when a JavaScript regular expression literal is evaluated, does it create a new <code>RegExp</code> object each time, or does it evaluate to the exact same <code>RegExp</code> object each time it&#8217;s evaluated?  Let&#8217;s look at a few examples and make a guess.</p>
<h2>I sense a pattern</h2>
<pre class="code" data-language="javascript">
var tests =
  [
   function getNull() { return null; },
   function getNumber() { return 1; },
   function getString() { return "a"; },
   function getBoolean() { return false; },
   function getObject() { return {}; },
   function getArray() { return []; },
   function getFunction() { return function f(){}; }
  ];

for (var i = 0, sz = tests.length; i &lt; sz; i++)
{
  var t = tests[i];
  print(t.name + "() === " + t.name + "(): " + (t() === t()));
}
</pre>
<p>If you test that code, you&#8217;ll see that the first four results are true, and the rest are false, all per ECMA-262 3rd edition.  (Okay, technically, and bizarrely, ES3 permitted <em>either</em> result for the function case, but no browser ever implemented a result of true; <abbr title="ECMA-262 5th edition">ES5</abbr> acknowledges reality and mandates that the result be false.)  The first four functions return primitive values; the last three return objects.  There&#8217;s only a single instance of any primitive value &mdash; or, alternately, you might say, equality doesn&#8217;t distinguish between different instances of the same primitive.  Therefore it doesn&#8217;t really matter whether primitive literals evaluate to new instances or the same instance.  On the other hand, objects compare equal only if they&#8217;re the <em>same</em> object.  Since the object cases didn&#8217;t compare identically, they must be new objects each time.  This makes sense: if this were not the case, what would happen in the following example?</p>
<pre class="code" data-language="javascript">
function makePoint(x, y)
{
  var pt = {};
  pt.x = x;
  pt.y = y;
  return pt;
}

var pt1 = makePoint(1, 2);
var pt2 = makePoint(3, 4);
</pre>
<p>It would be complete nonsense if the object literal above evaluated to the same object every time it was encountered; the next two lines would blow away the previous point, and we would have <code>pt1.x === 3 &#038;&#038; pt1.y === 4</code>.</p>
<h2>Plausible assertion: regular expression literals evaluate to new objects when encountered?</h2>
<p>Returning to the original question, then, what <em>does</em> ES3 say this code should print?</p>
<pre class="code" data-language="javascript">
function getRegEx() { return /regex/; }
print("getRegEx() === getRegEx(): " + (getRegEx() === getRegEx()));
</pre>
<p>A regular expression is an object.  If you don&#8217;t want to get weird property-poisoning of the sort just suggested, regular expression literals must evaluate to different objects each time they&#8217;re encountered, right?</p>
<h2>Alternative: ES3 says <code>/foo/</code> is the same object every time</h2>
<p>Wrong.  According to ES3, there&#8217;s only a single object for each regular expression literal that&#8217;s returned each time the literal is encountered:</p>
<blockquote><p>A regular expression literal is an input element that is converted to a RegExp object (section 15.10) when it is scanned.  The object is created before evaluation of the containing program or function begins.  Evaluation of the literal produces a reference to that object; it does not create a new object.</p>
</blockquote>
<div class="attribution">ECMA-262, 3rd ed. 7.8.5 Regular Expression Literals</div>
<p>This was originally a dubious optimization in the standard to avoid the &#8220;costly&#8221; creation of a regular expression object every time a literal would be encountered.  It&#8217;s perhaps a little surprising that the same object is returned each time, but does it make a difference in real programs not written to demonstrate the quirk?  Often it doesn&#8217;t matter.  As a simple example, <code>if (/^\d+$/.test(str)) { /* ... */ }</code> executes identically either way, assuming <code>RegExp.prototype.test</code> is unmodified.  The <code>RegExp</code> never escapes, and its use doesn&#8217;t depend on mutable state, so creating new objects each time doesn&#8217;t make a difference (other than negligibly, in speed).</p>
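<p>A small sketch of that harmless case follows.  <code>isAllDigits</code> is a hypothetical helper name invented for the example, and <code>console.log</code> stands in for <code>print</code>:</p>

```javascript
// isAllDigits is a hypothetical helper name used only for this sketch.
function isAllDigits(str)
{
  // Without the global flag, matching always starts at index 0,
  // whatever lastIndex happens to hold.
  return /^\d+$/.test(str);
}

var results = [isAllDigits("12345"), isAllDigits("12345"), isAllDigits("12a45")];
console.log(results.join(", "));
```

<p>Repeated calls give the same answer for the same input whether the literal is one shared object or a new object each time.</p>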
<p>Sometimes, however, the shared-object misoptimization does matter meaningfully: when a <code>RegExp</code> with mutable state is used in ways that depend on that state.  Most regular expressions don&#8217;t store any state, so if the same <code>RegExp</code> object is used twice it&#8217;s no big deal.  However, it can matter a lot for regular expressions specified with the <code>global</code> flag:</p>
<pre class="code" data-language="javascript">
var s = "abcddeeefffffgggggggghhhhhhhhhhhhh";
function next(s)
{
  var r = /(.)\1*/g;
  r.exec(s);
  return r.lastIndex;
}

var r = [];
for (var i = 0; i &lt; 8; i++)
  r.push(next(s));
print(r.join(", "));
</pre>
<p>Each time a regular expression with the <code>global</code> flag is used, its <code>lastIndex</code> property is updated with the index of the location in the matched string where matching should resume when the regular expression is next used.  Thus, in this example we have mutable state, and if <code>next</code> is called multiple times we have uses which will depend on that mutable state.  Let&#8217;s see what happens in engines which implemented regular expression literals per ES3.  If you <a href="https://developer.mozilla.org/devnews/index.php/2010/01/10/firefox-3-6-release-candidate-is-now-available-for-download/">download the Firefox 3.6 release candidate</a> and test the above code in it (adjusting the implied <code>print</code> to <code>alert</code>), the printed result will be this:</p>
<pre class="code">
1, 2, 3, 5, 8, 13, 21, 34
</pre>
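<p>The same progression can be reproduced in an engine that creates a new object per evaluation by hoisting the regular expression out of the function, so that every call shares one object.  This is only a sketch of the shared-state effect, not of how any engine implements ES3; <code>shared</code> and <code>nextShared</code> are names invented for the example, and <code>console.log</code> stands in for <code>print</code>:</p>

```javascript
var s = "abcddeeefffffgggggggghhhhhhhhhhhhh";

// One RegExp object shared by every call, standing in for the single
// object that ES3 associates with a regular expression literal.
var shared = /(.)\1*/g;

function nextShared(s)
{
  shared.exec(s);            // resumes matching at shared.lastIndex
  return shared.lastIndex;   // index just past the run that was matched
}

var r = [];
for (var i = 0; i !== 8; i++)
  r.push(nextShared(s));
console.log(r.join(", "));   // 1, 2, 3, 5, 8, 13, 21, 34
```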
<h2>ES5: an escape to sanity</h2>
<p>Is ES3&#8217;s behavior what you&#8217;d expect?  No, it isn&#8217;t.  In fact, ES3&#8217;s behavior, which Mozilla and SpiderMonkey implement, is <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=98409">the second-most duplicated bug filed against Mozilla&#8217;s JavaScript engine</a>.  SpiderMonkey and (strangely enough) V8 are the only notable JavaScript engines out there that implement ES3&#8217;s behavior.  ES3&#8217;s behavior is rarely what web developers expect, and it doesn&#8217;t provide any real value, so ES5 is changing to the behavior you&#8217;d expect: evaluating a regular expression literal creates a new object every time.</p>
<p>Starting with Firefox 3.7, Firefox will implement what ES5 specifies.  Download a Firefox nightly from <a href="http://nightly.mozilla.org/">nightly.mozilla.org</a> and test it out as above (<a href="http://support.mozilla.com/en-US/kb/Managing+profiles">use the profile manager</a> if you want to keep your current Firefox settings and install untouched).  Instead of the Fibonacci sequence you&#8217;ll get this:</p>
<pre class="code">
1, 1, 1, 1, 1, 1, 1, 1
</pre>
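<p>The opening question can be answered the same way.  The sketch below assumes an engine with ES5 semantics; <code>sameObject</code> and <code>indexes</code> are names invented for the example, and <code>console.log</code> stands in for <code>print</code>:</p>

```javascript
function getRegEx() { return /regex/; }

// Under ES5 semantics each evaluation of the literal makes a new object.
var sameObject = getRegEx() === getRegEx();   // false

function next(s)
{
  var r = /(.)\1*/g;   // fresh object, so lastIndex starts at 0
  r.exec(s);
  return r.lastIndex;
}

var s = "abcddeeefffffgggggggghhhhhhhhhhhhh";
var indexes = [next(s), next(s), next(s)];
console.log(sameObject, indexes.join(", "));
```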
<h2>The bottom line</h2>
<p><strong>Starting with Firefox 3.7, evaluating a regular expression literal like <code>/foo/</code> will create a new <code>RegExp</code> object, just as evaluating <code>{}</code> or <code>[]</code> currently creates a new object or array.</strong>  The optimization ES3 specified has resulted in clear developer confusion and was misguided and inconsistent with respect to other object literal syntax in JavaScript.</p>
<p>Again, as with <a href="http://whereswalden.com/2010/01/12/more-es5-backwards-incompatible-changes-the-global-properties-undefined-nan-and-infinity-are-now-immutable/">my previous post</a>, we doubt this change will affect many scripts (in this case, except for the better).  The fact that few browsers implemented ES3&#8217;s semantics means that most sites have to cope with either choice of semantics, so the semantics in ES5, implemented by Mozilla for Firefox 3.7, are likely already handled.  Still, it&#8217;s possible that this change might break some sites (particularly those which include browser-specific code), so we&#8217;re giving a heads-up as early as possible.</p>
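<p>One way a script might cope with either choice of semantics is to reset <code>lastIndex</code> before using a <code>global</code> regular expression literal.  The following is just a sketch of that idea, not a recommendation taken from either specification; <code>firstRunEnd</code> is a hypothetical helper name, and <code>console.log</code> stands in for <code>print</code>:</p>

```javascript
// firstRunEnd is a hypothetical helper name used only for this sketch.
function firstRunEnd(s)
{
  var r = /(.)\1*/g;
  r.lastIndex = 0;     // redundant under ES5, essential under ES3 semantics
  r.exec(s);
  return r.lastIndex;  // index just past the first run of repeated characters
}

var ends = [firstRunEnd("aab"), firstRunEnd("aab"), firstRunEnd("cccd")];
console.log(ends.join(", "));   // 2, 2, 3
```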
]]></content:encoded>
			<wfw:commentRss>http://whereswalden.com/2010/01/15/more-es5-incompatible-changes-regular-expressions-now-evaluate-to-a-new-object-not-the-same-object-each-time-theyre-encountered/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

