The Client Side of ASP.NET Pages
http://msdn.microsoft.com/msdnmag/issues/06/12/Cutting%20Edge/default.aspx
Analysis of the ASPX Code
Analysis of the HTML Client Code
The View State Field
The PostBack Mechanism
Analysis of Class Code
There's a trend in the software industry towards moving much of the burden of code writing to the infrastructure of the underlying platform. A variety of development platforms ask developers to provide a high-level description of the information they need in a relatively loose syntax, instead of hard-coding every single byte of it according to a strict set of syntax rules. It is now common for developers to use an XML dialect to describe the desired result and have a compiler or runtime engine parse and process the contents into traditional and executable code.
For example, Windows® Presentation Foundation, one of the pillars of the .NET Framework 3.0, uses XAML, an XML-based presentation language, to describe the user interface of forms. The Microsoft AJAX Library (part of the system formerly code-named ASP.NET "Atlas") applies the same principle to rich Web pages with its XML-Script metalanguage. Technically, XML-Script is not part of the core release; it is being shared as an unofficial sample technology. A declarative layout language, XML-Script wires up HTML elements and script and forms virtual client-side controls. In the end, XML-Script injects logic and functionality into client pages.
There are a few advantages to using a declarative language to author Web pages and forms. Server-side components can generate pages and forms declaratively more easily than they could emit actual Visual Basic®, C#, or JavaScript code. Furthermore, authoring tools such as Visual Studio® can generate and design declarative markup more easily. From an architectural standpoint, a declarative approach lets you indicate what page elements will do, but not how they will do it. This creates an additional abstraction layer.
The first concrete programming environment to take advantage of such a model was ASP.NET, starting with version 1.0. As most Web developers know by now, an ASP.NET page is typically written in one or two files: an .aspx markup file and, optionally, a code-behind file. The code-behind file contains a class written in any supported programming language, though typically Visual Basic or C#. The .aspx markup file contains HTML tags, ASP.NET control tags, and literals that form the structure of the page (it can also contain code). This text is parsed at run time and transformed into a page class. That page class, combined with the code-behind class and some system-generated code, makes up the executable code. This code processes any posted data, generates the response, and sends it back to the client.
While the overall model is familiar to the vast majority of ASP.NET developers, a number of black holes remain that only a small group of developers understand in any depth. MSDN®, books, and online articles explain individual aspects of the page machinery, but an overall, unified treatment of page internals is still lacking. If you look at the HTML source of an ASP.NET page, you see a number of hidden fields and automatically injected blocks of JavaScript code that can be hard to make sense of. Yet these fields and blocks are what make the Web page work. In this column I'll analyze the client-side source code that ASP.NET pages generate. I'll cover the well-known view state. I'll also cover lesser-known pieces such as control state, event validation, the event target and event argument fields, and system-provided script code.
Many of the implementation details I cover here are specific to the current version of ASP.NET. These details could change in the future (they have changed in the past), so you shouldn't build any production code that depends on undocumented details.
Analysis of the ASPX Code
Figure 1 shows a minimal but working ASP.NET page. Despite its extreme simplicity, it is a good sample because it includes the typical elements of a real-world ASP.NET page: input fields, clickable postback elements, and read-only elements.
The .aspx page contains three server controls: a textbox to capture data, a Submit button to start a post operation, and a label to display read-only data. On top of the .aspx file, the Page directive defines some global attributes for the individual page. Let's take a look at the most commonly used attributes of the Page directive, such as those that you saw in Figure 1.
<%@ Page Language="C#"
AutoEventWireup="true"
CodeFile="Test.aspx.cs"
Inherits="Test"
%>
Most of the Page directive attributes have limited effect on the page markup, that is, the HTML code that the browser receives with the HTTP response. Rather, most Page attributes affect the code of the dynamically generated page class. The system builds that class on top of the .aspx markup and the code-behind class.

The Language attribute designates the language used to author the code-behind in Visual Studio. The system uses the same language to generate the dynamic page class that serves browser requests for the .aspx resource. The CodeFile attribute indicates the source file where the code-behind class is stored. The Inherits attribute indicates the name of the code-behind class that should be used as the parent of the dynamically generated page class.

Finally, the AutoEventWireup attribute indicates whether a default naming convention should be used to map handler code to Page events. When AutoEventWireup is set to true, you can add a Page_Load method to the code file, and it is automatically registered with the page's Load event. The naming convention dictates that event handlers take the form Page_XXX, where XXX is the name of a Page event such as Init, Load, or PreRender. If AutoEventWireup is set to false, you must explicitly bind each Page event to its handler. You can do that in a custom class constructor:
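using System;

public partial class Test : System.Web.UI.Page
{
    public Test()
    {
        // AutoEventWireup="false": bind the Load event to its handler explicitly
        this.Load += new EventHandler(Page_Load);
    }

    protected void Page_Load(object sender, EventArgs e)
    {
        // ...
    }
}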
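To make the naming convention concrete, here is a minimal, self-contained sketch that mimics AutoEventWireup with reflection. It is not ASP.NET's actual implementation. MiniPage and TestPage are hypothetical stand-ins that do not depend on System.Web. For each public event XXX, the sketch looks for a method named Page_XXX that takes (object, EventArgs) and subscribes it.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for a page class that exposes a couple of public events.
public class MiniPage
{
    public event EventHandler Init;
    public event EventHandler Load;

    public void RaiseInit() => Init?.Invoke(this, EventArgs.Empty);
    public void RaiseLoad() => Load?.Invoke(this, EventArgs.Empty);

    // Illustrative AutoEventWireup-style binding (not ASP.NET's real code):
    // for each public event XXX, find a method named Page_XXX with an
    // (object, EventArgs) signature and subscribe it to that event.
    public void AutoWire()
    {
        const BindingFlags methodFlags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        foreach (EventInfo evt in GetType().GetEvents(BindingFlags.Instance | BindingFlags.Public))
        {
            MethodInfo handler = GetType().GetMethod(
                "Page_" + evt.Name, methodFlags, null,
                new[] { typeof(object), typeof(EventArgs) }, null);
            if (handler == null)
                continue; // no Page_XXX method: leave the event unbound

            Delegate d = Delegate.CreateDelegate(evt.EventHandlerType, this, handler);
            evt.AddEventHandler(this, d);
        }
    }
}

// A "page" that relies on the naming convention instead of explicit wiring.
public class TestPage : MiniPage
{
    public string Log = "";

    protected void Page_Load(object sender, EventArgs e) => Log += "Load;";
}
```

Suppose you call AutoWire on a TestPage and then raise both events. Page_Load runs, while raising Init does nothing because no Page_Init method exists. With AutoEventWireup set to false, you write the equivalent subscription yourself, as in the constructor above.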
When the Web server receives an HTTP request for a given .aspx resource, it forwards the request to the ASP.NET worker process. The process hosts the CLR, inside of which a runtime environment is created to process ASP.NET requests. The ultimate goal of the ASP.NET HTTP runtime environment is to serve the request. That is, it obtains the markup to embed in the HTTP response: HTML, WML, XHTML, or whatever else the application is supposed to return. A special system component known as the HTTP handler is in charge of returning the markup for the request.
The HTTP handler is an instance of a class that implements the IHttpHandler interface. The ASP.NET framework comes with a few predefined HTTP handlers. Some serve particular situations; others act as base classes for more specialized handlers. The System.Web.UI.Page class is one of the most complex and sophisticated built-in HTTP handlers in ASP.NET.
Each ASP.NET request is mapped to an HTTP handler. Suppose that a client browser places a request for a page named test.aspx. The request is passed to ASP.NET and processed by the HTTP runtime, which determines the HTTP handler class to serve the request through a page handler factory. If the correct handler is not yet available in the AppDomain, it is created dynamically and stored in the ASP.NET temporary folder on the Web server machine. For a page named test.aspx, the HTTP handler is created as a class named ASP.test_aspx.
The dynamic creation of the HTTP handler class for a given request is a process that takes place only once per page, the first time that page is requested in the application lifetime (although when batch compilation is used, the handler can be generated on the first request for any page in the application). The dynamically created assembly is invalidated and replaced if the application is restarted or if the page source is modified on the Web server. Figure 2 shows the hierarchy of page classes from the base Page class down to the dynamically generated class to serve the user request.
Figure 2 Hierarchy of Page Classes
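The compile-once behavior can be modeled as a cache keyed by page name that is invalidated when the page source changes. This is a hypothetical sketch, not the real page handler factory. MiniHandlerFactory, its timestamp check, and the GeneratedClassName helper are illustrative assumptions, and the counter merely stands in for parsing and compiling the .aspx file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a page handler factory's compile-once cache.
public class MiniHandlerFactory
{
    private readonly Dictionary<string, (DateTime SourceStamp, string ClassName)> _cache =
        new Dictionary<string, (DateTime SourceStamp, string ClassName)>(StringComparer.OrdinalIgnoreCase);

    // Counts simulated compilations so the caching behavior is observable.
    public int CompileCount { get; private set; }

    // Approximates the naming scheme in the text (test.aspx -> ASP.test_aspx).
    // This sketch ignores characters that would be invalid in a C# identifier.
    public static string GeneratedClassName(string fileName) =>
        "ASP." + fileName.Replace('.', '_');

    // Returns the handler class name, "compiling" only on the first request
    // for a page or after its source file has changed.
    public string GetHandlerClass(string fileName, DateTime sourceLastWrite)
    {
        if (_cache.TryGetValue(fileName, out var entry) && entry.SourceStamp == sourceLastWrite)
            return entry.ClassName;

        CompileCount++; // stands in for parsing the .aspx file and building an assembly
        string className = GeneratedClassName(fileName);
        _cache[fileName] = (sourceLastWrite, className);
        return className;
    }
}
```

Repeated requests for test.aspx with the same source timestamp reuse the cached entry. A newer timestamp, modeling an edited page, triggers one more simulated compilation. An application restart would correspond to discarding the whole cache.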
The ASP.NET runtime creates the Visual Basic or C# source code of the dynamic page class by parsing the source of the corresponding .aspx file. Each tag with runat="server" is mapped to a server control instance. Any other text is mapped to a literal control and emitted verbatim. The Register directive, if any, helps to resolve tags that point to non-standard controls. The markup returned to the client browser is composed by accumulating the markup that each server control in the page emits. A page typically emits markup, usually HTML. However, this is not a requirement, and an ASP.NET page can output any data it wants.
Analysis of the HTML Client Code
Figure 3 shows the HTML output for the sample page in Figure 1. The HTML contains no trace of the Page directive found in the server-side .aspx page, whereas the !DOCTYPE directive is copied verbatim. The first runat="server" block in Figure 1 is the <form> tag, which means that any text between the Page directive and the <form> tag is emitted verbatim. In the source code of the dynamically created page class on the server, this text is converted into a single instance of the LiteralControl class. The <form> tag is emitted like this:
<form name="form1" method="post" action="Test.aspx" id="form1">
The <form runat="server" …> tag is rendered through an instance of the HtmlForm class. The control class has no property that lets you set the action attribute in the output markup. Instead, the action attribute is hardcoded to the URL of the current page. This post-to-self behavior is at the foundation of the ASP.NET platform. Note that the ID attribute is paired with an identical name attribute.
The <asp:TextBox> tag is rendered in HTML as an <input type="text"> element, and a name attribute is added to match the original ID attribute. If you omit the ID attribute, you may receive a warning from Visual Studio 2005, but ASP.NET will still compile the page successfully. In that case, a system-generated string is bound to the name attribute. The <asp:Button> tag is rendered as an <input type="submit"> button, and an <asp:Label> tag renders as an HTML <span> tag in the client browser.
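Every control appends its own markup, and text outside server tags passes through verbatim. A tiny hypothetical control tree can sketch that rendering model. The Mini* classes are illustrative stand-ins for LiteralControl, TextBox, Button, and Label, not System.Web types, and the exact attribute order that ASP.NET emits may differ.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical mini control tree: each control appends its own markup
// to a shared buffer, and the page output is the concatenation.
public abstract class MiniControl
{
    public abstract void Render(StringBuilder output);
}

// Text outside runat="server" tags, emitted verbatim (compare LiteralControl).
public class MiniLiteral : MiniControl
{
    private readonly string _text;
    public MiniLiteral(string text) { _text = text; }
    public override void Render(StringBuilder output) => output.Append(_text);
}

// Renders as <input type="text">, with a name attribute matching the ID.
public class MiniTextBox : MiniControl
{
    public string Id;
    public string Text = "";
    public override void Render(StringBuilder output) =>
        output.Append($"<input name=\"{Id}\" type=\"text\" value=\"{WebUtility.HtmlEncode(Text)}\" id=\"{Id}\" />");
}

// Renders as <input type="submit">.
public class MiniSubmitButton : MiniControl
{
    public string Id;
    public string Text = "Submit";
    public override void Render(StringBuilder output) =>
        output.Append($"<input type=\"submit\" name=\"{Id}\" value=\"{WebUtility.HtmlEncode(Text)}\" id=\"{Id}\" />");
}

// Renders as a <span>.
public class MiniLabel : MiniControl
{
    public string Id;
    public string Text = "";
    public override void Render(StringBuilder output) =>
        output.Append($"<span id=\"{Id}\">{WebUtility.HtmlEncode(Text)}</span>");
}

public static class MiniRenderer
{
    // Accumulates the markup of every control, in document order.
    public static string RenderPage(IEnumerable<MiniControl> controls)
    {
        var sb = new StringBuilder();
        foreach (MiniControl c in controls) c.Render(sb);
        return sb.ToString();
    }
}
```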
In most cases (though not all), each tag decorated with the runat="server" attribute generates a corresponding block of HTML markup. The ID string guarantees a persistent match between the two blocks: one on the client side and one on the server side. As you can see in Figure 3, a couple of hidden fields complete the HTML markup: __VIEWSTATE and __EVENTVALIDATION.
The View State Field
The contents of the __VIEWSTATE field represent the state of the page when it was last processed on the server. Although sent to the client, the view state doesn't contain any information that should be consumed by the client. The information stored in view state is pertinent only to the server page and some of its child controls and is exclusively read, consumed, and modified by the server.
Implemented this way, the view state doesn't consume any critical server resources and is fast to retrieve and use. On the other hand, precisely because the view state travels with the page, it inevitably increases the size of the HTTP request and response by a few kilobytes. A realistic page containing a grid of data can easily reach a view state size of 20KB. This extra payload is uploaded and downloaded on every round-trip. The view state is one of the most important features of ASP.NET because it enables stateful programming over a stateless protocol such as HTTP. Used without strict criteria, though, the view state can easily become a burden for pages.
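One concrete contributor to that size is the encoding. The view state is Base64-encoded, and Base64 produces four characters for every three input bytes. As a result, the hidden field is roughly a third larger than the serialized state it carries. The hypothetical ViewStateSize helper below just makes that arithmetic explicit.

```csharp
public static class ViewStateSize
{
    // Base64 turns every 3 input bytes into 4 output characters, padding the
    // final group, so the encoded length is 4 * ceil(byteCount / 3).
    public static int Base64Length(int byteCount) => 4 * ((byteCount + 2) / 3);
}
```

For example, 15,000 bytes of serialized state become a 20,000-character hidden field, and that field travels in both directions on every postback.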
By overriding a couple of methods on the code file class, you can keep the contents of the view state on the server: in a database, in the Cache, or in the Session object. However, keeping view state information on the server is not the obvious workaround it first appears to be. It's not by chance that the ASP.NET team opted for a page-based view state. A server-based view state works fine as long as the user navigates from one page to the next by following the links in the application. Remember, though, that ASP.NET applications work by posting back repeatedly to the same page. What if the user clicks the Back button? To be safe, you should maintain view state on a per-request basis rather than on a per-page basis. The chain of tracked requests should also be as long as the history the user can reach through the Back and Forward buttons. View state stored on the client may not be perfect, but neither is view state stored on the server. Which one is preferable depends on the expectations you have for your application.
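The per-request approach can be sketched as a bounded history of states. Each state is identified by a token that the page would carry in place of the state itself. This is a hypothetical, single-threaded model: PerRequestStateStore and its capacity parameter are assumptions. In ASP.NET 2.0, the actual extension point is the page's PageStatePersister property (SessionPageStatePersister is one built-in option).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-request view state store. Each rendered page gets a fresh
// token (which would travel in the hidden field instead of the state itself),
// and the store keeps the last `capacity` states so that pages reached through
// Back/Forward can still post back. Not thread-safe; illustration only.
public class PerRequestStateStore
{
    private readonly int _capacity;
    private readonly Dictionary<string, object> _states = new Dictionary<string, object>();
    private readonly Queue<string> _order = new Queue<string>();

    public PerRequestStateStore(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Called when a page is rendered: stores the state, returns the token to embed.
    public string Save(object state)
    {
        string token = Guid.NewGuid().ToString("N");
        _states[token] = state;
        _order.Enqueue(token);
        while (_order.Count > _capacity)
            _states.Remove(_order.Dequeue()); // evict the oldest tracked request
        return token;
    }

    // Called on postback: fails when the page is older than the tracked history.
    public bool TryLoad(string token, out object state) =>
        _states.TryGetValue(token, out state);
}
```

Keying by request rather than by page is what lets an older page, revisited through the Back button, still find its own state. The capacity decides how far back that works.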
In ASP.NET 2.0, the __VIEWSTATE hidden field contains two types of information: view state and control state. Developers can disable view state altogether and operate their applications in a purely stateless manner. This is not an issue as long as you use built-in controls, controls that you wrote yourself, or at least controls whose source code you can access. But what if you use a custom control that assumes view state is enabled? Some controls, typically rich third-party and custom controls, need to persist private information across postbacks. This information is not public and is not designed to be exposed to the application level; an example is the collapsed/expanded status of a dropdown panel. Without a dedicated mechanism, this information can only be persisted to the view state. If the view state is disabled, the control may fail unexpectedly.
To alleviate this issue, ASP.NET 2.0 introduces the notion of control state. Each server control can pack its critical properties into a collection and store it in the page's control state. The control state is saved to the __VIEWSTATE field but, unlike the traditional view state, can't be disabled and is always available. Developers manage control state through a pair of new overridable methods, LoadControlState and SaveControlState. These are defined on the Control class and are therefore available on Page as well. A control that uses control state must also call the page's RegisterRequiresControlState method. Speaking of view state in ASP.NET 2.0, a new and more efficient serialization algorithm persists the state of individual controls to the hidden field. As a result, in most cases the __VIEWSTATE hidden field can be as small as half the size of the corresponding field in ASP.NET 1.x.
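The difference between the two kinds of state can be modeled without System.Web. In the hypothetical sketch below, MiniPanel keeps a public Title in view state and a private Expanded flag in control state, and MiniPersister simulates one postback. Real ASP.NET 2.0 code works through SaveControlState, LoadControlState, and RegisterRequiresControlState. The sketch only mirrors the division of responsibility, namely that control state survives when view state is disabled.

```csharp
// Hypothetical control with one public, view-state-backed property and one
// private piece of UI state that it keeps in control state.
public class MiniPanel
{
    public bool EnableViewState = true;
    public string Title;     // public data, round-tripped through view state
    public bool Expanded;    // private UI state, round-tripped through control state

    // View state is saved only when enabled.
    public object SaveViewState() => EnableViewState ? Title : null;
    public void LoadViewState(object state) { if (state != null) Title = (string)state; }

    // Control state is saved regardless of EnableViewState.
    public object SaveControlState() => Expanded;
    public void LoadControlState(object state) { if (state != null) Expanded = (bool)state; }
}

public static class MiniPersister
{
    // Simulates one postback: save both kinds of state from the old instance,
    // then load them into a fresh instance, as a new request would.
    public static MiniPanel RoundTrip(MiniPanel before)
    {
        object controlState = before.SaveControlState();
        object viewState = before.SaveViewState();

        var after = new MiniPanel { EnableViewState = before.EnableViewState };
        after.LoadControlState(controlState);
        after.LoadViewState(viewState);
        return after;
    }
}
```

With view state disabled, a round-trip loses Title but keeps Expanded. That is exactly the guarantee control state is meant to give such controls.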
As mentioned, the view state is stored in a hidden field to associate it unambiguously with a particular page request. When any of the HTML elements in a given page instance posts back, the dynamically generated page class starts working on the server. It uses the data stored in the view state to recreate the last known good state for the controls in the page. What if the view state is tampered with on the client? Is that even possible? By default, the view state is Base64-encoded and hashed, and the resulting hash value is stored along with the view state. The hash value is calculated from the contents of the view state plus a server key. Whenever the page posts back, the code in the page class separates the contents and the hash value of the view state. Next, it recalculates the hash value based on the retrieved view state contents and the server key. If the two hash values don't match, an exception is thrown (see Figure 4).
Figure 4 Page View Can't Be Altered on the Client
What if a malicious user attempts to post a fake request with a modified view state? The attacker would need to know the server key in order to generate a hash value for the modified contents that matches on the server. The server key, though, is made of server-only information and is not included in the view state field. The tweakviewstate.aspx page in the companion code contains script code that modifies the view state, so you can experiment with the exception shown in Figure 4.
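The hash-and-verify scheme can be sketched in a few lines. This is an illustration under stated assumptions, not the ASP.NET wire format. The sketch uses HMACSHA256 as the keyed hash and a "payload|mac" layout, whereas ASP.NET uses its own serialization and the algorithm configured in the <machineKey> section. MiniViewState is a hypothetical name, and CryptographicOperations.FixedTimeEquals requires .NET Core 2.1 or later.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical tamper-evident state field: Base64(payload) + "|" + Base64(keyed hash).
public static class MiniViewState
{
    private static byte[] Mac(byte[] payload, byte[] serverKey)
    {
        using (var hmac = new HMACSHA256(serverKey))
            return hmac.ComputeHash(payload);
    }

    // Called while rendering: encode the state and attach its keyed hash.
    public static string Protect(string state, byte[] serverKey)
    {
        byte[] payload = Encoding.UTF8.GetBytes(state);
        return Convert.ToBase64String(payload) + "|" + Convert.ToBase64String(Mac(payload, serverKey));
    }

    // Called on postback: recompute the hash from the posted contents and the
    // server key; throw if it does not match the hash that came with the field.
    public static string Unprotect(string field, byte[] serverKey)
    {
        string[] parts = field.Split('|');
        if (parts.Length != 2) throw new FormatException("Malformed state field.");
        byte[] payload = Convert.FromBase64String(parts[0]);
        byte[] received = Convert.FromBase64String(parts[1]);
        if (!CryptographicOperations.FixedTimeEquals(Mac(payload, serverKey), received))
            throw new CryptographicException("State hash validation failed.");
        return Encoding.UTF8.GetString(payload);
    }
}
```

Replacing the payload half while keeping the original hash makes Unprotect throw, because the attacker cannot recompute the hash without the key. The payload half remains readable to anyone who Base64-decodes it, which is why hashing alone does not give confidentiality.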
Although the view state can hardly be used to plan an attack, it doesn't guarantee data confidentiality unless encryption is used. The contents of the view state, in fact, can be decoded and examined on the client, but not successfully modified to serve an altered page state to the server environment.
The __EVENTVALIDATION hidden field is a security measure new to ASP.NET 2.0. The feature prevents unauthorized requests sent by potentially malicious users from the client. To ensure that each and every postback and callback event originates from the expected user interface elements, the page adds an extra layer of validation on events. The page matches the contents of the request against the information in the __EVENTVALIDATION field. This check verifies that no extra input field has been added on the client and that any value selected in a list was already known on the server. The page generates the event validation field during rendering, that is, at the last possible moment when the information is available. Like the view state, the event validation field contains a hash value to prevent client-side tampering.
Controls use the RegisterForEventValidation method on the ClientScriptManager object to store their own information for safe postbacks. At a minimum, each control registers its own unique ID; list controls also store all the values in the list. Server controls that support event validation typically call the ValidateEvent method, also defined on ClientScriptManager, in their implementation of the IPostBackDataHandler interface. If validation fails, an exception is thrown.
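Conceptually, the register-then-validate cycle looks like the following sketch. It is a hypothetical model, not the System.Web implementation. MiniEventValidator, its Register and Validate methods, and the key format are illustrative. Real ASP.NET records registrations through RegisterForEventValidation during rendering and round-trips a hashed form of them in __EVENTVALIDATION. It then checks incoming events with ValidateEvent.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of event validation. During rendering, controls register
// the (control ID, argument) pairs they may legitimately post back; on postback,
// each incoming event is checked against that set.
public class MiniEventValidator
{
    private readonly HashSet<string> _allowed = new HashSet<string>(StringComparer.Ordinal);

    // Combine ID and argument into one key; '\0' is unlikely to appear in either.
    private static string Key(string uniqueId, string argument) =>
        uniqueId + "\0" + (argument ?? "");

    // Plays the role of RegisterForEventValidation: called while rendering.
    public void Register(string uniqueId, string argument = null) =>
        _allowed.Add(Key(uniqueId, argument));

    // Plays the role of ValidateEvent: called while processing the postback.
    public void Validate(string uniqueId, string argument = null)
    {
        if (!_allowed.Contains(Key(uniqueId, argument)))
            throw new ArgumentException(
                $"Postback data for '{uniqueId}' was not registered during rendering.");
    }
}
```

A list control would register one entry per item. A value injected on the client, such as a list item that was never rendered, then fails validation even though the control itself is known.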
You can enable or disable event validation on a per-page basis through the EnableEventValidation attribute of the Page directive. Each control class opts in to event validation through the SupportsEventValidation attribute. Currently, there's no way to enable or disable event validation on a particular control instance.
Event validation is a defense barrier aimed at limiting input to a known set of values. It simply raises the security bar higher and doesn't stop script injection attacks by itself.
Event validation may pose issues in AJAX-enabled applications. In such applications, some client-side work can create new input elements on the fly, which makes the next postback fail because of unknown elements. The best workaround is to render any user interface on the server whenever possible and hide it on the client using the CSS display property. In this way, any user interface you're going to use is registered with the event validation field. If you write custom controls, you should decorate them with the SupportsEventValidation attribute to enable this feature.
The PostBack Mechanism
The ASP.NET page in Figure 1 posts back when the user clicks the button, because the <asp:Button> tag renders as an HTML submit <input> element. When a submit input field is clicked, the browser fires the onsubmit HTML client event. It then prepares the new request to the server based on the contents of the submitted form. The HTTP request includes an additional name/value pair whose name is the ID of the button.
The page class scans the body of the HTTP request to see whether any of the posted fields matches the ID of a button control in the ASP.NET page. If a match is found, that button control is called to run any code associated with its Click event. More precisely, the page class checks whether the matching button control implements the IPostBackEventHandler interface. If so, it invokes the RaisePostBackEvent method on the interface. For a button control, the method raises the server-side Click event.
So far, so good. But what if the page contains a LinkButton control instead? Figure 5 shows the markup for an ASP.NET page that is identical to the page in Figure 1 except that a LinkButton replaces the Submit button. As you can see, the markup includes two more hidden fields, __EVENTTARGET and __EVENTARGUMENT, and a bit of JavaScript code. The href attribute of the link button is bound to the __doPostBack script function, so the function is invoked whenever the user clicks the element. The __doPostBack function is emitted in the page by the rendering code of the LinkButton control. It populates the __EVENTTARGET and __EVENTARGUMENT fields with the proper information and then triggers the postback via script. In this case, the body of the HTTP postback request contains only the input fields in the page. No posted field corresponds to a button.
How does ASP.NET recognize the control responsible for handling the postback? When none of the controls referenced in the request body implements the IPostBackEventHandler interface, the page class looks for the __EVENTTARGET hidden field, if any. The contents of the field are assumed to be the ID of the control that caused the postback. If this control implements the IPostBackEventHandler interface, its RaisePostBackEvent method is invoked. For a LinkButton control, this results in the invocation of the Click server event.
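The following sketch models the two-step lookup. IMiniPostBackEventHandler is a hypothetical stand-in for IPostBackEventHandler. The dictionaries stand in for the posted form and the page's control tree. Real ASP.NET resolves controls by their unique IDs and performs more checks than this.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for System.Web.UI.IPostBackEventHandler.
public interface IMiniPostBackEventHandler
{
    void RaisePostBackEvent(string eventArgument);
}

// A clickable control that records that its "Click" ran.
public class MiniButton : IMiniPostBackEventHandler
{
    public readonly string Id;
    public string Clicked; // set when this control handles a postback
    public MiniButton(string id) { Id = id; }
    public void RaisePostBackEvent(string eventArgument) => Clicked = Id;
}

public static class MiniPostBackDispatcher
{
    // Two-step lookup as described in the text:
    // 1) a posted field whose name matches a handler control's ID (submit buttons);
    // 2) otherwise the control named in __EVENTTARGET (set by __doPostBack).
    public static IMiniPostBackEventHandler FindSource(
        IDictionary<string, string> form, IDictionary<string, object> controlsById)
    {
        foreach (string field in form.Keys)
        {
            if (controlsById.TryGetValue(field, out object control) &&
                control is IMiniPostBackEventHandler handler)
                return handler;
        }

        if (form.TryGetValue("__EVENTTARGET", out string target) &&
            !string.IsNullOrEmpty(target) &&
            controlsById.TryGetValue(target, out object targetControl) &&
            targetControl is IMiniPostBackEventHandler targetHandler)
            return targetHandler;

        return null;
    }

    // Raises the postback event on whichever control was identified, if any.
    public static void Dispatch(
        IDictionary<string, string> form, IDictionary<string, object> controlsById)
    {
        string argument = form.TryGetValue("__EVENTARGUMENT", out string arg) ? arg : "";
        FindSource(form, controlsById)?.RaisePostBackEvent(argument);
    }
}
```

Consider a form posted by the submit button: it contains a Button1 field, so step 1 finds the button. A form posted by __doPostBack carries no such field, so step 2 falls back to __EVENTTARGET. A text box that does not implement the interface is skipped in both steps.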
Analysis of Class Code
The .aspx markup defines the layout of an ASP.NET page and determines size, style and position of constituent controls. It contains no logic, however, except perhaps for some client script code and any Visual Basic or C# inline code you may have. Initialization code, event handlers, and any helper routines typically go in a separate companion file, known as the code-behind file:
using System;

public partial class Test : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // ... initialization code
    }

    protected void Button1_Click(object sender, EventArgs e)
    {
        // ... handler for the button's server-side Click event
    }
}
The class in the code file inherits, directly or indirectly, from System.Web.UI.Page. The code file and markup represent required but distinct pieces of information. To fully represent the ASP.NET page, they must be combined to form a page class that incorporates the logic of the code file and the layout data of the markup file. The code file class is already a page class, but it lacks two key pieces of information: the list of child server controls to populate the user interface and the declaration of class members that identify the various server controls.
In ASP.NET 1.x, each time the page author drops a control onto the Web Form, Visual Studio .NET 2003 automatically adds a new line to the code file. That line declares a class member referencing the newly dropped server control. This does a pretty good job of keeping everything in sync, but developers often run into compile errors due to missing class members or leftover, useless ones.
In ASP.NET 2.0 the issue is fixed in an elegant way. Enter partial classes, a source-level, assembly-limited, non-object-oriented way to extend the behavior of a class. In the .NET Framework 2.0, a class definition can span two or more files. Each file contains a fragment of the final class definition, and the compiler takes care of merging the various partial definitions to form a single, unified class. All fragments must have the same signature, and the final class definition must be syntactically correct.
In ASP.NET 2.0, the code file class is declared as partial, and a second partial class is generated dynamically to declare all control members. The two partial classes are merged at compile time. The .aspx markup file is then parsed to create the temporary ASP.test_aspx class, which inherits from the final, merged code-file class. If the ASP.NET page is not bound to a code file but contains its code inline, the dynamic page class inherits from System.Web.UI.Page and includes any inline code in its body.
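The merge can be shown with plain C#. In the sketch below, two partial declarations of the hypothetical TestCodeFile class compile into a single class. One plays the developer's code file, and the other plays the generated member declarations. The test_aspx class stands in for the generated ASP.test_aspx page class that derives from it. In a real page, the generated fragment would declare fields such as TextBox1 with their control types; here a string field keeps the example self-contained.

```csharp
// Fragment 1: plays the role of the developer's code file (compare Test.aspx.cs).
public partial class TestCodeFile
{
    // Uses a member that this fragment does not declare.
    public string Greet() => "Hello, " + TextBox1;
}

// Fragment 2: plays the role of the generated fragment that declares one
// member per server control; a string stands in for a TextBox here.
public partial class TestCodeFile
{
    protected string TextBox1 = "world";
}

// Stand-in for the dynamically generated ASP.test_aspx class, which derives
// from the merged class and would add the control tree built from the markup.
public class test_aspx : TestCodeFile
{
}
```

Fragment 1 compiles only because the compiler merges it with fragment 2. This is why the code file no longer needs hand-maintained control declarations.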
There's a lot more to learn about the dynamic page compilation machinery, but this provides fodder for a future column.