COMPSCI - Summer 2009: June 2009

Wednesday, June 24, 2009

Using JQuery UI Tab with ASP.Net UpdatePanels

With the advent of AJAX, web development has changed for good. Developers (and businesses) are demanding more and more WinForms-like UI interaction from web applications (think Web 2.0). To support that need, ASP.Net offers the "AJAX Control Toolkit", which provides a rich set of UI controls and widgets such as Tabs, Calendar and DatePicker. These are all fantastic AJAX controls, but they are all server based. It would be great if they were completely client-side controls. As if on cue, enter jQuery UI, which provides advanced effects and high-level, themeable widgets built on top of the jQuery JavaScript library that you can use to build highly interactive web applications.

One of the most widely used jQuery UI widgets is the "Tab" control, which lets you easily separate content. It is also a great way to use the "real estate" on the web page effectively. Another advantage of a client-side "Tab" control is that it allows you to have more than one form on the page if you need to. With the ASP.Net AJAX 'Tab' you cannot have "multiple web-forms".

So when do we need "multiple forms"? Think about a "Maintenance" option in a web application, e.g. a maintenance page that allows you to add users, add roles, maintain reference table data and so on. Each function needs its data to be posted back to the server, which immediately brings us to the point where we want that to happen asynchronously, or "AJAX style". There are two ways to achieve this goal:

Use jQuery's AJAX capabilities
Use ASP.Net UpdatePanels

With ASP.Net UpdatePanels we have to write much less code (ASP.Net does all the work for us) than if we used jQuery AJAX (though in terms of performance UpdatePanels are worse than pure JavaScript AJAX). I also want to point out that it is perfectly feasible to mix client-side and server-side AJAX controls to get the best of both worlds.

Let's look at an example of a simple JQuery UI Tab markup using ASP.Net UpdatePanels:

  <div id="tabsMaintainTables">
          <ul>
              <li><a href="#tab-Owners">Owners</a></li>
              <li><a href="#tab-Status">Status</a></li>
              <li><a href="#tab-Roles">Roles</a></li>
          </ul>
          <div id="tab-Owners">
            <asp:UpdatePanel runat="server" ID="TableMaintOwnersUpdatePanel" UpdateMode="Conditional">
            <ContentTemplate>
            ...
            </ContentTemplate>
            </asp:UpdatePanel>
          </div>
          <div id="tab-Status">
            <asp:UpdatePanel runat="server" ID="TableMaintStatusUpdatePanel" UpdateMode="Conditional">
            <ContentTemplate>
            ...
            </ContentTemplate>
            </asp:UpdatePanel>
          </div>
          <div id="tab-Roles">
            <asp:UpdatePanel runat="server" ID="TableMaintRolesUpdatePanel" UpdateMode="Conditional">
            <ContentTemplate>
            ...
            </ContentTemplate>
            </asp:UpdatePanel>
          </div>
  </div>

The JQuery code to display the 'Tabs' is as follows:

 $(function() {
     $("#tabsMaintainTables").tabs();    
 });

So far so good. Now when we submit the web-form, the UpdatePanel packages the full postback as an AJAX call for us. Once the AJAX call completes, the page basically loads again (as it normally would). This causes the 'Tab Index' to be reset, i.e. if the form was submitted while the 3rd tab (in our case the Roles tab) was selected, then after the postback the selected tab is the 1st tab.

So the obvious question is how can we maintain the selected tab-index of jQuery UI tabs after the ASP.Net UpdatePanel post back completes?

One advantage of using the ASP.Net AJAX Control Toolkit's 'Tab' control over the jQuery UI 'Tab' is that if you use an UpdatePanel with the former, support for maintaining the 'tab-index' after the postback is provided out of the box. No extra code is needed. Unfortunately, when using the jQuery UI Tab we need to perform the following extra steps to achieve the same functionality:

Trap the client side events when the ASP.Net AJAX request begins and ends
In the begin request event handler save the selected tab-index in a global variable
In the end request event handler, initialize the jQuery UI tab control and use the "saved index" to set the focus to the proper tab

Trapping the client side events:

  Sys.WebForms.PageRequestManager.getInstance().add_endRequest(EndRequestHandler);
  Sys.WebForms.PageRequestManager.getInstance().add_beginRequest(BeginRequestHandler);

Saving the currently selected tab-index at the beginning of the AJAX request:

 //Global variable
 var selectedMaintenanceTabIndex;
 
 function BeginRequestHandler(sender, args) {
 ...
 var maintenancetabs = $("#tabsMaintainTables").tabs();
 selectedMaintenanceTabIndex = maintenancetabs.tabs('option', 'selected');
 ...
 }

Retrieving the saved tab-index at the end of the AJAX request and selecting the proper tab:

 function EndRequestHandler(sender, args) {
 ...
 var maintenancetabs = $("#tabsMaintainTables").tabs();
 maintenancetabs.tabs('select', selectedMaintenanceTabIndex);
 ...
 }

And that's it! This is all it takes to maintain the tab-index of jQuery UI tabs when using them with ASP.Net UpdatePanels.
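The save-and-restore flow can be tried outside a browser with a small simulation. This is only a sketch: `createFakeTabs` and `createFakeRequestManager` are hypothetical stand-ins for the jQuery UI tabs widget and for Sys.WebForms.PageRequestManager, written so that the begin/end handler ordering is visible without a page.

```javascript
// Hypothetical stand-in for the tabs widget: it only tracks the selected index.
function createFakeTabs() {
  return {
    selected: 0,
    select: function (index) { this.selected = index; },
    // A partial postback re-renders the markup, so the widget starts over at tab 0.
    reinitialize: function () { this.selected = 0; }
  };
}

// Hypothetical stand-in for the page request manager: it stores the handlers
// and fires them around a simulated asynchronous postback.
function createFakeRequestManager() {
  var beginHandlers = [];
  var endHandlers = [];
  return {
    add_beginRequest: function (handler) { beginHandlers.push(handler); },
    add_endRequest: function (handler) { endHandlers.push(handler); },
    simulatePostback: function (onPageReload) {
      beginHandlers.forEach(function (handler) { handler(); });
      onPageReload();
      endHandlers.forEach(function (handler) { handler(); });
    }
  };
}

var tabs = createFakeTabs();
var manager = createFakeRequestManager();
var savedTabIndex = 0; // plays the role of the global variable in the post

manager.add_beginRequest(function () { savedTabIndex = tabs.selected; });
manager.add_endRequest(function () { tabs.select(savedTabIndex); });

tabs.select(2); // the user is on the third tab (Roles)
manager.simulatePostback(function () { tabs.reinitialize(); });
console.log(tabs.selected); // 2
```

Without the two handlers the simulated postback would leave the widget on tab 0, which is the behaviour the post sets out to fix.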

Thursday, June 18, 2009

Understanding the concept of ‘unlimited’ or ‘infinite’ email storage space

In 2005, Google made an astonishing announcement that it would keep increasing GMail’s email storage by the second as long as it had enough space on its servers. Currently GMail provides more than 7338 MB of free storage. It is indeed intriguing as to how it can provide this ‘infinite’ storage space for its entire subscriber base. Is it at all possible? After all there is only so much ‘finite’ storage space available. So how can we get to ‘infinite’ storage?

If you search, you will find discussion and speculation galore on the nature of its physical servers, the amount of storage (in petabytes) it owns and the type of storage it might be using, such as holographic, network or distributed storage. The discussion also revolves around the idea that most email users will use less than 25%-30% of the available storage, so we will technically never run out of physical space, or that Google will keep buying servers to keep pace with demand.

But for some reason there is not much discussion or information on the very interesting math and science behind this concept. So I have decided to talk about a simple mathematical model (based on my understanding, and I am no math whiz kid!) that can be used to describe this ‘unlimited’ growth of the storage space. If you have observed carefully, the storage space on GMail kept increasing at a faster pace in the beginning and then slowed down considerably.

“On October 12, 2007 the rate of increase was 5.37 MB per hour.

Approximately a week later, the rate decreased to 1.12 MB per hour, on January 4, 2008 further down to about 3.35 MB per day, or 0.14 MB per hour, and in October 2008 further down to about 353.9 KB per day.” – Wikipedia

How can we achieve this kind of behavior? In other words, how do we write a program (aka an algorithm) that starts with an initial value, increments it at a fast rate for some period of time and then slows down (increments at a slower rate)?

To answer that question we need to understand the concept of ‘Function Growth’, which in simple terms is the rate at which the value of a function grows in relation to its input. Different families of functions grow at different rates, e.g. constant growth O(1), linear growth O(n), exponential growth O(2^n), logarithmic growth O(log n) and so on. Of these, logarithmic growth is the one we are most interested in. Why? Because the growth of a log function is very similar to the growth observed in the ‘unlimited’ email storage.

What I am going to do next is create a program that simulates the ‘unlimited’ growth of email storage. Let us make some basic assumptions first. We will assume our initial storage starts at 5000 MB (5 GB). We will increase the storage every second by some ‘factor’. The simulation will run until the storage reaches 10000 MB (10 GB). We will then observe how long it takes to get from 5 GB to 10 GB.

Since we are simulating the growth “every second”, we consider the total number of seconds in a day (60 * 60 * 24 = 86400). So we start from 1, and once we have reached the 86400th second we consider that a day has gone by and start again from second 1. We will use the following function: fn = c * [log(s) / (s * d)], where c is some constant, s is the current second of the day and d is the current day.

The simple code is as follows:

And if you run the program you will see the following output:

The first column is the "day", the second column shows the "storage size" at the end of the day, the third column displays the daily growth and the last shows the overall growth. If you now plot a graph of Day vs Size you will get something like this:

Can you see now what's going on? Starting from day 1, it will take about 1300 days, i.e. approximately 3.5 years, to reach 10 GB. By changing the value of the constant 'c', you can control the overall rate. We also notice that the storage grew by almost 60% (up to 8000 MB) in the first 60 days. Then it slowed down considerably and grew at a much slower pace.

So effectively what we are seeing is that although the growth happens every second, giving the illusion that we are marching towards infinity, in practical terms it could take years before we run out of physical storage space. And who knows, by then we might have found a way to really have infinite storage.
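For readers who want to experiment, here is a minimal sketch of the simulation described above. It is not the original program. The constant c = 10 and the use of the natural logarithm are assumptions picked for illustration, and the names `dailyLogSum` and `simulateGrowth` are made up. Because the sum of log(s)/s over one day is the same every day, the sketch computes it once and divides by the day number instead of looping over every second of every day.

```javascript
// Sum of log(s)/s over the seconds of one day. It is identical for every day,
// so it is computed once (natural logarithm is an assumption).
function dailyLogSum(secondsPerDay) {
  var sum = 0;
  for (var s = 1; s <= secondsPerDay; s++) {
    sum += Math.log(s) / s;
  }
  return sum;
}

// Applies fn = c * log(s) / (s * d) for every second of every day until the
// target size is reached, recording the size at the end of each day.
function simulateGrowth(startMb, targetMb, c) {
  var perDayBase = c * dailyLogSum(86400);
  var storage = startMb;
  var day = 0;
  var history = [];
  while (storage < targetMb) {
    day++;
    var dailyGrowth = perDayBase / day; // growth added during day 'day'
    storage += dailyGrowth;
    history.push({ day: day, sizeMb: storage, dailyGrowthMb: dailyGrowth });
  }
  return history;
}

var growthHistory = simulateGrowth(5000, 10000, 10); // c = 10 is an assumed value
var lastDay = growthHistory[growthHistory.length - 1];
console.log("days to reach target:", lastDay.day);
console.log("size after 60 days (MB):", growthHistory[59].sizeMb.toFixed(0));
```

With these assumed values the run should land in the same ballpark as the figures discussed above; a larger c shortens the time to reach 10 GB and a smaller c stretches it.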

Friday, June 12, 2009

ASP.NET 2.0 Master Pages - accessing server side control from JavaScript

Master pages are one of the latest and greatest additions in ASP.NET 2.0. They help us build consistent and maintainable user interfaces. But as with every new thing, they are not without their gotchas. One of the most frequently encountered problems is accessing server-side controls from client-side JavaScript. This is because both the MasterPage and Content controls are naming containers. A naming container is any control that implements the INamingContainer interface, and one thing a naming container does is mangle its children’s ClientID property. Mangling ensures that all ClientID properties are unique on a page.

Let us consider the following simple Master-Content page:

So for instance, the ID of our Label control is “MyLabel”, but the ClientID of the Label is "ctl00_BodyContent_MyLabel". Each level of naming container prepends its ID to the control (the MasterPage control ID in this form is ctl00). So if we now try to access this control from JavaScript with client-side functions like document.getElementById(), it will fail with a JavaScript error: " 'MyLabel' is undefined ".

In the content page, I have two server-side controls. One is a Label and the other is a Button. On every click of the button I want to display the current time in the label, and I want to do it without a postback (obviously!!!), i.e. from client-side JavaScript. So I write a JavaScript function called SetTime() and attach it to the 'onclick' event of the button. In the SetTime() function we calculate the current time and then set the value on the label control.

As I mentioned earlier, if we do MyLabel.innerHTML = _curtime or document.getElementById(MyLabel).innerHTML = _curtime, in both cases we get the 'MyLabel' is undefined error.
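The mangling rule can be modelled in a few lines. This is a simplified sketch, not ASP.NET's actual implementation: `buildClientId` is a made-up helper that joins the IDs of the enclosing naming containers and the control's own ID with underscores, which is enough to reproduce the ClientID quoted above.

```javascript
// Simplified model of ClientID generation: every enclosing naming container
// contributes its ID as a prefix, joined with underscores.
function buildClientId(namingContainerIds, controlId) {
  return namingContainerIds.concat([controlId]).join("_");
}

var labelClientId = buildClientId(["ctl00", "BodyContent"], "MyLabel");
console.log(labelClientId); // ctl00_BodyContent_MyLabel
```

The point of the model is that the rendered ID depends on where the control sits in the page hierarchy, so moving a control into another container changes the ID that client script has to use.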

So how to fix the problem?

Solution 1: One possible solution is to directly use the generated client side ID of the Label control.

ctl00_BodyContent_MyLabel.innerHTML =_curtime ;

But as you can see, this is really bad practice, as we never want to hardcode the client ID into a script. Typically we should build the SetTime() JavaScript function dynamically using StringBuilder or String.Format and emit the complete client script with the ClientScript.RegisterStartupScript() function.
This approach works, but only if your functions are small and simple. As your JavaScript functions become more complex, this approach falls apart.

Solution 2: Another alternative is to extend the first approach with use of markers in the script and use a call to String.Replace. Essentially we'll create a client side variable containing the control's ClientID value as follows:

Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs)
    Dim MyLabelID As String = "var MyLabelID = ""{}"";"
    MyLabelID = MyLabelID.Replace("{}", MyLabel.ClientID)
    ClientScript.RegisterStartupScript(Me.GetType(), "ClientID", MyLabelID, True)
End Sub

Now we can use document.getElementById(MyLabelID).innerHTML = _curtime without any problem. Note that MyLabelID holds the client ID as a string, so it has to be passed to document.getElementById() rather than used as the element itself.

So for every server-side control that you want to access from the client side, you can use approach 2 to create a corresponding JavaScript variable and use it in your client script.
But what if you have 50 server-side controls on your page? Or you add 50 new server-side controls to your page and want to access all or some of them from the client side? Maintenance becomes increasingly difficult, as you have to remember to repeat the steps in approach 2 for every server-side control you add.

Can we do better? Sure we can.

Solution 3: Let us write a function that will find all the controls in the page and automatically create the corresponding client side javascript variable with the same name.

Call the above function on Page load event. Now if you add another server side label control and name it "MyLabel2" you can directly access it from JavaScript (without the quotes).

Wednesday, June 10, 2009

Service based, on-demand web forms with AJAX, JSON and JavaScript

If you are wondering what's with the weird title, you are right. I have yet to come up with a better name for the design approach that I am going to talk about. But before that, let me start with a real-world example of a web application.

Imagine you are part of a distributed project team responsible for building a production-ready, web-based service desk application in ASP.NET. One of the many features of the application is to allow users to create new tickets. On the create new ticket page, users select the type of service they want from a drop-down, and based on their selection a different set of forms with validation is displayed on the page for the user to fill in and submit. Finally, after the form is submitted, the application gathers all the information, creates an HTML-formatted email and sends it to the service owner.

How would you go about implementing this functionality?

One simple way to do it is to create a separate web-form for each type of service and redirect to the appropriate page depending on the user's selection. Since the requirement is to have the forms displayed within the same page, we need to ensure that the look and feel of all the web-forms is consistent. The easiest and recommended way to achieve that is to make use of Master pages and style sheets.
Another option is to create a separate DB table in line with each form structure and store the information there. Then build the form on the fly based on that structure and show it to the client.
We could also create an associative table (key, val) where the values of the rows are column names describing the fields in the form.

Simple enough, but do you see the problem with any of the above approaches?

What happens when a form changes from one release to another ?
What if we need to change any of the forms mid-release?
What if we want to add a new form to a type of service once the application is in production?
What if the team designing the forms is different from the application development team and only knows HTML, JavaScript and CSS?
What if we want to change the client side validation logic?
What if the HTML display of the email needs to be changed?

In each of the above scenarios, every change to a form requires code changes, database table structure changes, testing cycles, a build, a deployment and a separate release just to put through a UI change. This could eventually lead to slower time to market and be detrimental to the business. We need a way to handle all the scenarios (1-6) more effectively. Ideally, we would want the team responsible for making changes to the forms to be able to log in to the application, go to a maintenance section, make the necessary changes and publish the form without any "code change". So how do we achieve this goal?

The Design:

The basic design is very simple. Instead of physically creating the forms and storing them on the web server, we store the forms in the database. Then, when the client requests the form for a selected type of service, we pull the necessary information from the database (via an AJAX call) and show it to the user.

Now you may ask, ok, displaying the form to the client is one thing (simple enough) but how about (i) handling form elements actions like click of a button, selecting a drop down, initializing some part on load on the client (ii) overall form validation before being submitted, (iii) reading the values of the form elements and (iv) creating the HTML display for the email?

First, let us examine the table structure where we store the forms. Let the table name be 'FormTable' which has the following columns (left out some of the columns for sake of brevity):

[frm_svc_id] [int] NOT NULL,

[frm_input_elements] [varchar](max) NOT NULL,

[frm_display] [varchar](max) NULL,

[frm_name_map] [varchar](max) NULL,

[frm_actions] [varchar](max) NULL,

[frm_validation] [varchar](max) NULL

frm_svc_id = The id of the service for which we need the form

frm_input_elements = The input form's HTML structure and elements

frm_display = The output HTML form that needs to be displayed after the form is submitted

frm_name_map = The display names of the input form fields

frm_actions = Event handling functions

frm_validation = The form validation function before submission

Let's look at an example of what goes in [frm_input_elements]:

 <table>
  <tr>
    <td>Date of Service</td>
    <td><input type="text" name="frmdos"/></td>
  </tr>
<tr>
  <td>Environment</td>
  <td>
  <select id="frmselect" name="frmselect">
    <option value="DEV">DEV</option>
    <option value="PROD">PROD</option>
  </select>
  </td>
  <td>
  <input type="text" id="frmselectedenv" name="frmselectedenv"
  value=""/>
  </td>
</tr>
<tr>
  <td>
  <input type="button" name="frmsubmitbutton"  
  value="Submit"/>
  </td>
</tr>
</table>

Now we want to add the following functionality to the form. On selecting the value from the 'frmselect' drop-down, display the selected value in the 'frmselectedenv' textbox. To do this all we need to do is to attach a function to the "onchange" event of the 'frmselect' dropdown. The function code will be something like this:

 function dowork()
{
var selectobj = document.getElementById("frmselect");
var txtobj = document.getElementById("frmselectedenv");
txtobj.value = selectobj.options[selectobj.selectedIndex].value;
}

So how can we store this information? Enter JSON. The [frm_actions] column holds data in the following JSON structure:

[{id:'',eventtype:'',action:''},{id:'',eventtype:'',action:''},...,{EOF:true}]

So in our example, frm_actions will be as follows:

 [
 {id:'frmselect',
  eventtype:'onchange',
  action:'var selectobj = document.getElementById("frmselect");
              var txtobj = document.getElementById("frmselectedenv");
              txtobj.value = selectobj.options[selectobj.selectedIndex].value'
  },
  {EOF:true}
]
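To make the idea concrete, here is a small sketch of how such action records could be turned into handlers with `new Function`. It is an illustration under stated assumptions: there is no DOM here, so a plain `state` object stands in for the form elements, the action strings are hypothetical, and `bindActions` is a made-up helper. The handlers take `state` as a parameter only because of that stand-in; in the page the action code would talk to the DOM directly.

```javascript
// Hypothetical action records in the shape described above.
var actionRecords = [
  { id: "frmselect", eventtype: "onchange",
    action: "state.selectedEnv = state.selectValue;" },
  // An empty id means the action is not tied to an element and runs right away.
  { id: "", eventtype: "", action: "state.initialized = true;" },
  { EOF: true }
];

// Builds one handler per record; handlers tied to an element are returned
// keyed by "elementId:eventtype", the others are executed immediately.
function bindActions(records, state) {
  var boundHandlers = {};
  for (var i = 0; i < records.length; i++) {
    var record = records[i];
    if (record.EOF) {
      break; // end-of-list marker
    }
    var handler = new Function("state", record.action);
    if (record.id !== "") {
      boundHandlers[record.id + ":" + record.eventtype] = handler;
    } else {
      handler(state);
    }
  }
  return boundHandlers;
}

var state = { selectValue: "PROD", selectedEnv: "", initialized: false };
var boundHandlers = bindActions(actionRecords, state);
boundHandlers["frmselect:onchange"](state); // simulate the onchange event
console.log(state.selectedEnv, state.initialized); // PROD true
```

Because the action text is executed as code, whoever can edit the stored forms can run arbitrary script in the user's browser, so access to the maintenance section needs to be restricted accordingly.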

Similarly, the [frm_name_map] column, which is used to display the data after the form is submitted, holds data in the following JSON structure:

[{SystemFieldName:'',DisplayFieldName:''},...,{EOF:true}]

In our example it will be as follows:

 [
{ SystemFieldName:'frmselectedenv',
  DisplayFieldName:'SelectedEnvironment'
},
{ SystemFieldName:'frmdos',
  DisplayFieldName:'Service Date'
},
 {EOF:true}
]

We are almost done. Now all that is needed is to retrieve the JSON data for the form, display it, attach the proper event handlers, set up the validation (if available) and read the form values before submission. Enter JavaScript and jQuery.

The Implementation

The AJAX call to retrieve the form's information will return the following JSON object from the server side:

({frmelements:'',frmvalidation:'',frmdisplay:'',frmnamemap:'',frmactions:''})

where:

"frmelements" <= [frm_input_elements]
"frmvalidation" <= [frm_validation]
"frmdisplay" <= [frm_display]
"frmnamemap" <= [frm_name_map]
"frmactions" <= [frm_actions]

On the client side, the JavaScript function "GetForm()" is the main engine of this design approach. It makes use of JavaScript's ability to create functions at run time and attach them to events. We need to be careful when working with such functions in JS, as they can lead to memory leaks.

//Global definitions
var objArr = new Array(); //flat list of (element id, event type, handler) triples
var frmformdatadisplay = "";
var frmnamemap = "";
var frmFormValidation;

function GetForm(svcid)
{
  var selectedid = document.getElementById(svcid).value;
  //Detach the handlers of the previous form to prevent memory leaks
  for (var i = 0; i < objArr.length; i = i + 3) {
    if ((objArr[i] != null) && ($get(objArr[i]) != null)) {
      $get(objArr[i]).detachEvent(objArr[i + 1], objArr[i + 2]);
    }
    objArr[i] = null;
    objArr[i + 1] = null;
    objArr[i + 2] = null;
  }
  PageMethods.GetForm(selectedid, GetFormSuccess, GetFormFailed);
}

function GetFormSuccess(result)
{
  var frmobj = eval(result);
  if (typeof (frmobj) !== 'undefined')
  {
    $get("divfrm").innerHTML = frmobj.frmelements;
    var j = 0;
    //frmactions is the field name used in the JSON returned by the server;
    //it is assumed to be an array of action records at this point
    for (var i = 0; i < frmobj.frmactions.length; i++) {
      var actionobj = frmobj.frmactions[i];
      if (actionobj.EOF) {
        break; //end-of-list marker, nothing to attach
      }
      if (actionobj.id != "")
      {
        //remember the triple so that GetForm() can detach the handler later
        objArr[j] = actionobj.id;
        objArr[j + 1] = actionobj.eventtype;
        objArr[j + 2] = new Function(actionobj.action);
        $get(actionobj.id).attachEvent(actionobj.eventtype, objArr[j + 2]);
        j += 3;
      }
      else
      {
        //id = "" ==> action is not specific to any element, so execute it right away
        if (actionobj.action != "") {
          new Function(actionobj.action)();
        }
      }
    }
    frmformdatadisplay = frmobj.frmdisplay;
    frmnamemap = frmobj.frmnamemap;
    frmFormValidation = new Function(frmobj.frmvalidation);
  }
}

And that's it! The only thing I left out is the parser that walks the form, extracts the form fields, gets the corresponding values and stores them. It's easy enough to iterate through the form element collection and, for each type of input element, get the value, store it in a JSON object and save it in the datastore.

As with everything in life, this design is not a one-size-fits-all solution. It has its own drawbacks. With this approach you lose the ability to store the data in structured form on the server and use SQL queries to generate reports and perform searches. We are also putting a lot of functionality and processing on the client, which could prove fatal. Moreover, you need to write more code than with other standard solutions, and there is a learning curve involved in maintaining the forms.

But if flexibility is your goal, you do not want to go through a release cycle to push in a change, the form structures will be modified often by the business and you can live without SQL queries, then this design could prove beneficial.
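As a rough sketch of that missing piece, the snippet below collects values from a list of form elements and pairs them with display names from the name map. Plain objects stand in for DOM elements so it can run anywhere. The sample data and the helper names `collectValues` and `toDisplayRows` are hypothetical.

```javascript
// Sample name map in the [frm_name_map] shape (hypothetical data).
var nameMap = [
  { SystemFieldName: "frmselectedenv", DisplayFieldName: "SelectedEnvironment" },
  { SystemFieldName: "frmdos", DisplayFieldName: "Service Date" },
  { EOF: true }
];

// Plain objects standing in for the elements of the rendered form.
var formElements = [
  { name: "frmdos", type: "text", value: "2009-06-10" },
  { name: "frmselectedenv", type: "text", value: "DEV" },
  { name: "frmsubmitbutton", type: "button", value: "Submit" }
];

// Reads the value of every input element, skipping buttons and
// unchecked checkboxes or radio buttons.
function collectValues(elements) {
  var values = {};
  elements.forEach(function (element) {
    if (element.type === "button" || element.type === "submit") { return; }
    if ((element.type === "checkbox" || element.type === "radio") && !element.checked) { return; }
    values[element.name] = element.value;
  });
  return values;
}

// Pairs each collected value with its display name, in name-map order.
function toDisplayRows(values, map) {
  var rows = [];
  map.forEach(function (entry) {
    if (entry.EOF) { return; }
    if (Object.prototype.hasOwnProperty.call(values, entry.SystemFieldName)) {
      rows.push(entry.DisplayFieldName + ": " + values[entry.SystemFieldName]);
    }
  });
  return rows;
}

var collected = collectValues(formElements);
var displayRows = toDisplayRows(collected, nameMap);
console.log(JSON.stringify(collected));
console.log(displayRows.join("\n"));
```

In the page itself the element list would come from the form's element collection, and the collected object would be serialized and posted to the server.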

In the next part, I'll explain how to create a Web 2.0 rich interface using JQuery to edit and manage these forms.

Tuesday, June 9, 2009

ASP.NET MVC 1.0 - Are you ready for it?

ASP.NET MVC v1.0 is finally here! Today it is available as an add-on to ASP.NET 3.5 SP1. One thing that has been missing from the ASP.Net suite is out-of-the-box support for the Model-View-Controller (MVC) design/architecture pattern. It is well known that for web development MVC is the best way to go. With classic ASP it was (almost) impossible to achieve the MVC pattern, while with ASP.Net Web-Forms it was somewhat better (separation of View and Controller/Model) but still too tightly coupled. Now with ASP.NET MVC one can truly achieve the benefits of this design.

But the question is, are you ready for it? ASP.Net MVC introduces a new paradigm in web development. It is a disruptive technology that can radically change the way folks develop web-based applications. Right now ASP.Net MVC is not a replacement for Web-Forms but rather another option. But what can you expect when you start developing with ASP.Net MVC? Are there any surprises waiting for you? Maybe ...

First, the "ViewState" and "PostBack" concepts of web-forms no longer hold. Everything is either a GET or a POST. So "State Management" is no longer available out of the box but needs to be handled by the developer. This also means "Server Controls" are useless, and there is no "GridView"!!! You need to implement your own 'gridview' (though it is not a difficult task).
What about standard functionality like "Paging", "Sorting", "In-Place Edit" and "Styles"? None of these features comes out of the box, so they need to be implemented too. This means knowledge of "Extension Methods", "Lambda Expressions" and "Query Syntax" is a must, which steepens the learning curve and lengthens the time to market.
What about AJAX? There is no AJAX support out of the box (like UpdatePanel in web-forms). You need to use JavaScript/jQuery to implement AJAX functionality. What about security? Early adopters of ASP.Net MVC have raised concerns over inherent security flaws in the MVC framework, e.g. "Delete Link through GET" and "POST Data tampering", to name a few. With regard to deployment, it supports IIS 7.0 Integrated mode by default. For IIS 7.0 Classic mode and IIS 6.0, it needs to be configured.

WOW! That's a lot of "surprises"! You might be tempted to think ASP.Net MVC is not something you want to try. But keep in mind that ASP.NET MVC 1.0 is just coming out of infancy. By the next release, thanks to the hard work put in by early adopters, it should have reached an acceptable level of maturity with a lot of the standard functionality in place. In the long run, ASP.Net MVC has the potential to shorten development schedules, gear everyone towards Test-Driven Development (TDD) and ease the maintenance of large and complex projects.

On the flip side, there are few things I don't like about the ASP.NET MVC implementation because it reminds me of classic ASP!!! I'll elaborate more in my next post.

Before signing off, I must say, ASP.NET MVC provides you with all the control that you asked for. It is here to stay. But remember what uncle Ben said to Spidey :"with great power comes great responsibility".

Monday, June 8, 2009

Batch Updates in ADO.Net 2.0 - How to find the optimum batch size?

Before I explain how to go about estimating the optimum batch size to use with ADO.NET 2.0 batch operations, I want to briefly touch upon the basics.

One of the new features in ADO.NET 2.0 is the "Batch Updates" operation. It promises to improve the performance of an application by reducing the number of round trips to the database. Prior to ADO.NET 2.0, if we made changes to a DataSet and then saved it using the Update method of the SqlDataAdapter class, it made one round trip to the database for each modified row in the DataSet. This was a major performance hindrance. So how does reducing database round trips improve performance?

Let's look at the following example:

Assume we are building a 3-tier application (client-webserver-database). The web servers are located in Boston while the database servers are in California (an extreme example, but a practical one nevertheless). In the application we have a datagrid (associated with an underlying datatable) that displays records, and through some specific operations (and user interaction) the records in the datatable are updated and we need to persist these changes back to the database. We are using ADO.NET, SqlDataAdapter and its Update method. Let us say 50 records were modified. Each individual Update operation takes 1 second, and each round trip to SQL Server takes 1 second. So with ADO.NET 1.1, where we need to make 50 round trips to the database server, the overall operation will take 50 * (1+1) = 100 seconds.

So how does the above scenario change with ADO.NET 2.0 ?

In ADO.NET 2.0, we now have a new "UpdateBatchSize" property which indicates the number of rows that are processed in each round-trip to the server. It can take the following values:

n=0 - There is no limit on the batch size
n=1 - Disables batch updating.
n>1 - Changes are sent using batches of 'n' operations at a time

So in our example above, let us set the batch size (n) = 5. That means 5 records will be processed in each round trip to the server, so the overall operation now takes (50/5 * 1) + (50 * 1) = 60 seconds.
Let n=10 ==> (50/10 * 1) + (50 * 1) = 55 seconds
Similarly if n=50 ==> (50/50 * 1) + (50 * 1) = 51 seconds

So we see that in this simple contrived example we easily get a performance boost of 40% - 50% with Batch Update mode. The key question is what the "optimum batch size" should be to give us the maximum performance gain. Surprisingly, this is not a simple question to answer. There is no automated way to figure it out. Moreover, there is not much information out there to help us make an informed decision. This is what MSDN says: "Executing an extremely large batch could decrease performance. Therefore, you should test for the optimum batch size setting before implementing your application."
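The arithmetic above follows one formula: (number of round trips * round-trip time) + (rows * time per update). A small sketch, with the made-up helper name `estimateSeconds`, makes it easy to try other batch sizes; the timings are the contrived 1-second values from the example, not measurements.

```javascript
// Estimated duration: one round trip per batch plus one update per row.
// A batch size of 0 means "no limit", i.e. everything goes in a single batch.
function estimateSeconds(rows, batchSize, updateSeconds, roundTripSeconds) {
  var effectiveBatch = batchSize === 0 ? rows : batchSize;
  var roundTrips = Math.ceil(rows / effectiveBatch);
  return roundTrips * roundTripSeconds + rows * updateSeconds;
}

[1, 5, 10, 50].forEach(function (n) {
  console.log("n=" + n + " ==> " + estimateSeconds(50, n, 1, 1) + " seconds");
});
// n=1 ==> 100 seconds
// n=5 ==> 60 seconds
// n=10 ==> 55 seconds
// n=50 ==> 51 seconds
```

The model assumes the cost of a round trip does not depend on the batch size, which is exactly the simplification that real measurements are needed to check.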

So how do we test and 'estimate' the optimum batch size for our own application? Note the word 'estimate'. That is exactly what we are going to compute. In order to do that we need to start with some raw data.

Let's say an update operation (take the example I described above) takes 't' seconds to complete (including the round-trip time to the database server and the web service call along with the actual update operation). Execute the operation multiple times, each time with a different batch size 'S', and note down the time taken 't' in each case. Here's the pseudo code -

maxbatchsize = M;
batchincrement = 1;
for (S = 0; S <= M; S = S + batchincrement)
{
  if (S != 1) {
    t1 = starttimer();
    ExecuteBatchUpdate(S, dt); //where S=batch size, dt=datatable with 'R' rows updated
    t2 = endtimer();
    t' = t2 - t1;
    LogInTextFile(S, t'); //log the batch size and the corresponding time taken t'
  }
}
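The sweep in the pseudo code can be sketched as runnable code. Everything here is illustrative: `batchSizesForRun` and `timeRun` are made-up helper names, and `fakeBatchUpdate` is a stub standing in for the real update call, so the recorded times mean nothing until it is replaced with an actual batch update.

```javascript
// Lists the batch sizes tried in one run: 0, increment, 2*increment, ... up to
// the maximum, skipping 1 as in the pseudo code.
function batchSizesForRun(maxBatchSize, batchIncrement) {
  var sizes = [];
  for (var s = 0; s <= maxBatchSize; s += batchIncrement) {
    if (s !== 1) {
      sizes.push(s);
    }
  }
  return sizes;
}

// Times one call of the supplied update function per batch size.
function timeRun(sizes, executeBatchUpdate) {
  return sizes.map(function (size) {
    var start = Date.now();
    executeBatchUpdate(size);
    return { batch: size, millis: Date.now() - start };
  });
}

// Stub: replace with the real batch update to collect meaningful timings.
function fakeBatchUpdate(batchSize) {
  return batchSize;
}

var sizesBy10 = batchSizesForRun(500, 10);
var runLog = timeRun(sizesBy10, fakeBatchUpdate);
console.log(sizesBy10.length, runLog[runLog.length - 1].batch); // 51 500
```

Each entry of `runLog` corresponds to one line of the log file: the batch size and the time it took.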

Q: Why do we skip S=1?

Let's say for the first run R=500, M=500 and batchincrement = 1. That means in our simulation we will have batch sizes from 0 to 500 (except 1), and the log file will look something like this:

Batch Time
----- ----
0 t1
2 t2
3 t3
.. ..
500 t500

For the next run R=500, M=500 and batchincrement = 10

Batch Time
----- ----
0 t1
10 t2
20 t3
.. ..
500 t51

For the next run R=500, M=500 and batchincrement = 25

Batch Time
----- ----
0 t1
25 t2
50 t3
.. ..
500 t21

The 'batchincrement' value decides the batch size for the update operation and the total number of round trips to the database server. Ideally you want to change the value of 'batchincrement' (from low to high, i.e. from 1 to 500) to control the batch size and execute as many runs as you can. Once you have collected the raw data, the exciting part begins.

Now for the data of each run do the following:

a. Do a scatter plot of Batch vs Time (x vs y)
b. Then plot the line of best fit, i.e. do curve fitting and add a trend line, e.g. you can use a quadratic function (a+bx+cx^2) or a linear function
c. Find the minimum value from the line of best fit.

So how do we perform the above steps? You can use Excel, or you can use the free open-source statistical analysis software R. With R, it's as easy as writing about 20 lines of R code. Here's the program (it uses a quadratic function for curve fitting on my sample data):

setwd("C:/")
Data<-read.table(file="BatchResult.txt",sep="",header=TRUE)
#par(mfrow=c(2,1))
Batch<-Data[,1]
Time<-Data[,2]
plot(Batch,Time,type="p",col="red",cex=0.5)
## Fit Quadratic Time=a+b*Batch+c*Batch^2
Batch.2<-Batch^2
fit.quadratic<-lm(Time~Batch+Batch.2)
print(summary(fit.quadratic))
coef<-fit.quadratic$coefficients
a<-coef[1]
b<-coef[2]
c<-coef[3]
print(coef)
Time.est<-a+b*Batch+c*Batch.2
lines(Batch,Time.est)
cat("B.hat at min time = ", -(b)/(2*c),"\n")
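
The last line of the program reports the vertex of the fitted parabola. The reasoning behind it, written out with $B$ standing for the batch size and $a$, $b$, $c$ for the fitted coefficients:

```latex
\begin{aligned}
\widehat{T}(B) &= a + bB + cB^2 \\
\frac{d\widehat{T}}{dB} &= b + 2cB = 0
  \quad\Longrightarrow\quad \hat{B} = -\frac{b}{2c} \\
\frac{d^2\widehat{T}}{dB^2} &= 2c
\end{aligned}
```

So $\hat{B}$ is a minimum only when the fitted $c$ is positive. If $c \le 0$ the parabola has no interior minimum, and the lowest fitted time sits at one end of the tested range.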

The output of the above program is shown below:

So say for Run1 we get the optimum batch size value B1. Repeat the steps for all the other runs and we have something like this (Table A):

Run1 B1 (B1/R)*100
Run2 B2 (B2/R)*100
Run3 B3 (B3/R)*100
.......
RunN BN (BN/R)*100

where (BN/R)*100 gives the estimated batch size as a percentage of the total records being modified. Sort Table A by the value (BN/R)*100 and you will get an estimate of the "optimum range of the batch size" to use in your application.
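
A minimal sketch of the Table A calculation in Python. The function name is hypothetical and the batch sizes in the usage lines are made-up numbers used only to show the arithmetic, not measured results.

```python
def optimum_range(optimum_batch_sizes, rows):
    """Express each run's optimum batch size as a % of `rows`, sorted ascending.

    Returns the sorted percentages and the (lowest, highest) pair that
    brackets the estimated optimum range.
    """
    percentages = sorted(100.0 * b / rows for b in optimum_batch_sizes)
    return percentages, (percentages[0], percentages[-1])


if __name__ == "__main__":
    # Made-up optimum batch sizes B1..B3 for three runs with R = 500 rows.
    percentages, bounds = optimum_range([120, 95, 140], rows=500)
    print(percentages)
    print(bounds)
```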

So now, for any new functionality you introduce in your application that needs to use the ADO.NET 2.0 batch operations, you can find the estimated range of the optimum batch size that will provide the maximum performance boost (no more blind trial and error!).

Thanks to Sourish for helping me with the above program

Select-Case vs If-Then-Else

What is the difference between the ‘Select-Case’ and ‘If-Then-Else’ constructs? Ask anyone and most likely you will get the answer that they are the same as far as functionality goes. So why use one over the other? Readability of the code is usually given as the primary reason for choosing Select-Case over If-Else. So is there any other reason besides ‘readability’? What about performance? Let us consider the following:

Both pieces of code generate the same output. So to see the difference we need to dive a little deeper and look at the IL code that gets generated:

The left-hand side is the IL code from If-Then-Else, while the right-hand side is the IL code from Select-Case. Can you see the difference? Let me explain.
After loading the value ‘i=15’ onto the evaluation stack (in both cases), the If-Then-Else construct executes the instruction bne.un.s (i.e. branch if not equal, or CMP and JMP in assembly language) 3 times. The ‘Select-Case’, on the other hand, creates a ‘jump table’, i.e. a ‘lookup table’, using the switch instruction. The lookup table is 0-indexed. So instead of examining every single case statement separately, we can jump to the right case by simply calculating the offset into the address table.

Here’s how it works:
After the value i=15 is loaded onto the evaluation stack, the integer value 1 is pushed onto the stack. Then the instruction sub is executed and the result is pushed onto the stack. So in this case the value 14 is pushed to the top of the stack. Now the offset = 14, which gives the address IL_0055, and the execution pointer jumps directly to the specified location. From there it sets ‘j=6’ and prints the value of ‘j’. Similarly, if we set i=2, then the offset = 1, which gives the address IL_0050, and the execution pointer jumps directly to the specified location. From there it sets ‘j=6’ and then unconditionally jumps (br.s) to IL_0058. It then prints the value of ‘j’ and exits.

Now imagine you have 50 comparisons to make. If you use the If-Then-Else construct you would execute 50 bne.un.s (i.e. CMP and JMP) instructions to get to the last match (worst-case scenario), while with Select-Case you can jump to the last matching case with a single table lookup. So in terms of worst-case time complexity, we have replaced an O(n) algorithm with an O(1), i.e. constant-time, algorithm when using Select-Case.
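
The difference between the two strategies can be illustrated with a small Python sketch. It only mimics the idea of a comparison chain versus an offset-based table lookup; it is not the IL the compiler emits, and the 50 contiguous case labels and function names are hypothetical.

```python
CASES = list(range(1, 51))  # hypothetical contiguous case labels 1..50


def if_chain(value):
    """Test each case in order; returns (matched case or None, comparisons made)."""
    comparisons = 0
    for case in CASES:
        comparisons += 1
        if value == case:
            return case, comparisons
    return None, comparisons


def jump_table(value):
    """Subtract the lowest label, bounds-check once, then index the table.

    Returns (matched case or None, table lookups made).
    """
    offset = value - CASES[0]  # plays the role of the `sub` step
    if 0 <= offset < len(CASES):
        return CASES[offset], 1
    return None, 1  # out of range: fall through to the default branch


if __name__ == "__main__":
    print(if_chain(50))    # last case: every comparison is executed
    print(jump_table(50))  # last case: one lookup
```

The cost of the chain grows with the position of the matching case, while the table lookup costs the same for the first case, the last case, and a value that matches nothing.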

So does this mean the Select-Case construct always performs better than If-Then-Else? There’s your ‘gotcha’! It turns out that the ‘address lookup table’ solution implemented by ‘Select-Case’ can be applied to almost all real-world scenarios with only the offset calculation becoming more complex. There’s only one exception to this rule:

If the cases are completely unrelated to each other i.e. the compiler fails to find a pattern in order to create the lookup table e.g.

As we see, the resulting code does the comparison by examining every case separately, i.e. executing the instruction bne.un.s 3 times, essentially making it the same as the If-Then-Else construct, and we get an O(n) algorithm. So in this situation it's no better than If-Then-Else.

What if we are comparing non-Integers e.g. double? Let’s look at the source code and the underlying IL code:

With the same set of cases, Select-Case generated the lookup table when we did an Integer comparison. But with Double it does it the long way, i.e. just like the If-Else construct, thereby giving us an O(n) algorithm. Now what if we are comparing strings? How does Select-Case match up to If-Then-Else? Let’s take a look at the following code:

Again both produce the same output; but what about their IL code?

Surprise! In both cases the compiler uses the instruction bne.un.s to do the individual comparison for each case. Select-Case does not generate an ‘address lookup table’ in this scenario. So we get an O(n) algorithm when doing string comparison with both the Select-Case and the If-Then-Else construct, i.e. both are the same performance-wise.
Can we do any better when doing string comparison with the Select-Case construct? As it turns out, we can do better by using Enumerations. Let’s rewrite the code as follows:

Here's the generated IL code:

As you can see, when used with an Enumeration the compiler goes back to generating the ‘address lookup table’ for the Select-Case, thereby making it an efficient O(1) algorithm. That raises the question: if we use an Enumeration to do the string comparison, will the If-Else construct be as efficient as Select-Case? The answer is no (obviously!).
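
The enumeration idea can be sketched in Python as well. This is only an analogy for the technique (map each string to a small integer once, then index a table), not the VB.NET code from this post; the `Color` members and `HANDLERS` values are hypothetical.

```python
from enum import IntEnum


class Color(IntEnum):
    """Hypothetical enumeration standing in for the string values."""
    RED = 0
    GREEN = 1
    BLUE = 2


# One entry per enum member, indexed by the member's integer value.
HANDLERS = ["stop", "go", "wait"]


def dispatch(name):
    """Convert the string to an enum member, then do a single table lookup."""
    color = Color[name.upper()]  # raises KeyError for an unknown name
    return HANDLERS[color]


if __name__ == "__main__":
    print(dispatch("green"))
```

Note that converting the string to the enumeration has a cost of its own, so the approach pays off most when the conversion happens once and the enumeration value is then reused.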
So the next time you are about to decide which construct to use, Select-Case or If-Else, here's a comparison chart to help you with that decision:
