Why web programming is hard - and how to make it easy again

Update! Gabor Vitez has written Impostor (Freshmeat) which implements the ideas below!

Web programming is hard to do right

Creating a toy web service is easy. Creating a large robust and secure application is pure hell. Comprehensive software environments are for sale to help coders (WebSphere, Broadvision etc etc) - but why is it so hard in the first place? This documents attempts to set out the reason and also mentions the solution.

Insuring your car

Imagine that you need to get insurance for your car, and you go to an office to arrange for it. You enter the building and are directed to Desk 1, where an anonymous employee asks you for your name, but tells you to answer the question over at desk 2.

Befuddled, you go there, and you state your name. Desk 2 has a very identical anonymous employee that thanks you, writes down your name on a piece of paper and gives it to you. Furthermore, he asks you what kind of car you have, and if you have ever claimed insurance on other cars, but please mention the answer to Desk 3.

At Desk 3, you show your piece of paper with your name on it, and you tell them you have a 1975 car, and that only last year, you had an accident with your other car. They note the make of your car, and 'Had accidents' on your piece of paper, and send you of to Desk 92.

On arriving there, you see that this is the Vintage Car Insurance desk. They look at the piece of paper you brought, and tell you what insuring your car is going to cost, and that you are not going to get a discount. This is all written down on your piece of paper, and you are told to go to Desk 5 to settle the payment.

You head there, with your piece of paper, and pay the amount specified on it. Your car is now insured.

Asynchronous stateless programming

Does this sound the least bit convoluted to you? It is. But it is the way most web services operate these days. Because there is no permanent connection between you and the website you are visiting, each time you have performed part of an operation, your reappearance comes as a complete surprise to the webserver.

The next step in the process is determined by which desk ('url') you walk to, and what is written down on your piece of paper. Each separate part of the operation must make sure that you went to the intended desk, and re-read your piece of paper, to see who you are and what you want.

Besides being complicated, this is also error prone. What if you decide to change your piece of paper? You could easily modify your accident history, and get a huge discount. Or go one better and while heading to the payment desk, change the amount of money you need to pay!

The real world

The 'piece of paper modification problem' is real. Many merchant sites suffer from the 'choose your own price' problem mentioned in the previous paragraph. Clued programmers work around it. They don't store your data on a piece of paper they trust you with. Instead, they give you a token, and store all data on the webserver. You only carry the token around.

When you arrive at a desk, the piece of paper corresponding to your token is retrieved. This only allows you to fiddle with your token, which given proper mathematics, is not going to work - the webserver detects a bogus token, and refuses service.

This token technique is hard, however, and the problem remains that you are free to try what happens with your token over at other desks. Perhaps you can skip the appraisal desk, and neglect to mention your history of accidents, who knows. Each desk operates on its own.

How did we arrive at this mess?

Well, this is how webservers work! Each page is a desk, and each time you visit a script, it is started anew, with no information on what happened before. Static webpages contain no state - the famous 'index.html' may contain a clock, telling you the date, or perhaps some smartness in figuring out your preferred language - but is mostly static.

When the web became more dynamic, coders did not leave the 'dynamic page' paradigm. Instead, they improved on the 'piece of paper passing' technique. A lot of tricks were evolved, for example, a Desk can be programmed to refer to itself. This is important for error handling - a user may labour under the impression that his car dates from 1875 instead of 1975. In our story above, Desk 3 would have to check your answer and send you back to Desk 2.

This creates a large distance between user input and error checking, which complicates coding. A smarter desk would contain both input and error checking - it would give you your paper, and instruct you to come back to the same desk again.

But given the connection-less nature of the web, even when redirecting to the original desk does not save you from passing around pieces of paper. Each pageview is a whole new event!

That's just the way it works, isn't it?

The vast majority of web coders have never programmed anything else - this is due to the exponential growth of the web where newcomers will alway outnumber old hands by a large margin. To many of these newcomers, this 'event driven paper passing' technique may seem perfectly natural.

Old farts however who may have programmed console applications, or even used 'BASIC' on their microcomputers have a very hard time. To them, most of work coding is spent on perfecting the transfer of data from one step of the website to another, and checking that the steps are executed in the right (intended, non-tampered) order.

This is not what they want to do - they want as many lines of code as possible to be involved with actually doing things, entering users in the database, processing payments and selling stuff.

How should it be then?

In the old days, a program may have looked like this (in no specific language):

10	print "What is your name?";
20	name=getLine();
30	print "Your name is: " + name;
50	print "What kind of car do you have?";
60	car=getLine();
70	print "Insuring a care of make " + car + " costs: ";
100	print carCost(car) + " per year";

Even in this very simple language, this program makes sense. Actions are laid out linearly. You can read what happens, and in what order, by starting at the top, and reading on from there.

Note how this is decidedly unlike the many-Desk horror story above! In fact, the code looks just like insuring a car should be, although heavily simplified. The employee asks a question, gets an answer, asks another question, does a calculation, and tells you what you want to know.

In many-Desk parlance, this code may look like this, again in no specific existing language:

index.html:
	What is your name?
	<form action=name.html>
	Your name: <input name=yourName>
	</form>

name.html:
	Your name is $yourName
	What kind of car do you have?
	<form action=car.html>
	<input type=hidden name=yourName value=$yourName>
	Your car: <input name=yourCar>
	</form>

car.html:
	$yourName, a $yourCar car costs to insure:
	$calcPrice($yourCar);

Even without error checking, this is far more verbose and split out. Note the clever use of 'hidden form inputs' to pass your name to the next page. This is the infamous piece of paper! Now we improve the original program somewhat:

10	print "What is your name?";
20	name=getLine();
25*	if(name=="") then goto 10;
30	print "Your name is: " + name;
50	print "What kind of car do you have?";
60	car=getLine();
65*	if(noSuchCar(car)) then goto 50;
70	print "Insuring a care of make " + car + " costs: ";
100	print carCost(car) + " per year";

The lines marked with an asterisk are new and perform very simple error checking, making sure that you enter a non-empty name, and that your car make exists. If you make a mistake in either case, it will just ask you again, until you get it right.

Now translate this to the many-Desk scenario. This is where the hurting seriously takes off. There are a myriad ways to handle a detected error. In our example, name.html may decide to also know about the original form and reprint it in case you forgot to enter your name. Or it may print a warning, and output a link directing you back to index.html, asking you to try again. Or it may do so for you, and forcibly send your browser back, passing an error message on your piece of paper.

Index.html would then read that piece of paper and tell you that you forgot to enter your name, and to please try again. Each of these methods has a problem and each of them is in wide use.

But it gets worse from here. Car.html also has to check if your name is still set, and if not, balk at the error. This gets even more imporant if there were a fourth file, payment.html! Each successive step has to perform steps to check if the state of things is as it should be.

Remedies

These problems are well known and there are lots of ways to ameliorate them. It all revolves around the stateless event-driven nature of the web. The current solutions try to regain some statefulness. For example, many environments allow you to tag certain variables as 'persistent'.

Through the right kind of magic, the value of $yourName might then survive from name.html to car.html without the use of the hidden form input. It is also possible to unify the three files into one 'persistent class'. Every variable within that class is then 'persistent', and travels automatically from page to page.

However, all these tricks fail to do more than gloss over the problem. Each pageview is a whole new event. Each stage has to check if previous states behaved. Each stage must make sure that it cannot be fooled by accessing it at the wrong moment (which as mentioned before would allow us to skip the 'accident history' page).

Why not go back to the old days?

Compared to the many-Desks drama, the simple numbered program listed above sounds like heaven! Why did you ever move to this event driven nonsense? As mentioned, this is partly due to the 'dynamic page' paradigm, where some dynamic code is added to basically static webpages, ie, a clock telling you the current time.

Such pages are inherently 'desk oriented'. Also, as the computer science people can tell you, nothing is as efficient as stateless interrupt driven operations. While the user is filling out his form, say, in between index.html and name.html, no resources are consumed, save perhaps for the piece of paper we have to store - a few bytes.

The dynamic page paradigm is ridiculously efficient. Even a meager webserver could run millions of sessions this way - because there is no session to speak of. In the absence of other stimuli, no computer professional can resist the pull of perfect efficiency.

The listed program above has to wait for the user to fill out his form. This 'waiting' consumes resources - but not a lot. By the time you are driving enough traffic for this to matter, you will have other problems.

A new (or old!) paradigm: Synchronous Web Programming

The technical term for the simple program listing above is 'Synchronous'. It outputs a form and then waits, in place, for the user to respond. This paradigm is new for web programming, but is old hat for all other uses.

Over the past 30 years it has served us well. It is time for us to integrate the web into our current practices, and treat it no different from other programs.

A sample implementation exists which has already proven that the synchronous paradigm lends itself very well to webdesign. It is expected that existing script languages can be adapted to synchronous operations.

Some example code

  main()
  {
    string username;
    if(doUserPasswordCheck(&username)) {
      memberMenu();
    }
    print("Bye!");
    die(); // session will die after page is viewed   
  }

  bool doUserPasswordCheck(string *user)
  {
    while(true) {            // loop forever
      startTableForm();      // makes a pretty form in a table
      formInput("Your username","username");
      formPassword("Your password","password");
      formSubmit("Login!");
      formSubmit("Cancel"); 
      formEnd();
      
      readVariables(); // wait for user input

      if(buttonPressed("Cancel")) // user canceled
	return false; // not logged in

      if(getVar("username").empty())
	formError("username","<- must not be empty!"); // will be displayed on retry

      if(!userPassExists(getVar("username"),getVar("password")))
	print("Password/username did not match database! Try again: ;<p>\n");
      else {
        *user=getVar("username");
	return true;
      }
    }
    return false; // we never get here
  }

Hang on, this can never work!

You are a dinosaur.