Choosing The Right Server-Side Scripting Language
by: Craig McElwee
Article published Tuesday, 6th January 2004
How Five Languages Do The Same Basic Tasks
This article compares five prominent scripting tools (Perl, PHP, Python, Tcl, and Java servlets) by applying each to the same six common server-side tasks. You can look at the syntax side-by-side and evaluate how each language handles specific jobs. If you are new to server-side scripting, or if you have used only a few of these languages, you can see what they look like. Even if you already have a favorite, you can see how the other languages stack up in terms of usability, functionality, and code readability.
Assuming your Web site currently serves up dynamic content, how did you pick your scripting language from the numerous open source server-side scripting methodologies available? Did you inherit it or receive it from on high? Or did you methodically look at the pros and cons of each option, weigh them against your site's needs, deadlines, and your own skill set, and choose the one that fit best? If not, read on. You may find a scripting solution that is easier, more powerful, simpler to maintain, or just more fun.
Personally, I used the highly scientific notion of personal bias to select five languages for comparison: Java servlets, Perl, PHP, Python, and Tcl. These fall into two basic categories: Common Gateway Interface (CGI, or calling an external program that returns HTML) and what I call Super Markup (an HTML page embedded with other-language markup code -- a superset of HTML). My apologies if I left out your favorite server-side scripting tool; mine is not an exhaustive compilation of possibilities (since practically any language can be used for CGI).
To show how they work side-by-side, we'll implement the same six tasks in each language:
Task 1: Get and format the time/date
Task 2: Put form field data into variables: receive 2 HTML form field variables
Task 3: Search and replace
Task 4: File writing: write the two values, along with the transaction date and time as a comma-separated string to a CSV text file (CSV = comma-separated values, such as: one, two, three)
Task 5: File reading: read and present all the records of said CSV file
Task 6: Split comma-delimited line into variables: split the last, just submitted, record back into individual fields for processing, in this case simply echoing back to the browser
Barring the use of a "real" database, these six tasks cover a lot of basic functionality, and you can derive most chores, with modest modifications, from them.
This article provides five scripts (see Resources), one per language, for the basic set of tasks above. Take a moment to look them over to get a feel for each before we dissect them. Gurus in any of these languages will note that I've dropped most language idioms, leaving the constructs they replace, thus favoring readability (for newcomers' benefit) over performance.
Let's go through the programs task by task.
Task 1: Get and Format The Time/Date
This is more of a convenience than anything else. If your script writes to a log, you'll probably want a human-readable date/time. Interestingly, there seem to be two methodological camps on this basic function: the Raw Date and the Formatted Date.
The Raw Date camp, which includes Perl, Python, and Tcl, returns the number of seconds from a set point in time, which can then be fed into time and date formatting functions. If you are going to be doing extensive date manipulation, this may be preferable to getting fully formatted dates as strings, as in Java servlets and PHP. Of course, there is an exception to every rule, and we won't cover all the bases here.
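As a rough illustration of the Raw Date approach, here is a minimal Python sketch using the standard time module. The helper name format_timestamp, its format string, and the utc switch are my own choices for illustration and do not come from the article's scripts.

```python
import time


def format_timestamp(seconds, fmt="%Y-%m-%d %H:%M:%S", utc=False):
    """Turn raw seconds since the epoch into a human-readable string.

    The function name, default format, and `utc` flag are illustrative
    assumptions, not part of the article's original script.
    """
    to_struct = time.gmtime if utc else time.localtime
    return time.strftime(fmt, to_struct(seconds))


if __name__ == "__main__":
    now = time.time()  # raw date: seconds since the epoch
    print(format_timestamp(now))
    # Date arithmetic is plain addition on the raw value: same time tomorrow.
    print(format_timestamp(now + 24 * 60 * 60))
```

Because the raw value is just a number, shifting a date by a day or an hour is simple arithmetic before formatting, which is the advantage the Raw Date camp offers.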
Task 2: Put Form Field Data Into Variables
Each language has its own methodology and idioms for getting the "name=value" pairs from the environment. For the most part, these basic tricks of the trade will simply be cut and pasted into each new script and modified as necessary, which is to say, not much. When a Web server calls an external program, the data sent by the client's browser are assigned to specific variables in that program's environment. For an HTML form that used the post method, the program has to read from standard input (STDIN) the number of bytes given in the environment's CONTENT_LENGTH variable. The variable that holds the data from STDIN now contains a string of "name=value" pairs separated by ampersands; the program must split these into chunks, URL-decode the values, and then put them into variables for processing.
Here are some examples (note there is no error checking):
Python uses a module that simplifies the task considerably, allowing almost direct access to the values for assignment into variables, here, data and data2:
|
PHP and Java servlets let you skip the first step and jump right to assignment, as long as you get the variable names you expect. For more generic processing, you would iterate through the list of received variables and assign them dynamically.
|
Java:
String data = request.getParameter("data");
Perl's standard block of code isn't hidden away in a module. Here, all the "name=value" pairs are taken out of the string and each assigned to a slot in the @pairs array.
Perl:
|
Then we iterate through the array, putting the name and value into variables:
|
Since the values submitted via the browser have been "URL encoded" for transmission to the server, we now must "URL-decode" the values, first by changing all the +s to spaces, then converting all the escape codes back to their original values.
|
Finally we assign whatever is on the right side of the = (equal) sign as the value of a variable named whatever is on the left side of the = sign.
|
The Tcl code follows this same methodology. Having to figure out this logic for every program would be tedious and error prone. Fortunately you can just cut and paste those blocks verbatim in any CGI program.
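To make the read, split, decode, and assign steps concrete, here is a minimal Python sketch that does them by hand with urllib.parse, rather than through the module the Python script relies on. The function names and the sample form body are assumptions for illustration, and, like the article's examples, it does no error checking.

```python
import io
from urllib.parse import unquote_plus


def read_post_body(environ, stdin):
    """Read exactly CONTENT_LENGTH characters of form data from stdin.

    For URL-encoded form data (plain ASCII) characters and bytes coincide;
    a real CGI script might prefer to read bytes from sys.stdin.buffer.
    """
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return stdin.read(length)


def parse_form_body(body):
    """Split 'name=value&name2=value2' into a dict of decoded strings."""
    fields = {}
    for pair in body.split("&"):           # one chunk per name=value pair
        if not pair:
            continue
        name, _, value = pair.partition("=")
        # unquote_plus turns '+' into a space and %XX escapes into characters.
        fields[unquote_plus(name)] = unquote_plus(value)
    return fields


if __name__ == "__main__":
    # Hypothetical request standing in for what the Web server would supply.
    sample = "data=hello+world&data2=a%2Cb"
    environ = {"CONTENT_LENGTH": str(len(sample))}
    form = parse_form_body(read_post_body(environ, io.StringIO(sample)))
    data, data2 = form["data"], form["data2"]
    print(data, data2)
```

Returning a dictionary instead of creating variables named after the form fields is a deliberate choice here: it keeps a client from choosing which variable names appear in your program.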
Task 3: Search and Replace
Data input on HTML forms frequently need to be modified, and if you intend to run a shell command using data input from the user, then it is an absolute necessity! You must cleanse all input data of anything that might compromise the integrity of your system.
Perl has the most powerful regular expression engine and excels at text manipulation. If your next program requires much text manipulation, you'll be hard pressed to find a reason not to use Perl. That said, Python, PHP, and Tcl support regular expression searching and replacing, though the interface in each appears awkward and convoluted when compared to Perl's elegant syntax. Here are examples from the scripts, modified so each uses the same simple variable name for clarity. In each example, a case-insensitive search for the sequence of letters "cat" is made in the value of the variable 'data', and for each successful find, those letters are replaced with the sequence "dog":
Perl:
|
PHP:
|
Python:
|
Tcl:
|
The Java language lacks integrated regular expressions, but this is a rather simple substitution that we can manage with substrings:
Java:
|
This is extremely inelegant. We could have worked similar solutions in the other languages, but a clear, concise, single statement seems far superior. For whatever reason, Sun left regular expressions out of the standard Java distribution until J2SE 1.4 added the java.util.regex package.
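For reference, the case-insensitive "cat" to "dog" substitution described in this task can be sketched in Python with the standard re module. The wrapper function name is mine, and this is not necessarily how the article's own Python script phrases it.

```python
import re


def replace_cat_with_dog(data):
    """Replace every occurrence of 'cat', in any letter case, with 'dog'."""
    return re.sub("cat", "dog", data, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(replace_cat_with_dog("My Cat chased another CAT"))
```

Note that the pattern matches the letter sequence anywhere, so a word such as "concatenate" is changed too; that matches the task as stated, which searches for the sequence of letters rather than the whole word.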
Task 4: File Writing
This is rather straightforward and similar in all of these languages, except Java servlets. Compare how each one opens a file called "file.txt" in append mode and assigns the name "out" to the filehandle:
Java:
PrintWriter |
Perl:
|
PHP:
|
Python:
|
Tcl:
|
Notice how the PHP, Python, and Tcl code snippets are simple, intuitive, and almost identical. The Perl code is extremely similar and just as easy and intuitive. Contrast this with the Java technology's "There's a class for that" approach. In fact, there are 60 classes for that. You must decide exactly which type of output stream you want, and then which particular print writer to use for every I/O situation instead of letting the language do the work. Furthermore, is it intuitive that the "true" at the end of the statement means to append? PHP, Python, and Tcl use 'a' for append, and the use of the filehandle is easily understood -- in fewer than half the keystrokes.
The task of writing to the file is rather similar across the board as demonstrated here (writing to the file handle "out" the contents of the variable "joined"):
Java:
|
Perl:
|
PHP:
|
Python:
|
Tcl:
|
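Putting the open-for-append and write steps together, a minimal Python sketch of this task might look like the following. The function name, the date and time formats, and the choice to store date and time as two separate fields are assumptions for illustration.

```python
import time


def append_record(path, data, data2, when=None):
    """Append one comma-separated record (two values, date, time) to `path`.

    Values are joined naively, so a comma inside a value would break the
    record; the standard csv module handles quoting if that matters.
    """
    if when is None:
        when = time.time()
    moment = time.localtime(when)
    joined = ",".join([
        data,
        data2,
        time.strftime("%Y-%m-%d", moment),
        time.strftime("%H:%M:%S", moment),
    ])
    # Mode "a" opens for appending and creates the file if it is missing.
    with open(path, "a") as out:
        out.write(joined + "\n")
    return joined


if __name__ == "__main__":
    print(append_record("file.txt", "one", "two"))
```

The with statement closes the filehandle even if the write fails, which spares you an explicit close call.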
Task 5: File Reading
Again, opening a file to read is trivial in the scripting languages, and requires object instantiation on the part of Java. Actually reading from the file is somewhat different in each and highlights some philosophical differences between the languages. Perl and Python make it easy to read entire files, assigning each line to an element of an array or list, as appropriate, and then processing by iterating over each element. The others are better suited to reading a line, processing it, and then looking for another line. Here are examples from the least amount of code to the most (within reason and without idioms). In each case, the filehandle is 'in' and each line is printed to STDOUT:
Perl:
|
Note: Before Python programmers write to me saying Perl uses an extra character, note that I could have written "for" instead of "foreach", or even written the whole thing as: for (<IN>){ print }.
|
PHP:
|
Tcl:
|
Java:
|
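The read-everything-then-iterate style attributed to Perl and Python above can be sketched in Python as follows. The function names are hypothetical, and the file is assumed to hold one record per line.

```python
def read_records(path):
    """Read the whole file into a list, one element per line."""
    with open(path) as infile:
        # readlines() returns every line at once; strip the trailing newline.
        return [line.rstrip("\n") for line in infile.readlines()]


def print_records(path):
    """Echo every record to standard output, one per line."""
    for record in read_records(path):
        print(record)
```

For very large files, iterating over the filehandle directly (for line in infile) processes one line at a time instead of holding the whole file in memory.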
Task 6: Split Comma-Delimited Line Into Variables
Your script reads a line from a CSV file. How easily does it separate each field into individual variables, in this case, a, b, c, and d? Perl, PHP, and Python all have a handy 'split' function, taking the string to be split and the delimiter used for splitting as arguments. Java servlets and Tcl require you to set each field individually. While fine for this example, this latter approach would be prohibitive if each line had a large number of fields.
Perl:
|
PHP:
|
Python:
|
Java:
|
Tcl:
|
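As a sketch of the split approach, here is how a four-field record might be unpacked in Python. The sample line and the function name are made up for illustration; the variable names a, b, c, and d follow the text above.

```python
def split_record(line):
    """Split one CSV line into its fields, dropping the trailing newline."""
    return line.rstrip("\n").split(",")


if __name__ == "__main__":
    # Hypothetical record: two form values, then the date and the time.
    a, b, c, d = split_record("one,two,2004-01-06,12:30:00\n")
    print(a, b, c, d)
```

Unpacking into exactly four names raises an error if the line has more or fewer fields, which is a cheap sanity check on the record format.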
So, Which One Should You Use?
Beware of data tainting
Prepare for the worst by assuming all input may be tainted: it may have been entered by someone trying to hack your system by embedding system commands in the data. For example, you could allow users to start programs on your machine remotely via server scripting. I'm not for a moment suggesting you do this, but even in an innocent request such as getting a directory listing, there is potential danger. You might ask the user for the name of the directory to list, expecting input like "~" or "..". This is then sent to the shell with the ls command as "ls ~". Innocuous enough, but what if a hacker put in "~; rm *"? The shell would happily carry out the commands, first doing "ls ~" and then "rm *". This is not the sort of behavior you intended, but it is completely possible unless you take care of such data tainting. In Perl, for example, you may want to strip out anything that is not alphanumeric, an underscore, an asterisk, or a tilde. In this case, the command "ls ~;rm *" would become "ls ~rm *", which would probably result in a simple error instead of a major system corruption.
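The stripping idea from the tainting note is described for Perl; the same idea can be sketched in Python with a single regular expression. The function name is hypothetical, and keeping the space character is my assumption, made so that the result matches the "ls ~rm *" example.

```python
import re

# Anything that is NOT a letter, digit, underscore, asterisk, tilde, or
# space is removed. Keeping the space is an assumption made to match the
# article's "ls ~rm *" example.
_NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_*~ ]")


def strip_tainted(value):
    """Remove every character outside the allowed set from user input."""
    return _NOT_ALLOWED.sub("", value)


if __name__ == "__main__":
    print(strip_tainted("~;rm *"))
```

Listing the characters you allow, rather than the ones you fear, means that shell metacharacters you did not think of are removed as well.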
If you are new to the CGI game, hopefully some of these possibilities have whetted your appetite. Which language should you choose to start with? Look over all the programs and see which one makes the most sense. How easily can you figure out what is going on intuitively or from context? Which would you feel comfortable trying to compose from scratch? Which would seem least obtrusive in your dreams and speech? They are all free, so cost isn't an issue. Toss a Web server on your system and have a go!
Finally, if it seems that I'm bashing Java servlets as a server-side solution, I don't mean to. Most server-side applications are relatively small (in the other languages), and the overhead of Java's object-oriented syntax and packaging may not always be worth the development time and effort. Quite frankly, there are only two reasons I can see for writing Java servlets instead of using the others. One, your company is a Java shop and Java programmers are required to do server-side programming; or two, your server-side programming needs require large, complex programs, and it has been determined that you need the "power of Java." If this requirement was determined by your pointy-haired boss, use one of the other languages, surf for a few weeks, then tell him you did it in Java.