At times the number of technologies the Internet employs seems to grow at an astounding rate, enough to drive the average person mad. HTML, DHTML, XML, CSS, JS, ASP, CGI… What does this all mean for you? How do you know which browser supports what? There are a lot of questions that need to be answered. Here you will find a summary of the most popular technologies and what they can do for you.
HTML, DHTML, XML
HTML stands for Hypertext Markup Language. It is the language of the World Wide Web, one of the most prominent parts of today’s Internet. All web pages are written in HTML in one form or another. Even pages that contain animations, such as Flash, or other multimedia have to contain some HTML so your browser can interpret the commands correctly.
HTML is a series of tags and commands written in plain text, which a web browser processes and then displays as the resulting page on your monitor. HTML can be written in a text editor such as Windows Notepad, or a UNIX editor such as Pico or vi. However you write it, it boils down to simple text. Embedded in the text are references to images, tables, forms, and just about anything else. HTML has been around for quite some time now, and new HTML ‘versions’ keep appearing that add new commands and revise older ones. New versions of web browsers such as Internet Explorer and Netscape Navigator are released regularly to support newer HTML standards.
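To make the “it boils down to simple text” point concrete, here is a small Python sketch (Python is just our choice for illustration, and the page itself is made up) that feeds a tiny HTML page to the standard library’s `html.parser`. Much like a browser, the parser reads the plain text and picks out the tags embedded in it.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collects start tags and visible text from an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = []   # tag names in the order they appear
        self.text = []   # non-blank text found between the tags

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("src", "logo.gif")]
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


# A complete (hypothetical) web page is nothing more than a string of text.
PAGE = """<html>
<head><title>My Page</title></head>
<body>
<h1>Welcome</h1>
<img src="logo.gif" alt="Logo">
<table><tr><td>Cell</td></tr></table>
</body>
</html>"""

parser = TagLister()
parser.feed(PAGE)
parser.close()
print(parser.tags)  # ['html', 'head', 'title', 'body', 'h1', 'img', 'table', 'tr', 'td']
print(parser.text)  # ['My Page', 'Welcome', 'Cell']
```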
Java
Java is a programming language developed by Sun Microsystems. The idea was to create a universal platform whose applications run on any machine, with a simple plugin being all that is required. Java isn’t as hot on the web today as it was when it was first introduced, because it wasn’t really designed for web pages; as mentioned, its main purpose was to let users of all types of personal computers run the same applications. Java programs that run inside web pages are generally known as ‘applets’.
CSS, Cascading Stylesheets
Cascading Stylesheets, or CSS for short, is a way to format and control the layout of web pages far beyond what HTML alone allows. With CSS you can control layout, make smaller and faster-loading pages, and maintain and update pages more easily than with HTML. One of the best things about stylesheets is that you can create a CSS template, which is a plain text file, and link it into each of your HTML documents with a single line of code, instead of repeating all of the formatting in every page as you would with HTML. In turn this cuts down on file size and makes site-wide font, layout, or color changes much easier, since all you have to do is change one text file instead of potentially hundreds of HTML documents.
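The file-size argument can be checked with a little arithmetic. The Python sketch below is only an illustration under made-up assumptions (a hypothetical 100-page site, one short rule set, a shared file named `style.css`): it compares repeating the styles inside every page with linking every page to one shared stylesheet.

```python
# Hypothetical site: 100 pages that all share the same look.
STYLE_RULES = "body { font-family: Verdana, sans-serif; color: #333333; } h1 { color: navy; }"
NUM_PAGES = 100
BODY = "<h1>Title</h1><p>Page content goes here.</p>"


def inline_page(body: str) -> str:
    # Every page carries its own copy of the style rules.
    return f"<html><head><style>{STYLE_RULES}</style></head><body>{body}</body></html>"


def linked_page(body: str) -> str:
    # Every page carries only one line pointing at the shared stylesheet.
    return ('<html><head><link rel="stylesheet" type="text/css" href="style.css">'
            f"</head><body>{body}</body></html>")


inline_total = sum(len(inline_page(BODY)) for _ in range(NUM_PAGES))
# The shared style.css file is stored once, so count it only once.
linked_total = len(STYLE_RULES) + sum(len(linked_page(BODY)) for _ in range(NUM_PAGES))

print(f"inline styles:    {inline_total} bytes across {NUM_PAGES} pages")
print(f"linked style.css: {linked_total} bytes, including the stylesheet itself")
```

The savings grow with the size of the style rules and the number of pages, and a site-wide color change touches only `STYLE_RULES` in `style.css` rather than every page.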
CGI Scripts – Perl
The Common Gateway Interface, or CGI, is a method of running executable scripts on web servers to perform tasks such as running message boards and chat rooms, managing mailing lists, or handling forms. It isn’t a programming language but a technology in itself: CGI defines how programs pass information back and forth between web servers and clients. CGI programs are most commonly written in Perl, C, or UNIX shell script, but you can write them in just about any programming language that can execute on your server. As with any program that runs on the server, security is always a concern; when writing a script, make sure it never executes any of the data users send it. Not all web servers support CGI scripting, as many ISPs are wary of letting users place scripts on their servers, mainly for security reasons.
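Here is a minimal sketch of a CGI program, written in Python rather than Perl purely for illustration. The form field `name` is a hypothetical example. It reads form data the way CGI specifies (from `QUERY_STRING` for GET, from standard input for POST), prints a header, a blank line, and the page, and treats the visitor’s input strictly as data. It reads the environment directly because Python’s old `cgi` module was deprecated in 3.11 and removed in 3.13.

```python
#!/usr/bin/env python3
"""Minimal CGI script: greets the visitor by the name given in the form data."""
import html
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ, stdin):
    """Build a complete CGI response (header, blank line, body) as a string."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # Form data arrives on standard input; CONTENT_LENGTH gives its length.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # For GET, form data arrives in the QUERY_STRING variable.
        raw = environ.get("QUERY_STRING", "")
    fields = parse_qs(raw)
    name = fields.get("name", ["stranger"])[0]  # "name" is a hypothetical field
    # Treat user input strictly as data: escape it before echoing it back,
    # and never pass it to eval(), exec(), or a shell command.
    body = f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(handle_request(os.environ, sys.stdin))
```

A request such as `/cgi-bin/hello.py?name=Ann` (hypothetical path) would produce a page saying “Hello, Ann!”, while any HTML a visitor types into the field comes back escaped instead of being run.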
ASP, PHP and CFM
Active Server Pages, or ASP, is a technology introduced by Microsoft. ASP pages are generated on the server and can call other programs to perform database functions, form processing, and much of what CGI can do. One advantage ASP has over CGI is that it runs as a service on the server machine, rather than starting a new process for every request as CGI does, allowing it to take advantage of server-side features such as multithreaded architectures. In Microsoft’s own words, “Active Server Pages is an open, compile-free application environment in which you can combine HTML, scripts, and reusable ActiveX server components to create dynamic and powerful Web-based business solutions. Active Server Pages enables server-side scripting for IIS with native support for both VBScript and Jscript.” One major drawback is that ASP is tied to Microsoft’s IIS, so ASP scripts cannot run natively on a UNIX-based server, although a few companies have released ports to UNIX.
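ASP itself uses VBScript or JScript on IIS, so the following is only a Python analogy of the same idea: HTML with placeholders that a script fills in on the server, so the browser only ever receives finished HTML. The template text and function name are hypothetical.

```python
import html
from datetime import date
from string import Template

# Hypothetical page: ordinary HTML with $placeholders filled in on the server.
ORDER_PAGE = Template("""<html><body>
<h1>Order status for $customer</h1>
<p>You have $count item(s) in your cart.</p>
<p>Page generated on $today.</p>
</body></html>""")


def render_order_page(customer: str, cart: list) -> str:
    """Runs on the server; the browser only ever receives the finished HTML."""
    return ORDER_PAGE.substitute(
        customer=html.escape(customer),  # visitor-supplied text is escaped
        count=len(cart),
        today=date.today().isoformat(),
    )


print(render_order_page("Ann", ["logo design", "banner"]))
```

Because the work happens server-side, any browser can display the result; what matters is which server platform can run the scripting technology.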
Databases
If you are interested in selling products, or in providing large amounts of information that users can search and have results returned dynamically, then a database is for you. The problem is that there are so many different kinds of web-based databases that the choice can be confusing. The good news is that you do not have to worry about which browsers are compatible with a database, since it runs server-side, but you do have to know which web servers can handle which databases. Most servers run SQL-based databases, which are relatively easy to install, learn, and manage. Many databases can be managed via telnet or through the web, for example with PHP scripts. Whatever you do, choose your database wisely: be sure it will easily support the information you plan to store in it, and do your homework before making any choices.
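To show what “letting users search and returning results dynamically” looks like in SQL, here is a sketch using Python’s built-in `sqlite3` module as a stand-in for a server database such as MySQL. The `products` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database so the sketch needs no server; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("Blue widget", 4.99), ("Red widget", 5.49), ("Gadget", 12.00)],
)
conn.commit()


def search_products(term: str):
    # The ? placeholder keeps the visitor's search text as data, never as SQL.
    cur = conn.execute(
        "SELECT name, price FROM products WHERE name LIKE ? ORDER BY price",
        (f"%{term}%",),
    )
    return cur.fetchall()


print(search_products("widget"))  # [('Blue widget', 4.99), ('Red widget', 5.49)]
```

The same `SELECT ... WHERE ... LIKE` query works with most SQL databases, although the placeholder syntax varies between database drivers.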
Web Server Platforms
UNIX or NT? Sounds simple, doesn’t it? For some it is, but for others it can be a nightmare of a decision.
UNIX has been around longer, is more stable, is less prone to malicious software, and is generally more secure. A UNIX-based server will also manage its resources more efficiently than an NT machine, and much of the software you will be running is open source and free, including the web server software itself, Apache.
On the other hand, many webmasters feel that NT provides a much easier-to-use interface, and a more familiar one at that. The software is often not free and can be expensive, but it is generally easier to use and configure, and most of the tools available for UNIX servers are in one form or another available for NT machines as well.
Whatever you choose, as with all web technologies, do your homework before you buy.
Security
One of the most important things to ensure when creating a site of your own is security. You want to protect yourself and your visitors from any potential attacks against you or them. Some precautions seem simple; others can slip past even the most experienced webmasters.
First of all, password-protect every directory that contains sensitive information or gives access to your site’s management functions. Directories that hold users’ e-mail addresses, from mailing lists, chat rooms, message boards, and so on, should probably be encrypted as well, and kept out of both internal and external search engines with a robots.txt file. Keep in mind that robots.txt only asks well-behaved spiders to stay away; it does not block anyone, and anyone can read it, so it is no substitute for password protection. On a UNIX-based system, make sure your file permissions are set correctly, so that other users cannot write to files they shouldn’t.
Last but not least, exercise common sense. Don’t share your passwords with anyone, period.
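Two of these checks can be scripted. The Python sketch below (the directory you scan and the robots.txt rules are hypothetical examples) lists files that any user on a UNIX system may write to, and uses the standard `urllib.robotparser` module to show how a well-behaved spider interprets a `Disallow` rule. A harvester can simply ignore that rule.

```python
import os
import stat
from urllib.robotparser import RobotFileParser


def world_writable_files(root: str) -> list:
    """Return paths under root that any user on the system can write to."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if mode & stat.S_IWOTH:  # the "others may write" permission bit
                flagged.append(path)
    return flagged


# A hypothetical robots.txt asking all crawlers to stay out of /private/.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())
# Well-behaved spiders check this before fetching; harvesters may ignore it.
print(robots.can_fetch("Googlebot", "http://www.example.com/private/list.txt"))  # False
print(robots.can_fetch("Googlebot", "http://www.example.com/index.html"))        # True
```

Running `world_writable_files` on your web directory and then tightening anything it reports (for example with `chmod o-w`) closes the most obvious hole.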
Network Layout and Requirements
I won’t go into too much depth on this one here. If you do not want an ISP to host your site and wish to host it yourself, you will need to take several things into consideration. You may have to provide your own DNS, mail, FTP, and other servers, depending on which services you want to run. You will also need a very fast connection, both upstream and downstream, and an SSL certificate if you plan on doing any type of e-commerce. You may need a database server as well. As usual, there are many other things to consider, such as where you will get your bandwidth from (more than likely the local phone company), pricing, etc. Here is a typical internal network layout.
If the router has enough 10/100 ports to accommodate all internal users, a hub is not needed; otherwise, one will be. Some routers, such as those from Cisco and Netopia, also have built-in firewall services, so a separate firewall may not be needed. Sometimes a router isn’t needed at all. A router routes traffic between one block of IP addresses (yours) and another (more than likely your ISP’s). However, even if you do not have static IPs, if more than one machine shares the connection, you will still need some sort of router to assign internal dynamic IPs to those machines so they can access the Internet.
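The routing decision above comes down to asking which block an address belongs to. This Python sketch uses the standard `ipaddress` module with a hypothetical addressing plan: the “public” block is drawn from a reserved documentation range and the LAN from a private range, so neither refers to a real network.

```python
import ipaddress

# Hypothetical addressing plan using reserved documentation/private ranges.
PUBLIC_BLOCK = ipaddress.ip_network("203.0.113.0/29")   # block assigned by your ISP
INTERNAL_LAN = ipaddress.ip_network("192.168.1.0/24")   # private range behind the router


def classify(address: str) -> str:
    ip = ipaddress.ip_address(address)
    if ip in PUBLIC_BLOCK:
        return "your public block"
    if ip in INTERNAL_LAN:
        return "internal LAN (router must translate to reach the Internet)"
    return "elsewhere on the Internet (router forwards to the ISP)"


# A /29 holds 8 addresses; 6 are usable by hosts (network and broadcast excluded).
print(len(list(PUBLIC_BLOCK.hosts())))  # 6
for addr in ["203.0.113.2", "192.168.1.20", "198.51.100.7"]:
    print(addr, "->", classify(addr))
```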
User-Agents
Note: See the Apache log files section for more on environment variables and your server log files.
The user-agent (HTTP_USER_AGENT) is one of many environment variables the server gets from the visitor; the browser sends it with every request as the User-Agent header. For example, when a visitor running Microsoft Internet Explorer 5.5 on Windows Me (which IE reports as “Windows 98; Win 9x 4.90”) visits this site, HTTP_USER_AGENT looks something like this:
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)
Every visitor to your site leaves user-agent information that can be logged; however, there are utilities that let a visitor send a custom user-agent to the server, so the value is not always accurate.
However, there are certain user-agents that you DO NOT want to see in your log files. Some waste your bandwidth and slow your site down, others harvest e-mail addresses, some copy your entire website, and others are just plain rude.
Here is a brief list of search engine spider user-agents; in general, these are a good thing to see.
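The sample string above follows the old `Mozilla/x.y (token; token; ...)` pattern. Here is a rough Python sketch of pulling it apart; the function name is hypothetical, and it only understands this one common layout. Since visitors can send any string they like, treat the result as a hint rather than a fact.

```python
import re


def describe_user_agent(ua: str) -> dict:
    """Split an old-style 'Mozilla/x.y (token; token; ...)' user-agent string."""
    product = ua.split(" ", 1)[0]                 # e.g. "Mozilla/4.0"
    inside = re.search(r"\(([^)]*)\)", ua)         # the first parenthesized group
    tokens = [t.strip() for t in inside.group(1).split(";")] if inside else []
    msie = re.search(r"MSIE ([\d.]+)", ua)         # Internet Explorer's version token
    return {
        "product": product,
        "msie_version": msie.group(1) if msie else None,
        "tokens": tokens,
    }


info = describe_user_agent("Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)")
print(info["msie_version"])  # 5.5
print(info["tokens"])        # ['compatible', 'MSIE 5.5', 'Windows 98', 'Win 9x 4.90']
```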
- Googlebot/2.1 – Google.com
- Scooter-3.0.FS – Altavista.com
- FAST-WebCrawler/2.2.5 – Lycos/Alltheweb/Fast
Here is a brief list of user-agents you do NOT want to see.
- WebZIP – Copies websites to hard drives
- EmailSiphon – E-mail harvester
- Wget – Command-line downloader that can crawl and copy entire websites
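Given a log in Apache’s “combined” format (assumed here), a short Python sketch can count requests from the spiders and unwanted agents listed above by simple substring matching on the last quoted field. The sample log lines are hypothetical. Because the user-agent is supplied by the client, this only catches tools that identify themselves honestly.

```python
import re

# User-agent substrings taken from the two lists above.
SPIDERS = ["Googlebot", "Scooter", "FAST-WebCrawler"]
UNWANTED = ["WebZIP", "EmailSiphon", "Wget"]

# In Apache's "combined" log format the user-agent is the last quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def classify_agents(log_lines):
    """Count spider, unwanted, and other requests; collect IPs of unwanted ones."""
    counts = {"spider": 0, "unwanted": 0, "other": 0}
    unwanted_ips = []
    for line in log_lines:
        match = LAST_QUOTED.search(line)
        agent = match.group(1).lower() if match else ""
        if any(name.lower() in agent for name in UNWANTED):
            counts["unwanted"] += 1
            unwanted_ips.append(line.split(" ", 1)[0])  # first field is the client IP
        elif any(name.lower() in agent for name in SPIDERS):
            counts["spider"] += 1
        else:
            counts["other"] += 1
    return counts, unwanted_ips


# Hypothetical log lines (documentation IP addresses, made-up requests).
SAMPLE = [
    '192.0.2.1 - - [18/Jul/2001:10:18:08 -0400] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '192.0.2.2 - - [18/Jul/2001:10:19:00 -0400] "GET / HTTP/1.0" 200 512 "-" "Wget/1.7"',
    '192.0.2.3 - - [18/Jul/2001:10:20:00 -0400] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)"',
]
print(classify_agents(SAMPLE))
```

The collected IP addresses are what you would then feed to your server’s access controls if you decide to block a visitor.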
Apache Log Files
This section covers how to analyze your web server’s log files, focusing on the Apache web server, as it is the most popular.
When a visitor comes to your website, the browser sends an HTTP request to the server over TCP/IP. The server makes details of that request available as environment variables, and sends HTTP headers back to the client along with its response. Here are a few standard Apache environment variables and what they mean:
REMOTE_ADDR: The IP address of the client (the hostname, if the server looks it up, goes in REMOTE_HOST)
REQUEST_METHOD: POST, GET, or HEAD. GET is the most common, meaning the client is requesting a document from the server.
REQUEST_URI: The requested document relative to the document root (e.g. /test/123.html)
Note: The above variables describe the request sent by the CLIENT, which the server logs.
Here are some common HTTP headers:
HTTP_HOST: The host name the client requested (e.g. www.icehousedesigns.com)
HTTP_REFERER: The URL of the page that linked to the requested document. If the visitor followed a link from an e-mail or typed the address in directly, this value will be empty.
HTTP_USER_AGENT: The browser ID or user-agent string identifying the browser (nominally defined by RFC 1945 and RFC 2068).
Note: The above are HTTP request headers, which the client sends to the server along with each request; the server can log this info as well.
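The link between request headers and these variables is mechanical: CGI-style servers upper-case each header name, replace hyphens with underscores, and add an `HTTP_` prefix, so `User-Agent` becomes `HTTP_USER_AGENT`. The Python sketch below is a simplified, hypothetical illustration of that convention, not how Apache is actually implemented (real servers, for example, treat Content-Type and Content-Length specially).

```python
def cgi_environ(raw_request: str, remote_addr: str) -> dict:
    """Simplified sketch: turn a raw HTTP request into CGI-style variables."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, uri, protocol = request_line.split(" ", 2)
    env = {
        "REMOTE_ADDR": remote_addr,     # taken from the TCP connection, not a header
        "REQUEST_METHOD": method,       # e.g. GET
        "REQUEST_URI": uri,             # e.g. /test/123.html
        "SERVER_PROTOCOL": protocol,    # e.g. HTTP/1.1
    }
    for line in header_lines:
        name, _, value = line.partition(":")
        # "User-Agent" becomes HTTP_USER_AGENT, "Referer" becomes HTTP_REFERER, ...
        env["HTTP_" + name.strip().upper().replace("-", "_")] = value.strip()
    return env


# Hypothetical request, as a browser might send it.
REQUEST = ("GET /test/123.html HTTP/1.1\r\n"
           "Host: www.icehousedesigns.com\r\n"
           "Referer: http://www.google.com/\r\n"
           "User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)\r\n"
           "\r\n")
env = cgi_environ(REQUEST, "192.0.2.10")
print(env["HTTP_HOST"], env["HTTP_REFERER"])
```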
Here is a sample line of a log file from my server. We will break it down to see what it all means.
255.255.255.255 - - [18/Jul/2001:10:18:08 -0400] "GET /portfolio/logos.php3 HTTP/1.1" 200 23745 "http://www.google.com/search?hl=hr&safe=off&q=free+flash+logo+design" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)"
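OK, let’s break down the above log entry.
255.255.255.255 – The IP address of the visitor (changed to protect the visitor’s identity)
The two hyphens – The remote identity (RFC 1413) and the authenticated user name; a hyphen means the information was not available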
[18/Jul/2001:10:18:08 -0400] – The date and time of the visit
GET – The request method
/portfolio/logos.php3 – The document relative to root that was requested
HTTP/1.1 – The version of the HTTP protocol the client used
200 – The HTTP status code returned (200 OK, 404 Not Found, 500 Internal Server Error, etc.)
23745 – The size of the document sent, in bytes (not counting the headers)
"http://www.google.com/search?hl=hr&safe=off&q=free+flash+logo+design" – The referring URL (the URL the visitor came from)
"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)" – The user-agent of the visitor
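The breakdown above can be automated. This Python sketch assumes your server writes Apache’s standard “combined” format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`), which is the layout of the sample line. It turns one line into named fields, and lines in any other format raise an error.

```python
import re
from datetime import datetime

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> dict:
    match = COMBINED.match(line)
    if match is None:
        raise ValueError("not a combined-format log line")
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Apache writes "-" for the size when no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


LINE = ('255.255.255.255 - - [18/Jul/2001:10:18:08 -0400] '
        '"GET /portfolio/logos.php3 HTTP/1.1" 200 23745 '
        '"http://www.google.com/search?hl=hr&safe=off&q=free+flash+logo+design" '
        '"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)"')
entry = parse_line(LINE)
print(entry["path"], entry["status"], entry["size"])  # /portfolio/logos.php3 200 23745
```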
Apache log files are generally found in your /log/httpd directory, but their names and exact paths depend on your Apache configuration. The error log is kept in a separate file from the access log. Ask your web host if you don’t know where to find them. Also, not all of the information shown above may be logged: the referrer and user-agent fields, for example, appear only when Apache uses the “combined” log format. You may have to ask your host (or do it yourself) to enable those fields in your Apache configuration.
There are many different ways to analyze your log files. Some programs analyze them for you, and others create their own log files.
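If you would rather roll your own, a basic report takes only a few lines. This Python sketch assumes Apache’s “combined” log format and skips any line that doesn’t match it. It tallies status codes, unique visitor IPs, bytes sent, and the most common referrers and user-agents.

```python
import re
from collections import Counter

# Assumes Apache's "combined" log format; only the summarized fields are captured.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(path: str, top: int = 5) -> dict:
    """Tally status codes, visitors, bytes, referrers, and agents in one log file."""
    statuses, referrers, agents = Counter(), Counter(), Counter()
    visitors = set()
    total_bytes = 0
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            m = LOG_RE.match(line)
            if m is None:
                continue  # skip lines written in some other format
            statuses[m["status"]] += 1
            visitors.add(m["host"])
            if m["size"] != "-":
                total_bytes += int(m["size"])
            if m["referrer"] not in ("", "-"):
                referrers[m["referrer"]] += 1
            agents[m["agent"]] += 1
    return {
        "status_codes": dict(statuses),
        "unique_ips": len(visitors),
        "bytes_sent": total_bytes,
        "top_referrers": referrers.most_common(top),
        "top_agents": agents.most_common(top),
    }
```

You would call it with the path to your access log, for example `summarize("/log/httpd/access_log")`. That file name is hypothetical; use whatever your Apache configuration actually specifies.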