Friday, June 30, 2006

Character Encoding


Sometimes a web page is displayed correctly on the author's computer, but when visitors see it on the Internet, the text is illegible. This usually happens with languages other than English: instead of accented letters, small squares, question marks, or completely different letters appear. The cause is that the author did not specify the encoding of the page (or specified it incorrectly). This mistake is rather common, because authors usually do not notice it themselves; someone else must report it.

What is an encoding? Simply put, from a technical point of view all data in a computer are stored as numbers (and all numbers are stored as ones and zeroes, but this is not important now). So letters and other characters typed in a text editor are also remembered by the computer as numbers; for example, „A“ is 65, „B“ is 66, and so on. A text file is saved to disk as a sequence of numbers, loaded from disk as a sequence of numbers, and also sent across the Internet as a sequence of numbers.
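To make this concrete, here is a minimal Python sketch (Python is used only for illustration; the variable names are my own) that shows the numbers behind the letters „A“ and „B“ and how text turns into a sequence of numbers when it is saved:

```python
# Each character corresponds to a number.
text = "AB"
codes = [ord(ch) for ch in text]
print(codes)  # [65, 66]

# Saving text means turning it into a sequence of numbers (bytes).
data = text.encode("ascii")
print(list(data))  # [65, 66]

# Loading reverses the step: numbers back into characters.
restored = data.decode("ascii")
print(restored)  # AB
```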

The problem is: which character is which number? For historical reasons, there are several standards. Each of them takes some set of characters and assigns numbers to them. The 8-bit standards try to use only numbers from 0 to 255; of course this cannot include all possible letters, so each standard covers only a few languages. English MS Windows by default saves text files in the „windows-1252“ encoding. (If you try to save a TXT file in a different language, you may lose some letters after saving.) Linux systems have traditionally used the ISO standard „ISO-8859-1“ for English and Western European languages.
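The following sketch illustrates why the choice of standard matters. It assumes Python's built-in codecs for „windows-1252“ and „ISO-8859-2“ (the latter is my choice of a second 8-bit standard for comparison). The same number stands for a different letter in each, and a letter missing from a standard is lost on saving:

```python
# One and the same number (232, hexadecimal E8) ...
raw = bytes([0xE8])

# ... is a different letter in different 8-bit standards.
as_western = raw.decode("windows-1252")
as_central = raw.decode("iso-8859-2")
print(as_western, as_central)  # è č

# windows-1252 has no number for "č", so the letter is lost on saving
# and a question mark is stored instead.
lost = "č".encode("windows-1252", errors="replace")
print(lost)  # b'?'
```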

There is also the Unicode standard, which tries to include all characters from all alphabets; one of its encodings is „UTF-8“. A text file saved in UTF-8 can contain text in any language, so I strongly recommend this encoding if you use languages other than English.
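As a rough illustration (the sample sentence is my own), this Python sketch saves text in several alphabets as UTF-8 and reads it back unchanged. It also shows the typical garbage that appears when UTF-8 data is read as if it were „windows-1252“:

```python
# Text in several alphabets survives a round trip through UTF-8.
sample = "Příliš žluťoučký kůň, Ελληνικά, Русский"
data = sample.encode("utf-8")
round_trip = data.decode("utf-8")
print(round_trip == sample)  # True

# The same bytes read with the wrong encoding give wrong letters.
garbled = "é".encode("utf-8").decode("windows-1252")
print(garbled)  # Ã©
```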

The important part is that the visitor's web browser must know which encoding the page uses. Today's web browsers usually understand many encodings, and the visitor can select the correct one in a menu. But if you specify the encoding in the page, the visitor does not have to select anything, because it will be selected automatically. So if you use the „windows-1252“ encoding, write in the header of the page:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1252"/>

If you want to save your page in UTF-8, in Notepad choose „File | Save as...“ from the menu and select „Encoding: UTF-8“ in the bottom row. Do this when first saving the file; the program will remember it afterwards. Then write in the header of the page:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
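To show roughly how a program can use such a declaration, here is a simplified Python sketch. The function `declared_charset` and the sample page are hypothetical, and real browsers follow a more elaborate procedure; the point is only that the declaration itself consists of plain English characters, so it can be found before the encoding of the rest of the page is known:

```python
import re

def declared_charset(page_bytes):
    """Return the charset named in the page header, or None (simplified)."""
    # Only look at the beginning of the file; unknown bytes are ignored.
    head = page_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(r'charset=["\']?([A-Za-z0-9_-]+)', head, re.IGNORECASE)
    return match.group(1) if match else None

# A hypothetical page saved in UTF-8 with a matching declaration.
page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>\n'
    "<p>café</p>"
).encode("utf-8")

charset = declared_charset(page)
text = page.decode(charset)
print(charset)         # UTF-8
print("café" in text)  # True
```

If the declaration named a different encoding than the one the file was really saved in, the accented letter would come out wrong, which is exactly the mistake described above.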

Thursday, June 29, 2006

Welcome to the „WWW Examples“!

This blog will be about creating web pages and other related topics: examples and explanations for beginners, but also useful tips for the experienced.

It should enable a total beginner to create a web page compliant with Internet standards, which is something that many commercially made websites fail to do. However, I think that even a webmaster with a few years of experience may sometimes find a clever hack or a useful tool here, though obviously not in every article.

I will often use „mysterious“ words like HTML, XHTML, CSS, JavaScript, PHP, and a few others. If anything is unclear, just ask in the discussion below the article. I will try to publish new articles regularly, if possible one article per day, but I cannot promise that.

Enough talking. Let's quickly make our first web page. Create a text file containing the following text, save it as, for example, „test.html“, and open it in a web browser:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>My first web page</title>
<style type="text/css">
h1 {
 color: blue;
}
</style>
</head>
<body>

<h1>My first web page</h1>
<p>I have created this web page in a few seconds.</p>

</body>
</html>

If everything succeeded, this is what your page should look like:

My first web page

I have created this web page in a few seconds.

Next time we will talk about the meaning of all those symbols.