[Tb] HTML export encoding/Umlauts etc

From: Charles Starrett <mac__AT__cstarrett.com>
Date: Tue Feb 01 2005 - 16:11:31 EST

Marc-Antoine Parent wrote:

>> * what character encoding does Tinderbox use for output? (latin-1,
>> utf-8?)
>
> Mac Roman. Sigh.

I put a [page][1] up on the TB Wiki arguing that the standard should be
Latin-1. I think I was wrong. Before I revise the page, I wanted to open
this discussion up and hear what you all think would make the most sense.

1: http://www.eastgate.com/wiki2/?EntityExport

First of all, I think Eastgate would do well to move away from Mac Roman
anyhow, as it rolls out a Windows version of Tinderbox. If we listers can
come to some sort of consensus here, perhaps we can petition Eastgate. Eh?

Next, I should admit I'm a neophyte. All I did on the Wiki page was
report my findings. I've done some more searching since then, and it
appears that the W3C is in fact pushing UTF-8 at the moment.
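For an HTML export, the practical upshot is to encode the page as UTF-8 and declare that encoding in the page itself. The Python sketch below is hypothetical: `export_html` and its parameters are made up for illustration and say nothing about how Tinderbox actually exports. It declares the charset with an HTML 4 `meta http-equiv` tag and uses Python's `xmlcharrefreplace` error handler, so that if a narrower encoding such as ISO-8859-1 is chosen, characters it cannot hold are written as numeric entities instead of being lost.

```python
import html


def export_html(title: str, body: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical export step: build a page and encode it as `encoding`.

    `encoding` should be a name that both Python and browsers recognize,
    e.g. "utf-8" or "iso-8859-1". Characters the chosen encoding cannot
    represent are written as decimal numeric character references
    (e.g. &#54620;) rather than dropped.
    """
    page = (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={encoding}">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<p>{html.escape(body)}</p>\n"
        "</body>\n</html>\n"
    )
    return page.encode(encoding, errors="xmlcharrefreplace")


# UTF-8 keeps every character as-is; ISO-8859-1 keeps the umlaut and
# sharp s but falls back to numeric entities for the Hangul.
print(export_html("Notes", "Grüße / 한국어").decode("utf-8"))
print(export_html("Notes", "Grüße / 한국어", "iso-8859-1").decode("iso-8859-1"))
```

The entity fallback keeps the page readable either way, but only the UTF-8 version stores the Korean text directly.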

So, as I see it now, the top two character encodings for languages written
in the Roman alphabet are Latin-1 and UTF-8. There seem to be [trade-offs][2]
for both of them. (Quelle surprise!) The greatest advantage of Latin-1 is
more consistent backwards compatibility with older software. The greatest
advantage of UTF-8 is broader language coverage: it can encode any Unicode
character, not just Western European ones.
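The trade-off shows up in a few lines of Python. The sample strings are chosen arbitrarily and `encoded_size` is a made-up helper: plain ASCII text takes the same number of bytes in both encodings, accented Western European letters take one byte in Latin-1 but two in UTF-8, and Korean cannot be represented in Latin-1 at all.

```python
def encoded_size(text: str, encoding: str):
    """Return the byte length of `text` in `encoding`, or None if it can't be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for sample in ["Tinderbox", "Grüße", "한국어"]:
    print(f"{sample}: latin-1={encoded_size(sample, 'latin-1')}, "
          f"utf-8={encoded_size(sample, 'utf-8')}")

# Tinderbox: latin-1=9, utf-8=9
# Grüße: latin-1=5, utf-8=7
# 한국어: latin-1=None, utf-8=9
```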

2: http://czyborra.com/utf/

Personally, since I do a small amount of my research in Korean, I think
I'd rather have UTF-8 encoding built into Tinderbox. I'm sure there
are people on this list with far more expertise in this area than I.
Any thoughts?

~~C