Posts Tagged ‘characters’

A Question of Characters

Sunday, 31 May 2015

At various times, I'm confronted with the confusion, by persons and by systems, of characters with glyphs. Most of the time, that confusion is a very minor annoyance; sometimes, as when wrestling with the preparation of a technical document, it can cause many hours of difficulty.

It's probably rather easier for people first to see that a character may have multiple glyphs. For example, here are two distinct yet common glyphs for the lower-case letter a: [image of two glyphs for a] and here are two for g: [image of two glyphs for g]

People have a bit more trouble with the idea that a single glyph can correspond to more than one character. Perhaps most educated folk generally understand that a Greek Ρ is not our P, even though one could easily imagine an identical glyph being used in some fonts. But many people think that they're looking at an o with an umlaut in each of these two words: coöperate and schön, whereäs the two dots over the o in the first word are a diæresis, an ancient diacritical mark used in various languages to clarify whether and how a vowel is pronounced.[1] The two dots over the o in the German schön are indeed an umlaut, which evolved far more recently from a superscript e.[2] (One may alternately write the same word schoen, whereäs schon is a different word.)
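As it happens, Unicode itself does not separate the diæresis from the umlaut; both are encoded with the same combining mark, so software sees one character where a reader may see two. The sketch below, in Python with the standard unicodedata module, decomposes an English word and a German word and lists the combining marks found. The helper name combining_marks and the two sample words are merely illustrative choices.

```python
import unicodedata

def combining_marks(word):
    """Return the Unicode names of the combining marks in the decomposed word."""
    decomposed = unicodedata.normalize("NFD", word)
    return [unicodedata.name(ch) for ch in decomposed if unicodedata.combining(ch)]

# Escapes are used so that the example does not depend on the encoding of this page.
english = "co\u00f6perate"  # o with a diaeresis
german = "sch\u00f6n"       # o with an umlaut

print(combining_marks(english))
print(combining_marks(german))
```

Both calls report the same mark, so a program that needs to tell the two apart must rely upon context, such as the language of the text.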

Out of context, what one sees is a glyph. Generally, we need context to tell us whether we're looking at Ϲ (upper-case lunate sigma), our familiar C, or С (upper-case Cyrillic ess); likewise for many other characters and their similar or identical glyphs. Until comparatively recently, we usually had sufficient context; mistakes were relatively infrequent and usually unimportant. (Okay, so a bunch of people thought that the Soviet Union called itself the CCCP, rather than the СССР. Meh.) But, with the development of electronic information technology, and with globalization, the distinction becomes more pressing. Most of us have seen the problems of OCR; these are essentially problems of inferring characters from glyphs. It's not so messy when converting instead from plain-text or from something such as ODF, but when character substitutions were made based upon similarity or identity of glyph, the very same problems can then arise. For example, as I said, one sees glyphs, but what is heard when the text is rendered audible will be phonetic values associated with the characters used. And sometimes the system will process a less-than sign as a left angle bracket, because everyone else is using it as such. In an abstract sense, these are of course problems of transliteration, and of its effects upon translation.
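A program, unlike a reader, can simply ask which character it has been handed. The following Python sketch uses the standard unicodedata module to name the three look-alikes mentioned above, and to show that the Latin string CCCP and the Cyrillic string are unequal despite their glyphs. The function name describe is an illustrative choice.

```python
import unicodedata

def describe(ch):
    """Return the code point and the Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

latin_c = "C"
lunate_sigma = "\u03f9"  # upper-case lunate sigma
cyrillic_es = "\u0421"   # upper-case Cyrillic ess

for ch in (latin_c, lunate_sigma, cyrillic_es):
    print(ch, describe(ch))

latin_cccp = "CCCP"
cyrillic_sssr = "\u0421\u0421\u0421\u0420"  # the Cyrillic abbreviation
print(latin_cccp == cyrillic_sssr)
```

The comparison in the last line is made on characters, not on glyphs, which is exactly the information that is lost when text is recovered from an image.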

Some of you will recognize the contrast between character and glyph as a special case of the contrast between content and presentation — between what one seeks to deliver and the manner of delivery. Some will also note that the boundary between the two shifts. For example, the difference between upper-case and lower-case letters originated as nothing more than a difference in glyphs. Indeed, our R was once no more than a different way of writing the Greek Ρ; our A simply was the Greek Α, and it can remain hard to distinguish them! I don't know that ſ (long ess) should be regarded as a different character from s, rather than just as an archaïc glyph thereof.
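Unicode, for what it is worth, hedges on the long ess: it gives ſ a code point of its own, yet relates it to s through case mapping and compatibility normalization. The Python sketch below illustrates that treatment; it reports what the standard library does, and is not an argument as to what ought to count as a character.

```python
import unicodedata

long_s = "\u017f"  # the long ess

# The long ess has its own name and code point ...
print(unicodedata.name(long_s))

# ... but case operations and compatibility normalization map it to the ordinary s.
upper = long_s.upper()
folded = long_s.casefold()
compat = unicodedata.normalize("NFKC", long_s)
print(upper, folded, compat)

# By contrast, Latin A and Greek Alpha are never merged, however alike their glyphs.
latin_a = "A"
greek_alpha = "\u0391"
print(unicodedata.normalize("NFKC", greek_alpha) == latin_a)
```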

Still, the fact that what is sometimes mere presentation may at other times be content doesn't mean that we should forgo the gains to be had in being mindful of the distinction and in creating structures that often help us to avoid being shackled to the accidental.


[1] In English and most other languages, a diæresis over the second of two vowels indicates that the vowel is pronounced separately, rather than forming a diphthong. (So here /koˈapəˌret/ rather than /ˈkupəˌret/ or /ˈkʊpəˌret/.) Over a vowel standing alone, as in Brontë, the diæresis signals that the vowel is not silent. (In English and some other languages, a grave accent may be used to the very same effect.) Portuguese cleverly uses a diæresis over the first of two vowels to signal that a diphthong is formed where it might not be expected.

[2] Germans used to use a dreadful script — Kurrentschrift — in which such an evolution is less surprising.

Approaching a Finish

Tuesday, 22 May 2012

The conditions for the acceptance of my paper on indecision were revealed to me in early April. Apparently the intention had been to provide them in mid-March, when I was informed of the conditional acceptance, but there'd been a bit of confusion.

Some of the conditions imposed were pretty strong. With the exception of one change,[1] I actively disliked every one of them. I thought that some of them sought reasonable objectives but would bring more cost than benefit; I thought that others were simply wrong-headed.

However, I made or attempted to make all of the changes except for three sorts. I figured that the editor would support me when it came to two of those remaining three sorts, as one would have formatted the references very differently from the journal's own standard (with which the reviewer was apparently unfamiliar) and the other would have dropped-in a proposition that would in fact have been perfectly superfluous in my paper (though an important axiom in most theories of probability).

I was, however, very concerned about the effect of my refusing to make one of the changes against which I dug-in. That change was suggested or demanded (it was not clear which) by the reviewer in order to simplify the presentation by simplifying the structure. Unfortunately, it would also have torn the work from part of its empirical foundations. I genuinely felt that it would be better not to have the paper published than to make the change, yet I was not sure that my intransigence would be properly understood. But I was afforded an opportunity to explain myself on this point (and on every other), and apparently my explanation was accepted.

Yester-day, I was told that the changes that I made had sufficiently addressed the reviewer's original concerns, and that the paper would be accepted conditional upon my modifying the acknowledgments (to be less specific as to what the acknowledged parties had done) and upon my removing the dedication (which the editor or reviewer suggested replacing with an acknowledgment of support). I have made those changes.

I also fixed a broken cross-reference that I had spotted. And I replaced one symbol with another. In order to effect one sort of change that the reviewer had wanted, I had introduced an explicit symbol for binary paralysis. [Erratum (2013:04/25): (Well, actually, for the union of binary paralysis with identity.)] Specifically, I used U+224e (≎): [expression using U+224e to represent binary paralysis]. I had adopted this particular character because nothing better occurred to me quickly, and I didn't want to grind to a halt over a d_mn'd symbol. (How dreadful to be paralyzed in the choice of a symbol for paralysis!) But I wasn't comfortable with it. I felt that the reader would have trouble remembering what it meant as it occurred here-and-there, that it was too suggestive of an equality, and that it would be awkward to write by hand. I eventually decided that what I wanted was a π (for παράλυσις)[2] centrally superscripted over a dash: [expression using pi over a dash to represent binary paralysis].

Anyway, there is some small chance that my effecting this change of symbols will cause me difficulty with the editor, but I believe that the paper is effectively accepted now. I don't know how long it might be before the paper is actually published.


[1] I had inserted a foot-note specifically to preëmpt a repeat of an inappropriate criticism delivered by the reviewer at the previous journal. I was planning to request, upon acceptance of the paper, that the foot-note be removed. In the event, the latest reviewer insisted that the foot-note be removed.

[2] The Latin p is too readily associated with preference, and indeed P was once very common for the binary relation of strict preference or that of weak preference.

Questions of Character

Monday, 12 April 2010

I don't much like being limited to using the characters found on my key-board or in the ASCII set.

In some contexts, the resolution is to enter an escape sequence, such as those for HTML or those for Java. In other contexts, the best that I can do is to copy-and-paste the character from somewhere.
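By way of illustration, here is a small Python sketch that produces both sorts of escape for a given character: an HTML numeric character reference, and a Java escape of the \uXXXX form. The function names are hypothetical. Java escapes are expressed in UTF-16 code units, so a character outside the Basic Multilingual Plane becomes a surrogate pair.

```python
import html

def html_escape_char(ch):
    """Return a hexadecimal HTML numeric character reference for one character."""
    return f"&#x{ord(ch):X};"

def java_escape_char(ch):
    """Return the Java \\uXXXX escape(s) for one character, as UTF-16 code units."""
    encoded = ch.encode("utf-16-be")
    units = [int.from_bytes(encoded[i:i + 2], "big") for i in range(0, len(encoded), 2)]
    return "".join(f"\\u{unit:04X}" for unit in units)

pi = "\u03c0"
print(html_escape_char(pi))             # an HTML reference for the Greek letter pi
print(java_escape_char(pi))             # a single Java escape
print(java_escape_char("\U0001d70b"))   # a surrogate pair, for a mathematical italic pi
print(html.unescape(html_escape_char(pi)) == pi)
```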

With that in mind, I cobbled-together a utility qua webpage for my own purposes, character.php. It's a PHP page because it uses server-side code to generate a table on-the-fly of Unicode characters from U+0020 through U+07FF. (Most of these characters will not be well rendered on most systems, though.) Before that table, it has an assemblage of characters that I frequently want (or that I see as otherwise belonging amongst such characters), such as Greek characters and Latin characters with diacritical marks. And, before that assemblage, it has a couple of JavaScript applets; the first of which converts amongst hexadecimal, characters, and decimal; the second of which converts ordinary strings into strings of HTML escape sequences.
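The page itself is written in PHP and JavaScript, and its code is not reproduced here. The following Python sketch merely illustrates the two ideas just described: conversion amongst hexadecimal, character, and decimal, and generation of a table covering U+0020 through U+07FF. Every name in it is hypothetical, as is the choice of sixteen characters to a row.

```python
def forms(ch):
    """Return the character with its hexadecimal and decimal code point."""
    return {"char": ch, "hex": f"{ord(ch):04X}", "decimal": str(ord(ch))}

def from_hex(text):
    """Convert a hexadecimal code point, such as '3C0', to its character."""
    return chr(int(text, 16))

def from_decimal(text):
    """Convert a decimal code point, such as '960', to its character."""
    return chr(int(text))

def table_rows(first=0x0020, last=0x07FF, per_row=16):
    """Yield (label, characters) pairs that cover the range first..last."""
    for start in range(first, last + 1, per_row):
        stop = min(start + per_row, last + 1)
        yield f"U+{start:04X}", [chr(cp) for cp in range(start, stop)]

print(forms(from_hex("3C0")))
print(forms(from_decimal("960")))

rows = list(table_rows())
print(len(rows), rows[0][0], rows[-1][0])
```

A server-side script would wrap each row in table mark-up; that step is omitted here, since it depends upon how the page is to be styled.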

Anyway, I began creäting the page with my own use in mind, but I've extended it a bit to make it more useful to others. I might not implement suggested changes, but I'd certainly consider them.