Archive for the ‘information technology’ Category

Technical Difficulties

Saturday, 14 April 2018

Readers may have noticed some technical problems with this 'blog over the previous few days. I believe that the problems are resolved.

Recently, browsers have become concerned to warn users when they are dealing with sites that do not support encryption. Simply so as not to worry my visitors, I have tried to support the HTTPS protocol.

But I found that WordPress was still delivering some things with the less secure HTTP protocol, which in turn was provoking the Opera browser to issue warnings. At the WordPress site, I learned that I needed to modify two fields.

Unfortunately, changing these two fields broke my theme — my presentation software — so that fall-back text, rather than the title graphic, was sometimes displayed; but I didn't discover the breakage for a while, because the symptom wasn't always present. Ultimately, I realized that something were amiss. I tracked the problem to inconsistencies in how WordPress determines the protocol of the URI of the 'blog versus that of the directory holding the themes.

I recoded my theme to handle this inconsistency. (In the process of this recoding, my 'blog was made still more dysfunctional over several brief intervals.) My code is now sufficiently robust that it should not break if WordPress is made consistent in these determinations.
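A minimal sketch of that sort of defensive handling follows. It is written in Python rather than in the PHP of a WordPress theme, and the function name `match_scheme` and the example URIs are hypothetical. The idea is simply to rewrite the URI of a theme resource so that its protocol agrees with that of the 'blog itself, and to leave it alone when the two already agree.

```python
from urllib.parse import urlsplit, urlunsplit


def match_scheme(resource_uri: str, site_uri: str) -> str:
    """Return resource_uri rewritten to use the scheme (protocol) of site_uri.

    If the two already agree, or if site_uri carries no scheme from which
    to copy, then resource_uri is returned unchanged.
    """
    site_scheme = urlsplit(site_uri).scheme
    parts = urlsplit(resource_uri)
    if not site_scheme or parts.scheme == site_scheme:
        return resource_uri
    return urlunsplit(parts._replace(scheme=site_scheme))


# Hypothetical addresses, for illustration only.
print(match_scheme("http://example.com/themes/mine/title.png",
                   "https://example.com/"))
```

Because the function does nothing when the protocols already match, code of this shape should keep working if the inconsistency is later removed.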

A Minor Note on the Myth of admin

Sunday, 12 February 2017

This evening, I was looking at a record of recent failed attempts to log into this 'blog. I found that relatively few attempts tried to do so with the popular username of admin, whereäs by far the majority were with the username oeconomist (that is to say with the second-level domain name). There is not and never has been an account here with username oeconomist; the would-be intruder was guessing mistakenly — but not unreasonably. If my logs are representative, then having an account name match a second-level domain name is less secure than having it be admin. With people avoiding admin, it is natural for crackers to try other likely candidates, including candidates whose probabilities are conditional upon the domain names.

Mind you that the reasoning of my earlier explanation of why the avoidance of admin doesn't add a discernible amount of security if passcodes are properly selected can be applied to avoiding a username that matches a domain name. An account with a known username and a well-chosen password of m+n characters is more secure than an account with a secret m-character username and an n-character password.

Choose a username that pleases you. Choose a password that is long and that looks like chaos, and make occasional changes to it.

Accuracy, Exactitude, and Precision

Monday, 5 September 2016

Dictionaries and thesauri often treat accuracy and precision as synonymous, or as nearly so. But the words accuracy and precision and their coördinates[1] are each most strongly associated with a distinct and important notion. The word exactitude (often treated as synonymous with the previous two) and coördinates are most strongly associated with something rather like the combined sense of those other two, but with a notable difference.

When we say that a specification is precise, we do not necessarily mean that it were correct when judged against the underlying objectives. We may merely mean that it were given with considerable explicit or implicit detail. If I tell you that a musical show will begin at 8:15:03 PM, then I am being precise (indeed, surprisingly so). But the show may begin at some other time; in fact, it may never have been planned to begin at that stated time; I can be both precise and wrong.

If your friend tells you that the show will begin shortly after 9 PM, then she may be accurate, though she was far less precise than I.[2] The word accuracy and coördinates are associated with closeness to the truth; and, in everyday discourse, she might be said to be more accurate were she to be more precise while remaining within the range implied by shortly after 9 PM. But the word is also associated with encompassing the truth; if the precision seemed to narrow the range of possibilities in a way that excluded what proved to be the truth, then she might be regarded as having become less accurate. (If one is told that the show is to begin at 9:15 PM, but it begins at 9:05 PM, then one might feel more misled than had one been less precisely told shortly after 9 PM.)

(Note that it would be seen as self-contradiction to say that someone were accurately wrong, though we sometimes encounter the phrase precisely wrong. The latter carries with it the sense — usually hyperbolic — that the someone had managed to be so wrong that even the slightest deviation from what he or she had said or done would be an improvement.)

Although some people might jocularly, eristically, or sophistically pretend that one truth were somehow truer than another, any meaningful proposition is either simply true or simply false (though which may be unknown and there are degrees of plausibility). If Tom and Dick each go to the store, then it is true that one of them has gone to the store. It is not closer to the truth that two of them have gone to the store. It might be said that it were more accurate that two of them have gone to the store, but this seems to imply that it is truer that two went than that one went, and this implication is false. Fortunately, we have a word and coördinates that can carry with them a particular sense of accuracy and precision, with exclusion. These words are exact, exactly, and exactitude.[3] It is true that one person has gone to the store, but it is not true that exactly one person has gone to the store.[4]
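The contrast between one and exactly one can be put in a few lines of Python. This is only a sketch; the names `at_least` and `exactly` are hypothetical helpers, not anything standard.

```python
# Who went to the store, per the example of Tom and Dick.
went_to_store = {"Tom": True, "Dick": True}


def at_least(n, flags):
    """True if n or more of the flags are true."""
    return sum(1 for flag in flags if flag) >= n


def exactly(n, flags):
    """True if n of the flags are true, and no more than n."""
    return sum(1 for flag in flags if flag) == n


flags = list(went_to_store.values())
print(at_least(1, flags))   # one of them has gone: a true proposition
print(exactly(1, flags))    # exactly one has gone: a false proposition
print(exactly(2, flags))    # exactly two have gone: a true proposition
```

The first and third tests both succeed; neither is truer than the other, but only the third carries the exclusion.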

(The expression exactly wrong is usually in hyperbolic contrast with exactly right, but is sometimes applied elliptically, when there is believed to be exactly one way in which to have been wrong.)

Even if one is not greatly concerned with rigor, these distinctions can be important. Asking members of an audience to be more accurate when one wants them to be more precise may inadvertently suggest to the audience that one thinks them to have been untruthful! Typically, risking that inference brings no benefit. It would then be better to ask them to be more precise or more exact.[5] The latter may work best with the passive-aggressive or with the autistic, who might otherwise be more precise while less accurate.


[1] The coördinates of a word are simply the other parts of speech built of the same root and carrying the same general sense adapted to a different grammatical rôle. For example, the adjective accurate and the adverb accurately are coördinate with the abstract noun accuracy.

[2] In discussions of computer science, the everyday distinction between accuracy and precision is made more emphatic, because the mathematics of computing is discrete, and limitations in detail have important implications. For example, ordinary floating-point encoding imperfectly represents numbers such as 1/10. That's why calculators and computers so often seem to add or to subtract tiny fractions to or from the ends of numbers. Number-crunching scientists who do not themselves recognize this issue have generated spurious results by proceeding as if computers have unlimited precision, and thus by mistaking artefacts of limited precision for something meaningful within the data. I strongly suspect that a major reason that so many reported econometric results were not subsequently found by other researchers poring over the very same data was that the original researchers (or, sometimes, the later researchers!) were not taking into account the implications of limited precision.
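The point about 1/10 is easily seen in Python, whose ordinary floats are assumed here to be IEEE 754 doubles (as they are on practically every current platform). The variable names are merely illustrative.

```python
from fractions import Fraction
import math

# The stored value nearest to 1/10 is not exactly 1/10.
stored_tenth = Fraction(0.1)      # exact value of the binary fraction stored
is_exact = stored_tenth == Fraction(1, 10)

# Ten of them, added naively, do not quite reach 1.
naive_total = sum([0.1] * 10)
naive_is_one = naive_total == 1.0

# Comparing within a tolerance, or summing with error compensation,
# avoids mistaking the artefact for something meaningful.
close_enough = math.isclose(naive_total, 1.0)
compensated_is_one = math.fsum([0.1] * 10) == 1.0

print(is_exact, naive_is_one, close_enough, compensated_is_one)
# prints: False False True True
```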

[3] The words just and only can carry the same meaning, but often bring normative implications.

[4] In mathematics, ∃x translates to for some x, while ∃!x translates to for exactly one x.

[5] Asking a person to be more just or more only would almost surely provoke bafflement.

Styling Programs

Saturday, 3 September 2016

Just as in a natural language there are issues of style on top of those of grammar, of orthography, and of syntax, there are issues of style in computer languages.

For example, in some languages, var = 3 sets var to 3, while var == 3 tests whether var is (already) equal to 3. Omit an = in a test, and the test accidentally becomes an assignment; many programs silently fail as a result of such an omission. But adopt the style of always putting any constant on the left side of the test (eg, 3 == var) and the error (eg, 3 = var, which attempts to set 3 to something) is noticed as soon as the compiler or interpreter reaches it. (There are compilers, interpreters, and separate utilities that will spot possible instances of errors of this sort. It's good to use tools with these features, but best not to be dependent upon them; and one doesn't want the notice of a genuine error to be lost in a sea of largely spurious warnings.)
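Python happens to refuse a plain = inside a test, so the trap does not arise there in quite that form; but its assignment expression (:=, available from version 3.8) offers a close analogue, and serves to illustrate the style. The sketch below only compiles the fragments, without running them; `is_syntax_error` is a hypothetical helper.

```python
def is_syntax_error(source: str) -> bool:
    """Report whether Python refuses to compile the given source text."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# The intended test compiles with the constant on either side.
print(is_syntax_error("if var == 3: pass"))   # False
print(is_syntax_error("if 3 == var: pass"))   # False

# Mistyped with the variable on the left, it still compiles,
# and would silently assign 3 to var.
print(is_syntax_error("if var := 3: pass"))   # False

# Mistyped with the constant on the left, it is rejected at once.
print(is_syntax_error("if 3 := var: pass"))   # True
```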

The specifications of some computer languages, especially of those that are older, significantly limit the lengths of names and of labels; but it's otherwise stylistically best to choose names and labels that clearly identify the nature of whatever is named or labelled. Transparent names and labels then function as integrated documentation. One identifies a lazy or thoughtless programmer by the needless use of opaque names and labels. In Java, the stylistic convention is to name things in ways that clearly identify them; and the convention is to camel-case the names of variables, methods, and classes (eg, countOfBadBits); other languages also allow names to be clearly identifying, but the convention is to separate naming words with underscores (eg, count_of_bad_bits). One uses the naming convention that prevails amongst programmers of that language, so as not to throw-off other programmers who have to deal with the code; it is literally uncivil[1] to use the convention prevailing amongst programmers of one language when writing code in a language where a different convention prevails. (Had it been up to me, then we'd use a different naming style in Java; but it wasn't up to me and I abide by the prevailing convention.)

Many languages end statements with ;. When I helped other students debug SAS programs, I found that the error that they most often made was to omit that semicolon. Sometimes the program wouldn't compile, but sometimes it would compile and silently do something unintended. So I told them to put a space just before the semicolon. The program would still compile just fine if otherwise properly done; but, with all the semicolons visually floating instead of being up against something else, an omission would more easily be spotted. I don't myself use this style for every language in which it would work, but I adopt it for languages in which I notice myself or others omitting the semicolon.

(I was reminded of the general issue of coding style when working on some code written in Python, and wondering whether to put a space before each semicolon.)


[1] Civility is not conterminous with pleasantry; but, rather, a matter of behaving to avoid and to resolve conflict in interaction with other persons.

Passcodes Redux

Friday, 1 July 2016

To-day, I found myself unable to log-in to this 'blog. I got a diagnostic that I were entering the wrong password. I don't want to burden my readers with a detailed retelling, but what had actually happened was that an up-date of WordPress rejected my password — it wasn't that I were entering the wrong password; it was that the password that I was entering was now prohibited.

On top of the login code misreporting the problem, the code for resetting the password wouldn't tell me why my password was being rejected. But it was rejected for containing a particular sub-string; and when I removed that sub-string, the password was then accepted.

If you understand passcodes (perhaps in part from reading my previous entry in which they were discussed), then you should see that there is something literally stupid in the WordPress software. Let's say that the forbidden sub-string were 8675309 and that my password were X.52341-hunao-8675309.Y. If I drop the 8675309, the password becomes X.52341-hunao-.Y. That is now accepted, though it is less secure!

If a would-be intruder knew where in the original password 8675309 appeared, and knew the length of the password, then the password would effectively be $p_1 p_2 \ldots p_{14}\,8675309\,p_{22} p_{23}$ where each $p_i$ were an unknown character, and the new password would be $p_1 p_2 \ldots p_{14} p_{22} p_{23}$ so that the two passwords would be equally secure! (Either way, an intruder must find a sequence of sixteen unknown characters.) But, as it is, would-be intruders wouldn't be sure that the sub-string appeared, let alone where in the code it would appear, nor how long the password were. One could, in fact, conceptualize the sub-string 8675309 as if it were a single character of extraordinary length (a macro-character) and of great popularity which character might appear within a string of equal or greater length, in which case prohibiting the sub-string would be rather like prohibiting the use of E.

That's not to say that common sub-strings should simply be accepted as passwords or within passwords. A great many systems have been hacked because someone foolishly used passwords such as password, root, or batman. But, instead of rejecting a password because it contained a popular sub-string, the software could, for example, test to see whether the password would be secure if the sub-string were excised, in which case it should be at least slightly more secure if the sub-string were retained.

(Note that this approach works with popular sub-strings of any length, including those of just one character! In fact, when there is no upper-limit on the length of passcodes, they may be securely constructed of nothing but popular sub-strings each of which has multiple characters; a secure password could be made by concatenating ten or more of the one hundred most popular passcodes. Mathematically, the problem of using just one popular passcode is fundamentally the same as that of using a short passcode!)
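The excise-then-test approach can be sketched as follows. Everything named here is hypothetical: the list of popular sub-strings is a tiny stand-in, and the strength test is a deliberately crude check of length, in place of whatever test a real system would apply. This is not how WordPress does or would do it.

```python
# Hypothetical stand-in for a list of popular sub-strings.
POPULAR_SUBSTRINGS = ("8675309", "password", "root", "batman")


def excise(passcode, popular=POPULAR_SUBSTRINGS):
    """Remove popular sub-strings, repeating until none remains."""
    remainder = passcode
    changed = True
    while changed:
        changed = False
        for substring in popular:
            if substring in remainder:
                remainder = remainder.replace(substring, "")
                changed = True
    return remainder


def is_secure_enough(passcode, minimum_length=12):
    """Crude stand-in for a strength test: length alone (an assumption)."""
    return len(passcode) >= minimum_length


def accept(passcode):
    """Accept the passcode if what remains after excision is secure."""
    return is_secure_enough(excise(passcode))


print(accept("X.52341-hunao-8675309.Y"))  # True: the remainder suffices
print(accept("batman8675309"))            # False: nothing remains
```

This simplest form is conservative, in that it gives a popular sub-string no credit at all; so it would refuse a passcode built wholly of concatenated popular passcodes. A fuller version would treat each popular sub-string as one macro-character, and count it as such.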

Sometimes, it's smart programming to write stupid programs, because the costs of designing, implementing, and maintaining more sophisticated software out-weigh the benefits. But, here, the WordPress programmers have opted for cheapness in a way that needlessly thwarts and insults some users, and can actually make systems less secure in those cases. (And the poor diagnostics are simply inexcusable.)

Username Administration[0]

Thursday, 24 March 2016

Those managing 'blogs are frequently told that the administrative account should not have a username of admin nor of administrator.[1] Indeed, 'bots attacking this 'blog try the username admin multiple times every day. None-the-less, I think that concern about easily guessed usernames is quite misplaced.

Ordinary access to an account requires two pieces of identification, the username and a passcode. We can conceptualize these jointly as a single string, the first part of which is practically fixed, the second part of which is changeable. For example, if one had the username admin and the passcode h3Ll0p0p3y3, then the string would be adminh3Ll0p0p3y3. Some might imagine that two strings represent two hoops and therefore more security; but, actually, each character is a hoop. If usernames and passcodes were equally secure, then the username-passcode pairs (kelsey5, dO0DL3bug) and (kelsey, 5dO0DL3bug) would be perfectly equivalent as far as security were concerned. So we can imagine the two strings concatenated, so long as we remember that one set of its characters is unchangeable, while the others may be changed.

In general, the form of the string can be conceptualized as $u_1 u_2 \ldots u_m p_1 p_2 \ldots p_n$ where each $u_i$ represents an unchangeable username character and each $p_j$ represents a changeable passcode character. Now, if we simply know that the administrative account username is admin, so that the string is $\mathtt{admin}\,p_1 p_2 \ldots p_n$, then unauthorized access is a matter of guessing the characters of the passcode, without knowing how many they might be. (How passcodes are stored may limit or effectively limit the length of passcodes, but this will typically not have much effect unless those limits are very tight.)

On the other hand, if the administrative username is completely unknown, then the string is the apparently more mysterious $u_1 u_2 \ldots u_m p_1 p_2 \ldots p_n$. That might seem significantly more secure. However, the number of characters in the passcode is unknown to the opponent, and $u_1 u_2 \ldots u_{m-k} p_1 p_2 \ldots p_{n+k}$ is more secure for all $0 < k \le m$,[2] because usernames are unchangeable. (Were usernames as changeable as are passcodes, then the two would be equally secure.) And $\mathtt{admin}\,p_1 p_2 \ldots p_{n+m}$ is more secure than $u_1 u_2 \ldots u_m p_1 p_2 \ldots p_n$.

So real security here is to be found in long and strong passcodes, for which secret usernames are poor substitutes, and one can easily compensate for a readily guessed username by having a stronger passcode.
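The arithmetic behind this can be checked in a few lines of Python. The alphabet of 94 characters is an assumption (the printable ASCII characters other than the space), and `candidates` is a hypothetical helper; the count ignores changeability, and so shows only that a secret username buys nothing that an equal number of additional passcode characters would not.

```python
ALPHABET_SIZE = 94  # assumption: printable ASCII, excluding the space


def candidates(length, alphabet_size=ALPHABET_SIZE):
    """Number of distinct strings of exactly the given length."""
    return alphabet_size ** length


# The two pairs from the example concatenate to the very same string.
same_string = ("kelsey5" + "dO0DL3bug") == ("kelsey" + "5dO0DL3bug")

# A secret m-character username with an n-character passcode presents no
# more candidates than a known username with an (m+n)-character passcode.
m, n = 6, 10
secret_username_space = candidates(m) * candidates(n)
longer_passcode_space = candidates(m + n)

print(same_string, secret_username_space == longer_passcode_space)
```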


[0 (2016:03/30, 04/09)] I've fleshed-out this entry a bit, in an attempt to make it more easily understood.

[1] See, for example, the entry for 23 March at the Wordfence 'blog.

[2] The case k = m represents a zero-length username, which really is to say no username at all. It would be quite possible to create a system with just passcodes and no distinct usernames — or, equivalently, a system with very changeable usernames and no passcodes — though this would present some practical difficulties.

Don't Bank on It

Saturday, 25 July 2015

This morning, I discovered that a number of attempts in 2012, in '13, and in '14 to breach the security of this 'blog came from an IP number assigned to the Federal Reserve Board (132.200.32.34).

No, I don't think that Ben Bernanke and Janet Yellen wanted to crack my site. Rather, I'm pretty sure that a Fed computer was itself cracked, and was operating as a 'bot, for years. 'Cause that's how our government rolls.

Preserve the Proxies!

Monday, 22 June 2015

Under the original ethos of the 'Net, those who registered domain names were required to make publicly available their contact information.

A technical loop-hole was found. One party could register a domain name, and that party could provide its own contact information; yet the party could allow (and perhaps even be contractually required to allow) some other party to use the domain name for its own ends. So the technical registrant was a proxy agent for the practical holder. This loop-hole was challenged, but ultimately allowed to remain.

Now pressure is being brought upon ICANN to prohibit proxies for what are deemed commercial sites. The primary motivation appears to be to help firms identify and pursue those who infringe upon trademarks and other intellectual property. (At present, they would have to get a court order requiring the proxy service to release the identity of the practical holder.)

I think that this effort should be strongly resisted. At the time that the use of proxies began, I had mixed feelings about it. But use of the Internet and of the World-Wide Web has evolved, and evolved within the context of this proxied registration being an accepted practice. A rule-change now would impose new costs — sometimes quite significant — on many people, the vast majority of whom are quite innocent of any trespass on intellectual property. Further, I note that most of those who are deliberate in their infringements are unlikely to have qualms about using proxies that simply claim to be practical holders.

You may want to read ICANN's discussion of the matter.

Comments may be sent to comments-ppsai-initial-05may15@icann.org before 7 July.

A Question of Characters

Sunday, 31 May 2015

At various times, I'm confronted with confusion by persons and by systems of characters with glyphs. Most of the time, that confusion is a very minor annoyance; sometimes, as when wrestling with the preparation of a technical document, it can cause many hours of difficulty.

It's probably rather easier for people first to see that a character may have multiple glyphs. For example, here are two distinct yet common glyphs for the lower-case letter a: [image: two glyphs for a] and here are two for g: [image: two glyphs for g]

People have a bit more trouble with the idea that a single glyph can correspond to more than one character. Perhaps most educated folk generally understand that a Greek Ρ is not our P, even though one could easily imagine an identical glyph being used in some fonts. But many people think that they're looking at an o with an umlaut in each of these two words, coöperate and schön; whereäs the two dots over the o in the first word are a diæresis, an ancient diacritical mark used in various languages to clarify whether and how a vowel is pronounced.[1] The two dots over the o in the German schön are indeed an umlaut, which evolved far more recently from a superscript e.[2] (One may alternately write the same word schoen, whereäs schon is a different word.)

Out of context, what one sees is a glyph. Generally, we need context to tell us whether we're looking at Ϲ (upper-case lunate sigma), our familiar C, or С (upper-case Cyrillic ess); likewise for many other characters and their similar or identical glyphs. Until comparatively recently, we usually had sufficient context; mistakes were relatively infrequent and usually unimportant. (Okay, so a bunch of people thought that the Soviet Union called itself the CCCP, rather than the СССР. Meh.) But, with the development of electronic information technology, and with globalization, the distinction becomes more pressing. Most of us have seen the problems of OCR; these are essentially problems of inferring characters from glyphs. It's not so messy when converting instead from plain-text or from something such as ODF, but when character substitutions were made based upon similarity or identity of glyph, the very same problems can then arise. For example, as I said, one sees glyphs, but what is heard when the text is rendered audible will be phonetic values associated with the characters used. And sometimes the system will process a less-than sign as a left angle bracket, because everyone else is using it as such. In an abstract sense, these are of course problems of transliteration, and of its effects upon translation.
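Software can be made to report the characters behind look-alike glyphs. Here is a small sketch using the unicodedata module of the Python standard library; the helper `describe` and the variable names are hypothetical, and the characters are written as escapes so that the code itself is not ambiguous.

```python
import unicodedata

LATIN_C = "\u0043"        # our familiar C
LUNATE_SIGMA = "\u03f9"   # upper-case lunate sigma
CYRILLIC_ES = "\u0421"    # upper-case Cyrillic ess


def describe(text):
    """List the code point and Unicode name of each character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
            for ch in text]


latin_word = "CCCP"                          # four Latin letters
cyrillic_word = "\u0421\u0421\u0421\u0420"   # ess, ess, ess, er

for line in describe(LATIN_C + LUNATE_SIGMA + CYRILLIC_ES):
    print(line)

# The two words may look identical on the screen, yet they are not equal.
print(latin_word == cyrillic_word)
```

A comparison of the two words fails, and a search for one will not find the other, however alike they are rendered.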

Some of you will recognize the contrast between character and glyph as a special case of the contrast between content and presentation — between what one seeks to deliver and the manner of delivery. Some will also note that the boundary between the two shifts. For example, the difference between upper-case and lower-case letters originated as nothing more than a difference in glyphs. Indeed, our R was once no more than a different way of writing the Greek Ρ; our A simply was the Greek Α, and it can remain hard to distinguish them! I don't know that ſ (long ess) should be regarded as a different character from s, rather than just as an archaïc glyph thereof.

Still, the fact that what is sometimes mere presentation may at other times be content doesn't mean that we should forgo the gains to be had in being mindful of the distinction and in creating structures that often help us to avoid being shackled to the accidental.


[1] In English and most other languages, a diæresis over the second of two vowels indicates that the vowel is pronounced separately, rather than forming a diphthong. (So here /koˈapəˌret/ rather than /ˈkupəˌret/ or /ˈkʊpəˌret/.) Over a vowel standing alone, as in Brontë, the diæresis signals that the vowel is not silent. (In English and some other languages, a grave accent may be used to the very same effect.) Portuguese cleverly uses a diæresis over the first of two vowels to signal that a diphthong is formed where it might not be expected.

[2] Germans used to use a dreadful script — Kurrentschrift — in which such an evolution is less surprising.

Not Following the Script

Monday, 13 April 2015

I frequently run across the problem of websites whose coders silently presume that all their visitors of interest have Javascript enabled on their browsers. Yester-day, I found this presumption affecting a page of someone whom I know (at least in passing), which prompts me to write this entry. (The person in question did not generate the code, but could suffer economic damage from its flaw.)

The reason that one should not presume that Javascript is enabled on the browsers of all visitors is that Javascript is itself a recurring source of security problems. Careful users therefore enable Javascript only for sites that they trust; very careful users enable Javascript only for sites that they trust and even then only on an as-needed basis; and paranoid users just won't enable Javascript ever. Now, in theory, the only visitors who might interest some site designers would be careless users, but we should look askance at those designers and at their sites.

(And trusting a site shouldn't be merely a matter of trusting the competence and good will of the owner of the domain. Unless that owner is also owner of the server that hosts the domain, one is also trusting the party from whom the site owner leases hosting. In the past, some of my sites have been cracked by way of vulnerabilities of my host.)

A designer cannot infer that, if-and-when his or her site doesn't work because Javascript is not enabled, the visitor will reälize that Javascript needs to be enabled; many problems can produce the same symptoms. Most of the time that sites don't work with Javascript disabled, they still don't work with it enabled. Further, the party disabling Javascript on a browser might be different from the present user; the present user might have only vague ideas about how web pages work. (An IT technician might disable Javascript for most browsers of users at some corporate site. Some of those users, perhaps very proficient in some areas but not with IT, may be tasked with finding products for the corporation.)

The working assumption should typically be that Javascript is not enabled, as this assumption will not be actively hurtful when Javascript is enabled, whereäs the opposite assumption will be actively hurtful when Javascript is not enabled.

The noscript element of HTML contains elements or content to be used exactly and only if scripting has been disabled. That makes it well suited for announcements that a page will work better if Javascript is enabled

<noscript><p class="alert">This page will provide greater functionality if Javascript is enabled!</p></noscript>

or not at all if it is not enabled.

<noscript><p class="alert">This page requires Javascript!</p></noscript>

(It is possible to put the noscript element to other uses.) So a presumption that Javascript is enabled certainly need not be silent.

However, in many cases, the effect got with Javascript isn't worth badgering the visitor to enable Javascript, and the page could be coded (with or without use of the noscript element) so that it still worked well without Javascript. In other cases, the same effects or very nearly the same effects could be got without any use of Javascript; a great deal that is done with Javascript could instead be done with CSS.