Posts Tagged ‘programming’

Virtual Shelving

Sunday, 19 June 2022

[This entry was revised and expanded on 2022:07/07.]

I am always uncomfortable with the process of organizing books and articles on shelves or in boxes. I desire to have them grouped by each author and by each subject of interest; these desires cannot be reconciled without having multiple copies of each book and of each article, which multiplicity I cannot afford.

Electronic copies are a different matter. Even without multiple copies, symbolic links, which I discussed in a previous entry, make it possible effectively to list the same file in multiple directories. Hereïn, I'll explain the principal structure that I use for organizing documents, and I'll present some small utilities that facilitate creating and maintaining that structure on POSIX-compliant file systems. This structure is not as fine-grained as might be imagined, but it strikes a balance appropriate to my purposes. (For a more sophisticated system one should employ an application storing and retrieving documents mediated by a cataloguing relational database.)

As with many systems, mine have each a directory named Documents. Its two subdirectories relevant to this discussion are Authors and Subjects.

The entries in Subjects are subdirectories with names such as Economics, Logic and Probability, Mathematics, and Philosophy.

In turn, the entries in each of these are subdirectories with the names of authors.

Finally, in each of these subdirectories are entries for files containing their work corresponding to the superdirectory. For example, Documents/Subjects/Logic and Probability/Johnson William Ernest/ would have entries for works by him on logic or on probability, but his article on indifference curves would be listed instead in Documents/Subjects/Economics/Johnson William Ernest/.

Most of the subdirectories of Authors have names corresponding to the subdirectories in the third level of the Subjects substructure, but all of these subdirectories in Authors are different directories from those in the Subjects substructure.

Each of most of these subdirectories of Authors lists not subdirectories nor files, but symbolic links. These links take their names from the subdirectories of Subjects, but they do not link to those subdirectories. Instead, each links to an author-specific sub-subdirectory. Thus, for example, Documents/Authors/Johnson William Ernest/Logic and Probability is a symbolic link to Documents/Subjects/Logic and Probability/Johnson William Ernest. It is as if the subject-specific collection of an author's works is the author-specific collection of works on that subject, just as it should be.
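The arrangement just described can be sketched in Python. This is only a minimal illustration, not one of the utilities mentioned in this entry; the function name shelve_author and the scratch directory are assumptions made for the example. The link is given a relative target, so that it would continue to work if Documents were relocated.

```python
import os
import tempfile

def shelve_author(documents, subject, author):
    """Create Subjects/<subject>/<author> as a real directory and
    Authors/<author>/<subject> as a relative symbolic link to it.
    (Hypothetical helper, named for illustration only.)"""
    real_dir = os.path.join(documents, "Subjects", subject, author)
    author_dir = os.path.join(documents, "Authors", author)
    os.makedirs(real_dir, exist_ok=True)
    os.makedirs(author_dir, exist_ok=True)
    link = os.path.join(author_dir, subject)
    if not os.path.lexists(link):
        # The target is expressed relative to the directory holding the link.
        os.symlink(os.path.relpath(real_dir, author_dir), link)
    return link

# A scratch location stands in for the real Documents directory.
documents = os.path.join(tempfile.mkdtemp(), "Documents")
link = shelve_author(documents, "Logic and Probability", "Johnson William Ernest")
target = os.readlink(link)
```

Here target is the relative path from the author's directory up to Documents and back down into Subjects, while link itself behaves as a directory.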

One could, instead, use the complementary organization, in which the Subjects substructure were ultimately dependent upon the Authors substructure, or use a hybrid organization in which some of the dependency flows one way and some the other. The determinant should be what is most important to preserve if the collection is copied to a file system that does not support symbolic links, as in the case of an SD card with a FAT file system.

I've sketched the principal structure, but want to note useful complications of two sorts.

The first is that symbolic links may be used to place some subjects effectively under others. For example, logic and probability fall within the scope of philosophy. As well as having a directory named Logic and Probability listed in Subjects, I have a symbolic link to it listed in Philosophy. Indeed, when a subject falls within the intersection of other subjects, each may have such a symbolic link, and I have links to Documents/Subjects/Logic and Probability not only in Philosophy but in Mathematics and in Economics.

The second is that symbolic links may be used effectively to list a document with multiple authors in the directory for each author. And essentially the same device may be used to classify a single document under different subjects.
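As a rough sketch of this second device, the following Python lists one existing document in a second directory by way of a relative symbolic link. The helper list_also_in, the co-author names, and the file name are all hypothetical, and a scratch directory stands in for the real collection.

```python
import os
import tempfile

def list_also_in(document, other_dir):
    """List an existing document in a second directory by way of a
    relative symbolic link bearing the same file name.
    (Hypothetical helper, named for illustration only.)"""
    os.makedirs(other_dir, exist_ok=True)
    link = os.path.join(other_dir, os.path.basename(document))
    if not os.path.lexists(link):
        os.symlink(os.path.relpath(document, other_dir), link)
    return link

# Scratch stand-in for Documents/Subjects/Economics, with two co-authors.
subject = os.path.join(tempfile.mkdtemp(), "Documents", "Subjects", "Economics")
first = os.path.join(subject, "First Author")
second = os.path.join(subject, "Second Author")
os.makedirs(first)
document = os.path.join(first, "Joint Paper.pdf")
with open(document, "w") as handle:
    handle.write("placeholder")

# The file is stored once, under the first author, and linked under the second.
link = list_also_in(document, second)
```

Passing a directory under a different subject as other_dir would, in the same way, classify the one document under a second subject.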

Although this organization is not especially fine-grained, it requires the creation of many directories and symbolic links. I've written seven utilities in Python to reduce the burden. Two of those utilities were presented in a previous 'blog entry because they can be put to more general purpose. Here, I will present five more.

(Again, these utilities are written for POSIX-compliant file systems. Windows is not POSIX-compliant. A full discussion of the relevant issues would be tedious, as would be an effort to rewrite these programs to support Windows.)

[Read more.]

Cataloguing and Restoring Symlinks

Wednesday, 15 June 2022

While one might imagine computer files as stored in something analogous to folders, in reality the directories of file systems are, well, directories. A directory is a file of entries, most of which correspond to names, locations, and other information about other files (some of which may themselves be directories).

But some file systems allow for entries which do not directly provide the location of a file. Instead, these entries — called symbolic links or symlinks — point to other entries. One symbolic link may point to another symbolic link, but it is to be hoped that ultimately an entry is reached that points to a file. A file system will then treat most references to a symbolic link as if they are references to whatever file is indicated by the entry to which the symbolic link ultimately leads. The option of symbolic links allows for different directory entries — possibly with different names and possibly in different directories — effectively to point always to the same file.
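To make the idea of a chain concrete, here is a small Python sketch that follows a symbolic link, entry by entry, to the file at its end. The function chain_of and the file names are assumptions for illustration. The sketch normalizes paths lexically, which could mislead if a ".." in a target crosses a directory that is itself a symbolic link.

```python
import os
import tempfile

def chain_of(path):
    """Return the list of entries visited from path to the final target.
    (Hypothetical helper, named for illustration only.)"""
    visited = [path]
    while os.path.islink(path):
        target = os.readlink(path)
        # A relative target is relative to the directory holding the link.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in visited:
            break  # a circular chain never reaches a file
        visited.append(path)
    return visited

top = tempfile.mkdtemp()
real = os.path.join(top, "file.txt")
with open(real, "w") as handle:
    handle.write("contents")
os.symlink("file.txt", os.path.join(top, "first"))  # first -> file.txt
os.symlink("first", os.path.join(top, "second"))    # second -> first

chain = chain_of(os.path.join(top, "second"))
# Opening the outermost link reads the file at the end of the chain.
with open(os.path.join(top, "second")) as handle:
    text = handle.read()
```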

I use symbolic links to organize electronic copies of books and articles, so that my directory system categorizes them both by topic and by author, and sometimes by multiple topics or by multiple authors (in the cases of collaborations and of anthologies). But I face the problem that often I want to save these documents using a file system that doesn't support symbolic links.

Not just in this case, but in any case in which I copy to a file system that does not support symbolic links a collection of files in which the directories contain symbolic links, I'd like to be able to restore the entire structure from such a copy.

My solution has been to create a file that catalogues the symbolic links, so that they can be recreated. Of course, I want both the cataloguing and the recreation to be automated. Towards that end, I've written two small programs in Python. These programs will work with any POSIX-compliant operating system (Linux, MacOS, &c), but Windows is not generally POSIX-compliant.

This program creates a catalogue of symbolic links in the current working directory and in any of its subdirectories, as a set of records with tab-separated variables, and sends it to standard output.

#!/usr/bin/env python
import os

separator = "\t"

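def chase_link(link):
    # Print the target of link, then follow any further links in the chain.
    source = os.readlink(link)
    dir_save = os.getcwd()
    # A relative target is relative to the directory that holds the link,
    # so work from there; dirname is empty when link has no directory part.
    os.chdir(os.path.dirname(link) or ".")
    if os.path.islink(source):
        print(source + separator,end="")
        chase_link(source)
    else:
        print(source)
    os.chdir(dir_save)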

def search_dir(directory):
    list_dir = [entry for entry in os.scandir(directory)
            if entry.is_dir() or os.path.islink(entry)]
    for entry in list_dir:
        if os.path.islink(entry):
            print(entry.path + separator,end="")
            chase_link(entry.path)
        elif entry.is_dir():
            search_dir(entry)

dir_top = "."

search_dir(dir_top)

And this program reads a catalogue from standard input and recreates symbolic links in the current working directory and subdirectories (recreating subdirectories as necessary).

#!/usr/bin/env python
import os
import os.path
import fileinput

separator = "\t"

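def relink(chain):
    dir_start = os.getcwd()
    # Split the first entry into its directory (possibly empty) and its name.
    dir_link, link = os.path.split(chain[0])
    if dir_link:
        os.makedirs(dir_link,0o777,True)
        os.chdir(dir_link)
    # Recreate any further links in the chain before the link to them.
    if len(chain) > 2:
        relink(chain[1:])
    # lexists is also true for a dangling link, which exists would miss.
    if not os.path.lexists(link):
        os.symlink(chain[1],link)
    os.chdir(dir_start)

for line in fileinput.input():
    # Strip only the newline, so that names ending in spaces survive.
    relink(line.rstrip("\n").split(separator))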

The reason that you see so much changing of directories in these programs is that they support symbolic links with relative specification. Absolute specification is also supported, but if absolute specification is used for symbolic links then relocating a directory structure is more difficult.
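The difference between the two sorts of specification can be seen in a small experiment, sketched here in Python with a scratch directory; the directory and file names are assumptions for illustration. One link is given a relative target and another an absolute target, and then the whole structure is renamed.

```python
import os
import tempfile

top = tempfile.mkdtemp()
old = os.path.join(top, "old")
os.makedirs(os.path.join(old, "Subjects"))
os.makedirs(os.path.join(old, "Authors"))
with open(os.path.join(old, "Subjects", "paper.txt"), "w") as handle:
    handle.write("text")

# One link with a relative target and one with an absolute target.
os.symlink(os.path.join("..", "Subjects", "paper.txt"),
           os.path.join(old, "Authors", "relative"))
os.symlink(os.path.join(old, "Subjects", "paper.txt"),
           os.path.join(old, "Authors", "absolute"))

# Relocate the whole structure.
new = os.path.join(top, "new")
os.rename(old, new)

# os.path.exists follows links, so it reports whether each still leads to a file.
relative_ok = os.path.exists(os.path.join(new, "Authors", "relative"))
absolute_ok = os.path.exists(os.path.join(new, "Authors", "absolute"))
```

After the move, the relative link still leads to the file, while the absolute link is left pointing at a location that no longer exists.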

A catalogue created with the first program may have many redundant links. The program could be written to omit these, but that enhancement would come at an expense in programming time and in computing resources that simply doesn't make sense at the scale at which I operate. (Likewise for recoding these programs to work or to fail gracefully with various versions of Windows.) I try not to go crazy with my refinements!

In a later 'blog entry, I'll present some other utilities that I've written more specifically for managing the symbolic links of my files of books and of articles.

Technical Difficulties

Saturday, 14 April 2018

Readers may have noticed some technical problems with this 'blog over the previous few days. I believe that the problems are resolved.

Recently, browsers have become concerned to warn users when they are dealing with sites that do not support encryption. Simply so as not to worry my visitors, I have tried to support the HTTPS protocol.

But I found that WordPress was still delivering some things with the less secure HTTP protocol, which in turn was provoking the Opera browser to issue warnings. At the WordPress site, I learned that I needed to modify two fields.

Unfortunately, changing these two fields broke my theme — my presentation software — so that fall-back text, rather than the title graphic, was sometimes displayed; but I didn't discover the breakage for a while, because the symptom wasn't always present. Ultimately, I realized that something were amiss. I tracked the problem to inconsistencies in how WordPress determines the protocol of the URI of the 'blog versus that of the directory holding the themes.

I recoded my theme to handle this inconsistency. (In the process of this recoding, my 'blog was made still more dysfunctional over several brief intervals.) My code is now sufficiently robust that it should not break if WordPress is made consistent in these determinations.

Styling Programs

Saturday, 3 September 2016

Just as in a natural language there are issues of style on top of those of grammar, of orthography, and of syntax, there are issues of style in computer languages.

For example, in some languages, var = 3 sets var to 3, while var == 3 tests whether var is (already) equal to 3. Omit an = in a test, and the test accidentally becomes an assignment; many programs silently fail as a result of such an omission. But adopt the style of always putting any constant on the left side of the test (eg, 3 == var) and the error (eg, 3 = var, which attempts to set 3 to something) is noticed as soon as the compiler or interpreter reaches it. (There are compilers, interpreters, and separate utilities that will spot possible instances of errors of this sort. It's good to use tools with these features, but best not to be dependent upon them; and one doesn't want the notice of a genuine error to be lost in a sea of largely spurious warnings.)

The specifications of some computer languages, especially of those that are older, significantly limit the lengths of names and of labels; but it's otherwise stylistically best to choose names and labels that clearly identify the nature of whatever is named or labelled. Transparent names and labels then function as integrated documentation. One identifies a lazy or thoughtless programmer by the needless use of opaque names and labels. In Java, the stylistic convention is to name things in ways that clearly identify them; and the convention is to camel-case the names of variables, methods, and classes (eg, countOfBadBits); other languages also allow names to be clearly identifying, but the convention is to separate naming words with underscores (eg, count_of_bad_bits). One uses the naming convention that prevails amongst programmers of that language, so as not to throw off other programmers who have to deal with the code; it is literally uncivil[1] to use the convention prevailing amongst programmers of one language when writing code in a language where a different convention prevails. (Had it been up to me, then we'd use a different naming style in Java; but it wasn't up to me and I abide by the prevailing convention.)

Many languages end statements with ;. When I helped other students debug SAS programs, I found that the error that they most often made was to omit that semicolon. Sometimes the program wouldn't compile, but sometimes it would compile and silently do something unintended. So I told them to put a space just before the semicolon. The program would still compile just fine if otherwise properly done; but, with all the semicolons visually floating instead of being up against something else, an omission would more easily be spotted. I don't myself use this style for every language in which it would work, but I adopt it for languages in which I notice myself or others omitting the semicolon.

(I was reminded of the general issue of coding style when working on some code written in Python, and wondering whether to put a space before each semicolon.)


[1] Civility is not conterminous with pleasantry; but, rather, a matter of behaving to avoid and to resolve conflict in interaction with other persons.

The Red Death

Sunday, 27 February 2011

Uhm, Firefox programmers? I have a question for you: What does this thing [enlarged image of red button with central 'x' from Navigation Toolbar] actually mean? You know, that red button with the central white x on the Navigation Toolbar. [image of red button on Navigation Toolbar] What's s'posed to happen when I click on it?

Let me explain my question. Traditionally, browsers gave me something like this: [image of hexagon]. It looks a lot like a stop sign, and clicking on it was a lot like stepping on a brake. The browser stopped what it was doing. That's not exactly what happens when I click on your little red-circle-with-the-eks. Now, it's as if my brakes have been redesigned by a passive-aggressive sociopath. Metaphorically speaking, the car will no longer stop before it goes into the intersection; instead, it will stop either on the other side or just in the intersection.

Really, I mean, when I'd discover that a site was trying to send me some big-ass graphic, I would use the friendly stop-sign button, and it would stop the loading of that thing. The new red button says "Just a minute; let me finish loading this big-ass graphic." Or I'd click on a link, and things would churn and churn, so I'd decide to bail. With the stop-sign button, the browser just stopped, leaving me at the prior page on which the link was; with the new red button, it goes to a blank screen (and then, to back-up, Firefox demands that the server of the previous page re-send everything to reload the page from scratch, which might not even be directly possible).

Anyway, I'd like either to get the functionality associated with the old button restored, or at least some honest revelation of the functionality associated with this new button. It seems, well, evil.

TNX.

Clean Thoughts

Sunday, 3 August 2008

My best thinking seems to be done in the shower. Yester-day, in the shower, I came up with the idea for what may in fact be a killer app.

The thing that distinguishes a killer app is not that it provides an excellent solution to a problem so much as that it provides an acceptable solution to an excellent problem. That is to say that a killer app may not have ideally efficient code, but manages to do something very desirable that other programs pretty much aren't doing at all.

Some time ago, I wrote a simple pair of programs for the use of the Woman of Interest and myself. Their functionality is very limited, and they were written under an assumption that now seems more dubious. So I was thinking about how to rewrite them into something more powerful, and quickly developed the general idea for the hypothetical app.

Later, I returned a phone call from my friend Phillip (a programmer), and during the course of our conversation sketched the idea for him, telling him that I would want to discuss it at some future date. But Phillip quickly got very actively interested, and discovered that I had coherent answers for related programming questions. (What I don't have are answers for some of the marketing problems.) Basically, he wouldn't let go of the subject, and we ended-up talking for hours. Phillip had one excellent technical suggestion about how to improve the app. He's planning to research potential sources of competition, and then get back to me.

The nature of the app is such that, if some party produces a decent implementation and gets a significant number of users before anyone else produces a decent implementation, then that party can probably profit for years, by virtue of path dependency. But, if a well-funded rival recognized the potential market before there were already a substantial number of users for the app, then that rival might be able utterly to displace the first party. Hence, I'll remain annoyingly vague about the idea, until I either abandon it or have product ready to move.

A Useful Bit o' PHP Code, Set Right

Monday, 16 June 2008

I came upon someone's ancient 'blog entry in which he or she attempted to present what would be a useful PHP function. Unfortunately, the code has a few bugs.

A dynamic webpage may seek data from various sources, including data passed by GET and POST methods (which is how web forms normally send their data), persistent data in cookies (stored on the client but provided to the server with each visit), and persistent data on the server.

Towards that end, PHP maintains five associative arrays: $_GET, $_POST, $_COOKIE, $_REQUEST, and $_SESSION. ($_REQUEST combines the contents of $_GET, $_POST, and $_COOKIE.) To access a variable named user sent by POST method, one would refer to $_POST["user"], and so forth.

The 'blog entry in question may have been written before $_REQUEST was introduced; in any event, the author had two good ideas:
  1. Avoid errors resulting from trying to access variables that don't actually exist. If no variable user was passed by POST, then $_POST["user"] raises an undefined-index notice (a warning in later versions of PHP). To avoid that sort of thing, the author checks for the presence of the variable before attempting to access it.
  2. Combine the variables in $_SESSION, as well as those in $_GET, $_POST, and $_COOKIE. Indeed, session data is more analogous to cookie data than is cookie data to data transmitted by $_GET or by $_POST.
The problems with the actual code are these:
  • If the server is not maintaining session data, then the attempt to use $_SESSION will itself cause an error.
  • There is an attempt to get cookie data from the array $_SESSION.
  • In the aforementioned attempt, the array is treated as a function.
Here's a version of the code that fixes those problems:
function getvar($var_name)
{
  // Search GET, POST, session, and cookie data, in that order.
  if (array_key_exists($var_name, $_GET)) $ret_value = $_GET[$var_name];
  else if (array_key_exists($var_name, $_POST)) $ret_value = $_POST[$var_name];
  // Consult $_SESSION only if a session is active; otherwise fall through to cookies.
  else if (session_id() != "" && array_key_exists($var_name, $_SESSION)) $ret_value = $_SESSION[$var_name];
  else if (array_key_exists($var_name, $_COOKIE)) $ret_value = $_COOKIE[$var_name];
  else $ret_value = "";
  return $ret_value;
}
PHP also provides analogous associative arrays for other global variables, but what unites the variable types of the five here is that they are commonly used in session-tracking — keeping data associated with a specific visitor as she moves through one's site. Possibly, getvar would be better named something else, if not distinguished by being made a member of some class of objects.
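For readers who think more readily in Python, the intended order of precedence might be sketched as below. This is an analogy only, not PHP: ordinary dictionaries stand in for the superglobal arrays, the parameter names are assumptions, and session is passed as None when no session is active.

```python
from collections import ChainMap

def getvar(var_name, get, post, session, cookie):
    """Return the first value found for var_name, searching in the order
    GET, POST, session, cookie; return "" if none of them has it.
    session is None when no session is active."""
    sources = [get, post]
    if session is not None:
        sources.append(session)
    sources.append(cookie)
    # A ChainMap searches its mappings from first to last.
    return ChainMap(*sources).get(var_name, "")

# Dictionaries standing in for the request data.
from_post = getvar("user", {}, {"user": "posted"}, None, {"user": "cookie"})
from_cookie = getvar("user", {}, {}, {"other": 1}, {"user": "cookie"})
missing = getvar("user", {}, {}, None, {})
```

In the second call, a session is active but lacks the variable, so the search falls through to the cookie data.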