Posts Tagged ‘Python’

Virtual Shelving

Sunday, 19 June 2022

[This entry was revised and expanded on 2022:07/07.]

I am always uncomfortable with the process of organizing books and articles on shelves or in boxes. I desire to have them grouped by each author and by each subject of interest; these desires cannot be reconciled without having multiple copies of each book and of each article, which multiplicity I cannot afford.

Electronic copies are a different matter. Even without multiple copies, symbolic links, which I discussed in a previous entry, make it possible effectively to list the same file in multiple directories. Hereïn, I'll explain the principal structure that I use for organizing documents, and I'll present some small utilities that facilitate creating and maintaining that structure on POSIX-compliant file systems. This structure is not as fine-grained as might be imagined, but it strikes a balance appropriate to my purposes. (For a more sophisticated system one should employ an application storing and retrieving documents mediated by a cataloguing relational database.)

Like many systems, each of mine has a directory named Documents. Its two subdirectories relevant to this discussion are Authors and Subjects.

The entries in Subjects are subdirectories with names such as Economics, Logic and Probability, Mathematics, and Philosophy.

In turn, the entries in each of these are subdirectories with the names of authors.

Finally, in each of these subdirectories are entries for files containing their work corresponding to the superdirectory. For example, Documents/Subjects/Logic and Probability/Johnson William Ernest/ would have entries for works by him on logic or on probability, but his article on indifference curves would be listed instead in Documents/Subjects/Economics/Johnson William Ernest/.

Most of the subdirectories of Authors have names corresponding to the subdirectories in the third level of the Subjects substructure, but all of these subdirectories in Authors are different directories from those in the Subjects substructure.

Most of these subdirectories of Authors list neither subdirectories nor files, but symbolic links. These links take their names from the subdirectories of Subjects, but they do not link to those subdirectories. Instead, each links to an author-specific sub-subdirectory. Thus, for example, Documents/Authors/Johnson William Ernest/Logic and Probability is a symbolic link to Documents/Subjects/Logic and Probability/Johnson William Ernest. It is as if the subject-specific collection of an author's works is the author-specific collection of works on that subject, just as it should be.
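The arrangement just described can be sketched in Python. This is only an illustration under assumptions: the function name shelve_author is hypothetical, the tree is built in a temporary directory rather than in a real Documents directory, and the link is given a relative target so that the structure can be moved as a whole.

```python
import os
import tempfile

def shelve_author(documents, subject, author):
    """Create Subjects/<subject>/<author> and link to it from Authors/<author>/<subject>."""
    subject_dir = os.path.join(documents, "Subjects", subject, author)
    author_dir = os.path.join(documents, "Authors", author)
    os.makedirs(subject_dir, exist_ok=True)
    os.makedirs(author_dir, exist_ok=True)
    link = os.path.join(author_dir, subject)
    # lexists is true even for a dangling link, so nothing is overwritten.
    if not os.path.lexists(link):
        # The target is written relative to the directory that holds the link.
        os.symlink(os.path.relpath(subject_dir, start=author_dir), link)
    return link

with tempfile.TemporaryDirectory() as root:
    documents = os.path.join(root, "Documents")
    link = shelve_author(documents, "Logic and Probability", "Johnson William Ernest")
    link_target = os.readlink(link)
    same_directory = os.path.samefile(
        link,
        os.path.join(documents, "Subjects", "Logic and Probability",
                     "Johnson William Ernest"))
```

Files are stored only under Subjects; the entry under Authors is a second name for the same directory.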

One could, instead, use the complementary organization, in which the Subjects substructure were ultimately dependent upon the Authors substructure, or use a hybrid organization in which some of the dependency flows one way and some the other. The determinant should be what is most important to preserve if the collection is copied to a file system that does not support symbolic links, as in the case of an SD card with a FAT file system.

I've sketched the principal structure, but want to note useful complications of two sorts.

The first is that symbolic links may be used to place some subjects effectively under others. For example, logic and probability fall within the scope of philosophy. As well as having a directory named Logic and Probability listed in Subjects, I have a symbolic link to it listed in Philosophy. Indeed, when a subject falls within the intersection of other subjects, each may have such a symbolic link, and I have links to Documents/Subjects/Logic and Probability not only in Philosophy but in Mathematics and in Economics.

The second is that symbolic links may be used effectively to list a document with multiple authors in the directory for each author. And essentially the same device may be used to classify a single document under different subjects.
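Both complications come to the same operation: listing an existing directory or file in a second place through a relative symbolic link. A minimal sketch follows, with loudly stated assumptions: the helper name link_into, the author directories First Author and Second Author, and the file joint paper.pdf are all hypothetical, and everything is created in a temporary directory.

```python
import os
import tempfile

def link_into(target, directory):
    """List target in directory through a relative symbolic link of the same name."""
    os.makedirs(directory, exist_ok=True)
    link = os.path.join(directory, os.path.basename(target))
    if not os.path.lexists(link):
        os.symlink(os.path.relpath(target, start=directory), link)
    return link

with tempfile.TemporaryDirectory() as root:
    subjects = os.path.join(root, "Documents", "Subjects")
    logic = os.path.join(subjects, "Logic and Probability")
    os.makedirs(logic)

    # First complication: one subject placed under each broader subject.
    subject_links = [link_into(logic, os.path.join(subjects, broader))
                     for broader in ("Philosophy", "Mathematics", "Economics")]
    subject_targets = [os.readlink(link) for link in subject_links]

    # Second complication: one file listed under each of two authors.
    first = os.path.join(logic, "First Author")    # hypothetical author
    second = os.path.join(logic, "Second Author")  # hypothetical author
    os.makedirs(first)
    paper = os.path.join(first, "joint paper.pdf")  # hypothetical document
    open(paper, "w").close()
    paper_link = link_into(paper, second)
    shared = os.path.samefile(paper, paper_link)
```

The same helper would serve to classify a single document under a second subject, since only the directory receiving the link changes.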

Although this organization is not especially fine-grained, it requires the creation of many directories and symbolic links. I've written seven utilities in Python to reduce the burden. Two of those utilities were presented in a previous 'blog entry because they can be put to more general purpose. Here, I will present five more.

(Again, these utilities are written for POSIX-compliant file systems. Windows is not POSIX-compliant. A full discussion of the relevant issues would be tedious, as would be an effort to rewrite these programs to support Windows.)


Cataloguing and Restoring Symlinks

Wednesday, 15 June 2022

While one might imagine computer files as stored in something analogous to folders, in reality the directories of file systems are, well, directories. A directory is a file of entries, most of which correspond to names, locations, and other information about other files (some of which may themselves be directories).

But some file systems allow for entries which do not directly provide the location of a file. Instead, these entries — called symbolic links or symlinks — point to other entries. One symbolic link may point to another symbolic link, but it is to be hoped that ultimately an entry is reached that points to a file. A file system will then treat most references to a symbolic link as if they are references to whatever file is indicated by the entry to which the symbolic link ultimately leads. The option of symbolic links allows for different directory entries — possibly with different names and possibly in different directories — effectively to point always to the same file.
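A short Python sketch may make the chain concrete. It assumes a POSIX system and works in a temporary directory; the names file.txt, first, and second are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "file.txt")
    with open(target, "w") as handle:
        handle.write("contents\n")

    # second -> first -> file.txt; each target is relative to the
    # directory that holds the link.
    first = os.path.join(root, "first")
    second = os.path.join(root, "second")
    os.symlink("file.txt", first)
    os.symlink("first", second)

    # readlink reports only the entry to which the link itself points.
    one_step = os.readlink(second)
    # realpath follows the whole chain to the final file.
    resolved_matches = os.path.realpath(second) == os.path.realpath(target)
    # Ordinary access through the link reaches the file at the end of the chain.
    with open(second) as handle:
        text = handle.read()
```

Here one_step is the string "first", while reading through second yields the contents of file.txt.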

I use symbolic links to organize electronic copies of books and articles, so that my directory system categorizes them both by topic and by author, and sometimes by multiple topics or by multiple authors (in the cases of collaborations and of anthologies). But I face the problem that often I want to save these documents using a file system that doesn't support symbolic links.

More generally, whenever I copy a collection of files whose directories contain symbolic links to a file system that does not support such links, I'd like to be able to restore the entire structure from that copy.

My solution has been to create a file that catalogues the symbolic links, so that they can be recreated. Of course, I want both the cataloguing and the recreation to be automated. Towards that end, I've written two small programs in Python. These programs will work with any POSIX-compliant operating system (Linux, macOS, &c), but Windows is not generally POSIX-compliant.

This program creates a catalogue of symbolic links in the current working directory and in any of its subdirectories, as a set of records with tab-separated fields, and sends it to standard output.

#!/usr/bin/env python
import os

separator = "\t"

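def chase_link(link):
    # Print the target of link; if that target is itself a symbolic
    # link, print it followed by a separator and continue down the chain.
    source = os.readlink(link)
    dir_save = os.getcwd()
    # A relative target is resolved from the directory holding the link.
    # (dirname yields "" for a bare name, hence the fallback to ".")
    os.chdir(os.path.dirname(link) or ".")
    if os.path.islink(source):
        print(source + separator,end="")
        chase_link(source)
    else:
        print(source)
    os.chdir(dir_save)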

def search_dir(directory):
    list_dir = [entry for entry in os.scandir(directory)
            if entry.is_dir() or os.path.islink(entry)]
    for entry in list_dir:
        if os.path.islink(entry):
            print(entry.path + separator,end="")
            chase_link(entry.path)
        elif entry.is_dir():
            search_dir(entry)

dir_top = "."

search_dir(dir_top)
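To illustrate the shape of a record, here is a minimal sketch of reading one. The record is hypothetical: it describes a link whose target is itself a link. The first field is the path of a link from the top directory; each later field is the target of the entry before it, written relative to the directory that holds that entry.

```python
separator = "\t"

# A hypothetical record for a chain of two links, with invented paths.
record = separator.join(["./a/link", "../b/middle", "../c/file.txt"]) + "\n"

# Strip only the newline, so that a name ending in a space would survive.
link, *targets = record.rstrip("\n").split(separator)
```

In this example ./a/link points to ../b/middle (that is, to ./b/middle), which in turn points to ../c/file.txt (that is, to ./c/file.txt).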

And this program reads a catalogue from standard input and recreates symbolic links in the current working directory and subdirectories (recreating subdirectories as necessary).

#!/usr/bin/env python
import os
import os.path
import fileinput

separator = "\t"

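def relink(chain):
    # chain[0] is the path of a link and chain[1] is its target; any
    # further elements continue the chain when the target is also a link.
    dir_start = os.getcwd()
    # dirname yields "" for a bare name, hence the fallback to ".".
    dir_link = os.path.dirname(chain[0]) or "."
    os.makedirs(dir_link,0o777,True)
    os.chdir(dir_link)
    if len(chain) > 2:
        relink(chain[1:])
    link = os.path.basename(chain[0])
    # lexists is also true for a dangling link, which exists would miss.
    if not os.path.lexists(link):
        os.symlink(chain[1],link)
    os.chdir(dir_start)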

for line in fileinput.input():
    relink(line.rstrip().split(separator))

The reason that you see so much changing of directories in these programs is that they support symbolic links with relative specification. Absolute specification is also supported, but if absolute specification is used for symbolic links then relocating a directory structure is more difficult.
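The difference can be seen in a small experiment, sketched here under assumptions: a POSIX system, a temporary directory, and illustrative names (old, new, relative, absolute) that appear nowhere in my own files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "old")
    os.makedirs(os.path.join(old, "Subjects"))
    os.makedirs(os.path.join(old, "Authors"))

    # Two links to the same directory: one relative, one absolute.
    os.symlink(os.path.join("..", "Subjects"),
               os.path.join(old, "Authors", "relative"))
    os.symlink(os.path.join(old, "Subjects"),
               os.path.join(old, "Authors", "absolute"))

    # Relocate the whole structure.
    new = os.path.join(root, "new")
    os.rename(old, new)

    # exists follows the link, so it reports whether the target is still found.
    relative_ok = os.path.exists(os.path.join(new, "Authors", "relative"))
    absolute_ok = os.path.exists(os.path.join(new, "Authors", "absolute"))
```

After the move, the relative link still finds Subjects beside Authors, whereas the absolute link still names the old location and so dangles.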

A catalogue created with the first program may have many redundant links. The program could be written to omit these, but that enhancement would come at an expense in programming time and in computing resources that simply doesn't make sense at the scale at which I operate. (Likewise for recoding these programs to work or to fail gracefully with various versions of Windows.) I try not to go crazy with my refinements!

In a later 'blog entry, I'll present some other utilities that I've written more specifically for managing the symbolic links of my files of books and of articles.