Cataloguing and Restoring Symlinks

15 June 2022

While one might imagine computer files as stored in something analogous to folders, in reality the directories of file systems are, well, directories. A directory is a file of entries, most of which correspond to names, locations, and other information about other files (some of which may themselves be directories).
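To see a directory as a table of entries, one can list each name together with the inode number (the file system's own index of the file) that the entry leads to. The following is a minimal sketch in Python, the language of the programs below. The function name `list_entries` and the file names in the demonstration are hypothetical.

```python
import os
import tempfile

def list_entries(directory):
    """Map each name in `directory` to the inode number of the file it names.

    os.lstat() does not follow symbolic links, so every entry is reported
    as itself rather than as whatever it might lead to.
    """
    return {name: os.lstat(os.path.join(directory, name)).st_ino
            for name in os.listdir(directory)}

# Demonstration in a throwaway directory (names are hypothetical).
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "topics"))
    with open(os.path.join(top, "book.pdf"), "w") as f:
        f.write("placeholder")
    entries = list_entries(top)

for name, inode in sorted(entries.items()):
    print(name, inode)
```

The inode numbers differ from system to system, so only the names are predictable here.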

But some file systems allow for entries which do not directly provide the location of a file. Instead, these entries, called symbolic links or symlinks, hold the path of another entry. One symbolic link may point to another symbolic link, but it is to be hoped that ultimately an entry is reached that points to a file. A file system will then treat most references to a symbolic link as if they were references to whatever file is indicated by the entry to which the symbolic link ultimately leads. Symbolic links thus allow different directory entries, possibly with different names and possibly in different directories, always to lead to the same file.
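Python's `os` module shows the difference between a link and what it ultimately leads to. `os.readlink()` reports only the next step, exactly as stored. `os.path.realpath()` and ordinary opening follow the whole chain. Here is a minimal sketch. The directory and file names (`by_author`, `by_topic`, `smith`, `logic`, `book.pdf`) are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    book = os.path.join(top, "book.pdf")
    with open(book, "w") as f:
        f.write("contents")
    os.mkdir(os.path.join(top, "by_author"))
    os.mkdir(os.path.join(top, "by_topic"))
    # by_author/smith -> ../book.pdf          (a link to the file)
    # by_topic/logic  -> ../by_author/smith   (a link to a link)
    smith = os.path.join(top, "by_author", "smith")
    logic = os.path.join(top, "by_topic", "logic")
    os.symlink("../book.pdf", smith)
    os.symlink("../by_author/smith", logic)

    next_hop = os.readlink(logic)  # only the next step, as stored
    ends_at_book = os.path.realpath(logic) == os.path.realpath(book)
    with open(logic) as f:         # opening follows the whole chain
        text = f.read()

print(next_hop)      # ../by_author/smith
print(ends_at_book)  # True
print(text)          # contents
```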

I use symbolic links to organize electronic copies of books and articles, so that my directory system categorizes them both by topic and by author, and sometimes by multiple topics or by multiple authors (in the cases of collaborations and of anthologies). But I face the problem that often I want to save these documents using a file system that doesn't support symbolic links.

More generally, whenever I copy a collection of files whose directories contain symbolic links to a file system that does not support symbolic links, I'd like to be able to restore the entire structure from the copy.

My solution has been to create a file that catalogues the symbolic links, so that they can be recreated. Of course, I want both the cataloguing and the recreation to be automated. Towards that end, I've written two small programs in Python 3. These programs will work with any POSIX-compliant operating system (Linux, macOS, &c), but Windows is not generally POSIX-compliant.

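This program creates a catalogue of the symbolic links in the current working directory and in all of its subdirectories, and writes it to standard output. The catalogue has one record per link, with tab-separated fields. Each record begins with the path of a link, followed by the target of that link. If that target is itself a link, the record continues with each further target in turn.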
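The following hypothetical pair of helpers, `format_record` and `parse_record`, makes the record format concrete; they are not part of the program below. Using tab and newline as delimiters has one consequence: a path containing either character cannot be catalogued. For that reason, this sketch rejects such paths rather than writing an ambiguous line.

```python
SEPARATOR = "\t"

def format_record(link_path, targets):
    """Build one catalogue line: a link's path, then each target in its chain."""
    fields = [link_path, *targets]
    # A tab or newline inside a path would be mistaken for a delimiter.
    if any(SEPARATOR in field or "\n" in field for field in fields):
        raise ValueError("path contains a tab or newline: %r" % (fields,))
    return SEPARATOR.join(fields)

def parse_record(line):
    """Split one catalogue line back into the link's path and its targets."""
    return line.rstrip("\n").split(SEPARATOR)

# Hypothetical chain: logic -> ../by_author/smith -> ../book.pdf
record = format_record("./by_topic/logic", ["../by_author/smith", "../book.pdf"])
print(record.replace(SEPARATOR, " <TAB> "))
# ./by_topic/logic <TAB> ../by_author/smith <TAB> ../book.pdf
```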

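#!/usr/bin/env python3
import os

separator = "\t"
max_hops = 40  # stop chasing a chain this long (guards against cycles of links)

def chase_link(link):
    # Print the target of each link in the chain exactly as it is stored,
    # so that relative targets stay relative in the catalogue.
    for hop in range(max_hops):
        source = os.readlink(link)
        # A relative target is relative to the directory containing the
        # link; os.path.join() returns an absolute target unchanged.
        link = os.path.join(os.path.dirname(link), source)
        if hop == max_hops - 1 or not os.path.islink(link):
            print(source)
            return
        print(source + separator, end="")

def search_dir(directory):
    with os.scandir(directory) as it:
        entries = list(it)
    for entry in entries:
        # Links are catalogued, never descended into, even when they
        # lead to directories.
        if entry.is_symlink():
            print(entry.path + separator, end="")
            chase_link(entry.path)
        elif entry.is_dir(follow_symlinks=False):
            search_dir(entry.path)

dir_top = "."

search_dir(dir_top)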
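For testing, or for use from other Python code, it can be handier to collect the records in a list than to print them. The sketch below does that with `os.walk()` in place of the recursive `search_dir()`. The names `catalogue`, `chain_of` and `MAX_HOPS` are hypothetical. The cap of 40 hops is an assumed guard against links that form a cycle. The sketch also sorts directory contents so that the catalogue comes out in a stable order, which the program above does not do.

```python
import os
import tempfile

SEPARATOR = "\t"
MAX_HOPS = 40  # assumed cap on chain length, to stop at cycles of links

def chain_of(link):
    """Return the stored target of `link` and of each link it leads to."""
    targets = []
    while os.path.islink(link) and len(targets) < MAX_HOPS:
        target = os.readlink(link)
        targets.append(target)
        # A relative target is relative to the directory holding the link;
        # os.path.join() returns an absolute target unchanged.
        link = os.path.join(os.path.dirname(link), target)
    return targets

def catalogue(top="."):
    """Return one tab-separated record per symbolic link beneath `top`."""
    records = []
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # visit subdirectories in a stable order
        # With the default followlinks=False, os.walk lists a link to a
        # directory in dirnames but does not descend into it.
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                records.append(SEPARATOR.join([path, *chain_of(path)]))
    return records

# Demonstration with hypothetical names, run from inside a scratch directory.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "by_author"))
    os.mkdir(os.path.join(top, "by_topic"))
    open(os.path.join(top, "book.pdf"), "w").close()
    os.symlink("../book.pdf", os.path.join(top, "by_author", "smith"))
    os.symlink("../by_author/smith", os.path.join(top, "by_topic", "logic"))
    saved = os.getcwd()
    os.chdir(top)
    try:
        records = catalogue(".")
    finally:
        os.chdir(saved)

for record in records:
    print(record.replace(SEPARATOR, " <TAB> "))
```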

And this program reads a catalogue from standard input (or from files named on its command line) and recreates the symbolic links in the current working directory and its subdirectories, creating subdirectories as necessary.

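#!/usr/bin/env python3
import os
import fileinput

separator = "\t"

def relink(chain):
    # chain[0] is the path of a link; each later element is the target of
    # the link before it, relative to that link's directory.
    dir_start = os.getcwd()
    link_dir, link = os.path.split(chain[0])
    try:
        if link_dir:
            os.makedirs(link_dir, exist_ok=True)
            os.chdir(link_dir)
        # Recreate the rest of the chain first, relative to this directory.
        if len(chain) > 2:
            relink(chain[1:])
        # lexists() is true even for a dangling link, so an existing link
        # is left alone rather than making os.symlink() fail.
        if not os.path.lexists(link):
            os.symlink(chain[1], link)
    finally:
        os.chdir(dir_start)

for line in fileinput.input():
    record = line.rstrip("\n").split(separator)
    if len(record) >= 2:  # skip blank or malformed lines
        relink(record)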
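The restoring step can also be written without changing directories, by using `os.path.join()` to compute where each link in a chain lives. The following is a minimal sketch of that approach. It uses a hypothetical `restore()` function and a hypothetical pair of records. Like `relink()`, it creates the link nearest the file first and leaves alone any link that already exists.

```python
import os
import tempfile

SEPARATOR = "\t"

def restore(record, top="."):
    """Recreate beneath `top` the chain of links described by one record."""
    fields = record.rstrip("\n").split(SEPARATOR)
    # Work out where each link in the chain lives: the first field is
    # relative to `top`, each later link relative to the link before it.
    locations = [os.path.join(top, fields[0])]
    for target in fields[1:-1]:
        locations.append(os.path.join(os.path.dirname(locations[-1]), target))
    for location in locations:
        os.makedirs(os.path.dirname(location) or ".", exist_ok=True)
    # Create the link nearest the file first; keep any link already present.
    for location, target in reversed(list(zip(locations, fields[1:]))):
        if not os.path.lexists(location):
            os.symlink(target, location)

# Hypothetical records, as the cataloguing program would write them.
records = ["./by_author/smith\t../book.pdf",
           "./by_topic/logic\t../by_author/smith\t../book.pdf"]

with tempfile.TemporaryDirectory() as top:
    # Simulate a copy that kept the file but lost every link.
    with open(os.path.join(top, "book.pdf"), "w") as f:
        f.write("contents")
    for record in records:
        restore(record, top)
    logic = os.path.join(top, "by_topic", "logic")
    hop = os.readlink(logic)
    with open(logic) as f:
        text = f.read()

print(hop, text)  # ../by_author/smith contents
```

Because each record carries its whole chain, restoring only the second record would also recreate `by_author/smith`.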

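You see so much handling of directories in these programs because they support symbolic links with relative targets. A relative target is interpreted relative to the directory that contains the link, not relative to the current working directory. Absolute targets are also supported. But if symbolic links use absolute targets, then relocating a directory structure is more difficult, because each absolute target still names the old location.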
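The difference is easy to demonstrate. Move a directory that contains one link with a relative target and one with an absolute target, and only the relative one still leads to the file. Here is a minimal sketch; the names `library`, `moved`, `relative` and `absolute` are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    old = os.path.join(base, "library")
    os.mkdir(old)
    with open(os.path.join(old, "book.pdf"), "w") as f:
        f.write("contents")
    os.symlink("book.pdf", os.path.join(old, "relative"))
    os.symlink(os.path.join(old, "book.pdf"), os.path.join(old, "absolute"))

    # Relocate the whole structure.
    new = os.path.join(base, "moved")
    os.rename(old, new)

    # os.path.exists() follows links, so a dangling link reports False.
    relative_ok = os.path.exists(os.path.join(new, "relative"))
    absolute_ok = os.path.exists(os.path.join(new, "absolute"))

print(relative_ok, absolute_ok)  # True False
```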

A catalogue created with the first program may have many redundant records. A link that lies partway along another link's chain is recreated whenever that chain is restored, yet it also gets a record of its own. The program could be written to omit these, but that enhancement would come at an expense in programming time and in computing resources that simply doesn't make sense at the scale at which I operate. (Likewise for recoding these programs to work, or to fail gracefully, with various versions of Windows.) I try not to go crazy with my refinements!

In a later 'blog entry, I'll present some other utilities that I've written more specifically for managing the symbolic links of my files of books and of articles.
