Python and XML Validation

While working on a recent project that used included, nested, embedded, or inherited XSDs, I ran into errors getting the XML to validate with lxml. The solution turned out to be this: when you give lxml the schema location, the included XSD files must be in the current working directory for it to find them. There you go. I didn't find this anywhere on the internet. Enjoy!
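Here is a minimal sketch of that workaround, assuming lxml is installed. Before compiling the schema, it temporarily changes into the directory that holds the top-level XSD, so relative schemaLocation paths in xs:include and xs:import resolve there. The helper names working_directory and validate are illustrative and are not part of lxml.

```python
import contextlib
import os

from lxml import etree


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def validate(xml_path, xsd_path):
    """Validate xml_path against xsd_path; return (is_valid, error_log).

    The schema is compiled from inside its own directory so that relative
    schemaLocation paths in xs:include / xs:import are found there.
    """
    xml_doc = etree.parse(os.path.abspath(xml_path))
    xsd_path = os.path.abspath(xsd_path)
    with working_directory(os.path.dirname(xsd_path)):
        schema = etree.XMLSchema(etree.parse(os.path.basename(xsd_path)))
    is_valid = schema.validate(xml_doc)
    return is_valid, schema.error_log
```

Using a context manager ensures the original working directory is restored even if the schema fails to compile.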

Multithreading, Python and passed arguments

Recently I had a project that required precompiling the firmware for a device so that end users could program the device without having the source code. We're not talking about a few versions of the code, but almost 1000. No one would want to do that by hand, especially since it would have to be redone every time the source code changes. Python to the rescue. It was simple enough to write a program that copies the source code, changes a bit of information in a header file, compiles it, and saves the binary to the appropriate location. Controlling other programs is pretty easy with the subprocess module. That's great and all, but doing the builds one at a time? That's so 90s. Python makes running jobs in parallel pretty simple with its multiprocessing module, which provides both process pools and thread pools. The trick is not stepping on any toes when you do it.
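The post's own code isn't shown here, so the following is only a sketch of the approach described, under some assumptions. The header is called config.h, the build runs with make, and the build produces firmware.bin; all of these names are hypothetical. For simplicity, the sketch writes a one-line header, whereas a real project would more likely edit one line of an existing header. To avoid stepping on toes, each job copies the source tree into its own temporary directory before touching the header and compiling. Because the compiler does the heavy lifting in its own process, multiprocessing.pool.ThreadPool is enough here. multiprocessing.Pool has the same map interface if the workers do CPU-heavy Python.

```python
import os
import shutil
import subprocess
import tempfile
from multiprocessing.pool import ThreadPool

# Hypothetical project layout; adjust to the real firmware tree and toolchain.
BUILD_CMD = ["make"]        # command that compiles the copied tree
HEADER = "config.h"         # header that carries the per-device setting
BINARY = "firmware.bin"     # file the build produces


def build_variant(job):
    """Build one variant in a private copy of the source tree; return its path."""
    source_dir, output_dir, device_id, build_cmd = job
    # A separate scratch directory per job means no two builds share files.
    work_dir = tempfile.mkdtemp(prefix="build_%d_" % device_id)
    try:
        tree = os.path.join(work_dir, "src")
        shutil.copytree(source_dir, tree)
        with open(os.path.join(tree, HEADER), "w") as header:
            header.write("#define DEVICE_ID %d\n" % device_id)
        # check=True raises CalledProcessError if the build fails.
        subprocess.run(build_cmd, cwd=tree, check=True, capture_output=True)
        dest = os.path.join(output_dir, "firmware_%04d.bin" % device_id)
        shutil.copy(os.path.join(tree, BINARY), dest)
        return dest
    finally:
        shutil.rmtree(work_dir, ignore_errors=True)


def build_all(source_dir, output_dir, device_ids, build_cmd=BUILD_CMD, workers=None):
    """Build every variant in parallel; workers=None uses os.cpu_count()."""
    os.makedirs(output_dir, exist_ok=True)
    jobs = [(source_dir, output_dir, i, build_cmd) for i in device_ids]
    with ThreadPool(workers) as pool:
        return pool.map(build_variant, jobs)
```

For example, build_all("firmware_src", "binaries", range(1000)) would queue all 1000 builds, with directory names chosen for illustration.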

Cleaning up filenames for transfer to Windows

For those of you who run multiple operating systems, you may have run into the problem where filenames that are valid on one are not valid on the other. Specifically, I've had that problem when using NTFS file systems between Linux and Windows. The NTFS-3G driver on Linux allows characters in filenames that Windows doesn't like. To solve this, I wrote a quick Python script that makes the filenames acceptable to Windows. Enjoy.

windows_rename.py

#!/usr/bin/env python3
#Copyright 2012 Chad Kidder
#Released under GPL v3.0

import os
import re
import shutil
import sys

# Characters Windows does not allow in filenames: \ / : * ? " < > |
# The apostrophe is not forbidden by Windows, but it is replaced here too.
InvalidCharacters = re.compile(r"""[\\/:*?"<>|']""")


def SafeNames(location):
    # Walk bottom-up so a directory's contents are renamed before the
    # directory itself; otherwise os.walk would try to descend into
    # directories that no longer exist under their old names.
    for root, dirs, files in os.walk(location, topdown=False):
        for name in files + dirs:
            new_name = InvalidCharacters.sub("_", name)
            if new_name != name:
                old_path = os.path.join(root, name)
                new_path = os.path.join(root, new_name)
                if os.path.exists(new_path):
                    # Don't overwrite (or move into) an existing entry.
                    print("Skipping %s: %s already exists" % (old_path, new_path))
                    continue
                shutil.move(old_path, new_path)
                print("%s -> %s" % (old_path, new_path))


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Please enter a directory to fix the file names on")
        sys.exit(1)
    for location in sys.argv[1:]:
        SafeNames(location)

Automating DreamHost backups

We here at Curious System Solutions use DreamHost as our hosting provider. One of the nice things they give us is a tidy backup every month, if we ask for it. It may take a few days if you ask at the beginning of the month, and it is easy to forget to download it. So we have a handy Python script that checks an IMAP4 email server to see whether the backup is ready and, if so, downloads it. The script is designed to run as a nightly cron job, so you don't have to remember to download anything.
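The post's script isn't reproduced here, so this is only a sketch of the idea using the standard library (imaplib, email, urllib). The subject text to search for is hypothetical, and so is the assumption that the notification contains a plain http(s) download link. Check an actual backup notice before relying on either. Fetching a message with RFC822 sets its \Seen flag, so a nightly run will not pick the same notice up twice. However, this also means a failed download will not be retried automatically.

```python
import email
import imaplib
import os
import re
import urllib.parse
import urllib.request
from email import policy

# Hypothetical values: adjust to the real notification email.
SUBJECT_TEXT = "backup"
LINK_PATTERN = re.compile(r'https?://[^\s<>"]+')


def backup_links(raw_message):
    """Return the http(s) links in the text body of one raw RFC 822 message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    return LINK_PATTERN.findall(text)


def fetch_new_backups(host, user, password, dest_dir, mailbox="INBOX"):
    """Download links from unread messages whose subject contains SUBJECT_TEXT."""
    saved = []
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select(mailbox)
        _, data = imap.search(None, "UNSEEN", "SUBJECT", '"%s"' % SUBJECT_TEXT)
        for num in data[0].split():
            # Fetching RFC822 also marks the message as read on the server.
            _, parts = imap.fetch(num, "(RFC822)")
            for url in backup_links(parts[0][1]):
                name = os.path.basename(urllib.parse.urlparse(url).path) or "backup"
                dest = os.path.join(dest_dir, name)
                urllib.request.urlretrieve(url, dest)
                saved.append(dest)
    return saved
```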

Downsampling PDFs to save space

One of the best things since sliced bread, IMHO, is the automatic scan-to-email/PDF feature on multi-function copier/printer/scanner/fax machines. It makes copying printed articles easy, whether you want to send them to friends or keep a copy of an article from something you borrowed. My personal philosophy is "scan once, process as needed." That means I scan at a high resolution and go from there.

Now, say you want to share that article (or whatever it is) with a friend, and that high-res PDF is too big to email, or they have a slow net connection… you get the idea. How do you shrink the PDF easily? Ghostscript is part of the answer. The other part is to write a script. Our script takes multiple input files, runs each of them through Ghostscript with its screen settings (-dPDFSETTINGS=/screen), and writes the output with _small appended to the base filename. Most of the logic in the script just parses the file name and path so that _small goes before the extension rather than after it.
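Here is a minimal sketch of that approach in Python. It assumes Ghostscript is on the PATH as gs; on Windows, the executable is usually gswin64c, which the gs parameter allows for. The flags are standard Ghostscript pdfwrite options. The function names are illustrative and are not taken from the post's own script. os.path.splitext handles the filename work, so a name with extra dots, such as notes.v2.pdf, becomes notes.v2_small.pdf.

```python
import os
import subprocess
import sys


def small_name(path):
    """Return path with '_small' inserted before the extension."""
    base, ext = os.path.splitext(path)
    return base + "_small" + ext


def ghostscript_command(src, dst, gs="gs"):
    """Build a Ghostscript command that rewrites src using the /screen preset."""
    return [
        gs,
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/screen",      # low-resolution, smallest-size preset
        "-dCompatibilityLevel=1.4",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-sOutputFile=" + dst,
        src,
    ]


def shrink(paths, gs="gs"):
    """Write a downsampled copy of each PDF next to the original."""
    for src in paths:
        dst = small_name(src)
        subprocess.run(ghostscript_command(src, dst, gs), check=True)
        print("%s -> %s" % (src, dst))


if __name__ == "__main__":
    shrink(sys.argv[1:])
```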

Concatenating IEEE E-Book PDFs

For those of you who are IEEE members, IEEE now offers some of its "classic" e-books as free downloads from IEEE Xplore. The only problem is that they come as multi-part PDF files that are not named in a rational fashion. So… if you have Linux, pdftk installed, and a mass-download extension in your web browser (such as the great DownThemAll! for Firefox), we can fix that.
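The post's own script isn't shown here. As a sketch of the core step, pdftk joins files with "pdftk in1.pdf in2.pdf ... cat output out.pdf". The helper below sorts the part files with a natural sort, so part2 comes before part10, and then builds that command. It assumes the parts' names fall into reading order once their numbers are compared numerically, which may not hold for IEEE's actual file names. The script name concat.py in the usage comment is hypothetical.

```python
import re
import subprocess
import sys


def natural_key(name):
    """Sort key that orders 'part2' before 'part10'."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def pdftk_command(parts, output):
    """pdftk concatenation: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    return ["pdftk"] + sorted(parts, key=natural_key) + ["cat", "output", output]


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 concat.py book.pdf part*.pdf
    output, parts = sys.argv[1], sys.argv[2:]
    subprocess.run(pdftk_command(parts, output), check=True)
```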