Archive for the ‘Python’ Category

Anki 2.0, Esperanto, and GitHub

Tuesday, April 24th, 2012

I recently learned that Anki 2.0 was in beta from my friend Tom, who is working on changes to the MCD Support plugin so that it supports languages like Spanish as well as it does Japanese. Even so, it never crossed my mind that I might need to update the Esperanto Support addon that I wrote a couple of years ago until I received an email from Damien Elmes, the author of Anki, saying that it was time to look at doing so.

I decided that this was the perfect opportunity to finally get on GitHub, since this project is very small and would not require a lot of work to move to its new home. The process of getting set up on GitHub was pretty easy thanks to their excellent documentation.

Also easy was upgrading my code to work with Anki 2.0. Although the code needed significant changes, the excellent documentation for writing addons for Anki 2.0 had everything I needed to know. If this documentation existed for the older version of Anki, I never saw it, so I’m glad that an obvious effort was put into creating it.

The Esperanto Support addon for Anki 2.0 is available for download from https://beta.ankiweb.net/shared/info/2096916868, or just browse the add-on list within Anki and you will find it. If you want to see the source, it now lives at https://github.com/peterjcarroll/Esperanto-Anki-Plugin.

Verifying that an MP3 File is valid in Python

Friday, September 10th, 2010

This post is the result of many attempts to find an existing solution, deciding that nothing did what I needed, and writing the code myself. Specifically, I wanted to be able to verify from Python whether or not a file is a valid MP3 file. I did not want any dependency on non-Python code (for cross-platform reasons), nor did I need to encode, decode, play, record, or perform any other such operations on the file. I just needed to know if it was an MP3 or not, and that is all. Oh yeah, and the file will probably have a random file name without the .mp3 extension.

At first, I downloaded several Python libraries. The documentation was poor on most of them, so I had to experiment to figure out if they did what I needed. All were failures or required something external like ffmpeg. I found a library that seemed to check if an mp3 file was valid, but discovered it only worked if the file was named with the mp3 extension. A closer look at its code revealed that it was just checking the file’s mime-type based on the file extension. That was useless for me.
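To show why an extension-based check could not help me, here is a small sketch using Python's standard mimetypes module. This is only an illustration: I am not claiming it is what that library used internally, and the helper name guess_from_name and the file names are made up for the example. The point is that the guess comes purely from the name, and the file itself is never opened.

```python
import mimetypes

def guess_from_name(file_name):
    # guess_type() inspects only the name; the file's contents are never read
    mime_type, _encoding = mimetypes.guess_type(file_name)
    return mime_type

# Anything named *.mp3 is reported as audio, whatever it really contains
print(guess_from_name("not_really_audio.mp3"))

# A real MP3 stored under a random name with no extension gets no type at all
print(guess_from_name("a8f3c2d1"))
```

Since my files have random names with no extension, a check like this would reject every one of them.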

So I decided that this was something I needed to do myself. With this mp3 file format specification as a reference, I sat down and wrote the code that follows, which seems to work very well. Basically the code searches for the first valid audio frame, makes sure that the frame’s header values are sane, and then checks that the second frame seems to start where it should. This code does not decode any audio in those frames.

Here is the code:

def isMp3Valid(file_path):
    is_valid = False

    # Binary mode: the bytes must reach us unmodified on every platform
    with open(file_path, 'rb') as f:
        # Find the first 0xFF byte, which may begin a frame sync
        block = f.read(1024)
        frame_start = block.find(b'\xff')
        block_count = 0 #abort after 64k
        while len(block)>0 and frame_start == -1 and block_count<64:
            block = f.read(1024)
            frame_start = block.find(b'\xff')
            block_count+=1

        if frame_start > -1:
            # Top up the block if the 4-byte header straddles its end
            if frame_start + 4 > len(block):
                block += f.read(4)
            frame_hdr = block[frame_start:frame_start+4]
            is_valid = len(frame_hdr) == 4

            mpeg_version = ''
            layer_desc = ''
            uses_crc = False
            bitrate = 0
            sample_rate = 0
            padding = False
            frame_length = 0

            if is_valid:
                is_valid = frame_hdr[1] & 0xe0 == 0xe0 #validate the rest of the frame_sync bits exist

            if is_valid:
                if frame_hdr[1] & 0x18 == 0:
                    mpeg_version = '2.5'
                elif frame_hdr[1] & 0x18 == 0x10:
                    mpeg_version = '2'
                elif frame_hdr[1] & 0x18 == 0x18:
                    mpeg_version = '1'
                else:
                    is_valid = False

            if is_valid:
                if frame_hdr[1] & 6 == 2:
                    layer_desc = 'Layer III'
                elif frame_hdr[1] & 6 == 4:
                    layer_desc = 'Layer II'
                elif frame_hdr[1] & 6 == 6:
                    layer_desc = 'Layer I'
                else:
                    is_valid = False

            if is_valid:
                uses_crc = frame_hdr[1] & 1 == 0

                # Bitrates in kbps. Columns: MPEG 1 Layer I, II, III,
                # then MPEG 2/2.5 Layer I, then MPEG 2/2.5 Layer II and III
                bitrate_chart = [
                    [0,0,0,0,0],
                    [32,32,32,32,8],
                    [64,48,40,48,16],
                    [96,56,48,56,24],
                    [128,64,56,64,32],
                    [160,80,64,80,40],
                    [192,96,80,96,48],
                    [224,112,96,112,56],
                    [256,128,112,128,64],
                    [288,160,128,144,80],
                    [320,192,160,160,96],
                    [352,224,192,176,112],
                    [384,256,224,192,128],
                    [416,320,256,224,144],
                    [448,384,320,256,160]]
                bitrate_index = frame_hdr[2] >> 4
                if bitrate_index==15:
                    is_valid=False
                else:
                    bitrate_col = 0
                    if mpeg_version == '1':
                        if layer_desc == 'Layer I':
                            bitrate_col = 0
                        elif layer_desc == 'Layer II':
                            bitrate_col = 1
                        else:
                            bitrate_col = 2
                    else:
                        if layer_desc == 'Layer I':
                            bitrate_col = 3
                        else:
                            bitrate_col = 4
                    bitrate = bitrate_chart[bitrate_index][bitrate_col]
                    is_valid = bitrate > 0

            if is_valid:
                # Sample rates in Hz. Columns: MPEG 1, MPEG 2, MPEG 2.5
                sample_rate_chart = [
                    [44100, 22050, 11025],
                    [48000, 24000, 12000],
                    [32000, 16000, 8000]]
                sample_rate_index = (frame_hdr[2] & 0xc) >> 2
                if sample_rate_index != 3:
                    sample_rate_col = 0
                    if mpeg_version == '1':
                        sample_rate_col = 0
                    elif mpeg_version == '2':
                        sample_rate_col = 1
                    else:
                        sample_rate_col = 2
                    sample_rate = sample_rate_chart[sample_rate_index][sample_rate_col]
                else:
                    is_valid = False

            if is_valid:
                padding = frame_hdr[2] & 2 == 2 #padding is bit 1; bit 0 is the private bit

                padding_length = 0
                if padding:
                    padding_length = 1 #one slot: 4 bytes for Layer I, 1 byte otherwise
                if layer_desc == 'Layer I':
                    frame_length = (12 * bitrate * 1000 // sample_rate + padding_length) * 4
                elif layer_desc == 'Layer III' and mpeg_version != '1':
                    #MPEG 2 and 2.5 Layer III frames hold half as many samples
                    frame_length = 72 * bitrate * 1000 // sample_rate + padding_length
                else:
                    frame_length = 144 * bitrate * 1000 // sample_rate + padding_length
                is_valid = frame_length > 0

            if is_valid:
                # Verify the next frame, reading further if it lies beyond this block
                next_frame = frame_start + frame_length
                if next_frame >= len(block):
                    block += f.read(next_frame - len(block) + 1)
                is_valid = next_frame < len(block) and block[next_frame] == 0xff

    return is_valid
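
To make the frame length arithmetic concrete, here is a minimal sketch that handles only the most common case, MPEG 1 Layer III. The function name mpeg1_layer3_frame_length and the sample headers are made up for this illustration, and it is a simplification under those assumptions rather than a replacement for the function above.

```python
def mpeg1_layer3_frame_length(header):
    """Return the frame length in bytes, or None if the 4-byte header
    is not a usable MPEG 1 Layer III header."""
    if len(header) < 4 or header[0] != 0xFF or header[1] & 0xE0 != 0xE0:
        return None  # the 11 frame sync bits are not all set
    if header[1] & 0x18 != 0x18 or header[1] & 0x06 != 0x02:
        return None  # not MPEG version 1, Layer III

    # Bitrates (kbps) and sample rates (Hz) for MPEG 1 Layer III only
    bitrates = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
    sample_rates = [44100, 48000, 32000]

    bitrate_index = header[2] >> 4
    sample_rate_index = (header[2] & 0x0C) >> 2
    if bitrate_index in (0, 15) or sample_rate_index == 3:
        return None  # free-format, invalid, or reserved values

    padding = (header[2] >> 1) & 1  # one extra byte when the padding bit is set
    return 144 * bitrates[bitrate_index] * 1000 // sample_rates[sample_rate_index] + padding

# 0xFF 0xFB 0x90 0x00: MPEG 1, Layer III, 128 kbps, 44100 Hz, no padding
print(mpeg1_layer3_frame_length(b"\xff\xfb\x90\x00"))  # 144 * 128000 // 44100 = 417
```

So for a typical 128 kbps file, the second frame should begin 417 bytes after the first (418 when the padding bit is set), and that is the position where the next 0xFF byte is expected.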

Esperanto Support Plugin for Anki

Thursday, August 5th, 2010

So I decided to learn Esperanto, which, as an avid user of the SRS application Anki, meant I needed to either enter Esperanto’s special characters (ĉ, ĝ, ĥ, ĵ, ŝ, ŭ) into my flash cards, which can’t easily be typed with the US International keyboard layout, or deal with the ugly “x method” workaround (cx, gx, hx, jx, sx, ux). At first, I was only creating Esperanto cards from my Linux computers at home, which let me use an Esperanto keyboard layout to type in the special characters. Pretty soon though, I found myself creating cards from my Windows machine at work during breaks. There is no Esperanto keyboard layout in Windows by default, so I tried to install some third-party keyboard layouts without success. I eventually came across a program called Ek, which seemed to do the job of letting me type special characters, except in Anki, where it would only type “ĉ”. So I just dealt with the “x method” and was typing words like vojagxas instead of vojaĝas. I don’t know why, but after a while all the x’s began to really bother me. I didn’t want to see mangxi in my flash cards; it just doesn’t seem as natural as manĝi does. So I did what any other software developer would do….

I wrote some code.

Specifically, I wrote a plugin for Anki which converts all those terrible cx, gx, hx, jx, sx, and ux combinations into the aesthetically pleasing ĉ, ĝ, ĥ, ĵ, ŝ, and ŭ characters. Prior to this I had never written a plugin for Anki, and even now I claim no expertise. Anki is written in Python, and so are its plugins. I found a plugin that adds some support for the German language to Anki and used that as a model to build my plugin.
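
The conversion itself is simple. Here is a minimal sketch of the idea in plain Python. It is not the plugin's actual code, it leaves out everything to do with hooking into Anki, and the function name x_to_esperanto is made up for this example. I am also assuming that capital letters should be handled the same way.

```python
import re

# Map each base letter to its accented form, lower and upper case
ACCENTED = dict(zip("cghjsuCGHJSU", "ĉĝĥĵŝŭĈĜĤĴŜŬ"))

def x_to_esperanto(text):
    # Replace a base letter followed by x (or X) with the accented letter
    return re.sub(r"([cghjsuCGHJSU])[xX]",
                  lambda match: ACCENTED[match.group(1)],
                  text)

print(x_to_esperanto("Mi vojagxas kaj mangxas."))  # Mi vojaĝas kaj manĝas.
```

Esperanto has no letter x of its own, which is why the “x method” is unambiguous to reverse like this.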

To use the Esperanto plugin, open Anki and go to File -> Download -> Shared Plugin. Type “esperanto” into the search box. My plugin is the only one that matches that search, so it should be highlighted already. The plugin is called “Esperanto Support for Anki”. Click OK and it should download and install for you. In your deck, when you want to add a card for Esperanto, make sure the card is using the “Esperanto” model rather than the “Basic” model.

I’m open to suggestions and feedback, and if you are curious about the code at all, open up your Anki plugins folder and take a look. The code is right there and it’s very simple.

The difficulty of consuming a .NET Web Service using Python

Friday, March 6th, 2009

This post is not part of my Biblefeed series of posts, but it is very much related. For the Biblefeed project, I was hoping to consume this web service in order to get the data I need to make the project work. The web service appears to be a SOAP web service written in .Net.

In my day job, I develop using C# and VB.Net and use .Net web services all the time. Of course, consuming a .Net web service with a .Net client is very easy. I had hoped that, given the relative popularity of the .Net programming languages, Python would have a good SOAP library that could make the task easier.

Based on what I’ve been able to discover so far, I can only state that Python does indeed have libraries for dealing with SOAP. I have not been able to make any of them work with the web service mentioned above though.

When googling, the first things I found were SOAPpy and ZSI. I was a bit alarmed that the last release date for these was in 2001. I tried to install SOAPpy, which seemed to install OK, but it apparently had a dependency on PyXML, which is no longer maintained. I abandoned trying to use these libraries at that point.

After some digging, I discovered there are two more modern libraries, soaplib and suds. Both of these seemed to be capable libraries. Soaplib seems like it’s a little stronger on the server side and suds looks to be easier to use on the client side.

To use soaplib as a client like I want to do here, I need to create stub classes which resemble the structures used by the web service. I played with this for a little while, but gave up on it because I realized that the web service uses Dataset objects, which I couldn’t figure out how to represent in a Python stub class.

Suds is a little nicer because it reads the WSDL for the web service, which keeps you from having to build stub classes. However, it does not like Datasets either. I was running into the issue described here. As of this writing that issue is still open. One of the comments on that issue suggested removing the <s:element ref="s:schema"/> tags from the WSDL, so I saved the WSDL file locally and tried it. I was able to progress with suds a little further because of that, but when I actually tried to call the web service it errored out.
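
For anyone who wants to try the same workaround, stripping those tags from a saved WSDL can be sketched like this. It assumes the WSDL uses the s: prefix exactly as in the tag quoted in that comment, and the function name strip_schema_refs and the sample snippet are made up for the example. It only cleans the text; it does not make the Dataset problem go away.

```python
import re

# Matches the self-closing schema reference, tolerating extra whitespace
SCHEMA_REF = re.compile(r'<s:element\s+ref="s:schema"\s*/>')

def strip_schema_refs(wsdl_text):
    # Return the WSDL text with every schema reference element removed
    return SCHEMA_REF.sub("", wsdl_text)

# A tiny stand-in for the part of a WSDL that describes a Dataset result
sample = '<s:sequence><s:element ref="s:schema"/><s:any/></s:sequence>'
print(strip_schema_refs(sample))  # <s:sequence><s:any/></s:sequence>
```

In practice you would read the saved WSDL file into a string, pass it through this function, and write the result back out before pointing the SOAP client at the local copy.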

So I guess no luck today for me with any SOAP libraries. The examples out there seem to show that consuming web services created in Java or Python works just fine, and even .Net web services can work when they use simple types. Unfortunately I have no control over the service that I want to consume, so I must try something else.

Possible solutions? While I’m sure I could use Mono to access the web service and have it return something I can use in Python, I don’t want to make my solution too complex. I have an idea that I’m going to try next that will involve Django’s template system. If it works, it will be in the next post concerning the Biblefeed project.