Posts Tagged ‘Python’

Verifying that an MP3 File is valid in Python

Friday, September 10th, 2010

This post is the result of many attempts to find an existing solution, deciding that nothing did what I needed, and writing the code myself. Specifically, I wanted to be able to verify from Python whether or not a file is a valid MP3 file. I did not want any dependency on non-Python code (for cross-platform reasons), nor did I need to encode, decode, play, record, or perform any other such operations on the file. I just needed to know if it was an MP3 or not, and that is all. Oh yeah, and the file will probably have a random file name without the .mp3 extension.

At first, I downloaded several Python libraries. The documentation was poor on most of them, so I had to experiment to figure out if they did what I needed. All were failures or required something external like ffmpeg. I found a library that seemed to check if an mp3 file was valid, but discovered it only worked if the file was named with the mp3 extension. A closer look at its code revealed that it was just checking the file’s mime-type based on the file extension. That was useless for me.

So I decided that this was something I needed to do myself. With this mp3 file format specification as a reference, I sat down and wrote the code that follows, which seems to work very well. Basically the code searches for the first valid audio frame, makes sure that the frame’s header values are sane, and then checks that the second frame seems to start where it should. This code does not decode any audio in those frames.

Here is the code:

def isMp3Valid(file_path):
    is_valid = False

    f = open(file_path, 'rb') # binary mode; text mode can mangle or truncate binary data on Windows
    block = f.read(1024)
    frame_start = block.find(chr(255))
    block_count = 0 #abort after 64k
    while len(block)>0 and frame_start == -1 and block_count<64:
        block = f.read(1024)
        frame_start = block.find(chr(255))
        block_count+=1
       
    if frame_start > -1:
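        frame_hdr = block[frame_start:frame_start+4]
        # A sync byte in the last 3 bytes of a block yields a short header; reject it instead of raising IndexError below
        is_valid = len(frame_hdr) == 4 and frame_hdr[0] == chr(255)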
       
        mpeg_version = ''
        layer_desc = ''
        uses_crc = False
        bitrate = 0
        sample_rate = 0
        padding = False
        frame_length = 0
       
        if is_valid:
            is_valid = ord(frame_hdr[1]) & 0xe0 == 0xe0 #validate the rest of the frame_sync bits exist
           
        if is_valid:
            if ord(frame_hdr[1]) & 0x18 == 0:
                mpeg_version = '2.5'
            elif ord(frame_hdr[1]) & 0x18 == 0x10:
                mpeg_version = '2'
            elif ord(frame_hdr[1]) & 0x18 == 0x18:
                mpeg_version = '1'
            else:
                is_valid = False
           
        if is_valid:
            if ord(frame_hdr[1]) & 6 == 2:
                layer_desc = 'Layer III'
            elif ord(frame_hdr[1]) & 6 == 4:
                layer_desc = 'Layer II'
            elif ord(frame_hdr[1]) & 6 == 6:
                layer_desc = 'Layer I'
            else:
                is_valid = False
       
        if is_valid:
            uses_crc = ord(frame_hdr[1]) & 1 == 0
           
            bitrate_chart = [
                [0,0,0,0,0],
                [32,32,32,32,8],
                [64,48,40,48,16],
                [96,56,48,56,24],
                [128,64,56,64,32],
                [160,80,64,80,40],
                [192,96,80,96,48],
                [224,112,96,112,56],
                [256,128,112,128,64],
                [288,160,128,144,80],
                [320,192,160,160,96],
                [352,224,192,176,112],
                [384,256,224,192,128],
                [416,320,256,224,144],
                [448,384,320,256,160]]
            bitrate_index = ord(frame_hdr[2]) >> 4
            if bitrate_index==15:
                is_valid=False
            else:
                bitrate_col = 0
                if mpeg_version == '1':
                    if layer_desc == 'Layer I':
                        bitrate_col = 0
                    elif layer_desc == 'Layer II':
                        bitrate_col = 1
                    else:
                        bitrate_col = 2
                else:
                    if layer_desc == 'Layer I':
                        bitrate_col = 3
                    else:
                        bitrate_col = 4
                bitrate = bitrate_chart[bitrate_index][bitrate_col]
                is_valid = bitrate > 0
       
        if is_valid:
            sample_rate_chart = [
                [44100, 22050, 11025],
                [48000, 24000, 12000],
                [32000, 16000, 8000]]
            sample_rate_index = (ord(frame_hdr[2]) & 0xc) >> 2
            if sample_rate_index != 3:
                sample_rate_col = 0
                if mpeg_version == '1':
                    sample_rate_col = 0
                elif mpeg_version == '2':
                    sample_rate_col = 1
                else:
                    sample_rate_col = 2
                sample_rate = sample_rate_chart[sample_rate_index][sample_rate_col]
            else:
                is_valid = False
       
        if is_valid:
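            # The padding flag is bit 1 of the third header byte (bit 0 is the private bit)
            padding = ord(frame_hdr[2]) & 2 == 2
           
            # Padding adds one slot: 4 bytes for Layer I, 1 byte for Layers II and III
            padding_length = 0
            if padding:
                padding_length = 1
            if layer_desc == 'Layer I':
                frame_length = (12 * bitrate * 1000 / sample_rate + padding_length) * 4
            elif layer_desc == 'Layer III' and mpeg_version != '1':
                # MPEG 2 and 2.5 Layer III frames hold 576 samples, half as many as MPEG 1
                frame_length = 72 * bitrate * 1000 / sample_rate + padding_length
            else:
                frame_length = 144 * bitrate * 1000 / sample_rate + padding_length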
            is_valid = frame_length > 0
           
            # Verify the next frame
            if(frame_start + frame_length < len(block)):
                is_valid = block[frame_start + frame_length] == chr(255)
            else:
                offset = (frame_start + frame_length) - len(block)
                block = f.read(1024)
                if len(block) > offset:
                    is_valid = block[offset] == chr(255)
                else:
                    is_valid = False
       
    f.close()
    return is_valid
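
The function above is Python 2 code (it indexes a str and calls ord() on the result). For readers on Python 3, the header checks could be condensed along the lines of the sketch below. This is not the code from this post: the name parse_frame_header, the table names, and the returned dictionary are illustrative assumptions, and it only decodes a single 4-byte header rather than scanning a file. It reads the padding flag from bit 1 of the third byte and uses a factor of 72 for MPEG 2 and 2.5 Layer III, following the MPEG audio header layout.

```python
# Sketch only (Python 3): decode one 4-byte MPEG audio frame header.
# All names here are illustrative; they are not part of the original function.

VERSIONS = {0b00: '2.5', 0b10: '2', 0b11: '1'}   # 0b01 is reserved
LAYERS = {0b01: 3, 0b10: 2, 0b11: 1}             # 0b00 is reserved

# Bitrates in kbps, indexed by the 4-bit bitrate index (0 = free format).
BITRATES_V1 = {
    1: [0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    2: [0, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    3: [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
}
BITRATES_V2 = {  # shared by MPEG 2 and 2.5; key 2 covers Layers II and III
    1: [0, 32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256],
    2: [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
}
SAMPLE_RATES = {
    '1': [44100, 48000, 32000],
    '2': [22050, 24000, 16000],
    '2.5': [11025, 12000, 8000],
}


def parse_frame_header(hdr):
    """Return a dict describing the frame, or None if hdr is not a sane header."""
    if len(hdr) < 4 or hdr[0] != 0xFF or hdr[1] & 0xE0 != 0xE0:
        return None  # the 11 frame sync bits are not all set
    version = VERSIONS.get((hdr[1] >> 3) & 0b11)
    layer = LAYERS.get((hdr[1] >> 1) & 0b11)
    bitrate_index = hdr[2] >> 4
    rate_index = (hdr[2] >> 2) & 0b11
    # Like the function above, treat free-format (index 0) as invalid.
    if version is None or layer is None or bitrate_index in (0, 15) or rate_index == 3:
        return None
    if version == '1':
        bitrate = BITRATES_V1[layer][bitrate_index]
    else:
        bitrate = BITRATES_V2[1 if layer == 1 else 2][bitrate_index]
    sample_rate = SAMPLE_RATES[version][rate_index]
    padding = (hdr[2] >> 1) & 1
    if layer == 1:
        frame_length = (12 * bitrate * 1000 // sample_rate + padding) * 4
    elif layer == 3 and version != '1':
        frame_length = 72 * bitrate * 1000 // sample_rate + padding
    else:
        frame_length = 144 * bitrate * 1000 // sample_rate + padding
    return {'version': version, 'layer': layer, 'bitrate': bitrate,
            'sample_rate': sample_rate, 'frame_length': frame_length}
```

A caller would read four bytes starting at a candidate 0xFF byte, pass them in, and then check that another header parses at the offset given by frame_length, which is the same second-frame idea the function above uses.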

Esperanto Support Plugin for Anki

Thursday, August 5th, 2010

So I decided to learn Esperanto, which as an avid user of the SRS application Anki, meant I needed to either enter Esperanto’s special characters (ĉ, ĝ, ĥ, ĵ, ŝ, ŭ) into my flash cards, which can’t easily be typed with the US International keyboard layout, or I could deal with the ugly “x method” workaround (cx, gx, hx, jx, sx, ux). At first, I was only creating Esperanto cards from my Linux computers at home, which let me use an Esperanto keyboard layout to type in the special characters. Pretty soon though, I found myself creating cards from my Windows machine at work during breaks. There is no Esperanto keyboard layout in Windows by default, so I tried to install some third party keyboard layouts without success. I eventually came across a program called Ek, which seemed to do the job of letting me type special characters, except in Anki where it would only type “ĉ”. So I just dealt with the “x method” and was typing words like vojagxas instead of vojaĝas. I don’t know why, but after a while all the x’s began to really bother me. I didn’t want to see mangxi in my flash cards, it just doesn’t seem as natural as manĝi does. So I did what any other software developer would do….

I wrote some code.

Specifically, I wrote a plugin for Anki which converts all those terrible cx, gx, hx, jx, sx, and ux combinations into the aesthetically pleasing ĉ, ĝ, ĥ, ĵ, ŝ, and ŭ characters. Prior to this I had never written a plugin for Anki, and even now I claim no expertise. Anki is written in Python, and so are its plugins. I found a plugin that adds some support for the German language to Anki and used that as a model to build my plugin.
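
To give an idea of the conversion itself, here is a minimal sketch of an x-method replacement in plain Python. It is not the plugin's actual code and it does not touch Anki's plugin API at all; the function name x_to_esperanto and the handling of capital letters are assumptions made for illustration.

```python
import re

# Illustrative mapping from the base letter of an "x method" pair to its
# accented form. Capital letters are included as an assumption.
X_METHOD_MAP = {
    'c': 'ĉ', 'g': 'ĝ', 'h': 'ĥ', 'j': 'ĵ', 's': 'ŝ', 'u': 'ŭ',
    'C': 'Ĉ', 'G': 'Ĝ', 'H': 'Ĥ', 'J': 'Ĵ', 'S': 'Ŝ', 'U': 'Ŭ',
}

# One of the six letters (either case) followed by x or X.
X_METHOD_PATTERN = re.compile(r'([cghjsuCGHJSU])[xX]')


def x_to_esperanto(text):
    """Replace cx, gx, hx, jx, sx, ux with the corresponding accented letters."""
    return X_METHOD_PATTERN.sub(lambda m: X_METHOD_MAP[m.group(1)], text)
```

With this sketch, x_to_esperanto('vojagxas') gives 'vojaĝas' and x_to_esperanto('mangxi') gives 'manĝi'. Text without any of the six pairs passes through unchanged.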

To use the Esperanto plugin, open Anki and go to File -> Download -> Shared Plugin. Type “esperanto” into the search box. My plugin is the only one that matches that search, so it should be highlighted already. The plugin is called “Esperanto Support for Anki”. Click OK and it should download and install for you. In your deck, when you want to add a card for Esperanto, make sure the card is using the “Esperanto” model rather than the “Basic” model.

I’m open to suggestions and feedback, and if you are curious about the code at all, open up your Anki plugins folder and take a look. The code is right there and it’s very simple.

BibleFeed Project: Consuming a SOAP web service

Monday, March 30th, 2009

This is the third post in the BibleFeed Project. If you haven’t already, read the first and second posts.

In my last post I stated the difficulty I was having finding a python library to handle the SOAP web service which I’ll be using to get the data for this project. I gave up on using a library for SOAP and decided to use urllib2 to send the SOAP request and retrieve the response, and ElementTree to parse the response. Both of these are standard libraries in Python 2.5 and higher, so you should not need to install anything extra to use these libraries.

Creating the SOAP request

I decided to take advantage of Django’s template system to create the SOAP requests. The advantages of this approach are that I can easily insert variable data into each SOAP request, I don’t have to manually build the XML in code using a potentially clumsy API, I’m not hard-coding the XML in a string, and tweaking the request is as simple as editing any other XML file.

To accomplish this, I created a templates directory under the bible directory (this is the directory where models.py lives). I edited settings.py so that Django knows where the template directory is.

TEMPLATE_DIRS = (
    # Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
    # Always use forward slashes, even on Windows.
    # Don't forget to use absolute paths, not relative paths.
    'bible/templates',
)
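
Note that the comment in the setting asks for absolute paths, while 'bible/templates' is relative and so depends on the directory the server is started from. One way to get an absolute path is sketched below. It assumes settings.py sits in the project root next to the bible directory, and the helper name build_template_dirs is made up for this example.

```python
import os


def build_template_dirs(settings_path):
    """Return a TEMPLATE_DIRS tuple with an absolute path to bible/templates.

    settings_path is the path of settings.py, which is assumed to live in
    the project root alongside the bible directory.
    """
    project_root = os.path.dirname(os.path.abspath(settings_path))
    return (os.path.join(project_root, 'bible', 'templates'),)

# In settings.py this could be used as:
#     TEMPLATE_DIRS = build_template_dirs(__file__)
```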

In the templates directory I created a file called soaprequest_listbooks.xml which contains the SOAP request to get the list of books from the web service.

<SOAP-ENV:Envelope xmlns:ns0="http://www.francisshanahan.com/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
   <SOAP-ENV:Header/>
   <SOAP-ENV:Body>
      <ns0:ListBooks/>
   </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

You may have noticed that this template doesn’t use any variables at all. That’s okay. Later on I will be making this code usable for all our SOAP requests, and at that point I will need template variables.

Sending the SOAP request and reading the response

At this point, I haven’t written any code that actually does anything. I’m going to change that now. In the bible directory there is a mostly empty file called views.py. This file is intended to contain views, which are functions that produce an HttpResponse based on a given HttpRequest. I created a view called listbooks_view, which will retrieve the list of books of the Bible from the web service, and save these books in my database.

import urllib2
from django.template import Context, loader
from django.http import HttpResponse
from bible.models import *
from string import atoi
import xml.etree.ElementTree as ET

def listbooks_view(request):
    # Create the SOAP request and send it
    url = 'http://francisshanahan.com/TheHolyBible.asmx'
    headers = {'Soapaction' : '"http://www.francisshanahan.com/ListBooks"',
        'Content-Type' : 'text/xml'}
    request_template = loader.get_template('soaprequest_listbooks.xml')
    request_context = Context({}) #nothing is needed for this request
    request_data = request_template.render(request_context)
    http_req = urllib2.Request(url, request_data, headers)
    http_resp = urllib2.urlopen(http_req)

    # Assuming we got a successful response, parse it and store the results in the database
    soap_resp = ET.fromstring(http_resp.read())
    # Lovely path, huh?
    books_xml = soap_resp.findall('{http://schemas.xmlsoap.org/soap/envelope/}Body/{http://www.francisshanahan.com/}ListBooksResponse/{http://www.francisshanahan.com/}ListBooksResult/{urn:schemas-microsoft-com:xml-diffgram-v1}diffgram/NewDataSet/bible_content')
    for book_xml in books_xml:
        id = atoi(book_xml.find('Book').text)
        if id<100: # This webservice returns other stuff numbered 100 and higher that isn't actual bible content
            book = Book()
            book.id = id
            book.name = book_xml.find('BookTitle').text
            if id<40: #The first 39 books are in the Old Testament, the rest are New Testament
                book.testament = 'O'
            else:
                book.testament = 'N'
            book.save()

    return HttpResponse('Success!')
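
A note for readers on Python 3: urllib2 exists only in Python 2, and its functionality lives in urllib.request in Python 3. The sketch below shows how the same request could be built there. It only constructs the request object and does not send it. The function name build_listbooks_request is an assumption, and the envelope is inlined as a string here instead of being rendered from the Django template.

```python
import urllib.request

# Same envelope as soaprequest_listbooks.xml, inlined for this sketch.
LISTBOOKS_ENVELOPE = """<SOAP-ENV:Envelope xmlns:ns0="http://www.francisshanahan.com/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
   <SOAP-ENV:Header/>
   <SOAP-ENV:Body>
      <ns0:ListBooks/>
   </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def build_listbooks_request(envelope=LISTBOOKS_ENVELOPE):
    """Build (but do not send) the ListBooks SOAP request."""
    headers = {
        'Soapaction': '"http://www.francisshanahan.com/ListBooks"',
        'Content-Type': 'text/xml',
    }
    # In Python 3 the body must be bytes; supplying data makes this a POST.
    return urllib.request.Request(
        'http://francisshanahan.com/TheHolyBible.asmx',
        data=envelope.encode('utf-8'),
        headers=headers,
    )

# Sending it would then be:
#     response_bytes = urllib.request.urlopen(build_listbooks_request()).read()
```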

The path used to parse the XML in the SOAP response is kind of nasty due to the heavy use of XML namespaces in the SOAP response. In my experience this is pretty common, and is just the nature of dealing with SOAP. To give you an idea of what the XML that I’m parsing looks like, here’s a snippet of the SOAP response:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <ListBooksResponse xmlns="http://www.francisshanahan.com/">
            <ListBooksResult>
                <xs:schema id="NewDataSet" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
                    <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
                        <xs:complexType>
                            <xs:choice minOccurs="0" maxOccurs="unbounded">
                                <xs:element name="bible_content">
                                    <xs:complexType>
                                        <xs:sequence>
                                            <xs:element name="Book" type="xs:int" minOccurs="0" />
                                            <xs:element name="BookTitle" type="xs:string" minOccurs="0" />
                                        </xs:sequence>
                                    </xs:complexType>
                                </xs:element>
                            </xs:choice>
                        </xs:complexType>
                    </xs:element>
                </xs:schema>
                <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
                    <NewDataSet xmlns="">
                        <bible_content diffgr:id="bible_content1" msdata:rowOrder="0">
                            <Book>1</Book>
                            <BookTitle>The First Book of Moses, called Genesis</BookTitle>
                        </bible_content>
                        <bible_content diffgr:id="bible_content2" msdata:rowOrder="1">
                            <Book>2</Book>
                            <BookTitle>The Second Book of Moses, Called Exodus</BookTitle>
                        </bible_content>
                        <bible_content diffgr:id="bible_content3" msdata:rowOrder="2">
                            <Book>3</Book>
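
The long path can be made easier to read by giving ElementTree a prefix-to-namespace mapping through the namespaces argument of findall. The sketch below is written for Python 3 and runs against a trimmed-down copy of the response above. The prefix fs, the function name extract_books, and the shortened sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the response above (schema section omitted).
SAMPLE_RESPONSE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ListBooksResponse xmlns="http://www.francisshanahan.com/">
      <ListBooksResult>
        <diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
          <NewDataSet xmlns="">
            <bible_content>
              <Book>1</Book>
              <BookTitle>The First Book of Moses, called Genesis</BookTitle>
            </bible_content>
            <bible_content>
              <Book>2</Book>
              <BookTitle>The Second Book of Moses, Called Exodus</BookTitle>
            </bible_content>
          </NewDataSet>
        </diffgr:diffgram>
      </ListBooksResult>
    </ListBooksResponse>
  </soap:Body>
</soap:Envelope>"""

# Prefixes used in the path below; they need not match those in the document.
NAMESPACES = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'fs': 'http://www.francisshanahan.com/',
    'diffgr': 'urn:schemas-microsoft-com:xml-diffgram-v1',
}

# NewDataSet and bible_content carry no prefix because xmlns="" puts them
# in no namespace at all.
BOOKS_PATH = ('soap:Body/fs:ListBooksResponse/fs:ListBooksResult/'
              'diffgr:diffgram/NewDataSet/bible_content')


def extract_books(soap_xml):
    """Return a list of (book_number, title) tuples from a ListBooks response."""
    root = ET.fromstring(soap_xml)
    rows = root.findall(BOOKS_PATH, NAMESPACES)
    return [(int(row.findtext('Book')), row.findtext('BookTitle')) for row in rows]
```

This finds the same elements as the fully expanded path in the view. Only the spelling of the path changes.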

If you’re following along at home, you may be tempted to run the django test web server and see if the code works. You will be disappointed when you see “ProgrammingError at /listbooks/ ERROR:  value too long for type character varying(50).” What this means is that the name field in the Book model is too short. As you can see from the SOAP response above, this web service uses long names for each of the books of the Bible. Where I initially expected names like “Matthew” and “Corinthians I”, instead I got names like “The Gospel According to St. Matthew” and “The First Epistle of Paul the Apostle to the Corinthians.” Thankfully this is easy to fix. I edited models.py so that the name field in the Book model is 120 characters long instead of 50.

class Book(models.Model):
    TESTAMENTS = (
        ('O','Old Testament'),
        ('N','New Testament'),
    )
    name = models.CharField(max_length=120)
    testament = models.CharField(max_length=1, choices=TESTAMENTS)

Next, I need to adjust the database so that the name column for the “bible_book” table is 120 characters long. Note that in the snippet below I use PostgreSQL for my database. If you are using MySQL or some other database, the SQL will be slightly different.

$ python manage.py dbshell
Password for user postgres:
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

biblefeed=# alter table bible_book alter name type varchar(120);
ALTER TABLE
biblefeed=# \d bible_book
                                  Table "public.bible_book"
  Column   |          Type          |                        Modifiers
-----------+------------------------+---------------------------------------------------------
 id        | integer                | not null default nextval('bible_book_id_seq'::regclass)
 name      | character varying(120) | not null
 testament | character varying(1)   | not null
Indexes:
    "bible_book_pkey" PRIMARY KEY, btree (id)

Now, you can run the django test webserver. Open http://localhost:8000/listbooks/ and you should see “Success!”

So how do I know this worked? I go back to the database to see what’s in the “bible_book” table.

$ python manage.py dbshell
Password for user postgres:
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

biblefeed=# select * from bible_book order by id;
 id |                            name                             | testament
----+-------------------------------------------------------------+-----------
  1 | The First Book of Moses, called Genesis                     | O
  2 | The Second Book of Moses, Called Exodus                     | O
  3 | The Second Book of Moses, called Leviticus                  | O
  4 | The Fourth Book of Moses, called Numbers                    | O
  5 | The Fifth Book of Moses, called Deuteronomy                 | O
  6 | The Book of Joshua                                          | O
  7 | The Book of Judges                                          | O
  8 | The Book of Ruth                                            | O
  9 | The First Book of Samuel                                    | O
 10 | The Second Book of Samuel                                   | O
 11 | The First Book of the Kings                                 | O
 12 | The Second Book of the Kings                                | O
 13 | The First Book of the Chronicles                            | O
 14 | The Second Book of the Chronicles                           | O
 15 | The Book of Ezra                                            | O
 16 | The Book of Nehemiah                                        | O
 17 | The Book of Esther                                          | O
 18 | The Book of Job                                             | O
 19 | The Book of Psalms                                          | O
 20 | The Proverbs                                                | O
 21 | Ecclesiastes or, The Preacher                               | O
 22 | The Song of Songs, Which is Solomon's                       | O
 23 | The Book of the Prophet Isaiah                              | O
 24 | The Book of the Prophet Jeremiah                            | O
 25 | The Lamentations of Jeremiah                                | O
 26 | The Book of the Prophet Ezekiel                             | O
 27 | The Book of Daniel                                          | O
 28 | The Book of Hosea                                           | O
 29 | The Book of Joel                                            | O
 30 | The Book of Amos                                            | O
 31 | The Book of Obadiah                                         | O
 32 | The Book of Jonah                                           | O
 33 | The Book of Micah                                           | O
 34 | The Book of Nahum                                           | O
 35 | The Book of Habakkuk                                        | O
 36 | The Book of Zephaniah                                       | O
 37 | The Book of Haggai                                          | O
 38 | The Book of Zechariah                                       | O
 39 | The Book of Malachi                                         | O
 40 | The Gospel According to St. Matthew                         | N
 41 | The Gospel According to Saint Mark                          | N
 42 | The Gospel According to St. Luke                            | N
 43 | The Gospel According to Saint John                          | N
 44 | The Acts of the Apostles                                    | N
 45 | The Epistle of Paul the Apostle to the Romans               | N
 46 | The First Epistle of Paul the Apostle to the Corinthians    | N
 47 | The Second Epistle of Paul the Apostle to the Corinthians   | N
 48 | The Epistle of Paul the Apostle to the Galatians            | N
 49 | The Epistle of Paul the Apostle to the Ephesians            | N
 50 | The Epistle of Paul the Apostle to the Philippians          | N
 51 | The Epistle of Paul the Apostle to the Colossians           | N
 52 | The First Epistle of Paul to the Thessalonians              | N
 53 | The Second Epistle of Paul the Apostle to the Thessalonians | N
 54 | The First Epistle of Paul the Apostle to Timothy            | N
 55 | The Second Epistle of Paul the Apostle to Timothy           | N
 56 | The Epistle of Paul to Titus                                | N
 57 | The Epistle of Paul to Philemon                             | N
 58 | The Epistle to the Hebrews                                  | N
 59 | The General Epistle of James                                | N
 60 | The First Epistle General of Peter                          | N
 61 | The Second Epistle General of Peter                         | N
 62 | The First Epistle General of John                           | N
 63 | The Second Epistle of John                                  | N
 64 | The Third Epistle of John                                   | N
 65 | The General Epistle of Jude                                 | N
 66 | The Revelation to Saint John                                | N
(66 rows)

If you have questions, or if I missed something or just didn’t cover it enough, then leave a comment.

The difficulty of consuming a .NET Web Service using Python

Friday, March 6th, 2009

This post is not part of my Biblefeed series of posts, but it is very much related. For the Biblefeed project, I was hoping to consume this web service in order to get the data I need to make the project work. The web service appears to be a SOAP web service written in .Net.

In my day job, I develop using C# and VB.Net and use .Net web services all the time. Of course, consuming a .Net web service with a .Net client is very easy. I had hoped that, given the relative popularity of the .Net programming languages, Python would have a good SOAP library that could make the task easier.

Based on what I’ve been able to discover so far, I can only state that Python does indeed have libraries for dealing with SOAP. I have not been able to make any of them work with the web service mentioned above, though.

When googling, the first things I found were SOAPpy and ZSI. I was a bit alarmed that the last release date for these was in 2001. I tried to install SOAPpy, which seemed to install OK, but it apparently had a dependency on PyXML, which is no longer maintained. I abandoned trying to use these libraries at that point.

After some digging, I discovered there are two more modern libraries, soaplib and suds. Both of these seemed to be capable libraries. Soaplib seems like it’s a little stronger on the server side and suds looks to be easier to use on the client side.

To use soaplib as a client like I want to do here, I need to create stub classes which resemble the structures used by the web service. I played with this for a little while, but gave up on it because I realized that the web service uses Dataset objects, which I couldn’t figure out how to represent in a Python stub class.

Suds is a little nicer because it reads the WSDL for the web service, which saves you from building stub classes. However, it does not like Datasets either. I was running into the issue described here. As of this writing that issue is still open. One of the comments on that issue suggested removing the <s:element ref="s:schema"/> tags from the WSDL, so I saved the WSDL file locally and tried it. I was able to progress with suds a little further because of that, but when I actually tried to call the web service it errored out.

So I guess no luck today for me with any SOAP libraries. The examples out there seem to show that consuming web services created in Java or Python works just fine, and even .Net web services can work when they use simple types. Unfortunately I have no control of the service that I want to consume and so I must try something else.

Possible solutions? While I’m sure I could use mono to access the web service and have it return something I can use in python, I don’t want to make my solution too complex. I have an idea that I’m going to try next that will involve Django’s template system. If it works, it will be in the next post concerning the Biblefeed project.

BibleFeed Project: Creating the models

Thursday, January 29th, 2009

This is the second post relating to the BibleFeed Project. If you haven’t yet, you may want to read the first post.

As with most applications, this one needs to store data. To store data in a django project, I need to first create models representing the data. While I’m sure that the following will not be everything, this is good enough to start with:

class Book(models.Model):
    TESTAMENTS = (
        ('O','Old Testament'),
        ('N','New Testament'),
    )
    name = models.CharField(max_length=50)
    testament = models.CharField(max_length=1, choices=TESTAMENTS)
   
class Chapter(models.Model):
    book = models.ForeignKey(Book)
    chapter_num = models.IntegerField()
   
class Verse(models.Model):
    chapter = models.ForeignKey(Chapter)
    text = models.TextField()
    verse_num = models.IntegerField()

These models are straightforward. There is a Book class which will store the name of the book and which testament it is part of. The Chapter class has a field storing which book it is part of, and a field for the chapter number. The Verse class points to the chapter that contains it and has fields for the text of the verse and the verse number.

With the models in place, I need somewhere to store the data. Before I do that though, I need to let django know it should include the BibleFeed application. I edit the INSTALLED_APPS setting in the settings.py file:

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'biblefeed.bible',
)

Now I can run the syncdb command which will create the database tables.

$ python manage.py syncdb
Creating table bible_book
Creating table bible_chapter
Creating table bible_verse
Installing index for bible.Chapter model
Installing index for bible.Verse model

I realize at this point I haven’t shown anything too exciting, and there’s not much that is interactive here, unless you really enjoy viewing the tables in the database. The next post will create something a little more interactive.