Using Apache Tika from Python, with jnius

I needed a library to extract metadata and a plain-text transcript from various file formats, for indexing purposes.

After looking around for a while, I found that Apache Tika might be the right tool for the job (or, at least, it does quite a good job at extracting information from files).

Sadly though, that thing is written in Java.

At first, I tried running the jar via subprocess and then parsing the JSON output. I quickly discarded that approach, as:

  • It required launching the process twice (once to extract the metadata, and once to extract the plain-text version ...
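
For reference, the discarded subprocess approach could be sketched roughly as below. This is only an illustration, not the code from the post: the jar location (TIKA_JAR) and the helper names are assumptions, and it relies on the tika-app command line accepting --json for metadata and --text for the plain-text output. The two separate java invocations are exactly the cost mentioned above.

```python
import json
import subprocess

# Assumption: a tika-app jar sits next to the script; adjust the path as needed.
TIKA_JAR = "tika-app.jar"


def tika_command(path, mode):
    """Build the command line for one Tika run ('metadata' or 'text')."""
    flag = {"metadata": "--json", "text": "--text"}[mode]
    return ["java", "-jar", TIKA_JAR, flag, path]


def extract(path):
    """Run Tika twice: once for the metadata, once for the plain text."""
    raw_metadata = subprocess.check_output(tika_command(path, "metadata"))
    raw_text = subprocess.check_output(tika_command(path, "text"))
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return json.loads(raw_metadata), raw_text.decode("utf-8", errors="replace")
```

Each call to extract() pays the JVM start-up time twice, which is why this approach gets slow when indexing many files.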

Python testing with py.test and 2to3 (plus Tox and Travis CI)

I recently decided to try py.test as my test-running facility, in place of the old, boring unittest module.

It seems a nice piece of software so far; the only problem I encountered was making it work automatically with Python 3, on code built via the 2to3 script.

A little intro to 2to3

For those not familiar with it, 2to3 is a script that converts code written for Python 2.x into code for Python 3.x, for example by replacing u"unicode" literals with plain "string" literals, print statements with print() calls, .iteritems() with .items(), etc.
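
As a small illustration of that kind of conversion, here is a hypothetical function (format_counts is a made-up name, not something from the post) with its Python 2 form kept in comments, followed by roughly what 2to3 would turn it into:

```python
# Python 2 input, kept as comments because it relies on Python 2 idioms:
#
#     def format_counts(counts):
#         return [u"%s: %d" % (name, total)
#                 for name, total in counts.iteritems()]
#
# Roughly the Python 3 output: the u"" prefix is dropped and
# .iteritems() becomes .items().
def format_counts(counts):
    return ["%s: %d" % (name, total)
            for name, total in counts.items()]


print(format_counts({"spam": 2}))
```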

The excellent distribute packaging tool ...

99 Bottles of beer

Do you know the popular programming challenge of generating the lyrics of the 99 Bottles of beer song using the shortest code possible? There's even a website that collects these scripts in many different languages:

I took the challenge..

..and here is my solution, in just 237 bytes of Python code! :)

w=lambda c,d:'%s bottle%s of beer on the wall%s\n'%(c or'No','s'[:c!=1],d)
for i in range(99,-1,-1):print"\n".join([(w(i,',')*2)[:-14]+'.','Take one down, pass it around,'if(i ...
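
The visible part of the golfed code leans on a few tricks that are easier to see when spelled out. The sketch below is a readable restatement of just those parts (the function names are mine, and it does not attempt to reconstruct the truncated rest of the script):

```python
def wall_line(count, punctuation):
    # "s"[:True] is "s" and "s"[:False] is "", so the plural suffix
    # disappears only when count == 1.
    plural = "s"[: count != 1]
    # `count or "No"` swaps a count of 0 for the word "No".
    return "%s bottle%s of beer on the wall%s\n" % (count or "No", plural, punctuation)


def opening_lines(count):
    # Repeat the line twice, then cut the final " on the wall,\n"
    # (14 characters) and close the sentence with a full stop.
    return (wall_line(count, ",") * 2)[:-14] + "."


print(opening_lines(99))
```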

Link sharer on Drupal

I just wanted something like the "share on Facebook" button, but for posting links on this blog. I thought I would have to write a module to do that, but no: Drupal itself allows us to do that directly :)

First, the CCK "Bookmark" Content

First of all, I created a CCK content named "bookmark". Here is the exported CCK code.

Download CCK bookmark here

Second: the "bookmark" button

Yep, I copied this from the Facebook "SHARE IT" button.. :)

First, we need to enable the Prepopulate Drupal module, which allows us to prefill forms through URL parameters. Then, create some ...
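
To give an idea of what such a prefilled URL looks like, here is a minimal sketch that builds one. Everything specific in it is an assumption made for illustration: the node/add/bookmark path, the CCK field name field_url, and the edit[...] query keys (the form Prepopulate is generally documented to accept) should all be checked against the actual site setup.

```python
from urllib.parse import urlencode


def bookmark_url(site, title, link):
    """Build a node-creation URL with the title and link prefilled."""
    query = urlencode({
        "edit[title]": title,
        # Hypothetical CCK field name; use the one from your content type.
        "edit[field_url][0][value]": link,
    })
    return "%s/node/add/bookmark?%s" % (site.rstrip("/"), query)


print(bookmark_url("http://example.com/", "Some page", "http://example.org/page"))
```

The square brackets come out percent-encoded (%5B and %5D), which is the same key once the server decodes the query string.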

