This is an update to my previous post, “Inspiration strikes a chord!”, detailing my pet project to decipher the songs in some decorative piano rolls I found in a restaurant. For the genesis post, click this link!
A lot has happened since my last writing! I now have an album overflowing with photos of a bar’s entire piano roll light shade collection, as well as an actual piece of the piano roll that had been hiding in a storage closet (lucky, lucky me!).
With data aplenty, I’ve given some thought to the steps I’ll need to take to go from pixels to decibels:
- Cropping and de-warping
- Grid-fitting and sampling
- MIDI composition
Cropping and de-warping
My ultimate goal is to be able to snap a picture with a phone and output a song. Cell phone images are almost guaranteed to be distorted and taken from an arbitrary viewpoint, so I’ll need to rectify the image using a perspective transformation. For my initial exploration, I’ve manually selected the four corners for the transformation. In the future, I’d like this process to be automatic (inferred from lines detected in the image, etc.).
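In practice I’d probably reach for OpenCV’s `cv2.getPerspectiveTransform` and `cv2.warpPerspective`, but as a dependency-free sketch, the homography can be solved directly from four hand-picked corner correspondences via the direct linear transform (the corner coordinates below are made up for illustration):

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 perspective transform mapping src -> dst
    (four point correspondences, direct linear transform)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A (last row of V^T from the SVD).
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, p):
    """Apply the homography to one (x, y) point."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return (x / w, y / w)

# Hypothetical corners clicked in the photo, mapped to an upright rectangle.
src = [(102, 58), (880, 91), (860, 1240), (95, 1205)]
dst = [(0, 0), (800, 0), (800, 1200), (0, 1200)]
H = homography_from_corners(src, dst)
```

With `H` in hand, every pixel of the photo can be resampled into the upright rectangle, which is exactly what `cv2.warpPerspective` does in one call.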


Now that I have a rectified image, I want to determine which pixels are notes (the punch-outs) and which pixels are paper. A threshold function should do the trick.
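OpenCV’s `cv2.threshold` (ideally with Otsu’s method to pick the cutoff automatically) is the usual tool here; a minimal numpy sketch of the same idea, with a guessed threshold value and the assumption that backlit punch-outs read brighter than the surrounding paper, looks like this:

```python
import numpy as np

def punch_mask(gray, thresh=128):
    """Binary mask: 255 where a pixel is bright enough to be a punch-out,
    0 where it is paper. Threshold value and polarity are assumptions --
    a backlit hole should read brighter than the paper around it."""
    return np.where(gray > thresh, 255, 0).astype(np.uint8)

# Tiny fake strip: two bright "holes" on dark paper.
strip = np.full((4, 8), 30, dtype=np.uint8)
strip[1, 2] = strip[2, 6] = 220
mask = punch_mask(strip)
```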

I now have an image mask where a pixel value of 255 corresponds to the notes in the song. But before I do any sampling, I need to determine which row corresponds to which note! I’ll need to analyze the vertical spacing of connected components to understand the true scale of the piano roll, and how a grid/staff might fit the underlying notes. The Hough transform (or a similar line-detection technique) may help here.
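Before reaching for the Hough transform (e.g. `cv2.HoughLinesP` on the mask), a cheaper first pass might be to project the mask onto the pitch axis and measure the gaps between active bands; roughly even spacing would confirm the grid pitch. A sketch of that projection idea, on a toy mask with rows I placed at a known spacing:

```python
import numpy as np

def row_spacing(mask):
    """Estimate spacing between note rows: project the mask onto the
    pitch axis, split the active rows into contiguous bands (one per
    note row), and return the gaps between band centers."""
    profile = (mask > 0).sum(axis=1)          # punch-out hits per row
    active = np.flatnonzero(profile > 0)
    bands = np.split(active, np.flatnonzero(np.diff(active) > 1) + 1)
    centers = np.array([b.mean() for b in bands])
    return np.diff(centers)

# Fake mask with note rows at y = 2, 6, 10 (so the true spacing is 4).
mask = np.zeros((13, 20), dtype=np.uint8)
for y in (2, 6, 10):
    mask[y, 3:15] = 255
```

On a real photo the bands would be several pixels tall and the spacing noisy, so I’d take the median of the gaps rather than trusting any single one.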

…Tune in next time for more pixelated musical goodness!