[ml] PyMir mp3 playback and STFT processing

gershon bialer gershon.bialer at gmail.com
Wed Mar 21 23:26:24 PDT 2012


I forked Jeremy's PyMir code on my github at http://github.com/gersh/pymir.

I followed John's suggestions and got the ffmpeg output to play with
audiolab. I think there may be some constraints on this API that make
it less than ideal, in case someone wants to get it working with a
better API.
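
For anyone trying the same route, here is a minimal sketch of the decode
step, assuming ffmpeg is on the PATH and NumPy is installed. It reuses
the flags from John's command quoted below. The function names
(ffmpeg_pcm_command, pcm16_to_float, decode_to_float) are illustrative
placeholders and are not part of PyMir or audiolab.

```python
import subprocess

import numpy as np


def ffmpeg_pcm_command(filename, sample_rate=44100, channels=1):
    """Build the ffmpeg argv that writes raw 16-bit little-endian PCM to stdout."""
    return ["ffmpeg", "-i", filename,
            "-f", "s16le", "-acodec", "pcm_s16le",
            "-ac", str(channels), "-ar", str(sample_rate),
            "-y", "pipe:1"]


def pcm16_to_float(raw, channels=1):
    """Convert raw s16le bytes to floats in roughly [-1, 1], one column per channel."""
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32767.0
    return samples.reshape(-1, channels)


def decode_to_float(filename, sample_rate=44100, channels=1):
    """Run ffmpeg and return the decoded audio as a (frames, channels) float array."""
    proc = subprocess.run(ffmpeg_pcm_command(filename, sample_rate, channels),
                          stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                          check=True)
    return pcm16_to_float(proc.stdout, channels)
```

The (frames, channels) layout is a choice made for this sketch; whichever
playback API is used may expect a different array shape or dtype.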

I added Steven's STFT code from Stack Overflow to
pymir/audio/transforms.py. I think it would be good to do an STFT, then
an ISTFT, then play the result as a test case, even if we lose some
quality. If you look at the readmp3.py file, I wrote some commented-out
test code for this, but I don't think it currently works. I wasn't sure
what parameters to use for encoding, and I haven't had time to really
mess with it.
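
A round-trip test of this kind could look roughly like the sketch below.
It is a generic Hann-window, weighted overlap-add implementation written
with NumPy only; it is not the code in pymir/audio/transforms.py, and
the frame_len and hop defaults are arbitrary assumptions. The point it
illustrates is the one Mike made earlier in the thread: with overlapping
windows, the complex STFT can be inverted.

```python
import numpy as np


def stft(x, frame_len=1024, hop=256):
    """Hann-windowed STFT; returns an array of shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([np.fft.rfft(window * x[i:i + frame_len]) for i in starts])


def istft(X, frame_len=1024, hop=256):
    """Weighted overlap-add inverse of stft() above."""
    window = np.hanning(frame_len)
    n = frame_len + hop * (len(X) - 1)
    out = np.zeros(n)
    norm = np.zeros(n)
    for k, spectrum in enumerate(X):
        i = k * hop
        # Window again on synthesis, then accumulate the overlapping frames.
        out[i:i + frame_len] += window * np.fft.irfft(spectrum, frame_len)
        norm[i:i + frame_len] += window ** 2
    # Divide by the summed squared window; samples with ~zero weight stay at 0.
    covered = norm > 1e-8
    out[covered] /= norm[covered]
    return out
```

Calling istft(stft(x)) on a float signal x should give back x away from
the edges. The first and last few samples are not recovered, because the
Hann window is zero at its endpoints, and any samples beyond the last
full frame are dropped.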

It would be good to improve the organization of the test cases in the
codebase. I haven't used Python all that much, so I don't really know
what the best practice for this is in Python.

There is some cool stuff we could do with machine learning given a
decent input interface. If the audio is properly pre-processed, I'd
like to try using the deep learning stuff that Mike suggested.
How much pre-processing is required for deep learning? Is it possible
to just work from the raw audio? Can the pre-processing be
incorporated into the neural network to allow fuller back-propagation?
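
One observation relevant to the last question: windowing and the DFT are
linear operations, so they can be written as a fixed weight matrix, and
in principle gradients can flow through them like any other layer. The
sketch below uses NumPy only (no deep learning framework) and checks
that a pair of matrix products reproduces a Hann-windowed FFT magnitude.
The function names are illustrative, and whether learning from raw
audio this way works well in practice is a separate question.

```python
import numpy as np


def dft_layer_weights(frame_len):
    """Real and imaginary weight matrices equivalent to a Hann-windowed rfft."""
    n = np.arange(frame_len)
    k = np.arange(frame_len // 2 + 1)
    angle = 2.0 * np.pi * np.outer(n, k) / frame_len
    window = np.hanning(frame_len)[:, None]
    return window * np.cos(angle), -window * np.sin(angle)


def magnitude_layer(frames, w_real, w_imag):
    """Magnitude spectrum of each row of `frames` using only matrix products."""
    return np.sqrt((frames @ w_real) ** 2 + (frames @ w_imag) ** 2)
```

Here `frames` is assumed to be a 2-D array with one audio frame of
length frame_len per row; the output matches
np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) up to
floating-point error.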

On Tue, Mar 20, 2012 at 12:27 AM, John Hurliman <jhurliman at cull.tv> wrote:
> I wrote a simple ffmpeg wrapper for extracting audio recently
> (https://github.com/jhurliman/node-pcm). It's a node.js library but here is
> the relevant ffmpeg command:
>
> var ffmpeg = spawn('ffmpeg', ['-i',filename,'-f','s16le','-ac',channels,
> '-acodec','pcm_s16le','-ar',sampleRate,'-y','pipe:1']);
>
> Then read in stdout as a stream of 16-bit signed little endian integers and
> divide each by 32767.0 to convert to floating point. Hope that helps, and
> with any luck I'll make it this Thursday.
>
> Best,
> John Hurliman
>
>
> On Mon, Mar 19, 2012 at 10:57 PM, gershon bialer <gershon.bialer at gmail.com> wrote:
>>
>> Hi Steve,
>>
>> That's cool that you wrote that Stack Overflow answer.
>>
>> PyMIR looks like a good start. I see that Jeremy has a nice hack for
>> importing from ffmpeg. I suppose we could try using ffmpeg's API
>> directly, although that can be a tricky API to work with. I'd like to
>> be able to play this at least as a sanity check. I suppose you might
>> be able to play it with audiolab, but I think that requires converting
>> from int16 to float. I suppose float might be better for fft and such,
>> anyway. I tried feeding it back to ffplay with:
>>   ffmpeg = Popen([
>>            "ffplay",
>>            "-i -"],
>>            stdin=PIPE, stderr=open(os.devnull,"w"))
>>   ffmpeg.communicate(mp3Array.tostring())
>> but that doesn't seem to work. What do you think is the best way to do
>> this?
>>
>> MFCC would be cool to work with. Is it invertible? How does it sound
>> inverted?
>>
>> Does NMF give a sparse representation? What is a good reference on NMF?
>>
>> Thanks,
>> Gershon Bialer
>>
>> On Mon, Mar 19, 2012 at 4:33 PM, Steve Tjoa <stjoa at izotope.com> wrote:
>> > Hello Gershon, others,
>> >
>> > Lurker here. That happens to be my code and Stack Overflow answer that you
>> > linked to!
>> >
>> > Regarding concerns in this email thread:
>> >
>> > 1. Despite that "Python in Music" page, the lack of basic, simple
>> > audio/music processing libraries in Python has motivated my friend
>> > Jeremy to begin a GitHub repo for that very purpose, named PyMIR
>> > (http://jsawruk.com/?p=141). Feel free to use or contribute.
>> >
>> > 2. In there, you will find an MP3 importer that Jeremy wrote.
>> >
>> > 3. I have custom-brewed stuff for audio feature extraction operations,
>> > including MFCCs. I also have sparse coding and NMF stuff.  If there are
>> > specific requests that I can fulfill, I will add them to the repo.
>> >
>> > Please feel free to ask if you have any questions.
>> >
>> > Steve
>> > http://stevetjoa.com
>> >
>> >
>> > On Sun, Mar 18, 2012 at 10:59 PM, gershon bialer <gershon.bialer at gmail.com> wrote:
>> >>
>> >> Yeah, Thursday would be cool.
>> >>
>> >> Friture looks interesting; I'll have to see. I found some code at
>> >> http://stackoverflow.com/questions/2459295/stft-and-istft-in-python
>> >> for computing the spectrogram. I couldn't find a good library for
>> >> importing MP3s into Python, although I suppose we can work with WAV
>> >> files for now.
>> >>
>> >> On Sun, Mar 18, 2012 at 10:50 PM, Mike Schachter
>> >> <mschachter at eigenminds.com> wrote:
>> >> > Hey Gershon,
>> >> >
>> >> > Do you want to meet up this Thursday and talk about
>> >> > time-frequency representations for sound? I'm looking
>> >> > at various packages in python. One that struck my eye
>> >> > was a real-time spectrogram package:
>> >> >
>> >> > http://tlecomte.github.com/friture/
>> >> >
>> >> > Anyone else interested in this kind of stuff too? I could
>> >> > put something on the calendar and make an official-like
>> >> > announcement.
>> >> >
>> >> >  mike
>> >> >
>> >> > On Thu, Mar 15, 2012 at 12:00 PM, Mike Schachter
>> >> > <mschachter at eigenminds.com> wrote:
>> >> >> That's awesome Gershon!
>> >> >>
>> >> >> I can't come out tonight, but how about we meet
>> >> >> up next Thursday and have a discussion about using
>> >> >> deep nets for sound feature extraction? Spectrograms
>> >> >> are also an invertible feature representation, as long
>> >> >> as you use overlapping windows for the FFT.
>> >> >>
>> >> >>  mike
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Thu, Mar 15, 2012 at 11:42 AM, gershon bialer
>> >> >> <gershon.bialer at gmail.com> wrote:
>> >> >>> Hi,
>> >> >>>
>> >> >>> Do you want to meet again tonight?
>> >> >>>
>> >> >>> I played around a bit with building a generative model for creating
>> >> >>> music, like we were talking about. I also read the papers and looked
>> >> >>> at the tutorial on deep learning.
>> >> >>>
>> >> >>> I think the first step is to find an invertible, sparse feature
>> >> >>> representation. I think this would be MFCC or some sort of linear
>> >> >>> predictive coding. I suppose you could then apply some of the deep
>> >> >>> learning stuff to it for a generative model. Any thoughts?
>> >> >>> --
>> >> >>> ---------------------
>> >> >>> Gershon Bialer
>> >> >>> _______________________________________________
>> >> >>> ml mailing list
>> >> >>> ml at lists.noisebridge.net
>> >> >>> https://www.noisebridge.net/mailman/listinfo/ml
>> >>
>> >>
>> >>
>> >> --
>> >> ---------------------
>> >> Gershon Bialer
>> >
>> >
>>
>>
>>
>> --
>> ---------------------
>> Gershon Bialer
>
>
>
>



-- 
---------------------
Gershon Bialer

