I did the rotoscoping on video footage in a custom Qt/GL program of mine, a primitive thing with matched-tangent Bezier curve shapes that you can keyframe. From there, the cuts between shots / layout of roto shapes / text are all done through Python, which generates SVG. I wrote some rudimentary tracking to get some of the head and guitar movements, sans skew and scale, which is what leads to the bobbleheadedness.
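To give a rough idea of the keyframe-to-SVG step: a minimal sketch (all names hypothetical, not the actual program) that linearly interpolates keyframed control points of a closed cubic-Bezier roto shape and emits an SVG path `d` string per frame.

```python
def lerp_points(a, b, t):
    """Linearly interpolate two equal-length lists of (x, y) points."""
    return [(ax + (bx - ax) * t, ay + (by - ay) * t)
            for (ax, ay), (bx, by) in zip(a, b)]

def shape_at(keyframes, frame):
    """keyframes: sorted list of (frame, points), where points is the flat
    control-point list of a closed cubic-Bezier shape. Returns the shape
    interpolated at the requested frame (held at the ends)."""
    frames = [f for f, _ in keyframes]
    if frame <= frames[0]:
        return keyframes[0][1]
    if frame >= frames[-1]:
        return keyframes[-1][1]
    for (f0, p0), (f1, p1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return lerp_points(p0, p1, t)

def svg_path(points):
    """Points laid out as anchor, out-handle, in-handle, anchor, ... for a
    closed shape (length a multiple of 3): emit an SVG path 'd' string."""
    n = len(points)
    d = [f"M {points[0][0]:g},{points[0][1]:g}"]
    for i in range(0, n, 3):
        c1 = points[(i + 1) % n]   # outgoing handle of this anchor
        c2 = points[(i + 2) % n]   # incoming handle of the next anchor
        a = points[(i + 3) % n]    # next anchor (wraps to close the shape)
        d.append(f"C {c1[0]:g},{c1[1]:g} {c2[0]:g},{c2[1]:g} {a[0]:g},{a[1]:g}")
    d.append("Z")
    return " ".join(d)

# Two keyframes of a two-anchor closed shape, ten frames apart:
kf = [(0,  [(0, 0), (10, -10), (20, -10), (30, 0), (20, 10), (10, 10)]),
      (10, [(10, 0), (20, -10), (30, -10), (40, 0), (30, 10), (20, 10)])]
print(svg_path(shape_at(kf, 5)))
```

Matched tangents (each anchor's in- and out-handles kept collinear) would be enforced at edit time in the keyframing tool; the interpolation and SVG emission side stays this simple either way.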
Probably better would be to use Blender: it has great tracking and roto and a full compositing suite, as well as everything else… I know now that I should have spent a few more hours over the holidays in a proper digital content creation app rather than writing code. I think it would have been hours and hours of brute-force roto either way. Still, at this point it's less frustrating for me than getting a good audio mix tho!