Re: Archive data fun

Posted: Tue Aug 23, 2011 10:12 pm
by Lunkhead
It's already available as a CSV:

http://sfjukebox.org/songs.csv?fightTit ... ply+Filter

A shared Google spreadsheet would save people the trouble of downloading the data and importing it into a spreadsheet program and then exporting charts and graphs and uploading them somewhere.

Re: Archive data fun

Posted: Tue Aug 23, 2011 10:32 pm
by Lunkhead
Well, here's something. Seems like a 9000+ row spreadsheet maybe is a bit more than Google Docs is meant to handle as it feels a bit sluggish.

https://docs.google.com/spreadsheet/ccc ... n_US#gid=0

Re: Archive data fun

Posted: Tue Aug 23, 2011 11:10 pm
by Lunkhead
Oh, King Arthur, I remembered what the limitation is with the Google Chart API. I can only provide a value along the vertical axis, and a label to associate with that value on the horizontal axis. It then plots those values and labels in the order it gets them. So if I wanted it to space the fights out correctly over time, I'd have to add 0 values for every day in between the days where I'm plotting the fights. I'm probably not explaining it well. Basically it's not like you give it two coordinates and it plants a dot, at least not for the bar/column/line charts. It looks like scatter plots do work more that way.
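
A rough sketch of the zero-padding workaround described above, in Java. The names `DailySeries` and `padWithZeros` are made up for illustration and are not from the actual site code, and it assumes at most one value per date.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the actual site code.
class DailySeries {
    // Returns one value per calendar day from the earliest to the latest date,
    // using 0 for every day that has no data point.
    static List<Double> padWithZeros(Map<LocalDate, Double> valuesByDate) {
        List<Double> padded = new ArrayList<>();
        if (valuesByDate.isEmpty()) {
            return padded;
        }
        // Sort by date so the output is in chronological order.
        TreeMap<LocalDate, Double> sorted = new TreeMap<>(valuesByDate);
        LocalDate last = sorted.lastKey();
        for (LocalDate day = sorted.firstKey(); !day.isAfter(last); day = day.plusDays(1)) {
            padded.add(sorted.getOrDefault(day, 0d));
        }
        return padded;
    }
}
```

The padded list can then be handed to a chart that only plots values in the order it receives them, and the spacing along the horizontal axis comes out proportional to time.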

Re: Archive data fun

Posted: Wed Aug 24, 2011 7:55 am
by Manhattan Glutton
Lunkhead wrote: 9000+ row spreadsheet maybe is a bit more than Google Docs is meant to handle

Re: Archive data fun

Posted: Wed Aug 24, 2011 11:35 am
by Manhattan Glutton
So this is kind of a weird request...
but I think it'd be really cool to have a graph of the rolling average of %vote for an artist.

I can't explain the details since I'd probably do it wrong, but something that shows an artist's improvement over time rather than specific fights as plots. Such that the influence of previous fight results decay over time along the plot. I'm not a statistician, but I think there must be something like that.
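
One textbook technique that behaves roughly like this is an exponentially weighted moving average, where each new result is blended with the running value so older fights count for less and less. This is only a sketch of that standard technique, not something the site is known to use. The names `DecayingAverage` and `ewma` and the `alpha` parameter are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the actual site code.
class DecayingAverage {
    // alpha must be in (0, 1]: higher values make older fights fade faster.
    static List<Double> ewma(List<Double> votePercents, double alpha) {
        List<Double> smoothed = new ArrayList<>();
        double current = 0d;
        for (int i = 0; i < votePercents.size(); i++) {
            double value = votePercents.get(i);
            // The first fight seeds the average; later fights are blended in.
            current = (i == 0) ? value : alpha * value + (1 - alpha) * current;
            smoothed.add(current);
        }
        return smoothed;
    }
}
```

With vote percentages of 10, 20, 40 and `alpha` of 0.5, this gives 10, 15, 27.5.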

Re: Archive data fun

Posted: Wed Aug 24, 2011 11:49 am
by Lunkhead
Unless I am completely misunderstanding you (which is possible) that sounds kind of like an audio upsampling algorithm, which I guess is sort of akin to curve fitting. You make up data points between the actual data points based on weighted averages of the points on either side, or something like that. (I wrote some upsampling code only once so I'm not really an expert on that and I've never written curve fitting code.) Anyway, I'm not sure what you envision being the end result of that but I suspect it would just be a curve that connected the % votes data points. I could pretty easily switch the straight lines on the graphs to curves. I could also try to figure out a way to get the time scale along the horizontal axis to work properly (by that I mean, have each tick correspond to a fixed time unit, like a day, or a week).
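
For what it's worth, the "make up data points between the actual data points" idea can be sketched as plain linear interpolation. This is a simplification: real audio upsampling usually involves filtering as well. `Upsampler` and `upsample` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the actual site code.
class Upsampler {
    // Inserts (factor - 1) made-up points between each pair of real points,
    // each one a weighted average of its two neighbours.
    static List<Double> upsample(List<Double> points, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        List<Double> result = new ArrayList<>();
        for (int i = 0; i + 1 < points.size(); i++) {
            double left = points.get(i);
            double right = points.get(i + 1);
            for (int k = 0; k < factor; k++) {
                double t = (double) k / factor; // 0 at the left point, growing toward the right point
                result.add(left * (1 - t) + right * t);
            }
        }
        // The final real point is not covered by the loop above.
        if (!points.isEmpty()) {
            result.add(points.get(points.size() - 1));
        }
        return result;
    }
}
```

Upsampling 0, 10, 20 by a factor of 2 gives 0, 5, 10, 15, 20, which is just the original points joined by straight lines.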

Of course, if you really want "something that shows an artist's improvement over time" then maybe the Song Fight! data isn't the place to look? :P

Re: Archive data fun

Posted: Wed Aug 24, 2011 12:05 pm
by Manhattan Glutton
It doesn't have to be interpolated, but the points would be connected with bezier curves? A graph of the average % would be fine as well.

This is kind of what's going on in my head. I'm sure it makes no sense.
[Attachment: sfgraph.png]

Re: Archive data fun

Posted: Wed Aug 24, 2011 12:27 pm
by Lunkhead
Ah, I think I am beginning to get what you mean, thanks for the visual aid. That looks pretty straightforward. I guess the variable there is the rate of decay of influence of past values?

Re: Archive data fun

Posted: Wed Aug 24, 2011 12:32 pm
by Manhattan Glutton
Yes, and whether it decays per time or per fights. I'm not sure which is better. Or whether the sum or average is better.

I'm not a statistician, but it seems like something like that might be cool to show... something.

Re: Archive data fun

Posted: Wed Aug 24, 2011 12:34 pm
by Lunkhead
I'm just going to keep doing things by fight for now, rather than by time, so as to avoid having to deal with multiple fights on the same date, and to avoid having to make the horizontal scale work correctly. ;)

Re: Archive data fun

Posted: Wed Aug 24, 2011 12:37 pm
by Manhattan Glutton
So I think... the sum would measure influence in the community (and should decay with every Song Fight title), and the average would measure consistency in a way that is more forgiving than just the average at each point (and should decay with each fight entered by that artist).

I probably drew the pink line incorrectly. It should be jagged.

Re: Archive data fun

Posted: Wed Aug 24, 2011 1:28 pm
by Lunkhead
So would you expect to see the "song fight power" line start high then curve down steeply and have a "long tail" of low values? I think I have it coded up and that's what I'm seeing. I'm not sure if that's what I'm supposed to see though. Should I be incorporating the percentile rank for every previous fight when calculating the "power" for a fight, or just a fixed number of previous fights?

Also I guess I'm not really sure what that value is meant to indicate either. The artist's "influence in the community" over time? What does that mean and how is that supposed to correlate to this graph?

Re: Archive data fun

Posted: Wed Aug 24, 2011 1:36 pm
by Lunkhead

Re: Archive data fun

Posted: Wed Aug 24, 2011 1:39 pm
by fluffy
I have no idea what that's trying to show.

Re: Archive data fun

Posted: Wed Aug 24, 2011 1:48 pm
by Lunkhead
Yeah, me neither. :) But it was fun to make! I don't know why I am enjoying this stuff so much, especially now that I've gotten things to the point where I can copy and paste some code and kludge out new charts pretty easily.

I think so far the percentile rank distribution/histogram seems to maybe be the most informative chart. You can see at a glance how often someone ranks highly vs in the lower percentiles. That seems like it could possibly correlate to the quality of an artist's entries.
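
A minimal sketch of how such a histogram could be bucketed, assuming percentile ranks on a 0 to 100 scale. `RankHistogram`, `histogram`, and the bucket count are assumptions for illustration, not the site's code.

```java
import java.util.List;

// Hypothetical helper, not part of the actual site code.
class RankHistogram {
    // Counts percentile ranks (0 to 100) into equal-width buckets.
    static int[] histogram(List<Double> percentileRanks, int bucketCount) {
        int[] counts = new int[bucketCount];
        double width = 100d / bucketCount;
        for (double rank : percentileRanks) {
            int bucket = (int) (rank / width);
            // Clamp so that a rank of exactly 100 lands in the top bucket.
            bucket = Math.max(0, Math.min(bucketCount - 1, bucket));
            counts[bucket]++;
        }
        return counts;
    }
}
```

With ten buckets, an artist whose counts pile up in the last few slots ranks highly most of the time, which is the at-a-glance reading described above.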

Re: Archive data fun

Posted: Wed Aug 24, 2011 1:49 pm
by Manhattan Glutton
Why are the charts going down? They should be going up!

Re: Archive data fun

Posted: Wed Aug 24, 2011 1:51 pm
by Lunkhead
They are going down because the "sum of song fight mojo" is being divided by a bigger number (the # of fights up to that fight) every time. Is that not correct? Here's my code:
// iterate through songs/fights, oldest to newest
// get artist's percentile rank for current song/fight
Double percentileRank = Math.max(fightIdPercentileRankMap.get(song.getFightId()) * 100, 0d);
// add percentile rank to list of percentile ranks
rankValues.add(percentileRank);
// start the decayed sum with the current fight's percentile rank
rankSumValues.add(percentileRank);
for (int j = (mojoIndex - 1); j >= 0; j--) {
    Double rankI = rankSumValues.get(mojoIndex);
    // add previous song/fight percentile rank, scaled down more the farther back you go
    rankI += rankValues.get(j) / (RATE_OF_MOJO_DECAY * (mojoIndex - j));
    rankSumValues.set(mojoIndex, rankI);
}
// calculate average: divide the decayed sum by the number of fights so far
mojoValues.add(rankSumValues.get(mojoIndex) / rankSumValues.size());
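
For anyone who wants to try this outside the site code, here is a self-contained sketch of the same calculation. `MojoSketch`, `decayedSums`, and `decayedAverages` are placeholder names, and the `fightIdPercentileRankMap` lookup is replaced with a plain list of percentile ranks, so treat it as an approximation of the snippet above rather than the real thing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone version of the loop in the post.
class MojoSketch {
    // For each fight i, sums rank[i] plus every earlier rank[j] scaled by
    // 1 / (rateOfDecay * (i - j)), mirroring the inner loop in the post.
    static List<Double> decayedSums(List<Double> ranks, double rateOfDecay) {
        List<Double> sums = new ArrayList<>();
        for (int i = 0; i < ranks.size(); i++) {
            double sum = ranks.get(i);
            for (int j = i - 1; j >= 0; j--) {
                sum += ranks.get(j) / (rateOfDecay * (i - j));
            }
            sums.add(sum);
        }
        return sums;
    }

    // Divides each decayed sum by the number of fights so far, as the
    // "calculate average" step does.
    static List<Double> decayedAverages(List<Double> ranks, double rateOfDecay) {
        List<Double> sums = decayedSums(ranks, rateOfDecay);
        List<Double> averages = new ArrayList<>();
        for (int i = 0; i < sums.size(); i++) {
            averages.add(sums.get(i) / (i + 1));
        }
        return averages;
    }
}
```

With a constant percentile rank of 50 and a decay rate of 2, the sums come out as 50, 75, 87.5 while the averages come out as 50, 37.5, and about 29.2. So even for a perfectly steady artist, the averaged line falls and the summed line rises.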

Re: Archive data fun

Posted: Wed Aug 24, 2011 1:53 pm
by Manhattan Glutton
Maybe make it sum instead of average?

Re: Archive data fun

Posted: Wed Aug 24, 2011 1:59 pm
by Lunkhead
Then the lines go up. But again, I'm still not clear on exactly what it's supposed to mean?

EDIT: Site updated to use sum rather than average, and I changed the rate of decay of the influence of previous fights a bit.

Re: Archive data fun

Posted: Wed Aug 24, 2011 2:08 pm
by Manhattan Glutton
I like that better! Thanks for indulging me. I feel like it's probably a better indicator of whether the next song from an artist will be "good" than a simple average. Maybe. Like I said, I'm not a statistician. :)

Re: Archive data fun

Posted: Wed Aug 24, 2011 11:00 pm
by Billy's Little Trip
Who here likes kitties? [image]
....me too. :P

random pirate. [image]

Re: Archive data fun

Posted: Fri Aug 26, 2011 11:58 pm
by Lunkhead
Thanks for interjecting, BLT. No, that does not have anything to do with masturbation. Or ... does it... ?

Anyway, I decided to nerd out in a slightly different way and dig into Yahoo Pipes a bit. Here's a pipe that, given an artist's key in the archive and full name, spits out an RSS feed for that artist's songs in the archive:

http://pipes.yahoo.com/pipes/pipe.run?_ ... cial+Scene

Pipes are cool! No, that is not a drug reference, BLT. Or ... is it...?!!