Archive data fun

Links and other hanky panky that doesn't have to do with anything in particular.
User avatar
Lunkhead
Rosselli
Posts: 8567
Joined: Sat Sep 25, 2004 12:14 pm
Instruments: many
Recording Method: cubase/mac/tascam4x4
Submitting as: Berkeley Social Scene
Pronouns: he/him
Location: Central Oregon
Contact:

Re: Archive data fun

Post by Lunkhead »

It's already available as a CSV:

http://sfjukebox.org/songs.csv?fightTit ... ply+Filter

A shared Google spreadsheet would save people the trouble of downloading the data and importing it into a spreadsheet program and then exporting charts and graphs and uploading them somewhere.

Re: Archive data fun

Post by Lunkhead »

Well, here's something. Seems like a 9000+ row spreadsheet maybe is a bit more than Google Docs is meant to handle as it feels a bit sluggish.

https://docs.google.com/spreadsheet/ccc ... n_US#gid=0

Re: Archive data fun

Post by Lunkhead »

Oh, King Arthur, I remembered what the limitation is with the Google Chart API. I can only provide a value along the vertical axis, and a label to associate with that value on the horizontal axis. It then plots those values and labels in the order it gets them. So if I wanted it to space the fights out correctly over time, I'd have to add 0 values for every day in between the days where I'm plotting the fights. I'm probably not explaining it well. Basically it's not like you give it two coordinates and it plants a dot, at least not for the bar/column/line charts. It looks like scatter plots do work more that way.
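A rough sketch of the zero-padding workaround described above, assuming the fight results are already available as a date-to-value map. The names `ChartPadding` and `padWithZeros` and the sample dates are made up for illustration; nothing here is part of the Google Chart API itself.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: none of these names come from the Google Chart API.
final class ChartPadding {

    // Returns one value per calendar day from the earliest to the latest date
    // in the input, using 0 for every day that has no fight.
    static List<Double> padWithZeros(Map<LocalDate, Double> votesByDate) {
        List<Double> padded = new ArrayList<>();
        if (votesByDate.isEmpty()) {
            return padded;
        }
        TreeMap<LocalDate, Double> sorted = new TreeMap<>(votesByDate);
        LocalDate last = sorted.lastKey();
        for (LocalDate day = sorted.firstKey(); !day.isAfter(last); day = day.plusDays(1)) {
            padded.add(sorted.getOrDefault(day, 0d));
        }
        return padded;
    }

    public static void main(String[] args) {
        // Made-up sample data: two fights three days apart.
        Map<LocalDate, Double> votes = new TreeMap<>();
        votes.put(LocalDate.of(2011, 1, 1), 40d);
        votes.put(LocalDate.of(2011, 1, 4), 55d);
        System.out.println(padWithZeros(votes)); // [40.0, 0.0, 0.0, 55.0]
    }
}
```

The padded list can then be handed to the chart in order, so each slot along the horizontal axis stands for one day.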
User avatar
Manhattan Glutton
Niemöller
Posts: 1530
Joined: Tue Feb 15, 2005 12:10 pm
Instruments: Angst
Recording Method: REAPER
Location: Madison, WI
Contact:

Re: Archive data fun

Post by Manhattan Glutton »

Lunkhead wrote: 9000+ row spreadsheet maybe is a bit more than Google Docs is meant to handle
If I had a dollar for every one of my songs j$ has called a 90s pastiche, I'd have $1 for every song I've written.

Nur Ein Archives | The New Ugly Podcast

Re: Archive data fun

Post by Manhattan Glutton »

So this is kind of a weird request...
but I think it'd be really cool to have a graph of the rolling average of %vote for an artist.

I can't explain the details since I'd probably do it wrong, but something that shows an artist's improvement over time rather than specific fights as plots. Such that the influence of previous fight results decay over time along the plot. I'm not a statistician, but I think there must be something like that.
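One standard technique that seems to fit this description is an exponentially weighted moving average, where each new fight is blended with the running value so older results fade out. Whether that is exactly what is being asked for is an assumption; the class name, method name, `alpha` parameter, and sample numbers below are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an exponentially weighted moving average of %vote.
final class RollingVote {

    // alpha is in (0, 1]: a higher alpha makes older fights fade faster.
    static List<Double> exponentialMovingAverage(List<Double> votePercents, double alpha) {
        List<Double> smoothed = new ArrayList<>();
        double current = 0d;
        for (int i = 0; i < votePercents.size(); i++) {
            double vote = votePercents.get(i);
            // The first fight seeds the average; later fights are blended in.
            current = (i == 0) ? vote : alpha * vote + (1 - alpha) * current;
            smoothed.add(current);
        }
        return smoothed;
    }

    public static void main(String[] args) {
        // Made-up vote percentages, oldest fight first.
        List<Double> votes = Arrays.asList(10d, 20d, 40d);
        System.out.println(exponentialMovingAverage(votes, 0.5)); // [10.0, 15.0, 27.5]
    }
}
```

Here the decay is per fight entered, not per calendar day, since each list element is one fight.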

Re: Archive data fun

Post by Lunkhead »

Unless I am completely misunderstanding you (which is possible) that sounds kind of like an audio upsampling algorithm, which I guess is sort of akin to curve fitting. You make up data points between the actual data points based on weighted averages of the points on either side, or something like that. (I wrote some upsampling code only once so I'm not really an expert on that and I've never written curve fitting code.) Anyway, I'm not sure what you envision being the end result of that but I suspect it would just be a curve that connected the % votes data points. I could pretty easily switch the straight lines on the graphs to curves. I could also try to figure out a way to get the time scale along the horizontal axis to work properly (by that I mean, have each tick correspond to a fixed time unit, like a day, or a week).

Of course, if you really want "something that shows an artist's improvement over time" then maybe the Song Fight! data isn't the place to look? :P

Re: Archive data fun

Post by Manhattan Glutton »

It doesn't have to be interpolated, but the points would be connected with bezier curves? A graph of the average % would be fine as well.

This is kind of what's going on in my head. I'm sure it makes no sense.
[Attachment: sfgraph.png]

Re: Archive data fun

Post by Lunkhead »

Ah, I think I am beginning to get what you mean, thanks for the visual aid. That looks pretty straightforward. I guess the variable there is the rate of decay of influence of past values?

Re: Archive data fun

Post by Manhattan Glutton »

Yes, and whether it decays per time or per fights. I'm not sure which is better. Or whether the sum or average is better.

I'm not a statistician, but it seems like something like that might be cool to show... something.
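To make the two choices concrete, here is a small sketch in which the only difference between "per fight" and "per time" decay is the unit that age is measured in, and the only difference between sum and average is whether the result is divided by the total weight. The half-life weighting is one assumed option among many, and every name and number below is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: half-life decay, with age measured in fights or in days.
final class DecayChoice {

    // The weight halves every `halfLife` units of age.
    static double weight(double age, double halfLife) {
        return Math.pow(0.5, age / halfLife);
    }

    // Decayed sum of scores, where ages.get(i) is how old scores.get(i) is.
    static double decayedSum(List<Double> scores, List<Double> ages, double halfLife) {
        double sum = 0d;
        for (int i = 0; i < scores.size(); i++) {
            sum += scores.get(i) * weight(ages.get(i), halfLife);
        }
        return sum;
    }

    // Decayed average: the decayed sum divided by the total weight (not the count).
    static double decayedAverage(List<Double> scores, List<Double> ages, double halfLife) {
        double totalWeight = 0d;
        for (double age : ages) {
            totalWeight += weight(age, halfLife);
        }
        return totalWeight == 0d ? 0d : decayedSum(scores, ages, halfLife) / totalWeight;
    }

    public static void main(String[] args) {
        // Made-up scores: the newest entry first, then one older entry.
        List<Double> scores = Arrays.asList(80d, 40d);
        // Age counted in fights (0 and 1 fights ago), half-life of 1 fight.
        System.out.println(decayedSum(scores, Arrays.asList(0d, 1d), 1d));
        // Age counted in days (0 and 14 days ago), half-life of 7 days.
        System.out.println(decayedAverage(scores, Arrays.asList(0d, 14d), 7d));
    }
}
```

With this setup the sum rewards entering often, while the average stays on the same scale as a single score no matter how many fights there are.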

Re: Archive data fun

Post by Lunkhead »

I'm just going to keep doing things by fight for now, rather than by time, so as to avoid having to deal with multiple fights on the same date, and to avoid making the horizontal scale work correctly. ;)

Re: Archive data fun

Post by Manhattan Glutton »

So I think... the sum would measure influence in the community (and should decay with every Song Fight title), and the average would measure consistency in a way that is more forgiving than just the average at each point (and should decay with each fight entered by that artist).

I probably drew the pink line incorrectly. It should be jagged.

Re: Archive data fun

Post by Lunkhead »

So would you expect to see the "song fight power" line start high then curve down steeply and have a "long tail" of low values? I think I have it coded up and that's what I'm seeing. I'm not sure if that's what I'm supposed to see though. Should I be incorporating the percentile rank for every previous fight when calculating the "power" for a fight, or just a fixed number of previous fights?

Also I guess I'm not really sure what that value is meant to indicate either. The artist's "influence in the community" over time? What does that mean and how is that supposed to correlate to this graph?

Re: Archive data fun

Post by Lunkhead »

User avatar
fluffy
Eisenhower
Posts: 11267
Joined: Sat Sep 25, 2004 10:56 am
Instruments: sometimes
Recording Method: Logic Pro X
Submitting as: Sockpuppet
Pronouns: she/they
Location: Seattle-ish
Contact:

Re: Archive data fun

Post by fluffy »

I have no idea what that's trying to show.

Re: Archive data fun

Post by Lunkhead »

Yeah, me neither. :) But it was fun to make! I don't know why I am enjoying this stuff so much, especially now that I've gotten things to the point where I can copy and paste some code and kludge out new charts pretty easily.

I think so far the percentile rank distribution/histogram seems to maybe be the most informative chart. You can see at a glance how often someone ranks highly vs in the lower percentiles. That seems like it could possibly correlate to the quality of an artist's entries.
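For anyone curious, the bucketing behind a chart like that can be sketched in a few lines. This is not the site's actual code; `RankHistogram`, the choice of ten buckets, and the sample ranks are assumptions for illustration, with percentile ranks taken to be on a 0 to 100 scale.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: count percentile ranks (0 to 100) into equal-width buckets.
final class RankHistogram {

    static int[] histogram(List<Double> percentileRanks, int bins) {
        int[] counts = new int[bins];
        double width = 100d / bins;
        for (double rank : percentileRanks) {
            int index = (int) (rank / width);
            // Clamp so a rank of exactly 100 lands in the top bucket.
            index = Math.min(Math.max(index, 0), bins - 1);
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up percentile ranks for one artist.
        List<Double> ranks = Arrays.asList(5d, 95d, 100d, 50d, 55d);
        System.out.println(Arrays.toString(histogram(ranks, 10)));
        // [1, 0, 0, 0, 0, 2, 0, 0, 0, 2]
    }
}
```

An artist who usually places well would show most of the counts piled up in the right-hand buckets.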

Re: Archive data fun

Post by Manhattan Glutton »

Why are the charts going down? They should be going up!

Re: Archive data fun

Post by Lunkhead »

They are going down because the "sum of song fight mojo" is being divided by a bigger number (the # of fights up to that fight) every time. Is that not correct? Here's my code:
// iterate through songs/fights, oldest to newest
// get artist's percentile rank for current song/fight
Double percentileRank = Math.max(fightIdPercentileRankMap.get(song.getFightId()) * 100, 0d);
// add percentile rank to list of percentile ranks
rankValues.add(percentileRank);
// start creating sum of percentile ranks
rankSumValues.add(percentileRank);
for (int j = (mojoIndex - 1); j >= 0; j--) {
    Double rankI = rankSumValues.get(mojoIndex);
    // add previous song/fight percentile rank, scaled down more the farther back you go
    rankI += rankValues.get(j) / (RATE_OF_MOJO_DECAY * (mojoIndex - j));
    rankSumValues.set(mojoIndex, rankI);
}
// calculate average
mojoValues.add(rankSumValues.get(mojoIndex) / rankSumValues.size());
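Here is a self-contained restatement of that loop as a sketch, to show why the averaged line falls while the summed line rises. It assumes one list entry per fight, so the divisor is the fight count, and it assumes a value of 2 for `RATE_OF_MOJO_DECAY` because the real constant is not shown; `MojoDemo` and the constant ranks of 50 are made up for the demonstration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical restatement of the posted loop; not the site's actual class.
final class MojoDemo {

    // Assumed value: the real constant does not appear in the thread.
    static final double RATE_OF_MOJO_DECAY = 2d;

    // Current rank at full weight, plus each earlier rank divided by
    // (decay * number of fights back).
    static double decayedSum(List<Double> ranks, int index) {
        double sum = ranks.get(index);
        for (int j = index - 1; j >= 0; j--) {
            sum += ranks.get(j) / (RATE_OF_MOJO_DECAY * (index - j));
        }
        return sum;
    }

    // The same sum divided by the number of fights so far, as in the posted code.
    static double decayedAverage(List<Double> ranks, int index) {
        return decayedSum(ranks, index) / (index + 1);
    }

    public static void main(String[] args) {
        // An artist who lands on the 50th percentile in every fight.
        List<Double> ranks = Arrays.asList(50d, 50d, 50d, 50d);
        for (int i = 0; i < ranks.size(); i++) {
            System.out.println(i + ": sum=" + decayedSum(ranks, i)
                    + " average=" + decayedAverage(ranks, i));
        }
    }
}
```

Because the older ranks are already scaled down, the sum grows more and more slowly, while the divisor grows by one with every fight, so the average drifts down even for a perfectly steady artist.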

Re: Archive data fun

Post by Manhattan Glutton »

Maybe make it sum instead of average?

Re: Archive data fun

Post by Lunkhead »

Then the lines go up. But again, I'm still not clear on exactly what it's supposed to mean?

EDIT: Site updated to use sum rather than average, and I changed the rate of decay of the influence of previous fights a bit.

Re: Archive data fun

Post by Manhattan Glutton »

I like that better! Thanks for indulging me. I feel like it's probably a better indicator of whether the next song from an artist will be "good" than a simple average. Maybe. Like I said, I'm not a statistician. :)
User avatar
Billy's Little Trip
Odie
Posts: 12090
Joined: Mon Nov 13, 2006 2:56 pm
Instruments: Guitar, Bass, Vocals, Drums, Skin Flute
Recording Method: analog to digital via Presonus FireBox, Cubase and a porn machine
Submitting as: Billy's Little Trip, Billy and the Psychotics
Location: Cali fucking ornia

Re: Archive data fun

Post by Billy's Little Trip »

Who here likes kitties? [image]
....me too. :P

random pirate. [image]

Re: Archive data fun

Post by Lunkhead »

Thanks for interjecting BLT. No, that does not have anything to do with masturbation. Or ... does it... ?

Anyway, I decided to nerd out in a slightly different way and dig into Yahoo Pipes a bit. Here's a pipe that, given an artist's key in the archive and full name, spits out an RSS feed for that artist's songs in the archive:

http://pipes.yahoo.com/pipes/pipe.run?_ ... cial+Scene

Pipes are cool! No, that is not a drug reference, BLT. Or ... is it...?!!