Song Fight! Data (for programmers)

Post by **Lunkhead** » Tue Jul 05, 2011 4:48 pm

This post is mostly for programmers out there. BLT, and other jokers, please don't post in here just to say that you don't know what I'm talking about, etc. Let's all just pretend that you've posted some lolz here for us all to enjoy then get on with our lives, thanks.

Anyway, I would like to suggest that any other crazy people like me and Manhattan Glutton who want to build Song Fight!-related apps consider using the data that is already available from my "Jukebox" site before they go off and write more scraping code. The jukebox is really two apps: one is a database of the fight-related data with a simple RESTful Web service on top that makes the data available in easy-to-consume formats, and the other is a jukebox built on top of the database. I'd love to grow the database aspect, and maybe even fully split that out from the jukebox.

Getting the data out of my site is very easy. Some examples:

All the fight data for fights started after July 1st last year, sorted with most recent fights first, in JSON:

http://sfjukebox.org/fights.json?minSta ... ding=false

All the artist data (minus the extended profile info, which I don't have yet and may never import since it's mostly stale) for artists who first entered after July 1st last year, sorted by artist name, in JSON:

http://sfjukebox.org/artists.json?minFi ... nding=true

I've limited these examples to data within the last year, but it's possible to return the whole dataset by removing the restriction. It's similarly easy to get the individual artist and fight info out in easy-to-consume formats.
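For anyone who wants to try this from a script, here is a minimal Python sketch of building and fetching those URLs. The host and the `fights` / `artists` resource names come from the links above; the function names and the UTF-8 decoding are assumptions made for illustration. The query parameter names are truncated in the links above, so the sketch leaves them to the caller rather than guessing at them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host taken from the example links in the post.
BASE_URL = "http://sfjukebox.org"


def build_url(resource, params=None):
    """Return BASE_URL/<resource>.json, plus a query string if params are given.

    `params` is a dict of whatever filter/sort parameters the service accepts;
    their names are not spelled out here because the post's links are truncated.
    """
    url = f"{BASE_URL}/{resource}.json"
    if params:
        url += "?" + urlencode(params)
    return url


def fetch(resource, params=None, timeout=10):
    """Fetch a resource and parse it as JSON (assumes a UTF-8 JSON response)."""
    with urlopen(build_url(resource, params), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch("fights")` with no parameters would correspond to "removing the restriction", assuming the service behaves as described.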

I would love to work with anybody who wants to build a Song Fight! app to make the data available in whatever format works for them, and to work with people on trying to add data that is missing.

Maybe some day we could have something that was so good that we could then flip things around and make it the real system of record and drive songfight.org off of it and stop all the scraping and importing.

Post by **fluffy** » Tue Jul 05, 2011 5:03 pm

That's beautiful. I hope I didn't break things too badly with the improved band key mapping. :) (I've also just fixed the way ampersands are handled, namely by removing some additional archive weirdness where some were stored as &amp; and some as a plain &.)

Post by **Lunkhead** » Tue Jul 05, 2011 5:22 pm

I think the only way your changes would affect the jukebox is on the jukebox artist pages, where there is a link to the "official archive page". I had copied the old artist name to artist key mapping code that Spud sent me and pasted it into one of my Java classes and Java-fied it, to make those links. I think that's the only place where I used the artist key. I will update that to use the new code you posted in the other thread so those links work again. I would also happily link to the artist's wiki page, if MG can send me the code I need to convert an artist name into an artist wiki key (or give me a URL about how that works if it's some standard MediaWiki thing).

Post by **Lunkhead** » Tue Jul 05, 2011 5:42 pm

Oh wait...

fluffy wrote: Well, another thing I did was make it so that you can use the plain artist name in the URL, like:

http://songfight.org/artistpage.php?key=Jon%20Eric

will map internally to the same key.

So probably what I should do is remove my copy of the old code and instead use the actual artist names in my links to the "official" archive pages? And this will work as long as I properly encode the funky characters in the artist names?
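As a rough sketch of what "properly encode" could look like, the following Python percent-encodes an artist name into the `key` parameter of the `artistpage.php` URL quoted above. The function name is made up for illustration, and the sketch uses UTF-8 (Python's default); which charset the server expects for non-ASCII names is a separate question that this does not settle.

```python
from urllib.parse import quote

# Page URL taken from the quoted example above.
ARTIST_PAGE = "http://songfight.org/artistpage.php"


def artist_page_url(artist_name):
    """Build an archive link from a plain artist name.

    safe="" makes quote() escape every reserved character, so a name
    containing "&" or "/" cannot break the query string.
    """
    return f"{ARTIST_PAGE}?key={quote(artist_name, safe='')}"


print(artist_page_url("Jon Eric"))
# http://songfight.org/artistpage.php?key=Jon%20Eric
```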

Post by **fluffy** » Tue Jul 05, 2011 5:48 pm

I only put that in as a friendlier way of sanitizing the input, but yeah, I guess it's not a bad idea. I'd be concerned about some of the weirder interplays between non-ASCII characters and entities and whatever though.

Post by **Lunkhead** » Tue Jul 05, 2011 6:12 pm

Oy, for example, ¡Juiceharp! is problematic. It doesn't work 100% right either on songfight.org or the jukebox. I'm probably just going to use the full artist names, since that simplifies things on my side and removes the need for me to maintain a copy of the artist name to artist key mapping code. I'll just live with the few edge cases that are broken for now.

Post by **Spud** » Tue Jul 05, 2011 10:09 pm

fluffy, are you fucking with my code? just want to know, that's all.

Post by **fluffy** » Tue Jul 05, 2011 10:45 pm

Yes, I was. Because it was broken and I'd like it to not be.

Post by **fluffy** » Tue Jul 05, 2011 10:45 pm

Oh also I found the bug with ¡Juiceharp! et al but I got distracted by friends showing up before I had a chance to fix it. It should be fixed now. (Although note that non-ASCII characters expect ISO-8859-1 and most browsers use UTF-8 so you still can't do a direct link like http://songfight.org/artistpage.php?key=¡Juiceharp! unless that link comes from an ISO-8859-1 page. And even then it's non-guaranteed. So basically what I'm saying is that you should use the programmatically-generated keys and not try to do anything fancy like using the plaintext name.)
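To make the charset mismatch concrete, here is a small Python illustration of how the same name percent-encodes to different bytes depending on the charset. The helper name is hypothetical; the point is only that "¡" is one byte in ISO-8859-1 and two bytes in UTF-8, so the two links are not the same request.

```python
from urllib.parse import quote


def percent_encode(name, charset):
    """Percent-encode `name` after converting it to bytes in `charset`."""
    return quote(name, safe="", encoding=charset)


name = "¡Juiceharp!"
print(percent_encode(name, "iso-8859-1"))  # %A1Juiceharp%21
print(percent_encode(name, "utf-8"))       # %C2%A1Juiceharp%21
```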

If you want to do the mapping yourself, the current code is this:


function makeKey($aName)
{
  $aKey = strtolower($aName);

  # strip a leading "a" or "the" as a word
  $aKey = preg_replace('/^(a|the) /','',$aKey);
  # convert spaces to underscores
  $aKey = str_replace(' ','_',$aKey);
  # convert entities to plaintext
  $aKey = html_entity_decode($aKey);
  # convert non-URL-safe characters to %xx
  $aKey = urlencode($aKey);
  # convert all runs of non-allowed characters into a single _
  $aKey = preg_replace('/[^a-zA-Z0-9\-]+/','_',$aKey);

  return $aKey;
}

but it would be better to just store the URL you scraped the data from. If you're still scraping, I mean. If you're processing the actual archive data file then I guess you need the code.
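For anyone not working in PHP, here is an unofficial Python port of `makeKey`, offered as a sketch rather than a drop-in replacement. It follows the same steps in the same order, but several things are assumptions: `php_urlencode` is a hand-written approximation of PHP's `urlencode()`, ISO-8859-1 is assumed as the charset (based on the note above about what the site expects), Python's `html.unescape` recognizes more entities than PHP's `html_entity_decode`, and Python's `lower()` is Unicode-aware where PHP's `strtolower` may not be. The outputs shown are what this port produces, not keys verified against songfight.org.

```python
import html
import re


def php_urlencode(text, charset="iso-8859-1"):
    """Approximate PHP's urlencode(): keep ASCII alphanumerics and -_. as-is,
    turn spaces into +, and percent-encode every other byte."""
    out = []
    # Characters that do not exist in the charset become "?" (an assumption).
    for byte in text.encode(charset, errors="replace"):
        ch = chr(byte)
        if byte < 128 and (ch.isalnum() or ch in "-_."):
            out.append(ch)
        elif ch == " ":
            out.append("+")
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


def make_key(name, charset="iso-8859-1"):
    key = name.lower()
    # strip a leading "a" or "the" as a word
    key = re.sub(r"^(a|the) ", "", key)
    # convert spaces to underscores
    key = key.replace(" ", "_")
    # convert entities to plaintext
    key = html.unescape(key)
    # convert non-URL-safe characters to %XX
    key = php_urlencode(key, charset)
    # convert all runs of non-allowed characters into a single _
    key = re.sub(r"[^a-zA-Z0-9\-]+", "_", key)
    return key


print(make_key("Jon Eric"))             # jon_eric
print(make_key("A Hypothetical Band"))  # hypothetical_band
print(make_key("¡Juiceharp!"))          # _A1juiceharp_21
```

Note that the final step also rewrites the `%` signs produced by the encoding step, which is why "¡" ends up as `_A1` rather than `%A1`.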

Post by **jast** » Wed Jul 06, 2011 4:51 am

That will make it a lot easier to automatically update the wiki. Thanks for posting.

Post by **jb** » Wed Jul 06, 2011 7:08 am

Crossposting from the Wiki thread.

A while ago I took a stab at creating a schema for a DB that would serve not only Song Fight, but Cover Fight, Nur Ein, and any other entity that wanted to do the traditional "song fight" thing. It's never been implemented, but here it is:

https://spreadsheets.google.com/spreads ... utput=html

I am not a professional database administrator or designer, so this design is not in any standard database design format, just a spreadsheet. A pair of spreadsheet rows defines a table, with the name of the table on top. Foreign keys are colored according to the table they come from. I did that to make sure I was linking everything correctly.

If I remember correctly, this is compliant with at least the second normal form (I didn't evaluate it against the third).

I am releasing this into the public domain! Use as you please.

JB
