Music Blog Visualization with Pandas

Katie Wojciechowski
Aug 14

My friend came up with some awesome visualizations for the music website we both write for.

However, he realized after the fact that he could have worked with the data FAR more efficiently if he had used Pandas instead of just NumPy and a bunch of Python loops and functions. So, I set about reworking the code to use Pandas and get the same output for one of the most popular charts! For the reader, this wouldn't make a difference, but it makes the code cleaner on the back end and makes it easier to incorporate new data in the future.

Originally, my friend had loaded the data (housed in a CSV) by looping through each line of the CSV and assigning a bunch of items to a bunch of lists (a few of which are pictured above).

Here’s what I did instead.

Pointed to the filepath on my desktop with a variable called SWIMSTATS.
Saved readable rows (a list called clean_lines) to a Pandas DataFrame.

I quickly realized there was a problem with reading the CSV. Based on the error messages, it seemed at least one row of the document had 11 fields instead of 10. I counted all the rows with unexpected field counts and found 68. That's a few too many to ignore, since the whole document only has a few hundred rows.
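A minimal sketch of that counting step, assuming the file is read as plain text lines and split on the asterisk delimiter mentioned later in this post. The function name `split_rows`, the `bad_lines` list, and the three-column sample data are hypothetical stand-ins rather than the original code (the real file has 10 columns).

```python
DELIMITER = "*"  # the post notes the fields are asterisk-delimited


def split_rows(lines, expected_fields, delimiter=DELIMITER):
    """Sort rows into readable ones and ones with an unexpected field count."""
    clean_lines, bad_lines = [], []
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) == expected_fields:
            clean_lines.append(fields)
        else:
            # Keep the line number so the row can be inspected later
            bad_lines.append((line_number, fields))
    return clean_lines, bad_lines


# Hypothetical three-column sample; the real file has 10 columns
sample = [
    "Post Number*Title*Author\n",
    "1*First Post*Taylor\n",
    "2*Second Post*Katie* \n",  # trailing "* " creates a 4th field
]
clean_lines, bad_lines = split_rows(sample, expected_fields=3)
print(f"{len(bad_lines)} rows have an unexpected field count")
```

A plain `str.split` does not handle quoted fields that contain the delimiter. If the data had those, Python's `csv` module with `delimiter="*"` would be the safer choice.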

Compiled a list of unreadable rows and diagnosed the problem.

So, I created a list of “bad” rows and printed out something like this for each one, naming the row number, the number of fields, and the content of any unexpected fields:

Line 412 has 11 fields.

Extra fields beyond expected columns: [' ']

The pattern here was obvious: every one of the 68 rows had exactly 11 fields, and the 11th field was just an extra space. Now this, we can work with!
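The report for each bad row could be produced along these lines. This is a sketch only: `describe_bad_row` and the placeholder field values are hypothetical, and the row is assumed to be already split into a list of fields.

```python
EXPECTED_FIELDS = 10  # number of columns in the header


def describe_bad_row(line_number, fields, expected=EXPECTED_FIELDS):
    """Build a short report for one row with an unexpected field count."""
    extra = fields[expected:]  # anything past the last expected column
    return (
        f"Line {line_number} has {len(fields)} fields.\n"
        f"Extra fields beyond expected columns: {extra}"
    )


# Hypothetical row: ten placeholder fields followed by a stray space
row = [f"field{i}" for i in range(10)] + [" "]
report = describe_bad_row(412, row)
print(report)
```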

Trimmed the problematic blank space off the end of the 68 unreadable rows.

With all of the “bad” rows already in a list, I looped through and cut off the extra space, making sure nothing would be tallied as an extra field in those rows. I should point out that this CSV was downloaded in such a way that each row’s fields were delimited (separated) by an asterisk, so I configured the code to handle it as such.
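One way to do that trim, sketched under the assumption that each bad row ends in a delimiter followed by a single space, so the split produces one whitespace-only field at the end. The helper name `drop_trailing_blank` and the placeholder fields are hypothetical.

```python
EXPECTED_FIELDS = 10


def drop_trailing_blank(fields, expected=EXPECTED_FIELDS):
    """Remove one whitespace-only field from the end of an over-long row."""
    if len(fields) == expected + 1 and fields[-1].strip() == "":
        return fields[:-1]
    return fields  # leave anything else untouched for manual review


# Hypothetical raw line: ten placeholder fields, then a stray "* "
raw_line = "*".join(f"field{i}" for i in range(10)) + "* \n"
fields = raw_line.rstrip("\n").split("*")
fixed = drop_trailing_blank(fields)
print(len(fields), "->", len(fixed))
```

Checking both the length and the content of the last field means a row that is broken in some other way is left alone instead of being silently truncated.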

Concatenated the now-readable 68 rows to the rest of the content so that we can work with the complete data.
Checked the column names to make sure we know what we’re working with.

They are: ['Post Number', 'Date', 'Title', 'Website Link', 'Google Doc', 'Music/Life', 'Category', 'Word Count', 'Author', 'Favorite']
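As a rough sketch of the concatenation and column check, assuming both sets of rows are lists of 10 fields each. The row values and the names `clean_rows`, `fixed_rows`, `clean_df`, and `fixed_df` are made up for illustration; only the column names come from the actual data.

```python
import pandas as pd

COLUMNS = ["Post Number", "Date", "Title", "Website Link", "Google Doc",
           "Music/Life", "Category", "Word Count", "Author", "Favorite"]

# Hypothetical rows: one that was readable from the start, one repaired
clean_rows = [["1", "2024-01-05", "Post A", "link", "doc",
               "Music", "Review", "800", "Taylor", "No"]]
fixed_rows = [["2", "2024-02-10", "Post B", "link", "doc",
               "Life", "Essay", "1200", "Katie", "Yes"]]

clean_df = pd.DataFrame(clean_rows, columns=COLUMNS)
fixed_df = pd.DataFrame(fixed_rows, columns=COLUMNS)

# Stack the two frames and renumber the index from zero
df = pd.concat([clean_df, fixed_df], ignore_index=True)
print(df.columns.tolist())
```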

For readability, I filtered the histogram to just 2024.

This is how my friend had it visualized, anyway. Otherwise, the X axis gets pretty hard to read, because as you add more years to the graph, Taylor (who runs the site and therefore writes WAY more articles than anyone else) becomes more and more of an outlier in number of articles.
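A filter like that might look as follows in Pandas. The date format in the real file is not shown in this post, so the ISO-style strings, the sample rows, and the `df_2024` name are assumptions.

```python
import pandas as pd

# Hypothetical sample with only the columns needed here
df = pd.DataFrame({
    "Date": ["2023-11-30", "2024-01-05", "2024-06-18"],
    "Author": ["Taylor", "Taylor", "Katie"],
})

# Parse the text dates; errors="coerce" turns unparseable values into NaT
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Keep only the rows whose date falls in 2024
df_2024 = df[df["Date"].dt.year == 2024]
print(len(df_2024))
```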

I modified the existing function used to create the plot.

The former one was working, again, with a bunch of different lists. I simplified it to begin with a loop that bins max article counts for each author, so the X axis, like I mentioned above, is ticked by number of articles per author. The Y axis is the number of authors per bin. My friend had already laid out all the logic for this and had a lot of good formatting in place, so I just had to adapt it for the DataFrame.
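The underlying counts can be sketched with two `value_counts` calls. This is not the actual plotting function, which uses a loop and my friend's formatting; it is a simplified illustration with one bin per distinct article count, and the author names and `df_2024` sample are hypothetical.

```python
import pandas as pd

# Hypothetical 2024 data: one row per article
df_2024 = pd.DataFrame({
    "Author": ["Taylor", "Taylor", "Taylor", "Katie", "Sam"],
})

# Articles written by each author
articles_per_author = df_2024["Author"].value_counts()

# Number of authors sharing each article count, ordered along the X axis
authors_per_count = articles_per_author.value_counts().sort_index()
print(authors_per_count)
```

Calling `authors_per_count.plot(kind="bar")` would draw this with Matplotlib if it is installed, and `pd.cut` could group the counts into wider bins.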

The output histogram was the same as my friend’s, but that was the point!

Here's the finished product:

This was a great learning experience for me to:

Practice reading and understanding code and documentation written by others
Gain hands-on experience using GitHub for version control and collaboration
Troubleshoot issues with a messy CSV file by implementing custom data cleaning and loading strategies
Adapt a visualization to produce consistent output from differently structured datasets

Check out the aforementioned articles on Swim Into The Sound while you're at it!
