As a journalist, it is important to get data from reliable sources. One such source is Danmarks Statistik: http://www.dst.dk/da
In this post, we are going to collect data from dst.dk and import it to Google Fusion Tables.
Also, we are going to encounter lots of problems and we have to learn, how to be flexible and how to deal with them.
First, click on “Find Statistik” and then “Statistikbanken”, here you can find a wealth of information. I chose the following table: http://www.statistikbanken.dk/BOP6
Here, you can choose what data specifically you want to see.
I was interested in the transport sector, especially in relation to Germany over the course of the last decade. If you press the control-key on your keyboard, you can choose multiple entries in each list, my selection looked like this:
Then you can click “Vis Tabel” and you will see the data set, and be able to download it as a file.
We now want to import the file to Fusion Tables. Go to fusion tables http://www.google.com/drive/start/apps.html#fusiontables and create a new table, choose to upload the file and select “tab” as your separator character.
Apparently, we have to clean our data set first, so Fusion Tables can understand it. Use a text-editor, f.eks Notepad++ for windows or gedit for Linux to open the file.
We can see that the file has multiple parts, one for “Tyskland” and one for “Lande i alt” and some more information, like the name of the table. To fix this, i am going to create two new, empty files and copy and paste the data for Germany and the other countries separately. Then i add the column names ( years ) with tabs in between, make sure there are tabs between the numbers and delete unnecessary quotes.
Which i now can upload to fusion tables, which now understands the file properly
On the next page we attribute the data to “Danmark Statistik” and provide a link, to where the data can be found.
We end up with a nice table
I apply the exact same process to the table with the Germany data
You can now press the + button and select “new chart”, here you can play around with different visualizations of the data. When i did this, i found that none of the visualizations really showed what i wanted. Then it hit me: the organization of the data in the table made not a lot of sense.
It is much more useful to have years organized in rows instead of having one column for each year.
I went back to the Danmark Statistik webpage and looked for options to change the format of the data. Looks like we are in luck. The page offers us an option to pivot the dataset. After some fiddling around i came up with the following organization:
Now i downloaded the xls file, which can be openend in Microsoft Excel or Libre / Open Office.
Overall, the use of the xls file format made it a lot easier to create this file.
Upload the new file to fusion tables and check out the charts again. Now the graphs make a lot more sense.
My point with this post is, that we are going to run into lots of trouble when we are working with tables and data. Formatting and organization of data is important and we have to learn how to get it right, and what tools can support us with these tasks.