As part of a multimedia collection launched today, we had to create a new way of displaying the hefty 482 items (the govt claims it’s 480) on Hong Kong’s Intangible Cultural Heritage list. The government, which created the list through its committee, has only made it available as a PDF file.
We had to use OpenRefine to clean it up, which wasn’t trivial. Then there was a lot of anguish in getting to the count of 480 that they got (in graph theory parlance, it was about counting the leaves). Finally, the easiest part was perhaps deciding how to render the data as a 4-level nested HTML list ((ul + li) * 4).
The result was, in my opinion, quite elegant:
Although really, it’s just an HTML list. Nothing fancy, because cross-platform support was a requirement.
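For the curious, here is roughly what "counting the leaves" and rendering a nested list look like in code. This is not the code we actually used, just a minimal Python sketch. It assumes the cleaned-up data has already been turned into a nested dict where each key is a label and each value holds its sub-items (an empty dict means a leaf). The labels in `SAMPLE_TREE` and the function names `count_leaves` and `render_list` are made up for illustration.

```python
from html import escape

# Hypothetical sample data: placeholder labels, not real entries from the list.
# Each key is an item label; each value is a dict of its sub-items (empty = leaf).
SAMPLE_TREE = {
    "Domain 1": {
        "Item 1.1": {
            "Sub-item 1.1.1": {},
            "Sub-item 1.1.2": {
                "Sub-sub-item 1.1.2.1": {},
                "Sub-sub-item 1.1.2.2": {},
            },
        },
        "Item 1.2": {},
    },
    "Domain 2": {
        "Item 2.1": {},
    },
}


def count_leaves(tree):
    """Count the nodes that have no children."""
    total = 0
    for children in tree.values():
        total += count_leaves(children) if children else 1
    return total


def render_list(tree, indent=0):
    """Render the nested dict as nested <ul>/<li> markup, one <ul> per level."""
    if not tree:
        return ""
    pad = "  " * indent
    lines = [f"{pad}<ul>"]
    for label, children in tree.items():
        if children:
            # A node with sub-items wraps a nested <ul> inside its <li>.
            lines.append(f"{pad}  <li>{escape(label)}")
            lines.append(render_list(children, indent + 2))
            lines.append(f"{pad}  </li>")
        else:
            lines.append(f"{pad}  <li>{escape(label)}</li>")
    lines.append(f"{pad}</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(count_leaves(SAMPLE_TREE))
    print(render_list(SAMPLE_TREE))
```

The point of the sketch is that the total depends on what you decide to call an item: counting only the leaves gives a different number than counting every node in the tree.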
When we published the piece today, this Friday morning, my friends Darcy and Mart jumped on the fact that someone at SCMP had published a CSV file for the masses! Can you imagine, a CSV!
(Strangely, the Heritage Museum, which hosts the pages and files, changed all of its URLs today, without any 30[1-3] redirect. Probably enough to make Darcy, an information architect, cringe his teeth out.)
Mart pointed out immediately in this thread on the ODHK group on Facebook that this can be a case study for public open data, where “data is simple enough for all government officials to understand it”, but where the public servants in charge were deposed by the tyranny of data entry using Microsoft Word.
We shall see what comes out of that exercise!
As a person obsessed with (scraping) data in my day-to-day job, I find that this sort of task keeps one in good shape (and employed). But I hope that one day we are given data that makes sense and is simple to reuse, and that does not literally require hours just to clean up and re-model.
That said, this particular topic of Intangible Culture is controversial in itself. My colleague Vivienne wrote earlier this month:
But there is a dilemma. To sustain cultural heritage that’s at risk of disappearing, cash and resources are needed. Promotions help bring visitors and money. But overexposure can commercialise a tradition, turning it into tourism spectacle and severing its ties to the local community.
If we want to look closely at the sort of choices that were made to get to a list of 480 to 482 items, we need to go deeper. Look at these excerpts from a long list of items (around 100) that were considered, then rejected:
(That’s from a Legco Hansard document dated July 2, 2014, page 156 / 16346; see the database on the particular policy issue of ICH)
I was sure bean curd sheet (腐竹), and not just regular bean curd itself, could’ve made the list if the cultural bearers had returned the phone calls! It would be nice to have the tables in that PDF in a good machine-readable format, but it’s probably not possible right now (I didn’t verify beyond Google), even though Legco is known in the community as a leader in open government data.
Well, to be continued…