OldAms.nl — the process of creating the largest collection of historical photos of Amsterdam on a map
So how exactly was this project created?
Last year I participated in the N8 Hackathon Amsterdam, and I am still proud of the project we created in an awesome team with David Swinstead, Elena Borisova, Filip Lysyszyn, Polina Raevskaya, Yaroslav Kravchenko, Natalia Rozvezeva, Helena Gaevskaya and Vladimir Kalmykov. We used open APIs to find out how many people were around each museum in Amsterdam, so the app could recommend where to go to avoid huge queues, have an overall better experience visiting the city, and fit more attractions into a smaller timeframe.
We still haven’t published that app, although we are going to do it this year. But this story is not about that app.
Since that hackathon, we have known that Amsterdam has a lot of public city APIs, and about two months ago I found the http://oldsf.org/ project. Inspired by what the authors of OldSF did, and knowing that we had an API for old photos of the city that nobody had tried to geocode yet, we decided to create the same kind of project for Amsterdam.
Collecting & shaping the data
Beeldbank has an OpenSearch API, which simplifies the initial data import quite a lot. We used http://beeldbank.amsterdam.nl/api/opensearch/ and created a script that aggregated all the content into a MySQL database. As a result, we got a huge table with roughly 360k entries.
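As a rough illustration, the parsing step of such an import script could look like the sketch below. It assumes the feed is the RSS flavour of OpenSearch with plain `title` and `description` fields; the exact fields and paging parameters Beeldbank returns may well differ.

```ruby
require 'rexml/document'

# Parse one page of an OpenSearch RSS feed into an array of records.
# In the real script, each record would then be inserted into MySQL
# and the next page fetched until the feed is exhausted.
def parse_opensearch(xml)
  doc = REXML::Document.new(xml)
  doc.get_elements('//item').map do |item|
    {
      title:       item.elements['title']&.text,
      description: item.elements['description']&.text
    }
  end
end

sample = <<~XML
  <rss version="2.0">
    <channel>
      <item>
        <title>Dam Square, 1900</title>
        <description>View of Dam Square towards the Royal Palace</description>
      </item>
    </channel>
  </rss>
XML

records = parse_opensearch(sample)
puts records.first[:title]  # prints "Dam Square, 1900"
```

The heavy lifting is really in the paging and rate limiting, which are omitted here.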
There were multiple content types, but we were interested only in photos (and, later, building plans). So we copied the table over and trimmed it down to photos only. What was nice about the data is that we got some address information right away.
So we simply removed entries without addresses (we looked in both the description and subject fields to find this information).
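To give an idea of what that filtering can look like, here is a sketch with a hypothetical `extract_address` helper and a hand-rolled pattern for common Dutch street suffixes; the actual matching we use may be looser or stricter than this.

```ruby
# Heuristic: a capitalised street name ending in a common Dutch suffix,
# optionally followed by a house number. This pattern is our own
# illustration, not something the Beeldbank data guarantees.
ADDRESS_RE = /\b([A-Z][a-z]+(?:straat|gracht|plein|weg|kade|laan))\s*(\d+[a-z]?)?/

def extract_address(*fields)
  fields.compact.each do |text|
    if (m = ADDRESS_RE.match(text))
      return [m[1], m[2]].compact.join(' ')
    end
  end
  nil
end

extract_address('Gezicht op de Prinsengracht 263', nil)  # => "Prinsengracht 263"
extract_address('Onbekende locatie')                     # => nil
```

Rows where the helper returns `nil` for both the description and subject fields are the ones that get dropped.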
As the next step, we moved the addresses to a separate table and fetched latitude/longitude pairs for them using a Geocoding API.
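For illustration, here is what the request-building and response-parsing halves of that step could look like against the Google Geocoding API, one possible geocoder for this job; the actual fetch loop, rate limiting and table writes are omitted.

```ruby
require 'json'
require 'uri'

# Build a request URL for the Google Geocoding API. Appending
# ", Amsterdam" biases results towards the right city.
def geocode_url(address, key)
  query = URI.encode_www_form(address: "#{address}, Amsterdam", key: key)
  "https://maps.googleapis.com/maps/api/geocode/json?#{query}"
end

# Pull the lat/lng pair out of a Geocoding API JSON response body,
# or return nil when the address could not be resolved.
def extract_latlng(body)
  result = JSON.parse(body)['results'].first
  return nil unless result
  loc = result.dig('geometry', 'location')
  [loc['lat'], loc['lng']]
end

sample = '{"results":[{"geometry":{"location":{"lat":52.3676,"lng":4.9041}}}]}'
extract_latlng(sample)  # => [52.3676, 4.9041]
```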
The first version we launched was a prototype. We used Node.js and DigitalOcean hosting to get it running quickly. It didn’t have any clustering, but it at least let us check which photos each landmark placed on the map had.
We showed that version to friends to get initial feedback.
There were some nasty problems and generally unpleasant experiences:
- Lack of clustering
- We were serving photos straight from the CDN that Beeldbank uses. It was ultra-slow; people sometimes got stuck waiting a couple of seconds for photos to load
- We were showing wide photos (width = 500px) right away. It turned out people want to see an overview of a landmark’s photos first, and only then enlarge a photo to take a closer look. Bandwidth matters too: we have landmarks with 100+ photos
- At a certain point we moved to a SQL database, as we needed to keep adding new photos to the collection (we still do this every day)
After aggregating all the feedback on the prototype, we rewrote the application.
It now runs on Ruby on Rails (we chose RoR for its simple out-of-the-box database migrations and for the rake tasks we started using to talk to various APIs and refine the data) and uses a PostgreSQL database. We also switched hosting from DigitalOcean to Heroku, as it is way cheaper for our use case and lets us scale when we need to (we haven’t needed it so far), not to mention a very simple deployment process.
We fixed the problem of slow photo downloads from the Beeldbank CDN by putting the photos in Amazon S3.
And clustering was solved simply by using the Google MarkerClusterer library (https://github.com/googlemaps/js-marker-clusterer). It provides good performance even in mobile browsers.