One recent Sunday morning I was sorting my trash into its respective recycling containers. We have a new scheme around here, so I was unfamiliar with the particulars of what waste goes where. On our local Town Council website I found a directory of different trash items and which recycling bin each must be placed in (see here).
There is one problem with this reference: it’s an alphabetical directory with one page per letter (A-Z). I was looking for “takeaway coffee cup”. I tried “T” (for “Takeaway”) and found nothing; I tried “C” (for “Coffee Cup”) and found nothing there either. The directory has no search function. So the developer in me decided to try to do something about it before settling down for breakfast.
Now, if all I wanted to do was search a list of the pages, I could have set something up pretty quickly, but who doesn’t want to try out new technologies and ideas when they get the chance?
The idea was to scrape the data from all the pages, then provide a search function to find waste items. Anyone who is addicted to reading HackerNews is obviously going to turn to NodeJS — so of course I did!
NodeJS app up and running, need to get pages; http can do that for me. Test. Finally got the HTML response!
I need to parse the HTML pages and traverse the DOM to scrape the directory links A-Z etc. It seems jsdom will let me use jQuery, something I know how to use! jsdom fails to install for me, so I find cheerio: apparently it’s fast and provides a jQuery-like interface for the DOM. Great!
Manage to scrape all the directory links (hint: they don’t use a nice standard “/a/”, “/b/” permastruct). Next: scrape all the individual “letter” pages for their waste items.
Parallel http requests make NodeJS super fast for a job like this. However, syncing up after multiple requests is not something I am used to. Bug #nodejs on IRC; async is apparently good for this. Bingo!
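For anyone curious what “fire everything off, then sync up” looks like, here is a rough sketch of the pattern. It is not the code from the app: it swaps the async module for the built-in `Promise.all`, and the names `fetchPage` and `fetchAll` are made up for illustration.

```javascript
const http = require('http');

// Hypothetical helper: fetch one page over plain HTTP and resolve with its body.
// (An https URL would need the https module instead.)
function fetchPage(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Start every request at once, then wait until all of them have finished.
// The results come back in the same order as the input URLs, regardless of
// which response arrived first.
function fetchAll(urls, fetcher = fetchPage) {
  return Promise.all(urls.map((url) => fetcher(url)));
}
```

Calling `fetchAll(letterPageUrls).then((pages) => { /* parse each page */ })` would hand over all the “letter” pages in one go, where `letterPageUrls` stands in for whatever list of links was scraped in the previous step. Note that `Promise.all` rejects as soon as any single request fails, which may or may not be what you want for a scraper.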
I am now scraping the full directory into an array of waste items to bin type mapping. Time to make a front end and web server! I have used express before, that seems good! NodeJS server listening on Port 3000.
Start building a really basic HTML page to display search bar and results – serve via Nginx (because that’s easiest for me!). Basic styling, jQuery to request data from NodeJS endpoint on keyup.
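Run into a cross-domain request issue: the page is served from port 80 (nginx) but the data comes from port 3000 (node), and the browser’s same-origin policy blocks the request. Find out expressjs supports JSONP!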
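JSONP sidesteps the restriction by returning a script instead of plain JSON: the response body is a call to a function that the page names in the query string. Express provides `res.jsonp()` for this. The sketch below is not Express’s implementation, just a hand-rolled illustration of the wrapping, and `toJsonp` is a made-up name.

```javascript
// Hypothetical helper showing what a JSONP response body looks like.
function toJsonp(callbackName, data) {
  // Only accept a simple identifier, so the callback name cannot be used
  // to inject arbitrary script into the response.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error('Invalid JSONP callback name');
  }
  // The browser runs this as a script, calling the page's function with the data.
  return callbackName + '(' + JSON.stringify(data) + ');';
}
```

On the page side, jQuery’s `dataType: 'jsonp'` option generates the callback function and adds its name to the request as a `callback` parameter, which is also the parameter name `res.jsonp()` looks for by default.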
Web page up and running, showing results from nodejs on keyup. Tweak search on NodeJS app to case insensitive etc. Tweak styles, cancel running ajax requests on keyup.
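The case-insensitive part of the search can be as small as the sketch below. It assumes the scraped directory is an array of `{ item, bin }` objects; that shape and the name `searchItems` are assumptions for illustration rather than what the app actually uses.

```javascript
// Assumed data shape: [{ item: 'name of the waste item', bin: 'name of the bin' }, ...]
// Returns every entry whose item name contains the query, ignoring case.
function searchItems(items, query) {
  const needle = String(query).trim().toLowerCase();
  // An empty search box should show nothing rather than the whole directory.
  if (needle === '') return [];
  return items.filter((entry) => entry.item.toLowerCase().includes(needle));
}
```

Because it matches anywhere in the name, a search for “coffee” would find an entry filed under a different first letter, which is exactly what the A-Z directory could not do.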
Add iOS meta tags for app icon, viewport width, zooming etc. Test, sweet!
Push to github, install nodejs on live server, deploy. Visit on iPhone, add to homescreen — and I can now search for waste items from the kitchen on my phone! You can see it live here http://labs.hmn.md/whatwastewhere/! Certainly “minimum viable product”, I don’t think it will be making me millions. If you happen to live in the Derbyshire Dales, UK – you are welcome to use it, and you can see it on github.
Time for breakfast.
Was it worth it?
After a couple of hours of coding, I had created something from beginning to end that I could actually use. It’s a pretty good feeling to be able to create something and learn a huge amount in the process on a Sunday morning (no, I don’t have anything better to do!). Sure, NodeJS probably wasn’t the easiest route here, but that wasn’t really my intention. I wanted to solve a problem, and I wanted to do it in a way I would enjoy. As it happens, I have been wanting to have a play around with NodeJS.
You’d be surprised at how much you can learn in a short period of time when you are trying to build something for fun.