I like building tools that are broadly useful and informative. One of the most frustrating aspects of working as a structural design engineer, for me, was the narrow focus of my job. While I spent a lot of time on highly detail-oriented tasks, building complex analysis models or drawing details, I rarely had the opportunity to look at the bigger picture of what I was contributing to the built environment. Coming from engineering at the “far right” of the long tail, the question was usually “can we engineer this?” rather than “should we?” To that end, when I decided to do a second batch at the Recurse Center and transition my career to focus on software, I wanted to work on a project that would deepen my knowledge of web applications, while also shedding some light on the current state of the built environment.
Just before I started my batch, I discovered the National Bridge Inventory, which seemed like a perfect fit for the type of project I wanted to do. The data was already normalized in a format that’s been consistent for several decades, it was geographically diverse, and it had enough entries to make working with a database worth it, rather than just loading everything into a Pandas DataFrame. While the InfoBridge web portal allows some plot customization and analytics, I wanted to work with much more complex data visualizations using D3.
Bridge.watch is the final product of this endeavor, allowing users to interactively explore this dataset through a series of maps and hierarchical plots with custom filtering and data aggregation.
When I figuratively hung up my hard hat, I swore off using Excel for any significant purpose, which I was able to do mainly because I’m a giant VisiData fan now. Using VisiData, I could quickly get histograms of different fields to determine what would be interesting to show as a plot. I also used VisiData to do some data cleaning with regex. Eventually I wrote a Python script to do the same type of data cleaning and to fix some latitude and longitude outliers that were consistent over several years of data (I’m looking at you, whoever in Maryland keeps putting 300 bridges in the Atlantic Ocean!). The 2021 NBI data set was released this month, and with VisiData and Python I was able to quickly adjust and load the new data into my database.
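To give a feel for the coordinate cleanup, here is a minimal sketch rather than the actual script. It assumes that latitude and longitude arrive as packed degree-minute-second strings (two or three degree digits, then MMSSss), that western longitudes are stored as positive numbers, and that a rough bounding box is good enough to catch bridges that have drifted out to sea. The function names and the Maryland bounds are made up for illustration.

```python
from typing import Dict, Optional, Tuple

# Approximate bounding box for Maryland (illustrative values, not survey-grade).
MD_BOUNDS: Dict[str, Tuple[float, float]] = {
    "lat": (37.8, 39.8),
    "lon": (-79.6, -75.0),
}


def dms_to_decimal(raw: str, degree_digits: int) -> Optional[float]:
    """Convert a packed DMS string (e.g. '39300000' = 39 deg 30 min 00.00 sec)
    to decimal degrees. Returns None when the value is malformed."""
    raw = raw.strip()
    if not raw.isdigit() or len(raw) != degree_digits + 6:
        return None
    degrees = int(raw[:degree_digits])
    minutes = int(raw[degree_digits:degree_digits + 2])
    seconds = int(raw[degree_digits + 2:]) / 100  # last four digits are SSss
    if minutes >= 60 or seconds >= 60:
        return None
    return degrees + minutes / 60 + seconds / 3600


def clean_coordinates(
    lat_raw: str, lon_raw: str, bounds: Dict[str, Tuple[float, float]]
) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) in decimal degrees, or None if the point is
    malformed or falls outside the expected bounding box."""
    lat = dms_to_decimal(lat_raw, 2)
    lon = dms_to_decimal(lon_raw, 3)
    if lat is None or lon is None:
        return None
    lon = -lon  # assumption: western longitudes are recorded without a sign
    lat_ok = bounds["lat"][0] <= lat <= bounds["lat"][1]
    lon_ok = bounds["lon"][0] <= lon <= bounds["lon"][1]
    return (lat, lon) if lat_ok and lon_ok else None
```

Rows that come back as `None` can then be flagged for a manual look in VisiData instead of being silently dropped.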
Based on the NBI records structure, I thought that a relational database was the best option for looking at the 2021 data. While I probably could have gotten away with just using SQLite, I chose to use PostgreSQL mainly to be able to do more complex geographic queries with PostGIS (though I’ve only scratched the surface of it at this point). I wrote some Python functions to automatically generate a SQL file to load the database and normalize each field as well.
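As a rough illustration of generating the load file, the sketch below builds the SQL for one lookup table from a dictionary of codes. It is not the script used for the site: the table name, column names, and the example codes are hypothetical, and it assumes the table name comes from trusted code rather than user input, since identifiers are not escaped.

```python
from typing import Dict, List


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def lookup_table_sql(table: str, codes: Dict[str, str]) -> List[str]:
    """Build CREATE TABLE and INSERT statements for a code/description
    lookup table, one INSERT per code, in sorted order."""
    statements = [
        f"CREATE TABLE {table} (code TEXT PRIMARY KEY, description TEXT NOT NULL);"
    ]
    for code, description in sorted(codes.items()):
        statements.append(
            f"INSERT INTO {table} (code, description) "
            f"VALUES ({sql_literal(code)}, {sql_literal(description)});"
        )
    return statements


def render_sql(statements: List[str]) -> str:
    """Join statements into the text of a .sql file."""
    return "\n".join(statements) + "\n"


if __name__ == "__main__":
    # Hypothetical codes for a 'material' field.
    print(render_sql(lookup_table_sql("material", {"1": "Concrete", "3": "Steel"})))
```

The rendered text can be written to a file and loaded with `psql -f`, which keeps the load step repeatable when a new year of data comes out.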
Django and PostgreSQL are a popular pairing, so I used the Django REST Framework to create my API endpoints. I’ll admit that this is probably the piece of the project that I’m least happy with (and most likely to refactor in the future; my friend keeps trying to convince me to do it in Haskell for extra bonus points). I discovered a serious performance hit when passing data through Django: what I’d originally thought was lousy database performance from a lack of indices in Postgres was actually a delay of several seconds in Django. I ended up ripping out the serializers I’d written and mostly passing the data on as CSV through a StreamingHttpResponse.
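The streaming approach can be sketched roughly as follows. This follows the general "pseudo-buffer" pattern for streaming large CSV files in Django rather than reproducing the site's actual view; the generator name and the example columns are made up. The generator itself only needs the standard library, and the comment at the bottom shows how it would plug into a view.

```python
import csv
from typing import Iterable, Iterator, Sequence


class Echo:
    """A file-like object whose write() returns the value instead of
    buffering it, so each CSV line can be yielded as soon as it is built."""

    def write(self, value: str) -> str:
        return value


def stream_csv_rows(
    header: Sequence[str], rows: Iterable[Sequence[object]]
) -> Iterator[str]:
    """Yield CSV text one line at a time without holding the whole
    result set in memory."""
    writer = csv.writer(Echo())
    # writerow() returns whatever Echo.write() returned: the formatted line.
    yield writer.writerow(header)
    for row in rows:
        yield writer.writerow(row)


# In a Django view this would look something like (assuming `rows` is an
# iterator over database results, e.g. a values_list() queryset iterator):
#
#     from django.http import StreamingHttpResponse
#     return StreamingHttpResponse(
#         stream_csv_rows(["id", "year_built"], rows),
#         content_type="text/csv",
#     )
```

Skipping the serializers means each row goes from the database cursor to the response with very little per-row Python work in between.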
Sending 500k data points to the browser is a bad idea. Rather than rewriting some of D3’s aggregation functions, I created an Express server to postprocess the streaming CSV data into the right shape for D3 (and seriously cut down on the size; no one wants a 15 MB JSON). Originally, alongside Express, I added a Redis cache at this level, but I ended up removing it in the final product and using NGINX’s caching capabilities instead.
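The real post-processing runs in Express and leans on D3's own aggregation functions. Purely to illustrate the idea of collapsing many rows into a small summary before it reaches the browser, here is a stand-in sketch in Python. The column names (`state`, `rating`) and the output shape are assumptions for the example, not the site's actual schema.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def aggregate_by_key(
    csv_lines: Iterable[str], key: str, value: str
) -> List[Dict[str, object]]:
    """Group CSV rows by `key` and return the count and mean of `value`
    for each group. Rows with a non-numeric value are skipped."""
    totals = defaultdict(lambda: [0, 0.0])  # group -> [count, running sum]
    for row in csv.DictReader(csv_lines):
        try:
            number = float(row[value])
        except (TypeError, ValueError):
            continue  # e.g. a placeholder code instead of a numeric rating
        bucket = totals[row[key]]
        bucket[0] += 1
        bucket[1] += number
    return [
        {"key": group, "count": count, "mean": total / count}
        for group, (count, total) in sorted(totals.items())
    ]


if __name__ == "__main__":
    lines = ["state,rating", "MD,6", "MD,8", "VA,5", "VA,N"]
    print(aggregate_by_key(lines, "state", "rating"))
```

A few hundred summary objects like these are far cheaper to ship and draw than the raw rows they were computed from.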
I didn’t know what NGINX was before starting this project, and I found it very difficult to play around with on the DigitalOcean droplet I was using. Serendipitously, when I reached this step of the project, Julia Evans had just released a new tool for an NGINX sandbox website. I played around with this and read the docs to create my NGINX config. Configuring NGINX was probably the most “flaily” part of the project, but little by little I was able to build up a file to reverse proxy the backend, cache API calls, and properly redirect domains to avoid CORS errors.
On the advice of a friend, I chose Preact as the front-end framework for the web application. I’ve built React apps before, and the transition to Preact was pretty seamless. I’m definitely not a front-end developer, so I relied on Material UI to give me a set of components I could easily use for the site.
Before I started developing this project into a proper web application, I prototyped several of the visualizations on Observable. Observable is an excellent platform for learning how to use D3. I particularly like its dependency graphs, and of course, the reactive nature of the platform.
I used docker-compose to create a multi-container application based on the pieces above. There was a lot of trial and error coming up with the proper development and production files, but once it finally ‘clicked’, it was super handy for getting everything up and running on a server.
I also read these books and used these resources during my batch to gain a bigger-picture understanding of data visualization and web applications.
[^1]: You may have digital access to these books through your local library and O’Reilly Complete Public Library