The development process is broken down into four main sections. Each section represents a different portion of the main website.
Interpreting the structure of the data, learning pandas to clean data, writing a script to clean the data
Figuring out which portion of the data to visualize, designing the visualization
Designing elements for the website, creating a script to round off pixel colors, change image colors, and make images transparent
Designing the website layout and functionality, learning HTML/CSS/JS, implementing HTML/CSS/JS
Special thanks to Wei Jun Tan for cleaning up my code and adding mobile support
The original data set provided by USGS.gov was far from ideal. It had over 250 columns, some related and some unrelated, and building a visualization on that many columns would have been time-consuming and error-prone. To make matters worse, much of the data was incomplete or inconsistently entered, with varying data types. Therefore, the first reasonable step of the project was to clean the data, both to make it possible to draw reasonable insights from it and to simplify the process of creating the visualization. Cleaning the data was split into two main parts: interpreting and organizing the structure of the data, and writing scripts to merge relevant data and remove inconsistencies.
Most of the time here was spent figuring out which columns to merge and why. The data set already included pre-aggregated columns, but they were inconsistent because their sub-columns did not always sum to the aggregate value. Based on this, the columns were organized into parent-child relationships, with the aggregate columns as parents. Because the existing aggregate columns could not be trusted, new aggregate columns were generated by summing their sub-columns.
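The parent-child reconciliation described above can be sketched in plain Python. This is a minimal illustration, not the project's actual script: the column names and hierarchy are hypothetical stand-ins for the real USGS column codes, and `leaf_columns`, `recomputed_total`, and `inconsistent_aggregates` are hypothetical helpers. Each aggregate is rebuilt from its bottom-level sub-columns and compared with the value already stored in the file.

```python
# Hypothetical column hierarchy: each aggregate (parent) column maps to its
# sub-columns. Names are illustrative, not actual USGS column codes.
HIERARCHY: dict[str, list[str]] = {
    "total_withdrawals": ["public_supply", "irrigation"],
    "irrigation": ["irrigation_crop", "irrigation_golf"],
}


def leaf_columns(parent: str, hierarchy: dict[str, list[str]]) -> list[str]:
    """Return the bottom-level columns under `parent` (itself if it has no children)."""
    children = hierarchy.get(parent)
    if not children:
        return [parent]
    leaves: list[str] = []
    for child in children:
        leaves.extend(leaf_columns(child, hierarchy))
    return leaves


def recomputed_total(row: dict[str, float], parent: str,
                     hierarchy: dict[str, list[str]]) -> float:
    """Rebuild an aggregate by summing its leaf columns instead of trusting the file."""
    return sum(row[col] for col in leaf_columns(parent, hierarchy))


def inconsistent_aggregates(row: dict[str, float],
                            hierarchy: dict[str, list[str]],
                            tol: float = 1e-9) -> list[str]:
    """List parents whose pre-aggregated value disagrees with the sum of their leaves."""
    return [
        parent for parent in hierarchy
        if abs(row[parent] - recomputed_total(row, parent, hierarchy)) > tol
    ]


# One hypothetical county-year row with inconsistent pre-aggregated values.
row = {
    "public_supply": 120.0,
    "irrigation_crop": 300.0,
    "irrigation_golf": 5.0,
    "irrigation": 290.0,         # leaves sum to 305.0
    "total_withdrawals": 410.0,  # leaves sum to 425.0
}
print(inconsistent_aggregates(row, HIERARCHY))  # ['total_withdrawals', 'irrigation']
print(recomputed_total(row, "total_withdrawals", HIERARCHY))  # 425.0
```

Summing only the leaves, rather than intermediate aggregates, keeps one bad intermediate value (here `irrigation`) from propagating into its parent.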
The next step was to shorten the column names, convert the given TSV file to CSV, and remove unnecessary text from the file. The data was then loaded into a pandas DataFrame for cleaning. New aggregate columns were generated from the child columns of the parent-child structure determined earlier, and columns with no data were dropped. This removed over 200 columns. In the end, about 70 aggregate columns remained, each categorized as consumptive water use, withdrawals, or demographic data and subcategorized by industry.
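The cleaning step might look roughly like the following pandas sketch. The TSV excerpt, the column names, the `AGGREGATES` mapping, and the `clean` function are all hypothetical; the real file had far more columns plus extra text that had to be stripped first. Using `pd.to_numeric(errors="coerce")` for inconsistently typed entries is an assumption about one reasonable approach, not necessarily what the original script did.

```python
import io

import pandas as pd

# Hypothetical excerpt of the tab-separated source (the real file had ~250 columns).
RAW_TSV = (
    "county\tyear\tirr_crop\tirr_golf\tps_dom\tunused\n"
    "Alameda\t1985\t10.0\t1.0\t50.0\t0\n"
    "Alameda\t1990\t12.0\t-\t55.0\t0\n"
)

ID_COLUMNS = ["county", "year"]

# Hypothetical mapping from each new aggregate column to its child columns.
AGGREGATES = {"irrigation_total": ["irr_crop", "irr_golf"]}


def clean(tsv_text: str) -> pd.DataFrame:
    df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
    measures = [c for c in df.columns if c not in ID_COLUMNS]

    # Coerce inconsistently typed entries (e.g. "-") to NaN.
    df[measures] = df[measures].apply(pd.to_numeric, errors="coerce")

    # Drop columns holding nothing but zeros or blanks.
    empty = [c for c in measures if df[c].fillna(0).eq(0).all()]
    df = df.drop(columns=empty)

    # Rebuild each aggregate from its children, then drop the children.
    for name, children in AGGREGATES.items():
        present = [c for c in children if c in df.columns]
        df[name] = df[present].sum(axis=1)  # NaN children are skipped (treated as 0)
        df = df.drop(columns=present)
    return df


cleaned = clean(RAW_TSV)
csv_text = cleaned.to_csv(index=False)  # emit CSV instead of TSV
print(csv_text)
```

Dropping the child columns after aggregation is what shrinks the column count; whether the original project kept any child columns alongside the aggregates is not stated.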
For the visualization, we opted for an embedded Tableau visualization because most of the team had experience with Tableau. We were willing to lose some expressiveness in favor of ease of use, since some of the team was unfamiliar with web development; learning a new visualization language on top of web development and the data cleaning tools would likely have been overwhelming. From there, we designed a visualization that we thought struck a balance between the amount of interaction and the number of visualization elements.
The hardest part of the visualization was making each element react to filters and selections from other elements. There were many dependencies to keep track of to ensure a fluid visualization.
The most time-consuming bug was placing a reactive, stationary percentage in the donut charts. It was difficult to know which years were selected because the years are repeated across the rows for every county. The solution was to treat the years as a sum of separate terms, each a scalar multiple of one selected year.
Example: Selected(1985, 1995, 2015) -> Count(Year) * 1985 + Count(Year) * 1995 + Count(Year) * 2015
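The arithmetic behind this decomposition can be illustrated in plain Python, assuming every selected year appears exactly once per county. Summing the year field over all rows equals the sum of `Count(Year) * Year` terms, and dividing by the number of counties recovers the sum of the distinct selected years. The `decompose_year_sum` helper and the sample counties are hypothetical; this is not the project's Tableau calculation, only an illustration of the identity.

```python
from collections import Counter


def decompose_year_sum(years: list[int]) -> tuple[dict[int, int], int]:
    """Split SUM(Year) over repeated rows into Count(Year) * Year terms."""
    counts = Counter(years)
    terms = {year: counts[year] * year for year in sorted(counts)}
    return terms, sum(terms.values())


# Hypothetical selection: three counties, each with one row per selected year.
counties = ["Alameda", "Fresno", "Kern"]
selected = [1985, 1995, 2015]
rows = [(county, year) for county in counties for year in selected]

terms, total = decompose_year_sum([year for _, year in rows])
print(terms)  # {1985: 5955, 1995: 5985, 2015: 6045}
print(total)  # 17985, the same as summing the Year column row by row
print(total // len(counties))  # 5995 == 1985 + 1995 + 2015
```

The plain sum alone does not identify a selection uniquely (1985 + 2015 and 1995 + 2005 both equal 4000), so any calculation built on it has to keep the per-year terms separate rather than collapsing them into one number.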
To match the design and look of the website, some elements had to be created and processed to fit the webpage.
The images created for the website needed either to be meaningful or to serve a functional purpose. For example, arrows guide the user through the website, while the silhouetted lake reflection represents the different sectors (residential, irrigation, energy, city) and their consumptive use.
Python offers a library for basic image processing that was convenient for this part of the project. Image pixels were converted to RGBA tuples and then rounded off to a single color, made transparent, or changed to another color.
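The three pixel operations can be sketched as plain functions over RGBA tuples. The source does not name the library. If it was Pillow, tuples like these come from `Image.convert("RGBA").getdata()` and can be written back with `putdata()`, but that pairing is an assumption. The palette, the light-blue replacement color, the tolerance, and the function names below are all hypothetical.

```python
Pixel = tuple[int, int, int, int]  # (red, green, blue, alpha), each 0-255
RGB = tuple[int, int, int]


def round_to_palette(pixel: Pixel, palette: list[RGB]) -> Pixel:
    """Snap a pixel to the nearest palette color (squared RGB distance), keeping alpha."""
    r, g, b, a = pixel
    nearest = min(palette, key=lambda c: (c[0] - r) ** 2 + (c[1] - g) ** 2 + (c[2] - b) ** 2)
    return (*nearest, a)


def make_transparent(pixel: Pixel, target: RGB, tolerance: int = 0) -> Pixel:
    """Set alpha to 0 when every RGB channel is within `tolerance` of `target`."""
    if all(abs(channel - t) <= tolerance for channel, t in zip(pixel[:3], target)):
        return (*pixel[:3], 0)
    return pixel


def recolor(pixel: Pixel, source: RGB, replacement: RGB) -> Pixel:
    """Swap an exact RGB color for another, keeping alpha."""
    if pixel[:3] == source:
        return (*replacement, pixel[3])
    return pixel


# Hypothetical pixels: a near-white background and a near-black silhouette.
pixels = [(250, 252, 248, 255), (12, 10, 8, 255)]
palette = [(255, 255, 255), (0, 0, 0)]

rounded = [round_to_palette(p, palette) for p in pixels]
recolored = [recolor(p, (0, 0, 0), (173, 216, 230)) for p in rounded]  # hypothetical light blue
final = [make_transparent(p, (255, 255, 255)) for p in recolored]
print(final)  # [(255, 255, 255, 0), (173, 216, 230, 255)]
```

Rounding first makes the later steps reliable: once anti-aliased near-white pixels are snapped to pure white, an exact-match transparency check catches all of them.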
One of the primary concerns in developing the frontend of the website was balancing a visually appealing layout, a concise and accurate display of information, and engaging elements. Secondary concerns included mobile support and a responsive design across all resolutions. Animations were added to capture audience attention while arrow prompts were used to actively engage the audience.
A large portion of the development time went into experimenting with how to use the frontend languages in general. As a result, much of the code was uncommented, unorganized, and suboptimal. Afterwards, a fair amount of time was spent restructuring the code and making style changes so that it met web style guidelines to a reasonable degree.
The most difficult part of frontend development was creating a layout that responds to the resolution of the device. The website needed to be reasonably readable and interactive on any display. This was achieved by scaling and positioning elements with ratios rather than fixed values.
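The ratio-based approach can be illustrated independently of HTML/CSS. In this sketch, each element's position and size are stored as fractions of the viewport and converted to pixels for whatever resolution is current. In CSS the same idea is usually expressed with percentages or `vw`/`vh` units. The element names, fractions, and `to_pixels` helper are hypothetical and are not the site's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float
    y: float
    width: float
    height: float


# Hypothetical layout, expressed as fractions of viewport width and height.
LAYOUT = {
    "title": Box(x=0.05, y=0.02, width=0.60, height=0.08),
    "map": Box(x=0.05, y=0.12, width=0.55, height=0.80),
    "dot_chart": Box(x=0.63, y=0.12, width=0.32, height=0.55),
}


def to_pixels(layout: dict[str, Box], viewport_w: int, viewport_h: int) -> dict[str, Box]:
    """Scale every fractional box to the current viewport so proportions stay fixed."""
    return {
        name: Box(
            x=box.x * viewport_w,
            y=box.y * viewport_h,
            width=box.width * viewport_w,
            height=box.height * viewport_h,
        )
        for name, box in layout.items()
    }


desktop = to_pixels(LAYOUT, 1920, 1080)
phone = to_pixels(LAYOUT, 390, 844)
print(desktop["map"])
print(phone["map"])
```

Because every box is a fraction of the viewport, the relative sizes and left-to-right order of the elements are preserved at any resolution, although text and very small screens may still need separate handling.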
The visualization was primarily designed to guide the user in comparing how different areas of California used water.
This was made possible through a visualization that uses the elements and design choices described below.
Visualization elements were sized based on importance and focus priority. The largest element is the map because it is the main interaction piece. The second-largest element, the dot chart, serves as a complementary interaction. The donut charts are not interactive and only display information that adjusts to other selections, so they are the smallest chart elements.
Elements were also positioned by importance, which decreases from left to right first and from top to bottom second. As a result, the map and title take priority, while the information disclaimer and donut charts are secondary.
The map shows each county's area and position relative to other counties. This enables a highly visual form of selection that makes comparisons between areas easy.
Since consumptive use is reported in Mgal/day in five-year windows, the stacked dot plot lets the user compare discrete values of consumptive use between years and select counties with similar consumptive use for comparison.
The donut chart allows selected counties to be compared with California as a whole. It acts as a fixed reference point that conveys both demographic and measure data.
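The percentage in a donut chart amounts to the selected counties' share of the statewide total. A minimal sketch follows, assuming per-county totals for the currently selected years are already available. The county values and the `donut_percentage` name are hypothetical, and only three counties stand in for the state, whereas the real total would cover all 58 California counties.

```python
def donut_percentage(values_by_county: dict[str, float], selected: list[str]) -> float:
    """Percent of the statewide total contributed by the selected counties."""
    state_total = sum(values_by_county.values())
    if state_total == 0:
        return 0.0  # avoid dividing by zero when nothing is recorded
    selected_total = sum(values_by_county[county] for county in selected)
    return 100.0 * selected_total / state_total


# Hypothetical consumptive use (Mgal/d) and population for three counties.
consumptive_use = {"Alameda": 30.0, "Fresno": 50.0, "Kern": 20.0}
population = {"Alameda": 1_600_000, "Fresno": 900_000, "Kern": 800_000}

print(donut_percentage(consumptive_use, ["Alameda", "Kern"]))  # 50.0
print(donut_percentage(population, ["Alameda", "Kern"]))
```

The same function serves both the measure donut and the demographic donut; only the input dictionary changes.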
The year checkbox filter allows the user to compare subsets of years.
The map select is the main selection function of the visualization and provides the filters for the entire sheet. The selection tools include box select, free-form select, circle select, and single select on the map, each of which lets the user highlight specific areas of the map.
The chart select is mainly used to zoom in on closely related data or to highlight specific parts of previous selections.
Color was chosen primarily for its fit with the project's theme, visual aesthetics, and contrast. A light blue was used for most elements to symbolize water, while a grayish black was chosen as a soothing complement to the visualization elements. Bright red was used for highlighted warnings because it is widely associated with danger or alerts. Dim white text contrasts with the background while minimizing the clash between white and black.
Color blindness was the next color consideration. Most of Tableau’s color palettes are reasonably color-blind friendly, so color encoding for color blindness was not prioritized in the visualization.
Labels were placed on elements that needed additional context. For example, dots are labeled by county, the donut chart fill is labeled by percentage, the total is labeled by the main body (California), a red label explains why some five-year chunks of data are missing, definition labels explain uncommon terms (“Consumptive Use”), and hover labels show specific data values.
For the axes of the stacked dot chart, counties are plotted on the y-axis and consumptive use in Mgal/d on the x-axis. This was mainly for space efficiency, as the opposite orientation would likely not fit on the page. X grid lines are dotted to keep the dots visible, while Y grid lines are disabled to reduce clutter. Grid lines are spaced on a log scale to separate close data values.
Map shapes match the county shapes. Chart dots are hollow so that overlapping dots with close values remain visible.
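Why a log scale separates close values can be shown with a short calculation. This sketch places values on a linear and a log10 axis of the same pixel length and generates gridlines at powers of ten. The axis range, the 500 px length, and the helper names are hypothetical, not taken from the Tableau sheet.

```python
import math


def log_position(value: float, lo: float, hi: float, length: float) -> float:
    """Pixel offset of a positive value on a log10 axis spanning [lo, hi]."""
    span = math.log10(hi) - math.log10(lo)
    return length * (math.log10(value) - math.log10(lo)) / span


def linear_position(value: float, lo: float, hi: float, length: float) -> float:
    """Pixel offset of a value on a linear axis spanning [lo, hi], for comparison."""
    return length * (value - lo) / (hi - lo)


def log_gridlines(lo: float, hi: float) -> list[float]:
    """Gridline values at each power of ten inside [lo, hi]."""
    first = math.ceil(math.log10(lo))
    last = math.floor(math.log10(hi))
    return [10.0 ** k for k in range(first, last + 1)]


# Hypothetical axis: 0.5 to 5000 Mgal/d drawn over 500 px.
lo, hi, length = 0.5, 5000.0, 500.0
print(log_gridlines(lo, hi))  # [1.0, 10.0, 100.0, 1000.0]
# Two small, close values: about 0.2 px apart linearly, about 38 px apart on the log axis.
print(linear_position(4, lo, hi, length) - linear_position(2, lo, hi, length))
print(log_position(4, lo, hi, length) - log_position(2, lo, hi, length))
```

One trade-off of a log axis is that it only accepts strictly positive values, so counties reporting zero consumptive use cannot be placed on it directly.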