In my previous post Correlating Rainfall and Sump Pump Activity in Power BI, I used data from the National Oceanic and Atmospheric Administration (NOAA) to analyze rainfall near my house.
To do so, I made GET requests to multiple NOAA endpoints to fetch:
- NOAA weather stations with data in my zip code
- Historical daily, monthly, and annual rainfall for those stations
In this post, I will explain how I use Power Query within Power BI to fetch and transform this data prior to presenting it in visualizations.
NOAA Data and Power BI Basics
NOAA’s Climate Data Online (CDO) group offers an impressive range of data, which can be downloaded, requested via email, and called via API depending on the dataset.
If you are pulling data once or infrequently, their search and email service is quick and easy.
But if you want to pull data more frequently and automate any transformations, the API is the way to go. For my needs, I hit both v1 and v2 of their API. Specifically, I used:
- API v2 to fetch stations for a given zip code
- Note: The v2 endpoints require an access token in your request header, which can be requested here.
- API v1 to fetch daily, monthly, and annual rainfall totals for specific stations
- v1 endpoints do not require an access token
At the end of this post I’ll link to my Power BI file, if you want to inspect or modify it for your own use. Of course, the API v2 call will not work “out of the box” as you’ll need to provide your own access token.
If you haven’t used it before, Power BI Desktop (which includes Power Query) is made by Microsoft and free to download here.
Fetching Weather Stations for a Zip Code
From Power BI, you’ll want to select Transform Data and then New Source > From Web.
Then choose Advanced and enter the URL as directed per the API docs.
You’ll also need to enter the key “token” as a request header parameter and provide your NOAA-provided access token as the value.
Hit OK and Power Query will return a data table like the one below.
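For readers who like to see the request itself, here is roughly the same call sketched in Python with only the standard library. The base URL and the ZIP location format follow NOAA’s CDO v2 documentation; the zip code and token values are placeholders, not mine.

```python
import urllib.request

# NOAA CDO API v2: list stations with data for a given zip code.
# The access token goes in a "token" request header, just as in Power Query.
TOKEN = "YOUR_NOAA_TOKEN"  # placeholder -- request your own token from NOAA

# "60148" is an example zip code, not the one used in the post.
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/stations?locationid=ZIP:60148"
req = urllib.request.Request(url, headers={"token": TOKEN})

# Uncomment to actually call the API (requires a valid token):
# import json
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     for station in body["results"]:
#         print(station["id"], station["name"])
```

Power Query is doing the equivalent of this under the hood when you fill in the From Web dialog.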
Boom! We are in business. At this point, I’d suggest right-clicking the query name in the left pane and choosing Advanced Editor.
If you are unfamiliar, you are looking at Power Query M, and it’s pretty impressive what Microsoft is giving us for free. Specifically, Power Query has:
- Determined the response is formatted as JSON
- Recognized the two objects in the response, “metadata” and “results”
- Parsed both objects to populate column headers and associated values
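To make those three steps concrete, here is what that automatic parsing amounts to, sketched in Python against a miniature, made-up response in the documented v2 shape (a “metadata” object plus a “results” array — the station IDs and values below are invented for illustration):

```python
import json

# A miniature, fabricated response matching the CDO v2 shape:
# a "metadata" object (paging info) plus a "results" array.
sample = """
{
  "metadata": {"resultset": {"offset": 1, "count": 2, "limit": 25}},
  "results": [
    {"id": "GHCND:US1AAAA0001", "name": "EXAMPLE STATION A", "datacoverage": 0.95},
    {"id": "GHCND:US1AAAA0002", "name": "EXAMPLE STATION B", "datacoverage": 0.88}
  ]
}
"""

body = json.loads(sample)

# Column headers come from the keys of the first result record...
columns = list(body["results"][0].keys())
# ...and each result record supplies one row of values.
rows = [[rec[c] for c in columns] for rec in body["results"]]

print(columns)   # ['id', 'name', 'datacoverage']
print(rows[0])   # ['GHCND:US1AAAA0001', 'EXAMPLE STATION A', 0.95]
```

Power Query generates the M equivalent of this for you the moment it detects a JSON response.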
If you close the Advanced Editor and look to the right of the data in Power Query, you’ll see the generated Query Settings, which are basically a summary of each step in the Advanced Editor.
Take note of the “settings” gears. Power Query gives us a lot of options to modify or extend this logic through a controlled interface or directly in the Advanced Editor.
If you do inspect my working file, you’ll see in the query titled “getStationsForZip” that I went on to make some extensions and modifications, such as:
- Parameterizing query values like zip code and access token
- Renaming the query and some columns
- Adding custom columns to the data set
- Changing the data type of certain columns
To keep this post somewhat manageable, I won’t walk through these changes step by step here, though I may in a future post.
In my opinion, Power Query makes such transformations trivial. Arguably, the “hardest” part is parameterizing query values, but if you have any familiarity with string concatenation, it’s a snap.
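The concatenation idea (in M you splice strings together with the & operator) looks like this in Python. The function name and parameter names here are my own, not from the working file:

```python
# Parameterizing the query: the URL is just a base plus spliced-in
# parameter values, with the token supplied as a header.
def build_stations_request(zip_code: str, token: str) -> tuple[str, dict]:
    base = "https://www.ncdc.noaa.gov/cdo-web/api/v2/stations"
    url = f"{base}?locationid=ZIP:{zip_code}"
    headers = {"token": token}
    return url, headers

# Example values; swap in your own zip code and NOAA token.
url, headers = build_stations_request("60148", "YOUR_NOAA_TOKEN")
print(url)  # https://www.ncdc.noaa.gov/cdo-web/api/v2/stations?locationid=ZIP:60148
```

Change the inputs, refresh, and the whole downstream query reruns against the new criteria — that’s the entire payoff of parameterization.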
Fetching Daily Historical Rain Data for Stations
As mentioned above, this GET request will go to NOAA’s API v1, which does not require an access token. Tax dollars at work! You can anonymously hammer this endpoint all day long!
Joking aside, we use the same steps as before: select Transform Data and then New Source > From Web. It’s even easier this time; since there’s no token, there’s no need to choose Advanced.
We just provide our URL according to the documentation.
After clicking OK, you’ll see an interim interface that just asks you to confirm the data structure.
This happens because the endpoint returns its response in CSV format if not otherwise specified. But no worries, Power Query handles all this with ease. Click OK again and you’ve got daily rain data.
If you are so inclined, the endpoint documentation explains how to request other formats.
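Here is the shape of that request and response, sketched in Python. I’m assuming the v1 service’s documented query parameters (dataset, dataTypes, stations, startDate, endDate); the station IDs are placeholders, and the CSV snippet is made up to mirror the shape the endpoint returns:

```python
import csv
import io
from urllib.parse import urlencode

# The v1 service takes its criteria as query parameters and returns CSV
# unless another format is requested. Station IDs below are placeholders.
params = {
    "dataset": "daily-summaries",
    "dataTypes": "PRCP",
    "stations": "USC00000001,USC00000002",  # placeholder station IDs
    "startDate": "2020-01-01",
    "endDate": "2020-12-31",
    "units": "standard",
}
url = "https://www.ncei.noaa.gov/access/services/data/v1?" + urlencode(params)

# A fabricated CSV snippet mimicking the response shape:
sample_csv = """DATE,STATION,PRCP
2020-01-01,USC00000001,0.25
2020-01-01,USC00000002,0.31
2020-01-02,USC00000001,0.00
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(rows[0]["PRCP"])  # 0.25
```

That interim confirmation screen in Power Query is just it applying the equivalent of the DictReader step: promoting the first CSV row to column headers.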
Again, if you go into my working file, you’ll see some changes like parameterization of the query criteria in “getPrecipDailySummariesForStations”.
Additionally, the astute reader will notice I’ve applied a Group By (grouping aggregation) to keep only the max precipitation recorded on a given day. I pulled data for two stations to extend the historical record, so on days where both stations report rainfall, I conservatively keep the higher of the two.
Again, I won’t go step-by-step on how to do these transformations here but I promise Power Query makes it stupid easy.
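The Group By logic boils down to: for each date, keep the maximum PRCP across stations. A plain-Python sketch with made-up readings:

```python
# Keep only the highest precipitation reading per day across stations,
# mirroring the Group By (max) step in the query. Readings are made up.
readings = [
    ("2020-04-01", "STATION_A", 0.12),
    ("2020-04-01", "STATION_B", 0.30),  # both stations reported; keep 0.30
    ("2020-04-02", "STATION_A", 0.05),  # only one station reported that day
]

max_by_day: dict[str, float] = {}
for date, _station, prcp in readings:
    max_by_day[date] = max(prcp, max_by_day.get(date, 0.0))

print(max_by_day)  # {'2020-04-01': 0.3, '2020-04-02': 0.05}
```

In Power Query this is one Group By step in the UI: group on the date column, aggregate PRCP with Max.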
Fetching Monthly and Annual Historical Rain Data
With the daily data, you can of course calculate monthly and annual values on your own. But if you want to leave that processing up to the taxpayers, it’s just like fetching the dailies with the below.
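In other words, only the dataset name changes between the daily, monthly, and annual pulls. Sketched in Python — I believe the monthly and annual dataset names are “global-summary-of-the-month” and “global-summary-of-the-year”, but verify against the v1 documentation before relying on them:

```python
from urllib.parse import urlencode

BASE = "https://www.ncei.noaa.gov/access/services/data/v1"

def precip_url(dataset: str, stations: str, start: str, end: str) -> str:
    """Build a v1 query URL; only the dataset name differs between
    the daily, monthly, and annual pulls."""
    return BASE + "?" + urlencode({
        "dataset": dataset,
        "dataTypes": "PRCP",
        "stations": stations,   # placeholder station ID below
        "startDate": start,
        "endDate": end,
    })

monthly = precip_url("global-summary-of-the-month", "USC00000001", "2020-01-01", "2020-12-31")
annual = precip_url("global-summary-of-the-year", "USC00000001", "2015-01-01", "2020-12-31")
```

In Power Query, duplicating the daily query and swapping the dataset parameter gets you the monthly and annual queries in seconds.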
Thanks for reading. In future posts I will:
- Explain how to put this data to work and build the visualizations shown in Correlating Rainfall and Sump Pump Activity in Power BI
- Explain how to fetch data from the same endpoints using Python and Tableau Prep