Text Mining the NZ Road Network with R
What are the most common words in New Zealand road names? Are there any common themes?
Thankfully, New Zealand’s 73,906 current road names have been made available through the LINZ Data Service. To answer the questions above, we can use R’s tm package to conduct basic text mining.
The process is simple*. The text is stripped of punctuation, extra whitespace, and redundant or uninteresting words before being fed into wordcloud(), which displays the 60 most common words with size proportional to frequency of occurrence.
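The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script behind the word cloud: the `road_names` vector is a tiny made-up sample standing in for the full LINZ table, and the `street_types` list is an assumed stand-in for the "uninteresting" words (the post does not say exactly which words were removed).

```r
library(tm)
library(wordcloud)

# Hypothetical sample; the real input would be the road name column
# of the LINZ download.
road_names <- c("King Street", "Queen Street", "Kowhai Road", "King Road",
                "Totara Avenue", "Victoria  Street", "Kowhai Place")

# Assumed list of uninteresting words (street types), in lower case.
street_types <- c("street", "road", "avenue", "place")

# Build a corpus with one document per road name, then clean it.
corpus <- VCorpus(VectorSource(road_names))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, c(stopwords("english"), street_types))
corpus <- tm_map(corpus, stripWhitespace)

# Count how often each remaining word occurs across all road names.
# slam::row_sums works on the sparse matrix directly, which avoids
# building a large dense matrix for the full dataset.
tdm <- TermDocumentMatrix(corpus)
word_freq <- sort(slam::row_sums(tdm), decreasing = TRUE)

# Plot at most 60 words, sized by frequency.
wordcloud(names(word_freq), word_freq, max.words = 60, min.freq = 1,
          random.order = FALSE)
```

Note that `TermDocumentMatrix()` drops words shorter than three characters by default, which is harmless here but worth knowing if very short names matter.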
Can we see any common themes? Yes, namely:
1. Royalty and famous Britons: George, King, Victoria, Queen, Elizabeth, Albert, Nelson.
2. Early New Zealanders: Campbell, Russell, Grey, Scott.
3. Native trees: Kowhai, Totara, Rata, Rimu, Matai, Kauri, Miro.
4. Not-so-native trees: Pine, Oak.
5. Native birds: Tui, Huia, Kiwi.
*This blog post from deltaDNA served as a guide.
References:
https://deltadna.com/blog/text-mining-in-r-for-term-frequency/
https://cran.r-project.org/web/packages/tm/index.html
https://cran.r-project.org/web/packages/wordcloud/index.html
Landonline: Road Name. Source: LINZ/Full Landonline Dataset: https://data.linz.govt.nz/table/2024-landonline-road-name/
ASP: Street Type. Source: LINZ/Electoral: https://data.linz.govt.nz/table/1210-asp-street-type/