Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data
The data used in this article is available here and is licensed under CC BY 4.0.
Strategy
The strategy is to read the JSON file with the fromJSON function of the jsonlite package, which returns the data as a list of lists. Read the individual lists and, with the help of the rapply and unique functions, extract the label values. Repeat this for every dimension needed to form the data frame.
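As a minimal sketch of the label extraction step, the snippet below applies rapply and unique to a small hand-made list. The list `labels` and its values are hypothetical stand-ins for one `category$label` element of the parsed JSON. They are not taken from the real file, and the duplicate entry is there only to show what unique does.

```r
# Hypothetical stand-in for djson$dataset$dimension$<name>$category$label
labels <- list(
  a = "Any type of cooperation",
  b = "Cooperation from competitors",
  c = "Cooperation from competitors"
)

# rapply() visits every leaf of the list; head(x, 1) keeps its first element.
# With the default how = "unlist", the result is a character vector.
flat <- rapply(labels, function(x) head(x, 1))

# unique() removes repeated labels
label_values <- unique(flat)
print(label_values)
```

The same call works on deeper nesting because rapply recurses into sub-lists before applying the function.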
The value section of the JSON file is returned as a single numeric vector. Read it in steps of three: start at the first, second, and third positions in turn and take every third element from there, assigning each result to a new variable. This yields three variables, one per NACE Rev 2 sector. Use the same logic to create two more variables, one for the year and another for the statistic.
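The step-of-three indexing can be sketched on a toy vector. The vector `values` below is made up for illustration and assumes two observations with three series each, stored one observation after another. The matrix call at the end is an alternative way of getting the same columns, not something the original code uses.

```r
# Hypothetical flat vector: 2 observations x 3 series, stored observation by observation
values <- c(11, 12, 13, 21, 22, 23)

# Start at position 1, 2, or 3 and take every third element from there
first  <- values[seq(1, length(values), by = 3)]
second <- values[seq(2, length(values), by = 3)]
third  <- values[seq(3, length(values), by = 3)]

print(first)   # 11 21
print(second)  # 12 22
print(third)   # 13 23

# Alternative: fill a 3-column matrix row by row; each column matches one vector above
m <- matrix(values, ncol = 3, byrow = TRUE)
print(m)
```

Reshaping with matrix avoids repeating the seq call once per column, which may be easier to maintain if the number of series changes.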
Code
Here is the working copy of the code for your scrutiny. Please comment if you have a better, more optimized way of handling this data. If you are interested, a copy of this code is also available in the GitHub repository.
################################################################################
## www.dataenq.com
## Reading a JSON file and preparing data for analysis
################################################################################

# Using jsonlite to read the .json file
library(jsonlite)

# Using the fromJSON function from the jsonlite package to read the file
djson <- fromJSON("https://statbank.cso.ie/StatbankServices/StatbankServices.svc/jsonservice/responseinstance/CIS78")

# Preparing the data frame from the list of lists djson created above
# Reading individual lists and preparing columns
df <- data.frame(
  # Reading dimension Type of Cooperation Partner
  unique(rapply(djson$dataset$dimension$`Type of Cooperation Partner`$category$label,
                function(lst) head(lst, 1))),
  # Reading the first value and every third value after it, for each observation
  V2 = djson$dataset$value[seq(1, length(djson$dataset$value), 3)],
  # Reading the second value and every third value after it, for each observation
  V3 = djson$dataset$value[seq(2, length(djson$dataset$value), 3)],
  # Reading the third value and every third value after it, for each observation
  V4 = djson$dataset$value[seq(3, length(djson$dataset$value), 3)],
  # Column for the dimension called Year; it holds the same values as V2
  V5 = djson$dataset$value[seq(1, length(djson$dataset$value), 3)],
  # Column for the dimension called Statistic; it holds the same values as V3
  V6 = djson$dataset$value[seq(2, length(djson$dataset$value), 3)])

# Assigning column names from vectors to match the data presented on the site given below
# https://data.gov.ie/dataset/7b6c5d4c-955c-4eeb-a9d0-e35fb58bf200/resource/5a856b72-f470-4c71-ab1f-fbb0ef3b1e22#&r=Type%20of%20Cooperation%20Partner&c=NACE%20Rev%202%20Sector
colnames(df) <- c(
  djson$dataset$dimension$`Type of Cooperation Partner`$label,
  unique(rapply(djson$dataset$dimension$`NACE Rev 2 Sector`$category$label,
                function(lst) head(lst, 1))),
  unique(rapply(djson$dataset$dimension$Year$category$label,
                function(lst) head(lst, 1))),
  unique(rapply(djson$dataset$dimension$Statistic$category$label,
                function(lst) head(lst, 1))))

# Structure of the data frame
str(df)
## 'data.frame': 12 obs. of 6 variables:
## $ Type of Cooperation Partner : chr "Any type of cooperation" "Cooperation from clients and or customers" "Cooperation from competitors" "Cooperation other enterprises within own enterprise group" ...
## $ Industries (05 to 39) : num 54.7 34.9 21.5 29 30 45.8 43.8 27 15.9 44.6 ...
## $ Industries and selected services (05 to 39,46,49 to 53,58 to 63,64 to 66,71 to 73): num 50.8 32.9 20.2 27.4 25.8 40.1 38.7 23.3 17.3 41.9 ...
## $ Selected Services (46, 49-53, 58-63, 64-66, 71-73) : num 47.8 31.5 19.3 26.3 22.7 35.9 34.8 20.5 18.5 39.9 ...
## $ 2018 : num 54.7 34.9 21.5 29 30 45.8 43.8 27 15.9 44.6 ...
## $ Co-operation by Technological Innovative Enterprises (%) : num 50.8 32.9 20.2 27.4 25.8 40.1 38.7 23.3 17.3 41.9 ...

# Printing data frame
df
##    Type of Cooperation Partner
## 1  Any type of cooperation
## 2  Cooperation from clients and or customers
## 3  Cooperation from competitors
## 4  Cooperation other enterprises within own enterprise group
## 5  Cooperation from Universities and or third level institutions
## 6  Cooperation from suppliers of equipment, materials, components or software
## 7  Cooperation from consultants and or commercial laboratories or private research and development institutes
## 8  Cooperation from Government or public research institutes
## 9  Cooperation from public sector clients or customers
## 10 Cooperation from private business enterprises outside your enterprise group
## 11 Cooperation from other enterprises
## 12 Cooperation from non-profit organisations
##    Industries (05 to 39)
## 1  54.7
## 2  34.9
## 3  21.5
## 4  29.0
## 5  30.0
## 6  45.8
## 7  43.8
## 8  27.0
## 9  15.9
## 10 44.6
## 11 22.7
## 12 11.3
##    Industries and selected services (05 to 39,46,49 to 53,58 to 63,64 to 66,71 to 73)
## 1  50.8
## 2  32.9
## 3  20.2
## 4  27.4
## 5  25.8
## 6  40.1
## 7  38.7
## 8  23.3
## 9  17.3
## 10 41.9
## 11 21.5
## 12 12.5
##    Selected Services (46, 49-53, 58-63, 64-66, 71-73) 2018
## 1  47.8 54.7
## 2  31.5 34.9
## 3  19.3 21.5
## 4  26.3 29.0
## 5  22.7 30.0
## 6  35.9 45.8
## 7  34.8 43.8
## 8  20.5 27.0
## 9  18.5 15.9
## 10 39.9 44.6
## 11 20.6 22.7
## 12 13.4 11.3
##    Co-operation by Technological Innovative Enterprises (%)
## 1  50.8
## 2  32.9
## 3  20.2
## 4  27.4
## 5  25.8
## 6  40.1
## 7  38.7
## 8  23.3
## 9  17.3
## 10 41.9
## 11 21.5
## 12 12.5
I hope you liked this short article. Please help dataenq.com by commenting with your thoughts on it and by sharing it with your network. Thank you.