Getting to know Julia
I thought I’d try Julia out and see how far I could get with nothing but Google on my side.
I’ve had it installed for a while, but never really done anything with it. My aims for this exercise were:
- download some open data
- wrangle it, or at least do some sort of manipulation on it
- plot it
I’ve used the daily Covid-19 statistics provided by Public Health Scotland (see code for url).
These get updated daily, and published after 2pm, so there may be a short period during the day when they are not available whilst the update is in progress.
Getting started
After installation, the first thing to do is open a terminal (PowerShell if you’re a Windows user) and type julia to launch the REPL (the equivalent of the RStudio console, for the R users out there).
With that being done, I realised I needed to install some packages.
Julia has its own package manager, called Pkg. You can launch it from the REPL by hitting the ] key; the prompt then changes from julia> to pkg>.
Install a package with add:
    add DataFrames
Once you’ve installed packages, you need to do the equivalent of library in R, which in Julia is using:
    using Pkg
    using Dates
    using CSV
    using Downloads
    using DataFrames
    using Chain
    using VegaLite

    # Daily Covid-19 statistics from Public Health Scotland
    url = "https://www.opendata.nhs.scot/dataset/b318bddf-a4dc-4262-971f-0ba329e09b87/resource/427f9a25-db22-4014-a3bc-893b68243055/download/trend_ca_20220301.csv"

    # Download the file and read it, treating "NA" as missing
    file = CSV.File(Downloads.download(url), missingstring = "NA", dateformat = "yyyymmdd")
    df = DataFrame(file)

    # Date arrives as an integer: convert it to a string, then to a Date
    df[!, :Date] = string.(df[!, :Date])
    df[!, :Date] = Date.(df[!, :Date], "yyyymmdd")

    # Small multiples: one line chart per CAName, four per row
    df |> @vlplot(:line, columns = 4, wrap = "CAName:o", x = "Date", y = "DailyPositive")
Here, I load the packages and define the url for the Covid-19 data. CSV is a package for working with CSV / flat files, while Downloads also does what it says on the tin.
You can use describe as the equivalent of glimpse or str – it gives you an overview of the object.
From this, I could see that the Date column was an integer, so I needed to convert it to a string, and from there, to a date.
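The two-step conversion can be tried on its own with nothing but the Dates standard library. This is a minimal sketch using made-up values in the same yyyymmdd form, not the real dataset; the variable names are only for illustration.

```julia
using Dates

# Made-up sample values in the same yyyymmdd integer form as the Date column
raw = [20220227, 20220228, 20220301]

# Step 1: integer -> string, e.g. 20220301 -> "20220301"
as_text = string.(raw)

# Step 2: string -> Date, parsed with an explicit format string
as_dates = Date.(as_text, "yyyymmdd")

println(as_dates)
```

The dot after string and Date broadcasts the call over every element, which is why the same two lines work on a whole DataFrame column.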
Finally, I piped the df to the VegaLite package (which has a fairly effortless ability to make small multiples). Now, this may not be particularly polished (and that’s on me, I have kept this as minimal as possible), but it’s certainly more than good enough for a first look at a dataset.
At the top right of the plot window, three dots appear; clicking on them brings up a menu to save the plot in various formats, including SVG.
Among the things I searched for was how to filter dates (I still haven’t quite sussed that out, and I suspect I need to spend some more time on it).
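For reference, here is a minimal sketch of one way date filtering can work, using only the Dates standard library and made-up values standing in for the Date and DailyPositive columns. The cutoff date and variable names are assumptions for illustration, and this has not been run against the real data.

```julia
using Dates

# Made-up dates and counts standing in for the Date and DailyPositive columns
dates = [Date(2022, 2, 26), Date(2022, 2, 27), Date(2022, 2, 28), Date(2022, 3, 1)]
positives = [10, 12, 9, 15]

# Dates compare with the usual operators; the dot broadcasts over the vector
cutoff = Date(2022, 2, 28)
mask = dates .>= cutoff

# Keep only the entries on or after the cutoff
recent_dates = dates[mask]
recent_positives = positives[mask]

println(recent_dates)
println(recent_positives)
```

The same kind of Boolean mask can be used as a row index on a DataFrame, along the lines of df[mask, :].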
I also looked at Gadfly, but couldn’t suss out how to get the small multiples to work legibly – again, that’s on me and my lack of time. [I will need to look at it again](http://gadflyjl.org/stable/man/compositing/).
One other thing I discovered in passing was the Julia equivalent of the here functionality, namely joinpath, which allows you to build filepaths from parts so they are independent of OS.
    root = dirname(@__FILE__)
    joinpath(root, "positives.csv")
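joinpath also accepts more than two parts, which is handy for nested folders. A small sketch, with hypothetical folder and file names chosen purely for illustration:

```julia
# Hypothetical folder and file names, purely for illustration
parts = ["data", "raw", "positives.csv"]

# joinpath inserts the separator appropriate to the current OS
path = joinpath(parts...)

# splitpath breaks the path back into its components
println(path)
println(splitpath(path))
```

Because the separator is never typed by hand, the same script should produce a valid path on Windows, macOS and Linux.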
There is a lot to learn, and I am looking for something more structured, but for a quick dabble, this has been a useful exercise.