Getting to know Julia

John MacKintosh

1 week ago

[This article was first published on HighlandR, and kindly contributed to R-bloggers].

I thought I’d try Julia out and see how far I could get with nothing but Google on my side.

I’ve had it installed for a while, but never really done anything with it. My aims for this exercise were:

download some open data
wrangle it, or at least do some sort of manipulation on it
plot it

I’ve used the daily Covid-19 statistics provided by Public Health Scotland (see code for url).

These get updated daily, and published after 2pm, so there may be a short period during the day when they are not available whilst the update is in progress.

Getting started

After installation, the first thing to do is open a terminal (PowerShell if you’re a Windows user) and type julia to launch the REPL (the equivalent of the RStudio console, for the R users out there).

With that done, I realised I needed to install some packages.

Julia has its own package manager, called Pkg.

You can launch it in the REPL by hitting the ] key. The prompt then changes from julia> to pkg>.

Install a package with add:

add DataFrames
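The same step can also be scripted from ordinary Julia code rather than typed at the pkg> prompt. The sketch below is only an illustration: not_installed and install_missing are hypothetical helper names, and Base.find_package is an unexported function, so treat its use here as an assumption rather than a guaranteed interface. Pkg.add itself is part of the standard Pkg library and needs an internet connection.

```julia
using Pkg

# Third-party packages used in this post (Dates and Downloads ship with Julia)
wanted = ["CSV", "DataFrames", "Chain", "VegaLite"]

# Hypothetical helper: the names Julia cannot currently find in its load path
not_installed(names) = filter(n -> Base.find_package(n) === nothing, names)

# Hypothetical helper: install only what is missing (needs an internet connection)
function install_missing(names)
    to_add = not_installed(names)
    isempty(to_add) || Pkg.add(to_add)
    return to_add
end
```

Calling install_missing(wanted) once at the top of a script would then add whatever is not already available and return the names it had to install.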

Once you’ve installed the packages, you need the equivalent of library in R, which in Julia is using:

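using Pkg         # package manager API
using Dates       # Date type and date parsing
using CSV         # reading CSV / flat files
using Downloads   # fetching files from a URL
using DataFrames  # tabular data
using Chain       # piping macro (loaded, but not used below)
using VegaLite    # plotting

# Daily Covid-19 statistics from Public Health Scotland
url = "https://www.opendata.nhs.scot/dataset/b318bddf-a4dc-4262-971f-0ba329e09b87/resource/427f9a25-db22-4014-a3bc-893b68243055/download/trend_ca_20220301.csv"

# Download to a temporary file and parse it, treating "NA" as missing
file = CSV.File(Downloads.download(url), missingstring = "NA", dateformat = "yyyymmdd")
df = DataFrame(file)

# Date arrives as an integer: convert it to a string, then to a Date
df[!, :Date] = string.(df[!, :Date])
df[!, :Date] = Date.(df[!, :Date], "yyyymmdd")

# Line chart of daily positives, one small-multiple panel per CAName, 4 per row
df |> @vlplot(:line, columns = 4, wrap = "CAName:o", x = "Date", y = "DailyPositive")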

Here, I load the packages and define the URL for the Covid-19 data.

CSV is a package for working with CSV / flat files, while Downloads also does what it says on the tin.

You can use describe as the equivalent of glimpse or str – it gives you an overview of the object.
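As a rough illustration of what describe reports, here is a minimal sketch on a tiny, made-up data frame. The column names mimic the real file, but the values are invented for the example.

```julia
using DataFrames

# Toy data: invented values, with Date stored as an integer like the raw file
toy = DataFrame(Date = [20220301, 20220302],
                CAName = ["Highland", "Highland"],
                DailyPositive = [10, 12])

# describe returns a data frame with one row per column of the input
summary_df = describe(toy)

# The eltype column shows the element type of each column
println(summary_df[:, [:variable, :eltype]])
```

The eltype entry for Date is what gives the game away when a date has been read in as a plain integer.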

From this, I could see that the Date column was an integer, so I needed to convert it to a string, and from there, to a date.
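The two-step conversion can be tried in isolation using only the Dates standard library. This is a small sketch on invented values, not the real data; it passes a DateFormat object (the dateformat"..." string macro) instead of a plain format string, which is a slight variation on the code above.

```julia
using Dates

# Dates as they arrive in the CSV: plain integers in yyyymmdd form (invented values)
raw_dates = [20220301, 20220302]

# Step 1: integer -> string; step 2: string -> Date, broadcasting over the vector
as_strings = string.(raw_dates)
as_dates = Date.(as_strings, dateformat"yyyymmdd")

println(as_dates)
```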

Finally, I piped the df to the VegaLite package (which has a fairly effortless ability to make small multiples). Now, this may not be particularly polished (and that’s on me, I have kept this as minimal as possible), but it’s certainly more than good enough for a first look at a dataset.

Three dots appear at the top right of the plot window; clicking on them brings up a menu to save the plot in various formats, including SVG.

Among the things I searched for was how to filter dates (I still haven’t quite sussed that out, but I suspect I need to spend some more time on it).
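For what it’s worth, one way date filtering might be done with DataFrames is sketched below. It uses a made-up data frame (the values are invented; only the column names mimic the real file) and relies on the fact that Date values can be compared directly with operators such as >=.

```julia
using DataFrames, Dates

# Toy data with invented values; the column names mimic the real file
toy = DataFrame(Date = Date(2022, 2, 26):Day(1):Date(2022, 3, 1),
                DailyPositive = [5, 7, 9, 11])

# Keep rows on or after a cutoff date
cutoff = Date(2022, 2, 28)
recent = filter(:Date => d -> d >= cutoff, toy)

# The same idea with a boolean mask and row indexing
also_recent = toy[toy.Date .>= cutoff, :]
```

Both forms return a new data frame and leave the original untouched, so they should slot in before the plotting step without changing anything else.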

I also looked at Gadfly, but couldn’t suss out how to get the small multiples to work legibly. Again, that’s on me and my lack of time. I will need to look at it again (http://gadflyjl.org/stable/man/compositing/).

One other thing I discovered in passing was the Julia equivalent of R’s here functionality, namely joinpath, which allows you to build file paths from parts so they are independent of OS.

root = dirname(@__FILE__)        # folder containing the current script
joinpath(root, "positives.csv")  # path built with the right separator for the OS

There is a lot to learn, and I am looking for something more structured, but for a quick dabble, this has been a useful exercise.
