Hetu-package for handling of Finnish personal identity codes

[This article was first published on rOpenGov R packages for open government data analytics, and kindly contributed to R-bloggers].

General information

The hetu package for R is meant for algorithmic handling of Finnish personal identity numbers (PINs). It is especially useful for anyone who wants to extract information from, or validate, a large number of PINs at once.

The toolset for analyzing Finnish PINs was initially developed as part of the sorvi package and was later split into a separate package. The development of the hetu package reached an important milestone in fall 2020, when it was published on CRAN.

The development of the hetu package is closely related to sweidnumbr, a similar package for analyzing Swedish personal identity numbers (PINs) and organizational identity numbers (OINs). Where applicable, hetu uses function names similar to those in sweidnumbr.[1]

Finnish personal identification code, hetu

A personal identification code (also called a national identification number, national identity number, personal identification number or PIN) is meant to be a unique identifier for an individual. The Finnish personal identity code (henkilötunnus, or hetu for short) consists of the date of birth (DDMMYY), a century marker (+ for the 1800s, - for the 1900s and A for the 2000s), a personal number (NNN) and a control character (C). Males have an odd personal number and females an even one.[2]
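The control character is computed from the rest of the code: the six date digits and the three-digit personal number are read together as one nine-digit integer, the remainder of dividing it by 31 is taken, and that remainder indexes the 31-character string "0123456789ABCDEFHJKLMNPRSTUVWXY" (G, I, O, Q and Z are left out). The following is a minimal base-R sketch of that rule, not the hetu package's own code; the helper name hetu_checksum_sketch is hypothetical.

```r
# Hypothetical helper, NOT part of the hetu package:
# computes the expected control character of PINs of the form DDMMYYCNNNC.
hetu_checksum_sketch <- function(pin) {
  control_chars <- "0123456789ABCDEFHJKLMNPRSTUVWXY"
  # Date part (characters 1-6) followed by the personal number (characters 8-10)
  digits <- paste0(substr(pin, 1, 6), substr(pin, 8, 10))
  # A nine-digit number is represented exactly as a double, so %% is safe
  remainder <- as.numeric(digits) %% 31
  # substring() is vectorised over first/last; R strings are 1-indexed
  substring(control_chars, remainder + 1, remainder + 1)
}

example_pins <- c("010101-0101", "111111-111C")
hetu_checksum_sketch(example_pins)
## [1] "1" "C"
# Compare with the actual last character of each PIN
hetu_checksum_sketch(example_pins) == substr(example_pins, 11, 11)
## [1] TRUE TRUE
```

For 010101-0101 the number is 010101010, and 10101010 %% 31 is 1, which maps to the character "1".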

Personal identity codes are widely used in both the public and private sectors. They are not confidential or secret, but like any personal data, hetu codes may only be handled with the individual's consent or for a valid reason.

Algorithmic handling of hetu-pins

Analyzing and extracting information from Finnish personal identity numbers is fairly straightforward, even with the naked eye. The hetu package is most useful for handling large numbers of PINs, which would be cumbersome to process by hand.

Hetu-package has functions to extract the following information:

  • hetu_date / pin_date: date of birth
  • hetu_sex / pin_sex: sex, Male or Female
  • hetu_age / pin_age: age in years, months or days (at the time of the query or at a desired date)
  • hetu_ctrl / pin_ctrl: validity check for the PIN, TRUE or FALSE
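To illustrate what functions such as hetu_date and hetu_sex derive from a PIN, here is a base-R sketch that decodes the date of birth and sex directly from the string. It assumes well-formed PINs using only the century markers + (1800s), - (1900s) and A (2000s), and it performs no validation; parse_hetu_sketch is a hypothetical name, not a function of the package.

```r
# Hypothetical helper, NOT part of the hetu package.
# Assumes well-formed PINs with century marker +, - or A; no validation.
parse_hetu_sketch <- function(pin) {
  century <- c("+" = 1800L, "-" = 1900L, "A" = 2000L)[substr(pin, 7, 7)]
  year <- unname(century) + as.integer(substr(pin, 5, 6))
  date <- as.Date(sprintf("%04d-%s-%s", year,
                          substr(pin, 3, 4),   # month
                          substr(pin, 1, 2)))  # day
  # Odd personal number = male, even = female
  p_num <- as.integer(substr(pin, 8, 10))
  sex <- ifelse(p_num %% 2 == 1, "Male", "Female")
  data.frame(hetu = pin, date = date, sex = sex)
}

parse_hetu_sketch(c("010101-0101", "111111-111C"))
```

Decoding the date and sex is simple string slicing; the package adds the validation and the extra columns on top of this.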

Use of hetu-package

Installing the package in R from CRAN:

install.packages("hetu")

Loading the package and setting a few imaginary PINs for testing:

library(hetu)
example_pins <- c("010101-0101", "111111-111C")

The hetu function is the backbone of the package, and most of the information that can be extracted is available from it as a simple data frame:

knitr::kable(hetu(example_pins))
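|hetu        |sex    |p.num |checksum |date       | day| month| year|century |valid.pin |
|:-----------|:------|:-----|:--------|:----------|---:|-----:|----:|:-------|:---------|
|010101-0101 |Female |010   |1        |1901-01-01 |   1|     1| 1901|-       |TRUE      |
|111111-111C |Male   |111   |C        |1911-11-11 |  11|    11| 1911|-       |TRUE      |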

There are several ways to extract specific information about a group of PINs, such as their dates of birth. If the output of the hetu function is saved as an object, its columns can be subset as usual. For convenience, the same information can also be extracted with the extract parameter of the hetu function or with one of the specialized functions:

# Extracting sex
hetu(example_pins, extract = "sex")
## [1] "Female" "Male"
hetu_sex(example_pins)
## [1] "Female" "Male"
# Extracting date of birth
hetu(example_pins, extract = "date")
## [1] "1901-01-01" "1911-11-11"
hetu_date(example_pins)
## [1] "1901-01-01" "1911-11-11"
# Extracting information on validity
hetu(example_pins, extract = "valid.pin")
## [1] TRUE TRUE
hetu_ctrl(example_pins)
## [1] TRUE TRUE
# Information that can be extracted only with extract-parameter
hetu(example_pins, extract = "p.num")
## [1] "010" "111"

Unlike the other information, age can only be extracted with a specialized function, hetu_age. This example also introduces rhetu, which generates random PINs between two dates. Because the PINs are random, your results will differ from the output shown here:

example_pins2 <- rhetu(5, start = "1950-01-01", end = "1995-05-07")
# Age in years
hetu_age(example_pins2)
## The age in years has been calculated at 2021-01-31.
## [1] 33 69 62 31 43
# Age in months
hetu_age(example_pins2, timespan = "months")
## The age in months has been calculated at 2021-01-31.
## [1] 403 839 752 383 521
# Age in 2011
hetu_age(example_pins2, date = "2011-01-01")
## The age in years has been calculated at 2011-01-01.
## [1] 23 59 52 21 33
# Visualization: boxplot grouped by sex
example_pins3 <- rhetu(20, start = "1950-01-01", end = "1995-05-07", p.male = 0.5)
boxplot(hetu_age(example_pins3)~hetu_sex(example_pins3), xlab = "", ylab = "Age in years", col=c("cyan", "magenta"))
## The age in years has been calculated at 2021-01-31.
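The age in completed years can be sketched in base R from a date of birth and a reference date: take the difference of the calendar years and subtract one if the birthday has not yet occurred in the reference year. This only illustrates the arithmetic; hetu_age may treat edge cases (such as 29 February birthdays) differently, and age_in_years_sketch is a hypothetical name.

```r
# Hypothetical helper, NOT part of the hetu package.
age_in_years_sketch <- function(birth, at = Sys.Date()) {
  birth <- as.Date(birth)
  at <- as.Date(at)
  years <- as.integer(format(at, "%Y")) - as.integer(format(birth, "%Y"))
  # "MMDD" strings compare correctly as text, e.g. "0101" < "1111"
  before_birthday <- format(at, "%m%d") < format(birth, "%m%d")
  years - before_birthday  # TRUE counts as 1, FALSE as 0
}

age_in_years_sketch(c("1901-01-01", "1911-11-11"), at = "2011-01-01")
## [1] 110  99
```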

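In some cases, diagnostic information about invalid PINs can be useful. In the example below, both the day (32) and the month (13) are out of range, so the date is invalid: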

hetu_diagnostic("321399-000G")
##           hetu is.temp valid.p.num valid.checksum correct.checksum valid.date
## 21 321399-000G   FALSE       FALSE          FALSE            FALSE      FALSE
##    valid.day valid.month valid.year valid.length valid.century
## 21     FALSE       FALSE       TRUE         TRUE          TRUE
# Print only certain columns
hetu_diagnostic("321399-000G", extract = c("valid.p.num", "valid.length"))
##           hetu valid.p.num valid.length
## 21 321399-000G       FALSE         TRUE
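A few of the diagnostic columns can be approximated with simple base-R checks. The sketch below tests the length, the century marker, and whether the day and month fall in a plausible range. It is a simplified, hypothetical helper (diagnose_hetu_sketch), not a reimplementation of hetu_diagnostic: it ignores the personal number and the checksum, and a complete date check would also compare the day against the length of the month (so that, for example, 310299 fails).

```r
# Hypothetical helper, NOT the package's hetu_diagnostic(): simplified checks only.
diagnose_hetu_sketch <- function(pin) {
  # Non-numeric characters become NA instead of raising a warning
  day <- suppressWarnings(as.integer(substr(pin, 1, 2)))
  month <- suppressWarnings(as.integer(substr(pin, 3, 4)))
  data.frame(
    hetu = pin,
    valid.length = nchar(pin) == 11,
    valid.century = substr(pin, 7, 7) %in% c("+", "-", "A"),
    valid.day = !is.na(day) & day >= 1 & day <= 31,
    valid.month = !is.na(month) & month >= 1 & month <= 12
  )
}

diagnose_hetu_sketch(c("321399-000G", "010101-0101"))
```

As in the package output above, 321399-000G passes the length and century checks but fails both the day and the month checks.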

Business Identity Numbers (Y-tunnus, BID)

As in sweidnumbr, the hetu package has two functions for Finnish Business Identity Numbers (y-tunnus). Finnish business identity numbers have the form 1234567-8, where the last digit is a check digit.[3] The following functions are available:

  • bid_ctrl(bid): checks the validity of the BID, TRUE or FALSE
  • rbid(n): generates n random BIDs
example_bids <- rbid(2)
example_bids
## [1] "7128741-6" "1963928-5"
bid_ctrl(example_bids)
## [1] TRUE TRUE

No additional information can be extracted from BIDs.
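The check digit of a business ID is based on a weighted sum: the first seven digits are multiplied by the weights 7, 9, 10, 5, 8, 4 and 2, and the sum is taken modulo 11. A remainder of 0 gives the check digit 0, a remainder of 1 means no valid ID exists for those digits, and otherwise the check digit is 11 minus the remainder. The base-R sketch below applies this rule to the two example BIDs above; bid_check_digit_sketch is a hypothetical name, and this is an illustration rather than the code behind bid_ctrl.

```r
# Hypothetical helper, NOT part of the hetu package.
bid_check_digit_sketch <- function(bid) {
  weights <- c(7, 9, 10, 5, 8, 4, 2)
  vapply(bid, function(x) {
    digits <- as.integer(strsplit(substr(x, 1, 7), "")[[1]])
    remainder <- sum(digits * weights) %% 11
    if (remainder == 0) return(0L)
    if (remainder == 1) return(NA_integer_)  # no valid check digit exists
    as.integer(11 - remainder)
  }, integer(1), USE.NAMES = FALSE)
}

bids <- c("7128741-6", "1963928-5")
bid_check_digit_sketch(bids)
## [1] 6 5
bid_check_digit_sketch(bids) == as.integer(substr(bids, 9, 9))
## [1] TRUE TRUE
```

For 7128741-6, the weighted sum is 192, 192 %% 11 is 5, and 11 - 5 gives the check digit 6.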

References


  1. More information about sweidnumbr can be found, for example, in this blog post: Magnusson, Mans & Bulow, Erik. 2015. R made personal (at least for swedes)! URL: https://ropengov.org/2015/08/r-made-personal-at-least-for-swedes/

  2. Digital and Population Data Services Agency (Digi- ja väestötietovirasto). The personal identity code. URL: https://dvv.fi/en/personal-identity-code

  3. Finnish Patent and Registration Office. The Business Information System (BIS). URL: https://www.prh.fi/en/kaupparekisteri/rekisterointipalvelut/ytj.html

