Preprocessing the Norwegian Web as Corpus (NoWaC) in R

Posted on September 27, 2023 by R on Pablo Bernabeu in R bloggers

[This article was first published on R on Pablo Bernabeu, and kindly contributed to R-bloggers.]

The present script can be used to preprocess data from a frequency list of the Norwegian Web as Corpus (NoWaC).

Before using the script, download the frequency list from https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/nowac-frequency.html. The list is described there as ‘frequency list sorted primary alphabetic and secondary by frequency within each character’, and the direct URL is https://www.tekstlab.uio.no/nowac/download/nowac-1.1.lemma.frek.sort_alf_frek.txt.gz. The download requires signing in to an institutional network. Finally, unzip the downloaded file.
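
As a rough illustration of the steps above, the following base R sketch reads a gzip-compressed list and splits it into a lemma column and a frequency column. It is not the original script. It runs on a tiny stand-in file created on the spot, and it assumes that each line holds a frequency count followed by whitespace and a lemma. That layout, the sample words and counts, and the helper name parse_frequency_lines are all hypothetical, so the actual file should be inspected before any of this is reused.

```r
# Stand-in for the downloaded file: a tiny gzip-compressed list.
# ASSUMPTION: each line holds a frequency count, whitespace, then a lemma.
# The real NoWaC layout may differ, so inspect the file before reusing this.
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, open = "wt")
writeLines(c("   1520 hus", "     87 bil", "  30412 bok", ""), con)
close(con)

# Base R can read gzip-compressed text directly through gzfile().
con <- gzfile(gz_path, open = "rt")
raw_lines <- readLines(con)
close(con)

# Hypothetical helper: split each line into a count and a lemma.
parse_frequency_lines <- function(lines) {
  pattern <- "^([0-9]+)[[:space:]]+(.+)$"
  lines <- trimws(lines)
  lines <- lines[grepl(pattern, lines)]  # drop empty or malformed lines
  data.frame(
    lemma = sub(pattern, "\\2", lines),
    frequency = as.numeric(sub(pattern, "\\1", lines)),
    stringsAsFactors = FALSE
  )
}

nowac <- parse_frequency_lines(raw_lines)
nowac <- nowac[order(-nowac$frequency), ]  # most frequent lemmas first
rownames(nowac) <- NULL
print(nowac)
```

If the file has already been unzipped, the same helper could be applied to the output of readLines() called on the plain text file. Depending on how the file is encoded, an explicit encoding may need to be supplied when opening the connection.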

Reference of the corpus

Guevara, E. R. (2010). NoWaC: A large web-based corpus for Norwegian. In Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop (pp. 1-7). https://aclanthology.org/W10-1501
