[This article was first published on bnosac :: open analytical helpers - bnosac :: open analytical helpers, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
I’m happy to announce that the R package udpipe was recently updated on CRAN, which now hosts version 0.8.3. The main features of this update are:
- parallel NLP annotation across your CPU cores
- default models are now trained on Universal Dependencies 2.4, which enables annotation in 64 languages, based on 94 treebanks from Universal Dependencies. Models are now available for afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, buryat-bdt, catalan-ancora, chinese-gsd, classical_chinese-kyoto, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, estonian-ewt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, italian-vit, japanese-gsd, kazakh-ktb, korean-gsd, korean-kaist, kurmanji-mg, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-alksnis, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, old_russian-torot, persian-seraji, polish-lfg, polish-pdb, polish-sz, portuguese-bosque, portuguese-br, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, sanskrit-ufal, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, upper_sorbian-ufal, urdu-udtb, uyghur-udt, vietnamese-vtb, wolof-wtb
- some fixes as indicated in the NEWS file
What does parallel NLP annotation look like right now? Let’s do some annotation in French.
library(udpipe)
data("brussels_reviews", package = "udpipe")
x <- subset(brussels_reviews, language %in% "fr")
x <- data.frame(doc_id = x$id, text = x$feedback, stringsAsFactors = FALSE)
anno <- udpipe(x, "french-gsd", parallel.cores = 1, trace = 100)
anno <- udpipe(x, "french-gsd", parallel.cores = 4) ## this will be 4 times as fast if you have 4 CPU cores
View(anno)
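The general idea behind annotating in parallel can be sketched with base R's parallel package: split the documents into contiguous chunks, process each chunk on its own worker, and bind the results back together. This is only an illustration and not necessarily how udpipe implements it internally. `toy_annotate` is a hypothetical stand-in for a real annotator (it just splits on whitespace), and the four sample documents are made up.

```r
library(parallel)

# Hypothetical stand-in for an annotator: one row per whitespace-separated token.
toy_annotate <- function(docs) {
  out <- lapply(seq_len(nrow(docs)), function(i) {
    tokens <- strsplit(docs$text[i], "[[:space:]]+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    data.frame(doc_id = rep(docs$doc_id[i], length(tokens)),
               token = tokens, stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

# Made-up example documents (assumption, not taken from brussels_reviews).
docs <- data.frame(
  doc_id = c("d1", "d2", "d3", "d4"),
  text = c("Un appartement propre et calme",
           "Quartier central et vivant",
           "Accueil chaleureux",
           "Je recommande ce logement"),
  stringsAsFactors = FALSE
)

# Assign documents to contiguous chunks, one chunk per worker.
n_cores <- 2
chunk_id <- sort(rep(seq_len(n_cores), length.out = nrow(docs)))
chunks <- split(docs, chunk_id)

# Process each chunk on a separate worker process, then combine.
cl <- makeCluster(n_cores)
results <- parLapply(cl, chunks, toy_annotate)
stopCluster(cl)
anno_parallel <- do.call(rbind, results)
rownames(anno_parallel) <- NULL

# Sequential run for comparison: the tokens should be the same.
anno_sequential <- toy_annotate(docs)
```

Because the chunks are contiguous, the combined result keeps the original document order. With udpipe itself you do not need any of this; setting `parallel.cores` is enough.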
Note that udpipe works particularly well in combination with the following R packages
- crfsuite for entity recognition (more docs here)
- textrank for text summarisation (more docs here)
- BTM for topic modelling on short texts (more docs here)
- ruimtehol for text classification, text recommendation and finding similarities between articles, sentences, words, bigrams, labels, tags, persons, websites, entities and entity relations (more docs here and here)
And nothing stops you from using R packages tm / tidytext / quanteda or text2vec alongside it!
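As one possible starting point for combining udpipe with other packages, the annotation can be turned into a sparse document-term matrix. The sketch below uses the pre-annotated `brussels_reviews_anno` dataset that ships with udpipe, so no model download is needed. Restricting to French nouns and using lemmas as terms are illustrative choices, not a recommendation from the package authors.

```r
library(udpipe)

# Pre-annotated reviews shipped with udpipe (columns include doc_id, language, lemma, upos).
data("brussels_reviews_anno", package = "udpipe")

# Illustrative choice: keep only the nouns of the French reviews.
x <- subset(brussels_reviews_anno, language %in% "fr" & upos %in% "NOUN")

# Count how often each lemma occurs in each document ...
dtf <- document_term_frequencies(x[, c("doc_id", "lemma")])

# ... and cast the counts to a sparse document-term matrix.
dtm <- document_term_matrix(dtf)
dim(dtm)
```

The result is a sparse matrix from the Matrix package with documents in rows and terms in columns, a format that many text mining packages can consume directly or after conversion.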
Upcoming training schedule
If you want to know more, attend the course on text mining with R or text mining with Python. Below is the list of upcoming public courses that BNOSAC provides each year at KU Leuven in Belgium.
- 2019-10-17&18: Statistical Machine Learning with R: Subscribe here
- 2019-11-14&15: Text Mining with R: Subscribe here
- 2019-12-17&18: Applied Spatial Modelling with R: Subscribe here
- 2020-02-19&20: Advanced R programming: Subscribe here
- 2020-03-12&13: Computer Vision with R and Python: Subscribe here
- 2020-03-16&17: Deep Learning/Image recognition: Subscribe here
- 2020-04-22&23: Text Mining with R: Subscribe here
- 2020-05-05&06: Text Mining with Python: Subscribe here