In the previous post, we examined some configuration issues with Cloudera Manager and Hadoop services in the latest release of Oracle Big Data Lite VM (4.1.0). In this post we report issues with Oracle R Enterprise, and the remedies we applied.
It turns out that if we load the ORE package in R, we subsequently cannot use the help system at all:
    > library(ORE)

    Attaching package: ‘OREbase’

    The following objects are masked from ‘package:base’:

        cbind, data.frame, eval, interaction, order, paste, pmax, pmin,
        rbind, table

    Loading required package: OREembed
    Loading required package: OREstats
    Loading required package: MASS
    Loading required package: OREgraphics
    Loading required package: OREeda
    Loading required package: OREmodels
    Loading required package: OREdm
    Loading required package: lattice
    Loading required package: OREpredict
    Loading required package: ORExml
    > help(ore.connect) # ORE function
    Error in readRDS(f) : unknown input format
    > help(median) # base R function
    Error in readRDS(f) : unknown input format
The Error in readRDS(f) : unknown input format message is a rather cryptic one. It pops up occasionally in the R universe, and the usual proposed solution is to delete the directory with the downloaded packages, which we would certainly like to avoid. The error is probably due to some corrupted file(s) in one of the loaded ORE packages: for example, OREbase does not trigger the problem, but OREstats does (if you are following along on your machine, be sure to restart R between the code snippets presented here in order to reproduce the results):
    > library(OREbase)
    > help(ore.connect) # works OK
    > library(OREstats)
    Loading required package: MASS
    > help(ore.connect)
    Error in readRDS(f) : unknown input format
And to make things somewhat more complicated, if we load OREstats from another location (most ORE and ORCH packages exist in two locations – more on this in a second), the problem does not appear at all:
    > library("OREstats", lib.loc="/usr/lib64/R/library")
    Loading required package: MASS
    Loading required package: OREbase
    > help(ore.connect) # works OK!
What’s happening?
Locating the error cause
First, the careful user might have already noticed from the Packages tab of RStudio that all ORCH and most ORE packages show up two times in the package list:
Why the duplicates, and where are these packages located? We get the answer to the second question from the R function .libPaths:
    > .libPaths()
    [1] "/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library"
    [2] "/usr/lib64/R/library"
    [3] "/usr/share/R/library"
The first of these directories belongs to user oracle and the second to root (the third is empty).
Now, it takes little effort to verify that the same ORE & ORCH packages exist in both directories numbered ‘1’ and ‘2’ above, hence the duplicate entries in the RStudio package list.
By default, the R command library, if not given a location, loads the requested package from the first directory listed by .libPaths(); only if R cannot find the package there does it proceed to look for it in the remaining directories, in order. So, in our case, it turns out that the OREstats package in the “default” location /u01/app/oracle/product/12.1.0.2/dbhome_1/R/library is corrupted, while the same package in the second location /usr/lib64/R/library is OK.
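The lookup order described above can be sketched in a few lines of R. The two helpers below, first_library_with() and duplicated_packages(), are hypothetical names introduced here for illustration (they are not part of R or ORE). They only inspect directories on disk, treating any subdirectory that holds a DESCRIPTION file as an installed package, and the demo uses two throwaway library trees as stand-ins for the real VM paths.

```r
# Hypothetical helpers for illustration; not part of base R or ORE.

# Return the first library tree (in search order) that contains `pkg`,
# mimicking the order in which library() looks for a package.
first_library_with <- function(pkg, libs = .libPaths()) {
  hit <- libs[file.exists(file.path(libs, pkg, "DESCRIPTION"))]
  if (length(hit) > 0) hit[1] else NA_character_
}

# Return the names of packages present in more than one library tree.
duplicated_packages <- function(libs = .libPaths()) {
  per_lib <- lapply(libs, function(lib) {
    basename(list.dirs(lib, full.names = TRUE, recursive = FALSE))
  })
  all_names <- as.character(unlist(per_lib))
  sort(unique(all_names[duplicated(all_names)]))
}

# Demo on two throwaway library trees (stand-ins for the two real directories).
lib_a <- tempfile("libA")
lib_b <- tempfile("libB")
demo_pkgs <- c(file.path(lib_a, "pkgX"),
               file.path(lib_b, "pkgX"),
               file.path(lib_b, "pkgY"))
for (p in demo_pkgs) {
  dir.create(p, recursive = TRUE)
  writeLines(paste("Package:", basename(p)), file.path(p, "DESCRIPTION"))
}

dups <- duplicated_packages(c(lib_a, lib_b))          # packages found in both trees
hit  <- first_library_with("pkgX", c(lib_a, lib_b))   # tree that wins the lookup
print(dups)
print(hit == lib_a)
```

On the VM, calling duplicated_packages() with no arguments would be expected to list the ORE and ORCH packages that live in both trees, and first_library_with("OREstats") should point at the first directory. For an installed package, base R's find.package() answers the same kind of question.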
It takes just a little experimentation to verify that the copy of OREstats in the first directory listed above is the single point of failure producing the error; indeed, if we bypass the default setting and load OREstats from the second directory before loading ORE, the help system works properly:
    > library("OREstats", lib.loc="/usr/lib64/R/library")
    Loading required package: MASS
    Loading required package: OREbase

    Attaching package: ‘OREbase’

    The following objects are masked from ‘package:base’:

        cbind, data.frame, eval, interaction, order, paste, pmax, pmin,
        rbind, table

    > library(ORE)
    Loading required package: OREembed
    Loading required package: OREgraphics
    Loading required package: OREeda
    Loading required package: OREmodels
    Loading required package: OREdm
    Loading required package: lattice
    Loading required package: OREpredict
    Loading required package: ORExml
    > help(ore.connect) # works OK
The reason why this is so should be obvious by now: library(ORE) loads its dependencies from the “default” directory, where the copy of OREstats is corrupted; by forcing OREstats to be loaded from the second available directory (where the copy is OK), we end up with no corrupted packages loaded and no errors.
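We did not track down which file inside the corrupted copy is actually damaged. One way to look for it is sketched below, on the assumption that the culprit is one of the package's .rds index files (installed packages keep such files in their Meta and help subdirectories). The helper find_bad_rds() is a hypothetical name, not an R or ORE function; it tries readRDS() on every .rds file under a directory and reports the ones that fail. The demo runs on a throwaway directory with one deliberately broken file.

```r
# Hypothetical helper (not part of ORE or base R): try to read every .rds
# file under `dir` and return the paths that readRDS() cannot parse.
find_bad_rds <- function(dir) {
  files <- list.files(dir, pattern = "\\.rds$", recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  ok <- vapply(files, function(f) {
    tryCatch({
      readRDS(f)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  unname(files[!ok])
}

# Demo: one valid .rds file and one that is plain text in disguise.
demo_dir <- tempfile("rds_demo")
dir.create(demo_dir)
saveRDS(1:3, file.path(demo_dir, "good.rds"))
writeLines("not an rds file", file.path(demo_dir, "bad.rds"))

bad <- find_bad_rds(demo_dir)
print(basename(bad))
```

On the VM, the call would look like find_bad_rds(file.path(.libPaths()[1], "OREstats")). If it returns nothing, the damage lies somewhere this scan does not cover, so an empty result is not proof that the copy is healthy.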
A (very) simple workaround
To restore the help system, all we have to do is delete the OREstats package from the first directory listed by .libPaths (we can use the ORACLE_HOME environment variable for brevity):
    [oracle@bigdatalite ~]$ cd $ORACLE_HOME/R/library
    [oracle@bigdatalite library]$ pwd
    /u01/app/oracle/product/12.1.0.2/dbhome_1/R/library
    [oracle@bigdatalite library]$ ls
    arules      ORCHcore     OREcommon    OREmodels   ROracle
    Cairo       ORCHstats    OREdm        OREpredict  rstudio
    DBI         ORCHtestkit  OREeda       OREserver   statmod
    manipulate  ORE          OREembed     OREstats    png
    ORCH        OREbase      OREgraphics  ORExml
    [oracle@bigdatalite library]$ rm -rf OREstats
Now OREstats will be automatically loaded from the second directory listed by .libPaths, where it is already available and not corrupted:
    > library(ORE)

    Attaching package: ‘OREbase’

    The following objects are masked from ‘package:base’:

        cbind, data.frame, eval, interaction, order, paste, pmax, pmin,
        rbind, table

    Loading required package: OREembed
    Loading required package: OREstats
    Loading required package: MASS
    Loading required package: OREgraphics
    Loading required package: OREeda
    Loading required package: OREmodels
    Loading required package: OREdm
    Loading required package: lattice
    Loading required package: OREpredict
    Loading required package: ORExml
    > help(ore.connect) # ORE function - works OK
    > help(median) # base R function - works OK
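A more cautious variant of the deletion step, for readers who would rather not rm -rf anything, is to move the suspect copy out of the library tree so it can be restored later. The sketch below is not what we did above; set_aside_package() and the backup location are hypothetical, and the demo works on temporary directories only.

```r
# Hypothetical helper: move an installed package directory out of a library
# tree into a backup directory, instead of deleting it.
set_aside_package <- function(pkg, lib, backup_dir) {
  from <- file.path(lib, pkg)
  if (!file.exists(from)) stop("package directory not found: ", from)
  dir.create(backup_dir, recursive = TRUE, showWarnings = FALSE)
  to <- file.path(backup_dir, pkg)
  if (file.exists(to)) stop("backup already exists: ", to)
  # file.rename() returns FALSE on failure, e.g. when source and target
  # are on different filesystems.
  if (!file.rename(from, to)) stop("could not move ", from, " to ", to)
  invisible(to)
}

# Demo on a throwaway library tree with a single fake package.
demo_lib    <- tempfile("lib")
demo_backup <- tempfile("backup")
dir.create(file.path(demo_lib, "pkgX"), recursive = TRUE)
writeLines("Package: pkgX", file.path(demo_lib, "pkgX", "DESCRIPTION"))

moved_to <- set_aside_package("pkgX", demo_lib, demo_backup)
print(file.exists(file.path(demo_lib, "pkgX")))       # package gone from the library
print(file.exists(file.path(moved_to, "DESCRIPTION"))) # and present in the backup
```

The backup directory should sit outside every path listed by .libPaths, otherwise R and RStudio would still see the moved copy as an installed package.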
Tidying up the package directories
Although the problem has been solved, there is still an issue: personally, I don’t like this situation with duplicated packages as shown in Fig. 1 above. So, we will move all ORE & ORCH packages to the directory $R_HOME/library (owned by root), where the default R packages (i.e. the packages shipped with any R distribution) are also kept. We need superuser privileges, since the target directory belongs to root. Also, since the mv command refuses to overwrite existing directories (setting the --force flag has no effect), we answer ‘no’ to the overwrite questions; afterwards we simply remove the remaining OR* directories, since they already exist in $R_HOME/library (CAUTION: be sure that you have first removed OREstats as described above, otherwise you will end up deleting the healthy copy and keeping the corrupted one!):
    [oracle@bigdatalite library]$ pwd
    /u01/app/oracle/product/12.1.0.2/dbhome_1/R/library
    [oracle@bigdatalite library]$ su
    [root@bigdatalite library]# mv OR* $R_HOME/library
    mv: overwrite `/usr/lib64/R/library/ORCH'? n
    mv: overwrite `/usr/lib64/R/library/ORCHcore'? n
    mv: overwrite `/usr/lib64/R/library/ORCHstats'? n
    mv: overwrite `/usr/lib64/R/library/ORCHtestkit'? n
    mv: overwrite `/usr/lib64/R/library/OREbase'? n
    mv: overwrite `/usr/lib64/R/library/OREcommon'? n
    mv: overwrite `/usr/lib64/R/library/OREserver'? n
    [root@bigdatalite library]# exit
    exit
    [oracle@bigdatalite library]$ ls
    arules  manipulate  ORCHstats    OREcommon  ROracle
    Cairo   ORCH        ORCHtestkit  OREserver  rstudio
    DBI     ORCHcore    OREbase      png        statmod
    [oracle@bigdatalite library]$ rm -rf OR*
    [oracle@bigdatalite library]$ ls
    arules  Cairo  DBI  manipulate  png  ROracle  rstudio  statmod
Now only the six ORE “supporting” packages remain in $ORACLE_HOME/R/library, along with the two RStudio-related packages rstudio and manipulate. And there are no more duplicate entries in the RStudio package list:
The directory $ORACLE_HOME/R/library is where we (i.e. user oracle) will normally install any additional packages we may need, and it makes sense to keep it separate from the $R_HOME/library directory, which will contain only the R default packages, along with ORE and ORCH. We would rather not move any more packages into $R_HOME/library, since the packages in our own directory $ORACLE_HOME/R/library are much more straightforward to update from RStudio (we’ll cover updating the R default packages in $R_HOME/library in a subsequent post).
Since we have touched on the subject, let us close this post by updating the existing packages: from RStudio select Tools -> Check for package updates…:
Do not bother with manipulate (its version needs to match that of your RStudio installation, which currently is 0.98.1062), and just select the other three as shown in Fig. 3.
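The same update could presumably be done from the R console instead of the RStudio menu. The sketch below is an assumption about how one might do it, not the procedure we followed: packages_to_update() is a hypothetical helper that takes the matrix returned by old.packages() and drops the packages we want to leave alone. The network-dependent calls are left commented out, and the demo matrix is a made-up stand-in, not real output from the VM.

```r
# Hypothetical helper: given the matrix returned by old.packages() (or NULL
# when nothing is outdated), return the package names to update, minus any
# we want to leave alone.
packages_to_update <- function(old, skip = "manipulate") {
  if (is.null(old)) return(character(0))
  setdiff(rownames(old), skip)
}

# Not run here (needs network access and a configured CRAN mirror):
# user_lib <- .libPaths()[1]
# old  <- old.packages(lib.loc = user_lib)
# pkgs <- packages_to_update(old)
# if (length(pkgs) > 0) {
#   update.packages(lib.loc = user_lib, oldPkgs = pkgs, ask = FALSE)
# }

# Demo with a made-up stand-in for the old.packages() result
# (row names are package names; the versions are placeholders).
demo_old <- matrix(c("1.0", "1.0", "1.0"), ncol = 1,
                   dimnames = list(c("manipulate", "pkgA", "pkgB"), "Installed"))
to_update <- packages_to_update(demo_old)
print(to_update)
```

Restricting lib.loc to the first library path keeps the update away from the root-owned $R_HOME/library, which user oracle cannot write to anyway.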
That was it! Despite the unexpected problem with ORE, you are now ready to use Oracle R Enterprise in the VM. For better overall performance, be sure to also check our previous post, where we address some issues with Cloudera Manager…
The post Oracle R Enterprise issues in Oracle Big Data Lite VM 4.1.0 appeared first on Nodalpoint.