Release Notes for current version (v6) of DataSHIELD
Release Notes for v6.1.1 of DataSHIELD
Focus of Release
This release of DataSHIELD is a maintenance release for v6.1, containing fixes for known issues in v6.1. Larger changes are deferred to v6.2.
Changes from DataSHIELD v6.1 to v6.1.1
Added tests to client-side methods to ensure that 'datasources' is a list of DSConnection objects.
Many error messages that were previously returned are now raised as errors by 'stop(...)'.
General improvements to documentation
Additional tests implemented
ds.boxPlot: implementation has been refined.
ds.completeCases: ensured correct behaviour if applied to a vector.
ds.cor: removed the naAction argument and fixed the behaviour to 'casewise.complete' for NA values.
ds.dataFrameFill: if filling a column with factors, levels are set correctly.
ds.dataFrameSort: ensured correct behaviour if the sorting columns are not integer or numeric columns.
ds.tapply.assign: ensured correct behaviour of the disclosure check when converting to factors.
ds.lexis: ensure required headers are retained.
ds.glmSLMA: added support for assigning model to a variable on the server-side.
ds.lmerSLMA: added support for assigning model to a variable on the server-side.
ds.glmerSLMA: corrected documentation.
ds.recodeValues: can be applied to columns which are factors.
ds.rm: permit removal of multiple variables on the server-side and removed the restriction on variable name length.
ds.merge: correction to documentation.
ds.table: ensured correct behaviour when assignment of the table to a variable on the server-side is specified, and ensured that study names are included in tables.
ds.cbind: ensured checking of the assignment of column names and of unused arguments.
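The 'datasources' check and the move to 'stop(...)' listed above can be illustrated with a minimal base-R sketch. It is not the dsBaseClient implementation: the class 'MockConnection' and the helper 'check_datasources' are hypothetical stand-ins (in a real session the objects would inherit from the DSConnection class).

```r
library(methods)

# Hypothetical stand-in for a connection class; in a real session the
# objects would inherit from the DSConnection class instead.
setClass("MockConnection", slots = c(name = "character"))

# Hypothetical helper: raise an error with stop() unless 'datasources'
# is a non-empty list whose elements all inherit from 'class_name'.
check_datasources <- function(datasources, class_name = "MockConnection") {
  ok <- is.list(datasources) && length(datasources) > 0 &&
    all(vapply(datasources, function(d) is(d, class_name), logical(1)))
  if (!ok) {
    stop("'datasources' must be a list of ", class_name, " objects",
         call. = FALSE)
  }
  invisible(TRUE)
}

# A well-formed argument passes silently.
conns <- list(study1 = new("MockConnection", name = "study1"))
check_datasources(conns)

# A malformed argument raises an R error that the caller can catch,
# rather than handing back a message as an ordinary return value.
result <- tryCatch(check_datasources("study1"),
                   error = function(e) conditionMessage(e))
print(result)
```

Raising a condition with stop() lets calling code use tryCatch() rather than inspecting return values for error text.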
Additional support packages and platforms
Support Opal 4.0.3
Support DSI 1.2, DSOpal 1.2 and opalr 2.0.0
Release Notes for v6.1.0 of DataSHIELD
Focus of Release
The major focus of the v6.1 release of DataSHIELD is the addition of new analytical and presentation functions.
Changes from DataSHIELD v6.0.1 to v6.1
New Analytical Functions
The following functions have been added to the suite of analytical functions provided by DataSHIELD.
- ds.glmPredict: causes the application of predict.glm to a server-side glm object
- ds.glmSummary: causes the summarization of a server-side glm object
- ds.kurtosis: calculates the kurtosis of a numeric variable with three different mathematical formulas
- ds.skewness: calculates the skewness of a numeric variable with three different mathematical formulas
- ds.getWGSR: computes the WHO Growth Reference z-scores of anthropometric data
- ds.abs: computes the absolute values of an input numeric or integer variable
- ds.sqrt: computes the square root values of an input numeric or integer variable
New Presentation Functions
- ds.boxPlot: draws boxplot based on data on the study servers (*)
(*) Provided by Xavier Escribà Montagut, Barcelona Institute of Global Health (ISGlobal), Spain
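As an illustration of what "three different mathematical formulas" can mean for skewness, the sketch below computes three common sample-skewness estimators in base R. The helper 'skewness_variants' is hypothetical, and the conventions follow those used by the 'type' argument of e1071::skewness. Whether ds.skewness uses exactly these definitions is an assumption and should be checked against its documentation.

```r
# Three common sample-skewness estimators. That these correspond to the
# three methods of ds.skewness is an assumption made for illustration.
skewness_variants <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  stopifnot(n > 2)
  m2 <- mean((x - mean(x))^2)  # second central moment
  m3 <- mean((x - mean(x))^3)  # third central moment
  g1 <- m3 / m2^(3 / 2)        # moment-based estimator
  c(
    method1 = g1,
    method2 = g1 * sqrt(n * (n - 1)) / (n - 2),  # small-sample adjusted
    method3 = g1 * ((n - 1) / n)^(3 / 2)         # based on the sample sd
  )
}

x <- c(2, 3, 5, 8, 13, 21)
print(skewness_variants(x))
```

The three values differ only by factors that depend on the sample size, so they converge as the number of observations grows.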
The following functions have been modified in this release of DataSHIELD.
ds.dataFrameSubset: contains additional argument checking
ds.glmSLMA: improved documentation
ds.cbind: simplified, some arguments have been removed
ds.scatterPlot: added the logical argument “return.coords” (default FALSE), which lets users choose whether the coordinates of the anonymised data points are returned to the console. If the coordinates are returned, users can then pass them to any native R graphics library (e.g. ggplot) to generate other visualisations on the client-side.
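A call using the new “return.coords” argument might look like the sketch below. It assumes an active DataSHIELD login: 'connections' is taken to be the object returned by the login step, 'plot_with_coords' is a hypothetical wrapper, and the variable names "D$LAB_TSC" and "D$LAB_HDL" are placeholders for whatever exists on the server.

```r
# Sketch only: requires the dsBaseClient package and an active login.
# 'connections' and the variable names are placeholders (assumptions).
plot_with_coords <- function(connections,
                             x = "D$LAB_TSC",
                             y = "D$LAB_HDL") {
  coords <- dsBaseClient::ds.scatterPlot(
    x = x,
    y = y,
    return.coords = TRUE,     # also return the anonymised coordinates
    datasources = connections
  )
  # Inspect the returned object before re-plotting it; its exact layout
  # is not described in these notes.
  utils::str(coords)
  invisible(coords)
}
```

Once the structure of the returned object is known, the coordinates can be handed to any client-side plotting package.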
Client-side Testing Infrastructure
Added support for running tests which require ds.dangerClient functions as part of the Continuous Integration tests.
Additional tests and general test improvements are included in this release.
Server-side Testing Infrastructure
Added a test suite that directly tests the server-side functions. These tests have also been integrated with Continuous Integration.
Generation of test coverage reports from the server test suite.
DataSHIELD v6.1 is supported on R 3.5, R 3.6 and R 4.0, and would be expected to work with intermediate versions. At present the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 3.0.3 running on Ubuntu 16.04.
As ever, you can see the code at a variety of places: https://cran.obiba.org/, https://github.com/datashield/dsBase/tree/6.1 and https://github.com/datashield/dsBaseClient/tree/6.1
Release Notes for v6.0.1 of DataSHIELD
Focus of Release
The major focus of the v6.0.1 maintenance release of DataSHIELD is addressing known issues.
Changes from DataSHIELD v6.0 to v6.0.1
Improvements to "ds.dataFrameSubset", "dataFrameSubsetDS1" and "dataFrameSubsetDS2" functions: improved argument checking and support for the "ONES" option, which causes a vector containing "1" to be created and then used directly to select all rows when you are primarily subsetting by column.
Improvements to “ds.dataFrame” and “ds.cbind” functions: ensures that the combining of variables in data frames that are created by those two functions maintains the actual class of each input variable.
Improvements to "ds.names" and "namesDS" functions: improved checking that the function is operating on a list object, which may include objects whose primary class is something else (e.g. glm, for output from a generalized linear model) but which are also lists.
Improvements to "ds.dataFrameFill" and "dataFrameFillDS" functions: ensures that the classes of any created columns match the classes of existing columns in the other studies.
Enhancements to the build and testing system: removed reliance on the "opal" and "opaladmin" packages.
DataSHIELD v6.0.1 is supported on R 3.5, R 3.6 and R 4.0, and would be expected to work with intermediate versions. At present the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 3.0.1 running on Ubuntu 16.04.
As ever, you can see the code at a variety of places: https://cran.obiba.org/,
Release Notes for v6.0 of DataSHIELD
Focus of Release
The major focuses of the v6.0 release of DataSHIELD are the addition of new analytical functions and the integration with the DataSHIELD Interface (DSI).
Changes from DataSHIELD v5.1 to v6.0
DataSHIELD Interface (DSI)
DataSHIELD’s dsBaseClient package now uses the DataSHIELD Interface (DSI) to communicate with the Opal server; this replaces the legacy opal R package. This is a breaking change for code written for DataSHIELD v4 and v5. The main impact on end users of DataSHIELD is the new technique for logging in to and out of the server.
The motivation for this change is to give DataSHIELD the ability to connect to other types of server in the future. More information about DSI can be found on its GitHub page at https://github.com/datashield/DSI
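The new login and logout pattern can be sketched as follows. This is an illustration rather than an official recipe: 'login_sketch' is a hypothetical wrapper, the server label "study1" and the symbol "D" are arbitrary choices, and the URL, credentials and table name must be supplied by the caller for a real server.

```r
# Sketch only: requires the DSI, DSOpal and dsBaseClient packages and a
# reachable server. All argument values are supplied by the caller.
login_sketch <- function(url, user, password, table) {
  library(DSI)
  library(DSOpal)        # provides the "OpalDriver" named below
  library(dsBaseClient)

  # Describe the server(s) to connect to.
  builder <- DSI::newDSLoginBuilder()
  builder$append(server = "study1", url = url,
                 user = user, password = password,
                 table = table, driver = "OpalDriver")
  logindata <- builder$build()

  # Log in and assign the table to the server-side symbol "D".
  connections <- DSI::datashield.login(logins = logindata,
                                       assign = TRUE, symbol = "D")
  # Always log out, even if the analysis call fails.
  on.exit(DSI::datashield.logout(connections))

  # Run a client-side function against the connections.
  ds.dim("D", datasources = connections)
}
```

Code written for v4 or v5 that logged in through the legacy opal package would need its login and logout calls replaced along these lines.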
New Analytical Functions
The functions ds.completeCases, ds.glmerSLMA, ds.lmerSLMA, ds.rep, ds.sample and ds.table have been added to the suite of analytical functions provided by DataSHIELD.
ds.completeCases: constructs a modified data frame, matrix or vector that contains no missing values
ds.glmerSLMA: fits a Generalized Linear Mixed-Effects Model (GLME) on data from one or multiple sources with pooling via SLMA (Study-Level Meta-Analysis)
ds.lmerSLMA: fits a Linear Mixed-Effects Model (lme), which can include both fixed and random effects, on data from one or multiple sources with pooling via SLMA
ds.rep: creates a repetitive sequence by repeating the specified scalar number, vector or list in each data source
ds.sample: draws a pseudorandom sample from a vector, data frame or matrix on the server-side
ds.table: creates 1-dimensional, 2-dimensional and 3-dimensional tables using the table function in native R
The functions ds.dim, ds.length, ds.colnames, ds.ls and ds.levels have been reimplemented so that they no longer use the server-side aliases dim, length, colnames, ls and levels (which have now been removed) and instead use the dedicated DataSHIELD server-side functions dimDS, lengthDS, colnamesDS, lsDS and levelsDS. These changes should not affect the behaviour of the functions; they merely reduce the internal reliance on non-DataSHIELD functions on the server, and therefore make it more secure and reliable.
The functions ds.cbind and ds.dataFrame have been modified to remove any “DATAFRAME.NAME$” strings from the column names of the assigned data frames. In addition, the new version of the ds.cbind function generates data frames instead of matrices. We have also fixed a related bug in how the two functions defined the column names of the assigned data frames when the order of the input components differed between studies.
An additional disclosure control has been added to the ds.cov and ds.cor functions. It checks that the number of input variables does not exceed a pre-specified proportion of the number of individual-level records. The maximum allowed proportion is set by the same filter that the ds.glm function uses to check that a regression model is not oversaturated. By default the filter is set to 0.33, which means, for example, that for a data frame of 100 rows (i.e. individual-level records) the variance-covariance or correlation matrix can be returned for at most 33 variables.
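The arithmetic behind this disclosure control can be sketched in base R. The helpers 'max_variables_allowed' and 'cov_request_allowed' are hypothetical names used only for illustration; the real check runs on the server and its implementation may differ.

```r
# Hypothetical helpers illustrating the rule described above: the number
# of variables may not exceed a fixed proportion of the number of rows.
max_variables_allowed <- function(n_rows, max_proportion = 0.33) {
  floor(max_proportion * n_rows)
}

cov_request_allowed <- function(n_vars, n_rows, max_proportion = 0.33) {
  n_vars <= max_variables_allowed(n_rows, max_proportion)
}

print(max_variables_allowed(100))    # 33 variables at most for 100 rows
print(cov_request_allowed(33, 100))  # TRUE: within the limit
print(cov_request_allowed(34, 100))  # FALSE: would be blocked
```

With the default proportion, a request for a 34-variable correlation matrix on 100 records would be refused, while 33 variables would be accepted.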
There are a number of functions in DataSHIELD v6.0 which should be regarded as deprecated - i.e. they are still there, but we strongly recommend you stop using them as they will be removed in v6.1. The functions which are deprecated are shown below, along with their replacements which should be used as soon as is practicable.
ds.setDefaultOpal, and should be replaced by datashield.connections_default
ds.listOpals, and should be replaced by datashield.connections
ds.table1DS, and should be replaced by ds.table
ds.table2DS, and should be replaced by ds.table
ds.meanByClass, and should be replaced by ds.meanSdGp
ds.recodeLevels, and should be replaced by ds.recodeValues
ds.subset, and should be replaced by ds.dataFrameSubset
ds.subsetByClass, and should be replaced by ds.dataFrameSubset
ds.vectorCalc, and should be replaced by ds.make
It should be noted that use of [ and ] should be avoided when performing analysis, especially in conjunction with ds.dataFrameSubset.
There are a number of server-side aliases in DataSHIELD v6.0 which should be regarded as deprecated, so should not be used as they will be removed in v6.1. The aliases which are deprecated are:
is.character (aggregate alias)
is.factor (aggregate alias)
is.list (aggregate alias)
is.null (aggregate alias)
is.numeric (aggregate alias)
NROW (aggregate alias)
t.test (aggregate alias)
as.character (assign alias)
as.null (assign alias)
as.numeric (assign alias)
attach (assign alias)
complete.cases (assign alias)
rep (assign alias)
unlist (assign alias)
In addition to the deprecated functions, it should be noted that ds.meanSdGp is planned to be renamed to ds.meanSDByClass in DataSHIELD v6.1.
The documentation of all DataSHIELD functions has been updated.
The new documentation follows the same format for every function, and each example shows logging in as required by v6.0, using the function, and logging out from the server.
We have continued to develop our continuous integration (CI), and now have 6310 tests which are run every day and on every proposed code change.
How to upgrade
Update DataSHIELD server-side package
If you have a suitable version of the Opal server and would like to upgrade the DataSHIELD server package (dsBase), this can be done via the Opal Web Portal. On the “DataSHIELD” page within the “Administration” section of the Opal Web Portal, remove the old “dsBase”, then use the “+Add Package” button to install the new version of “dsBase”: select “Install all DataSHIELD packages”, then press the “Install” button.
Update DataSHIELD client-side package
If you installed the DataSHIELD client package (dsBaseClient) using the function install.packages and specifying the Obiba repository, you can update the client package by running install.packages for dsBaseClient again against the same repository.
If you do not have the “DSI” and “DSOpal” packages installed, installing ‘DSOpal’ is sufficient, as installing ‘DSOpal’ will cause the installation of 'DSI'.
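The two client-side installation steps might look like the sketch below. The Obiba repository URL is the one given in these notes. Listing a CRAN mirror (here https://cloud.r-project.org) so that dependencies can be resolved, and installing DSOpal from CRAN, are assumptions to adapt to your own setup.

```r
# Update (or install) the client package from the Obiba repository.
# A CRAN mirror is listed as well so that dependencies can be resolved
# (the choice of mirror is an assumption).
install.packages("dsBaseClient",
                 repos = c("https://cran.obiba.org",
                           "https://cloud.r-project.org"),
                 dependencies = TRUE)

# Installing DSOpal also installs DSI as a dependency.
install.packages("DSOpal", repos = "https://cloud.r-project.org")
```

After installation, packageVersion("dsBaseClient") can be used to confirm which version is in place.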
DataSHIELD v6.0 is supported on R 3.5, R 3.6 and R 4.0, and would be expected to work with intermediate versions. At present the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 2.16.0 running on Ubuntu 16.04.
As ever, you can see the code at a variety of places: https://cran.obiba.org/, or https://github.com/datashield/dsBase/tree/6.0.0 and https://github.com/datashield/dsBaseClient/tree/6.0.0