Latest release

DataSHIELD

The latest version of dsBaseClient and dsBase is v6.1.1.


More information can be found on GitHub.

DSI

The latest version of DSI is v1.3.

More information can be found on CRAN.

Opal

The latest version of opalr is v3.0.

More information can be found on CRAN or on the Obiba website.

Release Notes for the current version (v6) of DataSHIELD

Release Notes for v6.1.1 of DataSHIELD

Focus of Release

This release of DataSHIELD is a maintenance release for release 6.1, addressing known issues in that release. Larger changes are being held back for release 6.2.

Changes from DataSHIELD v6.1 to v6.1.1

General Changes

  1. Added tests to client-side methods to ensure that ‘datasources’ is a list of DSConnection objects.

  2. Many error messages that were previously returned are now raised as errors generated by 'stop(...)'.

  3. General improvements to documentation.

  4. Additional tests implemented.

Changed Functions 


  1. ds.boxPlot: implementation has been refined.

  2. ds.completeCases: ensured correct behaviour if applied to a vector.

  3. ds.cor: removed the naAction argument; “NA” values are now always handled with 'casewise.complete'.

  4. ds.dataFrameFill: if filling a column with factors, levels are set correctly.

  5. ds.dataFrameSort: ensured correct behaviour if the sorting columns are not integer or numeric columns.

  6. ds.tapply.assign: ensured correct behaviour of the disclosure check when converting to factors.

  7. ds.lexis: ensured required headers are retained.

  8. ds.glmSLMA: added support for assigning model to a variable on the server-side.

  9. ds.lmerSLMA: added support for assigning model to a variable on the server-side.

  10. ds.glmerSLMA: corrected documentation.

  11. ds.recodeValues: can be applied to columns which are factors.

  12. ds.rm: permit removal of multiple variables on the server-side and removed the restriction on variable name length.

  13. ds.merge: correction to documentation.

  14. ds.table: ensured correct behaviour when assignment of the table to a variable on the server-side is specified. Study names are now included in tables.

  15. ds.cbind: ensured that the assignment of column names and unused arguments are checked.
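To illustrate the 'casewise.complete' behaviour that ds.cor now always uses, the base-R sketch below runs locally on toy data, not through DataSHIELD. It assumes that casewise handling means listwise deletion, i.e. any row with an NA in any variable is dropped before the correlations are computed, which is what base R's `cor(..., use = "complete.obs")` does.

```r
# Local toy data with NAs scattered across different rows and columns.
m <- cbind(
  a = c(1.0, 2.0, NA, 4.0, 5.0, 6.0),
  b = c(2.1, NA, 2.9, 4.2, 5.1, 5.8),
  c = c(0.5, 1.1, 1.4, NA, 2.6, 3.1)
)

# Casewise (listwise) deletion: only rows complete in every column are used.
casewise <- cor(m, use = "complete.obs")

# The same result, computed by dropping incomplete rows explicitly.
manual <- cor(m[complete.cases(m), ])

# Pairwise deletion, by contrast, uses a different subset of rows for each pair.
pairwise <- cor(m, use = "pairwise.complete.obs")
```

Here only rows 1, 5 and 6 are complete, so every entry of `casewise` is based on the same three observations, whereas each entry of `pairwise` may be based on a different number of rows (four for the a/b pair).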


Additional support packages and platforms

  1. Support Opal 4.0.3 to 4.2.2

  2. Support DSI 1.2 to 1.3, DSOpal 1.2 to 1.3 and opalr 2.0.0 to 3.0.0

 

Release Notes for v6.1.0 of DataSHIELD

Focus of Release

The main focus of the v6.1 release of DataSHIELD is the addition of new analytical and presentation functions.

Changes from DataSHIELD v6.0.1 to v6.1

New Analytical Functions

The following functions have been added to the suite of analytical functions provided by DataSHIELD.

  • ds.glmPredict: causes the application of predict.glm to a server-side glm object
  • ds.glmSummary: causes the summarization of a server-side glm object
  • ds.kurtosis: calculates the kurtosis of a numeric variable with three different mathematical formulas
  • ds.skewness: calculates the skewness of a numeric variable with three different mathematical formulas
  • ds.getWGSR: computes the WHO Growth Reference z-scores of anthropometric data
  • ds.abs: computes the absolute values of an input numeric or integer variable
  • ds.sqrt: computes the square root values of an input numeric or integer variable

New Presentation Functions

  • ds.boxPlot: draws boxplot based on data on the study servers (*)

(*) Provided by Xavier Escribà Montagut, Barcelona Institute of Global Health (ISGlobal), Spain
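ds.skewness and ds.kurtosis each offer three formulas. The sketch below computes, locally and in plain R, the three variants commonly used in statistical software (types 1 to 3 as catalogued by Joanes and Gill, 1998, and implemented in the e1071 package). It is an assumption, not confirmed here, that these are the three formulas offered by the DataSHIELD functions; the helper name `moment_stats` is hypothetical.

```r
# Hypothetical local helper: three sample-skewness and excess-kurtosis formulas.
# Assumed (not confirmed) to correspond to the three methods of
# ds.skewness / ds.kurtosis.
moment_stats <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)  # central moments with divisor n
  m3 <- mean(d^3)
  m4 <- mean(d^4)
  g1 <- m3 / m2^1.5     # type 1 skewness (moment coefficient)
  g2 <- m4 / m2^2 - 3   # type 1 excess kurtosis
  list(
    skewness = c(
      type1 = g1,
      type2 = g1 * sqrt(n * (n - 1)) / (n - 2),  # adjusted Fisher-Pearson
      type3 = g1 * ((n - 1) / n)^1.5             # uses the n - 1 standard deviation
    ),
    kurtosis = c(
      type1 = g2,
      type2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
      type3 = (g2 + 3) * (1 - 1 / n)^2 - 3
    )
  )
}

moment_stats(c(1, 2, 3, 4, 5))$kurtosis
```

For the symmetric values 1 to 5 all three skewness estimates are 0, while the excess kurtosis is -1.3, -1.2 and -1.912 for types 1, 2 and 3 respectively, which shows how much the choice of formula matters for small samples.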

Changed Functions

The following functions have been modified in this release of DataSHIELD.

ds.dataFrameSubset: contains additional argument checking

ds.glmSLMA: improved documentation

ds.cbind: simplified, some arguments have been removed

ds.scatterPlot: added the logical argument “return.coords” (default FALSE), which lets users choose whether the coordinates of the anonymised data points are returned to the console. If they are, any of the native R graphics libraries (e.g. ggplot2) can then be used to generate other visualisations on the client side.

Client-side Testing Infrastructure

Added support for running tests which require ds.dangerClient functions as part of the Continuous Integration tests.

Additional tests and general test improvements are included in this release.

Server-side Testing Infrastructure

Added a test suite that tests the server-side functions directly. These tests have also been integrated into Continuous Integration.

Added generation of test coverage reports from the server-side test suite.

Supported Versions

DataSHIELD v6.1 is supported on R 3.5, R 3.6 and R 4.0, and is expected to work with intermediate versions. At present the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 3.0.3 running on Ubuntu 16.04.

Code Availability

As ever, you can see the code at a variety of places: https://cran.obiba.org/, https://github.com/datashield/dsBase/tree/6.1 and https://github.com/datashield/dsBaseClient/tree/6.1

Release Notes for v6.0.1 of DataSHIELD

Focus of Release

The v6.0.1 maintenance release of DataSHIELD focuses on addressing known issues.

Changes from DataSHIELD v6.0 to v6.0.1

Improvements to "ds.dataFrameSubset", "dataFrameSubsetDS1" and "dataFrameSubsetDS2" functions: improved argument checking and support for the "ONES" option, which creates a vector of ones that is then used to select all rows when you are primarily subsetting by column.
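The idea behind the "ONES" option can be shown with a plain-R analogue run locally. This is illustrative only and does not use the ds.dataFrameSubset API; the data frame `D` and the variable names are hypothetical. A vector of ones makes the row condition true for every row, so only the column selection has any effect.

```r
# Hypothetical local data frame standing in for a server-side table.
D <- data.frame(id = 1:4, age = c(31, 45, NA, 52), sex = c("F", "M", "F", "M"),
                stringsAsFactors = FALSE)

# A vector of ones, one element per row.
ONES <- rep(1, nrow(D))

# "ONES == 1" is TRUE for every row, so no rows are dropped...
keep_rows <- ONES == 1

# ...and the subset is effectively a selection of columns only.
D_cols <- D[keep_rows, c("id", "sex"), drop = FALSE]
```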

Improvements to “ds.dataFrame” and “ds.cbind” functions: data frames created by these two functions now preserve the class of each input variable when the variables are combined.
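Why preserving class matters can be seen in plain R, run locally: a matrix holds a single type, so `cbind()` on plain vectors coerces mixed inputs to a common type, whereas a data frame keeps each column's own class. The variable names below are illustrative.

```r
# Two input variables of different classes.
age <- c(34L, 51L, 29L)
sex <- c("F", "M", "F")

# cbind() on vectors builds a matrix, so the integers are coerced to character.
as_matrix <- cbind(age, sex)

# A data frame keeps each column's class.
as_df <- data.frame(age, sex, stringsAsFactors = FALSE)

class(as_matrix[, "age"])  # "character"
class(as_df$age)           # "integer"
```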

Improvements to "ds.names" and "namesDS" functions: improved checking that the function is operating on a list object, which may include objects whose primary class is something else (e.g. glm, for output from a generalized linear model) but which are also lists.
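The situation described above is easy to reproduce locally in plain R: a fitted glm object reports "glm" as its primary class, yet it is also a list, so `names()` returns its components. The toy data are illustrative.

```r
# Fit a small Gaussian GLM on toy data.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
fit <- glm(y ~ x, family = gaussian())

class(fit)                       # "glm" "lm"
is.list(fit)                     # TRUE: the object is also a list
"coefficients" %in% names(fit)   # TRUE: names() lists its components
```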

Improvements to "ds.dataFrameFill" and "dataFrameFillDS" functions: ensures that the classes of any created columns match the classes of existing columns in the other studies.
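The goal of the dataFrameFill change (and the related v6.1.1 fix for factor levels) can be sketched in plain R. The helper `fill_missing_columns` below is hypothetical and is not the dsBase implementation: it adds every column that appears in a zero-row `prototype` data frame, describing another study, but is missing from the local data, filled with NA of the matching class.

```r
# Hypothetical helper (not the dsBase code): add columns missing from `df`
# as NA columns whose class matches the corresponding column of `prototype`.
fill_missing_columns <- function(df, prototype) {
  for (col in setdiff(names(prototype), names(df))) {
    # Indexing a vector with NA_integer_ returns NAs of the same class;
    # for factors, the levels attribute is kept as well.
    df[[col]] <- prototype[[col]][rep(NA_integer_, nrow(df))]
  }
  df
}

# Local study lacking "sex" and "bmi"; the prototype describes another study.
study_a <- data.frame(age = c(30L, 41L))
prototype <- data.frame(age = integer(0),
                        sex = factor(character(0), levels = c("F", "M")),
                        bmi = numeric(0))

filled <- fill_missing_columns(study_a, prototype)
str(filled)
```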

Enhancements to the build and testing system: removed reliance on the "opal" and "opaladmin" packages.

Supported Versions

DataSHIELD v6.0.1 is supported on R 3.5, R 3.6 and R 4.0, and is expected to work with intermediate versions. At present the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 3.0.1 running on Ubuntu 16.04.

Code Availability

As ever, you can see the code at a variety of places: https://cran.obiba.org/, https://github.com/datashield/dsBase/tree/6.0.1 and https://github.com/datashield/dsBaseClient/tree/6.0.1

Release Notes for v6.0 of DataSHIELD

Focus of Release

The main focuses of the v6.0 release of DataSHIELD are the addition of new analytical functions and integration with the DataSHIELD Interface (DSI).

Changes from DataSHIELD v5.1 to v6.0

DataSHIELD Interface (DSI)

DataSHIELD’s dsBaseClient package now uses the DataSHIELD Interface (DSI) to communicate with the Opal server; this replaces the legacy opal R package. This is a breaking change for code written for DataSHIELD v4 and v5. The main impact on end users of DataSHIELD is the new technique for logging in to and out of the server.

The motivation for this change is to give DataSHIELD the ability to connect to other types of server in the future. More information about DSI can be found on its GitHub page at https://github.com/datashield/DSI

New Analytical Functions

The functions ds.completeCases, ds.glmerSLMA, ds.lmerSLMA, ds.rep, ds.sample and ds.table have been added to the suite of analytical functions provided by DataSHIELD.

ds.completeCases: constructs a modified data frame, matrix or vector that contains no missing values

ds.glmerSLMA: fits a Generalized Linear Mixed-Effects Model (GLME) on data from one or multiple sources with pooling via SLMA

ds.lmerSLMA: fits a Linear Mixed-Effects Model (lme) - can include both fixed and random-effects - on data from one or multiple sources with pooling via SLMA (Study-Level Meta-Analysis)

ds.rep: creates a repetitive sequence by repeating the specified scalar number, vector or list in each data source

ds.sample: draws a pseudorandom sample from a vector, data frame or matrix on the server side

ds.table: creates 1-dimensional, 2-dimensional and 3-dimensional tables using the table function in native R

Changed Functions

The functions ds.dim, ds.length, ds.colnames, ds.ls and ds.levels have been reimplemented to use the dedicated DataSHIELD server-side functions dimDS, lengthDS, colnamesDS, lsDS and levelsDS instead of the server-side aliases dim, length, colnames, ls and levels, which have now been removed. These changes should not affect the behaviour of the functions; they merely reduce the reliance on non-DataSHIELD functions internally on the server and therefore make it more secure and reliable.

The functions ds.cbind and ds.dataFrame have been modified to remove any “DATAFRAME.NAME$” strings from the column names of the assigned data frames. In addition, the new version of the ds.cbind function generates data frames instead of matrices. We have also fixed a related bug in how the two functions defined the column names of the assigned data frames when the order of the input components differed between studies.
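The column-name clean-up described above can be sketched locally in plain R. The helper `strip_df_prefix` is hypothetical, not the DataSHIELD code, and it assumes the prefix always ends at the last `$` in the name.

```r
# Hypothetical helper: drop a leading "DATAFRAME.NAME$" prefix from column names.
strip_df_prefix <- function(col_names) {
  # Greedy match up to the last "$"; names without "$" are left unchanged.
  sub("^.*\\$", "", col_names)
}

strip_df_prefix(c("D$age", "D$bmi", "sex"))  # "age" "bmi" "sex"
```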

An additional disclosure control was added to the ds.cov and ds.cor functions. The disclosure control checks that the number of input variables is lower than a pre-specified proportion of the individual-level records. To specify the maximum allowed proportion we use the same filter as the ds.glm function, which checks that the regression model is not oversaturated. This filter is set by default to 0.33, which means, for example, that for a data frame of 100 rows (i.e. individual-level records) the variance-covariance or correlation matrix can be returned for at most 33 variables.
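The arithmetic of this check can be sketched in plain R. The function names below are hypothetical and the real check runs server-side in dsBase; the sketch assumes the limit is the proportion times the number of records, rounded down, with equality allowed, which matches the worked example of 33 variables for 100 rows.

```r
# Hypothetical re-creation of the saturation rule described above.
max_variables_allowed <- function(n_records, threshold = 0.33) {
  floor(threshold * n_records)
}

# TRUE if a covariance/correlation matrix of this many variables may be returned.
passes_saturation_check <- function(n_variables, n_records, threshold = 0.33) {
  n_variables <= max_variables_allowed(n_records, threshold)
}

max_variables_allowed(100)         # 33
passes_saturation_check(33, 100)   # TRUE
passes_saturation_check(34, 100)   # FALSE
```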

Deprecated Functions

There are a number of functions in DataSHIELD v6.0 which should be regarded as deprecated - i.e. they are still there, but we strongly recommend you stop using them as they will be removed in v6.1. The functions which are deprecated are shown below, along with their replacements which should be used as soon as is practicable.

ds.setDefaultOpal, which should be replaced by datashield.connections_default

ds.listOpals, which should be replaced by datashield.connections

ds.table1D, which should be replaced by ds.table

ds.table2D, which should be replaced by ds.table

ds.look

ds.meanByClass, which should be replaced by ds.meanSdGp

ds.message

ds.recodeLevels, which should be replaced by ds.recodeValues

ds.subset, which should be replaced by ds.dataFrameSubset

ds.subsetByClass, which should be replaced by ds.dataFrameSubset

ds.vectorCalc, which should be replaced by ds.make

Note that the use of [ and ] should be avoided when performing analysis, especially in conjunction with ds.dataFrameSubset.
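When migrating analysis scripts, the deprecated functions listed above can be located with a small plain-R scan. This is a minimal sketch: the mapping is transcribed from the list (using ds.table1D and ds.table2D as the client function names), NA marks a function with no listed replacement, and the helper `find_deprecated_calls` is hypothetical. It matches the literal text "name(", so calls written with a space before the parenthesis are not detected.

```r
# Deprecated v6.0 client-side functions and their suggested replacements.
deprecated_functions <- c(
  ds.setDefaultOpal = "datashield.connections_default",
  ds.listOpals      = "datashield.connections",
  ds.table1D        = "ds.table",
  ds.table2D        = "ds.table",
  ds.look           = NA,
  ds.meanByClass    = "ds.meanSdGp",
  ds.message        = NA,
  ds.recodeLevels   = "ds.recodeValues",
  ds.subset         = "ds.dataFrameSubset",
  ds.subsetByClass  = "ds.dataFrameSubset",
  ds.vectorCalc     = "ds.make"
)

# Report each line of R source that calls a deprecated function.
find_deprecated_calls <- function(code_lines) {
  found <- data.frame(line = integer(0), deprecated = character(0),
                      replacement = character(0), stringsAsFactors = FALSE)
  for (fn in names(deprecated_functions)) {
    # Matching "name(" literally means ds.subset does not match ds.subsetByClass.
    idx <- grep(paste0(fn, "("), code_lines, fixed = TRUE)
    if (length(idx) > 0) {
      found <- rbind(found, data.frame(
        line = idx, deprecated = fn,
        replacement = unname(deprecated_functions[fn]),
        stringsAsFactors = FALSE))
    }
  }
  found <- found[order(found$line), , drop = FALSE]
  rownames(found) <- NULL
  found
}

script <- c("library(dsBaseClient)",
            "ds.subsetByClass(x = 'D', subsets = 'sub')",
            "ds.table1D(x = 'D$sex')",
            "ds.subset(x = 'D', subset = 'D2')")
find_deprecated_calls(script)
```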

Deprecated Aliases

There are a number of server-side aliases in DataSHIELD v6.0 which should be regarded as deprecated, so should not be used as they will be removed in v6.1. The aliases which are deprecated are:

is.character (aggregate alias)

is.factor (aggregate alias)

is.list (aggregate alias)

is.null (aggregate alias)

is.numeric (aggregate alias)

NROW (aggregate alias)

t.test (aggregate alias)

as.character (assign alias)

as.null (assign alias)

as.numeric (assign alias)

attach (assign alias)

complete.cases (assign alias)

rep (assign alias)

unlist (assign alias)

In addition to the deprecated functions, note that ds.meanSdGp is planned to be renamed ds.meanSDByClass in DataSHIELD v6.1.

Function documentation

The documentation of all DataSHIELD functions has been updated.

The new documentation uses the same format for all functions, and every example shows logging in with the version 6.0 method, the usage of the function, and logging out from the server.

Continuous integration

We have continued to develop our continuous integration (CI), and now have 6310 tests, which are run every day and on every proposed code change.

How to upgrade

Update DataSHIELD server-side package

If you have a suitable version of Opal server and would like to upgrade the DataSHIELD server package (dsBase), this can be done via the Opal Web Portal. Go to the “DataSHIELD” page within the “Administration” section of the Opal Web Portal, remove the old “dsBase” package, and then use the “+Add Package” button to install the new version of “dsBase”: select “Install all DataSHIELD packages” and press the “Install” button.

Update DataSHIELD client-side package

If you have installed the DataSHIELD client package (dsBaseClient) using the function install.packages and specifying the Obiba repository, then you can update the client package as follows:

# R

> update.packages(repos='https://cran.obiba.org')

If you do not have the “DSI” and “DSOpal” packages installed, they can be installed as follows:

# R

> install.packages('DSOpal')

Installing 'DSOpal' will also install 'DSI'.

Supported versions

DataSHIELD v6.0 is supported on R 3.5, R 3.6 and R 4.0, and is expected to work with intermediate versions. At present the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 2.16.0 running on Ubuntu 16.04.

Code availability

As ever, you can see the code at a variety of places: https://cran.obiba.org/, or https://github.com/datashield/dsBase/tree/6.0.0 and https://github.com/datashield/dsBaseClient/tree/6.0.0 


For release notes relevant to older versions of DataSHIELD that are no longer supported, see the Release History page.