Latest release

DataSHIELD

The latest version of dsBaseClient and dsBase is v6.2.0


More information can be found on GitHub.

DSI

The latest version of DSI is v1.4.0

More information can be found on CRAN.

Opal

The latest version of opalr is v4.4

More information can be found on CRAN or on the Obiba website.

Release Notes for the current version (v6) of DataSHIELD

DataSHIELD Release Notes v6.2
Focus of Release

The changes in the v6.2 release of DataSHIELD mainly focus on enhancing the disclosure controls available to data owners, and on adding analytical and presentation methods for data analysis.

Changes from DataSHIELD v6.1.1 to v6.2

Checking Permissive PrivacyControlLevel

To support data owners who have particularly sensitive data, additional disclosure protection has been added in the v6.2 release. A data owner can now place a service into "permissive" (default) or "non-permissive" disclosure mode by setting the “datashield.privacyControlLevel” option. The service is in "permissive" mode if the “datashield.privacyControlLevel” option has the value "permissive"; any other value puts the service into "non-permissive" mode.

If a service is in "non-permissive" mode, certain methods are blocked from being invoked by the client. The blocked methods are:

dataFrameSubsetDS1

rbindDS

levelsDS

recodeLevelsDS

cDS

recodeValuesDS

cbindDS

repDS

dataFrameDS

reShapeDS

dataFrameSortDS

seqDS

dataFrameSubsetDS2

subsetByClassDS

dmtC2SDS

subsetDS

In addition, the method aliases for ‘base::c’, ‘base::cbind’ and ‘base::rep’ have been removed.

Without access to these methods, the Data Owner will be required to perform more data shaping on behalf of the Data User(s).
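The mode switch described above can be sketched in base R. This is a minimal, hypothetical illustration, not dsBase's implementation: `is_permissive()` and `check_permissive()` are names invented here, and treating an unset option as "permissive" is an assumption based on the stated default. Only the option name and the method names come from these notes. On a real server the data owner sets the option in the server's DataSHIELD configuration; `options()` is used here purely for demonstration.

```r
# Hypothetical helper: permissive only when the option is exactly "permissive".
# Treating an unset option as "permissive" is an assumption (stated default).
is_permissive <- function() {
  level <- getOption("datashield.privacyControlLevel", default = "permissive")
  identical(level, "permissive")
}

# Server-side methods listed above as blocked in non-permissive mode
blocked_in_non_permissive <- c(
  "dataFrameSubsetDS1", "dataFrameSubsetDS2", "rbindDS", "cbindDS",
  "levelsDS", "recodeLevelsDS", "recodeValuesDS", "cDS", "repDS",
  "dataFrameDS", "dataFrameSortDS", "reShapeDS", "seqDS",
  "subsetByClassDS", "subsetDS", "dmtC2SDS"
)

# Hypothetical guard a server-side method could call before doing any work
check_permissive <- function(method) {
  if (method %in% blocked_in_non_permissive && !is_permissive()) {
    stop(sprintf("'%s' is blocked: the service is in non-permissive mode", method),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Demonstration: switch to non-permissive mode, try a blocked method, restore
old <- options(datashield.privacyControlLevel = "non-permissive")
tryCatch(check_permissive("cbindDS"),
         error = function(e) message(conditionMessage(e)))
options(old)
```

Methods outside the list (for example a summary method) pass the guard in either mode, which matches the notes: only the listed methods are affected.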

Changing disclosure settings

In this release, there are new disclosure settings that data owners can specify. The new options “default.nfilter.levels.density” and “default.nfilter.levels.max” have been added, with default values of 0.33 and 40 respectively. These options are described on the wiki page: https://data2knowledge.atlassian.net/wiki/x/DoCaKg
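These notes give only the option names and their defaults; the exact comparisons are documented on the wiki page. The sketch below assumes one plausible reading: a variable is treated as disclosive if its number of levels exceeds the absolute maximum, or exceeds the density fraction of the number of observations. `levels_disclosive()` and its parameter names are hypothetical.

```r
# Hypothetical check combining the two settings; the precise rule used by
# dsBase is an assumption here.
levels_disclosive <- function(x,
                              nfilter_levels_density = 0.33,
                              nfilter_levels_max = 40) {
  n_obs <- length(x)
  # Count declared levels for factors, distinct non-missing values otherwise
  n_levels <- if (is.factor(x)) nlevels(x) else length(unique(x[!is.na(x)]))
  too_many  <- n_levels > nfilter_levels_max
  too_dense <- n_levels > nfilter_levels_density * n_obs
  too_many || too_dense
}

levels_disclosive(rep(c("a", "b"), 50))         # FALSE: 2 levels in 100 values
levels_disclosive(as.character(1:10))           # TRUE: 10 levels in 10 values
levels_disclosive(as.character(rep(1:41, 10)))  # TRUE: 41 levels exceeds the maximum
```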

New Functions

The following functions have been added to version 6.2 of the DataSHIELD dsBaseClient package.

ds.hetcor: computes a heterogeneous correlation matrix, consisting of Pearson product-moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, and polychoric correlations between ordinal variables.

ds.lspline: computes the basis of a piecewise-linear spline such that, depending on the argument “marginal”, the coefficients can be interpreted as (1) the slopes of consecutive spline segments, or (2) the slope changes at consecutive knots. This is an assign function which saves the created object on the serverside.
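The two parameterisations can be illustrated with a small base-R basis builder. This is a sketch modelled on the conventions of the CRAN lspline package, which ds.lspline is assumed (not confirmed here) to follow; `lspline_basis()` is a hypothetical name. With `marginal = FALSE` each column covers one segment, so regression coefficients are segment slopes; with `marginal = TRUE` the columns are `x` plus one hinge term per knot, so coefficients after the first are slope changes.

```r
# Hypothetical linear-spline basis builder (not the DataSHIELD code)
lspline_basis <- function(x, knots, marginal = FALSE) {
  knots <- sort(knots)
  if (marginal) {
    # x plus one hinge term (x - k)+ per knot: coefficients are slope changes
    return(cbind(x, sapply(knots, function(k) pmax(x - k, 0))))
  }
  m <- length(knots)
  basis <- matrix(0, nrow = length(x), ncol = m + 1)
  basis[, 1] <- pmin(x, knots[1])                 # segment below the first knot
  if (m > 1) {
    for (i in 2:m) {
      # portion of x lying between knots i-1 and i
      basis[, i] <- pmax(pmin(x, knots[i]), knots[i - 1]) - knots[i - 1]
    }
  }
  basis[, m + 1] <- pmax(x, knots[m]) - knots[m]  # segment above the last knot
  basis
}

# Slope 1 below x = 5 and slope 3 above it, written in both parameterisations
x <- c(0, 2, 5, 8)
as.vector(lspline_basis(x, knots = 5) %*% c(1, 3))                  # 0 2 5 14
as.vector(lspline_basis(x, knots = 5, marginal = TRUE) %*% c(1, 2)) # 0 2 5 14
```

The same fitted line needs coefficients (1, 3) in the first basis and (1, 2) in the second, which is exactly the difference in interpretation that the “marginal” argument controls.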

ds.qlspline: this is similar to ds.lspline but it calculates the knot positions to be at quantiles of the input variable.

ds.elspline: this is similar to ds.lspline but it calculates the knot positions such that they cut the range of the input variable into n equal-width intervals.
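The two knot-placement rules can be sketched as follows. `quantile_knots()` and `equal_width_knots()` are hypothetical helpers; on a DataSHIELD server these positions would be computed per study, and whether ds.qlspline uses R's default quantile type is an assumption.

```r
# Interior knots at q-quantiles, in the spirit of ds.qlspline
quantile_knots <- function(x, q) {
  probs <- seq(0, 1, length.out = q + 1)[-c(1, q + 1)]  # drop 0 and 1
  unname(quantile(x, probs = probs, na.rm = TRUE))
}

# Interior knots cutting the range into n equal-width intervals (ds.elspline)
equal_width_knots <- function(x, n) {
  r <- range(x, na.rm = TRUE)
  seq(r[1], r[2], length.out = n + 1)[-c(1, n + 1)]    # drop the end points
}

quantile_knots(0:10, 4)          # 2.5 5.0 7.5
equal_width_knots(c(0, 100), 4)  # 25 50 75
```

For evenly spread data both rules coincide; for skewed data the quantile rule puts more knots where observations are dense, while the equal-width rule ignores the distribution.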

ds.ns: generates a basis matrix for representing the family of piecewise-cubic splines with a specified sequence of interior knots, and natural boundary conditions. This is an assign function which saves the created object on the serverside.
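ds.ns is described in the same terms as `ns()` from R's splines package, which ships with R. Assuming ds.ns mirrors that function on each server (an assumption, not confirmed by these notes), the shape of the basis it creates can be checked locally: with three interior knots and no intercept column, the basis has one row per observation and four columns.

```r
library(splines)  # part of the standard R distribution

x <- seq(0, 10, by = 0.5)               # 21 toy observations
basis <- ns(x, knots = c(2.5, 5, 7.5))  # natural cubic spline, 3 interior knots
dim(basis)                              # 21 4
```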

ds.dmtC2S: transfers complex variables from the client-side to the server-side(s). This is an assign type method. The types of variables which can be transferred are data.frame, matrix or tibble.

ds.asFactorSimple: converts an input variable into a factor. Unlike ds.asFactor and its serverside functions, ds.asFactorSimple does no more than coerce the class of a variable to factor in each study. It does not check for or enforce consistency of factor levels across sources or allow you to force an arbitrary set of levels unless those levels actually exist in the sources. In addition, it does not allow you to create an array of binary dummy variables that is equivalent to a factor. If you need to do any of these things, you will have to use the ds.asFactor function.
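The practical difference from ds.asFactor can be seen with plain R factors in two simulated "studies". This local illustration assumes each server simply applies `as.factor()`; in a real DataSHIELD session the client cannot pool raw values as is done here for the harmonised version.

```r
# Toy data standing in for one variable held by two studies
study1 <- c(1, 2, 2, 1)
study2 <- c(2, 3, 3)

# Per-study coercion, as ds.asFactorSimple is described: levels may differ
simple1 <- as.factor(study1)
simple2 <- as.factor(study2)
levels(simple1)  # "1" "2"
levels(simple2)  # "2" "3"

# Harmonised coercion: every study gets the same set of levels.
# The union is computed locally here only for illustration.
all_levels <- sort(unique(c(study1, study2)))
harm1 <- factor(study1, levels = all_levels)
harm2 <- factor(study2, levels = all_levels)
levels(harm1)    # "1" "2" "3"
```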

ds.metadata: obtains the non-disclosive metadata associated with a variable held on the server.

ds.ranksSecure: securely generates the ranks of a numeric vector and estimates true global quantiles across all data sources simultaneously (see https://data2knowledge.atlassian.net/wiki/x/AYDPog for details)

ds.unique: generates a variable on the server-side which is a version of an existing variable without any duplicate values.

ds.forestplot: draws a forest plot of the coefficients for Study-Level Meta-Analysis (*)

(*) Provided by Xavier Escribà Montagut, Barcelona Institute of Global Health (ISGlobal), Spain

Changed Functions

ds.replaceNA: This new version of ds.replaceNA can replace NAs in factor variables. The replaced values are then considered as additional levels of the factor.
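In base R terms, treating a replacement value as an additional factor level looks like the sketch below. It illustrates the described behaviour only, not the ds.replaceNA implementation, and the replacement label "unknown" is hypothetical.

```r
x <- factor(c("a", NA, "b", NA))

levels(x) <- c(levels(x), "unknown")  # register the replacement as a new level
x[is.na(x)] <- "unknown"              # then fill the missing entries with it

levels(x)  # "a" "b" "unknown"
```

Adding the level first matters: assigning a value that is not an existing level to a factor produces NA with a warning.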

ds.tapply.assign: Major refactoring, which ensures that variables are present on all servers. Fixed an issue so that variables that include missing values, and not only complete cases, are handled correctly.

ds.tapply: Major refactoring, which ensures that variables are present on all servers. Fixed an issue so that variables that include missing values, and not only complete cases, are handled correctly.

ds.mean: the behaviour when all values are NA has changed. If ds.mean is called on a vector which, on a given server, contains only NAs, the result from that server will be NA instead of a disclosure block.

ds.var: the behaviour when all values are NA has changed. If ds.var is called on a vector which, on a given server, contains only NAs, the result from that server will be NA instead of a disclosure block.
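The change can be pictured as an extra branch that runs before the disclosure check. The sketch below is hypothetical: `summarise_valid()`, `server_mean()` and `server_var()` are invented names, and `min_valid = 3` merely stands in for whatever disclosure threshold the server applies.

```r
# Hypothetical server-side wrapper; min_valid stands in for a disclosure
# threshold and its value is an assumption.
summarise_valid <- function(x, fun, min_valid = 3) {
  valid <- x[!is.na(x)]
  if (length(valid) == 0) {
    return(NA_real_)              # described v6.2 behaviour: all NA gives NA
  }
  if (length(valid) < min_valid) {
    stop("disclosure block: fewer than ", min_valid, " valid values",
         call. = FALSE)
  }
  fun(valid)
}

server_mean <- function(x) summarise_valid(x, mean)
server_var  <- function(x) summarise_valid(x, var)

server_mean(c(NA, NA, NA))  # NA instead of an error
server_var(c(1, 2, 3, NA))  # 1
```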

ds.table: The argument useNA now accepts only two values, “no” or “always”. The option “ifany”, which was available in v6.1.1, is no longer allowed.
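Since ds.table is built on R's native table function, base R shows what the useNA values mean. With "always", the NA cell is present even when its count is zero, so the table has the same shape whether or not missing values occur; "ifany" adds the cell only when NAs are present.

```r
x <- c("a", "b", "a")       # toy vector with no missing values

table(x, useNA = "no")      # cells for "a" and "b" only
table(x, useNA = "always")  # adds an <NA> cell, here with count 0
table(x, useNA = "ifany")   # adds an <NA> cell only if NAs are present
```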

ds.corTest: The new version allows the user to obtain Kendall’s tau or Spearman’s rho correlation coefficient for a pair of variables, in addition to the existing Pearson’s correlation. The new arguments are: method, which can be one of "pearson" (default), "kendall" or "spearman"; exact, a logical indicating whether an exact p-value should be computed for Kendall’s tau or Spearman’s rho; conf.level, which defines the level of the returned confidence interval; and type, which defines whether study-specific correlation coefficients or a combined correlation across all studies is returned (the combined correlation is an approximation of the exact pooled correlation and is estimated using Fisher's z transformation).
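The combined estimate can be sketched with the textbook form of Fisher's z pooling: transform each study's r with atanh, average on the z scale with weights n − 3 (the inverse of the approximate variance of z), and transform back with tanh. That ds.corTest uses exactly these weights is an assumption; `pool_correlation()` is a hypothetical helper.

```r
# Textbook Fisher's z pooling of study-level correlations
pool_correlation <- function(r, n, conf.level = 0.95) {
  stopifnot(length(r) == length(n), all(abs(r) < 1), all(n > 3))
  z <- atanh(r)                        # Fisher's z for each study
  w <- n - 3                           # var(z) is approximately 1 / (n - 3)
  z_bar <- sum(w * z) / sum(w)         # inverse-variance weighted mean
  se <- 1 / sqrt(sum(w))
  crit <- qnorm(1 - (1 - conf.level) / 2)
  list(r = tanh(z_bar),                # back-transform to the r scale
       conf.int = tanh(z_bar + c(-1, 1) * crit * se))
}

# Made-up study-level results, for illustration only
pool_correlation(r = c(0.42, 0.35, 0.51), n = c(120, 85, 200))
```

Averaging on the z scale rather than averaging r directly is what makes the result an approximation: the transform stabilises the variance, so larger studies get proportionally more weight.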

ds.glmSLMA, ds.lmerSLMA and ds.glmerSLMA: the changes to these functions are as follows:

  • we made sure that the grouping factor (i.e. the variable after the "|") in the mixed model is not included in a set of checks that are normally used for standard GLMs. These checks are not appropriate here, as they blocked users from running models when there was a small number of individuals in the groups (e.g. siblings in family groups). Having a small number of individuals in a group is not a disclosure issue for mixed models and hence it should be permitted.

  • we improved the handling of errors when something goes wrong in the underlying lme4 functions. Previously, the error message returned to the user was not the one from the underlying function, making it hard to debug what had gone wrong.

  • we have added, to ds.glmSLMA, a notify.of.progress argument which enables or disables progress logging.

ds.histogram: changed how the function automatically checks for disclosure; it now compares the number of breaks with the disclosure parameter “nfilter.levels.density”, instead of with “nfilter.levels” as previously.

ds.Boole: fixed an issue which, under certain circumstances, could produce incorrect results. This incorrect behaviour could occur if the right-hand operand was negative.

ds.asNumeric: has been changed to handle different types of variables (including character variables).
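A common pitfall when converting mixed-type variables in R is that `as.numeric()` on a factor returns the internal level codes rather than the labels. The sketch below shows a type-aware conversion in base R; `to_numeric()` is hypothetical and is not a description of how ds.asNumeric is implemented.

```r
codes <- as.numeric(factor(c("10", "20")))  # 1 2: level codes, not the values

# Hypothetical type-aware conversion
to_numeric <- function(x) {
  if (is.factor(x)) {
    x <- as.character(x)           # go through the labels, not the codes
  }
  suppressWarnings(as.numeric(x))  # strings that are not numbers become NA
}

to_numeric(factor(c("10", "20")))  # 10 20
to_numeric(c("1.5", "abc"))        # 1.5 NA
```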

Client-side Testing Infrastructure

Additional tests and general test improvements are included in this release.

Client methods now test for the existence and class of the variables being used.

Server-side Testing Infrastructure

Additional tests, general test improvements, added privacy control level tests and improved error messages are included in this release.

Backward compatibility with v6.1.1 dsBaseClient

There are no known significant issues with using v6.1.1 dsBaseClient with v6.2 dsBase. The changes in behaviour that have been observed are limited to the text of error messages, the circumstances under which a disclosure block can occur, and bug fixes.

Supported Versions

DataSHIELD v6.2 is supported on R 3.5, R 3.6, R 4.0 and R 4.1, and would be expected to work with intermediate versions. At present, the DataSHIELD client-side package is known to work on Ubuntu 18.04, Ubuntu 20.04, Windows 10 and macOS Big Sur (11.6). The DataSHIELD server-side package is known to work when deployed to Opal 4.3.3 running on Ubuntu 18.04 and 20.04.

Code Availability

As ever, you can obtain the code at a variety of places:

Release Notes for v6.1.1 of DataSHIELD

Focus of Release

This release of DataSHIELD is a maintenance release for release 6.1, containing fixes for known issues in release 6.1. Larger changes will be saved for release 6.2.

Changes from DataSHIELD v6.1 to v6.1.1

General Changes

  1. Add tests to client-side methods to ensure that ‘datasources’ is a list of DSConnection objects.

  2. Many error messages that were previously returned are now raised as errors generated by 'stop(...)'.

  3. General improvements to documentation

  4. Additional tests implemented

Changed Functions 


  1. ds.boxPlot: implementation has been refined.

  2. ds.completeCases: ensured correct behaviour if applied to a vector.

  3. ds.cor: removed the naAction argument; “NA” values are now always handled as 'casewise.complete'.

  4. ds.dataFrameFill: if filling a column with factors, levels are set correctly.

  5. ds.dataFrameSort: ensure correct behaviour if sorting columns are not integer or numeric columns.

  6. ds.tapply.assign: ensure correct behaviour of the disclosure check when converting to factors.

  7. ds.lexis: ensure required headers are retained.

  8. ds.glmSLMA: added support for assigning model to a variable on the server-side.

  9. ds.lmerSLMA: added support for assigning model to a variable on the server-side.

  10. ds.glmerSLMA: corrected documentation.

  11. ds.recodeValues: can be applied to columns which are factors.

  12. ds.rm: permit removal of multiple variables on the server-side and removed the restriction on variable name length.

  13. ds.merge: correction to documentation.

  14. ds.table: ensured correct behaviour when assignment of the table to a variable on the server-side is specified. Study names are now included in tables.

  15. ds.cbind: ensure checking of the assignment of column names and unused arguments.


Additional support packages and platforms

  1. Support Opal 4.0.3 to 4.2.2

  2. Support DSI 1.2 to 1.3, DSOpal 1.2 to 1.3 and opalr 2.0.0 to 3.0.0

 

Release Notes for v6.1.0 of DataSHIELD

Focus of Release

The major focus of the v6.1 release of DataSHIELD is adding new analytical and presentation functions.

Changes from DataSHIELD v6.0.1 to v6.1

New Analytical Functions

The following functions have been added to the suite of analytical functions provided by DataSHIELD.

  • ds.glmPredict: causes the application of predict.glm to a server-side glm object
  • ds.glmSummary: causes the summarization of a server-side glm object
  • ds.kurtosis: calculates the kurtosis of a numeric variable with three different mathematical formulas
  • ds.skewness: calculates the skewness of a numeric variable with three different mathematical formulas
  • ds.getWGSR: computes the WHO Growth Reference z-scores of anthropometric data
  • ds.abs: computes the absolute values of an input numeric or integer variable
  • ds.sqrt: computes the square root values of an input numeric or integer variable

New Presentation Functions

  • ds.boxPlot: draws boxplots based on data on the study servers (*)

(*) Provided by Xavier Escribà Montagut, Barcelona Institute of Global Health (ISGlobal), Spain
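The ds.kurtosis and ds.skewness entries above mention three formulas. A common set of three is the one documented for `e1071::skewness` and `e1071::kurtosis` (types 1 to 3, based on central moments with divisor n); that the DataSHIELD functions use exactly these is an assumption. The helpers below are hypothetical base-R versions for local comparison.

```r
# Central moment of order k, with divisor n
central_moment <- function(x, k) mean((x - mean(x))^k)

skewness_type <- function(x, type = 1) {
  stopifnot(type %in% 1:3)
  x <- x[!is.na(x)]
  n <- length(x)
  g1 <- central_moment(x, 3) / central_moment(x, 2)^1.5
  switch(type,
         g1,                                # type 1: g1
         g1 * sqrt(n * (n - 1)) / (n - 2),  # type 2: G1 (adjusted)
         g1 * ((n - 1) / n)^1.5)            # type 3: b1
}

kurtosis_type <- function(x, type = 1) {
  stopifnot(type %in% 1:3)
  x <- x[!is.na(x)]
  n <- length(x)
  g2 <- central_moment(x, 4) / central_moment(x, 2)^2 - 3   # excess kurtosis
  switch(type,
         g2,                                                  # type 1: g2
         ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),  # type 2: G2
         (g2 + 3) * (1 - 1 / n)^2 - 3)                        # type 3: b2
}

skewness_type(c(0, 0, 3), type = 2)       # sqrt(3), about 1.732
kurtosis_type(c(-1, 1, -1, 1), type = 1)  # -2
```

The types differ only by sample-size corrections, so they converge as n grows; for small studies the choice of type can change the reported value noticeably.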

Changed Functions

The following functions have been modified in this release of DataSHIELD.

ds.dataFrameSubset: contains additional argument checking

ds.glmSLMA: improved documentation

ds.cbind: simplified, some arguments have been removed

ds.scatterPlot: the logical argument “return.coords” (default FALSE) has been added, allowing users to choose whether the coordinates of the anonymised data points are returned to the console. If the coordinates are returned, the user can then use any of the native R graphical libraries (e.g. ggplot2) to generate other visualisations on the client-side.

Client-side Testing Infrastructure

Added support for running tests which require ds.dangerClient functions as part of the Continuous Integration tests.

Additional tests and general test improvements are included in this release.

Server-side Testing Infrastructure

Added a test suite that directly tests the server functions. These tests have also been integrated with Continuous Integration.

Test coverage reports are now generated from the server test suite.

Supported Versions

DataSHIELD v6.1 is supported on R 3.5, R 3.6 and R 4.0, and would be expected to work with intermediate versions. At present, the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 3.0.3 running on Ubuntu 16.04.

Code Availability

As ever, you can see the code at a variety of places: https://cran.obiba.org/, https://github.com/datashield/dsBase/tree/6.1 and https://github.com/datashield/dsBaseClient/tree/6.1

Release Notes for v6.0.1 of DataSHIELD

Focus of Release

The major focus of the v6.0.1 maintenance release of DataSHIELD is addressing known issues.

Changes from DataSHIELD v6.0 to v6.0.1

Improvements to "ds.dataFrameSubset", “dataFrameSubsetDS1" and "dataFrameSubsetDS2" functions: improved argument checking and support for the "ONES" option, which will cause a vector containing "1" to be created and then used directly to select all rows when you are primarily subsetting by column.

Improvements to “ds.dataFrame” and “ds.cbind” functions: ensures that the combining of variables in data frames that are created by those two functions maintains the actual class of each input variable.

Improvements to "ds.names" and "namesDS" functions: improved checking that the function is operating on a list object, which may include objects whose primary class is something else (e.g. glm, for output from a generalized linear model) but which are also lists.

Improvements to "ds.dataFrameFill" and "dataFrameFillDS" functions: ensures that the classes of any created columns match the classes of existing columns in the other studies.

Enhancements of build and testing system: remove reliance on "opal" and "opaladmin" packages.

Supported Versions

DataSHIELD v6.0.1 is supported on R 3.5, R 3.6 and R 4.0, and would be expected to work with intermediate versions. At present, the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 3.0.1 running on Ubuntu 16.04.

Code Availability

As ever, you can see the code at a variety of places: https://cran.obiba.org/, https://github.com/datashield/dsBase/tree/6.0.1 and https://github.com/datashield/dsBaseClient/tree/6.0.1

Release Notes for v6.0 of DataSHIELD

Focus of Release

The major focuses of the v6.0 release of DataSHIELD are the addition of new analytical functions and the integration with the DataSHIELD Interface (DSI).

Changes from DataSHIELD v5.1 to v6.0

DataSHIELD Interface (DSI)

DataSHIELD’s dsBaseClient package now uses the DataSHIELD Interface (DSI) to communicate with the Opal server; this replaces the legacy opal R package. This is a breaking change for code written for DataSHIELD v4 and v5. The main impact on end users of DataSHIELD is the new technique for logging in to and out of the server.

The motivation for this change is to give DataSHIELD the ability to connect to other types of server in the future. More information about DSI can be found on its GitHub page at https://github.com/datashield/DSI

New Analytical Functions

The functions ds.completeCases, ds.glmerSLMA, ds.lmerSLMA, ds.rep, ds.sample and ds.table have been added to the suite of analytical functions provided by DataSHIELD.

ds.completeCases: constructs a modified data frame, matrix or vector that contains no missing values

ds.glmerSLMA: fits a Generalized Linear Mixed-Effects Model (GLME) on data from one or multiple sources with pooling via SLMA

ds.lmerSLMA: fits a Linear Mixed-Effects Model (lme) - can include both fixed and random-effects - on data from one or multiple sources with pooling via SLMA (Study-Level Meta-Analysis)

ds.rep: creates a repetitive sequence by repeating the specified scalar number, vector or list in each data source

ds.sample: draws a pseudorandom sample from a vector, dataframe or matrix on the serverside

ds.table: creates 1-dimensional, 2-dimensional and 3-dimensional tables using the table function in native R

Changed Functions

The functions ds.dim, ds.length, ds.colnames, ds.ls and ds.levels have been reimplemented so that they no longer use the server-side aliases dim, length, colnames, ls and levels (which have now been removed), but instead use the dedicated DataSHIELD server-side functions dimDS, lengthDS, colnamesDS, lsDS and levelsDS. These changes should not affect the behaviour of the functions; they merely reduce the reliance on non-DataSHIELD functions on the server, making it more secure and reliable.

The functions ds.cbind and ds.dataFrame have been modified to remove any “DATAFRAME.NAME$” strings from the column names of the assigned data frames. In addition, the new version of the ds.cbind function generates data frames instead of matrices. We have also fixed a related bug in how the two functions defined the column names of the assigned data frames when the order of the input components differed between studies.
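The column-name clean-up can be pictured as stripping everything up to and including the first "$". The helper below is a hypothetical base-R illustration, not the code used by ds.cbind or ds.dataFrame.

```r
# Remove a leading "<data frame name>$" from each column name, if present
strip_df_prefix <- function(col_names) sub("^[^$]*\\$", "", col_names)

strip_df_prefix(c("D$age", "D$sex", "bmi"))  # "age" "sex" "bmi"
```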

An additional disclosure control was added to the ds.cov and ds.cor functions. It checks that the number of input variables does not exceed a pre-specified proportion of the number of individual-level records. The maximum allowed proportion is given by the same filter that the ds.glm function uses to check that a regression model is not oversaturated. This filter is set by default to 0.33, which means, for example, that for a data frame of 100 rows (i.e. individual-level records) the variance-covariance or correlation matrix can only be returned for up to 33 variables.
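The arithmetic of this check can be sketched as a one-line comparison. The function and parameter names are hypothetical; `nfilter_glm` is assumed to correspond to the ds.glm saturation filter with its default of 0.33, and "up to 33 variables for 100 rows" is read as an inclusive bound.

```r
# Hypothetical saturation-style check for ds.cov / ds.cor inputs
cov_allowed <- function(n_vars, n_rows, nfilter_glm = 0.33) {
  n_vars <= nfilter_glm * n_rows
}

cov_allowed(33, 100)  # TRUE
cov_allowed(34, 100)  # FALSE
```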

Deprecated Functions

There are a number of functions in DataSHIELD v6.0 which should be regarded as deprecated - i.e. they are still there, but we strongly recommend you stop using them as they will be removed in v6.1. The functions which are deprecated are shown below, along with their replacements which should be used as soon as is practicable.

ds.setDefaultOpal, which should be replaced by datashield.connections_default

ds.listOpals, which should be replaced by datashield.connections

ds.table1DS, which should be replaced by ds.table

ds.table2DS, which should be replaced by ds.table

ds.look

ds.meanByClass, which should be replaced by ds.meanSdGp

ds.message

ds.recodeLevels, which should be replaced by ds.recodeValues

ds.subset, which should be replaced by ds.dataFrameSubset

ds.subsetByClass, which should be replaced by ds.dataFrameSubset

ds.vectorCalc, which should be replaced by ds.make

The use of [ and ] should be avoided when performing analysis, especially in conjunction with ds.dataFrameSubset.

Deprecated Aliases

There are a number of server-side aliases in DataSHIELD v6.0 which should be regarded as deprecated, so should not be used as they will be removed in v6.1. The aliases which are deprecated are:

is.character (aggregate alias)

is.factor (aggregate alias)

is.list (aggregate alias)

is.null (aggregate alias)

is.numeric (aggregate alias)

NROW (aggregate alias)

t.test (aggregate alias)

as.character (assign alias)

as.null (assign alias)

as.numeric (assign alias)

attach (assign alias)

complete.cases (assign alias)

rep (assign alias)

unlist (assign alias)

In addition to the deprecated functions, note that ds.meanSdGp is planned to be renamed to ds.meanSDByClass in DataSHIELD v6.1.

Function documentation

The documentation of all DataSHIELD functions has been updated.

The documentation now has the same format for all functions, and the examples show logging in according to version 6.0, the usage of the function, and logging out from the server.

Continuous integration

We have continued to develop our continuous integration (CI), and now have 6310 tests which are run every day and on every proposed code change.

How to upgrade

Update DataSHIELD server-side package

If you have a suitable version of Opal server and would like to upgrade the DataSHIELD server package (dsBase), this can be done via the Opal Web Portal. On the “DataSHIELD” page within the “Administration” section of the Opal Web Portal, remove the old “dsBase”, then use the “+Add Package” button to install the new version of “dsBase”. Select “Install all DataSHIELD packages” then press the “Install” button.

Update DataSHIELD client-side package

If you have installed the DataSHIELD client package (dsBaseClient) using the function install.packages and specifying the Obiba repository, then you can update the client package as follows:

# R

> update.packages(repos='https://cran.obiba.org')

If you do not have the “DSI” and “DSOpal” packages installed, they can be installed as follows:

# R

> install.packages('DSOpal')

Installing ‘DSOpal’ also causes ‘DSI’ to be installed.

Supported versions

DataSHIELD v6.0 is supported on R 3.5, R 3.6 and R 4.0, and would be expected to work with intermediate versions. At present, the DataSHIELD client-side package is known to work on Ubuntu 16.04, Ubuntu 18.04 and Windows 10. The DataSHIELD server-side package is known to work when deployed to Opal 2.16.0 running on Ubuntu 16.04.

Code availability

As ever, you can see the code at a variety of places: https://cran.obiba.org/, or https://github.com/datashield/dsBase/tree/6.0.0 and https://github.com/datashield/dsBaseClient/tree/6.0.0 


For Release Notes relevant to older versions of DataSHIELD which are no longer supported, see the Release History page.