Rmonize 2.0.0 (release: 2025-06-26)

Attention: Some changes to functions in the current version of Rmonize may require updates to existing code.

Superseded object.

previous version (1.1.0 and older)    version 2.0.0
----------------------------------    ----------------
Rmonize_DEMO                          Rmonize_examples

Superseded parameters.

In the functions show_harmo_error(), harmonized_dossier_evaluate(), harmonized_dossier_summarize() and harmonized_dossier_visualize(), the parameters have been consolidated into a single one, the “dossier” (harmonized_dossier_visualize() also keeps bookdown_path). See issues https://github.com/maelstrom-research/Rmonize/issues/110 https://github.com/maelstrom-research/Rmonize/issues/109 https://github.com/maelstrom-research/Rmonize/issues/108 https://github.com/maelstrom-research/Rmonize/issues/98 https://github.com/maelstrom-research/Rmonize/issues/93 https://github.com/maelstrom-research/Rmonize/issues/92

previous version (1.1.0 and older)

harmonized_dossier_evaluate(
  harmonized_dossier, dataschema, taxonomy, as_dataschema_mlstr)

harmonized_dossier_summarize(
  harmonized_dossier, group_by, dataschema, data_proc_elem,
  taxonomy, valueType_guess)

harmonized_dossier_visualize(
  harmonized_dossier, bookdown_path, group_by, harmonized_dossier_summary,
  dataschema, data_proc_elem, valueType_guess, taxonomy)

version 2.0.0

harmonized_dossier_evaluate(harmonized_dossier)
harmonized_dossier_summarize(harmonized_dossier)
harmonized_dossier_visualize(harmonized_dossier, bookdown_path)

Superseded function behaviors and/or output structures.

In harmonized_dossier_evaluate(), the output columns have been renamed as follows:

previous version (1.1.0 and older)    current version (2.0.0)
----------------------------------    -----------------------------
index                                 Index
name                                  Variable name
label                                 Variable label
valueType                             Data dictionary valueType
Categories::label                     Categories in data dictionary
Categories::missing                   Non-valid categories

In harmonized_dossier_summarize(), the output columns have been renamed as follows:

previous version (1.1.0 and older)    current version (2.0.0)
----------------------------------    --------------------------
index in data dict.name               Index
name                                  Variable name
label                                 Variable label
Estimated dataset valueType           Suggested valueType
Actual dataset valueType              Dataset valueType
Total number of observations          Number of rows
Nb. distinct values                   Number of distinct values
Nb. valid values                      Number of valid values
Nb. non-valid values                  Number of non-valid values
Nb. NA                                Number of empty values
% total Valid values                  % Valid values
% Non-valid values                    % Non-valid values
% NA                                  % Empty values
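Scripts written for 1.1.0 that refer to the old summary column names can keep the renaming in one place with a lookup vector. The sketch below is a hypothetical base-R helper, not part of Rmonize; it covers the harmonized_dossier_summarize() renames listed above (the index column is left out) and can be inverted to restore the old names.

```r
# Sketch only: hypothetical helpers, not part of Rmonize.
# Names are the 1.1.0 column names, values are the 2.0.0 column names.
summarize_col_map <- c(
  "name"                         = "Variable name",
  "label"                        = "Variable label",
  "Estimated dataset valueType"  = "Suggested valueType",
  "Actual dataset valueType"     = "Dataset valueType",
  "Total number of observations" = "Number of rows",
  "Nb. distinct values"          = "Number of distinct values",
  "Nb. valid values"             = "Number of valid values",
  "Nb. non-valid values"         = "Number of non-valid values",
  "Nb. NA"                       = "Number of empty values",
  "% total Valid values"         = "% Valid values",
  "% NA"                         = "% Empty values"
)

# Rename the columns of `df` that appear in names(col_map);
# columns not listed in the map are left untouched.
rename_with_map <- function(df, col_map) {
  hits <- names(df) %in% names(col_map)
  names(df)[hits] <- unname(col_map[names(df)[hits]])
  df
}

# Swap names and values, e.g. to map 2.0.0 names back to 1.1.0 names.
invert_map <- function(col_map) setNames(names(col_map), unname(col_map))

# Usage on a toy data frame with 1.1.0-style column names.
old <- data.frame(name = "age", `Nb. NA` = 3L, check.names = FALSE)
new <- rename_with_map(old, summarize_col_map)
names(new)
```

Keeping the map as a single named vector means the same table drives both directions, so it only has to be edited once if more columns are added.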

Bug fixes and improvements

Enhancements to the assessment, summary, and visual reports.

https://github.com/maelstrom-research/Rmonize/issues/57 https://github.com/maelstrom-research/Rmonize/issues/53 https://github.com/maelstrom-research/Rmonize/issues/49 https://github.com/maelstrom-research/Rmonize/issues/48 https://github.com/maelstrom-research/Rmonize/issues/39 https://github.com/maelstrom-research/Rmonize/issues/37 https://github.com/maelstrom-research/Rmonize/issues/33 https://github.com/maelstrom-research/Rmonize/issues/32 https://github.com/maelstrom-research/Rmonize/issues/29

Rmonize 1.1.0

Bug fixes and improvements

Deprecated functions

To avoid confusion with help(function), the function Rmonize_help() has been renamed to Rmonize_website().

Dependency changes

Rmonize 1.0.1

Bug fixes and enhancements following testing with real data.

Bug fixes and improvements

Improved handling of pooled data

The functions harmo_process(), pool_harmonized_dataset_create(), harmonized_dossier_create(), harmonized_dossier_evaluate(), harmonized_dossier_summarize() and harmonized_dossier_visualize() share the same parameter “harmonized_col_dataset”, which is the name of the column (if any) containing the input dataset names. If this column exists and is declared by the user, it is used throughout the pipeline as a grouping variable. Otherwise, the name of each dataset is used instead.
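The grouping rule described above can be illustrated with a minimal base-R sketch, assuming the pooled data are held as a named list of data frames. The function dataset_groups() and the column adm_study_id are hypothetical names chosen for illustration; this is not Rmonize's internal implementation.

```r
# Sketch only: hypothetical illustration of the harmonized_col_dataset rule,
# not Rmonize code. `pooled` is a named list of data frames, one per dataset.
dataset_groups <- function(pooled, harmonized_col_dataset = NULL) {
  groups <- lapply(names(pooled), function(ds_name) {
    ds <- pooled[[ds_name]]
    if (!is.null(harmonized_col_dataset) &&
        harmonized_col_dataset %in% names(ds)) {
      # Column declared and present: its values become the grouping variable
      as.character(ds[[harmonized_col_dataset]])
    } else {
      # Default: every row is labelled with the name of its dataset
      rep(ds_name, nrow(ds))
    }
  })
  unlist(groups, use.names = FALSE)
}

# Usage with two toy datasets carrying a hypothetical `adm_study_id` column.
pooled <- list(
  study_A = data.frame(id = 1:2, adm_study_id = c("A", "A")),
  study_B = data.frame(id = 3L, adm_study_id = "B")
)
dataset_groups(pooled)                  # grouped by dataset name
dataset_groups(pooled, "adm_study_id")  # grouped by the declared column
```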

Renamed DEMO_file_harmo to Rmonize_DEMO and updated the examples.

Removed the parameter overwrite = TRUE from the xxx_visualize() functions.

In visual reports, avoided confusing changes in the color scheme.

Histograms of date variables now display valid ranges.

In reports, % NA is now shown as a proportion.

The harmonized_dossier_visualize() report now shows variable labels in the same language.

id_creation is now written both in the script and in the rule of the data processing elements (DPE), as for direct_mapping.

Special characters are now allowed in the names of datasets and data dictionaries.

In visual reports, the bar plot only appears when there are multiple types of missing values; otherwise, only the pie chart is shown.

Enhanced harmonized_dossier_visualize() output.

Enhanced show_harmo_error() output.

In reports, all the percentages are now grouped under “Other values (non categorical)”, which gives a single value.

The recode rule now works with special characters.

Rmonize 1.0.0

Functions to support rigorous retrospective data harmonization processing, evaluation, and documentation across datasets in a dossier based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the harmonization process, apply specified processing rules to generate harmonized data, diagnose processing errors, and summarize and evaluate harmonized outputs.

This is still a work in progress, so please let us know if a function you used before no longer works.

Helper functions and objects

Assess and manipulate input files

Data processing

Evaluation of the harmonization process

Imported from the madshapR package:

as_data_dict(), is_data_dict(), as_data_dict_mlstr(), is_data_dict_mlstr(), as_dataset(), is_dataset(), as_dossier(), is_dossier(), as_taxonomy()

data_extract(), data_dict_extract(), data_dict_apply(), dataset_zap_data_dict(), dossier_create(), valueType_adjust()

dataset_evaluate(), data_dict_evaluate(), dossier_evaluate(), dataset_summarize(), dossier_summarize()

bookdown_template(), bookdown_render(), bookdown_open(), dataset_visualize()