Published on October 27th, 2025. Last updated 7 days ago.

Deduplication of persons

This article details the automatic deduplication process for persons.

Identification and validation

The following chart shows how the system identifies and validates merge candidates:

Identification and validation process of merge candidates from persons records

Notes on matching strategies

Scopus ID matching

If a profile has multiple Scopus IDs, at least one ID must be shared between the two records for them to match. Not all of the Scopus IDs have to match.
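The overlap rule can be illustrated with a minimal sketch. The function name `scopus_ids_match` and the representation of IDs as plain strings are assumptions made for illustration and are not taken from the system itself.

```python
from typing import Iterable


def scopus_ids_match(ids_a: Iterable[str], ids_b: Iterable[str]) -> bool:
    """Return True if the two profiles share at least one Scopus ID.

    Hypothetical helper: the real system's data model is not shown in
    this article, so IDs are assumed to be plain strings.
    """
    # A single common ID is enough; the remaining IDs may differ.
    return not set(ids_a).isdisjoint(ids_b)
```

For example, profiles with the IDs `["111", "222"]` and `["222", "333"]` would count as a match under this sketch because they share `"222"`, while `["111"]` and `["222"]` would not.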

Name matching

The system extracts all first and last names for a given person record, including the person's main name, name variants, and any additional names.

It then normalises these names by converting them to lowercase, trimming whitespace, and removing special characters like hyphens, parentheses, and periods.
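The normalisation step might look roughly like the sketch below. The exact set of removed characters is an assumption (only hyphens, parentheses, and periods are named above), as is the choice to delete them rather than replace them with a space. The names `normalise_name` and `normalised_names` are hypothetical.

```python
import re
from typing import Iterable, Set

# Assumed character set: hyphens, parentheses, and periods only.
_SPECIAL_CHARS = re.compile(r"[-().]")


def normalise_name(name: str) -> str:
    """Lowercase a name, remove special characters, and trim whitespace."""
    return _SPECIAL_CHARS.sub("", name.lower()).strip()


def normalised_names(names: Iterable[str]) -> Set[str]:
    """Normalise every name on a record and collect the distinct results."""
    return {normalise_name(n) for n in names}
```

Under these assumptions, `"  Jean-Luc "` becomes `"jeanluc"`, and the variants `"Smith"` and `" smith "` collapse to the same normalised value.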

Merge target identification

If one of the records has active employment, then the record with active employment is the target of the merge. If both records have active employment, then the record with the most recent employment date is the target of the merge.
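The target selection rule can be sketched as follows. The `PersonRecord` class and its fields are hypothetical stand-ins for the real person record. The article does not say what happens when neither record has active employment or when the dates are equal, so this sketch returns `None` in the first case and keeps the first record in the second; both choices are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PersonRecord:
    # Hypothetical, simplified view of a person record.
    record_id: str
    has_active_employment: bool
    employment_date: Optional[date] = None


def pick_merge_target(a: PersonRecord, b: PersonRecord) -> Optional[PersonRecord]:
    """Choose the merge target according to the employment rule."""
    if a.has_active_employment and b.has_active_employment:
        # Both active: the most recent employment date wins.
        # Missing dates and ties are not covered by the article;
        # here a missing date sorts earliest and ties keep `a`.
        a_date = a.employment_date or date.min
        b_date = b.employment_date or date.min
        return a if a_date >= b_date else b
    if a.has_active_employment:
        return a
    if b.has_active_employment:
        return b
    # Neither record is active: behaviour unspecified in the article.
    return None
```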

Merge

The system merges values in the following fields according to the default merge logic:

  • IDs
  • NAME
  • SEX
  • DATE_OF_BIRTH
  • ORGANISATION_ASSOCIATIONS
  • PROFILE_INFORMATION
  • NAME_VARIANTS
  • AFFILIATION_NOTE
  • EMPLOYEE_START_DATE
  • IS_EXPERT
  • RETIRAL_DATE
  • ACADEMIC_PROFFESION_ENTRY
  • PRIVATE_ADDRESS
  • WILLINGNESS_TO_PHD
  • PHD_RESEARCH
  • NATIONALITY
  • NAMES
  • TITLES
  • LEAVE_OF_ABSENCE
  • ORCID
  • ORCID_TOKEN
  • NOTES
  • LINKS
  • PERSON_EDUCATIONS
  • PROFESSIONAL_QUALIFICATIONS
  • EXTERNAL_POSITIONS
  • IMAGES
  • KEYWORD_GROUPS
  • DOCUMENTS