Published on October 27th, 2025 • Last updated 8 days ago

Deduplication of Activities
Activity
This job is intended for use with the Community Module.
Duplicate Resolver:
The menu "duplicate titles" is available.
"Manage duplicates" is available in the editor.
If two records are marked as duplicates, see below for the merge strategy.
Automated Deduplication job:
Criteria decorators:
Candidates are found through a DB query. If any one of the following criteria is fulfilled, the pair is considered a potential duplicate.
- Titles are at least 90 pct similar and the year is the same
  - Search through all existing Activities
  - If there is an Activity where both of these are fulfilled, it is considered a potential duplicate
    - Titles:
      - Titles are 80 pct similar
    - Date:
      - If the specific date is present, its year is used
      - If not, but the start date of the period is present, its year is used
- OR
- Classified id
  - Search through all existing Activities
  - If there is an Activity with a classified id (Source) of the same type and with an identical value, it is considered a potential duplicate
- OR
- All persons and dates are identical, and titles are similar
  - Search through all existing Activities
  - If there is an Activity where all of these are fulfilled, it is considered a potential duplicate
    - Date:
      - The date has to be identical
        - Only the year is compared
    - Persons:
      - The same persons are assigned to both activities
    - Titles:
      - Titles are 60 pct similar
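The three candidate criteria can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual job: the real search runs as a DB query, and the `Activity` class, its field names, and the `similarity` helper are hypothetical. The text gives both 90 and 80 pct for the first criterion, so the thresholds are left as parameters.

```python
from dataclasses import dataclass
from typing import Optional


def similarity(a: str, b: str) -> float:
    """Levenshtein-based similarity in [0, 1]; an assumed stand-in for the DB-side comparison."""
    if a == b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            # deletion, insertion, substitution (free when the characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


@dataclass(frozen=True)
class Activity:
    # Hypothetical, simplified model of an Activity.
    title: str
    year: Optional[int] = None               # year of the specific date, else of the period start
    classified_ids: frozenset = frozenset()  # set of (type, value) pairs
    persons: frozenset = frozenset()         # names of assigned persons


def is_candidate(a: Activity, b: Activity,
                 title_and_year: float = 0.80,    # arbitrary pick; the text mentions 90 and 80 pct
                 persons_and_date: float = 0.60) -> bool:
    same_year = a.year is not None and a.year == b.year
    score = similarity(a.title, b.title)
    # Criterion 1: similar titles and same year.
    if same_year and score >= title_and_year:
        return True
    # Criterion 2: a classified id of the same type with an identical value.
    if a.classified_ids & b.classified_ids:
        return True
    # Criterion 3: same persons, same year, and loosely similar titles.
    same_persons = bool(a.persons) and a.persons == b.persons
    return same_year and same_persons and score >= persons_and_date
```

A pair only needs to pass one of the three checks to become a candidate; whether it is actually merged is decided by the stricter match strategy.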
 
 
Duplicate match strategy:
This is a programmatic match. All criteria must be met before proceeding to merge.
- The template is the same.
- AND
- If both contents have a classified id of the same type with the same value
  - If the titles have an equality score (Levenshtein distance) above 80 pct
    - Contents will be merged
- If not, all of the remaining criteria have to be fulfilled:
  - Visibility is the same
  - AND
  - Titles are at least 90 pct similar
    - Cleaned for tags
    - Made lower case
    - Note:
      - If the title is generically generated (from Event, Journal, etc.), the description is used as the title
        - This is, for example, to prevent merging two talks by the same person at the same event
  - AND
  - All organisations are present for both duplicate and content
    - First checked by name
    - If not fulfilled, checked by classified ID
      - All organisations need to have a classified ID
  - AND
  - All persons are present for both duplicate and content (matched by name). OBS: Persons on source and target are required to have the same roles.
    - Persons need to have the same name (also if stated as internal on one activity and external on the other) before we merge.
      - Different name variants are also checked
  - AND
  - Type (Type Classification) is the same
  - AND
  - Start dates are the same (specific date, or start date of the period)
    - If the month is present for both, the months are also compared, else ignored
  - AND
  - Activity category is the same (if present, else ignored)
  - AND
  - If the activities are:
    - Of a type with one of these: Event, External Organisation, Organisation, Publisher or Journal
      - That has to be the same on both activities
        - Example: if there is an Event assigned, both activities need to have the same Event assigned
    - Of a type with a host or a visitor
      - That has to be the same on both activities
- OBS: It is possible to toggle off the requirement that organisations and persons are identical for a merge.
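A simplified sketch of how this decision could look in code. It covers only the template, classified id, visibility, title, type, start date and category checks; the organisation, person and related-content checks are left out for brevity. All names here (`Activity`, `comparable_title`, `should_merge`) are hypothetical, "tags" is assumed to mean HTML-like markup, the cleaned title is assumed to be used in both branches, and a missing month or category on either side is assumed to mean "ignored".

```python
import re
from dataclasses import dataclass
from typing import Optional

TAG = re.compile(r"<[^>]+>")  # assumption: "tags" means HTML-like markup


def similarity(a: str, b: str) -> float:
    """Levenshtein distance turned into a score in [0, 1]."""
    if a == b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


@dataclass(frozen=True)
class Activity:
    # Hypothetical, simplified model; field names are illustrative only.
    template: str
    title: str
    description: str = ""
    title_is_generic: bool = False
    visibility: str = "public"
    type_classification: str = ""
    start_year: Optional[int] = None
    start_month: Optional[int] = None
    category: Optional[str] = None
    classified_ids: frozenset = frozenset()  # set of (type, value) pairs


def comparable_title(activity: Activity) -> str:
    # A generically generated title is replaced by the description,
    # then tags are stripped and the text is lower-cased.
    text = activity.description if activity.title_is_generic else activity.title
    return TAG.sub("", text).strip().lower()


def same_if_both_present(x, y) -> bool:
    return x is None or y is None or x == y


def should_merge(a: Activity, b: Activity) -> bool:
    if a.template != b.template:
        return False
    score = similarity(comparable_title(a), comparable_title(b))
    # Shortcut: shared classified id and titles above 80 pct.
    if a.classified_ids & b.classified_ids and score > 0.80:
        return True
    # Otherwise every remaining criterion must hold.
    return (
        a.visibility == b.visibility
        and score >= 0.90
        and a.type_classification == b.type_classification
        and a.start_year == b.start_year
        and same_if_both_present(a.start_month, b.start_month)
        and same_if_both_present(a.category, b.category)
    )
```

Using the description for generic titles is what keeps two different talks at the same event apart in this sketch: their generated titles would be identical, but their descriptions are not.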
 
Merge converter:
For the merge, a target is chosen from the duplicate candidates. Unless stated otherwise below, the target's value is kept on the merged version.
The target is the activity that has existed the longest.
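As a minimal sketch of the target selection, assuming each candidate carries a creation timestamp (the `Candidate` class and its `created` field are hypothetical names):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Candidate:
    uuid: str
    created: datetime  # assumed field: when the activity was created


def pick_target(candidates):
    """Return (target, sources): the oldest candidate is the target, the rest are merged into it."""
    target = min(candidates, key=lambda c: c.created)
    sources = [c for c in candidates if c is not target]
    return target, sources
```

The field rules below then describe what is carried over from each source into the target.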
- Sources:
  - A predefined function in “Abstract merge converter”
    - Merge source and source ID
      - If target source id or target source is empty
        - The source of the sourceContent is set as the target source
        - The sourceId of the sourceContent is set as the target sourceId
        - The external status of the sourceContent is set as the target external status
      - If the target source and sourceContent source, and the target sourceId and sourceContent sourceId, are not equal
        - If the target has no secondary source that matches the sourceContent source
          - The sourceContent source is set as a secondary source for target
    - Merge secondary sources
      - If sourceContent has secondary sources
        - For each of the secondary sources in sourceContent
          - If each of the following is fulfilled
            - Secondary source does not match a source in target
            - Secondary source Id does not match a sourceId in target
            - If there is no secondary source in target that matches the secondary source
              - The secondary source is set as target secondary source
    - Merge source data
      - If the target source data does not contain the sourceContent sourcedata key
        - The sourceContent DataEntry is set as the target source data
- Ids:
  - A predefined function in “Abstract merge converter”
    - For sources in sourceContent
      - If there is no matching classified source in target
        - The classified source is cloned and added to target sources
- PreviousUuids:
  - A predefined function in “Abstract merge converter”
    - The sourceContent uuid is set as previousUuid for target
    - If sourceContent has previousUuids
      - These are also added as previousUuid
- Keywords:
  - A predefined function in “Abstract merge converter”
    - If source has keywordGroups
      - For each of the keywordGroups in source
        - If target does not already have the keyword
          - The keyword is added to target
- Links:
  - A predefined function in “Abstract merge converter”
    - For each link in source
      - If target does not have the link
        - The link is added to the target links
      - If there is a link with identical sourceId
        - The link is updated
- Documents:
  - A predefined function in “Abstract merge converter”
    - For each document in source, if either of the following is fulfilled
      - Target has a document with the same source and source Id
      - Target has a document with the same file name
      - Target has a document with the same title
        - The document is cloned and added to target document list
- Clipping relations:
  - Relations are added to target, if not already present
- Publication relations:
  - Relations are added to target, if not already present
- Impact relations:
  - Relations are added to target, if not already present
- Equipment relations:
  - Relations are added to target, if not already present
- Thesis relations:
  - Relations are added to target, if not already present
- Title:
  - Only locales that do not already exist are copied
- Descriptions:
  - For each description in source
    - If target does not contain a description of the given type
      - The description is cloned and added to target list of descriptions
    - If there is a description with the same type
      - Locales that do not already exist are copied
- Organisations:
  - For each organisation in source
    - If not:
      - Target already has the organisation
      - OR
      - Target organisations and the source organisation have identical sources
        - A new organisation association is constructed
          - The source organisation is added to the new association
          - The sources for the source organisation and the new association are merged
          - The new association is added to target list of organisation associations
- External organisations:
  - For each of the source external organisations
    - If not:
      - Target already has the external organisation
      - OR
      - Target organisations and the source organisation have identical sources
        - The external organisation is added to target list of external organisations
          - The existing external organisation is added as-is to the target list of external organisations
- Persons:
  - For each person association in source
    - If target has an identical person with the same role (checked by name and name variants)
      - Secondary sources from the person association in source are assigned to the target person association
      - If the source person is internal, and the target person is external
        - All internal organisation associations are added to the internal person
        - The external person from target is replaced by the internal person from source
      - If the target person is internal, and the source person is external
        - The internal organisation associations are added to the internal person
    - If target does not have an identical person, or has the same person with another role
    - OR
    - If target does not have a person with the same source id
      - A new Person Association is created
      - From the Person Association in source the following is added to the new association:
        - Person
        - Source Id
        - Source
        - External Organisation
        - External or not
        - Name
        - Secondary sources
        - Source Data
      - The person association is added to target
- Indicators:
  - For each of the source indicators:
    - If not:
      - Target already has that indicator
      - OR
      - Target indicators and the source indicator have identical sources
        - The source indicator is added to target list of indicators
          - No clone is generated; the existing source indicator is just added to the target list of indicators.
- Owner:
  - If the target value is null, it is replaced by the source value
- End date:
  - If the target value is null, it is replaced by the source value
- Start date:
  - If the target value is null, it is replaced by the source value
- Degree of recognition:
  - If the target value is null, it is replaced by the source value
- Visibility:
  - Lowest visibility wins
- Activity Activity Relations:
  - The activity relation is moved from source to target
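A few of the simpler field rules above can be sketched as follows: fill-if-null for owner, dates and degree of recognition, "lowest visibility wins", add-if-absent for keywords, and the previousUuid bookkeeping. This is a sketch under assumptions, not the actual "Abstract merge converter": the `Activity` class, its field names, and the visibility level names and their order are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical visibility levels, ordered from least to most visible.
VISIBILITY_ORDER = ["restricted", "internal", "public"]


@dataclass
class Activity:
    # Hypothetical, simplified model of an Activity.
    uuid: str
    visibility: str = "public"
    owner: Optional[str] = None
    start_date: Optional[str] = None
    end_date: Optional[str] = None
    degree_of_recognition: Optional[str] = None
    keywords: list = field(default_factory=list)
    previous_uuids: list = field(default_factory=list)


def merge_into(target: Activity, source: Activity) -> Activity:
    # Fill-if-null: the target value is kept unless it is missing.
    for name in ("owner", "start_date", "end_date", "degree_of_recognition"):
        if getattr(target, name) is None:
            setattr(target, name, getattr(source, name))

    # Lowest visibility wins.
    target.visibility = min(target.visibility, source.visibility,
                            key=VISIBILITY_ORDER.index)

    # Keywords are added only if the target does not already have them.
    for keyword in source.keywords:
        if keyword not in target.keywords:
            target.keywords.append(keyword)

    # The source uuid and its own previous uuids become previous uuids of the target.
    for old in [source.uuid, *source.previous_uuids]:
        if old not in target.previous_uuids:
            target.previous_uuids.append(old)

    return target
```

For example, merging a source with visibility "internal" into a "public" target leaves the merged activity at "internal", while an owner already set on the target is left untouched.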