Published at October 27th, 2025
Deduplication of Activities
Activity
This job is intended for use with the Community Module
Duplicate Resolver:
The menu "duplicate titles" is available.
"Manage duplicates" is available in the editor.
If two records are marked as duplicates, see below for the merge strategy.
Automated Deduplication job:
Criteria decorators:
Candidates are found through a database query. If any one of the following criteria is fulfilled, the pair is considered a potential duplicate.
- Titles are at least 90 pct similar and the year is the same
  - Search through all existing Activities
  - If there is an Activity where both of these are fulfilled, it is considered a potential duplicate
    - Titles:
      - Titles are 80 pct similar
    - Date:
      - If the specific date is present, its year is used
      - If not, but a start date in a period is present, that year is used
- OR
- Classified id
  - Search through all existing Activities
  - If there is an Activity with a classified id (Source) of the same type and an identical value, it is considered a potential duplicate
- OR
- All persons and dates are identical, and titles are similar
  - Search through all existing Activities
  - If there is an Activity where all of these are fulfilled, it is considered a potential duplicate
    - Date:
      - The date has to be identical
        - Only year
    - Persons:
      - The same persons are assigned to both activities
    - Titles:
      - Titles are 60 pct similar
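The three candidate criteria above can be sketched in Python as follows. This is an illustration only: the real job finds candidates through a database query, the `Activity` class and its field names are hypothetical, and `difflib.SequenceMatcher` is a stand-in for whatever similarity measure the query actually uses. Because the text gives both 90 and 80 pct for the first criterion, the thresholds are parameters; the defaults simply follow the sub-bullets.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Activity:
    """Hypothetical, minimal view of an Activity for candidate lookup."""
    title: str
    # Year of the specific date, or else of the start date in the period.
    year: Optional[int] = None
    # Set of (type, value) pairs for classified ids.
    classified_ids: frozenset = frozenset()
    # Identifiers of the persons assigned to the activity.
    persons: frozenset = frozenset()


def similarity_pct(a: str, b: str) -> float:
    """Title similarity in percent (stand-in measure, not the real query)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_candidate(a: Activity, b: Activity,
                 title_year_pct: float = 80.0,
                 persons_pct: float = 60.0) -> bool:
    """True if any one of the three criteria is fulfilled."""
    same_year = a.year is not None and a.year == b.year
    sim = similarity_pct(a.title, b.title)

    by_title_and_year = same_year and sim >= title_year_pct
    by_classified_id = bool(a.classified_ids & b.classified_ids)
    by_persons = same_year and a.persons == b.persons and sim >= persons_pct
    return by_title_and_year or by_classified_id or by_persons
```

A pair that passes `is_candidate` is only a potential duplicate; whether it is merged is decided by the stricter match strategy.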
Duplicate match strategy:
This is a programmatic match. All criteria must be met before proceeding to merge.
- The template is the same.
- AND
- If both contents have a classified id of the same type with the same value
  - If titles have an equality score (Levenshtein Distance) above 80 pct
    - Contents will be merged
- If not, all of the remaining criteria have to be fulfilled:
- Visibility is the same
- AND
- Titles are at least 90 pct similar
  - Cleaned for tags
  - Made lower case
  - Note:
    - If the title is generically generated (from Event, Journal, etc.), the description is used as the title
      - This is, for example, to prevent merging two talks from the same person at the same event
- AND
- All organisations are present for both duplicate and content
  - First checked with name
  - If not fulfilled, checked with classified ID
    - All organisations need to have a classified ID
- AND
- All persons are present for both duplicate and content (matched with name). Note: persons on source and target are required to have the same roles.
  - Persons need to have the same name (also if stated as internal on one activity and external on the other) before we merge.
    - Different name variants are also checked
- AND
- Type (Type Classification) is the same
- AND
- Start dates are the same (specific date or start date in period is the same)
  - If the month is present for both, the months are also compared; otherwise the month is ignored
- AND
- Activity category is the same (if present, else ignored)
- AND
- If the activities are:
  - Of a type with one of these: Event, External Organisation, Organisation, Publisher or Journal.
    - That has to be the same on both activities
      - Example: if there is an Event assigned, both activities need to have the same Event assigned
  - Of a type with a host or a visitor
    - That has to be the same on both activities
- Note: It is possible to toggle off the requirement that organisations and persons are identical for a merge.
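The title check in the match strategy can be sketched as below. The article only states that titles are cleaned for tags, lower-cased and compared with a Levenshtein-based equality score. Everything else here is an assumption made for illustration: that "tags" means HTML-like markup, that the score is `1 - distance / longest length` expressed in percent, and the helper names themselves.

```python
import re

# Assumption: "cleaned for tags" means stripping HTML-like markup.
TAG_RE = re.compile(r"<[^>]+>")


def normalise_title(title: str) -> str:
    """Remove tags, lower-case and trim the title."""
    return TAG_RE.sub("", title).lower().strip()


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def equality_score(a: str, b: str) -> float:
    """Assumed score: 100 * (1 - distance / length of the longer title)."""
    a, b = normalise_title(a), normalise_title(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def titles_match(a: str, b: str, threshold: float = 90.0) -> bool:
    """True when the normalised titles reach the given threshold."""
    return equality_score(a, b) >= threshold
```

The same `equality_score` could be called with a threshold of 80 for the classified-id shortcut, where the article asks for a score above 80 pct.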
Merge converter:
For the merge, a target is chosen from the duplicate candidates. Unless stated otherwise below, the target's value is the one that appears on the merged version.
The target is the activity that has existed the longest.
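As a minimal sketch of the target choice, assuming each candidate carries a creation timestamp (the `Candidate` class and its `created` field are hypothetical names, not taken from the product):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Candidate:
    """Hypothetical duplicate candidate with a creation timestamp."""
    uuid: str
    created: datetime


def pick_target(candidates):
    """Return (target, sources): the oldest candidate and all the others."""
    target = min(candidates, key=lambda c: c.created)
    sources = [c for c in candidates if c is not target]
    return target, sources
```

Each of the remaining candidates is then merged into the target according to the field rules that follow.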
- Sources:
  - A predefined function in “Abstract merge converter”
    - Merge source and source ID
      - If the target source ID or the target source is empty
        - The source of the sourceContent is set as the target source
        - The sourceId of the sourceContent is set as the target sourceId
        - The external status of the sourceContent is set as the target external status
      - If the target source and sourceId are not equal to the sourceContent source and sourceId
        - If the target has no secondary source that matches the sourceContent source
          - The sourceContent source is set as a secondary source for the target
    - Merge secondary sources
      - If sourceContent has secondary sources
        - For each of the secondary sources in sourceContent
          - If each of the following is fulfilled
            - Secondary source does not match a source in target
            - Secondary source Id does not match a sourceId in target
            - If there is no secondary source in target that matches the secondary source
              - The secondary source is set as target secondary source
    - Merge source data
      - If the target source data does not contain the sourceContent source data key
        - The sourceContent DataEntry is set as the target source data
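The source rules above can be read as the following sketch. The `Content` and `SourceRef` classes and all field names are hypothetical; the article does not show the real “Abstract merge converter” code. Two points are interpretations: "not equal" is taken to mean that the (source, sourceId) pair differs, and the two "does not match" conditions for secondary sources are applied literally, so both must hold.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical (source, source id) pair."""
    source: str
    source_id: str


@dataclass
class Content:
    """Hypothetical, minimal view of the source fields of a content."""
    source: Optional[str] = None
    source_id: Optional[str] = None
    external: bool = False
    secondary_sources: list = field(default_factory=list)
    source_data: dict = field(default_factory=dict)


def merge_sources(target: Content, src: Content) -> None:
    """Merge primary source, secondary sources and source data into target."""
    # Merge source and source ID.
    if not target.source or not target.source_id:
        target.source = src.source
        target.source_id = src.source_id
        target.external = src.external
    elif (target.source, target.source_id) != (src.source, src.source_id):
        # Guarding against an empty src.source is an added assumption.
        if src.source and all(s.source != src.source
                              for s in target.secondary_sources):
            target.secondary_sources.append(
                SourceRef(src.source, src.source_id))

    # Merge secondary sources.
    for sec in src.secondary_sources:
        differs_from_primary = (sec.source != target.source
                                and sec.source_id != target.source_id)
        if differs_from_primary and sec not in target.secondary_sources:
            target.secondary_sources.append(sec)

    # Merge source data: only keys the target does not have yet are copied.
    for key, entry in src.source_data.items():
        target.source_data.setdefault(key, entry)
```

Note that the target's own primary source is never overwritten once it is set; the other content's source is only recorded as a secondary source.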
- Ids:
  - A predefined function in “Abstract merge converter”
    - For sources in sourceContent
      - If there is no matching classified source in target
        - The classified source is cloned and added to target sources
- PreviousUuids:
  - A predefined function in “Abstract merge converter”
    - The sourceContent uuid is set as previousUuid for target
    - If sourceContent has previousUuid
      - These are also added as previousUuid
- Keywords:
  - A predefined function in “Abstract merge converter”
    - If source has keywordGroups
      - For each of the keywordGroups in source
        - If target does not already have the keyword
          - The keyword is added to target
- Links:
  - A predefined function in “Abstract merge converter”
    - For each link in source
      - If target does not have the link
        - The link is added to the target links
      - If there is a link with identical sourceId
        - The link is updated
- Documents:
  - A predefined function in “Abstract merge converter”
    - For each document in source, if either of the following is fulfilled
      - Target has a document with same source source Id
      - Target has a document with same file name
      - Target has a document with same title
      - The document is cloned and added to target document list
- Clipping relations:
  - Relations are added to target, if not already present
- Publication relations:
  - Relations are added to target, if not already present
- Impact relations:
  - Relations are added to target, if not already present
- Equipment relations:
  - Relations are added to target, if not already present
- Thesis relations:
  - Relations are added to target, if not already present
- Title:
  - Only locales that do not already exist on the target are copied
- Descriptions:
  - For each description in source
    - If target does not contain a description of the given type
      - The description is cloned and added to target list of descriptions
    - If there is a description with the same type
      - Locales that do not already exist are copied
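The locale handling for titles and descriptions can be illustrated with plain dictionaries. The representation is an assumption made for this sketch: a localised text is a `{locale: text}` mapping, and descriptions are a `{description type: {locale: text}}` mapping. The function names are hypothetical.

```python
import copy


def merge_localised(target: dict, source: dict) -> None:
    """Copy only the locales that the target does not already have."""
    for locale, text in source.items():
        target.setdefault(locale, text)


def merge_descriptions(target: dict, source: dict) -> None:
    """Merge descriptions keyed by type, each holding {locale: text}."""
    for desc_type, texts in source.items():
        if desc_type not in target:
            # No description of this type yet: clone it onto the target.
            target[desc_type] = copy.deepcopy(texts)
        else:
            # Same type exists: only fill in the missing locales.
            merge_localised(target[desc_type], texts)
```

In both cases an existing target text is never overwritten, which matches the general rule that the target value appears on the merged version.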
- Organisations:
  - For each organisation in source
    - If not:
      - Target already has the organisation
      - OR
      - Target organisations and the source organisation have identical sources
        - A new organisation association is constructed
          - The source organisation is added to the new association
          - The sources for the source organisation and the new association are merged
          - The new association is added to the target list of organisation associations
- External organisations:
  - For each of the source external organisations
    - If not:
      - Target already has the external organisation
      - OR
      - Target organisations and the source organisation have identical sources
        - The external organisation is added to the target list of external organisations
          - No clone is generated; the existing external organisation is just added to the target list of external organisations
- Persons:
  - For each person association in source
    - If target has an identical person with the same role (checked with name and name variants)
      - Secondary sources from the person association in source are assigned to the target person association
      - If the source person is internal and the target person is external
        - All internal organisation associations are added to the internal person
        - The external person from target is replaced by the internal person from source
      - If the target person is internal and the source person is external
        - The internal organisation associations are added to the internal person
    - If target does not have an identical person, or has the same person with another role
    - OR
    - If target does not have a person with the same source id
      - A new Person Association is created
      - From the person association in source, the following is added to the new association:
        - Person
        - Source Id
        - Source
        - External Organisation
        - External or not
        - Name
        - Secondary sources
        - Source Data
      - The person association is added to target
- Indicators:
  - For each of the source indicators:
    - If not:
      - Target already has that indicator
      - OR
      - Target indicators and the source indicator have identical sources
        - The source indicator is added to the target list of indicators
          - No clone is generated; the existing source indicator is just added to the target list of indicators.
- Owner:
  - If target is null, it is replaced by source
- End date:
  - If target is null, it is replaced by source
- Start date:
  - If target is null, it is replaced by source
- Degree of recognition:
  - If target is null, it is replaced by source
- Visibility:
  - Lowest visibility wins
- Activity Activity Relations:
  - Activity relation is moved from source to target
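The single-value rules at the end of the list (owner, dates, degree of recognition and visibility) can be sketched as follows. The dictionary keys and the `Visibility` enum are hypothetical, and the ordering of the visibility levels is an assumption chosen so that "lowest" means most restricted.

```python
from enum import IntEnum


class Visibility(IntEnum):
    """Assumed ordering: a lower value means a more restricted visibility."""
    BACKEND = 0
    CAMPUS = 1
    PUBLIC = 2


# Fields where the source value is used only when the target has none.
NULL_FILL_FIELDS = ("owner", "start_date", "end_date", "degree_of_recognition")


def merge_scalar_fields(target: dict, source: dict) -> None:
    """Fill empty target fields from source and keep the lowest visibility."""
    for name in NULL_FILL_FIELDS:
        if target.get(name) is None:
            target[name] = source.get(name)
    target["visibility"] = min(target["visibility"], source["visibility"])
```

With this ordering, merging a public target with a campus-only source leaves the merged activity visible on campus only.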