ETL skipDuplicates did not work in debug mode #7682

@DamianZhou

OrientDB Version: 2.2.26

Java Version: 1.8.0

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

OS: Debian

Debian 4.9.30-2+deb9u2~bpo8+1 (2017-06-27) x86_64 GNU/Linux

Expected behavior

When loading vertices with the ETL vertex transformer and skipDuplicates=true, records whose key already exists in a UNIQUE index should be skipped.

Actual behavior

When log=debug is set, the ETL process fails with an ORecordDuplicatedException instead of skipping the duplicate record:

[209:vertex] DEBUG Transformer input: {server:328,id:91926}
Error in Pipeline execution: com.orientechnologies.orient.core.storage.ORecordDuplicatedException: Cannot index record Player{server:328,id:91926}: found duplicated key '91926' in index 'Player.id' previously assigned to the record #95:4
	DB name="trans"
	DB name="trans" INDEX=Player.id RID=#95:4
[orientdb] INFO committing
ETL process has problem: java.util.concurrent.ExecutionException: com.orientechnologies.orient.core.storage.ORecordDuplicatedException: Cannot index record Player{server:328,id:91926}: found duplicated key '91926' in index 'Player.id' previously assigned to the record #95:4
	DB name="trans"
	DB name="trans" INDEX=Player.id RID=#95:4
END ETL PROCESSOR

If we remove the following block from the ETL config, the load works fine:

  "config": {
    "log": "debug"
  },

Steps to reproduce

The source data is a CSV file such as:

server,id
24,34715
24,37015
328,91926
32,92474
32,93789
114,70276
328,91926
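
For the rows above, the expected outcome is six Player vertices and one skipped row (the second `328,91926`). The following is only a plain-Python sketch of that expected behavior, not OrientDB code; the function name `load_skipping_duplicates` and the in-memory `seen` set standing in for the UNIQUE index on `id` are illustrative assumptions.

```python
import csv
import io

# Sample rows copied from the reproduction CSV in this report.
SAMPLE_CSV = """server,id
24,34715
24,37015
328,91926
32,92474
32,93789
114,70276
328,91926
"""


def load_skipping_duplicates(text, key="id"):
    """Return (loaded, skipped) rows, treating `key` as a UNIQUE index.

    Hypothetical stand-in for what skipDuplicates=true is expected to do:
    a row whose key was already loaded is skipped instead of raising.
    """
    seen = set()  # plays the role of the UNIQUE index Player.id
    loaded, skipped = [], []
    for row in csv.DictReader(io.StringIO(text)):
        record = {name: int(value) for name, value in row.items()}
        if record[key] in seen:
            skipped.append(record)
            continue
        seen.add(record[key])
        loaded.append(record)
    return loaded, skipped


loaded, skipped = load_skipping_duplicates(SAMPLE_CSV)
print(len(loaded), "loaded;", len(skipped), "skipped")
```

Run as-is, this prints `6 loaded; 1 skipped`, which is the result expected from the ETL run regardless of the log level.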

The ETL config is:

{
  "source": {
    "file": {
      "path": "..path../test.csv"
    }
  },
  "extractor": {
    "csv": {
      "columnsOnFirstLine": true,
      "ignoreEmptyLines": true,
      "columns": [
        "server:integer",
        "id:integer"
      ]
    }
  },
  "transformers": [
    {
      "vertex": {
        "class": "Player",
        "skipDuplicates": true
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost:9195/trans",
      "dbUser": "root",
      "dbPassword": "pwd",
      "serverUser": "root",
      "serverPassword": "pwd",
      "dbAutoDropIfExists": true,
      "dbAutoCreate": true,
      "batchCommit": 1000,
      "dbType": "graph",
      "classes": [
        {
          "name": "Player",
          "extends": "V"
        }
      ],
      "indexes": [
        {
          "class": "Player",
          "fields": [
            "id:integer"
          ],
          "type": "UNIQUE"
        }
      ]
    }
  }
}
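
The config above is shown without the `config` block, so the failing and working runs differ only in whether `"config": {"log": "debug"}` is present. To reproduce both cases from one file, a small helper along these lines could generate the two variants. This is a sketch only: `with_log_level` is a hypothetical name, and the config is treated as plain JSON loaded into a dict.

```python
import copy
import json


def with_log_level(config, level=None):
    """Return a copy of an ETL config dict with config.log set to `level`.

    When `level` is None the whole "config" block is dropped, matching the
    working case described in this report. The input dict is not modified.
    """
    variant = copy.deepcopy(config)
    if level is None:
        variant.pop("config", None)
    else:
        variant.setdefault("config", {})["log"] = level
    return variant


# Trimmed-down stand-in for the full config; only the relevant part is kept.
base = {
    "transformers": [
        {"vertex": {"class": "Player", "skipDuplicates": True}}
    ]
}

failing = with_log_level(base, "debug")  # variant reported to fail
working = with_log_level(failing)        # variant reported to work

print(json.dumps(failing, indent=2))
```

Each variant can then be written to its own file with `json.dump` and passed to the ETL tool in turn.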
