Skip to content

565591/Duplicate-Detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Duplicate-Detection

In this project I use LSH and agglomerative clustering to find duplicate products across web shops.

Using the function "plotPerformanceAcrossMultipleThresholds()" will run 5 bootstraps for each of the following LSH thresholds: 0.3, 0.4, 0.5,0.6,0.65,0.7,0.75 and record the average performance across the 5 bootstraps and then make plots of all the performance measures with respect to the amount of comparisons made as a percentage of total amount of possible comparisons.

Using the function "perform5Bootstraps_andReturnAveragePerformanceMeasures(t)" will run 5 bootstraps for the threshold "t" that needs to be given as input and returns the average performance measures across the 5 bootstraps in the following order: averageFractionOfComparisonsMade,averagePrecision,averageRecall,averagePairQuality,averagePairCompleteness,averagef1_star,averagef1

About

Using LSH and agglomerative clustering to find duplicate products across web shops

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages