Blocking

104 papers with code • 5 benchmarks • 3 datasets

Entity resolution (also known as entity matching, record linkage, or duplicate detection) is the task of finding records that refer to the same real-world entity across different data sources (e.g., data files, books, websites, and databases). (Source: Wikipedia)

Blocking is a crucial step in any entity resolution pipeline because a pair-wise comparison of all records across two data sources is infeasible. Blocking applies a computationally cheap method to generate a smaller set of candidate record pairs, reducing the workload of the matcher. During matching, a more expensive pair-wise matcher then produces the final set of matching record pairs.
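The idea can be sketched in a few lines: group records by a cheap blocking key and only generate candidate pairs within each block, instead of comparing all pairs. This is a minimal illustration with hypothetical toy records and an assumed key (lowercase surname prefix), not any specific library's API.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (id, name) tuples pooled from two sources.
records = [
    (1, "Jon Smith"),
    (2, "John Smith"),
    (3, "Jane Doe"),
    (4, "J. Smith"),
    (5, "Janet Doe"),
]

def blocking_key(name):
    # Cheap key: first 3 characters of the lowercased last token.
    return name.split()[-1].lower()[:3]

# Group records into blocks by key.
blocks = defaultdict(list)
for rec_id, name in records:
    blocks[blocking_key(name)].append(rec_id)

# Candidate pairs are formed only within a block,
# instead of all n*(n-1)/2 pair-wise comparisons.
candidates = [
    pair
    for ids in blocks.values()
    for pair in combinations(sorted(ids), 2)
]
print(candidates)  # 4 candidate pairs instead of 10 full comparisons
```

Only the candidate pairs are then passed to the expensive pair-wise matcher; how well the key trades off recall (missed matches) against reduction ratio is exactly what blocking methods are evaluated on.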

Most implemented papers
Most implemented papers

No-Reference Image Quality Assessment in the Spatial Domain

utlive/BRISQUE IEEE Transactions on Image Processing 2012

We propose a natural scene statistic-based distortion-generic blind/no-reference (NR) image quality assessment (IQA) model that operates in the spatial domain.

Ethnicity sensitive author disambiguation using semi-supervised learning

glouppe/paper-author-disambiguation 31 Aug 2015

Author name disambiguation in bibliographic databases is the problem of grouping together scientific publications written by the same person, accounting for potential homonyms and/or synonyms.

DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression

params/dsmlr 16 Apr 2016

Scaling multinomial logistic regression to datasets with a very large number of data points and classes is challenging.

A Systematic Approach to Blocking Convolutional Neural Networks

stanford-mast/nn_dataflow 14 Jun 2016

Convolutional Neural Networks (CNNs) are the state of the art solution for many computer vision problems, and many researchers have explored optimized implementations.

ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events

eracah/hur-detect NeurIPS 2017

We present a dataset, ExtremeWeather, to encourage machine learning research in this area and to help facilitate further work in understanding and mitigating the effects of climate change.

Learning a Virtual Codec Based on Deep Convolutional Neural Network to Compress Image

mdcnn/mdcnn.github.io 16 Dec 2017

Because directly learning a non-linear function for a standard codec with a convolutional neural network is challenging, we propose to learn a virtual codec neural network that approximates the projection from the valid description image to the post-processed compressed image, so that the gradient can be efficiently back-propagated from the post-processing neural network to the feature description neural network during training.

Learning to Customize Network Security Rules

mibarg/IP-Grouping 28 Dec 2017

The results prove the hypothesis that firewall rules can be automatically generated based on router data, and that an automated method can be effective in blocking a high percentage of malicious traffic.

An efficient deep convolutional Laplacian pyramid architecture for CS reconstruction at low sampling ratios

WenxueCui/LapCSNet 13 Apr 2018

To address this problem, we propose a deep convolutional Laplacian Pyramid Compressed Sensing Network (LapCSNet) for CS, which consists of a sampling sub-network and a reconstruction sub-network.

AdGraph: A Graph-Based Approach to Ad and Tracker Blocking

brandoningli/cs-455-project 22 May 2018

AdGraph differs from existing approaches by building a graph representation of the HTML structure, network requests, and JavaScript behavior of a webpage, and using this unique representation to train a classifier for identifying advertising and tracking resources.

chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models

jrash/chemmodlab 30 Jun 2018

The goal of chemmodlab is to streamline the fitting and assessment pipeline for many machine learning models in R, making it easy for researchers to compare the utility of new models.