Blocking

104 papers with code • 5 benchmarks • 3 datasets

Entity resolution (also known as entity matching, record linkage, or duplicate detection) is the task of finding records that refer to the same real-world entity across different data sources (e.g., data files, books, websites, and databases). (Source: Wikipedia)

Blocking is a crucial step in any entity resolution pipeline because a pair-wise comparison of all records across two data sources is infeasible. Blocking applies a computationally cheap method to generate a smaller set of candidate record pairs, reducing the workload of the matcher. During matching, a more expensive pair-wise matcher then produces the final set of matching record pairs.
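The idea can be sketched in a few lines: group records by a cheap blocking key and only generate candidate pairs within each block, instead of comparing all pairs. This is a minimal illustration with hypothetical toy records and an assumed key (lowercase surname prefix), not any specific library's API.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (id, name) tuples pooled from two sources.
records = [
    (1, "Jon Smith"),
    (2, "John Smith"),
    (3, "Jane Doe"),
    (4, "J. Smith"),
    (5, "Janet Doe"),
]

def blocking_key(name):
    # Cheap key: first 3 characters of the lowercased last token.
    return name.split()[-1].lower()[:3]

# Group records into blocks by key.
blocks = defaultdict(list)
for rec_id, name in records:
    blocks[blocking_key(name)].append(rec_id)

# Candidate pairs are formed only within a block,
# instead of all n*(n-1)/2 pair-wise comparisons.
candidates = [
    pair
    for ids in blocks.values()
    for pair in combinations(sorted(ids), 2)
]
print(candidates)  # 4 candidate pairs instead of 10 full comparisons
```

Only the candidate pairs are then passed to the expensive pair-wise matcher; how well the key trades off recall (missed matches) against reduction ratio is exactly what blocking methods are evaluated on.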

Most implemented papers
Most implemented papers

No-Reference Image Quality Assessment in the Spatial Domain

utlive/BRISQUE IEEE Transactions on Image Processing 2012

We propose a natural scene statistic-based distortion-generic blind/no-reference (NR) image quality assessment (IQA) model that operates in the spatial domain.

Ethnicity sensitive author disambiguation using semi-supervised learning

glouppe/paper-author-disambiguation 31 Aug 2015

Author name disambiguation in bibliographic databases is the problem of grouping together scientific publications written by the same person, accounting for potential homonyms and/or synonyms.

DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression

params/dsmlr 16 Apr 2016

Scaling multinomial logistic regression to datasets with a very large number of data points and classes is challenging.

A Systematic Approach to Blocking Convolutional Neural Networks

stanford-mast/nn_dataflow 14 Jun 2016

Convolutional Neural Networks (CNNs) are the state of the art solution for many computer vision problems, and many researchers have explored optimized implementations.

ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events

eracah/hur-detect NeurIPS 2017

We present a dataset, ExtremeWeather, to encourage machine learning research in this area and to help facilitate further work in understanding and mitigating the effects of climate change.

Learning a Virtual Codec Based on Deep Convolutional Neural Network to Compress Image

mdcnn/mdcnn.github.io 16 Dec 2017

Because directly learning a non-linear function for a standard codec with a convolutional neural network is challenging, we propose to learn a virtual codec neural network that approximates the projection from the valid description image to the post-processed compressed image, so that the gradient can be efficiently back-propagated from the post-processing neural network to the feature description neural network during training.

Learning to Customize Network Security Rules

mibarg/IP-Grouping 28 Dec 2017

The results prove the hypothesis that firewall rules can be automatically generated based on router data, and that an automated method can be effective in blocking a high percentage of malicious traffic.

An efficient deep convolutional Laplacian pyramid architecture for CS reconstruction at low sampling ratios

WenxueCui/LapCSNet 13 Apr 2018

To address this problem, we propose a deep convolutional Laplacian Pyramid Compressed Sensing Network (LapCSNet) for CS, which consists of a sampling sub-network and a reconstruction sub-network.

AdGraph: A Graph-Based Approach to Ad and Tracker Blocking

brandoningli/cs-455-project 22 May 2018

AdGraph differs from existing approaches by building a graph representation of the HTML structure, network requests, and JavaScript behavior of a webpage, and using this unique representation to train a classifier for identifying advertising and tracking resources.

chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models

jrash/chemmodlab 30 Jun 2018

The goal of chemmodlab is to streamline the fitting and assessment pipeline for many machine learning models in R, making it easy for researchers to compare the utility of new models.