# Table Recognition
Notes on table extraction: datasets, baseline models, and evaluation metrics for table structure recognition (TSR).
# Background
Table Extraction = Table Detection + Table Structure Recognition + Functional Analysis
- table detection: locate the table bounding box
- table structure recognition: recover rows, columns, and cells
- functional analysis: identify the keys and values of the table
# TSR Datasets
All input images are table-only (cropped to the table region).
Dataset | Input Modality | Annotation Sample | # Tables | Cell Topology | Cell Content | Cell Location | Row & Column Location | Canonical Structure (spanning cells) |
---|---|---|---|---|---|---|---|---|
ICDAR 2013 | Image | `<table id='0'><region id='0' page='3' col-increment='0' row-increment='0'><cell id='0' start-row='0' start-col='0'><bounding-box x1='70' y1='79' x2='131' y2='91'/><content>COUNTRY</content><instruction instr-id='65' subinstr-id='0'/></cell>` | ~0.1k | Y | Y | Y | | |
ICDAR 2019 | Image | `<cell id="TableCell_1" start-row="21" end-row="21" start-col="8" end-col="8"><Coords points="3254,3484 3252,3681 3435,3685 3438,3488"/></cell>` | ~1k | Y | Y | | | |
TableBank | Image from Word and LaTeX | `<tabular> <tr> <cell_y> <cell_y> </tr> <tr> <cell_y> <cell_n> </tr> </tabular>` | 145K | Y | | | | |
SciTSR | PDF (image/text) | `{"id":21,"tex":"959","content":["959"],"start_row":5,"end_row":5,"start_col":1,"end_col":1}` | 15K | Y | Y | | | |
PubTabNet (ICDAR 2021) | Image from PMCOA | `{"imgid":1,"html":{"cells":[{"tokens":["<b>","T","y","</b>"],"bbox":[1,4,46,13]},{"tokens":["4","0","1"],"bbox":[221,45,235,55]}],"structure":{"tokens":["<thead>","<tr>","<td>","</td>","</tr>","</thead>","<tbody>","<tr>","<td>","</td>","</tr>","</tbody>"]}}}` | 510K | Y | Y | Y | | |
FinTabNet | PDF (image/text) from annual reports of the S&P 500 companies | `{"bbox":[50,516,302,569],"filename":"64.pdf","html":{"cells":[{"bbox":[162.78,516.47,188.23,526.48],"tokens":["$","2",",","9","9","2"]}],"structure":{"tokens":["<table><tbody><tr><td>","</td></tr></tbody></table>"]}}}` | 113K | Y | Y | Y | | |
PubTables-1M | PDF (image/text) from PMCOA | Pascal VOC XML | 948K | Y | Y | Y | Y | Y |
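To make the PubTabNet-style annotation concrete, here is a small sketch that turns one annotation line into an HTML table. It assumes the jsonl layout shown in the sample above; key names follow that sample, and the reconstruction logic is illustrative rather than the official loader.

```python
# Sketch: rebuild an HTML table from one PubTabNet-style jsonl line.
# Assumes the annotation layout shown in the table above; not the official loader.
import json

def annotation_to_html(line: str) -> str:
    ann = json.loads(line)
    structure = ann["html"]["structure"]["tokens"]
    cells = ann["html"]["cells"]
    # Cell content is inserted after each cell-opening token. Spanning cells are
    # tokenized as "<td", ' colspan="2"', ">", so both "<td>" and ">" open a cell.
    open_positions = [i for i, tok in enumerate(structure) if tok in ("<td>", ">")]
    html = list(structure)
    for pos, cell in zip(reversed(open_positions), reversed(cells)):
        html.insert(pos + 1, "".join(cell["tokens"]))
    return "<table>" + "".join(html) + "</table>"
```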
# TSR Models
# Baseline model in TableBank: OpenNMT's image-to-text model
- encoder: CNN feature maps, then a row-by-row BLSTM
- decoder: LSTM with standard attention
# Baseline model in PubTabNet: EDD (Encoder-Dual-Decoder)
One CNN encoder feeds two attention-based decoders: a structure decoder that emits the HTML structure tokens and a cell decoder that emits each cell's content.
The attention is similar to standard additive attention: softmax(FC(ReLU(FC(encoder) + FC(decoder))))
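A minimal PyTorch sketch of that additive attention; layer names and sizes are illustrative, not EDD's actual implementation.

```python
# Sketch of additive attention: softmax(FC(ReLU(FC(encoder) + FC(decoder))))
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim)   # FC over encoder features
        self.w_dec = nn.Linear(dec_dim, attn_dim)   # FC over decoder state
        self.score = nn.Linear(attn_dim, 1)         # FC producing a scalar score per position

    def forward(self, enc_feats, dec_state):
        # enc_feats: (batch, num_positions, enc_dim); dec_state: (batch, dec_dim)
        scores = self.score(torch.relu(self.w_enc(enc_feats) + self.w_dec(dec_state).unsqueeze(1)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)               # attention over positions
        context = torch.bmm(weights.unsqueeze(1), enc_feats).squeeze(1)   # weighted sum of encoder features
        return context, weights
```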
# Baseline model in SciTSR: GraphTSR
# CascadeTabNet
- Train table detection first on a large dataset
- Then train table structure recognition with three classes: bordered table, borderless table, and cell
# Baseline model in FinTabNet: GTE (Global Table Extractor)
Detect cell objects first, then cluster the detections (with K-Means) to recover the table structure.
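As a toy illustration of the clustering step, the sketch below groups detected cell boxes into rows by running K-Means over their vertical centers (scikit-learn); the number of rows is assumed to be known, and GTE's actual procedure is more involved.

```python
# Toy sketch: cluster detected cell boxes into rows with K-Means on y-centers.
# Assumes the number of rows is known; GTE's real clustering is more involved.
import numpy as np
from sklearn.cluster import KMeans

def cluster_cells_into_rows(cell_boxes, num_rows):
    """cell_boxes: list of (x1, y1, x2, y2); returns rows ordered top to bottom."""
    y_centers = np.array([[(y1 + y2) / 2] for _, y1, _, y2 in cell_boxes])
    labels = KMeans(n_clusters=num_rows, n_init=10).fit_predict(y_centers)
    rows = [[] for _ in range(num_rows)]
    for box, label in zip(cell_boxes, labels):
        rows[label].append(box)
    rows = [r for r in rows if r]                                  # drop empty clusters
    rows.sort(key=lambda r: np.mean([(y1 + y2) / 2 for _, y1, _, y2 in r]))
    return rows
```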
# Baseline model in PubTables-1M: DETR (DEtection TRansformer)
- Performs better on large objects but worse on small objects
- Decodes bounding boxes in parallel from N learned object queries
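A rough sketch of the parallel query decoding, using PyTorch's built-in transformer decoder; the hidden size, query count, and class count below are illustrative, not the PubTables-1M configuration.

```python
# Sketch of DETR-style decoding: N learned object queries are decoded in parallel
# against the encoder memory, and each query predicts one class and one box.
import torch
import torch.nn as nn

class DETRStyleDecoder(nn.Module):
    def __init__(self, hidden_dim=256, num_queries=100, num_classes=6):
        super().__init__()
        self.queries = nn.Embedding(num_queries, hidden_dim)          # N learned object queries
        layer = nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)      # +1 for "no object"
        self.bbox_head = nn.Linear(hidden_dim, 4)                     # (cx, cy, w, h) in [0, 1]

    def forward(self, memory):
        # memory: (batch, num_image_tokens, hidden_dim) from the transformer encoder
        batch = memory.size(0)
        tgt = self.queries.weight.unsqueeze(0).expand(batch, -1, -1)
        hs = self.decoder(tgt, memory)                                # all N queries decoded in one pass
        return self.class_head(hs), self.bbox_head(hs).sigmoid()
```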
# ICDAR 2021 Runner-up: TableMaster
# ICDAR 2021 Winner: LGPMA
# TSR Metrics
Name | Task/Cell Property | Data Structure | Cell Partial Correctness | Form |
---|---|---|---|---|
DAR_Con | Content | Set of adjacency relations | Exact match | F-score |
DAR_Loc | Location | Set of adjacency relations | Avg. at multiple IoU thresholds | F-score |
BLEU-4 | Topology & function | Sequence of HTML tokens | Exact match | BLEU-4 |
TEDS | Content & function | Tree of HTML tags | Normalized Levenshtein similarity | TEDS |
GriTS_Top | Topology | Matrix of cells | IoU | F-score |
GriTS_Con | Content | Matrix of cells | Normalized LCS | F-score |
GriTS_Loc | Location | Matrix of cells | IoU | F-score |
# Adjacency Relation
To compare two cell structures, for each table region we align each ground-truth cell with the predicted cell whose IoU exceeds σ to identify the valid predicted cells, then generate a list of adjacency relations between each valid cell and its nearest neighbors in the horizontal and vertical directions. Blank cells are not represented in the grid, and no adjacency relations are generated between two blank cells or between a blank cell and a content cell. This 1-D list of adjacency relations is compared against the ground truth using precision and recall.
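A simplified sketch of this comparison, assuming each valid (already IoU-aligned) cell is given as (content, start_row, start_col); the alignment step itself is omitted, and the relation representation is illustrative.

```python
# Sketch: build adjacency relations (nearest non-blank neighbor to the right and
# below) and score predictions against ground truth with precision/recall/F-score.
def adjacency_relations(cells):
    """cells: iterable of (content, row, col). Blank cells are ignored."""
    grid = {(r, c): content for content, r, c in cells if content}
    relations = set()
    max_r = max((r for r, _ in grid), default=-1)
    max_c = max((c for _, c in grid), default=-1)
    for (r, c), content in grid.items():
        for cc in range(c + 1, max_c + 1):          # nearest neighbor to the right
            if (r, cc) in grid:
                relations.add((content, grid[(r, cc)], "horizontal"))
                break
        for rr in range(r + 1, max_r + 1):          # nearest neighbor below
            if (rr, c) in grid:
                relations.add((content, grid[(rr, c)], "vertical"))
                break
    return relations

def adjacency_f_score(pred_cells, true_cells):
    pred, true = adjacency_relations(pred_cells), adjacency_relations(true_cells)
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```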
# Tree Edit Distance Based Similarity (TEDS)
The cost of insertion and deletion operations is 1. When the edit substitutes a node n_o with n_s, the cost is 1 if either n_o or n_s is not a `<td>`. When both n_o and n_s are `<td>`, the substitution cost is 1 if the column span or the row span of n_o and n_s differs. Otherwise, the substitution cost is the normalized Levenshtein similarity [38] (∈ [0, 1]) between the content of n_o and n_s.
TEDS(T_a, T_b) = 1 − EditDist(T_a, T_b) / max(|T_a|, |T_b|)

where EditDist denotes tree-edit distance and |T| is the number of nodes in T.
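A minimal sketch of this cost function and the TEDS score, built on the `apted` package (`pip install apted`); the tree node type and its fields are made up for illustration, and identical non-`<td>` tags are given substitution cost 0 here, as reference implementations typically do.

```python
# Sketch of TEDS using the apted tree-edit-distance package.
from dataclasses import dataclass, field
from apted import APTED, Config

@dataclass
class TableNode:
    tag: str                       # e.g. "table", "thead", "tr", "td"
    colspan: int = 1
    rowspan: int = 1
    text: str = ""                 # cell content, only meaningful for <td>
    children: list = field(default_factory=list)

def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer string's length (in [0, 1])."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1] / max(len(a), len(b))

class TEDSConfig(Config):
    def delete(self, node):        # deletion cost
        return 1.0
    def insert(self, node):        # insertion cost
        return 1.0
    def rename(self, n_o, n_s):    # substitution cost, following the rules above
        if n_o.tag != "td" or n_s.tag != "td":
            return 0.0 if n_o.tag == n_s.tag else 1.0
        if n_o.colspan != n_s.colspan or n_o.rowspan != n_s.rowspan:
            return 1.0
        return normalized_levenshtein(n_o.text, n_s.text)
    def children(self, node):
        return node.children

def num_nodes(tree):
    return 1 + sum(num_nodes(c) for c in tree.children)

def teds(t_pred, t_true):
    """TEDS(T_a, T_b) = 1 - EditDist(T_a, T_b) / max(|T_a|, |T_b|)."""
    dist = APTED(t_pred, t_true, TEDSConfig()).compute_edit_distance()
    return 1.0 - dist / max(num_nodes(t_pred), num_nodes(t_true))
```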
Compared with the Adjacency Relation metric, TEDS performs better because it can:
- detect errors caused by empty cells and by misalignment of cells beyond immediate neighbors;
- measure fine-grained cell content recognition performance through its substitution cost.
# Grid Table Similarity (GriTS)
GriTS_f(A, B) = 2 · Σ_{i,j} f(Ã_{i,j}, B̃_{i,j}) / (|A| + |B|)

where A and B represent the tables as matrices of grid cells, Ã and B̃ represent their most similar aligned substructures, and f is a similarity function between the grid cells' properties:
- cell content: normalized longest common subsequence (LCS)
- cell topology: IoU
- cell location: IoU
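A sketch of the GriTS scoring step, assuming the aligned substructures Ã and B̃ have already been found (finding them is the hard part, which the paper handles with a dynamic-programming heuristic); the normalized-LCS form of f used for cell content below is one common normalization and should be treated as an assumption.

```python
# Sketch: GriTS_f = 2 * sum of pairwise cell similarities / (|A| + |B|),
# given already-aligned substructures of the predicted and ground-truth grids.
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a):
        for j, tb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ta == tb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def content_similarity(cell_a, cell_b):
    """Normalized LCS between two cells' token sequences (the f for GriTS_Con)."""
    if not cell_a and not cell_b:
        return 1.0
    return 2 * lcs_length(cell_a, cell_b) / (len(cell_a) + len(cell_b))

def grits(a_sub, b_sub, num_cells_a, num_cells_b, f=content_similarity):
    """Score two aligned cell matrices with similarity function f."""
    total = sum(f(ca, cb) for row_a, row_b in zip(a_sub, b_sub)
                          for ca, cb in zip(row_a, row_b))
    return 2 * total / (num_cells_a + num_cells_b)
```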
Compared with TEDS, GriTS performs better because rows and columns are given equal importance and every cell is credited equally regardless of its absolute position.