# Table Recognition

Some detailed list

# Background

Table Extraction = Table Detection + Table Structure Recognition + Functional analysis

  • table detection: locate the bounding box (bbox) of each table
  • table structure recognition: identify rows, columns, and cells
  • functional analysis: identify the keys and values of the table

# TSR Dataset

All input images are table-only, i.e. each image is already cropped to a single table.

| Dataset | Input Modality | Annotation Sample | # Tables | Cell Topology | Cell Content | Cell Location | Row & Column Location | Canonical Structure (spanning cells) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ICDAR 2013 | Image | `<table id='0'><region id='0' page='3' col-increment='0' row-increment='0'><cell id='0' start-row='0' start-col='0'><bounding-box x1='70' y1='79' x2='131' y2='91'/><content>COUNTRY</content><instruction instr-id='65' subinstr-id='0'/></cell>` | ~0.1k | Y | Y | Y | | |
| ICDAR 2019 | Image | `<cell id="TableCell_1" start-row="21" end-row="21" start-col="8" end-col="8"><Coords points="3254,3484 3252,3681 3435,3685 3438,3488"/></cell>` | ~1k | Y | | Y | | |
| TableBank | Image (from Word and LaTeX documents) | `<tabular> <tr> <cell_y> <cell_y> </tr> <tr> <cell_y> <cell_n> </tr> </tabular>` | 145K | Y | | | | |
| SciTSR | PDF (image/text) | `{"id":21,"tex":"959","content":["959"],"start_row":5,"end_row":5,"start_col":1,"end_col":1}` | 15K | Y | Y | | | |
| PubTabNet (ICDAR 2021) | Image from PMCOA | `{"imgid":1,"html":{"cells":[{"tokens":["<b>","T","y","</b>"],"bbox":[1,4,46,13]},{"tokens":["4","0","1"],"bbox":[221,45,235,55]}],"structure":{"tokens":["<thead>","<tr>","<td>","</td>","</tr>","</thead>","<tbody>","<tr>","<td>","</td>","</tr>","</tbody>"]}}}` | 510K | Y | Y | Y | | |
| FinTabNet | PDF (image/text) from annual reports of the S&P 500 companies | `{"bbox":[50,516,302,569],"filename":"64.pdf","html":{"cells":[{"bbox":[162.78,516.47,188.23,526.48],"tokens":["$","2",",","9","9","2"]}],"structure":{"tokens":["<table><tbody><tr><td>","</td></tr></tbody></table>"]}}}` | 113K | Y | Y | Y | | |
| PubTables-1M | PDF (image/text) from PMCOA | Pascal VOC XML | 948K | Y | Y | Y | Y | Y |
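To make the PubTabNet-style annotation more concrete, here is a minimal sketch of how the structure tokens and cell tokens might be merged back into an HTML string. The function name `annotation_to_html` is hypothetical, and the rule that a bare `>` token closes the opening tag of a cell with span attributes is an assumption about the token format, not something stated above.

```python
def annotation_to_html(annotation):
    """Merge structure tokens and per-cell content tokens into one HTML string."""
    structure = annotation["html"]["structure"]["tokens"]
    cells = iter(annotation["html"]["cells"])
    parts = []
    for token in structure:
        parts.append(token)
        # Assumption: "<td>" opens a plain cell, while ">" ends the opening
        # tag of a cell that carries rowspan/colspan attributes.
        if token in ("<td>", ">"):
            cell = next(cells, None)
            if cell is not None:
                parts.append("".join(cell["tokens"]))
    return "<table>" + "".join(parts) + "</table>"


# The PubTabNet sample from the table above, written as a Python dict.
sample = {
    "imgid": 1,
    "html": {
        "cells": [
            {"tokens": ["<b>", "T", "y", "</b>"], "bbox": [1, 4, 46, 13]},
            {"tokens": ["4", "0", "1"], "bbox": [221, 45, 235, 55]},
        ],
        "structure": {
            "tokens": ["<thead>", "<tr>", "<td>", "</td>", "</tr>", "</thead>",
                       "<tbody>", "<tr>", "<td>", "</td>", "</tr>", "</tbody>"]
        },
    },
}

html = annotation_to_html(sample)
print(html)
```

Cells are consumed in reading order, so the i-th cell entry fills the i-th opened `<td>`.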

# TSR Models

# Baseline model in TableBank: OpenNMT's image-to-text model

[Image]

  • encoder: a CNN, followed by a BiLSTM applied row by row
  • decoder: an LSTM with standard attention

[Image]

[Image]

Code

# Baseline model in PubTabNet: EDD (Encoder-Dual-Decoder)

[Image]

The attention is similar to standard attention: softmax(FC(ReLU(FC(encoder) + FC(decoder))))
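The attention expression above can be sketched in plain Python as follows. This is only an illustration of the formula, not the EDD code: the helper names, the weight matrices `w_enc`, `w_dec`, `w_out`, and the omission of bias terms are all assumptions made for brevity.

```python
import math


def linear(weights, vector):
    """A bias-free fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def relu(vector):
    return [max(0.0, x) for x in vector]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(encoder_states, decoder_state, w_enc, w_dec, w_out):
    """softmax(FC(ReLU(FC(encoder) + FC(decoder)))) over encoder positions."""
    projected_decoder = linear(w_dec, decoder_state)
    scores = []
    for state in encoder_states:
        summed = [a + b for a, b in zip(linear(w_enc, state), projected_decoder)]
        # The outer FC maps the hidden vector to a single scalar score.
        scores.append(linear(w_out, relu(summed))[0])
    return softmax(scores)


identity = [[1.0, 0.0], [0.0, 1.0]]
weights = attention_weights(
    encoder_states=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    decoder_state=[0.5, -0.5],
    w_enc=identity,
    w_dec=identity,
    w_out=[[1.0, 1.0]],
)
print(weights)
```

The only difference from the usual additive attention in this sketch is the ReLU non-linearity between the two fully connected stages.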

Code

# Baseline model in SciTSR: GraphTSR

[Image] [Image]

Code

# CascadeTabNet

[Image]

[Image]

  • First, train table detection on a large dataset
  • Then train table recognition with three classes: bordered, borderless, and cell

Code and MMDetection

# Baseline model in FinTabNet: GTE (Global Table Extractor)

Cells are first detected as objects and later clustered using K-means.
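As a rough illustration of the clustering idea, the sketch below groups detected cell boxes into rows by running a one-dimensional K-means over their vertical centres. It is not GTE's actual procedure: the function names, the `(x1, y1, x2, y2)` box layout, the deterministic initialisation, and the fact that the caller supplies the number of rows are all assumptions.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain 1-D K-means with a deterministic, evenly spread initialisation."""
    ordered = sorted(values)
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign():
        # Label each value with the index of its nearest centre.
        return [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]

    for _ in range(iterations):
        labels = assign()
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = sum(members) / len(members)
    return assign(), centres


def group_cells_into_rows(boxes, n_rows):
    """Return a row index (top to bottom) for every (x1, y1, x2, y2) box."""
    y_centres = [(y1 + y2) / 2 for _, y1, _, y2 in boxes]
    labels, centres = kmeans_1d(y_centres, n_rows)
    # Re-number the clusters so that row 0 is the topmost one.
    order = sorted(range(n_rows), key=lambda c: centres[c])
    rank = {cluster: row for row, cluster in enumerate(order)}
    return [rank[label] for label in labels]


boxes = [(0, 0, 10, 10), (20, 1, 30, 11), (0, 20, 10, 30), (20, 21, 30, 31), (0, 40, 10, 50)]
rows = group_cells_into_rows(boxes, n_rows=3)
print(rows)
```

The same routine applied to horizontal centres would give column indices.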

[Image]

# Baseline model in PubTables-1M: DETR (Detection Transformer)

[Image] [Image]

  • Performs better on large objects but worse on small objects
  • Decodes bounding boxes in parallel using N different object queries

# ICDAR 2021 Runner-up: TableMaster

[Image] [Image]

# ICDAR 2021 Top: LGPMA

[Image]

# TSR Metrics

| Name | Task/Cell Property | Data Structure | Cell Partial Correctness | Form |
| --- | --- | --- | --- | --- |
| DAR_Con | Content | Set of adjacency relations | Exact match | F-score |
| DAR_Loc | Location | Set of adjacency relations | Avg. at multiple IoU thresholds | F-score |
| BLEU-4 | Topology & function | Sequence of HTML tokens | Exact match | BLEU-4 |
| TEDS | Content & function | Tree of HTML tags | Normalized Levenshtein similarity | TEDS |
| GriTS_Top | Topology | Matrix of cells | IoU | F-score |
| GriTS_Con | Content | Matrix of cells | Normalized LCS | F-score |
| GriTS_Loc | Location | Matrix of cells | IoU | F-score |

# Adjacency Relation

To compare two cell structures, the adjacency relation method works as follows. For each table region, each ground-truth cell is aligned to the predicted cell with IoU > σ, which identifies the valid predicted cells. A list of adjacency relations is then generated between each valid cell and its nearest neighbor in the horizontal and vertical directions. Blank cells are not represented in the grid, so no adjacency relations are generated between two blank cells or between a blank cell and a content cell. The resulting 1-D list of adjacency relations is compared with the ground truth using precision and recall.
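The relation-building and scoring steps can be sketched as below. This is a simplified illustration, not the official evaluation tool: it skips the IoU alignment step and identifies cells by their text content, and it assumes that blank cells are skipped over when looking for the nearest neighbor. The cell dictionary layout and all function names are hypothetical.

```python
from collections import Counter


def adjacency_relations(cells):
    """Build a multiset of (content, neighbor content, direction) relations."""
    grid = {}
    for index, cell in enumerate(cells):
        if not cell["content"].strip():
            continue  # blank cells are not represented in the grid
        for r in range(cell["start_row"], cell["end_row"] + 1):
            for c in range(cell["start_col"], cell["end_col"] + 1):
                grid[(r, c)] = index
    if not grid:
        return Counter()
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    pairs = set()  # a set, so a spanning cell yields each pair only once
    for (r, c), index in grid.items():
        for dr, dc, direction in ((0, 1, "horizontal"), (1, 0, "vertical")):
            nr, nc = r + dr, c + dc
            while nr < n_rows and nc < n_cols:
                other = grid.get((nr, nc))
                if other is not None and other != index:
                    pairs.add((index, other, direction))
                    break
                nr, nc = nr + dr, nc + dc  # step over blanks and own span
    return Counter((cells[a]["content"], cells[b]["content"], d) for a, b, d in pairs)


def precision_recall(predicted, groundtruth):
    correct = sum((predicted & groundtruth).values())
    n_pred, n_gt = sum(predicted.values()), sum(groundtruth.values())
    return (correct / n_pred if n_pred else 0.0,
            correct / n_gt if n_gt else 0.0)


def make_cell(row, col, content):
    return {"start_row": row, "end_row": row,
            "start_col": col, "end_col": col, "content": content}


groundtruth = [make_cell(0, 0, "A"), make_cell(0, 1, "B"),
               make_cell(1, 0, "C"), make_cell(1, 1, "D")]
# The prediction misses the content of the bottom-right cell.
predicted = [make_cell(0, 0, "A"), make_cell(0, 1, "B"),
             make_cell(1, 0, "C"), make_cell(1, 1, "")]
precision, recall = precision_recall(adjacency_relations(predicted),
                                     adjacency_relations(groundtruth))
print(precision, recall)
```

In this example the prediction keeps two of the four ground-truth relations and adds no wrong ones, which is why precision stays high while recall drops.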

[Image]

# Tree Edit Distance Based Similarity (TEDS)

Insertion and deletion operations each cost 1. When the edit substitutes a node $n_o$ with $n_s$, the cost is 1 if either $n_o$ or $n_s$ is not a `td` node. When both $n_o$ and $n_s$ are `td` nodes, the substitution cost is 1 if their column spans or row spans differ. Otherwise, the substitution cost is the normalized Levenshtein distance [38] (∈ [0, 1]) between the contents of $n_o$ and $n_s$.

$$\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max(|T_a|, |T_b|)}$$

where EditDist denotes the tree edit distance, and $|T|$ is the number of nodes in $T$.
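The cost rules can be sketched in Python as follows. The tree edit distance itself is not implemented here, since it needs a dedicated algorithm; `teds` simply applies the normalization TEDS = 1 − EditDist / max(|Ta|, |Tb|) to a distance computed elsewhere. The node dictionary layout and function names are hypothetical, and two points are assumptions: the content cost is taken as the Levenshtein distance divided by the longer length (so identical contents cost 0), and two non-`td` nodes with the same tag are treated as a match with cost 0 rather than as an edit.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                      # deletion
                               current[j - 1] + 1,                   # insertion
                               previous[j - 1] + (item_a != item_b)))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a, b):
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def substitution_cost(node_o, node_s):
    """Cost of replacing node_o with node_s, following the rules above."""
    if node_o["tag"] != "td" or node_s["tag"] != "td":
        # Assumption: identical structural tags are a match, not an edit.
        return 0.0 if node_o["tag"] == node_s["tag"] else 1.0
    if (node_o["colspan"], node_o["rowspan"]) != (node_s["colspan"], node_s["rowspan"]):
        return 1.0
    return normalized_levenshtein(node_o["content"], node_s["content"])


def teds(edit_distance, n_nodes_a, n_nodes_b):
    """Turn a tree edit distance into a similarity in [0, 1]."""
    return 1.0 - edit_distance / max(n_nodes_a, n_nodes_b)


cell_a = {"tag": "td", "colspan": 1, "rowspan": 1, "content": "401"}
cell_b = {"tag": "td", "colspan": 1, "rowspan": 1, "content": "407"}
cell_c = {"tag": "td", "colspan": 2, "rowspan": 1, "content": "401"}
print(substitution_cost(cell_a, cell_b), substitution_cost(cell_a, cell_c))
```

A single wrong character in a three-character cell therefore costs a third of a full substitution, which is what gives TEDS its fine-grained view of content errors.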

Compared with the adjacency relation metric, TEDS has two advantages:

  • it detects errors caused by empty cells and by misalignment of cells beyond immediate neighbors;
  • it has a mechanism to measure fine-grained cell content recognition performance.

[Image]

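# Grid Table Similarity (GriTS)

$$\mathrm{GriTS}_f(A, B) = \frac{2 \sum_{i,j} f(\tilde{A}_{i,j}, \tilde{B}_{i,j})}{|A| + |B|}$$

where $A$ and $B$ now represent tables as matrices of grid cells, and $\tilde{A}$ and $\tilde{B}$ represent substructures of these matrices (the most similar pair). $f$ is a similarity function between the grid cells' properties: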

  • cell content: normalized longest common subsequence (LCS)
  • cell topology: IoU
  • cell location: IoU
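The similarity functions listed above can be sketched as follows. The search for the most similar substructures is the hard part of GriTS and is not shown, so `grits` assumes the two aligned substructures are already given as equally shaped lists of rows and only applies the final score, 2 · Σ f / (|A| + |B|). The function names, the `(x1, y1, x2, y2)` box layout, and the normalization of LCS as 2 · LCS / (len(a) + len(b)) are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    previous = [0] * (len(b) + 1)
    for item_a in a:
        current = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def normalized_lcs(a, b):
    if not a and not b:
        return 1.0  # two empty cells are treated as identical
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def grits(aligned_a, aligned_b, size_a, size_b, similarity):
    """Score two pre-aligned substructures; size_* are the full grid sizes."""
    total = sum(similarity(cell_a, cell_b)
                for row_a, row_b in zip(aligned_a, aligned_b)
                for cell_a, cell_b in zip(row_a, row_b))
    return 2 * total / (size_a + size_b)


table_a = [["a", "b"], ["c", "d"]]
table_b = [["a", "b"], ["c", "x"]]
score = grits(table_a, table_b, size_a=4, size_b=4, similarity=normalized_lcs)
print(score)
```

Passing `iou` as `similarity`, with boxes in place of strings, would give the location variant under the same assumptions.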

Compared with TEDS, GriTS has the advantage that rows and columns are given equal importance and every cell is credited equally regardless of its absolute position: [Image]

Last Updated: 8/4/2022, 9:56:44 PM