# Table Recognition
Notes on table extraction: datasets, baseline models, and evaluation metrics for table structure recognition (TSR).
# Background
Table Extraction = Table Detection + Table Structure Recognition + Functional Analysis
- table detection: locate the table bounding box
- table structure recognition: recover rows, columns, and cells
- functional analysis: identify the keys and values of the table
# TSR Datasets
All input images are table-only (cropped to the table region).
Dataset | Input Modality | Annotation Sample | # Tables | Cell Topology | Cell Content | Cell Location | Row & Column Location | Canonical Structure (spanning cells) |
---|---|---|---|---|---|---|---|---|
ICDAR 2013 | Image | `<table id='0'><region id='0' page='3' col-increment='0' row-increment='0'><cell id='0' start-row='0' start-col='0'><bounding-box x1='70' y1='79' x2='131' y2='91'/><content>COUNTRY</content><instruction instr-id='65' subinstr-id='0'/></cell>` | ~0.1k | Y | Y | Y | | |
ICDAR 2019 | Image | `<cell id="TableCell_1" start-row="21" end-row="21" start-col="8" end-col="8"><Coords points="3254,3484 3252,3681 3435,3685 3438,3488"/></cell>` | ~1k | Y | Y | | | |
TableBank | Image from Word and LaTeX | `<tabular> <tr> <cell_y> <cell_y> </tr> <tr> <cell_y> <cell_n> </tr> </tabular>` | 145K | Y | | | | |
SciTSR | PDF (image/text) | `{"id":21,"tex":"959","content":["959"],"start_row":5,"end_row":5,"start_col":1,"end_col":1}` | 15K | Y | Y | | | |
PubTabNet (ICDAR 2021) | Image from PMCOA | `{"imgid":1,"html":{"cells":[{"tokens":["<b>","T","y","</b>"],"bbox":[1,4,46,13]},{"tokens":["4","0","1"],"bbox":[221,45,235,55]}],"structure":{"tokens":["<thead>","<tr>","<td>","</td>","</tr>","</thead>","<tbody>","<tr>","<td>","</td>","</tr>","</tbody>"]}}}` | 510K | Y | Y | Y | | |
FinTabNet | PDF (image/text) from annual reports of the S&P 500 companies | `{"bbox":[50,516,302,569],"filename":"64.pdf","html":{"cells":[{"bbox":[162.78,516.47,188.23,526.48],"tokens":["$","2",",","9","9","2"]}],"structure":{"tokens":["<table><tbody><tr><td>","</td></tr></tbody></table>"]}}}` | 113K | Y | Y | Y | | |
PubTables-1M | PDF (image/text) from PMCOA | Pascal VOC XML | 948K | Y | Y | Y | Y | Y |
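To make the PubTabNet-style annotation concrete, here is a small sketch that turns one annotation line into an HTML table. It assumes the jsonl layout shown in the sample above; key names follow that sample, and the reconstruction logic is illustrative rather than the official loader.

```python
# Sketch: rebuild an HTML table from one PubTabNet-style jsonl line.
# Assumes the annotation layout shown in the table above; not the official loader.
import json

def annotation_to_html(line: str) -> str:
    ann = json.loads(line)
    structure = ann["html"]["structure"]["tokens"]
    cells = ann["html"]["cells"]
    # Cell content is inserted after each cell-opening token. Spanning cells are
    # tokenized as "<td", ' colspan="2"', ">", so both "<td>" and ">" open a cell.
    open_positions = [i for i, tok in enumerate(structure) if tok in ("<td>", ">")]
    html = list(structure)
    for pos, cell in zip(reversed(open_positions), reversed(cells)):
        html.insert(pos + 1, "".join(cell["tokens"]))
    return "<table>" + "".join(html) + "</table>"
```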
# TSR Models
# Baseline model in TableBank: OpenNMT's image-to-text model
- encoder: CNN feature maps, then a row-by-row BLSTM
- decoder: LSTM with standard attention
# Baseline model in PubTabNet: EDD (Encoder-Dual-Decoder)
One CNN encoder feeds two attention-based decoders: a structure decoder that emits the HTML structure tokens and a cell decoder that emits each cell's content.
The attention is similar to standard additive attention: softmax(FC(ReLU(FC(encoder) + FC(decoder))))
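A minimal PyTorch sketch of that additive attention; layer names and sizes are illustrative, not EDD's actual implementation.

```python
# Sketch of additive attention: softmax(FC(ReLU(FC(encoder) + FC(decoder))))
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim)   # FC over encoder features
        self.w_dec = nn.Linear(dec_dim, attn_dim)   # FC over decoder state
        self.score = nn.Linear(attn_dim, 1)         # FC producing a scalar score per position

    def forward(self, enc_feats, dec_state):
        # enc_feats: (batch, num_positions, enc_dim); dec_state: (batch, dec_dim)
        scores = self.score(torch.relu(self.w_enc(enc_feats) + self.w_dec(dec_state).unsqueeze(1)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)               # attention over positions
        context = torch.bmm(weights.unsqueeze(1), enc_feats).squeeze(1)   # weighted sum of encoder features
        return context, weights
```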
# Baseline model in SciTSR: GraphTSR
# CascadeTabNet
- Train table detection first on a large dataset
- Then train table structure recognition with three classes: bordered table, borderless table, and cell
# Baseline model in FinTabNet: GTE (Global Table Extractor)
Detect cell objects first, then cluster the detections (with K-Means) to recover the table structure.
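As a toy illustration of the clustering step, the sketch below groups detected cell boxes into rows by running K-Means over their vertical centers (scikit-learn); the number of rows is assumed to be known, and GTE's actual procedure is more involved.

```python
# Toy sketch: cluster detected cell boxes into rows with K-Means on y-centers.
# Assumes the number of rows is known; GTE's real clustering is more involved.
import numpy as np
from sklearn.cluster import KMeans

def cluster_cells_into_rows(cell_boxes, num_rows):
    """cell_boxes: list of (x1, y1, x2, y2); returns rows ordered top to bottom."""
    y_centers = np.array([[(y1 + y2) / 2] for _, y1, _, y2 in cell_boxes])
    labels = KMeans(n_clusters=num_rows, n_init=10).fit_predict(y_centers)
    rows = [[] for _ in range(num_rows)]
    for box, label in zip(cell_boxes, labels):
        rows[label].append(box)
    rows = [r for r in rows if r]                                  # drop empty clusters
    rows.sort(key=lambda r: np.mean([(y1 + y2) / 2 for _, y1, _, y2 in r]))
    return rows
```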
# Baseline model in PubTables-1M: DETR (DEtection TRansformer)
- Performs better on large objects but worse on small objects
- Decodes bounding boxes in parallel from N learned object queries
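A rough sketch of the parallel query decoding, using PyTorch's built-in transformer decoder; the hidden size, query count, and class count below are illustrative, not the PubTables-1M configuration.

```python
# Sketch of DETR-style decoding: N learned object queries are decoded in parallel
# against the encoder memory, and each query predicts one class and one box.
import torch
import torch.nn as nn

class DETRStyleDecoder(nn.Module):
    def __init__(self, hidden_dim=256, num_queries=100, num_classes=6):
        super().__init__()
        self.queries = nn.Embedding(num_queries, hidden_dim)          # N learned object queries
        layer = nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)      # +1 for "no object"
        self.bbox_head = nn.Linear(hidden_dim, 4)                     # (cx, cy, w, h) in [0, 1]

    def forward(self, memory):
        # memory: (batch, num_image_tokens, hidden_dim) from the transformer encoder
        batch = memory.size(0)
        tgt = self.queries.weight.unsqueeze(0).expand(batch, -1, -1)
        hs = self.decoder(tgt, memory)                                # all N queries decoded in one pass
        return self.class_head(hs), self.bbox_head(hs).sigmoid()
```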
# ICDAR 2021 Runner-up: TableMaster
# ICDAR 2021 Winner: LGPMA
# TSR Metrics
Name | Task/Cell Property | Data Structure | Cell Partial Correctness | Form |
---|---|---|---|---|
DAR_Con | Content | Set of adjacency relations | Exact match | F-score |
DAR_Loc | Location | Set of adjacency relations | Avg. at multiple IoU thresholds | F-score |
BLEU-4 | Topology & function | Sequence of HTML tokens | Exact match | BLEU-4 |
TEDS | Content & function | Tree of HTML tags | Normalized Levenshtein similarity | TEDS |
GriTS_Top | Topology | Matrix of cells | IoU | F-score |
GriTS_Con | Content | Matrix of cells | Normalized LCS | F-score |
GriTS_Loc | Location | Matrix of cells | IoU | F-score |
# Adjacency Relation
To compare two cell structures, for each table region we align each ground-truth cell with the predicted cell whose IoU exceeds σ to identify the valid predicted cells, then generate a list of adjacency relations between each valid cell and its nearest neighbors in the horizontal and vertical directions. Blank cells are not represented in the grid, and no adjacency relations are generated between two blank cells or between a blank cell and a content cell. This 1-D list of adjacency relations is compared against the ground truth using precision and recall.
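A simplified sketch of this comparison, assuming each valid (already IoU-aligned) cell is given as (content, start_row, start_col); the alignment step itself is omitted, and the relation representation is illustrative.

```python
# Sketch: build adjacency relations (nearest non-blank neighbor to the right and
# below) and score predictions against ground truth with precision/recall/F-score.
def adjacency_relations(cells):
    """cells: iterable of (content, row, col). Blank cells are ignored."""
    grid = {(r, c): content for content, r, c in cells if content}
    relations = set()
    max_r = max((r for r, _ in grid), default=-1)
    max_c = max((c for _, c in grid), default=-1)
    for (r, c), content in grid.items():
        for cc in range(c + 1, max_c + 1):          # nearest neighbor to the right
            if (r, cc) in grid:
                relations.add((content, grid[(r, cc)], "horizontal"))
                break
        for rr in range(r + 1, max_r + 1):          # nearest neighbor below
            if (rr, c) in grid:
                relations.add((content, grid[(rr, c)], "vertical"))
                break
    return relations

def adjacency_f_score(pred_cells, true_cells):
    pred, true = adjacency_relations(pred_cells), adjacency_relations(true_cells)
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```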
# Tree Edit Distance Based Similarity (TEDS)
The cost of insertion and deletion operations is 1. When the edit substitutes a node n_o with n_s, the cost is 1 if either n_o or n_s is not a `<td>`. When both n_o and n_s are `<td>`, the substitution cost is 1 if the column span or the row span of n_o and n_s differs. Otherwise, the substitution cost is the normalized Levenshtein similarity [38] (∈ [0, 1]) between the content of n_o and n_s.
TEDS(T_a, T_b) = 1 − EditDist(T_a, T_b) / max(|T_a|, |T_b|)

where EditDist denotes tree-edit distance and |T| is the number of nodes in T.
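A minimal sketch of this cost function and the TEDS score, built on the `apted` package (`pip install apted`); the tree node type and its fields are made up for illustration, and identical non-`<td>` tags are given substitution cost 0 here, as reference implementations typically do.

```python
# Sketch of TEDS using the apted tree-edit-distance package.
from dataclasses import dataclass, field
from apted import APTED, Config

@dataclass
class TableNode:
    tag: str                       # e.g. "table", "thead", "tr", "td"
    colspan: int = 1
    rowspan: int = 1
    text: str = ""                 # cell content, only meaningful for <td>
    children: list = field(default_factory=list)

def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer string's length (in [0, 1])."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1] / max(len(a), len(b))

class TEDSConfig(Config):
    def delete(self, node):        # deletion cost
        return 1.0
    def insert(self, node):        # insertion cost
        return 1.0
    def rename(self, n_o, n_s):    # substitution cost, following the rules above
        if n_o.tag != "td" or n_s.tag != "td":
            return 0.0 if n_o.tag == n_s.tag else 1.0
        if n_o.colspan != n_s.colspan or n_o.rowspan != n_s.rowspan:
            return 1.0
        return normalized_levenshtein(n_o.text, n_s.text)
    def children(self, node):
        return node.children

def num_nodes(tree):
    return 1 + sum(num_nodes(c) for c in tree.children)

def teds(t_pred, t_true):
    """TEDS(T_a, T_b) = 1 - EditDist(T_a, T_b) / max(|T_a|, |T_b|)."""
    dist = APTED(t_pred, t_true, TEDSConfig()).compute_edit_distance()
    return 1.0 - dist / max(num_nodes(t_pred), num_nodes(t_true))
```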
Compared with the Adjacency Relation metric, TEDS performs better because it can:
- detect errors caused by empty cells and by misalignment of cells beyond immediate neighbors;
- measure fine-grained cell content recognition performance through its substitution cost.
# Grid Table Similarity (GriTS)
GriTS_f(A, B) = 2 · Σ_{i,j} f(Ã_{i,j}, B̃_{i,j}) / (|A| + |B|)

where A and B represent the tables as matrices of grid cells, Ã and B̃ represent their most similar aligned substructures, and f is a similarity function between the grid cells' properties:
- cell content: normalized longest common subsequence (LCS)
- cell topology: IoU
- cell location: IoU
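A sketch of the GriTS scoring step, assuming the aligned substructures Ã and B̃ have already been found (finding them is the hard part, which the paper handles with a dynamic-programming heuristic); the normalized-LCS form of f used for cell content below is one common normalization and should be treated as an assumption.

```python
# Sketch: GriTS_f = 2 * sum of pairwise cell similarities / (|A| + |B|),
# given already-aligned substructures of the predicted and ground-truth grids.
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a):
        for j, tb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ta == tb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def content_similarity(cell_a, cell_b):
    """Normalized LCS between two cells' token sequences (the f for GriTS_Con)."""
    if not cell_a and not cell_b:
        return 1.0
    return 2 * lcs_length(cell_a, cell_b) / (len(cell_a) + len(cell_b))

def grits(a_sub, b_sub, num_cells_a, num_cells_b, f=content_similarity):
    """Score two aligned cell matrices with similarity function f."""
    total = sum(f(ca, cb) for row_a, row_b in zip(a_sub, b_sub)
                          for ca, cb in zip(row_a, row_b))
    return 2 * total / (num_cells_a + num_cells_b)
```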
Compared with TEDS, GriTS performs better because rows and columns are given equal importance and every cell is credited equally regardless of its absolute position.