Daily Papers

by AK and the research community

Jan 7

Extended vehicle energy dataset (eVED): an enhanced large-scale dataset for deep learning on vehicle trip energy consumption

This work presents an extended version of the Vehicle Energy Dataset (VED), an openly released large-scale dataset for vehicle energy consumption analysis. Compared with the original version, the extended VED (eVED) dataset is enhanced with accurate vehicle trip GPS coordinates, which serve as a basis for associating the VED trip records with external information from accessible map services, e.g., road speed limits and intersections, and thereby accumulating attributes that are essential for analyzing vehicle energy consumption. In particular, we calibrate all the GPS trace records in the original VED data and then associate the VED data with attributes extracted from the Geographic Information System (QGIS), the Overpass API, the OpenStreetMap API, and the Google Maps API. The associated attributes include 12,609,170 records of road elevation, 12,203,044 of speed limit, 12,281,719 of speed limit with direction (in case the road is bi-directional), 584,551 of intersections, 429,638 of bus stops, 312,196 of crossings, 195,856 of traffic signals, 29,397 of stop signs, 5,848 of turning loops, 4,053 of railway crossings (level crossings), 3,554 of turning circles, and 2,938 of motorway junctions. With accurate GPS coordinates and enriched features for each vehicle trip record, the eVED dataset provides a precise and abundant data source for learning engines, especially deep learning engines, which are more demanding in terms of data sufficiency and richness. Moreover, our software for data calibration and enrichment can be reused to generate further vehicle trip datasets for specific use cases, contributing to deeper insights into vehicle behavior and traffic dynamics. We anticipate that the eVED dataset and our data enrichment software can serve the academic and industrial automotive sectors as tools for developing future technologies.

  • 5 authors
·
Mar 16, 2022
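
To make the enrichment idea in the eVED abstract concrete, the following is a minimal sketch of attaching a nearby map feature to each GPS point of a trip. It is not the authors' software: the function names (`haversine_m`, `attach_nearest_feature`), the 30 m matching radius, the feature labels, and the sample coordinates are all hypothetical, and a real pipeline would first calibrate the trace against the road network and query a map service for the features.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def attach_nearest_feature(trace, features, max_dist_m=30.0):
    """Label each GPS point with the nearest feature within max_dist_m.

    trace:    list of (lat, lon) tuples
    features: list of (lat, lon, kind) tuples, e.g. kind="traffic_signals"
    Returns a list of (lat, lon, kind_or_None) tuples.
    """
    enriched = []
    for lat, lon in trace:
        best_kind, best_dist = None, max_dist_m
        for f_lat, f_lon, kind in features:
            d = haversine_m(lat, lon, f_lat, f_lon)
            if d <= best_dist:
                best_kind, best_dist = kind, d
        enriched.append((lat, lon, best_kind))
    return enriched


# Hypothetical sample data, for illustration only.
trace = [(42.2800, -83.7400), (42.2810, -83.7400), (42.2900, -83.7400)]
features = [
    (42.2801, -83.7400, "traffic_signals"),
    (42.2810, -83.7401, "stop"),
]
enriched = attach_nearest_feature(trace, features)
```

The brute-force inner loop keeps the sketch short; at the scale reported in the abstract (millions of records), a spatial index such as a k-d tree or R-tree would likely be needed.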

PharmaShip: An Entity-Centric, Reading-Order-Supervised Benchmark for Chinese Pharmaceutical Shipping Documents

We present PharmaShip, a real-world Chinese dataset of scanned pharmaceutical shipping documents designed to stress-test pre-trained text-layout models under noisy OCR and heterogeneous templates. PharmaShip covers three complementary tasks: sequence entity recognition (SER), relation extraction (RE), and reading order prediction (ROP). It adopts an entity-centric evaluation protocol to minimize confounds across architectures. We benchmark five representative baselines spanning pixel-aware and geometry-aware families (LiLT, LayoutLMv3-base, GeoLayoutLM, and their available RORE-enhanced variants), and standardize preprocessing, splits, and optimization. Experiments show that pixels and explicit geometry provide complementary inductive biases, yet neither alone is sufficient: injecting reading-order-oriented regularization consistently improves SER and EL and yields the most robust configuration, while longer positional coverage stabilizes late-page predictions and reduces truncation artifacts. ROP is accurate at the word level but challenging at the segment level, reflecting boundary ambiguity and long-range crossings. PharmaShip thus establishes a controlled, reproducible benchmark for safety-critical document understanding in the pharmaceutical domain and highlights sequence-aware constraints as a transferable bias for structure modeling. We release the dataset at https://github.com/KevinYuLei/PharmaShip.

  • 3 authors
·
Nov 29, 2025