PBit: A Standardized Metric for Trustworthy Data

Keunsoo Yoon
Independent Researcher
austiny@gatech.edu, austiny@snu.ac.kr

2025

Abstract

In today’s era of massive data generation and AI-driven analytics, reliable outcomes depend not only on the quantity of data but also on its quality and trustworthiness. Conventional units (bits and bytes) capture only physical size and ignore the informational value of data.

This paper proposes PBit (Purity Bit), a novel unit that combines physical bits with an intrinsic quality factor (Q) and an optional jump factor (J) to represent the genuine value of information. The Q-factor reflects core quality aspects such as missing values, duplication, outliers, and validity, while the J-factor accounts for added value through cleansing or enrichment.

A web-based prototype demonstrates practical use: users can upload CSV datasets, check their physical size and PBit scores, and see deviations from a standard reference. PBit is intended to shift data transactions from GB-based pricing toward fairer models, such as cost per 100 PBits, so that high-quality data is rewarded and the spread of low-quality data is discouraged.

In AI training, where performance depends heavily on data quality, PBit serves as an integrity indicator: “This model was trained on one million PBits” conveys far more than a size in gigabytes alone. For everyday users, PBit enables an intuitive understanding of data value beyond size and supports critical judgment, for example: “This news scores low in PBits and may not be trustworthy.”

State-of-the-art AI tools were also used in this study for data analysis and validation, improving overall accuracy and efficiency. Ultimately, PBit aims to capture the intrinsic value of data, planting a seed for change that strengthens the foundations of a trust-driven data society.

Keywords: PBit; Standardized Metric; Data Quality; Trustworthy Data; ISO/IEC 25012; Data Governance; Data Marketplace; Prototype; Reproducibility; AI Training Datasets
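
As a rough illustration of how a PBit score might be computed for an uploaded CSV file, the following Python sketch multiplies the physical bit count by a quality factor Q and a jump factor J. The multiplicative rule, the averaging of completeness and uniqueness into Q, and the function names quality_factor and pbit_score are all assumptions made for this example, not the formula defined in the paper. Only missing values and duplication are measured here; outlier and validity checks are omitted for brevity.

```python
import csv
import io


def quality_factor(rows):
    """Hypothetical Q in [0, 1]: mean of completeness and uniqueness.

    ASSUMPTION: equal weighting of the two aspects is illustrative only.
    """
    if not rows:
        return 0.0
    cells = [cell for row in rows for cell in row]
    # Completeness: share of cells that are not empty (missing values).
    completeness = sum(1 for c in cells if c.strip()) / len(cells)
    # Uniqueness: share of rows that are distinct (duplication).
    uniqueness = len({tuple(r) for r in rows}) / len(rows)
    return (completeness + uniqueness) / 2


def pbit_score(csv_text, jump_factor=1.0):
    """Return (physical_bits, Q, pbits).

    ASSUMPTION: pbits = physical_bits * Q * J, a stand-in combination rule.
    """
    physical_bits = len(csv_text.encode("utf-8")) * 8
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    rows = [r for r in reader if r]  # ignore blank lines
    q = quality_factor(rows)
    return physical_bits, q, physical_bits * q * jump_factor


# Small made-up dataset: one duplicated row and two missing "age" cells.
sample = "id,name,age\n1,Ann,34\n2,Bob,\n2,Bob,\n3,Cy,41\n"
bits, q, pbits = pbit_score(sample)
print(f"physical bits = {bits}, Q = {q:.3f}, PBits = {pbits:.1f}")
```

With J left at 1.0, the score in this sketch is simply the physical size discounted by Q. Under the same assumption, a cleansing or enrichment step would be modelled by passing a jump_factor above 1.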