2025
In an era of massive data generation and AI-driven analytics, reliable outcomes depend not only on the quantity of data but also on its quality and trustworthiness. Conventional units such as bits and bytes capture only physical size and say nothing about the informational value of data.
This paper proposes PBit (Purity Bit), a novel unit that combines physical bits with an intrinsic quality factor (Q) and an optional jump factor (J) to represent the genuine value of information. The Q-factor reflects core quality aspects such as missing values, duplication, outliers, and validity, while the J-factor accounts for added value through cleansing or enrichment.
A web-based prototype demonstrates practical use: users can upload CSV datasets, check physical size and PBit scores, and see deviations from a standard reference. PBit could shift data transactions from GB-based pricing toward fairer models, such as cost per 100 PBits, so that high-quality data is rewarded and the spread of low-quality data is discouraged.
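As a rough illustration of the idea, the following Python sketch estimates a PBit score for CSV text. It is not the prototype's implementation. The multiplicative form PBit = physical bits × Q × J is an assumption made here for illustration, as is the reduction of Q to two sub-scores, completeness and uniqueness; outliers and validity are left out. The function name `pbit_score` and its parameters are hypothetical.

```python
import csv
import io


def pbit_score(csv_text: str, jump_factor: float = 1.0) -> dict:
    """Illustrative PBit estimate for CSV text (assumed formula, not the paper's)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    physical_bits = len(csv_text.encode("utf-8")) * 8
    if not rows:
        return {"physical_bits": physical_bits, "Q": 0.0, "pbits": 0.0}

    header, records = rows[0], rows[1:]
    width = len(header)

    # Completeness: share of expected cells that are non-empty.
    total_cells = len(records) * width
    filled = sum(1 for r in records for c in r[:width] if c.strip())
    completeness = filled / total_cells if total_cells else 0.0

    # Uniqueness: share of records that are not exact duplicates.
    uniqueness = len({tuple(r) for r in records}) / len(records) if records else 0.0

    # Assumption: Q is the product of the sub-scores, so it stays in [0, 1].
    q = completeness * uniqueness
    # Assumption: PBit = physical bits * Q * J, with J = 1.0 when no
    # cleansing or enrichment has been applied.
    return {"physical_bits": physical_bits, "Q": q, "pbits": physical_bits * q * jump_factor}


sample = "id,name\n1,a\n2,\n1,a\n"
result = pbit_score(sample)
print(result)
```

In the sample, one of six cells is empty and one of three records is a duplicate, so Q falls well below 1 and the PBit score is correspondingly smaller than the physical bit count.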
In AI training, where performance depends heavily on data quality, PBit serves as an integrity indicator: “This model was trained on one million PBits” conveys far more than gigabytes alone. For everyday users, PBit enables intuitive understanding of data value beyond size and supports critical judgment—“This news scores low in PBits and may not be trustworthy.”
State-of-the-art AI tools were also used in this study for data analysis and validation, to improve accuracy and efficiency. Ultimately, PBit aims to capture the intrinsic value of data and to help strengthen the foundations of a trust-driven data society.
Keywords: PBit; Standardized Metric; Data Quality; Trustworthy Data; ISO/IEC 25012; Data Governance; Data Marketplace; Prototype; Reproducibility; AI Training Datasets