The Truth Score (0-100) is a weighted sum of three metrics:
1) Percentage of Audit Log Messages (40% of the score)
This is (Audit Log Messages for the dataset / Total Audit Log Messages). Program reads are omitted from this count to avoid redundancy.
2) Percentage of Unique Programs Read (40% of the score)
This is (Unique programs reading from the dataset / Total programs present).
3) Time Since Last Read (20% of the score, based on how the dataset's most recent read ranks against the most recent reads of all datasets)
If there are 10 datasets, they are sorted by the time since each was last read. The most recently read dataset is ranked first and gets 10/10 * 20 = 20 points. The second most recently read dataset gets 9/10 * 20 = 18, the third gets 8/10 * 20 = 16, and so on down to 1/10 * 20 = 2 for the least recently read. In general, with N datasets, the dataset at rank r gets (N - r + 1)/N * 20 points. The time since the last read can be anywhere from 0 to a very large number, or undefined if the dataset has never been read, so a relative (rank-based) score seems necessary.
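The three metrics can be combined as shown here. This is a minimal Python sketch that assumes the per-dataset percentages and the seconds since the last read have already been computed from the audit log. The function name `truth_scores`, the dict-based inputs, and the use of `float("inf")` for a never-read dataset are hypothetical choices and are not part of the design. Datasets tied on recency keep their input order, which the design does not specify.

```python
from typing import Dict

# Weights from the design: 40% audit log share, 40% program share, 20% recency rank.
AUDIT_WEIGHT = 40
PROGRAM_WEIGHT = 40
RECENCY_POINTS = 20


def truth_scores(
    audit_pct: Dict[str, float],
    program_pct: Dict[str, float],
    seconds_since_read: Dict[str, float],
) -> Dict[str, float]:
    """Return the Truth Score (0-100) for every dataset.

    audit_pct:          % of audit log messages that are for the dataset (0-100)
    program_pct:        % of all programs that read the dataset (0-100)
    seconds_since_read: seconds since the dataset was last read;
                        float("inf") for a never-read dataset (assumption)
    """
    n = len(audit_pct)
    # Most recently read first; float("inf") (never read) sorts last.
    # sorted() is stable, so ties keep input order (assumption).
    order = sorted(audit_pct, key=lambda ds: seconds_since_read[ds])

    scores = {}
    for rank, ds in enumerate(order):  # rank 0 = most recently read
        # Multiply before dividing so integer inputs give exact results.
        share_points = (audit_pct[ds] * AUDIT_WEIGHT + program_pct[ds] * PROGRAM_WEIGHT) / 100
        # (N - r + 1)/N * 20 with a 1-based r equals (n - rank)/n * 20 here.
        recency_points = RECENCY_POINTS * (n - rank) / n
        scores[ds] = share_points + recency_points
    return scores


if __name__ == "__main__":
    # Inputs from Sample Output 1.
    print(truth_scores(
        audit_pct={"DS1": 70, "DS2": 30},
        program_pct={"DS1": 60, "DS2": 50},
        seconds_since_read={"DS1": 80, "DS2": 1000},
    ))  # {'DS1': 72.0, 'DS2': 42.0}
```

Ranking by sort order instead of by raw elapsed time is what keeps the recency term bounded to 20 points, whatever the actual gaps between reads are.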
Sample Output 1
Dataset | % of Audit Log Messages | % of unique programs | Time Since Last Read | Score |
---|---|---|---|---|
DS1 | 70 | 60 | 80s | 72 |
DS2 | 30 | 50 | 1000s | 42 |
Calculation Example:
DS1: 70% of the Audit Log Messages are for DS1: 70 * 40/100 = 28 (40% of the score)
60% of the programs read DS1: 60 * 40/100 = 24 (40% of the score)
Of the two datasets, DS1 was read most recently, so 2/2 * 20 = 20 (20% of the score)
Total: 72
DS2: 30 * 0.4 + 50 * 0.4 + 1/2 * 20 = 42
Sample Output 2
Dataset | % of Audit Log Messages | % of unique programs | Time Since Last Read | Score |
---|---|---|---|---|
DS1 | 25 | 30 | 10000s | 27 |
DS2 | 25 | 30 | 9000s | 32 |
DS3 | 25 | 30 | 800s | 37 |
DS4 | 25 | 30 | 70s | 42 |
Calculation Example:
DS1:
25 * 0.4 + 30 * 0.4 + 1/4 * 20 = 27
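The remaining rows of Sample Output 2 follow the same pattern. Only the recency term changes, because DS4 (70s) is the most recently read and DS1 (10000s) the least:

```latex
\begin{aligned}
\text{DS2}: &\; 25 \times 0.4 + 30 \times 0.4 + \tfrac{2}{4} \times 20 = 10 + 12 + 10 = 32 \\
\text{DS3}: &\; 25 \times 0.4 + 30 \times 0.4 + \tfrac{3}{4} \times 20 = 10 + 12 + 15 = 37 \\
\text{DS4}: &\; 25 \times 0.4 + 30 \times 0.4 + \tfrac{4}{4} \times 20 = 10 + 12 + 20 = 42
\end{aligned}
```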
Sample Output 3
Dataset | % of Audit Log Messages | % of unique programs | Time Since Last Read | Score |
---|---|---|---|---|
DS1 | 65 | 80 | 10s | 78 |
DS2 | 25 | 40 | 20s | 41 |
DS3 | 10 | 40 | 40s | 25 |
DS4 | 5 | 10 | 30s | 16 |
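Sample Output 3 can be checked the same way. Sorting by time since last read gives DS1 (10s), DS2 (20s), DS4 (30s), DS3 (40s), so DS4 outranks DS3 on recency even though DS3 has the larger shares:

```latex
\begin{aligned}
\text{DS1}: &\; 65 \times 0.4 + 80 \times 0.4 + \tfrac{4}{4} \times 20 = 26 + 32 + 20 = 78 \\
\text{DS2}: &\; 25 \times 0.4 + 40 \times 0.4 + \tfrac{3}{4} \times 20 = 10 + 16 + 15 = 41 \\
\text{DS4}: &\; 5 \times 0.4 + 10 \times 0.4 + \tfrac{2}{4} \times 20 = 2 + 4 + 10 = 16 \\
\text{DS3}: &\; 10 \times 0.4 + 40 \times 0.4 + \tfrac{1}{4} \times 20 = 4 + 16 + 5 = 25
\end{aligned}
```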
Problems with the design
- Scores go down (on average) as the number of tracked datasets increases (example: Sample Output 2)
- Most scores are on the lower end. Even datasets that look popular on paper score only around 65-80.
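A short derivation makes the first problem concrete. Suppose there are N tracked datasets, and assume each audit log message is attributed to exactly one dataset, so the audit shares a_i sum to 100. Let p_i be the unique-program percentage, with mean p̄, and let r_i ∈ {1, …, N} be the recency rank (1 = most recent). Under these assumptions, the mean score is:

```latex
\begin{aligned}
S_i &= 0.4\,a_i + 0.4\,p_i + 20 \cdot \frac{N - r_i + 1}{N} \\
\bar{S} = \frac{1}{N}\sum_{i=1}^{N} S_i
  &= \frac{0.4 \cdot 100}{N} + 0.4\,\bar{p} + \frac{20}{N^2} \cdot \frac{N(N+1)}{2} \\
  &= 10 + \frac{50}{N} + 0.4\,\bar{p}
\end{aligned}
```

The 50/N term shrinks as datasets are added. For large N, the audit-log and recency parts together contribute only about 10 points on average. Plugging in Sample Output 2 (N = 4, p̄ = 30) gives 10 + 12.5 + 12 = 34.5, which matches the mean of its four scores.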