
The Truth Score of a dataset is a weighted sum of three metrics, giving a score from 0 to 100:
1) Percentage of Audit Log Messages (40% of the score)

This is (Audit Log Messages for the dataset / Total Audit Log Messages). Program reads are omitted to avoid redundancy with metric 2.

 
2) Percentage of Unique Programs Read (40% of the score)

This is (Unique programs reading from the dataset / Total programs present).

 
3) Time Since Last Read (20% of the score, based on the rank of the dataset's most recent read among the most recent reads of all datasets)

If there are 10 datasets, they are sorted by the time since each dataset was last read. The most recently read dataset is ranked first and gets 10/10 * 20 = 20 points. The second most recently read dataset gets 9/10 * 20 = 18 points, the third gets 8/10 * 20 = 16 points, and so on. Because the time since the last read ranges from 0 to very large values, or is undefined for a dataset that has never been read, a relative (rank-based) score seems necessary.
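To make the three metrics concrete, here is a minimal Python sketch of the scoring. The input shapes are assumptions for illustration only: audit log messages as hypothetical `(dataset, is_program_read)` pairs, a mapping from each program to the set of datasets it reads, and the seconds since each dataset's last read (`None` for never read). How never-read datasets and ties should rank is not specified above; the sketch assumes never-read datasets rank last and ties keep input order.

```python
import math
from typing import Iterable, Mapping, Optional


def truth_scores(
    audit_messages: Iterable[tuple[str, bool]],
    program_reads: Mapping[str, set[str]],
    seconds_since_read: Mapping[str, Optional[float]],
) -> dict[str, float]:
    """Return a 0-100 Truth Score for every dataset in seconds_since_read.

    audit_messages: hypothetical (dataset, is_program_read) pairs.
    program_reads: program name -> set of datasets that program reads.
    seconds_since_read: dataset -> seconds since its last read (None = never read).
    """
    datasets = list(seconds_since_read)
    n = len(datasets)

    # Metric 1 (40 points): share of audit log messages, skipping program reads.
    counts = {ds: 0 for ds in datasets}
    total_msgs = 0
    for ds, is_program_read in audit_messages:
        if is_program_read:
            continue  # program reads are left to metric 2
        total_msgs += 1
        if ds in counts:
            counts[ds] += 1

    # Metric 2 (40 points): share of all programs that read the dataset.
    total_programs = len(program_reads)

    # Metric 3 (20 points): rank 1 = most recently read.
    # Assumption: never-read datasets rank last; ties keep input order.
    def age(ds: str) -> float:
        value = seconds_since_read[ds]
        return math.inf if value is None else value

    rank = {ds: i + 1 for i, ds in enumerate(sorted(datasets, key=age))}

    scores = {}
    for ds in datasets:
        audit_share = counts[ds] / total_msgs if total_msgs else 0.0
        readers = sum(1 for reads in program_reads.values() if ds in reads)
        program_share = readers / total_programs if total_programs else 0.0
        recency = (n - rank[ds] + 1) / n
        scores[ds] = 40 * audit_share + 40 * program_share + 20 * recency
    return scores


# Hypothetical data shaped to match Sample Output 1.
messages = [("DS1", False)] * 7 + [("DS2", False)] * 3 + [("DS1", True)] * 5
programs = {f"P{i}": set() for i in range(10)}
for i in range(6):
    programs[f"P{i}"].add("DS1")  # 6 of 10 programs read DS1
for i in range(5, 10):
    programs[f"P{i}"].add("DS2")  # 5 of 10 programs read DS2
print(truth_scores(messages, programs, {"DS1": 80, "DS2": 1000}))
```

The five program-read messages for DS1 are ignored by metric 1, so DS1's audit share is 7/10. Ranking by elapsed seconds rather than by timestamps keeps the recency metric in the same units as the "Time Since Last Read" column in the sample outputs.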


Sample Output 1

Dataset | % of Audit Log Messages | % of unique programs | Time Since Last Read | Score
DS1 | 70 | 60 | 80s | 72
DS2 | 30 | 50 | 1000s | 42

Calculation Example:

DS1: 70% of the Audit Log Messages are for DS1. 70*40/100 = 28 (40% of the score)

         60% of the programs access DS1: 60 * 40/100 = 24 (40% of the score)

         Among the two datasets, DS1 has been accessed the most recently, so 2/2 * 20 = 20 (20% of the score)

         Total: 72

DS2: 30 * 0.4 + 50 * 0.4 + 1/2 * 20 = 12 + 20 + 10 = 42
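The per-row arithmetic above can be written as a small Python helper. This is a sketch that takes the two percentages (0 to 100) and the recency rank (1 = most recently read) as already computed; the function and parameter names are illustrative, not part of the design.

```python
def truth_score(audit_pct: float, program_pct: float,
                recency_rank: int, n_datasets: int) -> float:
    """Truth Score from the two percentages and the recency rank (1 = most recent)."""
    audit_points = audit_pct * 0.4        # 40% of the score
    program_points = program_pct * 0.4    # 40% of the score
    recency_points = (n_datasets - recency_rank + 1) / n_datasets * 20  # 20% of the score
    return audit_points + program_points + recency_points


# Sample Output 1: DS1 is the more recently read of the two datasets.
print(round(truth_score(70, 60, 1, 2)))  # DS1 → 72
print(round(truth_score(30, 50, 2, 2)))  # DS2 → 42
```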

Sample Output 2

Dataset | % of Audit Log Messages | % of unique programs | Time Since Last Read | Score
DS1 | 25 | 30 | 10000s | 27
DS2 | 25 | 30 | 9000s | 32
DS3 | 25 | 30 | 800s | 37
DS4 | 25 | 30 | 70s | 42

Calculation Example:

DS1:

25 * 0.4 + 30 * 0.4 + 1/4 * 20 = 10 + 12 + 5 = 27 (DS1 was read the least recently, so it ranks 4th of 4)
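The remaining rows of Sample Output 2 follow the same arithmetic. DS2, DS3, and DS4 rank 3rd, 2nd, and 1st by recency, so only the recency term changes:

```latex
\begin{aligned}
\text{DS2} &: 25 \cdot 0.4 + 30 \cdot 0.4 + \tfrac{2}{4} \cdot 20 = 10 + 12 + 10 = 32 \\
\text{DS3} &: 25 \cdot 0.4 + 30 \cdot 0.4 + \tfrac{3}{4} \cdot 20 = 10 + 12 + 15 = 37 \\
\text{DS4} &: 25 \cdot 0.4 + 30 \cdot 0.4 + \tfrac{4}{4} \cdot 20 = 10 + 12 + 20 = 42
\end{aligned}
```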

Sample Output 3

Dataset | % of Audit Log Messages | % of unique programs | Time Since Last Read | Score
DS1 | 65 | 80 | 10s | 78
DS2 | 25 | 40 | 20s | 41
DS3 | 10 | 40 | 40s | 25
DS4 | 5 | 10 | 30s | 16
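In Sample Output 3 the rows are not listed in recency order: DS4 (30s) was read more recently than DS3 (40s), so DS4 ranks 3rd and DS3 ranks 4th. The following Python sketch derives the ranks from the "Time Since Last Read" column and recomputes the Score column; the `rows` layout and the helper name are illustrative assumptions.

```python
# dataset: (% of audit log messages, % of unique programs, seconds since last read)
rows = {
    "DS1": (65, 80, 10),
    "DS2": (25, 40, 20),
    "DS3": (10, 40, 40),
    "DS4": (5, 10, 30),
}


def scores_from_table(rows: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    n = len(rows)
    # Rank 1 = smallest time since last read, i.e. most recently read.
    by_recency = sorted(rows, key=lambda ds: rows[ds][2])
    rank = {ds: i + 1 for i, ds in enumerate(by_recency)}
    return {
        ds: audit * 0.4 + programs * 0.4 + (n - rank[ds] + 1) / n * 20
        for ds, (audit, programs, _) in rows.items()
    }


for ds, score in scores_from_table(rows).items():
    print(ds, round(score))  # prints DS1 78, DS2 41, DS3 25, DS4 16 (one per line)
```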


Problems with the design

  • Scores go down (on average) as the number of tracked datasets increases (example: Sample Output 2).
  • Most scores are on the lower end. Even datasets that look popular on paper only score around 65-80.
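The first problem follows from the formula itself. Assuming every counted audit log message belongs to exactly one tracked dataset, the audit shares sum to 100%, so their mean is 100/n percent; the recency points 20·k/n for k = 1..n average 20·(n+1)/(2n). The sketch below (helper name hypothetical) computes the average points per dataset contributed by those two metrics, leaving out the program metric, whose average depends on the data.

```python
def expected_fixed_components(n: int) -> float:
    """Average points per dataset from the audit-log and recency metrics.

    Assumes audit-log shares sum to 100% across the n tracked datasets.
    """
    audit_points = 0.4 * 100 / n             # mean share is 100/n percent, weighted 0.4
    recency_points = 20 * (n + 1) / (2 * n)  # mean of 20*k/n for k = 1..n
    return audit_points + recency_points


for n in (1, 2, 4, 10, 100):
    print(n, round(expected_fixed_components(n), 2))
```

For n = 2 these two metrics average 35 points per dataset; for n = 4 they average 22.5, which together with the 12-point program component in Sample Output 2 gives that sample's average score of 34.5. As n grows, the average approaches 10 points regardless of how popular the datasets are.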

