Truth Score depends on three metrics:
1) Dataset Activity
If Total number of datasets tracked = N
Total audit log messages = M
Audit log messages for dataset 1 = m1
Then the Truth Value for component 1 only is:
log10 (m1/M - 100/N + 50) / log10 (150)
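The formula above can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: the function name `activity_score` is hypothetical, and it assumes that the share m1/M is expressed as a percentage (0 to 100), which keeps the argument of the logarithm at or below 150.

```python
import math


def activity_score(dataset_messages, total_messages, num_datasets):
    """Component 1 (Dataset Activity) for a single dataset.

    Assumption: m1/M is taken as a percentage, so the score reaches 1.0
    when the argument of the logarithm reaches 150.
    """
    share_pct = 100.0 * dataset_messages / total_messages  # m1/M as a percentage
    even_share_pct = 100.0 / num_datasets                  # 100/N
    return math.log10(share_pct - even_share_pct + 50) / math.log10(150)
```

Under that assumption, four datasets with an equal 25% share each score about 0.78. Note that the argument of the logarithm can reach zero (for example, two datasets where one has no audit log messages), in which case `math.log10` raises a `ValueError`; the formula as written does not say how that case should be handled.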
1) Percentage of Audit Log Messages (40% of the score)
This is (Audit Log Messages for a Dataset / Total Audit Log Messages). Program reads are omitted to avoid redundancy.
2) Percentage of Unique Programs Read (40% of the score)
This is (Total Unique Programs reading from dataset / Total programs present)
3) Time Since Last Read (20% of the score, by rank of the most recent read among most recent reads for all datasets)
Example: if there are 10 datasets, they are sorted by the time since each dataset was last read. The most recently read dataset gets 10/10 * 20 = 20 points (rank / total datasets * 20), as it is ranked first. The second most recently read dataset receives 9/10 * 20, the third receives 8/10 * 20, and so on. Because the time since the last read can range from 0 to a very large number, or be undefined if the dataset was never read, a relative score seems necessary.
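The three weighted components can be combined as in the sketch below. It is only an illustration of the 40/40/20 weighting described above: the function name `truth_scores` and the input layout are hypothetical, a never-read dataset is assumed to be passed as `float("inf")` seconds, and ties in read time are not given any special treatment.

```python
def truth_scores(datasets):
    """Return {name: score} for the 40/40/20 weighting.

    `datasets` maps a dataset name to a tuple of
    (audit_pct, program_pct, seconds_since_last_read).
    """
    n = len(datasets)
    # Sort from least to most recently read, so the most recent read gets rank n.
    by_staleness = sorted(datasets, key=lambda name: datasets[name][2], reverse=True)
    rank = {name: i + 1 for i, name in enumerate(by_staleness)}

    scores = {}
    for name, (audit_pct, program_pct, _) in datasets.items():
        scores[name] = (
            0.4 * audit_pct            # share of audit log messages
            + 0.4 * program_pct        # share of unique programs reading
            + 20.0 * rank[name] / n    # recency rank, scaled to 20 points
        )
    return scores
```

For example, `truth_scores({"DS1": (70, 60, 80), "DS2": (30, 50, 1000)})` gives 72 for DS1 and 42 for DS2, up to floating-point rounding.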
Sample Output 1
Dataset | % of Audit Log Messages | % of unique programs | Time Since Last Read | Score |
---|---|---|---|---|
DS1 | 70 | 60 | 80s | 72 |
DS2 | 30 | 50 | 1000s | 42 |

Dataset | % of Audit Log Messages | Score |
---|---|---|
DS1 | 25 | 78 |
DS2 | 25 | 78 |
DS3 | 25 | 78 |
DS4 | 25 | 78 |
Calculation Example:
DS1: 70% of the Audit Log Messages are for DS1. 70*40/100 = 28 (40% of the score)
60% of the programs access DS1: 60 * 40/100 = 24 (40% of the score)
Among the two datasets, DS1 has been accessed the most recently, so 2/2 * 20 = 20 (20% of the score)
Total: 72
DS2: 30 * .4 + 50 * .4 + 1/2 * 20 = 42
Sample Output 2
Dataset | % of Audit Log Messages | % of unique programs | Time Since Last Read | Score |
---|---|---|---|---|
DS1 | 25 | 30 | 10000s | 27 |
DS2 | 25 | 30 | 9000s | 32 |
DS3 | 25 | 30 | 800s | 37 |
DS4 | 25 | 30 | 70s | 42 |
Calculation Example:
DS1:
25 * 0.4 + 30 * 0.4 + 1/4 * 20 = 27
Sample Output 3
...
Dataset | % of Audit Log Messages | % of unique programs | Time Since Last Read | Score |
---|---|---|---|---|
DS1 | 65 | 80 | 10s | 78 |
DS2 | 25 | 40 | 20s | 41 |
DS3 | 10 | 40 | 40s | 25 |
DS4 | 5 | 10 | 30s | 16 |

2) Number of unique programs reading from a dataset
Problems with the design
- Scores go down (on average) as the number of tracked datasets increases (example: sample output 2).
- Most scores are on the lower end. Even datasets that look popular on paper score only around 65-80.
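
The first problem can be illustrated with a small sketch for the 40/40/20 weighting. It assumes an idealized case that is not taken from real data: audit log messages are split evenly across datasets, every dataset is read by the same percentage of programs, and no two datasets share a read time. The function name `mean_score_even_split` is hypothetical.

```python
def mean_score_even_split(num_datasets, program_pct):
    """Average score when audit log messages are split evenly across datasets."""
    audit_pct = 100.0 / num_datasets
    # Recency points for ranks 1..N, each scaled to the 20-point component.
    rank_points = [20.0 * r / num_datasets for r in range(1, num_datasets + 1)]
    scores = [0.4 * audit_pct + 0.4 * program_pct + p for p in rank_points]
    return sum(scores) / num_datasets


for n in (2, 4, 10):
    print(n, mean_score_even_split(n, 30))
```

With 30% of programs reading every dataset, the average falls from about 47 with two datasets to about 34.5 with four and about 27 with ten, because both the per-dataset share of audit log messages and the average recency points shrink as more datasets are tracked.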