Each log is a tuple of the form <timestamp, artist-MBID, release-MBID, track-MBID>.
Timestamp: UTC synced timestamps.
MBIDs: 36-character UUIDs.
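
A minimal parsing sketch for the tuple format above. The notes do not say how the fields are delimited or how the timestamp is encoded, so this assumes one tab-separated log per line and a Unix epoch timestamp in seconds; `Log`, `parse_log` and `has_full_data` are hypothetical names used only for illustration. Empty MBID fields are mapped to None, since not every log carries all three MBIDs.

```python
import uuid
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class Log(NamedTuple):
    timestamp: datetime            # timezone-aware, UTC
    artist_mbid: Optional[str]
    release_mbid: Optional[str]
    track_mbid: Optional[str]


def _mbid(field: str) -> Optional[str]:
    """Return a canonical 36-character UUID string, or None if the field is empty."""
    field = field.strip()
    if not field:
        return None
    if len(field) != 36:
        raise ValueError(f"expected a 36-character MBID, got {field!r}")
    return str(uuid.UUID(field))   # raises ValueError if not a valid UUID


def parse_log(line: str) -> Log:
    # ASSUMPTION: tab-separated fields, timestamp in Unix epoch seconds.
    ts, artist, release, track = line.rstrip("\n").split("\t")
    return Log(
        datetime.fromtimestamp(int(ts), tz=timezone.utc),
        _mbid(artist),
        _mbid(release),
        _mbid(track),
    )


def has_full_data(log: Log) -> bool:
    """True when the artist, release and track MBIDs are all present."""
    return None not in (log.artist_mbid, log.release_mbid, log.track_mbid)
```

If the real files use a different delimiter or timestamp encoding, only `parse_log` needs to change.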
Glitched logs removed: duplicates (same MBID and same timestamp) and logs whose timestamps are less than 30 s apart. On average, 8% of a user's logs were duplicates and 1% were too close together.
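
A rough sketch of how the two glitch rules could be applied to one user's logs. This is not the dataset authors' actual procedure: it assumes a "duplicate" means the whole tuple is identical, that the 30 s gap is measured against the previously kept log, and that timestamps are integer seconds. `remove_glitched` and `min_gap` are hypothetical names.

```python
from typing import Iterable, List, Tuple

# (timestamp_seconds, artist_mbid, release_mbid, track_mbid)
RawLog = Tuple[int, str, str, str]


def remove_glitched(
    logs: Iterable[RawLog], min_gap: int = 30
) -> Tuple[List[RawLog], int, int]:
    """Return (kept_logs, n_duplicates, n_too_close) for one user's history."""
    kept: List[RawLog] = []
    seen = set()
    duplicates = 0
    too_close = 0
    for log in sorted(logs, key=lambda entry: entry[0]):
        # Rule 1 (ASSUMPTION): identical tuple = duplicate log.
        if log in seen:
            duplicates += 1
            continue
        seen.add(log)
        # Rule 2 (ASSUMPTION): gap measured from the last log that was kept.
        if kept and log[0] - kept[-1][0] < min_gap:
            too_close += 1
            continue
        kept.append(log)
    return kept, duplicates, too_close
```

Dividing the two counters by the number of input logs gives per-user rates that can be averaged across users and compared with the 8% and 1% figures.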
58% of all logs in the dataset have full data (MBIDs for all 3 entities)
27 billion logs -> 583k people -> 555k unique artists -> 900k albums -> 7M tracks
Median number of logs per user = 35k
My own findings: avg 46755.07 scrobbles per file in chunk 00.

The median age of listening histories = 4.5 years.