macos

Guest Post by @bizzybarney! A Peek Inside the PPSQLDatabase.db Personalization Portrait Database

The DFIR Twitter-sphere exploded this morning when @mattiaep mentioned /private/var/mobile/Library/PersonalizationPortrait/PPSQLDatabase.db. I’ve been doing some research work on this file and plan to present pieces of it during my talk at the upcoming SANS DFIR Summit. I reached out to @iamevltwin and asked if she would host a quick blog post and she graciously agreed - but I now owe her gin, and steak and cheese egg rolls. In all seriousness, a huge thank you to my good friend and mentor!

Check out the upcoming Summit agenda here.

The PersonalizationPortrait directory is native on iOS 13 and contains a few interesting files. One specifically, PPSQLDatabase.db, is loaded with data. From my research, the directory existed on iOS 12 but this database did not.  Some of the data is repetitive from other native locations, but there’s a lot of context that can be correlated from this database and pieces of this data might exist longer here than they do in other places. So what is the purpose of this database? Don’t forget as you poke at it, that it has ‘Personalization’ in its title. It resides in the native /Library directory, so it’s Apple. What are you up to Apple? 

My research on this is still ongoing, but one thing I know is Apple wants me to like my device and to feel comfortable while I’m constantly immersed in their operating system. When I open News I want to see certain things of interest. When I select a photo to send to a friend Apple gives me a short list of probable recipients. When I plug my car into CarPlay I appreciate when Apple weirdly predicts where I’m headed. Is this ‘Personalization’ just for Apple or are pieces of this filtering out to other parties for better advertisement tailoring? The Widgets screen to the left of my home screen on my iPhone is a dynamic representation of so many pieces of personal data. What’s next on my Calendar and where it’s occurring, which apps I recently used, News, Weather based on my location, where I last parked my car when using CarPlay, and ScreenTime totals I can’t hardly believe. 

I want to highlight a few pieces of the PPSQLDatabase.db file to show it’s potential, and pass along an appropriate amount of caution as well. This database is aggregating data from many sources and attribution must be done carefully.

The first table to peek into is loc_records. It is absolutely loaded with location data - mine had 1000 entries. The 1000 is curious, and I wonder if anyone else’s table is limited at 1000 or if mine just landed on that nice even number. @iamevltwin did a quick check and found 1000 as well on Mac and iOS, so it seems that is the limit for the table. In reviewing just the locations from this table without joining any other table data, I quickly recognize many of them as places I have definitely been with this device in my pocket. But…then there are those locations I definitely haven’t been, ever. So while having 1000 locations records to play with sounds like good fun, don’t forget to be diligent with attribution.

Most of these location records have a column named “clp_location” which is stored as BLOB data. At closer inspection the BLOB data is a binary plist (.bplist). 

Using plutil -p produces a more friendly output format to inspect, reminiscent of Cloud-V2.sqlite and other “Significant Location” files. But that doesn’t mean my device was at that place at that time, because I wasn’t in Las Vegas on June 2nd, 2020 which is what that timestamp converts to. I checked my photos based on the ‘Places’ feature, and I did take a photo at that location - but it was June 4th, 2019 at 5:35PM. Weird.

So why is a photo from a year ago popping up? Making a few joins and gathering a bit more data produces more context to this specific item. After hammering out a query to join it over to the ‘sources’ table, I am able to see that this entry is associated with ‘com.apple.mobileslideshow’ and more specifically ‘com.apple.proactive.PersonalizationPortrait.PhotosGraphDonation’. One word really jumps out there - ‘proactive.’ That makes me think Apple is doing something for me here, and without asking. I went to my Photos section and found a categorization of Las Vegas photos for June 2-7 in my ‘Months’ category, and the photo is in that album. As I scanned the other places in these ‘proactive’ listings I am also able to see photo groupings from a trip to Jamaica (Kingston) and to Huntington Beach, CA in their own categories. 

So you may be asking yourself at this point, “Why did I just read this? It’s a location in a photo.” Stick with me, and remember we are trying to figure out what this newly discovered file is doing and how far reaching it might be. 

A few lines down from the photos we have an address in New Jersey being attributed to com.apple.mobilemail. I tweaked the query to ‘localtime’ for the sake of presentation, but I received an email on May 31, 2020 at 11:17AM EDT regarding the June 2nd release of Cellebrite Physical Analyzer version 7.34. In the signature of that email was the US headquarters address for Cellebrite - 7 Campus Drive, Suite 210, Parsippany, NJ. 

This is just scratching the surface of a few items from one table with basic table joins made. Scanning other bundles represented here I can see some of my favorite places from Apple Maps, locations recently mentioned in iMessage conversations, and Calendar records of my standing 12:30PM Zoom meeting for Life has No Ctrl+Alt+Del. 

If you don’t have time to research but would like to hear more about it, tune in to my talk at the SANS DFIR Summit on July 16th! If you do, try out this query I used for this blog post to pry around at this one table from this database and let me know how it works on your data via Twitter @bizzybarney

select
 loc_records.id,
 	sources.bundle_id as "Bundle ID",
 	sources.group_id as "Group ID",
 	datetime(sources.seconds_from_1970, 'unixepoch') as "Source Time",
 	loc_records.cll_latitude_degrees || ", "|| loc_records.cll_longitude_degrees as "Coordinates",
 	loc_records.clp_name as "Name",
 	loc_records.clp_thoroughfare as "Road",
 	loc_records.clp_subThoroughfare as "Address #",
 	loc_records.clp_locality as "City",
 	loc_records.clp_subLocality as "Sub-locality",
 	loc_records.clp_administrativeArea as "Admin Area",
 	loc_records.clp_subAdministrativeArea as "Sub Admin Area",
 	loc_records.clp_postalCode as "Postal Code",
 	loc_records.clp_ISOcountryCode as "Country Code",
 	loc_records.clp_country as "Counrty",
 	hex (loc_records.clp_location) as "Location BLOB (hex)",
 	loc_records.extraction_os_build as "iOS Build Version",
 	loc_records.category as "Category",
 	loc_records.algorithm as "Algorithm",
 	loc_records.initial_score as "Initial Score",
 	CASE
 	WHEN loc_records.is_sync_eligible = 1 then "Yes"
	WHEN loc_records.is_sync_eligible = 0 then "No"
	end as "Sync Eligible"
from loc_records  
left join sources on loc_records.source_id=sources.id

Analysis of Apple Unified Logs: Quarantine Edition [Entry 9] – We all know you're binging Netflix! Now Playing on your Apple Devices!

We’ve been trapped inside our homes for months. We’ve reached the end of Netflix, listened to everything on Apple Music, watched old vacation videos trying to remember what travel was like, and mindlessly browsed YouTube videos. All these actions have been recorded in your unified logs!

I came across this one while researching another topic and thought it might be useful and a quick write-up. Turns out it wasn’t as quick as I had hoped – so many different scenarios to test! I’d still like to dig into this one a bit more as it can be quirky as noted in the last section. While sometimes odd, I think this information can provide insight into what media a user is viewing on their systems - not just macOS, but iOS as well. 

The query below will show information related to media that is “Now Playing”, not just on the computer that the logs came from, but also potentially other AirPlay connected devices. There will be many entries but we will only focus some in this article.

log show --predicate 'category = "MediaRemote"'

Movie time in the Photos.app

Gather the family and start reminiscing over vacation videos. You might bring up Photos.app and search for various videos that you were going to show all your friends and family (and never actually did, but you just HAD to get that video.) 

In the screenshot below, the ‘MediaRemote’ is what we are filtering on in the query above (highlighted in yellow). 

Various pieces of data get recorded in the unified logs. In many entries you should see the bundle id or application name being used – in this example Photos or com.apple.photos. A good keyword to look for is “client-“ before the bundle id for the app.

You can also see two sections that contain media metadata that includes:

  • Media Duration (in seconds)

  • Elapsed Time – We can use this to determine how long it was viewed/listened to.

  • Item Identifier (GUID) – This GUID changes over the period of viewing so not a great filter for a specific file. It will change if media is stopped, paused, etc.

  • File Path & Name

This video is 10 seconds long, using the elapsedTime entry we can see that it was played all the way though.

The file path is in the user’s Photos Library package directory which is not surprising. This filename however, is not the original filename of this video. We can use ‘xattr -p’ to print the extended attribute that contains this data (com.apple.assetsd.originalFilename). Using the original file name IMG_3586.MOV we might be able to do a search across the system to find other artifacts of its use and more metadata in the Photos.sqlite database including a running count of how many times it was played.

MOV Files in QuickTime

Maybe you don’t use Photos.app, instead you downloaded the movie to your macOS system and viewed it using QuickTime player (com.apple.QuickTimePlayerX). Many of the same metadata pieces are still there but the filename might look strange to you. This is a good example of a “File Reference URL”. This is the same video I used as the example above. I exported it out of Photos using the original filename, IMG_3586.MOV and saved in the Downloads directory.

In the screenshot above the elapsed time is 00:00:00, and duration is 00:00:10 (10 seconds). Below, the elapsed time is 00:00:05 (5 seconds, actually closer to 6 if we round up.). I started the video, watched ~5 seconds and played out the rest of the video (these additional entries are not shown.) The playback rate entry appears to be a 1 while playing and a 0 while stopped/paused. A GIF of this action was recorded below. Featuring a playful elephant at the Taronga Zoo in Sydney.

Getting back to that “File Reference URL”, you probably want to find the original file as part of an investigation. We can use the inode section of the URL to find it. In this example I renamed IMG_3586.MOV to elephants.mov. Most forensic tools should be able to query for a file system inode number. Inode numbers do not change when a file gets renamed.

Finding a sense of calm with Music.app

I queued up a new album by one of my favorite classical musicians, Ludovico Einaudi. An Italian, who created this album while in quarantine at his home in Italy. The album includes 12 songs, and one video (not shown in the screenshot.)

We can see the queue count for Music (com.apple.Music). While I listen to this album, it will tick down, one-by-one.

Finding the End of the Internet

To get a few laughs, I used Safari browser to watch clips of Archer on YouTube. Safari is not listed as the application, but com.apple.WebKit.WebContent is – this is a signature of Safari WebKit being used. Surrounding entries (not shown) will show hints that Safari.app is being used as this time.

In Chrome, I went to watch videos on the BBC and found this excellent video of a dog who helps rescue Koalas in Australia (running theme? 🤔). We can see the Google Chrome (com.google.Chrome) app is being used, but it may not necessarily state the website. You may have to determine this using other user activity artifacts (browsing history!).

What about AirPlay?

The media examples above were all viewed on a local macOS system, what would it look like if I used my extensive Apple ecosystem of devices? Might as well use these devices for “research”, right?

While doing this research, I was watching one of my favorite older television shows, Alias using Amazon Prime Video (com.amazon.aiv.AIVApp) on one of my Apple TVs. Every episode that it would run though would show entries on my Mac. My Mac had nothing to do with my binge watching, but the entries were recorded anyway…cool! I should note that if the system is not being used (laptop shut), these entries do not appear in my logs. The Apple TV is named ‘Living Room’.

Of course, I also have a HomePod and stream music and news to it all day long. In this example I’m streaming from my macOS laptop to my HomePod named StationX. I have found that the elapsed time doesn’t seem to update as the examples above did.

Quirky? Yes.

While doing this research and testing various applications, I came across entries while I was most definitely sleeping. Seems my laptop wanted me to listen to the radio very, very early in the morning for about 15 seconds. 🤷🏻‍♀️

Do entries like this make all the examples in the article inaccurate? Absolutely not. Computers act in weird ways. This is a great example of why correlating this information to user activity through other means (ie: sleep status, device unlocks, user activity in knowledge, etc.) is required in investigations.

I hope this article gets you thinking about all the other apps that you watch media from. Again, this is not just on macOS but also on iOS (and likely tvOS as well). All those applications, native and 3rd party, are just waiting to catch you binge watching your favorite show, finding the end of YouTube, and rocking out to your favorite 90’s boyband jams!

Analysis of Apple Unified Logs: Quarantine Edition [Entry 8] – Man! What a process!?

A quick trick to get more info when you are testing different Unified log examples is to use Terminal’s man page lookup feature. This is useful to provide more context to processes that you may not be familiar with. Perhaps you have something interesting in the eventMessage field and want to know more about what processes use that information.

In this example, I “right-clicked” some of the process names to get more information about them; sharingd, bluetoothd, and powerd. However, identifityservicesd does not have a man page so I’ll have to research that one another way!


Analysis of Apple Unified Logs: Quarantine Edition [Entry 7] – Exploring USBMSC devices with --style

There are many output styles options for the ‘log’ command. Sometimes the default output may not get you what you want. This article will walk through the various log output styles looking for USB Mass Storage Class devices using the keyword ‘USBMSC ‘. These devices may include thumb drives and external hard drives as long as they are considered Mass Storage Class devices.

These entries get created when they are inserted into macOS systems. The output of these entries includes:

  •  The “non-unique” identifiers – usually the serial number of the device, but as it suggests it may not necessarily be unique.

  • Vendor ID

  • Product ID

  • Version

Using --style we can change the output to something that is perhaps more appropriate. This can also be a personal preference. The output styles are listed in the ‘man’ page.

In the query below, I am searching my logs for these types of devices with the ‘default’ output option. I’m looking for the string ‘USBMSC’ in the event messages.

log show --predicate "eventMessage contains 'USBMSC'"

The default output contains quite a few columns:

  • Timestamp (with microseconds and time zone)

  • Thread ID

  • Log Type

  • Activity ID

  • Process ID

  • TTL

  • Process

  • Subsystem

  • Category

  • Message

Other than the ‘default’ style, we will need to specific the style we want with --style. First up in ‘compact’.

log show --predicate "eventMessage contains 'USBMSC'" --style compact

The compact option removes the Activity ID and TTL from the ‘default’ output while compressing other fields.

  • Timestamp (with milliseconds and no time zone)

  • Log Type (abbreviated)

  • Process

  • Process ID

  • Thread ID

  • Subsystem

  • Category

  • Message

The next couple of log output styles is ‘json’ and ‘ndjson’. The first being json with whitespace, while the second is a single line for each entry. I’ve highlighted each entry to better show the structure in these two examples. I like how the json output shows the full paths for senderImagePath and processImagePath. It also shows a quite a few additional fields. Not exactly human-readable for more than a few entries, but this could be imported into another viewer or pumped through scripting utilities.

log show --predicate "eventMessage contains 'USBMSC'" --style json
log show --predicate "eventMessage contains 'USBMSC'" --style ndjson

One thing I miss with this JSON output is the colorization. While there is a --color argument in ‘log’, it doesn’t seem to apply to JSON style output. However, I can use the tried and trusted ‘jq’ command line JSON parser here.

The last output example is ‘syslog’. If you’ve been looking at Apple System Logs (ASL) or other syslog style logs, you might prefer this output format.

log show --predicate "eventMessage contains 'USBMSC'" --style syslog

This style is useful if you need an even more compact view than using the ‘compact’ style. Again, the colorization is removed from these entries (--color doesn’t appear to work here either).

Analysis of Apple Unified Logs: Quarantine Edition [Entry 6] – Working From Home? Remote Logins

I’m sure many of us are working remote right now possibly using some of these remote capabilities. Remote Logins can include a few different services; SSH and Screen Sharing are two that I’ll show here. These services are disabled by default and would need to be turned on in the user’s Sharing preferences.

When Remote Login is turned on in the Sharing preferences, the system will have an SSH server enabled. Let’s take a look at what an incoming SSH connection might look like first for a user account on the system that does not have this option turned on (janedoe). We are looking for the entries for the process ‘sshd’.

log show --predicate 'process = "sshd"'

One entry to key in on is the “user account has expired”. A user attempted to use SSH to login to this system using the ‘janedoe’ account coming from IP 192.168.1.170, however the connection failed.

Now on a system that does have remote login turned on. This first example shows an incorrect password attempt.

And a correct password attempt and login.

Connections can of course be incoming or outgoing. If the user were trying to access another system it might look like this. Not a whole lot unfortunately.

log show --info --predicate 'process = "ssh" or eventMessage contains "ssh"'

…and when the connection closes.

Screen Sharing is another service that needs to be explicitly enabled in the Sharing preferences. Incoming connections will show the user who logged in and where they came from. The example below shows an incorrect password that failed, and another that was correct. I’ve only queried for messages that contain the text ‘Authentication:’. Looking for all messages associated with the ‘screensharingd’ process will be quite verbose with some metadata about the session. 

log show --predicate 'process = "screensharingd" and eventMessage contains "Authentication:"'

Outgoing connections, like incoming connections, can be verbose. The process is ‘Screen Sharing’ like the application name.

log show --info --predicate 'process = "Screen Sharing"'

I might do a specific filter for ‘connect’ and ‘disconnect’ in the messages to see multiple sessions over time.