Just Call Me Buffy the Proto Slayer – An Initial Look into Protobuf Data in Mac and iOS Forensics

I was first introduced to the protobuf data format by accident years ago while doing some MITM network analysis on traffic from an Android device. The data I was looking at was being transferred in an odd format. I could tell there were some known strings and some patterns to it, but I did not recognize the format. It had no magic number or file header. When I started looking at it in more depth, I could see that it did have a structure. I spent an embarrassing amount of time trying to reverse engineer this structure before I realized it was in fact a protobuf data blob. I think it was a moment of frustration in reversing this structure that led me to search for something like “weird data format Android network traffic”, which finally led me to protobufs.

 Ok, so what the heck is a protobuf? It actually stands for Protocol Buffer, but everyone calls them protobufs. It is a “language-neutral, platform-neutral extensible mechanism for serializing structured data” created by Google. It is a super-efficient way of storing and transferring data.

Since I was looking at an Android device, a protobuf made perfect sense. This was a Google thing after all. I started noticing them more and more on Android devices, not just in the network traffic but also storing data on disk. It took me a long time to notice that they were also being stored on Apple devices! Native applications, third-party applications, they are used EVERYWHERE! A great example was found by my friend Phill Moore in the iOS Spotify application, which uses them to keep track of items listened to.

In this article I’ll introduce you to some of the Apple-specific protobufs that I’ve come across. Some are fairly straightforward, others are less so. The kicker with protobufs is that there is an accompanying *.proto file that contains the definition of what is contained in these buffers. Unfortunately, we likely do not have this file as it is most likely server-side or inaccessible, so we need to reverse engineer the contents and meaning of the items stored in this blob.

To parse these protobufs, I use protoc from Google to get a raw output. If you have the .proto file you can use this as well, but I have yet to give that a go. On a Mac, I would do a ‘brew install protobuf’ to get protoc installed. To parse a given buffer I will use the following command:

protoc --decode_raw < [protobuf_blob_file]
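To make the raw output less mysterious, here is a minimal sketch in Python of the kind of walk that a raw decode performs over the protobuf wire format. It is not how protoc is implemented, and the function names are made up for illustration. It only handles top-level fields with wire types 0, 1, 2, and 5, and it does not recurse into nested messages or handle the deprecated group types.

```python
import struct


def read_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_raw(buf):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 64-bit fixed, little-endian
            value = struct.unpack_from("<Q", buf, pos)[0]
            pos += 8
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        elif wire_type == 5:  # 32-bit fixed, little-endian
            value = struct.unpack_from("<I", buf, pos)[0]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type} at offset {pos}")
        yield field_number, wire_type, value
```

Something like `list(decode_raw(open("blob.pb", "rb").read()))` would list the top-level fields of a saved blob (the file name here is hypothetical). Note that a 0x0A key byte decodes to field 1 with wire type 2, which is why that byte shows up so often at the start of strings and nested messages.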

I will parse out some protobufs from different applications to give you an idea of what is stored in them - Maps, Locations, Health, and Notes.

Maps Data

The Maps application on both macOS and iOS uses many protobufs to store location data. These can be found in quite a few different Maps-related plist files. I will focus on the GeoHistory.mapsdata plist file from iOS, which stores historical locations that were mapped. This plist has GUID keys that contain a “contents” subkey. This “contents” blob contains the protobuf of the mapped location. I chose a small example to begin with as some of these can be very large.

I’ve extracted this blob and put it into a hex editor. You can see some visible strings in this blob but really no context. This is where parsing with protoc can be helpful.

I saved this protobuf to a file I named geohistory.pb. Using protoc I decoded it to the output below. I can see the same GUID and location name but now I get some other hex-based values. This is where you will have to determine what these values mean and what they are used for, which may not be obvious initially.

At this time, I believe the highlighted pairs of hex in the screenshot above are coordinates. To read these, I copy them into a hex editor to read their values, highlighted below. On the left is latitude, on the right is longitude. Plotting these two pairs, I get one location in Manassas, VA and another in Bethesda, MD. Clearly, neither of these is in Fairfax, VA. This is the tricky part: what do these values mean? More testing needs to be done here.
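The hex editor step can also be done in a couple of lines of Python. This is a minimal sketch that assumes the hex values are 64-bit fixed fields holding little-endian IEEE 754 doubles. That is an assumption about this data, not something confirmed by a .proto file, and the function names are hypothetical.

```python
import struct


def fixed64_to_double(value):
    """Reinterpret a 64-bit integer (e.g. a 0x... value from raw output) as a double."""
    return struct.unpack("<d", struct.pack("<Q", value))[0]


def double_to_fixed64(number):
    """Inverse helper: the 64-bit integer that stores the given double."""
    return struct.unpack("<Q", struct.pack("<d", number))[0]


# Sanity check with a well-known bit pattern: 0x3FF0000000000000 is 1.0.
print(fixed64_to_double(0x3FF0000000000000))
```

If a decoded pair lands somewhere implausible (for example a latitude outside -90 to 90), that is a hint the field is not a plain coordinate double and needs more testing.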

Location Data

The next example shows a protobuf blob being stored inside of a SQLite database. The example below is from the Local.sqlite routine locations database from iOS 11. (In iOS 12, the same data exists but has been placed into separate columns – which IMHO, is far easier to interpret.) The ZRTLEARNEDLOCATIONOFINTERESTMO table contains “learned” locations of interest or significant locations. This example is Heathrow Airport. Two columns contain protobuf data -  ZPLACEMAPITEMGEOMAPITEMHANDLE (highlighted), and ZPLACEMAPITEMGEOMAPITEM. To decode these with protoc, I will need to extract them to separate files. In DB Browser for SQLite, I can use the export feature.
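If there are many rows, exporting by hand gets tedious. Below is a minimal sketch in Python that writes every non-NULL BLOB in a column to its own file so each can be fed to protoc. The function name, output directory, and file naming are my own choices for illustration. The table and column names in the usage note come from the example above. As always, work on a copy of the database.

```python
import sqlite3
from pathlib import Path


def export_blobs(db_path, table, column, out_dir):
    """Write each non-NULL BLOB in table.column to out_dir/<column>_<rowid>.pb."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # Open read-only so the evidence copy is not modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        query = f'SELECT rowid, "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL'
        for rowid, blob in con.execute(query):
            if not isinstance(blob, bytes):
                continue  # skip values stored as text or numbers
            target = out / f"{column}_{rowid}.pb"
            target.write_bytes(blob)
            written.append(target)
    finally:
        con.close()
    return written
```

A call such as `export_blobs("Local.sqlite", "ZRTLEARNEDLOCATIONOFINTERESTMO", "ZPLACEMAPITEMGEOMAPITEMHANDLE", "blobs")` would produce one .pb file per row, each ready for `protoc --decode_raw`.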

The ZPLACEMAPITEMGEOMAPITEMHANDLE protobuf blob parsed with protoc contains much of the same location data as seen before. You will find that most of the location blobs will look similar. 

Taking the highlighted hex coordinates and plotting them using Google Maps, they make more sense than the ones highlighted above. These coordinates are right at Heathrow Airport in London.

Health Data 

Protobufs are not just used for location data, but also for other data storage such as split times when I have a running or walking workout. The healthdb_secure.sqlite database on iOS contains this information. Shown below is the “workout_events” table. I have selected the BLOB data for one “event” or one split time (a workout may consist of multiple split times for multiple associated “events”).

I exported and decoded with protoc using the same methods described above. The example below is from a walk I took in Monaco. Presumably the label “s” is for seconds and “m” is for meters, however I’m still verifying this assumption.

Notes Data

One last example brings us to the Notes application. Anyone who has looked into this database likely knows that it is a complicated one. I created a sample note to show how it may look in the protobuf produced. This note has some text, a link, and a cat meme that I’ve been enjoying recently.

This example comes from macOS in the NotesStore.sqlite database (iOS is similar). The “ZICNOTEDATA” table contains the contents of each note. Some eagle-eyed readers may notice the highlighted bytes in the binary output. The Notes application stores note contents in gzip archives embedded in the SQLite database.

I exported this gzip archive and used gzcat and xxd to preview the contents of it. I can see the text and link in the note along with some media information. What we have here is another protobuf!
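The same step can be scripted. Here is a minimal sketch in Python that checks a blob for the gzip signature (the bytes 0x1F 0x8B) and decompresses it in memory. The function name is hypothetical, and I am assuming the whole cell value is a single gzip stream.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def gunzip_blob(blob):
    """Return the decompressed bytes if blob looks like gzip, otherwise None."""
    if blob[:2] != GZIP_MAGIC:
        return None
    return gzip.decompress(blob)
```

The bytes returned can be written to a file and passed to `protoc --decode_raw` like any other protobuf blob.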

The protoc output is below. Many of the same strings are visible, but there are some odd ones in there.

One strange one in particular is the line "\033\'w\213\[email protected]\251IHEn\221\242\260". This is escaped octal. You will find this being used in a variety of different protobuf data. Some are smaller, some are much larger. I recall the one that Heather Mahalik and I looked at for Android Auto. That one was just full of octal nastiness, it was awful.

This one is small and converts well using echo. What does it represent? I have no idea…yet.
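For larger strings, a scripted conversion is easier than echo. The sketch below is one way to turn the C-style escapes in protoc output (octal such as \033, plus \', \", \\, \n and friends) back into raw bytes with Python. The function name is hypothetical, and the approach assumes the string contains only ASCII characters, which is my assumption about how the output is escaped.

```python
def unescape_protoc_string(text):
    """Convert an escaped string from protoc output back into raw bytes."""
    # unicode_escape understands octal and C-style escapes; latin-1 maps
    # each resulting code point 0-255 straight back to a single byte.
    return text.encode("ascii").decode("unicode_escape").encode("latin-1")


# The first few characters of the string above, as a raw Python string.
print(unescape_protoc_string(r"\033\'w\213").hex())  # prints 1b27778b
```

Pass in the text between the double quotes exactly as protoc printed it. Seeing the result as hex makes it easier to test guesses such as whether the bytes form a UUID, a hash, or a packed number.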

The media itself is stored in the file system (you can use information in the database to find the proper path). We can use the GUID in the protoc output to find the preview thumbnail.

Summing Up 

I will be very honest, I have been looking at these weird data blobs for years without knowing what they were. I am at the point now where I’m a little obsessed with protobufs. I know…it’s weird – but now every time I see rando-blob-data with a bunch of 0x0A hex bytes – I think protobuf, and nine out of ten times, I am correct!

If you happen to know where I can find more information on the .proto files, please let me know!

iOS Location Mapping with APOLLO – Part 2: Cellular and Wi-Fi Data (locationd)

My previous article showed a new capability of APOLLO with KMZ location file support. It worked great…for routined data, but there was something missing. What about the cellular and Wi-Fi locations that are stored in databases? Well, turns out I need to test better. I fixed the locationd modules to have the activity as “Location” versus “LOCATION”. Case sensitivity is apparently a thing in Python…my bad. 🤷🏻‍♀️😉

I should also mention that with the fixes, my total location data points for an iOS 12.1.1 device jumped to ~57,000! I should note this is not inclusive of workout locations. Those are a bit different as they are stored as separate records, one for latitude and one for longitude. In the future I might attempt to pair these up for KMZ support.

The previous article showed the routined (user/device patterns) data, and I have found these locations to be quite accurate. To be fair, I have not looked at all (40k+) of them specifically, only spot checked. When I visualized the locationd data, I saw some interesting outliers.

The first module I found particularly interesting is the output from the locationd_cacheencryptedAB_celllocation module. Most of the locationd output is kept for about a week. This particular week I traveled from DC to Portland via Chicago for DFRWS and then back to DC (again via Chicago). You will notice that there are some data point clusters around DC, Chicago and Portland as expected – but there are a few scattered data points in the Midwest. I do not remember for sure if my device was in Airplane Mode, but this may be an artifact of potentially being able to access cell towers during my flight. (The locationd_cacheencryptedAB_ltecelllocation module output showed a similar pattern.)

The next module, locationd_cacheencryptedAB_ltecelllocationlocal, shows my path home fairly accurately. These locations tend to be kept for less time. I was going north on 95, past the DC Beltway (near Springfield, VA) and onto Glebe Road towards Arlington, VA. It may be hard to see in the screenshot but the airport to the right near the Potomac is DCA. Just north of that is the Pentagon and Arlington Cemetery.

Finally, we have the output from the locationd_cacheencryptedAB_wifilocation module. This one has a few more outliers that I have a hard time explaining. I was in DC and Chicago in this screenshot but not in Buffalo (NY), Pennsylvania, or NYC.

These are good examples of not completely believing and relying on what you see in the data. I can only imagine forensics cases relying heavily on this data. While I have found routined data points to be far more accurate, locationd on the other hand does have its oddities. Perhaps it is something with how the data is populated or the coordinates being reported by cell towers and/or access points. This requires additional investigation. TL;DR – Don’t assume what you are looking at is the end all be all of the data – always investigate further and correlate with other information.

In other news, if anyone knows more about how this data is populated – I’d love to know. Drop me a note!

iOS Location Mapping with APOLLO - I Know Where You Were Today, Yesterday, Last Month, and Years Ago!

I added preliminary KMZ (zipped KML) support to APOLLO. If any APOLLO module’s SQL query has “Location” in its Activity field, it will extract the location coordinates in the column “Coordinates” as long as they are in Latitude, Longitude format (e.g., 38, -77). These are more or less an upgrade/replacement for my previous iOS location scripts. (FYI: Those will not likely be updated further.)
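To give an idea of what producing a KMZ involves, here is a minimal sketch in Python. It is not APOLLO’s actual code. The function names, the placemark layout, and the doc.kml file name inside the archive are my own choices for illustration. It takes “Latitude, Longitude” strings like the ones described above, and swaps the order on output because KML expects longitude first.

```python
import zipfile
from xml.sax.saxutils import escape


def parse_coordinates(text):
    """Parse 'Latitude, Longitude' text such as '38, -77' into two floats."""
    lat_text, lon_text = text.split(",")
    return float(lat_text), float(lon_text)


def build_kml(rows):
    """Build a KML document from an iterable of (name, 'lat, lon') tuples."""
    placemarks = []
    for name, coords in rows:
        lat, lon = parse_coordinates(coords)
        # KML coordinate tuples are written longitude,latitude.
        placemarks.append(
            f"<Placemark><name>{escape(name)}</name>"
            f"<Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )


def write_kmz(path, rows):
    """Write the KML into a zip archive, which is all a KMZ file is."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as kmz:
        kmz.writestr("doc.kml", build_kml(rows))
```

Calling `write_kmz("example.kmz", [("Sample point", "38, -77")])` would produce a file that Google Earth can open. Writing one KMZ per module keeps each file to a manageable number of placemarks.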

You can find more details on the different modules and outputs here. The APOLLO output will also show counts of how many location data points were extracted from each module. An example from my own data contained 41,262 points! Due to the amount of data points and how applications like Google Earth might handle them I’ve decided to split them by module to be loaded separately. It’s not ideal, but my goal is not to crash Google Earth. (FWIW, I’m still working on other solutions, if you have experience in this area - drop me a line.) The most troublesome module (by a large magnitude) is routined_cache_zrtcllocationmo since it keeps track of extremely granular locations for about the last week. Mine had ~38,000 coordinates!

Let’s take a look at some examples! This data was collected on 07/18/2019 on iOS 12.1.1, to give you an idea of the timeframe.

This one is from the routined_cloud_visit_inbound_start module. The earliest coordinate is from a few months prior, from my trip to Amsterdam to teach FOR518 - Mac and iOS Forensic Analysis and Incident Response.

These are some of my “Significant Locations”, you can click on any coordinate and gather more detail. This is the same output that is captured in the APOLLO database or CSV file. 

The next example is from routined_local_learned_location_of_interest_entry. Here you can see some of my travels since 2017! These contain the “Learned Locations of Interest”. You will probably see a bit more historical location data. Looks like I need to visit the middle of the US more!

Last but not least is the massive amount of data points from the routined_cache_zrtcllocationmo module. This will show nearly exact, granular location for about the last week before collection. Here is my trip to Portland, OR for DFRWS! Not many guesses as to how I got to downtown Portland from the airport, are there! (This is the module that produces many, many datapoints. It took patience with Google Earth to even get this screenshot, this is your warning.)

I hope this adds a bit of visual context to some of the APOLLO output, let me know if you have different ideas on how to represent some of this data! Pictures can certainly tell a story where it might otherwise get lost in the noise.

New Presentation from SANS DFIR Summit 2019 - They See Us Rollin', They Hatin' - Forensics of iOS CarPlay and Android Auto

Heather Mahalik and I teamed up again this year at the SANS DFIR Summit to present on iOS CarPlay and Android Auto.

Presentation is here. Will post a link to the video when it’s available.

Always a good time and love seeing friends every year. Still one of my favorite conferences! It was a nice surprise winning a couple of Forensic 4cast awards too! Thanks for your votes! ☺️

New Presentation from MacDevOpsYVR 2019 - Launching APOLLO: Creating a Simple Tool for Advanced Forensic Analysis

I had the pleasure last week to attend MacDevOpsYVR in Vancouver, Canada. While I barely saw the city, I got to hang out with some awesome Mac Sys Admins and Dev Ops people. I’ve not been to a conference outside of Security/Forensics before so it was a delight to see the types of presentations and insight these fine folks had to offer.

The presentation includes how my APOLLO project has evolved over the last few months since it was introduced in November 2018. I also go through some of my real-life pattern-of-life examples from my iOS 12 device. We talked about everything including my health, moving bodies (and chopping them up!), taking selfies, and how much I will spend for good food. Once the video is released I will be sure to upload a link to it, it will certainly provide more (humorous) context to the slides. [Edit 06/18/2019 - Video here!]

A unique addition to normal conference presentations was the use of a graphic recorder (Ashton of Mind’s Eye Creative) to provide additional context to the presentations. She records in real time key points of each presentation and does an absolutely fantastic job at it. This allows for additional context for discussions after the presentation with fellow attendees. Example of my talk is below:

As always, my presentations are available on my Resources page.

Direct Link to the presentation is here!