protoc

Just Call Me Buffy the Proto Slayer – An Initial Look into Protobuf Data in Mac and iOS Forensics

I was first introduced to the protobuf data format years ago accidentally when I was doing some MITM network analysis from an Android device. The data I was looking at was being transferred in this odd format, I could tell there were some known strings and some patterns to it - but I did not recognize the format. It had no magic number or file header. I started looking at it more in depth, it did have a structure. I spent an embarrassingly amount of time trying to reverse engineer this structure before I realized it was in fact a protobuf data blob. I think it was a moment of frustration in reversing this structure that lead me to searching something like “weird data format Android network traffic” that finally lead me to protobufs. 

 Ok, so what the heck is a protobuf? It actually stands for Protocol Buffer, but everyone calls them protobufs. It is a “language-neutral, platform-neutral extensible mechanism for serializing structured data” created by Google. It is a super-efficient way of storing and transferring data.

Since I was looking at an Android device, a protobuf made perfect sense. This was a Google thing afterall. I started noticing them more and more on Android devices, not just in the network traffic but also storing data on disk as well. It took me a long time to also notice that they were being stored on Apple devices! Native applications, 3rdparty applications, they are used EVERYWHERE! A great example was found by my friend Phill Moore in the iOS Spotify application to keep track of items listened to

In this article I’ll introduce you to some of the Apple-specific protobufs that I’ve come across. Some are fairly straight forward, others are less so. The kicker with protobufs is that there is an accompanying *.proto file that contains the definition to what is contained in these buffers. Unfortunately, we likely do not have this file as it is most likely server-side or inaccessible therefore we need to reverse engineer the contents and meaning of the items stored in this blob.

To parse these protobufs, I use protoc from Google to get a raw output. If you have the .proto file you can use this as well, but I have yet to give that a go. On a Mac, I would do a ‘brew install protobuf’ to get protoc installed. To parse a given buffer I will use the following command:

protoc --decode_raw < [protobuf_blob_file]

I will parse out some protobufs from different applications to give you an idea of what is stored in them - Maps, Locations, Health, and Notes.

Maps Data

The Maps application on both macOS and iOS use many protobufs to store location data. These can be found in quite a few different Maps related plist files. I will focus on GeoHistory.mapsdata plist file from iOS which stores historical locations that were mapped. This plist has GUID keys that contain a “contents” subkey. This “contents” blob contains the protobuf of the mapped location. I chose a small example to begin with as some of these can be very large

I’ve extracted this blob and put into a hex editor. You can see some visible strings in this blob but really no context. This is where parsing with protoc can be helpful.

I saved this protobuf to a file I named geohistory.pb. Using protoc I decoded it to the output below. I can see the same GUID and location name but now I get some other hex based values. This is where you will have to determine what these values mean and what they are used for, it may not be obvious initially.

At this time, I believe the highlighted pairs of hex in the screenshot above are coordinates. To read these, I copy them into a hex editor to read their values highlighted below. On the left is latitude, on the right is longitude. Plotting these two pairs above I get one location in Manassas, VA and another in Bethesda, MD. Clearly, neither of these are in Fairfax, VA. This is the tricky part, what do these values mean? More testing needs to be done here.

Location Data

The next example shows a protobuf blob being stored inside of a SQLite database. The example below is from the Local.sqlite routine locations database from iOS 11. (In iOS 12, the same data exists but has been placed into separate columns – which IMHO, is far easier to interpret.) The ZRTLEARNEDLOCATIONOFINTERESTMO table contains “learned” locations of interest or significant locations. This example is Heathrow Airport. Two columns contain protobuf data -  ZPLACEMAPITEMGEOMAPITEMHANDLE (highlighted), and ZPLACEMAPITEMGEOMAPITEM. To decode these with protoc, I will need to extract them to separate files. In DB Browser for SQLite, I can use the export feature.

The ZPLACEMAPITEMGEOMAPITEMHANDLE protobuf blob parsed with protoc contains much of the same location data as seen before. You will find that most of the location blobs will look similar. 

Taking the highlighted hex coordinates and plotting them using Google Maps, they make more sense than the ones highlighted above. These coordinates are right at Heathrow Airport in London.

Health Data 

Protobufs are not just used for location data, but also for other data storage such as split times when I have a running or walking workout. The healthdb_secure.sqlite database on iOS contains this information. Shown below is the “workout_events” table. I have selected the BLOB data for one “event” or one split time (a workout may consist of multiple split times for multiple associated “events”).

I exported and decoded with protoc using the same methods described above. The example below is from a walk I took in Monaco. Presumably the labels “s” is for seconds and “m” is for meters, however I’m still verifying this assumption.

Notes Data

One last example brings us to the Notes application. Anyone who has looked into this database likely knows that it is a complicated one. I created a sample note to show how it may look in the protobuf produced. This note has some text, a link, and a cat meme that I’ve been enjoying recently.

This example comes from MacOS in the NotesStore.sqlite database (iOS is similar). The “ZICNOTEDATA” table contains the contents of each note. Some eagle-eyed readers may notice the highlighted bytes in the binary output. It stores note contents in gzip archives embedded in the SQLite database.

I exported this gzip archive and used gzcat and xxd to preview the contents of it. I can see the text and link in the note along with some media information. What we have here is another protobuf!

The protoc output is below, many of the same strings are visible but there are some odd ones in there.

One in strange one in particular is the line "\033\'w\213\256?@Q\251IHEn\221\242\260". This is escaped octal. You will find this being used in a variety of different protobuf data. Some are smaller, some are much larger. I recall the one that Heather Mahalik and I looked at for Android Auto. That one was just full of octal nastiness, it was awful.

This one is small and converts well using echo. What does it represent? I have no idea…yet.

The media itself is stored in the file system (you can use information in the database to find the proper path). We can use the GUID in the protoc output to find the preview thumbnail.

Summing Up 

I will be very honest, I have been looking at these weird data blobs for years without knowing what they were. I am at the point now where I’m a little obsessed with protobufs. I know…its weird – but now every time I see rando-blob-data with a bunch of 0x0A hex bytes – I think protobuf, and nine out of ten times, I am correct! 

If you happen to know where I can find more information on the .proto files, please let me know!