<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Segmentation fault &#187; protobuf</title>
	<atom:link href="https://www.segmentationfault.fr/tag/protobuf/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.segmentationfault.fr</link>
	<description>Projects of an IT security consultant</description>
	<lastBuildDate>Fri, 15 Feb 2019 08:02:10 +0000</lastBuildDate>
	<language>fr-FR</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>Reversing Google Play and Micro-Protobuf applications</title>
		<link>https://www.segmentationfault.fr/publications/reversing-google-play-and-micro-protobuf-applications/</link>
		<comments>https://www.segmentationfault.fr/publications/reversing-google-play-and-micro-protobuf-applications/#comments</comments>
		<pubDate>Wed, 19 Sep 2012 20:11:23 +0000</pubDate>
		<dc:creator>Emilien Girault</dc:creator>
				<category><![CDATA[Publications]]></category>
		<category><![CDATA[Reverse Engineering]]></category>
		<category><![CDATA[androguard]]></category>
		<category><![CDATA[android]]></category>
		<category><![CDATA[protobuf]]></category>

		<guid isPermaLink="false">http://www.segmentationfault.fr/?p=946</guid>
		<description><![CDATA[I recently released a Google Play Unofficial Python API, which aims at providing a way for developers to query Google&#8217;s official Android application store. Such projects already exist, but they are all based on the previous version (&#171;&#160;Android Market&#160;&#187;), and are therefore limited. My goal was to adapt those projects and port them to the [...]]]></description>
			<content:encoded><![CDATA[<p>I recently released a <a href="https://github.com/egirault/googleplay-api/">Google Play Unofficial Python API</a>, which gives developers a way to query Google&rsquo;s official Android application store. Such projects already exist, but they are all based on the previous version (&ldquo;Android Market&rdquo;), and are therefore limited. My goal was to adapt those projects and port them to the latest version of Google Play.</p>
<p>This article first highlights the limitations of existing projects. Then it focuses on the official Android client for Google Play and its internals, based on a <a href="http://code.google.com/p/protobuf/">Protobuf</a> variant. Thanks to <a href="http://code.google.com/p/androguard/">Androguard</a> and its awesome static analysis features, I show how to automatically recover the <code>.proto</code> file of Google Play, enabling us to generate stubs for querying Google&rsquo;s servers. Finally, I quickly introduce the <a href="https://github.com/egirault/googleplay-api/">unofficial API</a>.<span id="more-946"></span></p>
<h3>Existing projects</h3>
<p>Google Play can be queried in two ways: using the <a href="https://play.google.com/store/apps">official website</a> or the Android client. The website contains pretty much all the useful information, such as app name and developer name, comments, latest version number and release date, permissions required by the app, statistics, etc. I guess one could build a simple program that queries this website and parses the pages, but it would still have one limitation: you simply cannot download apps. Well, you can, but for this you need an actual compatible phone, and as soon as you perform the install request, the application gets downloaded and installed on that phone. Then if you want to retrieve it in order to analyse it, you must plug in your phone and use <code>adb pull</code>. Some people managed to get Google Play to run within the emulator, but this is still complicated: you need Java and the <a href="http://developer.android.com/sdk/index.html">Android SDK</a>, you have to customize your emulator ROM to embed Google Play, and you must script everything yourself.</p>
<p>The main project I have been looking at is <a href="http://code.google.com/p/android-market-api/">android-market-api</a>, written in Java. Actually, I am a Python fan, and played much more with <a href="https://github.com/liato/android-market-api-py">its Python equivalent</a>. The goal of those projects is to simulate the network activity generated by the Android client, query Google Play servers, and parse the result. The underlying protocol used by Google Play is based on Google&rsquo;s <a href="http://code.google.com/p/protobuf/">Protocol Buffers</a>, aka <em>Protobuf</em>. For those who do not know, this library provides a way to encode messages as compact binary blobs before sending them on the network, and decode them on the other side. The <a href="https://developers.google.com/protocol-buffers/docs/overview">documentation</a> contains plenty of details on the actual <a href="https://developers.google.com/protocol-buffers/docs/encoding?hl=fr">encoding format</a>, so I won&rsquo;t cover it. The only important thing to know about Protobuf is that it is much easier to decode messages if you know the structure of exchanged messages. Messages are composed of fields, each one having a tag, a name and a type. When encoded, a message embeds the tag, value and type (only basic types, or a generic &ldquo;message&rdquo; type) of each field, but <strong>not</strong> their names. Therefore, the semantics of each field must be guessed, and that is not always easy.</p>
<p>When the Google Play Android client queries Google&rsquo;s servers and downloads APKs, all network communications are done with Protobuf over HTTP(S). The message definitions used by the unofficial API projects (and based on Android Market) have been published as a <a href="http://code.google.com/p/android-market-api/source/browse/trunk/AndroidMarketApi/proto/market.proto"><code>.proto</code> file</a>. The unofficial API can forge some of those requests and interpret the results. While playing with these projects, I managed to search for Android apps, but I could not always download them. Indeed, this version of the API requires a numeric &ldquo;assetId&rdquo; corresponding to the app you want to download. When trying to get appropriate assetIds using other API methods such as <code>search()</code>, I got non-numeric values, such as: <code>v2:com.fankewong.angrybirdsbackup2sd:1:4</code>. This type of value is rejected by Google&rsquo;s servers when trying to download the app. Too bad&#8230;</p>
<h3>A first look at Google Play Android client</h3>
<p>The weird thing is that the non-numeric assetId problem occurs quite often, but not on all apps. I guess this is because Google updated their API when they switched to Google Play; those projects are using the old version of the API. The only way to have up-to-date information and be able to download any app would then be to analyse the updated Android client, and adapt existing projects.</p>
<p>Here we go! We retrieve <code>com.android.vending-1.apk</code> from an up-to-date Android phone using <code>adb</code>, and we use our favorite Android RE tools. A first look at class names highlights a pretty explicit <code>VendingProtos</code> class, under the <code>com.google.android.vending.remoting.protos</code> package. It contains references to a package named <code>com.google.protobuf.micro</code>, embedded within the app. This package contains classes used to encode and decode messages. It is actually part of a public project, named <a href="http://code.google.com/p/micro-protobuf/">micro-protobuf</a>, which is a lightweight version of Protobuf. However, the underlying protocol remains the same.</p>
<p>Most of the network traffic is sent over HTTPS. After installing our own CA on the phone and setting up an interception proxy like Burp, we can sniff the traffic. From a black-box perspective, the exchanged data looks like a binary stream:</p>
<div id="attachment_950" class="wp-caption aligncenter" style="width: 533px"><a href="http://www.segmentationfault.fr/wp-content/uploads/2012/09/googleplay_burp1.png"><img class=" wp-image-950 " title="googleplay_burp" src="http://www.segmentationfault.fr/wp-content/uploads/2012/09/googleplay_burp1.png" alt="" width="523" height="462" /></a><p class="wp-caption-text">Capturing a Protobuf response with Burp</p></div>
<p>All we need now is the <code>.proto</code> file of Google Play to be able to decode it. But how can we get this file? It is unfortunately not embedded within the app, so we have to find another way. <a href="http://www.sysdream.com/reverse-engineering-protobuf-apps">A paper and a tool</a> have been published on the subject, but work only when the studied app or program embeds some kind of metadata, used by reflection features of Protobuf. This metadata is generally embedded in regular stubs generated with Google&rsquo;s standard protobuf compiler called <code>protoc</code>. However, this is not the case here since the Protobuf stubs embedded within Google Play Android client were not compiled with standard <code>protoc</code>. Micro-protobuf seems to remove this metadata, probably to make protocol reversing harder.</p>
<p>Anyway, is there a way to guess the structure of exchanged messages, just by having a look at the decompiled Java code of the app? Let&rsquo;s go back to the <code>VendingProtos</code> class. It contains many nested classes, among which one named <code>AppDataProto</code>:</p>
<pre>public static final class AppDataProto extends MessageMicro
{
  private int cachedSize = -1;
  private boolean hasKey;
  private boolean hasValue;
  private String key_ = "";
  private String value_ = "";

  [...]

  public AppDataProto mergeFrom(CodedInputStreamMicro 
                                paramCodedInputStreamMicro)
    throws IOException
  {
    while (true)
    {
      int i = paramCodedInputStreamMicro.readTag();
      switch (i)
      {
      default:
        if (parseUnknownField(paramCodedInputStreamMicro, i))
          continue;
      case 0:
        return this;
      case 10:
        String str1 = paramCodedInputStreamMicro.readString();
        AppDataProto localAppDataProto1 = setKey(str1);
        break;
      case 18:
      }
      String str2 = paramCodedInputStreamMicro.readString();
      AppDataProto localAppDataProto2 = setValue(str2);
    }
  }

  public AppDataProto setKey(String paramString)
  {
    this.hasKey = 1;
    this.key_ = paramString;
    return this;
  }

  public AppDataProto setValue(String paramString)
  {
    this.hasValue = 1;
    this.value_ = paramString;
    return this;
  }

  [...]
}</pre>
<p>We can guess that this class represents a Micro-Protobuf message (the <code>extends MessageMicro</code> part) and that it has two string fields: <code>key</code> and <code>value</code>. Their tags can be extracted from the <code>mergeFrom()</code> method, which decodes incoming binary messages. It is composed of a main loop (<code>while(true)</code>) and a <code>switch</code> statement. Each case, except the first and second ones (<code>default</code> and <code>0</code>), corresponds to a field. The value of each case is actually the binary representation of the tag and type of the field. Everything is in the documentation; to skip the details, the actual value of each case is equal to <code>(tag &lt;&lt; 3) | type</code>. For instance, 10 stands for tag 1, type 2 (string), since <code>(1 &lt;&lt; 3) | 2</code> equals 10. 18 means tag 2, string. Thus, the actual <code>.proto</code> file looks as follows:</p>
<pre>message AppDataProto {
  optional string key = 1;
  optional string value = 2;
}</pre>
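<p>The arithmetic can be checked with a few lines of Python. This is only a sketch: the names <code>split_key</code> and <code>WIRE_TYPES</code> are mine and do not come from the app or from any library; the wire type numbers are the ones listed in the Protobuf encoding documentation.</p>

```python
# Wire type numbers as listed in the Protobuf encoding documentation.
WIRE_TYPES = {
    0: "varint",
    1: "64-bit",
    2: "length-delimited",
    3: "start group",
    4: "end group",
    5: "32-bit",
}


def split_key(key):
    """Split a switch case value into (tag, wire type)."""
    return key >> 3, key & 0x7


# The two case values found in AppDataProto.mergeFrom()
for case_value in (10, 18):
    tag, wire_type = split_key(case_value)
    print(case_value, "-> tag", tag, ",", WIRE_TYPES[wire_type])
```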
<p>Actually type 2 is not exactly &laquo;&nbsp;string&nbsp;&raquo;, but any length-delimited field. It could be a string, a series of bytes, or an embedded message itself. In that case, the code looks like this:</p>
<pre>case 26:
  VendingProtos.AppDataProto localAppDataProto = new VendingProtos.AppDataProto();
  paramCodedInputStreamMicro.readMessage(localAppDataProto);
  DataMessageProto localDataMessageProto2 = addAppData(localAppDataProto);
  break;</pre>
<p>This field has a tag equal to 3 (26 &gt;&gt; 3) and is a message whose name is <code>AppDataProto</code>. In order to get the structure of this sub-message, we would have to repeat the analysis on the corresponding class, and so on.</p>
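<p>To see why a length-delimited field can hide a nested message, here is a minimal schema-less decoder in Python. It is a sketch, not the code used by Google Play or by my API: the function names are illustrative, the sample bytes are built by hand, and only wire types 0 (varint) and 2 (length-delimited) are handled.</p>

```python
def read_varint(data, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(data):
    """Return a list of (tag, wire_type, value) without knowing the schema.

    Only varint (0) and length-delimited (2) wire types are handled here.
    """
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        tag, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = read_varint(data, pos)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        fields.append((tag, wire_type, value))
    return fields


# Hand-built AppDataProto with key = "k" (case 10) and value = "v" (case 18)
inner = bytes([10, 1]) + b"k" + bytes([18, 1]) + b"v"
# Hand-built outer message carrying it as field 3 (case 26)
outer = bytes([26, len(inner)]) + inner

top_level = decode_fields(outer)
print(top_level)                       # one opaque length-delimited field
print(decode_fields(top_level[0][2]))  # the same bytes decoded as a message
```

<p>Without the <code>.proto</code> file, nothing in the outer message says whether field 3 is a string, raw bytes or a nested message; only the generated code (here, the call to <code>readMessage()</code>) settles the question.</p>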
<h3>Automatic analysis</h3>
<p>We now have a way of recovering a message structure by analyzing the generated code. All we need now is to automate the process. For this, we can use <a href="http://code.google.com/p/androguard/">Androguard</a>, a multi-purpose framework intended to make Android reversing easier. With Androguard, we can simply open an APK, decompile it, parse its Dalvik code, and do all sorts of things. Once it is installed, one can use the provided <code>androlyze</code> tool to interact with the framework from a shell, and then write a script to automate everything.</p>
<p>Androguard lets us easily browse the available classes and find those that extend <code>MessageMicro</code>:</p>
<pre>In [1]: apk = APK('com.android.vending-1.apk')
In [2]: dvm = DalvikVMFormat(apk.get_dex())
In [3]: vma = uVMAnalysis(dvm)
In [4]: proto_classes = filter(lambda c: "MessageMicro;" in c.get_superclassname(), dvm.get_classes())
In [5]: proto_class_names = map(lambda c: c.get_name(), proto_classes)</pre>
<p>Then we extract the <code>mergeFrom()</code> method of each class by filtering the method list generated by <code>dvm.get_methods_class(class_name)</code>. The basic block list of each method can be obtained with <code>vma.get_method(m).basic_blocks.gets()</code>.<br />
The first block is usually the one that implements the switch instruction. In Dalvik, a switch is often represented as a <code>sparse-switch</code> instruction, whose operand is a table composed of a list of values and offsets, called <code>sparse-switch-payload</code>. Here is an example:</p>
<pre>invoke-virtual v3, Lcom/google/protobuf/micro/CodedInputStreamMicro;-&gt;readTag()I
move-result v0
sparse-switch v0, +52 (0xa4)
[...]
sparse-switch-payload sparse-switch-payload 0:9 a:a 12:12 1a:1a 22:22 2a:2a 32:32 3a:3a 42:42 4a:4a</pre>
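<p>Assuming the payload is printed as hexadecimal <code>value:offset</code> pairs, which is my reading of the listing above rather than a documented format, the case values can be turned into field tags with a short Python sketch (the name <code>parse_payload</code> is illustrative):</p>

```python
payload = "0:9 a:a 12:12 1a:1a 22:22 2a:2a 32:32 3a:3a 42:42 4a:4a"


def parse_payload(text):
    """Turn 'value:offset' pairs (assumed hexadecimal) into a dict."""
    cases = {}
    for pair in text.split():
        value, offset = pair.split(":")
        cases[int(value, 16)] = int(offset, 16)
    return cases


cases = parse_payload(payload)
for value in sorted(cases):
    if value == 0:
        continue  # case 0 ends the message, it is not a field
    print("case", value, "-> tag", value >> 3, "wire type", value & 0x7)
```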
<p>Each (value, offset) tuple corresponds to a case of the switch; if the value matches the compared register, then execution continues at the corresponding offset. Once we are able to browse each case of the switch (and its target basic block), we can determine the name of each field and its type by examining the name of the corresponding accessors. For instance, here is a typical basic block:</p>
<pre>invoke-virtual v3, Lcom/google/protobuf/micro/CodedInputStreamMicro;-&gt;readString()Ljava/lang/String;
move-result-object v1
invoke-virtual v2, v1, L[...]AddressProto;-&gt;setCity(Ljava/lang/String;)L[...]AddressProto;
goto -25</pre>
<p>Each basic block contains two accessor calls: <code>readXXX()</code> and <code>setYYY()</code>. Their goal is to read an incoming series of bytes and initialize one field of the message. <code>XXX</code> corresponds to the type of the field (here, <em>string</em>), and <code>YYY</code> to its name (<em>city</em>).</p>
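<p>On the textual form of such a block, the two accessor names can be pulled out with regular expressions. The following Python sketch works on the disassembly shown above as a plain string; the real script walks Androguard objects instead, and the patterns and the <code>field_from_block</code> name are only illustrative.</p>

```python
import re

block = """\
invoke-virtual v3, Lcom/google/protobuf/micro/CodedInputStreamMicro;->readString()Ljava/lang/String;
move-result-object v1
invoke-virtual v2, v1, L[...]AddressProto;->setCity(Ljava/lang/String;)L[...]AddressProto;
goto -25"""

# readXXX() is called on the input stream, setYYY()/addYYY() on the message.
READ_RE = re.compile(r"CodedInputStreamMicro;->read(\w+)\(")
SET_RE = re.compile(r";->(?:set|add)(\w+)\(")


def field_from_block(text):
    """Return (type, name) guessed from the accessor calls, or None."""
    read = READ_RE.search(text)
    setter = SET_RE.search(text)
    if not (read and setter):
        return None
    field_type = read.group(1).lower()
    name = setter.group(1)
    return field_type, name[0].lower() + name[1:]


print(field_from_block(block))
```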
<p>The simplified analysis algorithm looks like:</p>
<pre>for each class that extends MessageMicro:
  get its mergeFrom() method
    find the sparse-switch instruction
    get the corresponding sparse-switch-payload
    index all values and offsets in a dict
    for each value, offset:
      tag = value &gt;&gt; 3
      get the target basic block using the offset
      find readXXX() and setYYY() calls
      type = XXX
      name = YYY
      index the tuple (tag, type, name)</pre>
<p>Then we only need to format the output in order to generate a parsable <code>.proto</code> file, dealing with nested messages and groups among other things.</p>
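<p>The formatting step can be sketched as follows, leaving nested messages and groups aside. The tuple layout <code>(tag, type, name, repeated)</code> and the function name are assumptions made for this example, not the data structures used by the actual script.</p>

```python
def format_message(name, fields):
    """Render one message; fields is a list of (tag, type, name, repeated)."""
    lines = ["message %s {" % name]
    # Sorting the tuples orders the fields by tag.
    for tag, field_type, field_name, repeated in sorted(fields):
        label = "repeated" if repeated else "optional"
        lines.append("  %s %s %s = %d;" % (label, field_type, field_name, tag))
    lines.append("}")
    return "\n".join(lines)


fields = [
    (2, "string", "value", False),
    (1, "string", "key", False),
]
print(format_message("AppDataProto", fields))
```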
<p>I called the resulting script <a href="https://github.com/egirault/googleplay-api/blob/master/androguard/androproto.py"><code>androproto.py</code></a>. It is released with the API code; feel free to play with it. It is able to analyze the target app and print the recovered Protobuf file. I apologize for the dirty code; since Google Play is the only app using Micro-Protobuf that I&rsquo;ve analyzed, this script is pretty specific. But it should work with any app using this library, with a few changes. Its output on the Google Play app looks like this:</p>
<pre>message AckNotificationResponse {
}
message AndroidAppDeliveryData {
  optional int64 downloadSize = 1;
  optional string signature = 2;
  optional string downloadUrl = 3;
  repeated AppFileMetadata additionalFile = 4;
  repeated HttpCookie downloadAuthCookie = 5;
  optional bool forwardLocked = 6;
  optional int64 refundTimeout = 7;
  optional bool serverInitiated = 8;
  optional int64 postInstallRefundWindowMillis = 9;
  optional bool immediateStartNeeded = 10;
  optional AndroidAppPatchData patchData = 11;
  optional EncryptionParams encryptionParams = 12;
}
message AndroidAppPatchData {
  optional int32 baseVersionCode = 1;
  optional string baseSignature = 2;
  optional string downloadUrl = 3;
  optional int32 patchFormat = 4;
  optional int64 maxPatchSize = 5;
}
[...]</pre>
<p>The resulting output is <em>almost</em> usable with <code>protoc</code>. Almost, because there is a duplicate message that you need to manually remove in order to make <code>protoc</code> happy. But after taking care of that detail, you have a working <a href="https://github.com/egirault/googleplay-api/blob/master/googleplay.proto"><code>googleplay.proto</code></a> that you can use to generate C++, Java and Python stubs for querying Google Play API!</p>
<h3>Building Google Play Unofficial Python API</h3>
<p>In order to parse Google Play protobuf messages, we dump each server response intercepted with Burp into a file, and use:</p>
<pre>protoc --decode=ResponseWrapper googleplay.proto &lt; dump.bin</pre>
<p><code>ResponseWrapper</code> is the root message type; it can be easily guessed by looking at the message names. Once we have a clue of what&rsquo;s received by the application, we can start building our own API. Since we need a valid auth token from Google&rsquo;s servers, we first need to authenticate. I simply reused the code from <a href="https://github.com/liato/android-market-api-py">android-market-api-py</a>. Once logged in, we need to deal with protobuf traffic. For most API requests, the Android client does not send protobuf messages, but only simple GET or POST requests, such as <code>search?c=3&amp;q=%s</code>. In order to parse Protobuf responses, we use the generated Python module (<code>googleplay_pb2</code>):</p>
<pre>message = googleplay_pb2.ResponseWrapper.FromString(data)</pre>
<p>The resulting message can be browsed like a regular Python object. For some API methods, Google servers also return some <em>prefetch</em> data. A prefetch element contains a URL and raw data. It acts like a cache and can be dealt with pretty easily with a few lines of code.</p>
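<p>The idea behind the prefetch handling can be illustrated with a tiny cache keyed by URL. This is a hypothetical sketch: the class, its method names and the stub download function are invented for the example and are not the code of the API.</p>

```python
class PrefetchCache(object):
    """Tiny URL -> raw response cache (hypothetical helper)."""

    def __init__(self):
        self._entries = {}

    def store(self, url, raw):
        """Remember raw data that the server sent ahead of time for url."""
        self._entries[url] = raw

    def fetch(self, url, download):
        """Return cached data for url, calling download(url) on a miss."""
        if url not in self._entries:
            self._entries[url] = download(url)
        return self._entries[url]


calls = []


def fake_download(url):
    """Stand-in for a real HTTP request; records which URLs were asked for."""
    calls.append(url)
    return b"downloaded bytes"


cache = PrefetchCache()
cache.store("search?c=3&q=earth", b"prefetched bytes")

print(cache.fetch("search?c=3&q=earth", fake_download))  # served from the cache
print(cache.fetch("search?c=3&q=moon", fake_download))   # falls back to the stub
```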
<p>The final API is pretty straightforward to use. Just follow the <a href="https://github.com/egirault/googleplay-api/blob/master/README.md">README</a>. First make sure to edit <code>googleplay.py</code> and insert your phone&rsquo;s <code>androidID</code>, then supply your Google credentials in <code>config.py</code>. You can use the provided scripts, producing CSV output, and prettify them with <code>pp</code>. Sorry for the following truncated output due to this blog&#8230;</p>
<pre>$ alias pp="column -s ';' -t"  # pretty-print CSV

$ python search.py earth | pp
Title                           Package name                            Creator                  Super Dev  Price    Offer Type  Version Code  Size     Rating  Num Downloads
Google Earth                    com.google.earth                        Google Inc.              1          Gratuit  1           53            8.6MB    4.46    10 000 000+
Terre HD Free Edition           ru.gonorovsky.kv.livewall.earthhd       Stanislav Gonorovsky     0          Gratuit  1           33            4.7MB    4.47    1 000 000+
Earth Live Wallpaper            com.seb.SLWP                            unixseb                  0          Gratuit  1           60            687.4KB  4.06    5 000 000+
Super Earth Wallpaper Free      com.mx.spacelwpfree                     Mariux                   0          Gratuit  1           2             1.8MB    4.41    100 000+
Earth And Legend                com.dvidearts.earthandlegend            DVide Arts Incorporated  0          5,99 €   1           6             6.8MB    4.82    50 000+
Earth 3D                        com.jmsys.earth3d                       Dokon Jang               0          Gratuit  1           12            3.4MB    4.05    500 000+
[...]

$ python categories.py | pp
ID                   Name
GAME                 Jeux
NEWS_AND_MAGAZINES   Actualités et magazines
COMICS               BD
LIBRARIES_AND_DEMO   Bibliothèques et démos
COMMUNICATION        Communication
ENTERTAINMENT        Divertissement
EDUCATION            Enseignement
FINANCE              Finance

$ python list.py 
Usage: list.py category [subcategory] [nb_results] [offset]
List subcategories and apps within them.
category: To obtain a list of supported catagories, use categories.py
subcategory: You can get a list of all subcategories available, by supplying a valid category

$ python list.py WEATHER | pp
Subcategory ID            Name
apps_topselling_paid      Top payant
apps_topselling_free      Top gratuit
apps_topgrossing          Les plus rentables
apps_topselling_new_paid  Top des nouveautés payantes
apps_topselling_new_free  Top des nouveautés gratuites

$ python list.py WEATHER apps_topselling_free | pp
Title                  Package name                                  Creator          Super Dev  Price    Offer Type  Version Code  Size    Rating  Num Downloads
La chaine météo        com.lachainemeteo.androidapp                  METEO CONSULT    0          Gratuit  1           8             4.6MB   4.38    1 000 000+
Météo-France           fr.meteo                                      Météo-France     0          Gratuit  1           11            2.4MB   3.63    1 000 000+
GO Weather EX          com.gau.go.launcherex.gowidget.weatherwidget  GO Launcher EX   0          Gratuit  1           25            6.5MB   4.40    10 000 000+
Thermomètre (Gratuit)  com.xiaad.android.thermometertrial            Mobiquité        0          Gratuit  1           60            3.6MB   3.78    1 000 000+

$ python permissions.py com.google.android.gm
android.permission.ACCESS_NETWORK_STATE
android.permission.GET_ACCOUNTS
android.permission.MANAGE_ACCOUNTS
android.permission.INTERNET
android.permission.READ_CONTACTS
android.permission.WRITE_CONTACTS
android.permission.READ_SYNC_SETTINGS
android.permission.READ_SYNC_STATS
android.permission.RECEIVE_BOOT_COMPLETED
[...]

$ python download.py com.google.android.gm
Downloading 2.7MB... Done

$ file com.google.android.gm.apk 
com.google.android.gm.apk: Zip archive data, at least v2.0 to extract</pre>
<h3>Conclusion</h3>
<p>Although there is no metadata within Micro-Protobuf applications, recovering <code>.proto</code> files is still doable and it can still be done automatically. The lack of obfuscation is clearly an advantage for an attacker, since all class and method names are easy to understand. Having a non-official Google Play API is handy for many reasons: performing statistics that aren&rsquo;t available on the official front-end, looking for plagiarism, automatic malware search / downloading / analysis (Androguard to the rescue)&#8230; Feel free to browse the <a href="https://github.com/egirault/googleplay-api/">source</a>, fork the project, and improve it!</p>
]]></content:encoded>
			<wfw:commentRss>https://www.segmentationfault.fr/publications/reversing-google-play-and-micro-protobuf-applications/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
