Home » Android » android – How to parse the AndroidManifest.xml file inside an .apk package

android – How to parse the AndroidManifest.xml file inside an .apk package

Posted by: admin March 10, 2020 Leave a comment

Questions:

This file appears to be in a binary XML format. What is this format and how can it be parsed programmatically (as opposed to using the aapt dump tool in the SDK)?

This binary format is not discussed in the documentation here.

Note: I want to access this information from outside the Android environment, preferably from Java.

How to&Answers:

Use android-apktool

There is an application that reads apk files and decodes XMLs to nearly original form.

Usage:

apktool d Gmail.apk && cat Gmail/AndroidManifest.xml

Check android-apktool for more information

Answer:

This Java method, that runs on an Android, documents (what I’ve been able to interpret about) the binary format of the AndroidManifest.xml file in the .apk package. The second code box shows how to call decompressXML and how to load the byte[] from the app package file on the device. (There are fields whose purpose I don’t understand, if you know what they mean, tell me, I’ll update the info.)

// decompressXML -- Parse the 'compressed' binary form of Android XML docs 
// such as for AndroidManifest.xml in .apk files
public static int endDocTag = 0x00100101;
public static int startTag =  0x00100102;
public static int endTag =    0x00100103;
public void decompressXML(byte[] xml) {
// Compressed XML file/bytes starts with 24x bytes of data,
// 9 32 bit words in little endian order (LSB first):
//   0th word is 03 00 08 00
//   3rd word SEEMS TO BE:  Offset at then of StringTable
//   4th word is: Number of strings in string table
// WARNING: Sometime I indiscriminently display or refer to word in 
//   little endian storage format, or in integer format (ie MSB first).
int numbStrings = LEW(xml, 4*4);

// StringIndexTable starts at offset 24x, an array of 32 bit LE offsets
// of the length/string data in the StringTable.
int sitOff = 0x24;  // Offset of start of StringIndexTable

// StringTable, each string is represented with a 16 bit little endian 
// character count, followed by that number of 16 bit (LE) (Unicode) chars.
int stOff = sitOff + numbStrings*4;  // StringTable follows StrIndexTable

// XMLTags, The XML tag tree starts after some unknown content after the
// StringTable.  There is some unknown data after the StringTable, scan
// forward from this point to the flag for the start of an XML start tag.
int xmlTagOff = LEW(xml, 3*4);  // Start from the offset in the 3rd word.
// Scan forward until we find the bytes: 0x02011000(x00100102 in normal int)
for (int ii=xmlTagOff; ii<xml.length-4; ii+=4) {
  if (LEW(xml, ii) == startTag) { 
    xmlTagOff = ii;  break;
  }
} // end of hack, scanning for start of first start tag

// XML tags and attributes:
// Every XML start and end tag consists of 6 32 bit words:
//   0th word: 02011000 for startTag and 03011000 for endTag 
//   1st word: a flag?, like 38000000
//   2nd word: Line of where this tag appeared in the original source file
//   3rd word: FFFFFFFF ??
//   4th word: StringIndex of NameSpace name, or FFFFFFFF for default NS
//   5th word: StringIndex of Element Name
//   (Note: 01011000 in 0th word means end of XML document, endDocTag)

// Start tags (not end tags) contain 3 more words:
//   6th word: 14001400 meaning?? 
//   7th word: Number of Attributes that follow this tag(follow word 8th)
//   8th word: 00000000 meaning??

// Attributes consist of 5 words: 
//   0th word: StringIndex of Attribute Name's Namespace, or FFFFFFFF
//   1st word: StringIndex of Attribute Name
//   2nd word: StringIndex of Attribute Value, or FFFFFFF if ResourceId used
//   3rd word: Flags?
//   4th word: str ind of attr value again, or ResourceId of value

// TMP, dump string table to tr for debugging
//tr.addSelect("strings", null);
//for (int ii=0; ii<numbStrings; ii++) {
//  // Length of string starts at StringTable plus offset in StrIndTable
//  String str = compXmlString(xml, sitOff, stOff, ii);
//  tr.add(String.valueOf(ii), str);
//}
//tr.parent();

// Step through the XML tree element tags and attributes
int off = xmlTagOff;
int indent = 0;
int startTagLineNo = -2;
while (off < xml.length) {
  int tag0 = LEW(xml, off);
  //int tag1 = LEW(xml, off+1*4);
  int lineNo = LEW(xml, off+2*4);
  //int tag3 = LEW(xml, off+3*4);
  int nameNsSi = LEW(xml, off+4*4);
  int nameSi = LEW(xml, off+5*4);

  if (tag0 == startTag) { // XML START TAG
    int tag6 = LEW(xml, off+6*4);  // Expected to be 14001400
    int numbAttrs = LEW(xml, off+7*4);  // Number of Attributes to follow
    //int tag8 = LEW(xml, off+8*4);  // Expected to be 00000000
    off += 9*4;  // Skip over 6+3 words of startTag data
    String name = compXmlString(xml, sitOff, stOff, nameSi);
    //tr.addSelect(name, null);
    startTagLineNo = lineNo;

    // Look for the Attributes
    StringBuffer sb = new StringBuffer();
    for (int ii=0; ii<numbAttrs; ii++) {
      int attrNameNsSi = LEW(xml, off);  // AttrName Namespace Str Ind, or FFFFFFFF
      int attrNameSi = LEW(xml, off+1*4);  // AttrName String Index
      int attrValueSi = LEW(xml, off+2*4); // AttrValue Str Ind, or FFFFFFFF
      int attrFlags = LEW(xml, off+3*4);  
      int attrResId = LEW(xml, off+4*4);  // AttrValue ResourceId or dup AttrValue StrInd
      off += 5*4;  // Skip over the 5 words of an attribute

      String attrName = compXmlString(xml, sitOff, stOff, attrNameSi);
      String attrValue = attrValueSi!=-1
        ? compXmlString(xml, sitOff, stOff, attrValueSi)
        : "resourceID 0x"+Integer.toHexString(attrResId);
      sb.append(" "+attrName+"=\""+attrValue+"\"");
      //tr.add(attrName, attrValue);
    }
    prtIndent(indent, "<"+name+sb+">");
    indent++;

  } else if (tag0 == endTag) { // XML END TAG
    indent--;
    off += 6*4;  // Skip over 6 words of endTag data
    String name = compXmlString(xml, sitOff, stOff, nameSi);
    prtIndent(indent, "</"+name+">  (line "+startTagLineNo+"-"+lineNo+")");
    //tr.parent();  // Step back up the NobTree

  } else if (tag0 == endDocTag) {  // END OF XML DOC TAG
    break;

  } else {
    prt("  Unrecognized tag code '"+Integer.toHexString(tag0)
      +"' at offset "+off);
    break;
  }
} // end of while loop scanning tags and attributes of XML tree
prt("    end at offset "+off);
} // end of decompressXML


public String compXmlString(byte[] xml, int sitOff, int stOff, int strInd) {
  if (strInd < 0) return null;
  int strOff = stOff + LEW(xml, sitOff+strInd*4);
  return compXmlStringAt(xml, strOff);
}


public static String spaces = "                                             ";
public void prtIndent(int indent, String str) {
  prt(spaces.substring(0, Math.min(indent*2, spaces.length()))+str);
}


// compXmlStringAt -- Return the string stored in StringTable format at
// offset strOff.  This offset points to the 16 bit string length, which 
// is followed by that number of 16 bit (Unicode) chars.
public String compXmlStringAt(byte[] arr, int strOff) {
  int strLen = arr[strOff+1]<<8&0xff00 | arr[strOff]&0xff;
  byte[] chars = new byte[strLen];
  for (int ii=0; ii<strLen; ii++) {
    chars[ii] = arr[strOff+2+ii*2];
  }
  return new String(chars);  // Hack, just use 8 byte chars
} // end of compXmlStringAt


// LEW -- Return value of a Little Endian 32 bit word from the byte array
//   at offset off.
public int LEW(byte[] arr, int off) {
  return arr[off+3]<<24&0xff000000 | arr[off+2]<<16&0xff0000
    | arr[off+1]<<8&0xff00 | arr[off]&0xFF;
} // end of LEW

This method reads the AndroidManifest into a byte[] for processing:

public void getIntents(String path) {
  try {
    JarFile jf = new JarFile(path);
    InputStream is = jf.getInputStream(jf.getEntry("AndroidManifest.xml"));
    byte[] xml = new byte[is.available()];
    int br = is.read(xml);
    //Tree tr = TrunkFactory.newTree();
    decompressXML(xml);
    //prt("XML\n"+tr.list());
  } catch (Exception ex) {
    console.log("getIntents, ex: "+ex);  ex.printStackTrace();
  }
} // end of getIntents

Most apps are stored in /system/app which is readable without root my Evo, other apps are in /data/app which I needed root to see. The ‘path’ argument above would be something like: “/system/app/Weather.apk”

Answer:

What about using the Android Asset Packaging Tool (aapt), from the Android SDK, into a Python (or whatever) script?

Through the aapt (http://elinux.org/Android_aapt), indeed, you can retrieve information about the .apk package and about its AndroidManifest.xml file. In particular, you can extract the values of individual elements of an .apk package through the ‘dump’ sub-command. For example, you can extract the user-permissions in the AndroidManifest.xml file inside an .apk package in this way:

$ aapt dump permissions package.apk

Where package.apk is your .apk package.

Moreover, you can use the Unix pipe command to clear the output. For example:

$ aapt dump permissions package.apk | sed 1d | awk '{ print $NF }'

Here a Python script that to that programmatically:

import os
import subprocess

#Current directory and file name:
curpath = os.path.dirname( os.path.realpath(__file__) )
filepath = os.path.join(curpath, "package.apk")

#Extract the AndroidManifest.xml permissions:
command = "aapt dump permissions " + filepath + " | sed 1d | awk '{ print $NF }'"
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=None, shell=True)
permissions = process.communicate()[0]

print permissions

In a similar fashion you can extract other information (e.g. package, app name, etc…) of the AndroidManifest.xml:

#Extract the APK package info:
shellcommand = "aapt dump badging " + filepath
process = subprocess.Popen(shellcommand, stdout=subprocess.PIPE, stderr=None, shell=True)
apkInfo = process.communicate()[0].splitlines()

for info in apkInfo:
    #Package info:
    if string.find(info, "package:", 0) != -1:
        print "App Package: " + findBetween(info, "name='", "'")
        print "App Version: " + findBetween(info, "versionName='", "'")
        continue

    #App name:
    if string.find(info, "application:", 0) != -1:
        print "App Name: " + findBetween(info, "label='", "'")
        continue


def findBetween(s, prefix, suffix):
    try:
        start = s.index(prefix) + len(prefix)
        end = s.index(suffix, start)
        return s[start:end]
    except ValueError:
        return ""

If instead you want to parse the entire AndroidManifest XML tree, you can do that in a similar way using the xmltree command:

aapt dump xmltree package.apk AndroidManifest.xml

Using Python as before:

#Extract the AndroidManifest XML tree:
shellcommand = "aapt dump xmltree " + filepath + " AndroidManifest.xml"
process = subprocess.Popen(shellcommand, stdout=subprocess.PIPE, stderr=None, shell=True)
xmlTree = process.communicate()[0]

print "Number of Activities: " + str(xmlTree.count("activity"))
print "Number of Services: " + str(xmlTree.count("service"))
print "Number of BroadcastReceivers: " + str(xmlTree.count("receiver"))

Answer:

You can use axml2xml.pl tool developed a while ago within android-random project. It will generate the textual manifest file (AndroidManifest.xml) from the binary one.

I’m saying “textual” and not “original” because like many reverse-engineering tools this one isn’t perfect and the result will not be complete. I presume either it was never feature complete or simply not forward-compatible (with newer binary encoding scheme). Whatever the reason, axml2xml.pl tool will not be able to extract all the attribute values correctly. Such attributes are minSdkVersion, targetSdkVersion and basically all attributes that are referencing resources (like strings, icons, etc.), i.e. only class names (of activities, services, etc.) are extracted correctly.

However, you can still find these missing information by running aapt tool on the original Android app file (.apk):

aapt l -a <someapp.apk>

Answer:

Check this following WPF Project which decodes the properties correctly.

Answer:

With the latest SDK-Tools, you can now use a tool called the apkanalyzer to print out the AndroidManifest.xml of an APK (as well as other parts, such as resources).

[android sdk]/tools/bin/apkanalyzer manifest print [app.apk]

apkanalyzer

Answer:

apk-parser, https://github.com/caoqianli/apk-parser, a lightweight impl for java, with no dependency for aapt or other binarys, is good for parse binary xml files, and other apk infos.

ApkParser apkParser = new ApkParser(new File(filePath));
// set a locale to translate resource tag into specific strings in language the locale specified, you set locale to Locale.ENGLISH then get apk title 'WeChat' instead of '@string/app_name' for example
apkParser.setPreferredLocale(locale);

String xml = apkParser.getManifestXml();
System.out.println(xml);

String xml2 = apkParser.transBinaryXml(xmlPathInApk);
System.out.println(xml2);

ApkMeta apkMeta = apkParser.getApkMeta();
System.out.println(apkMeta);

Set<Locale> locales = apkParser.getLocales();
for (Locale l : locales) {
    System.out.println(l);
}
apkParser.close();

Answer:

If your into Python or use Androguard, the Androguard Androaxml feature will do this conversion for you. The feature is detailed in this blog post, with additional documentation here and source here.

Usage:

$ ./androaxml.py -h
Usage: androaxml.py [options]

Options:
-h, --help            show this help message and exit
-i INPUT, --input=INPUT
                      filename input (APK or android's binary xml)
-o OUTPUT, --output=OUTPUT
                      filename output of the xml
-v, --version         version of the API

$ ./androaxml.py -i yourfile.apk -o output.xml
$ ./androaxml.py -i AndroidManifest.xml -o output.xml

Answer:

In case it’s useful, here’s a C++ version of the Java snippet posted by Ribo:

struct decompressXML
{
    // decompressXML -- Parse the 'compressed' binary form of Android XML docs 
    // such as for AndroidManifest.xml in .apk files
    enum
    {
        endDocTag = 0x00100101,
        startTag =  0x00100102,
        endTag =    0x00100103
    };

    decompressXML(const BYTE* xml, int cb) {
    // Compressed XML file/bytes starts with 24x bytes of data,
    // 9 32 bit words in little endian order (LSB first):
    //   0th word is 03 00 08 00
    //   3rd word SEEMS TO BE:  Offset at then of StringTable
    //   4th word is: Number of strings in string table
    // WARNING: Sometime I indiscriminently display or refer to word in 
    //   little endian storage format, or in integer format (ie MSB first).
    int numbStrings = LEW(xml, cb, 4*4);

    // StringIndexTable starts at offset 24x, an array of 32 bit LE offsets
    // of the length/string data in the StringTable.
    int sitOff = 0x24;  // Offset of start of StringIndexTable

    // StringTable, each string is represented with a 16 bit little endian 
    // character count, followed by that number of 16 bit (LE) (Unicode) chars.
    int stOff = sitOff + numbStrings*4;  // StringTable follows StrIndexTable

    // XMLTags, The XML tag tree starts after some unknown content after the
    // StringTable.  There is some unknown data after the StringTable, scan
    // forward from this point to the flag for the start of an XML start tag.
    int xmlTagOff = LEW(xml, cb, 3*4);  // Start from the offset in the 3rd word.
    // Scan forward until we find the bytes: 0x02011000(x00100102 in normal int)
    for (int ii=xmlTagOff; ii<cb-4; ii+=4) {
      if (LEW(xml, cb, ii) == startTag) { 
        xmlTagOff = ii;  break;
      }
    } // end of hack, scanning for start of first start tag

    // XML tags and attributes:
    // Every XML start and end tag consists of 6 32 bit words:
    //   0th word: 02011000 for startTag and 03011000 for endTag 
    //   1st word: a flag?, like 38000000
    //   2nd word: Line of where this tag appeared in the original source file
    //   3rd word: FFFFFFFF ??
    //   4th word: StringIndex of NameSpace name, or FFFFFFFF for default NS
    //   5th word: StringIndex of Element Name
    //   (Note: 01011000 in 0th word means end of XML document, endDocTag)

    // Start tags (not end tags) contain 3 more words:
    //   6th word: 14001400 meaning?? 
    //   7th word: Number of Attributes that follow this tag(follow word 8th)
    //   8th word: 00000000 meaning??

    // Attributes consist of 5 words: 
    //   0th word: StringIndex of Attribute Name's Namespace, or FFFFFFFF
    //   1st word: StringIndex of Attribute Name
    //   2nd word: StringIndex of Attribute Value, or FFFFFFF if ResourceId used
    //   3rd word: Flags?
    //   4th word: str ind of attr value again, or ResourceId of value

    // TMP, dump string table to tr for debugging
    //tr.addSelect("strings", null);
    //for (int ii=0; ii<numbStrings; ii++) {
    //  // Length of string starts at StringTable plus offset in StrIndTable
    //  String str = compXmlString(xml, sitOff, stOff, ii);
    //  tr.add(String.valueOf(ii), str);
    //}
    //tr.parent();

    // Step through the XML tree element tags and attributes
    int off = xmlTagOff;
    int indent = 0;
    int startTagLineNo = -2;
    while (off < cb) {
      int tag0 = LEW(xml, cb, off);
      //int tag1 = LEW(xml, off+1*4);
      int lineNo = LEW(xml, cb, off+2*4);
      //int tag3 = LEW(xml, off+3*4);
      int nameNsSi = LEW(xml, cb, off+4*4);
      int nameSi = LEW(xml, cb, off+5*4);

      if (tag0 == startTag) { // XML START TAG
        int tag6 = LEW(xml, cb, off+6*4);  // Expected to be 14001400
        int numbAttrs = LEW(xml, cb, off+7*4);  // Number of Attributes to follow
        //int tag8 = LEW(xml, off+8*4);  // Expected to be 00000000
        off += 9*4;  // Skip over 6+3 words of startTag data
        std::string name = compXmlString(xml, cb, sitOff, stOff, nameSi);
        //tr.addSelect(name, null);
        startTagLineNo = lineNo;

        // Look for the Attributes
        std::string sb;
        for (int ii=0; ii<numbAttrs; ii++) {
          int attrNameNsSi = LEW(xml, cb, off);  // AttrName Namespace Str Ind, or FFFFFFFF
          int attrNameSi = LEW(xml, cb, off+1*4);  // AttrName String Index
          int attrValueSi = LEW(xml, cb, off+2*4); // AttrValue Str Ind, or FFFFFFFF
          int attrFlags = LEW(xml, cb, off+3*4);  
          int attrResId = LEW(xml, cb, off+4*4);  // AttrValue ResourceId or dup AttrValue StrInd
          off += 5*4;  // Skip over the 5 words of an attribute

          std::string attrName = compXmlString(xml, cb, sitOff, stOff, attrNameSi);
          std::string attrValue = attrValueSi!=-1
            ? compXmlString(xml, cb, sitOff, stOff, attrValueSi)
            : "resourceID 0x"+toHexString(attrResId);
          sb.append(" "+attrName+"=\""+attrValue+"\"");
          //tr.add(attrName, attrValue);
        }
        prtIndent(indent, "<"+name+sb+">");
        indent++;

      } else if (tag0 == endTag) { // XML END TAG
        indent--;
        off += 6*4;  // Skip over 6 words of endTag data
        std::string name = compXmlString(xml, cb, sitOff, stOff, nameSi);
        prtIndent(indent, "</"+name+">  (line "+toIntString(startTagLineNo)+"-"+toIntString(lineNo)+")");
        //tr.parent();  // Step back up the NobTree

      } else if (tag0 == endDocTag) {  // END OF XML DOC TAG
        break;

      } else {
        prt("  Unrecognized tag code '"+toHexString(tag0)
          +"' at offset "+toIntString(off));
        break;
      }
    } // end of while loop scanning tags and attributes of XML tree
    prt("    end at offset "+off);
    } // end of decompressXML


    std::string compXmlString(const BYTE* xml, int cb, int sitOff, int stOff, int strInd) {
      if (strInd < 0) return std::string("");
      int strOff = stOff + LEW(xml, cb, sitOff+strInd*4);
      return compXmlStringAt(xml, cb, strOff);
    }

    void prt(std::string str)
    {
        printf("%s", str.c_str());
    }
    void prtIndent(int indent, std::string str) {
        char spaces[46];
        memset(spaces, ' ', sizeof(spaces));
        spaces[min(indent*2,  sizeof(spaces) - 1)] = 0;
        prt(spaces);
        prt(str);
        prt("\n");
    }


    // compXmlStringAt -- Return the string stored in StringTable format at
    // offset strOff.  This offset points to the 16 bit string length, which 
    // is followed by that number of 16 bit (Unicode) chars.
    std::string compXmlStringAt(const BYTE* arr, int cb, int strOff) {
        if (cb < strOff + 2) return std::string("");
      int strLen = arr[strOff+1]<<8&0xff00 | arr[strOff]&0xff;
      char* chars = new char[strLen + 1];
      chars[strLen] = 0;
      for (int ii=0; ii<strLen; ii++) {
          if (cb < strOff + 2 + ii * 2)
          {
              chars[ii] = 0;
              break;
          }
        chars[ii] = arr[strOff+2+ii*2];
      }
      std::string str(chars);
      free(chars);
      return str;
    } // end of compXmlStringAt


    // LEW -- Return value of a Little Endian 32 bit word from the byte array
    //   at offset off.
    int LEW(const BYTE* arr, int cb, int off) {
      return (cb > off + 3) ? ( arr[off+3]<<24&0xff000000 | arr[off+2]<<16&0xff0000
          | arr[off+1]<<8&0xff00 | arr[off]&0xFF ) : 0;
    } // end of LEW

    std::string toHexString(DWORD attrResId)
    {
        char ch[20];
        sprintf_s(ch, 20, "%lx", attrResId);
        return std::string(ch);
    }
    std::string toIntString(int i)
    {
        char ch[20];
        sprintf_s(ch, 20, "%ld", i);
        return std::string(ch);
    }
};

Answer:

for reference here is my version of Ribo’s code. The main difference is that decompressXML() directly returns a String, which for my purposes was a more appropriate usage.

NOTE: my sole purpose in using Ribo’s solution was to fetch an .APK file’s published version from the Manifest XML file, and I confirm that for this purpose it works beautifully.

EDIT [2013-03-16]: It works beautifully IF the version is set as plain text, but if it’s set to refer to a Resource XML, it’ll show up as ‘Resource 0x1’ for example. In this particular case, you’ll probably have to couple this solution to another solution that will fetch the proper string resource reference.

Hope it can help other people too.

Answer:

In Android studio 2.2 you can directly analyze the apk. Goto build- analyze apk. Select the apk, navigate to androidmanifest.xml. You can see the details of androidmanifest.

Answer:

@Mathieu Kotlin version follows:

This is a kotlin version of the answer above.

Answer:

I found the AXMLPrinter2, a Java app over at the Android4Me project to work fine on the AndroidManifest.xml that I had (and prints the XML out in a nicely formatted way).
http://code.google.com/p/android4me/downloads/detail?name=AXMLPrinter2.jar

One note.. it (and the code on this answer from Ribo) doesn’t appear to handle every compiled XML file that I’ve come across. I found one where the strings were stored with one byte per character, rather than the double byte format that it assumes.

Answer:

it can be helpful

G is my Application class :

Answer:

I have been running with the Ribo code posted above for over a year, and it has served us well. With recent updates (Gradle 3.x) though, I was no longer able to parse the AndroidManifest.xml, I was getting index out of bounds errors, and in general it was no longer able to parse the file.

Update: I now believe that our issues was with upgrading to Gradle 3.x. This article describes how AirWatch had issues and can be fixed by using a Gradle setting to use aapt instead of aapt2 AirWatch seems to be incompatible with Android Plugin for Gradle 3.0.0-beta1

In searching around I came across this open source project, and it’s being maintained and I was able to get to the point and read both my old APKs that I could previously parse, and the new APKs that the logic from Ribo threw exceptions

https://github.com/xgouchet/AXML

From his example this is what I’m doing

Answer:

apkanalyzer will be helpful

original post https://stackoverflow.com/a/51905063/1383521