I have the internal DataMashup file from a Power BI Desktop report (.pbix), and I am trying to decode it. My aim is to create a Power BI Desktop report and its data model programmatically, in any programming language. I am starting with Java.

The file appears to be encoded in some way.

I tried to detect the file's encoding, and the detector returns windows-1254, but decoding with that charset does not produce readable output.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;

    File f = new File("example.txt");

    String[] charsetsToBeTested = {"UTF-8", "windows-1254", "ISO-8859-7"};

    // CharsetDetector is a helper class that is not shown here.
    CharsetDetector cd = new CharsetDetector();
    Charset charset = cd.detectCharset(f, charsetsToBeTested);

    if (charset != null) {
        // try-with-resources closes the reader even if reading fails
        try (InputStreamReader reader = new InputStreamReader(new FileInputStream(f), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                System.out.print((char) c);
            }
        } catch (IOException ioe) { // also covers FileNotFoundException
            ioe.printStackTrace();
        }
    } else {
        System.out.println("Unrecognized charset.");
    }

Unzipping the file does not work either:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public void unZipIt(String zipFile, String outputFolder)
{
    byte[] buffer = new byte[1024];
    File folder = new File(outputFolder);
    if (!folder.exists())
    {
        folder.mkdirs();
    }
    // try-with-resources closes the stream even when extraction fails
    try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFile)))
    {
        // Do not call zis.getNextEntry() before the loop (for example inside a
        // println): each call advances the stream, so the first entry would be skipped.
        for (ZipEntry ze = zis.getNextEntry(); ze != null; ze = zis.getNextEntry())
        {
            File newFile = new File(outputFolder, ze.getName());
            System.out.println("file unzip : " + newFile.getAbsolutePath());
            if (ze.isDirectory())
            {
                newFile.mkdirs();
                continue;
            }
            newFile.getParentFile().mkdirs();
            try (FileOutputStream fos = new FileOutputStream(newFile))
            {
                int len;
                while ((len = zis.read(buffer)) > 0)
                {
                    fos.write(buffer, 0, len);
                }
            }
        }
        System.out.println("Done");
    }
    catch (IOException ex)
    {
        ex.printStackTrace();
    }
}
Rahul Patel

2 Answers

The file contains a binary header followed by XML that declares UTF-8. The header appears to hold a file name (Config/Package.xml), so assuming a zip format is understandable. If it is a zip, there would also be binary data at the end of the file.

Maybe the file was transferred over FTP in text mode, which converts "\n" to "\r\n". That would corrupt a zip. Renaming the file to .zip lets you test it with zip tools.

First try the .tar format, which would be logical because the XML is not compressed: append .tar to the file name.
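
Instead of renaming the file and trying tools one by one, the container type can also be guessed from its leading bytes. The following is only a minimal sketch: the class name ContainerSniffer and the method sniff are made up for illustration, and it checks just two well-known signatures (the ZIP local file header "PK\3\4" at offset 0, and the "ustar" magic that POSIX tar stores at offset 257). A file with extra bytes in front of the archive will be reported as "unknown".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class ContainerSniffer {
    /** Returns "zip", "tar" or "unknown" based on two well-known signatures. */
    static String sniff(byte[] data) {
        // ZIP local file header signature: 'P' 'K' 0x03 0x04 at offset 0.
        if (data.length >= 4
                && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4) {
            return "zip";
        }
        // POSIX tar keeps the magic string "ustar" at offset 257 of the first header block.
        if (data.length >= 262) {
            String magic = new String(data, 257, 5, StandardCharsets.US_ASCII);
            if (magic.equals("ustar")) {
                return "tar";
            }
        }
        return "unknown";
    }
}
```

It could be called as ContainerSniffer.sniff(Files.readAllBytes(Paths.get("example.txt"))). An "unknown" result does not prove the file contains no archive, only that none starts at the first byte.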

Otherwise, if the content is always UTF-8 XML:

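import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path f = Paths.get("example.txt");
String start = "<?xml";
String end = ">";
byte[] bytes = Files.readAllBytes(f);
// ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
String s = new String(bytes, StandardCharsets.ISO_8859_1);
int startI = s.indexOf(start);
int endI = s.lastIndexOf(end) + end.length();
if (startI < 0 || endI <= startI) {
    throw new IllegalStateException("No XML document found in " + f);
}
// Decode only the XML byte range as UTF-8.
String xml = new String(bytes, startI, endI - startI, StandardCharsets.UTF_8);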
Joop Eggen
  • This file has no extension. I can unzip it with 7-Zip but not with WinRAR. After changing a file I cannot zip it back. After unzipping I get a Config folder, a Formula folder and one XML file, and the contents of all the sub-files are different. – Rahul Patel Mar 01 '18 at 13:54
  • If I change anything in the XML of the DataMashup file and repackage the Power BI report file (.pbix), I get an error saying the file is corrupted. – Rahul Patel Mar 01 '18 at 13:56

You can use the .NET System.IO.Packaging library to extract the Power BI data mashup. The mashup uses the OPC (Open Packaging Conventions) package standard.
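
Since the question uses Java, here is a rough Java sketch of the same idea. It rests on an assumption that should be verified against the actual file: that the OPC package is an ordinary ZIP archive preceded by a few header bytes, so scanning for the first ZIP local-file-header signature ("PK\3\4") finds where it starts. The names EmbeddedZipReader, findZipStart and listEntries are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: assumes a ZIP archive is embedded somewhere inside the data.
class EmbeddedZipReader {
    /** Index of the first ZIP local-file-header signature (PK\3\4), or -1 if absent. */
    static int findZipStart(byte[] data) {
        for (int i = 0; i + 3 < data.length; i++) {
            if (data[i] == 'P' && data[i + 1] == 'K'
                    && data[i + 2] == 3 && data[i + 3] == 4) {
                return i;
            }
        }
        return -1;
    }

    /** Lists the entry names of the archive that starts at the first signature. */
    static List<String> listEntries(byte[] data) throws IOException {
        int start = findZipStart(data);
        if (start < 0) {
            throw new IOException("No ZIP signature found");
        }
        List<String> names = new ArrayList<>();
        // ZipInputStream reads entries sequentially and stops at the first
        // non-entry record, so bytes after the archive are not parsed.
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(data, start, data.length - start))) {
            for (ZipEntry ze = zis.getNextEntry(); ze != null; ze = zis.getNextEntry()) {
                names.add(ze.getName());
            }
        }
        return names;
    }
}
```

With the file loaded via Files.readAllBytes, EmbeddedZipReader.listEntries(bytes) should return names such as Config/Package.xml if the assumption holds. Writing a modified archive back would also require updating whatever the surrounding header records about it, which this sketch does not attempt.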

totoro_dev