I need help compressing and decompressing large files using sources and gzip

Aarón CdC
I found examples on the wiki showing how to use Crypto++'s Gzip algorithm to compress and decompress files, but every example I found processes the whole file at once. Imagine I need to compress a 1 GB+ file: even though modern computers have large amounts of memory, loading it all at once is completely out of the question.

What I'm trying to do is process the file in small chunks, loading each chunk into a buffer through a FileSource. This is what I have so far:

    ofstream output;

    // Compression
    output.open("TEST/out.a", fstream::binary);
    int got, ncomp, diff = 0;

    char* ib = new char[BUFSIZE];
    char* ob = new char[BUFSIZE];

    for (int x = 0; x < BUFSIZE; x++)
    {
        ib[x] = 0;
        ob[x] = 0;
    }

    FileSource fsr("TEST/a.jpg", false, new Gzip(new ArraySink((byte*)ob, BUFSIZE)));

    while (!fsr.SourceExhausted())
    {
        got = fsr.Pump(BUFSIZE);
        fsr.Flush(false);
        cout << "Pumped " << got << " bytes" << endl;
        if (got == 0)
            break;
        output.write(ob, got);
    }
    output.close();

    // Decompression
    for (int x = 0; x < BUFSIZE; x++)
    {
        ib[x] = 0;
        ob[x] = 0;
    }

    cout << "Decompress" << endl;
    output.open("TEST/test.jpg", fstream::binary);

    FileSource fir("TEST/out.a", false, new Gunzip(new ArraySink((byte*)ib, BUFSIZE)));

    while (!fir.SourceExhausted())
    {
        got = fir.Pump(BUFSIZE);
        fir.Flush(false);
        cout << "Pumped " << got << " bytes" << endl;
        if (got == 0)
            break;
        output.write(ib, got);
    }
    output.close();

    delete[] ib;
    delete[] ob;
    cout << "Done" << endl;

I have several problems with this code. First of all, the file is not processed entirely: one portion is processed, and then it repeats through the whole file. That is obviously not what I want; I need to process the whole file, just in small chunks. Also, SourceExhausted() doesn't return true once it reaches the end of the file (which is why there is a break in the loop that shouldn't be needed).

I know there are ways to do this directly without a buffer, but I need to pass the data through memory because I have to implement this somewhere else, and all the data must be processed in memory first, so I can't use a FileSink.

Re: I need help compressing and decompressing large files using sources and gzip

Jeffrey Walton-3


On Tuesday, May 2, 2017 at 12:30:33 PM UTC-4, Aarón CdC wrote:
...

I have several problems with this code. First of all, the file is not processed entirely ... Also, SourceExhausted() doesn't return true once it reaches the end of the file ...

I know there are ways to do this directly without a buffer, but I need the data to be processed in memory first, so I can't use a FileSink.

A basic skeleton program for manually pumping, or chunking, a large file can be found at https://groups.google.com/d/msg/cryptopp-users/WezFWb9XQ84/NoFUitCrDQAJ.

We really need a wiki page on the subject because it comes up occasionally for resource-constrained devices. Let me put something together so we can cite a wiki page rather than old messages from the list.
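In outline, the skeleton looks something like this. This is a minimal sketch, not a drop-in program: the FileSize() helper is copied from that list message rather than from the library itself, and the file names are placeholders.

    #include "cryptlib.h"
    #include "files.h"     // FileSource, FileSink
    #include "filters.h"   // MeterFilter, Redirector
    #include "gzip.h"      // Gzip
    #include <algorithm>
    #include <iostream>
    using namespace CryptoPP;

    // Helper from the list example; not part of the library proper.
    inline lword FileSize(const FileSource& file)
    {
        std::istream* stream = const_cast<FileSource&>(file).GetStream();
        std::streampos old = stream->tellg();
        std::streampos end = stream->seekg(0, std::ios_base::end).tellg();
        stream->seekg(old);
        return static_cast<lword>(end);
    }

    int main()
    {
        // Chain: FileSource -> Gzip -> MeterFilter -> FileSink
        FileSource source("in.bin", false /*pumpAll*/);
        Gzip zipper;
        MeterFilter meter;
        FileSink sink("out.gz");

        source.Attach(new Redirector(zipper));
        zipper.Attach(new Redirector(meter));
        meter.Attach(new Redirector(sink));

        const lword BLOCK_SIZE = 4096;
        lword remaining = FileSize(source);

        while (remaining && !source.SourceExhausted())
        {
            const lword req = std::min(remaining, BLOCK_SIZE);
            source.Pump(req);     // push the next chunk into the chain
            zipper.Flush(false);  // soft flush; don't force a sync
            remaining -= req;
        }

        zipper.MessageEnd();      // finalize the gzip stream
        std::cout << "Compressed " << meter.GetTotalBytes() << " bytes" << std::endl;
        return 0;
    }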

Jeff

Re: I need help compressing and decompressing large files using sources and gzip

Jeffrey Walton-3


On Tuesday, May 2, 2017 at 2:54:31 PM UTC-4, Jeffrey Walton wrote:
...
A basic skeleton program for manually pumping, or chunking, a large file can be found at https://groups.google.com/d/msg/cryptopp-users/WezFWb9XQ84/NoFUitCrDQAJ.

We really need a wiki page on the subject because it comes up occasionally for resource-constrained devices. Let me put something together so we can cite a wiki page rather than old messages from the list.

Here's the wiki page on manually pumping data: https://www.cryptopp.com/wiki/Pumping_Data .

Jeff

Re: I need help compressing and decompressing large files using sources and gzip

Aarón CdC
I successfully managed to get a program running that compresses and decompresses files using the examples on the wiki. But after looking more closely at the examples, I realized I still don't know how to access the internal buffer where the data is stored. I tried pumping into an ArraySink, but I get data corruption when I do that.

    try
    {
        MeterFilter meter;
        Gunzip filter;

        byte* buff = new byte[4096];

        ofstream of;
        of.open("result.mp4", ofstream::out | ofstream::binary);

        FileSource source("compress.bin", false);
        ArraySink sink2(buff, 4096);
        //FileSink sink("result.mp4");

        source.Attach(new Redirector(filter));
        filter.Attach(new Redirector(meter));
        meter.Attach(new Redirector(sink2));
        //meter.Attach(new Redirector(sink));

        const word64 BLOCK_SIZE = 4096;
        word64 remaining = FileSize(source);
        word64 processed = 0;

        while (remaining && !source.SourceExhausted())
        {
            unsigned int req = STDMIN(remaining, BLOCK_SIZE);

            source.Pump(req);
            filter.Flush(false);

            of.write((char*)buff, req);

            processed += req;
            remaining -= req;

            if (processed % (1024*1024*10) == 0)
                cout << "Processed: " << meter.GetTotalBytes() << endl;
        }

        of.close();
        delete[] buff;
        filter.MessageEnd();
    }
    catch (const Exception& ex)
    {
        cerr << ex.what() << endl;
        return 1;
    }
    return 0;

The compression method I'm using pipes the data directly to a FileSink, and it works (the decompression method also works if I do that). But when I try to get the uncompressed data in memory through an ArraySink and then pass it through a file stream, it seems to fail at some point. I need to be able to handle the processed data after pumping it and before placing it into a file, rather than passing it directly to a file.

It seems like what I'm doing works and I get the uncompressed data, but at some point it stops storing data, or the data is simply corrupted. The original file is 145.1 MiB, the compressed file is 144.7 MiB, and the resulting uncompressed file is 144.7 MiB again. Examining the result in a hex editor shows nothing unusual (I can see the headers and the binary information). Running:

    cmp TEST/video.mp4 result.mp4

reports the following:

    TEST/video.mp4 result.mp4 differ: byte 4097, line 19


On Tuesday, May 2, 2017 at 18:30:33 (UTC+2), Aarón CdC wrote:
...

Re: I need help compressing and decompressing large files using sources and gzip

Jeffrey Walton-3


On Wednesday, May 10, 2017 at 11:43:00 AM UTC-4, Aarón CdC wrote:
I successfully managed to get a program running that compresses and decompresses files using the examples on the wiki. But after looking more closely at the examples, I realized I still don't know how to access the internal buffer where the data is stored. ...

You can't directly access the internal data once it enters the pipeline.

The compression method I'm using pipes the data directly to a FileSink, and it works ... I need to be able to handle the processed data after pumping it and before placing it into a file, rather than passing it directly to a file.

I think you want either a custom Filter or a ChannelSwitch.

A custom Filter will let you perform additional processing or transformations, just like a HexDecoder or StreamTransformationFilter does. Also see https://www.cryptopp.com/wiki/Filter.

A ChannelSwitch is like the Unix tee command: it splits the output stream into two streams. Also see https://www.cryptopp.com/wiki/ChannelSwitch.
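A rough sketch of the tee idea, under the assumption that the decompressed stream should land in both a file and an in-memory queue (the names here are illustrative, not taken from the code above):

    #include "files.h"     // FileSource, FileSink
    #include "filters.h"   // Redirector
    #include "channels.h"  // ChannelSwitch
    #include "queue.h"     // ByteQueue
    #include "gzip.h"      // Gunzip
    using namespace CryptoPP;

    int main()
    {
        FileSink file("result.mp4");  // one copy goes straight to disk
        ByteQueue queue;              // the other stays in memory for inspection

        ChannelSwitch channels;
        channels.AddDefaultRoute(file);
        channels.AddDefaultRoute(queue);

        // Decompress and tee the output into both destinations.
        FileSource source("compress.bin", true /*pumpAll*/,
            new Gunzip(new Redirector(channels)));
        return 0;
    }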

It's also worth mentioning: if you want two logical chains instead of one big one, then use a ByteQueue rather than an array. We provided an example of using ByteQueues recently on Stack Overflow at http://stackoverflow.com/a/42820221/608639. But I recommend the custom Filter or ChannelSwitch over gluing two different chains together.
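And a minimal sketch of the two-chain approach, for comparison. Note that the ByteQueue holds the entire decompressed file in memory, which is one reason the Filter or ChannelSwitch is preferable here:

    #include "files.h"    // FileSource, FileSink
    #include "filters.h"  // Redirector
    #include "queue.h"    // ByteQueue
    #include "gzip.h"     // Gunzip
    using namespace CryptoPP;

    int main()
    {
        // Chain 1: decompress everything into a ByteQueue held in memory.
        ByteQueue queue;
        FileSource source("compress.bin", true /*pumpAll*/,
            new Gunzip(new Redirector(queue)));

        // ... inspect or transform the bytes sitting in the queue here ...

        // Chain 2: drain the queue into the output file.
        FileSink sink("result.mp4");
        queue.TransferAllTo(sink);
        return 0;
    }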

Jeff

Re: I need help compressing and decompressing large files using sources and gzip

Aarón CdC
Success! I finally managed to get a copy of the buffered data, as I wanted, by using a custom Filter.

I created a custom filter called DataFilter that simply copies the data passing through it into a file (like a FileSink):

    // This is just the "UselessFilter" from the wiki, slightly modified
    // to act as a kind of "manual" FileSink
    class DataFilter : public Filter
    {
    public:
        DataFilter(BufferedTransformation* attachment = NULL) : Filter(attachment) {}

        size_t Put2(const byte* inString, size_t length, int messageEnd, bool blocking)
        {
            of.write((char*)inString, length); // <- inString is the buffer I needed
            return AttachedTransformation()->Put2(inString, length, messageEnd, blocking);
        }

        bool IsolatedFlush(bool hardFlush, bool blocking)
        {
            return false;
        }
    };

The idea was just to check whether I could get a copy of the file I was decompressing; if so, it would be a success. In my decompression function, I created an instance of this DataFilter and attached it to my chain of filters:

    DataFilter dat;

    //...

    source.Attach(new Redirector(filter));
    filter.Attach(new Redirector(meter));
    meter.Attach(new Redirector(dat));
    dat.Attach(new Redirector(sink)); // <- I don't need this sink anymore, but I keep it as a test

I opened the file stream beforehand, of course, on a file called copy.mp4 in output, binary mode, and closed it after the pump loop. After running the program, I got a copy.mp4 that plays fine in VLC, alongside the result.mp4 produced by the pump.
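As an aside, a small variation on the same idea is to hand the stream to the filter instead of relying on a global. A sketch (the StreamTeeFilter name is mine, not from the original post):

    // Variant of DataFilter that owns a reference to the stream it tees into.
    class StreamTeeFilter : public Filter
    {
    public:
        StreamTeeFilter(std::ostream& out, BufferedTransformation* attachment = NULL)
            : Filter(attachment), m_out(out) {}

        size_t Put2(const byte* inString, size_t length, int messageEnd, bool blocking)
        {
            // Copy the chunk to the stream, then pass it down the chain unchanged.
            m_out.write(reinterpret_cast<const char*>(inString),
                        static_cast<std::streamsize>(length));
            return AttachedTransformation()->Put2(inString, length, messageEnd, blocking);
        }

        bool IsolatedFlush(bool hardFlush, bool blocking) { return false; }

    private:
        std::ostream& m_out;
    };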

Thank you for your time and help. Very much appreciated.

On Tuesday, May 2, 2017 at 18:30:33 (UTC+2), Aarón CdC wrote:
...
