Friday, February 2, 2018

Extracting a 600 MB tgz with Ruby raises an "integer out of range" error

I'm trying to untar a tgz file with the following code:

tar_extract.each do |entry|
  entry_filename = File.basename(entry.full_name)
  next if entry.directory? # don't unzip directories
  next if !entry.file? # if it's not a file skip  
  next if entry.full_name.starts_with?('/') # another check

  file_path = File.join(working_directory, entry_filename)
  puts "Writing file: #{file_path}"

  File.open(file_path, 'wb') do |f|
    f.write(entry.read)
  end

  bytes = File.size(file_path)

  puts "Successfully wrote file with #{bytes} bytes"
end

tar_extract.close

This code usually works, but when a file inside the tgz is too big, I get an "integer out of range" error:

Writing file: /files/working_dir/test1.tar.gz  
Successfully wrote file with 244704472 bytes 

Writing file: /files/working_dir/test2.sql
RangeError: integer 2556143960 too big to convert to `int'
from /usr/local/rvm/rubies/ruby-2.1.1/lib/ruby/site_ruby/2.1.0/rubygems/package/tar_reader/entry.rb:126:in `read'

I'm not sure what else I should try.

Looking at the Ruby source, this is the method that raises (entry.rb, `read`):

  ##
  # Reads +len+ bytes from the tar file entry, or the rest of the entry if
  # nil

  def read(len = nil)
    check_closed

    return nil if @read >= @header.size

    len ||= @header.size - @read
    max_read = [len, @header.size - @read].min

    ret = @io.read max_read
    @read += ret.size

    ret
  end
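
Since `read` accepts an optional `len` and returns nil once the entry is exhausted, one thing to try is reading each entry in fixed-size chunks instead of calling `entry.read` with no argument. The failing size, 2556143960, is larger than 2**31 - 1 (2147483647), so the error most likely comes from the underlying IO converting the requested length to a C int; that is an inference from the message, not something the source above confirms. The following is a minimal sketch under that assumption. The method name `extract_entries_in_chunks`, the `CHUNK_SIZE` constant, and the returned hash are hypothetical, and it uses core Ruby's `start_with?` rather than ActiveSupport's `starts_with?`.

```ruby
require 'rubygems/package'
require 'zlib'

# Hypothetical chunk size; any value well below 2**31 - 1 should do.
CHUNK_SIZE = 1024 * 1024

# Extracts regular files from a .tgz into working_directory, reading each
# entry in chunks so no single read asks for more than chunk_size bytes.
# Returns a hash of written file path => number of bytes written.
def extract_entries_in_chunks(tgz_path, working_directory, chunk_size = CHUNK_SIZE)
  written = {}

  Zlib::GzipReader.open(tgz_path) do |gz|
    tar_extract = Gem::Package::TarReader.new(gz)

    tar_extract.each do |entry|
      next unless entry.file?                      # skips directories too
      next if entry.full_name.start_with?('/')     # skip absolute paths

      file_path = File.join(working_directory, File.basename(entry.full_name))
      bytes = 0

      File.open(file_path, 'wb') do |f|
        # entry.read(len) returns nil once the whole entry has been read
        while (chunk = entry.read(chunk_size))
          bytes += f.write(chunk)
        end
      end

      written[file_path] = bytes
    end

    tar_extract.close
  end

  written
end
```

Calling `extract_entries_in_chunks('/path/to/archive.tgz', working_directory)` would then replace the `tar_extract.each` loop above. Besides avoiding the oversized read, this keeps memory use bounded by the chunk size rather than by the size of the largest file in the archive.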
