Channel: Active questions tagged ruby - Stack Overflow

Modify column type in Parquet file with ruby (using parquet Gem)

I have a number of Parquet files in our data warehouse. Roughly 700 of the earlier files have the schema type of one column set to string when it should have been int32. I understand that Parquet files are immutable, so I'm looking for the best way to rewrite these files with the correct column type. I am using Ruby with the red-parquet gem.
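One thing that may be worth checking before any rewrite is whether every string value in the affected column actually fits in an int32. The following is a minimal sketch in plain Ruby, independent of red-parquet; `int32_or_nil` and `unconvertible_values` are hypothetical helper names, and it assumes the column's values have already been read into an array of strings (with `nil` for nulls).

```ruby
# Inclusive range of values representable as a signed 32-bit integer.
INT32_RANGE = (-2**31..2**31 - 1)

# Hypothetical helper: returns the Integer for a string holding a base-10
# int32 value, or nil when the string cannot be converted safely.
def int32_or_nil(text)
  value = Integer(text, 10)
  INT32_RANGE.cover?(value) ? value : nil
rescue ArgumentError, TypeError
  # Raised for non-numeric strings and for non-string input such as nil.
  nil
end

# Returns the non-null values that would not survive a string -> int32 rewrite.
def unconvertible_values(strings)
  strings.compact.reject { |text| int32_or_nil(text) }
end
```

Running `unconvertible_values` over a column's values first would show whether a file can be rewritten as-is or needs cleaning up beforehand.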

I have tried casting the column to int32 and then saving the file to a new location. It doesn't raise an error, but it doesn't work either. I've outlined the method I'm using below. Any help would be much appreciated.

    def castCol(col = nil)
      filesWritten = 0
      getParquets.each do |file|
        table = Arrow::Table.load(file)
        if table.heading.data_type == "string"
          newFileLoc = @saveDir + File.path(file)
          puts newFileLoc
          # Create the directory if required
          unless File.directory?(File.dirname(newFileLoc))
            FileUtils.mkdir_p(File.dirname(newFileLoc))
          end
          table.heading.cast('int32')
          table.save(newFileLoc)
          filesWritten += 1
        end
      end
      puts "Number of files written: #{filesWritten}"
    end
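The method above builds the output location with `@saveDir + File.path(file)` and guards `FileUtils.mkdir_p` with a directory check. As a small sketch of that step only, using the Ruby standard library: `mirrored_path` is a hypothetical helper name, and it assumes the goal is to mirror each source path underneath the save directory.

```ruby
require "fileutils"

# Hypothetical helper: maps a source file to the same relative location
# under save_dir and makes sure the target directory exists.
def mirrored_path(save_dir, file)
  # File.join inserts or collapses separators, so save_dir may be given
  # with or without a trailing slash.
  target = File.join(save_dir, File.path(file))
  # mkdir_p creates missing parent directories and does nothing when the
  # directory already exists, so no File.directory? check is needed.
  FileUtils.mkdir_p(File.dirname(target))
  target
end
```

With this, the path handling inside the loop could be reduced to something like `newFileLoc = mirrored_path(@saveDir, file)`.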
