I have a number of Parquet files in our data warehouse. Some of the earlier files (about 700) have the schema type for one column set to string when it should have been int32. I understand that Parquet files are immutable, so I'm looking for the best way to rewrite these files with the correct column type. I am using Ruby with the red-parquet gem.
I have tried casting the column to int32 and then saving the file to a new location. It doesn't raise an error, but it doesn't work either. The method I'm using is outlined below. Any help would be much appreciated.
    def castCol(col = nil)
      filesWritten = 0
      getParquets.each do |file|
        table = Arrow::Table.load(file)
        if table.heading.data_type == "string"
          newFileLoc = @saveDir + File.path(file)
          puts newFileLoc
          # Create Dir if Required
          unless File.directory?(File.dirname(newFileLoc))
            FileUtils.mkdir_p(File.dirname(newFileLoc))
          end
          table.heading.cast('int32')
          table.save(newFileLoc)
          filesWritten += 1
        end
      end
      puts "Number of Files Written: #{filesWritten}"
    end
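
For reference, the "save to a new location" step on its own can be sketched with only Ruby's standard library. This is a minimal sketch, not part of red-parquet: `mirrored_path` and its arguments are hypothetical names, and it assumes the new location should mirror the source file's path under the save directory, as `@saveDir + File.path(file)` does in the method above.

```ruby
require "fileutils"

# Hypothetical helper (not a red-parquet API): returns the destination path
# for a rewritten file, mirroring the source path under save_dir, and makes
# sure the parent directory exists.
def mirrored_path(save_dir, file)
  # File.join collapses the duplicate separator when `file` is an absolute
  # path, so "/out" and "/data/a.parquet" give "/out/data/a.parquet".
  destination = File.join(save_dir, file.to_s)
  # mkdir_p does nothing if the directory already exists, so no separate
  # File.directory? check is needed.
  FileUtils.mkdir_p(File.dirname(destination))
  destination
end
```

With a helper like this, the loop body would only need something like `newFileLoc = mirrored_path(@saveDir, file)` before saving. It only prepares the path and does not touch the column type.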