
Spark: Cannot read or write UUID columns #4581

Description

RussellSpitzer (Member)

Because of the String -> fixed binary conversion, the readers and the writers are both incorrect.

The vectorized reader initializes a fixed-binary reader on a column we report as a String, causing an unsupported-reader exception:

java.lang.UnsupportedOperationException: Unsupported type: UTF8String
	at org.apache.iceberg.arrow.vectorized.ArrowVectorAccessor.getUTF8String(ArrowVectorAccessor.java:82)
	at org.apache.iceberg.spark.data.vectorized.IcebergArrowColumnVector.getUTF8String(IcebergArrowColumnVector.java:140)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.sort_addToSorter_0$(Unknown Sour

The writer is broken because it gets String columns from Spark but needs to write fixed binary.

Something like this is needed as a fix:

  private static PrimitiveWriter<UTF8String> uuids(ColumnDescriptor desc) {
    return new UUIDWriter(desc);
  }

  private static class UUIDWriter extends PrimitiveWriter<UTF8String> {
    // Reused scratch buffer; Parquet copies the bytes before the next write.
    private ByteBuffer buffer = ByteBuffer.allocate(16);

    private UUIDWriter(ColumnDescriptor desc) {
      super(desc);
    }

    @Override
    public void write(int repetitionLevel, UTF8String string) {
      // Parse the canonical string form and emit the 16 raw bytes,
      // most-significant half first (big-endian), as Parquet expects.
      UUID uuid = UUID.fromString(string.toString());
      buffer.rewind();
      buffer.putLong(uuid.getMostSignificantBits());
      buffer.putLong(uuid.getLeastSignificantBits());
      buffer.rewind();
      column.writeBinary(repetitionLevel, Binary.fromReusedByteBuffer(buffer));
    }
  }
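
The read path needs the inverse conversion. A minimal sketch of that direction, using a hypothetical helper rather than Iceberg's actual reader classes: decode the 16 fixed bytes back into the UTF8String that Spark expects for the reported String type.

  import java.nio.ByteBuffer;
  import java.util.UUID;
  import org.apache.spark.unsafe.types.UTF8String;

  public class UuidReadSide {
    // Hypothetical helper: decode 16 big-endian bytes (most-significant half
    // first, matching the writer above) into a Spark UTF8String.
    static UTF8String uuidToUtf8(byte[] bytes) {
      ByteBuffer buffer = ByteBuffer.wrap(bytes);
      UUID uuid = new UUID(buffer.getLong(), buffer.getLong());
      return UTF8String.fromString(uuid.toString());
    }
  }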

Activity

RussellSpitzer (Member, Author) commented on Apr 18, 2022

I'm working on a PR for this

RussellSpitzer (Member, Author) commented on Apr 18, 2022

Actually, after looking at this for a while, I think we should probably just always handle UUID as a binary type in Spark rather than trying to do a String conversion. If the end user needs the string representation, they can always cast?
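
To illustrate that suggestion: if Spark surfaced UUID columns as raw 16-byte binary, a user wanting the string form could register a small conversion function. A sketch, where uuid_to_string is a made-up UDF name, not an existing Spark or Iceberg function:

  import java.nio.ByteBuffer;
  import java.util.UUID;
  import org.apache.spark.sql.SparkSession;
  import org.apache.spark.sql.api.java.UDF1;
  import org.apache.spark.sql.types.DataTypes;

  public class UuidUdf {
    // Turn the 16 raw UUID bytes back into the canonical hyphenated string,
    // e.g. for use as: spark.sql("SELECT uuid_to_string(id) FROM db.tbl")
    static void register(SparkSession spark) {
      spark.udf().register("uuid_to_string", (UDF1<byte[], String>) bytes -> {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong()).toString();
      }, DataTypes.StringType);
    }
  }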

pvary (Contributor) commented on Apr 26, 2022

@RussellSpitzer: I am not sure whether this is still an actual issue or whether it was fixed in the current code, but I found a year ago that Parquet and ORC/Avro expect UUIDs differently for writes. See: #1881

And this is even before the Spark code 😄

RussellSpitzer (Member, Author) commented on Apr 26, 2022

I have kind of punted on this; I was just toying around with it for a types test and don't have any users yet 🤞 who have hit this.

wobrycki commented on Jul 15, 2022

Hi @RussellSpitzer,

when we read a Spark DataFrame using Iceberg and the DataFrame has a UUID column, exporting it to CSV gives us this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o669.csv.
: org.apache.spark.SparkException: Job aborted.
        at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:251)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186)
(...)
        ... 1 more
Caused by: java.lang.ClassCastException: class [B cannot be cast to class org.apache.spark.unsafe.types.UTF8String ([B is in module java.base of loader 'bootstrap'; org.apache.spark.unsafe.types.UTF8String is in unnamed module of loader 'app')
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:46)
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:46)
        at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:195)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)

Could this be related to this issue?
We could reproduce it with a table containing only a single column of UUID type.
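
A minimal sketch of that reproduction, assuming a HadoopCatalog with a local warehouse path (the paths and names here are illustrative, and the table would need data written by another engine, since Spark's UUID write path is broken as well):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.iceberg.Schema;
  import org.apache.iceberg.catalog.TableIdentifier;
  import org.apache.iceberg.hadoop.HadoopCatalog;
  import org.apache.iceberg.types.Types;
  import org.apache.spark.sql.SparkSession;

  public class UuidCsvRepro {
    public static void main(String[] args) {
      // Create an Iceberg table whose only column is a UUID.
      HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "/tmp/warehouse");
      Schema schema = new Schema(
          Types.NestedField.required(1, "id", Types.UUIDType.get()));
      catalog.createTable(TableIdentifier.of("db", "uuid_only"), schema);

      // Reading the table back through Spark and exporting to CSV fails with
      // the ClassCastException above ([B cannot be cast to UTF8String).
      SparkSession spark = SparkSession.builder().getOrCreate();
      spark.read().format("iceberg").load("/tmp/warehouse/db/uuid_only")
          .write().csv("/tmp/uuid_export");
    }
  }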

RussellSpitzer (Member, Author) commented on Jul 15, 2022

Yep, currently the Spark code cannot read or write UUIDs correctly.

github-actions commented on Jan 23, 2023

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the 'not-stale' label, but commenting on the issue is preferred when possible.

wobrycki commented on Mar 20, 2023

@RussellSpitzer could you point us to the code where the adjustments are needed?
Should it be SparkValueReaders.java and SparkValueWriters.java?

I guess there is no test that writes a UUID column, since that is not working yet, so we should probably adjust tests like TestSparkParquetWriter as well.

Normally uuids(ColumnDescriptor desc) returns an instance of ValueWriter rather than PrimitiveWriter; what is the reason we are going for PrimitiveWriter here?

RussellSpitzer (Member, Author) commented on Mar 20, 2023

Sorry, we totally deprioritized this, so I haven't looked at it in a while and couldn't tell you why.

self-assigned this on Apr 25, 2023

6 remaining items
