
Optimizing KryoCoder in beam backend #1955

Open
@nownikhil

Description


The current implementation of KryoCoder writes the class name for every object on the output stream.
This was done because Beam can split the stream at an arbitrary point: if registrations appear only at the beginning of the stream, the later part of the stream will fail. However, we don't want to write the class name for classes that are already registered.

We can call setRegistrationRequired(true) when creating the Instantiator (

implicit val kryoCoder: KryoCoder = new KryoCoder(defaultKryoCoderConfiguration(config))
).

Then, in KryoCoder, we can keep a map of the classes that have a registration available (we can do a Try { pool.hasRegistration } once per class and cache the result for later lookups). For registered classes we use kryoPool.toBytesWithoutClass, and for all others we use kryoPool.toBytesWithClass.
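To make the proposal concrete, here is a minimal sketch of the caching idea. It is not the existing KryoCoder code: `PoolLike` is a hypothetical trait standing in for the three KryoPool methods mentioned above, and `RegistrationCache` is a name made up for illustration. It assumes that the registration lookup may throw for unregistered classes once setRegistrationRequired(true) is set, which is why the lookup is wrapped in Try.

```scala
import scala.collection.concurrent.TrieMap
import scala.util.Try

// Hypothetical stand-in for the subset of KryoPool used in this sketch.
trait PoolLike {
  def hasRegistration(cls: Class[_]): Boolean
  def toBytesWithClass(obj: AnyRef): Array[Byte]
  def toBytesWithoutClass(obj: AnyRef): Array[Byte]
}

// Hypothetical helper: remembers, per class, whether a registration exists.
final class RegistrationCache(pool: PoolLike) {
  private val registered = TrieMap.empty[Class[_], Boolean]

  // A lookup that throws is treated as "not registered" (assumption).
  def isRegistered(cls: Class[_]): Boolean =
    registered.getOrElseUpdate(cls, Try(pool.hasRegistration(cls)).getOrElse(false))

  // Registered classes skip the class name; everything else keeps it.
  def encode(obj: AnyRef): Array[Byte] =
    if (isRegistered(obj.getClass)) pool.toBytesWithoutClass(obj)
    else pool.toBytesWithClass(obj)
}
```

The sketch covers only the encode side. The decode side would still need some way to tell which of the two forms was written, which is not addressed here.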

Is there a better way to achieve this?
