dataset sink support non-avro #191

Open
wants to merge 387 commits into
base: flume-1.6
@wtx626 wtx626 commented Dec 25, 2017

No description provided.

harishreedharan and others added 30 commits September 16, 2014 21:24
(Frank Yao, Ashish Paliwal, Gwen Shapira via Hari)
(Thilina Buddika, Gwen Shapira via Hari)
…es in the last commit, adding them.

(Gwen Shapira via Hari)
(Gwen Shapira via Hari)
(Santiago M. Mola via Hari)
…nt name

(Ashish Paliwal via Jarek Jarcec Cecho)
…void a race condition with forced checkpoint.

(Johny Rufus via Hari)
…ng files each time.

(Prateek Rungta via Hari)
…tocols.

(Hari Shreedharan via Jarek Jarcec Cecho)
(Hari Shreedharan via Jarek Jarcec Cecho)
(Gwen Shapira via Hari)
(Hari Shreedharan via Jarek Jarcec Cecho)
Attila Simon and others added 27 commits December 6, 2016 18:17
…y on channel exception

This patch improves rollbacks for the sequence source.
Also, it updates tests and user documentation accordingly.

This closes apache#90

Reviewers: Denes Arvay, Jeff Holoman, Bessenyei Balázs Donát

(Attila Simon via Bessenyei Balázs Donát)
Flume does not currently support environment variable interpolation in the properties file configuration.

Enabling it helps with:
* removing security credentials from config files
* copy-pastes in configuration files when defining multiple agents

This closes apache#97

Reviewers: Lior Zeno, Jeff Holoman, Shang Wu

(Bessenyei Balázs Donát via Bessenyei Balázs Donát)
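As an illustration of the interpolation idea in the commit message above, here is a minimal sketch. It assumes a `${NAME}` placeholder syntax and a hypothetical helper class `EnvInterpolationSketch`; it is not the code from the patch, and the property key in `main` is only an example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EnvInterpolationSketch {
    // Assumed placeholder syntax: ${NAME}
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    /** Replaces each ${NAME} with env.get(NAME); unknown names are left untouched. */
    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = env.get(m.group(1));
            if (replacement == null) {
                replacement = m.group(0); // keep the placeholder as written
            }
            // quoteReplacement prevents '$' and '\' in values from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Example property line; the key and variable name are illustrative only.
        System.out.println(resolve("a1.sources.r1.port = ${NC_PORT}", System.getenv()));
    }
}
```

Keeping credentials in the environment rather than in the file means the same properties file can be shared between agents that differ only in a few values.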
This patch adds an HTTP sink. Supports configurable backoff & retry for different HTTP status codes, and instrumentation.

This closes apache#84

Reviewers: Shang Wu, Jeff Holoman, Bessenyei Balázs Donát

(Ben Wheeler via Bessenyei Balázs Donát)
These simple scripts automate creation and signing of the release
artifacts. They help guarantee that the source artifacts published match
the Git tag.

Updated the How To Release documentation to explain how to use the
scripts.

This closes apache#98

Reviewers: Denes Arvay, Bessenyei Balázs Donát
It was reported that the HDFS sink had a bug where file rotation was not
reliable in secure mode.

After investigating, it turns out that this was caused by a bug in the
FlumeAuthenticator code: A "try" block in UGIExecutor.execute() was
wrapping exceptions (such as IOException) with a SecurityException.

That exception wrapping was breaking the contract of BucketWriter
because the caller (HDFSEventSink) did not understand how to react to
the SecurityException. This also likely had other negative effects in
exceptional cases.

The following changes are included in this patch:

* Remove the exception wrapping in UGIExecutor.execute().
* Add tests for exception propagation in FlumeAuthenticator
  implementations.
* Add testRotateBucketOnIOException() to TestBucketWriter as a
  regression test for the HDFS sink issue.

Closes apache#106.

Reviewers: Attila Simon, Mike Percy

(Denes Arvay via Mike Percy)
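The effect of the exception wrapping described above can be sketched as follows. All class and method names here are hypothetical stand-ins (they are not the real UGIExecutor or HDFSEventSink code); the sketch only shows why a caller that recovers from IOException stops recovering once the exception is wrapped.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class ExceptionWrappingSketch {
    /** Hypothetical "before" behaviour: every failure is wrapped in a SecurityException. */
    static <T> T executeWrapping(Callable<T> action) {
        try {
            return action.call();
        } catch (Exception e) {
            throw new SecurityException("Privileged action failed", e);
        }
    }

    /** Hypothetical "after" behaviour: the original exception propagates unchanged. */
    static <T> T executePropagating(Callable<T> action) throws Exception {
        return action.call();
    }

    /** A caller that only knows how to recover from IOException (e.g. by rotating a file). */
    static String callerOutcome(boolean wrap) {
        Callable<Void> failingWrite = () -> {
            throw new IOException("write failed");
        };
        try {
            if (wrap) {
                executeWrapping(failingWrite);
            } else {
                executePropagating(failingWrite);
            }
            return "ok";
        } catch (IOException e) {
            return "rotated"; // the recovery path the caller expects to take
        } catch (Exception e) {
            return "unhandled: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(callerOutcome(true));
        System.out.println(callerOutcome(false));
    }
}
```

With wrapping enabled the caller never reaches its IOException handler, which mirrors the unreliable rotation reported for the sink.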
Some versions of HDFS and implementations of FileSystem do not implement
the isFileClosed() method.

We perform a check for that method each time a new file is opened. This
patch lowers the severity of the message printed when we detect that the
method is not present from LOG.warn() to LOG.info().

We also remove the stack trace from the log message.

(Ping Wang via Mike Percy)
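A check for an optional method such as the one described above can be sketched with reflection. This is a generic illustration, not the sink's code: `OptionalMethodSketch` and `findOptionalMethod` are hypothetical names, and `java.util.logging` stands in for the project's logger to keep the example self-contained.

```java
import java.lang.reflect.Method;
import java.util.logging.Logger;

final class OptionalMethodSketch {
    private static final Logger LOG =
        Logger.getLogger(OptionalMethodSketch.class.getName());

    /**
     * Looks up a public method by name. Returns null when the method is absent,
     * logging at INFO level without a stack trace, since absence is an expected case.
     */
    static Method findOptionalMethod(Class<?> type, String name, Class<?>... params) {
        try {
            return type.getMethod(name, params);
        } catch (NoSuchMethodException e) {
            LOG.info(name + "() is not available on " + type.getName()
                + "; continuing without it.");
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(findOptionalMethod(String.class, "isEmpty") != null);
        System.out.println(findOptionalMethod(Object.class, "isFileClosed") != null);
    }
}
```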
… user guide

This closes apache#89

Reviewers: Denes Arvay, Jeff Holoman

(Attila Simon via Bessenyei Balázs Donát)
The flume-checkstyle module doesn't currently declare a parent pom. This
will be a problem when we try to deploy our artifacts because we are
missing a distributionManagement section in the flume-checkstyle module.
The deploy will fail with an error message that looks something like:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:2.7:deploy (default-deploy)
on project flume-checkstyle: Deployment failed: repository element was not specified in the POM inside
distributionManagement element or in -DaltDeploymentRepository=id::layout::url parameter

This patch should solve the problem.

Reviewers: Attila Simon, Denes Arvay, Bessenyei Balázs Donát

Closes apache#109.
If the HDFS Sink tries to close a file but the close fails (e.g. due to a timeout), the last block
might not end up in the COMPLETE state. In this case block recovery should happen, but because the
lease is still held by Flume, the NameNode will start the recovery process only after the hard
limit of 1 hour expires.

This change adds an explicit recoverLease() call in case of close failure.

This closes apache#127

Reviewers: Hari Shreedharan

(Denes Arvay via Bessenyei Balázs Donát)
This closes apache#126

Reviewers: Denes Arvay, Bessenyei Balázs Donát

(Marcell Hegedus via Bessenyei Balázs Donát)
When logging level is set to DEBUG, Kafka Sink and Kafka Channel may throw a NullPointerException.

This patch ensures that `metadata` is not null to avoid the exception.

This closes apache#125

Reviewers: Denes Arvay, Bessenyei Balázs Donát

(loleek via Bessenyei Balázs Donát)
…r Source

This patch addresses an edge case of the Taildir Source wherein it can miss
reading events written in the same second as the file closing.

This closes apache#128

Reviewers: Satoshi Iijima, Bessenyei Balázs Donát

(eskrm via Bessenyei Balázs Donát)
…d to data loss

This commit fixes an issue in HDFSEventSink.process(): when a BucketWriter.append()
call threw a BucketClosedException, the newly created BucketWriter was not
flushed after the processing loop.

This closes apache#129

Reviewers: Attila Simon, Mike Percy

(Denes Arvay via Mike Percy)
This patch adds the following new metrics to the FileChannel's counters:
- eventPutErrorCount: incremented if an IOException occurs during put operation.
- eventTakeErrorCount: incremented if an IOException or CorruptEventException occurs
  during take operation.
- checkpointWriteErrorCount: incremented if an exception occurs during checkpoint write.
- unhealthy: this flag represents whether the channel has started successfully
  (i.e. the replay ran without any problem), so the channel is capable of normal operation.
- closed flag: the numeric representation (1: closed, 0: open) of the negated open flag.

Closes apache#131.

Reviewers: Attila Simon, Mike Percy

(Denes Arvay via Mike Percy)
…Sink

This patch adds support for header substitution in the Kafka Sink's
kafka.topic configuration variable.

This closes apache#137.

Reviewers: Denes Arvay

(Takafumi Saito via Denes Arvay)
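Header substitution in a topic name, as described above, might look like the following sketch. It assumes a `%{headerName}` placeholder syntax and that a missing header resolves to an empty string; both are assumptions of this example, and `TopicSubstitutionSketch` is a hypothetical helper rather than the sink's implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TopicSubstitutionSketch {
    // Assumed placeholder syntax: %{headerName}
    private static final Pattern HEADER = Pattern.compile("%\\{([^}]+)\\}");

    /** Replaces each %{name} with the event header value; missing headers become "". */
    static String resolveTopic(String template, Map<String, String> headers) {
        Matcher m = HEADER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = headers.get(m.group(1));
            if (value == null) {
                value = ""; // assumption: an absent header contributes nothing
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, a template of `logs-%{region}` with an event header `region=eu` would route the event to the topic `logs-eu`.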
- Removed the unsupported PermSize and MaxPermSize settings from .travis.yml
- Updated DEVNOTES, README and Flume User Guide
- Removed the maven-compiler-plugin from the taildir-source subproject
- Changed the sourceJavaVersion and targetJavaVersion to 1.8 in the root pom.xml

(Lior Zeno via Denes Arvay)
Log4jAppender and LoadBalancingLog4jAppender resolve the local host's address at startup and
add it to each event's headers under the key "flume.client.log4j.address".

This closes apache#121.

(Andras Beni via Denes Arvay)
JMSSource created only nondurable subscriptions, which could lead to event loss when the
destination type is a topic.

This change enables durable subscription creation and lets the user specify a client id.
It also removes JMSMessageConsumerFactory, which added no value.

This closes apache#120.

Reviewers: Attila Simon, Denes Arvay

(Andras Beni via Denes Arvay)
This commit changes the Travis CI build config to use Java 8.

This closes apache#142.

Reviewers: Denes Arvay

(Attila Simon via Denes Arvay)
Cleanup after Netty initialisation fails (call this.stop())

- Make sure this.stop() releases the resources and leaves the component in
  a LifecycleAware.STOPPED state
- Added junit test to cover the invalid host scenario
- Added junit test to cover the used port scenario

This closes apache#141.

Reviewers: Denes Arvay

(Attila Simon via Denes Arvay)
This patch fixes the issue in NetcatSource which occurs if there is a problem
while binding the channel's socket to a local address and leads to a file descriptor
(socket) leak.

Reviewers: Attila Simon, Denes Arvay

(Siddharth Ahuja via Denes Arvay)
This patch adds a netcat UDP source.

Reviewers: Lior Zeno, Chris Horrocks, Bessenyei Balázs Donát

(Tristan Stevens via Bessenyei Balázs Donát)
Update Developer Guide with notes on how to upgrade Protocol Buffer
version.

Reviewers: Ashish Paliwal, Attila Simon

(Roshan Naik via Denes Arvay)
- Make avro ip filter tests more reliable by checking whether the
  caught exception is really what the test expected
- Use lambda instead of anonymous classes to make the code shorter

This closes apache#143.

Reviewers: Denes Arvay

(Attila Simon via Denes Arvay)
After a bad response, connection.getInputStream() returns null.
This patch adds a check for this.

This closes apache#139

Reviewers: Bessenyei Balázs Donát

(filippovmn via Bessenyei Balázs Donát)
The Flume user guide does not specify whether a value in an event header may be null.
If an external system generates events whose header values can be null and the user configures
Flume with Memory Channel, there is no trouble.
If the user later switches from Memory Channel to File Channel, Flume fails with an NPE.
This is because File Channel serializes events with Protocol Buffers, and header values are
defined as required in the proto file.
In this patch I have changed the value field to optional. However, Protocol Buffers has no
notation for null, and setting a field to null raises an NPE again. I added a null check before
serialization to prevent this.
There is one caveat: when an optional field is not set, deserialization sets it to a
default value, which in this case is the empty string.

Reviewers: Miklos Csanady

(Marcell Hegedus via Denes Arvay)
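The caveat in the commit message above (a null header value comes back as an empty string) can be illustrated without Protocol Buffers. In this sketch `Optional.empty()` stands in for an unset optional field; the class and method names are hypothetical and this is not the File Channel's serialization code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class NullHeaderSketch {
    /** Null check before "serialization": a null value leaves the field unset. */
    static Map<String, Optional<String>> serialize(Map<String, String> headers) {
        Map<String, Optional<String>> wire = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            wire.put(e.getKey(), Optional.ofNullable(e.getValue()));
        }
        return wire;
    }

    /** "Deserialization": an unset optional string field reads back as its default, "". */
    static Map<String, String> deserialize(Map<String, Optional<String>> wire) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (Map.Entry<String, Optional<String>> e : wire.entrySet()) {
            headers.put(e.getKey(), e.getValue().orElse(""));
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("host", null);
        // The null value survives the round trip as an empty string, not as null.
        System.out.println(deserialize(serialize(headers)));
    }
}
```

A consumer that needs to distinguish "absent" from "empty" therefore cannot rely on the header value alone after the event has passed through such a round trip.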
…eringInterceptor

- Use RegexFilteringInterceptor.class in LoggerFactory.getLogger() call
- Fix the Javadoc of the RegexFilteringInterceptor.Builder class

This closes apache#148

Reviewers: Attila Simon, Marcell Hegedus

(Peter Chen via Denes Arvay)
Contributor

@szaboferee szaboferee left a comment

Hi @wtx626

Your target branch holds the final state of the 1.6 release.

If you would like to add a new feature, please open a pull request against the trunk branch and make sure that you include the JIRA ticket number in the title.

JIRA: https://issues.apache.org/jira/projects/FLUME/issues

@asfgit
asfgit commented Aug 17, 2018

Can one of the admins verify this patch?

waidr pushed a commit to waidr/flume that referenced this pull request Jul 24, 2019
[PILIVDN-7372]change daily fetch time