@Internal public class ContinuousFileReaderOperator<OUT,T extends TimestampedInputSplit> extends AbstractStreamOperator<OUT> implements OneInputStreamOperator<T,OUT>, OutputTypeConfigurable<OUT>
This operator reads the splits received from the preceding ContinuousFileMonitoringFunction. Contrary to the ContinuousFileMonitoringFunction, which has a parallelism of 1, this operator can have a degree of parallelism (DOP) > 1.
This implementation uses a MailboxExecutor to execute each action and follows a state machine approach. The workflow is the following:
1. Start in IDLE.
2. Upon receiving a split, switch to OPENING and enqueue a mail to process it.
3. Switch to READING, read one record, re-enqueue self.
4. When no more records or splits are available, switch back to IDLE.
On close:
1. If IDLE, then close immediately.
2. Otherwise switch to CLOSING and call yield in a loop until the state is CLOSED.
3. yield() causes remaining records (and splits) to be processed in the same way as above.
Using MailboxExecutor avoids the need for explicit synchronization. At most one mail should be enqueued at any given time.
Using the FSM approach makes it possible to explicitly define states and enforce transitions between them.
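The workflow above can be illustrated with a minimal, self-contained sketch. It is not Flink's implementation: the class `ReaderFsmSketch`, the in-memory `mailbox` queue standing in for MailboxExecutor, the `yieldOnce()` helper standing in for yield(), and the modeling of a split as a list of strings are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

/** Hypothetical, simplified model of the reader workflow; not Flink's actual classes. */
final class ReaderFsmSketch {
    enum State { IDLE, OPENING, READING, CLOSING, CLOSED }

    // Stand-in for the mailbox: a queue of actions run by a single thread.
    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    // Assumption: each "split" is modeled as a list of records.
    private final Queue<List<String>> splits = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();
    private Iterator<String> current;
    private State state = State.IDLE;

    State state() { return state; }
    List<String> output() { return output; }

    /** Upon receiving a split: queue it and, if idle, schedule a mail to process it. */
    void addSplit(List<String> split) {
        splits.add(split);
        if (state == State.IDLE) {
            state = State.OPENING;
            mailbox.add(this::processRecord);
        }
    }

    /** One mail: open the next split if needed, read one record, re-enqueue self. */
    private void processRecord() {
        while ((current == null || !current.hasNext()) && !splits.isEmpty()) {
            current = splits.poll().iterator();
        }
        if (current != null && current.hasNext()) {
            if (state == State.OPENING) {
                state = State.READING;
            }
            output.add(current.next());
            mailbox.add(this::processRecord); // keeps at most one mail enqueued
        } else {
            current = null;
            state = (state == State.CLOSING) ? State.CLOSED : State.IDLE;
        }
    }

    /** Stand-in for yield(): runs one pending mail, if any. */
    boolean yieldOnce() {
        Runnable mail = mailbox.poll();
        if (mail == null) {
            return false;
        }
        mail.run();
        return true;
    }

    /** On close: finish immediately when idle, otherwise drain via yieldOnce(). */
    void close() {
        if (state == State.IDLE || state == State.CLOSED) {
            state = State.CLOSED;
            return;
        }
        state = State.CLOSING;
        while (state != State.CLOSED) {
            if (!yieldOnce()) {
                throw new IllegalStateException("no mail pending in state " + state);
            }
        }
    }

    public static void main(String[] args) {
        ReaderFsmSketch reader = new ReaderFsmSketch();
        reader.addSplit(Arrays.asList("a", "b"));
        reader.yieldOnce();                  // reads "a"
        reader.addSplit(Arrays.asList("c")); // queued while reading
        reader.close();                      // drains "b" and "c"
        System.out.println(reader.state() + " " + reader.output());
    }
}
```

Because every action runs from the single mailbox queue, no locks are needed in this sketch, and the only way a second mail could appear is through `addSplit` while idle, which is what keeps the queue at one pending mail.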
Fields inherited from class AbstractStreamOperator: chainingStrategy, config, latencyStats, metrics, output, processingTimeService

| Modifier and Type | Method and Description |
|---|---|
| void | close(): This method is called after all records have been added to the operators via the methods Input.processElement(StreamRecord), or TwoInputStreamOperator.processElement1(StreamRecord) and TwoInputStreamOperator.processElement2(StreamRecord). |
| void | dispose(): This method is called at the very end of the operator's life, both in the case of a successful completion of the operation, and in the case of a failure and canceling. |
| void | initializeState(org.apache.flink.runtime.state.StateInitializationContext context): Stream operators with state which can be restored need to override this hook method. |
| void | open(): This method is called immediately before any elements are processed; it should contain the operator's initialization logic. |
| void | processElement(StreamRecord<T> element): Processes one element that arrived on this input of the MultipleInputStreamOperator. |
| void | processWatermark(Watermark mark): Processes a Watermark that arrived on the first input of this two-input operator. |
| void | setOutputType(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outTypeInfo, org.apache.flink.api.common.ExecutionConfig executionConfig): Is called by the org.apache.flink.streaming.api.graph.StreamGraph#addOperator(Integer, String, StreamOperator, TypeInformation, TypeInformation, String) method when the StreamGraph is generated. |
| void | snapshotState(org.apache.flink.runtime.state.StateSnapshotContext context): Stream operators with state, which want to participate in a snapshot need to override this hook method. |
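
The method descriptions above imply an order of calls over the operator's life. The sketch below illustrates that order with a hypothetical `MiniOperator` interface and `run` driver; neither is part of Flink, and calling `initializeState` before `open` is an assumption of this sketch. It also shows the two properties the descriptions state: an exception thrown by `close()` propagates, and `dispose()` runs both on success and on failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical driver illustrating the lifecycle order; not Flink's task code. */
final class OperatorLifecycleSketch {

    /** Minimal stand-in for an operator; not Flink's StreamOperator interface. */
    interface MiniOperator<T> {
        void initializeState() throws Exception;
        void open() throws Exception;
        void processElement(T element) throws Exception;
        void close() throws Exception;
        void dispose() throws Exception;
    }

    /** Drives one operator through its life; dispose() runs even when a step fails. */
    static <T> void run(MiniOperator<T> op, List<T> input) throws Exception {
        try {
            op.initializeState(); // assumption: state is restored before open()
            op.open();            // called immediately before any elements
            for (T element : input) {
                op.processElement(element);
            }
            op.close();           // after all records; failures propagate
        } finally {
            op.dispose();         // very end of life, success or failure
        }
    }

    /** Records each lifecycle call so the order can be inspected. */
    static final class Recording implements MiniOperator<String> {
        final List<String> calls = new ArrayList<>();
        private final boolean failOnClose;

        Recording(boolean failOnClose) {
            this.failOnClose = failOnClose;
        }

        @Override public void initializeState() { calls.add("initializeState"); }
        @Override public void open() { calls.add("open"); }
        @Override public void processElement(String e) { calls.add("processElement:" + e); }

        @Override public void close() {
            calls.add("close");
            if (failOnClose) {
                // Simulates a failed flush of buffered data.
                throw new IllegalStateException("flush failed");
            }
        }

        @Override public void dispose() { calls.add("dispose"); }
    }

    public static void main(String[] args) throws Exception {
        Recording op = new Recording(false);
        run(op, Arrays.asList("a", "b"));
        System.out.println(op.calls);
    }
}
```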
Methods inherited from class AbstractStreamOperator: getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getTimeServiceManager, getUserCodeClassloader, initializeState, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processWatermark1, processWatermark2, reportOrForwardLatencyMarker, setChainingStrategy, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setProcessingTimeService, setup, snapshotState
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface OneInputStreamOperator: setKeyContextElement
Methods inherited from interface StreamOperator: getMetricGroup, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState
Methods inherited from interface CheckpointListener: notifyCheckpointAborted, notifyCheckpointComplete
Methods inherited from interface KeyContext: getCurrentKey, setCurrentKey
Methods inherited from interface Input: processLatencyMarker

public void initializeState(org.apache.flink.runtime.state.StateInitializationContext context) throws Exception
Description copied from class: AbstractStreamOperator
Stream operators with state which can be restored need to override this hook method.
Specified by: initializeState in interface StreamOperatorStateHandler.CheckpointedStreamOperator
Overrides: initializeState in class AbstractStreamOperator<OUT>
Parameters:
context - context that allows registering different states.
Throws: Exception

public void open() throws Exception
Description copied from class: AbstractStreamOperator
This method is called immediately before any elements are processed; it should contain the operator's initialization logic.
The default implementation does nothing.
Specified by: open in interface StreamOperator<OUT>
Overrides: open in class AbstractStreamOperator<OUT>
Throws: Exception - An exception in this method causes the operator to fail.

public void processElement(StreamRecord<T> element) throws Exception
Description copied from interface: Input
Processes one element that arrived on this input of the MultipleInputStreamOperator. This method is guaranteed to not be called concurrently with other methods of the operator.
Specified by: processElement in interface Input<T extends TimestampedInputSplit>
Throws: Exception

public void processWatermark(Watermark mark) throws Exception
Description copied from interface: Input
Processes a Watermark that arrived on the first input of this two-input operator. This method is guaranteed to not be called concurrently with other methods of the operator.
Specified by: processWatermark in interface Input<T extends TimestampedInputSplit>
Overrides: processWatermark in class AbstractStreamOperator<OUT>
Throws: Exception
See Also: Watermark

public void dispose() throws Exception
Description copied from class: AbstractStreamOperator
This method is called at the very end of the operator's life, both in the case of a successful completion of the operation, and in the case of a failure and canceling.
This method is expected to make a thorough effort to release all resources that the operator has acquired.
Specified by: dispose in interface StreamOperator<OUT>
Specified by: dispose in interface org.apache.flink.util.Disposable
Overrides: dispose in class AbstractStreamOperator<OUT>
Throws: Exception

public void close() throws Exception
Description copied from class: AbstractStreamOperator
This method is called after all records have been added to the operators via the methods Input.processElement(StreamRecord), or TwoInputStreamOperator.processElement1(StreamRecord) and TwoInputStreamOperator.processElement2(StreamRecord).
The method is expected to flush all remaining buffered data. Exceptions during this flushing of buffered data should be propagated, in order to cause the operation to be recognized as failed, because the last data items are not processed properly.
Specified by: close in interface StreamOperator<OUT>
Overrides: close in class AbstractStreamOperator<OUT>
Throws: Exception - An exception in this method causes the operator to fail.

public void snapshotState(org.apache.flink.runtime.state.StateSnapshotContext context) throws Exception
Description copied from class: AbstractStreamOperator
Stream operators with state, which want to participate in a snapshot need to override this hook method.
Specified by: snapshotState in interface StreamOperatorStateHandler.CheckpointedStreamOperator
Overrides: snapshotState in class AbstractStreamOperator<OUT>
Parameters:
context - context that provides information and means required for taking a snapshot
Throws: Exception

public void setOutputType(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outTypeInfo, org.apache.flink.api.common.ExecutionConfig executionConfig)
Description copied from interface: OutputTypeConfigurable
Is called by the org.apache.flink.streaming.api.graph.StreamGraph#addOperator(Integer, String, StreamOperator, TypeInformation, TypeInformation, String) method when the StreamGraph is generated. The method is called with the output TypeInformation which is also used for the StreamTask output serializer.
Specified by: setOutputType in interface OutputTypeConfigurable<OUT>
Parameters:
outTypeInfo - Output type information of the StreamTask
executionConfig - Execution configuration

Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.