Data processing
You do not need to know any programming language to build your own data processing and extract the data that matters most to you and your production plant. The ETL (Extract, Transform, Load) wizard lets you create a process on your own (or with the support of our experts) from drag-and-drop elements called processing nodes. You can also use the ready-made templates built into the IPLAS system.
Once the process is running, it turns the incoming industrial data streams into what matters most to you: knowledge that helps you improve production in your plant. Here you can create your own alarm and notification system, and verify and aggregate data. What if at some point you want a process to analyse something else? No problem: you can modify it at any time. See how it works.
Example of building an ETL process
See how we built a simple ETL process that calculates the total number of items manufactured by a specific machine in each shift and saves these totals to the database one after another.
A few words of explanation for the video above:
- Input data: a variable holding the number of items produced in given periods.
- Nodes used and their application:
Node 1. Input channel: specifies the input data and its source, and passes on the variables on which the rest of the processing is built.
Node 2. Time stamp: adds time variables to the processing; the user specifies which additional variables to derive from the frame generation time.
Node 3. Work shifts: defines the working hours of individual shifts and selects the result variables related to them. It generates new variables with shift data, such as the shift number.
Node 4. Grouping: selects the type of aggregation and the grouping variables that divide the data into groups, e.g. the sum of variable values for each shift number.
Node 5. Output table: defines the database and destination, and assigns variables to specific columns. Analyses are generated from the data stored in the database.
The result is a set of processed values in the database, ready to be used in an analysis: the sum of items produced in each shift.
The data generated by the ETL process can then be presented in an analysis, as shown in the video.
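To make the example concrete, here is a minimal Python sketch of what nodes 2 to 4 compute. It is not IPLAS code and says nothing about how the wizard works internally: frames are modelled as dictionaries, and the `timestamp` key, the shift hours (06:00-14:00, 14:00-22:00, 22:00-06:00) and the rule that a night shift belongs to the date on which it starts are all assumptions made for illustration. Writing the totals to the database (node 5) is left out.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical shift plan: (shift number, start hour, end hour); shift 3 crosses midnight.
SHIFTS = [(1, 6, 14), (2, 14, 22), (3, 22, 6)]


def assign_shift(ts: datetime) -> tuple[str, int]:
    """'Time stamp' + 'Work shifts' nodes: derive (shift date, shift number) from the frame time."""
    for number, start, end in SHIFTS:
        if start < end and start <= ts.hour < end:
            return ts.date().isoformat(), number
        if start > end and (ts.hour >= start or ts.hour < end):
            # Night shift: hours after midnight belong to the shift that started the day before.
            day = ts.date() if ts.hour >= start else (ts - timedelta(days=1)).date()
            return day.isoformat(), number
    raise ValueError(f"no shift covers {ts}")


def sum_per_shift(frames: list[dict]) -> dict[tuple[str, int], int]:
    """'Grouping' node: sum the Pieces variable for each (shift date, shift number)."""
    totals: dict[tuple[str, int], int] = defaultdict(int)
    for frame in frames:
        if frame.get("Pieces") is None:
            continue  # frames without a value do not contribute to the sum
        totals[assign_shift(frame["timestamp"])] += frame["Pieces"]
    return dict(totals)


frames = [
    {"timestamp": datetime(2024, 3, 4, 7, 30), "Pieces": 40},
    {"timestamp": datetime(2024, 3, 4, 13, 10), "Pieces": 35},
    {"timestamp": datetime(2024, 3, 4, 15, 0), "Pieces": 50},
    {"timestamp": datetime(2024, 3, 5, 2, 45), "Pieces": 20},  # night shift that began on 4 March
]
print(sum_per_shift(frames))
# {('2024-03-04', 1): 75, ('2024-03-04', 2): 50, ('2024-03-04', 3): 20}
```

Each key of the result corresponds to one row the Output table node would write: one total per shift.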
In the graphical ETL wizard, you select elements and drag them into the workspace to build your own data processing. These “blocks” or “squares”, whatever you prefer to call them, are called processing nodes. By function, they fall into four categories: input nodes, filters, transformations and output nodes. Below you will find a description of each, its available options and an example of how to use it. If anything is still unclear, please do not hesitate to write to us: firstname.lastname@example.org.
Input nodes (Input channel) introduce input variables into the processing.
The “Input channel” node introduces frames with variables into the processing. It lets you select the installation, the channel and specific variables. Every processing requires this node.
EXAMPLE OF APPLICATION:
To configure the “Input channel” node, first select the installation, in this case “Production_lines”, and the channel, e.g. “Line_no._1”. Then select the variables you need in the rest of the processing, here Pieces and Product_code. After you save the configuration, frames with the variables Pieces and Product_code are sent to the next node.
| Term | Description |
|---|---|
| Installations | Logical mapping of the location where connections with a data source (e.g. a PLC controller) are established. |
| Channel | Connection between the data source and the installation on the IPLAS server through which the data is transferred. |
| Frames | Not shown in the node configuration. Data is sent for processing in frames, which contain variables with specific values. |
| Variables | Elements that hold specific values; the components of a frame. |
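The effect of the “Input channel” configuration can be pictured as a filter on the frame source plus a selection of variables. The Python sketch below assumes a hypothetical frame layout (`installation`, `channel`, `time` and a `values` dictionary) and a made-up extra variable `Temperature`; only the installation, channel and variable names come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputChannelConfig:
    """Hypothetical model of the settings chosen in the 'Input channel' node."""
    installation: str
    channel: str
    variables: tuple[str, ...]


def input_channel(config: InputChannelConfig, incoming: list[dict]) -> list[dict]:
    """Keep frames from the configured installation and channel, with only the selected variables."""
    selected = []
    for frame in incoming:
        if (frame["installation"], frame["channel"]) != (config.installation, config.channel):
            continue  # frame comes from another source
        values = frame["values"]
        # The frame time is kept for later nodes (e.g. 'Time stamp'); a selected
        # variable missing from the frame is passed on as None.
        selected.append({"time": frame["time"], **{n: values.get(n) for n in config.variables}})
    return selected


config = InputChannelConfig("Production_lines", "Line_no._1", ("Pieces", "Product_code"))
incoming = [
    {"installation": "Production_lines", "channel": "Line_no._1", "time": "08:00",
     "values": {"Pieces": 12, "Product_code": "A-100", "Temperature": 61.5}},
    {"installation": "Production_lines", "channel": "Line_no._2", "time": "08:00",
     "values": {"Pieces": 9, "Product_code": "B-200"}},
]
print(input_channel(config, incoming))
# [{'time': '08:00', 'Pieces': 12, 'Product_code': 'A-100'}]
```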
Filters (Last frames, Required values) let you keep only current data and streamline the processing.
The “Required” filter lets you specify the variables that decide whether data is passed on to subsequent nodes.
The filter rejects any data frame that has no value in any of the required variables, which means the entire frame is not passed on to the next nodes.
The node configuration also lets you reverse this behaviour. In that case, frames that have no values for the required variables are forwarded to the next processing nodes.
EXAMPLE OF APPLICATION:
If we set A and B as required variables, the filter rejects any frame that has no value for A or for B. Whether the frame has a value for variable C does not affect the filter.
With reverse mode turned on, each frame that has no value for A and B is passed to the next node; any other frame is not. Again, the value of C, or its absence, does not affect the filter.
| Option | Description |
|---|---|
| Input variables | The variables that must have a value. If they do, the filter passes the frame, with all its other variables, on to the next block; otherwise, nothing is passed on. |
| Reverse mode enabled | Checkbox that reverses the behaviour: the block forwards the frame only if the selected variables have no value. |
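A minimal Python sketch of the filter logic, with frames modelled as dictionaries and a missing value represented as `None` or an absent key. Reverse mode is assumed here to pass exactly the frames that normal mode rejects; the description above could also be read as requiring all listed variables to be empty, so treat this as one possible interpretation.

```python
def is_filled(frame: dict, name: str) -> bool:
    """A variable counts as having a value if it is present and not None."""
    return frame.get(name) is not None


def required_filter(frames: list[dict], required: list[str], reverse: bool = False) -> list[dict]:
    """Sketch of the 'Required' filter node.

    Normal mode: forward a frame only if every required variable has a value.
    Reverse mode (assumed complement): forward only the frames normal mode rejects.
    Variables not listed as required are ignored.
    """
    return [f for f in frames if all(is_filled(f, n) for n in required) != reverse]


frames = [
    {"A": 1, "B": 2, "C": None},  # A and B present
    {"A": 1, "C": 3},             # B missing
    {"C": 5},                     # A and B missing
]
print(required_filter(frames, ["A", "B"]))                # [{'A': 1, 'B': 2, 'C': None}]
print(required_filter(frames, ["A", "B"], reverse=True))  # [{'A': 1, 'C': 3}, {'C': 5}]
```

As in the example above, the missing value of C in the first frame does not stop it from passing, because C is not a required variable.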
Transformations (e.g. Grouping, Determining increments) process data and generate new variables from the input variables.
The “HeartBeat Generator” node maintains information about processing activity. When data frames appear infrequently, the node generates frames with no variable values (empty frames). In the configuration, define the delay (the time between the present moment and the generation time of an empty frame), the interval (the time between consecutive generated frames) and the unit in which both are expressed. You can also select the “Adaptive delay” option, which automatically adjusts the delay based on how frequently data frames arrive; the delay will never be smaller than the value specified by the user.
“Failmode”: if frames containing data arrive with a delay greater than the delay configured for generated heartbeats (empty frames), the processing stops. With this option disabled, the processing continues, but the frame generation times are artificially adjusted to preserve the chronology of the readings.
EXAMPLE OF APPLICATION:
In the example shown, the delay is 5, the interval is 1 and the unit is minute. This means that when no frame has appeared in the last five minutes, the node generates an empty frame whose generation time is five minutes behind the current time. It repeats this every minute until a data frame arrives for processing, and then the cycle starts over.
When the “Adaptive delay” option is checked, it affects when empty frames are generated. In this example, empty frames start to appear no earlier than 5 minutes after the last frame containing a value. If data delivery is delayed further, the node waits longer for late frames; when frames start arriving sooner again, the adaptive delay gradually shortens back to the user-defined value.
If the “Failmode” option is selected, the processing stops when the chronology of data frames and empty frames cannot be determined. This can happen when, for some reason, data frames are delivered with a delay greater than the configured empty-frame delay.
| Option | Description |
|---|---|
| Delay | Integer. How far behind the current time empty frames are generated; the time unit is set under Unit. |
| Adaptive delay | Automatically lengthens or shortens the delay relative to the current time, depending on how late frames are delivered. The delay never falls below the user-defined value. |
| Interval | Time between generated frames; the time unit is set under Unit. |
| Failmode | Stops the processing if the chronology of the frames cannot be determined. |
| Unit | Time unit of the delay and the interval: millisecond, second, minute, hour or day. |
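The timing rules above can be simulated in a few lines of Python. This is a sketch under stated assumptions, not the node's implementation: all times are plain numbers in one unit (minutes), an empty frame is emitted only when more than `delay` has passed since the last data frame (the exact boundary is not documented), adaptive delay is not modelled, and the chronology fix used when failmode is off (moving a late frame's timestamp forward to the last forwarded one) is a hypothetical stand-in for the adjustment the node performs.

```python
def heartbeat_frames(data_times, start, end, delay, interval):
    """Return (tick, empty-frame timestamp) pairs produced between start and end.

    At every tick (one per `interval`), if more than `delay` has passed since the
    most recent data frame, an empty frame stamped `tick - delay` is generated.
    """
    arrived = sorted(data_times)
    result = []
    tick = start
    while tick <= end:
        seen = [t for t in arrived if t <= tick]
        if not seen or tick - seen[-1] > delay:
            result.append((tick, tick - delay))
        tick += interval
    return result


def forward_in_order(events, failmode):
    """Forward (kind, timestamp) frames in arrival order with non-decreasing timestamps.

    A frame stamped earlier than one already forwarded breaks the chronology.
    With failmode the processing stops (raised as RuntimeError here); without it,
    the late frame's timestamp is moved forward (hypothetical adjustment rule).
    """
    forwarded = []
    last = float("-inf")
    for kind, ts in events:
        if ts < last:
            if failmode:
                raise RuntimeError(f"{kind} frame stamped {ts} is older than {last}")
            ts = last
        forwarded.append((kind, ts))
        last = ts
    return forwarded


# Delay 5, interval 1, unit minute; data frames at minutes 0 and 10.
print(heartbeat_frames([0, 10], start=0, end=14, delay=5, interval=1))
# [(6, 1), (7, 2), (8, 3), (9, 4)]

# A data frame stamped 1.5 arrives after the empty frame stamped 2 was emitted.
events = [("data", 0), ("empty", 1), ("empty", 2), ("data", 1.5)]
print(forward_in_order(events, failmode=False))
# [('data', 0), ('empty', 1), ('empty', 2), ('data', 2)]
```

The first call reproduces the cycle from the example: empty frames every minute, five minutes behind the clock, until the data frame at minute 10 resets it.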
Cycle time intervals
Output nodes (Output table) save the results of the processing.
The “Output table” node is one of the two output nodes. It lets you select the database to which frames with variables are sent.
Both an internal and an external database are available. With the internal database, a destination for saving variables must be created or imported before you start building the processing. With an external database, remember to create a suitable table in the selected database to receive the data frames. After selecting the destination, select the variables to be saved and the columns they are assigned to. A variable can be marked as a key variable: for the same value of the key variable (e.g. the same month name or the same production line), the values of the other variables are updated in the existing row.
EXAMPLE OF APPLICATION:
In this example, three variables (l_szt, code, date) are written to the database. They can be found in the analyses by selecting the destination Production_Katowice. Rows are updated based on the product code and date, so each row holds up-to-date information about products of a given type manufactured on a given day (each distinct pair of code and date values gets its own row). Without key variables, there would be as many rows as saved frames. The “Quarter” data scope is selected in the example. During continuous processing, data frames are written to the database as they arrive. However, if the processing is stopped and restarted, e.g. an hour later, only data from the last 15 minutes, counted back from the arrival of the last frame, is taken into account.
| Option | Description |
|---|---|
| Input variables | The delivered frames, which are sent to the selected destination in the database. |
| Destination | The database table that receives the frames with variables. |
| Scope of data | Filters out old, irrelevant data supplied to the processing by limiting it to the period of interest. |
| Key variable | Variable whose values identify a row; for the same key value, the existing row is updated instead of a new one being added. |
| Data type | Data type of the variable; it must match the data type of the target column. |
| Variables saved to the destination | Table of the variables written to the destination during processing. |
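Key variables behave like a primary key with insert-or-update semantics. The Python sketch below uses an in-memory SQLite database as a stand-in destination; the table name `production_katowice`, the frame layout and the reading of the “Quarter” scope as “15 minutes back from the newest frame” are assumptions for illustration only.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory stand-in for the destination; the key variables (code, date) form the primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE production_katowice ("
    "code TEXT NOT NULL, date TEXT NOT NULL, l_szt INTEGER, "
    "PRIMARY KEY (code, date))"
)


def within_scope(frames: list[dict], scope: timedelta = timedelta(minutes=15)) -> list[dict]:
    """Assumed 'Quarter' scope: drop frames older than `scope` before the newest frame."""
    if not frames:
        return []
    newest = max(f["time"] for f in frames)
    return [f for f in frames if newest - f["time"] <= scope]


def write_frames(conn: sqlite3.Connection, frames: list[dict]) -> None:
    """Insert each frame; a frame with an existing (code, date) key replaces that row."""
    for f in frames:
        conn.execute(
            "INSERT OR REPLACE INTO production_katowice (code, date, l_szt) VALUES (?, ?, ?)",
            (f["code"], f["date"], f["l_szt"]),
        )
    conn.commit()


# Backlog after a restart: the first frame is an hour older than the newest one.
t0 = datetime(2024, 3, 4, 12, 0)
backlog = [
    {"time": t0, "code": "A-100", "date": "2024-03-04", "l_szt": 10},
    {"time": t0 + timedelta(minutes=50), "code": "A-100", "date": "2024-03-04", "l_szt": 55},
    {"time": t0 + timedelta(minutes=55), "code": "B-200", "date": "2024-03-04", "l_szt": 7},
    {"time": t0 + timedelta(minutes=60), "code": "A-100", "date": "2024-03-04", "l_szt": 60},
]
write_frames(conn, within_scope(backlog))
print(conn.execute("SELECT code, date, l_szt FROM production_katowice ORDER BY code").fetchall())
# [('A-100', '2024-03-04', 60), ('B-200', '2024-03-04', 7)]
```

Dropping `PRIMARY KEY (code, date)` from the table definition turns every write into a plain insert, so each frame would produce its own row, as described above for an output table without key variables.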