Apache Nifi Cloudera Manager Integration

Apache NiFi is an application for managing dataflows between multiple systems. However, the Apache NiFi project does not release parcels for the Cloudera distribution, so it cannot be added to Cloudera Manager out of the box. This is a step-by-step guide to integrating Apache NiFi with Cloudera Manager.

What is Apache NiFi?

Put simply, NiFi was built to automate the flow of data between systems. While the term ‘dataflow’ is used in a variety of contexts, we use it here to mean the automated and managed flow of information between systems. This problem space has been around ever since enterprises had more than one system, where some of the systems created data and some of the systems consumed data.

Apache Nifi Cloudera Manager Prerequisites

  • Maven 3.1.x or later (I used Apache Maven 3.5.4)
  • Java 8.x must be installed
  • Python 2.7.x or later
  • Git must be installed

Note: This Apache NiFi parcel is not supported on RHEL 7 / CentOS 7.

# yum install maven-*
# yum install git
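
Before building, it can help to confirm that the prerequisites listed above are actually on the PATH. The following is a minimal sketch, not part of the original procedure: version_ge is a hypothetical helper name, it assumes GNU sort (for the -V option), and the way the Maven version is read from the first line of mvn -v output is an assumption about that output.

```shell
#!/bin/sh
# Sketch: check that the build prerequisites are installed.
# version_ge is a hypothetical helper; it relies on GNU sort -V.

# Succeeds if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether each required tool is on the PATH.
for tool in mvn java python git; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Assumption: the first line of `mvn -v` looks like "Apache Maven 3.5.4 (...)".
if command -v mvn >/dev/null 2>&1; then
    mvn_version=$(mvn -v 2>/dev/null | awk '/Apache Maven/ {print $3; exit}')
    if [ -n "$mvn_version" ] && version_ge "$mvn_version" 3.1.0; then
        echo "Maven $mvn_version is new enough"
    else
        echo "Maven version '$mvn_version' could not be confirmed as 3.1 or later"
    fi
fi
```

The script only reports what it finds; it does not install anything or stop on a missing tool.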

Install Apache Nifi Prerequisites

cd /tmp
git clone https://github.com/cloudera/cm_ext
cd cm_ext/validator
mvn install

This step takes some time. After it completes, follow the steps below to build the Apache NiFi package.

cd /tmp
git clone http://github.com/apache/nifi
cd /tmp/nifi

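Now edit the pom.xml file so that tests are skipped; otherwise the build can fail as in the example below. Set skipTests in the maven-surefire-plugin configuration as shown after the example.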

Example of Maven build failure

nifi-lookup-services ............................... SUCCESS [ 3.845 s]
nifi-standard-utils ................................ SUCCESS [ 0.950 s]
nifi-standard-processors ........................... FAILURE [04:30 min]
nifi-standard-reporting-tasks ...................... SKIPPED
nifi-standard-content-viewer ....................... SKIPPED

 

<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<skipTests>true</skipTests>

Also add the entry below to the properties section:

<properties>
 <maven.test.skip>true</maven.test.skip>
<!-- Vendor-specific version number included here as default,
should be overridden on the command-line <hadoop.version>2.7.1.2.4.0.0-169</hadoop.version> -->
</properties>

Create Apache Nifi Parcel for Cloudera

cd /tmp/nifi
mvn -T 2.0C clean install
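
As an alternative to editing pom.xml, tests can usually be skipped from the command line. The sketch below only assembles and prints the Maven command unless RUN=1 is set; build_cmd and RUN are hypothetical names introduced for illustration. -DskipTests and -Dmaven.test.skip=true are standard Maven properties: the first compiles the tests but does not run them, the second skips compiling them as well.

```shell
#!/bin/sh
# Sketch: skip tests via command-line properties instead of editing pom.xml.
# build_cmd and RUN are hypothetical names used only in this example.

# Print the Maven command line for the requested skip mode.
#   run     -> -DskipTests            (tests are compiled but not run)
#   compile -> -Dmaven.test.skip=true (tests are neither compiled nor run)
build_cmd() {
    case "$1" in
        run)     echo "mvn -T 2.0C clean install -DskipTests" ;;
        compile) echo "mvn -T 2.0C clean install -Dmaven.test.skip=true" ;;
        *)       echo "usage: build_cmd run|compile" >&2; return 1 ;;
    esac
}

cmd=$(build_cmd run)
echo "$cmd"

# Only execute when explicitly requested, from the NiFi source directory.
if [ "${RUN:-0}" = "1" ]; then
    cd /tmp/nifi && $cmd
fi
```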

The build takes a long time. Wait until it finishes with BUILD SUCCESS:

[INFO] dockerhub 1.8.0-SNAPSHOT ........................... SUCCESS [ 1.505 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 09:15 min (Wall Clock)
[INFO] Finished at: 2018-10-08T14:42:07+05:30
[INFO] ------------------------------------------------------------------------
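
Once the build reports success, it is worth confirming that the binary tarball exists before moving on, since the parcel build script takes its path as an argument. A minimal sketch follows; find_tarball and TARGET_DIR are hypothetical names, and the default directory is the one used in the parcel build command in the next section.

```shell
#!/bin/sh
# Sketch: locate the NiFi binary tarball produced by the build.
# find_tarball and TARGET_DIR are hypothetical names.

# Print the first nifi-*-bin.tar.gz found in the given directory, or fail.
find_tarball() {
    for f in "$1"/nifi-*-bin.tar.gz; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    echo "no nifi-*-bin.tar.gz found in $1" >&2
    return 1
}

# Default location used later in this guide; override with TARGET_DIR.
TARGET_DIR=${TARGET_DIR:-/tmp/nifi/nifi-assembly/target}
find_tarball "$TARGET_DIR" || echo "build output not found yet"
```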

Download parcel build script

$ cd /tmp
$ git clone http://github.com/prateek/nifi-parcel
$ cd nifi-parcel
$ POINT_VERSION=5 VALIDATOR_DIR=/tmp/cm_ext ./build-parcel.sh /tmp/nifi/nifi-assembly/target/nifi-*-SNAPSHOT-bin.tar.gz
$ VALIDATOR_DIR=/tmp/cm_ext ./build-csd.sh
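
The POINT_VERSION value ends up in the parcel name. Judging only from the validation output shown further down in this guide (POINT_VERSION=5 yields NIFI-0.0.5.nifi.p0.5-el6.parcel), the naming appears to follow the pattern sketched below. The exact logic inside build-parcel.sh is an assumption here, and expected_parcel_name is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: predict the parcel file name for a given POINT_VERSION.
# Assumption: the pattern is inferred from the sample output in this guide,
# not taken from build-parcel.sh itself.

# Usage: expected_parcel_name <point_version> [distro]
expected_parcel_name() {
    point_version=$1
    distro=${2:-el6}
    short_version="0.0.${point_version}"
    full_version="${short_version}.nifi.p0.${point_version}"
    echo "NIFI-${full_version}-${distro}.parcel"
}

expected_parcel_name 5
```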

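If you receive an error like the one below while generating the parcel, modify the build script. The error occurs because the script calls sed -i "" (the BSD/macOS form); GNU sed on Linux reads the empty string as the script and then treats the substitution expression as a file name.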

nifi-1.8.0-SNAPSHOT/docs/html/images/zero-master-node.png
sed: can't read s/<VERSION-FULL>/0.0.5.nifi.p0.5/g: No such file or directory

vi /tmp/nifi-parcel/build-parcel.sh

From:
sed -i "" "s/<VERSION-FULL>/$FULL_VERSION/g" $file
sed -i "" "s/<VERSION-SHORT>/${SHORT_VERSION}/g" $file

To:
for file in `ls $PARCEL_NAME/meta/**`
do
    sed -i -e "s/<VERSION-FULL>/$FULL_VERSION/g" $file
    sed -i -e "s/<VERSION-SHORT>/${SHORT_VERSION}/g" $file
done
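
The substitution can be tried on a scratch file before touching the build script. The sketch below uses -i.bak (the backup suffix attached directly to -i), a form that both GNU and BSD sed accept, rather than the -i -e form above. The scratch file and its contents are placeholders, and the version value is the one from the error message.

```shell
#!/bin/sh
# Sketch: in-place placeholder substitution that works with GNU and BSD sed.
# The scratch file stands in for a file under $PARCEL_NAME/meta/.

FULL_VERSION="0.0.5.nifi.p0.5"
scratch=$(mktemp)
echo '"version": "<VERSION-FULL>"' > "$scratch"

# -i.bak (suffix attached to -i) is understood by both sed flavours.
sed -i.bak -e "s/<VERSION-FULL>/$FULL_VERSION/g" "$scratch"
rm -f "$scratch.bak"

result=$(cat "$scratch")
echo "$result"
rm -f "$scratch"
```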

Then run the parcel build script again. A successful run ends like this:

NIFI-0.0.5.nifi.p0.5/README
Validating: NIFI-0.0.5.nifi.p0.5-el6.parcel
Validating: NIFI-0.0.5.nifi.p0.5/meta/parcel.json
Validation succeeded.
Scanning directory: .
Found parcel NIFI-0.0.5.nifi.p0.5-el6.parcel

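Now start a small HTTP server (Python's built-in SimpleHTTPServer) from the directory that contains the generated parcel and manifest.json, so that Cloudera Manager can download the Apache NiFi parcel. With Python 3, the equivalent command is python3 -m http.server 14641.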

# python -m SimpleHTTPServer 14641
Serving HTTP on 0.0.0.0 port 14641 ...

192.168.2.5 - - [08/Oct/2018 15:04:29] "GET /manifest.json HTTP/1.1" 200 -
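
Cloudera Manager requests manifest.json from the server, as the access log above shows, so the directory being served needs both that file and the .parcel file. A small pre-flight check might look like the sketch below; check_repo_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm a directory is ready to be served as a parcel repository.
# check_repo_dir is a hypothetical helper name.

check_repo_dir() {
    dir=$1
    if [ ! -f "$dir/manifest.json" ]; then
        echo "manifest.json is missing in $dir"
        return 1
    fi
    for p in "$dir"/*.parcel; do
        if [ -f "$p" ]; then
            echo "ready: $(basename "$p")"
            return 0
        fi
    done
    echo "no .parcel file found in $dir"
    return 1
}

# Check the current directory before starting the HTTP server.
check_repo_dir . || echo "fix the directory contents before serving"
```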

Cloudera Manager Apache Nifi Distribution

Log in to the Cloudera Manager console and navigate to Hosts –> Parcels –> Configuration.

Add a new entry to the list of remote parcel repository URLs:

http://192.168.2.5:14641

Now click on Check for New Parcels

Download the parcel and distribute it to the Cloudera cluster.

Move Package to Cloudera Manager

$ cp NIFI-1.0.jar /opt/cloudera/csd
$ mkdir /opt/cloudera/csd/NIFI-1.0
$ cp NIFI-1.0.jar /opt/cloudera/csd/NIFI-1.0
$ cd /opt/cloudera/csd/NIFI-1.0
$ jar xvf NIFI-1.0.jar
$ rm -f NIFI-1.0.jar
$ sudo service cloudera-scm-server restart
# Wait a min, go to Cloudera Manager -> Add a Service -> NiFi
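
Rather than waiting a fixed minute after the restart, a script can poll until Cloudera Manager answers again. The sketch below is a generic retry loop; wait_until is a hypothetical helper, and the URL in the usage note assumes Cloudera Manager is listening on its default port 7180 on the local host.

```shell
#!/bin/sh
# Sketch: retry a command once per second until it succeeds or time runs out.
# wait_until is a hypothetical helper name.

# Usage: wait_until <max_attempts> <command> [args...]
wait_until() {
    remaining=$1
    shift
    while [ "$remaining" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        remaining=$((remaining - 1))
        sleep 1
    done
    return 1
}

# Trivial demonstration with a command that succeeds immediately.
wait_until 5 true && echo "ready"
```

For example, wait_until 120 curl -fsS http://localhost:7180/ would keep trying for roughly two minutes, assuming curl is installed and the server runs on the same host.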

That completes building the Apache NiFi parcel and adding it to Cloudera Manager.

Conclusion:

This Apache NiFi parcel does not support RHEL 7 / CentOS 7; check compatibility before compiling the package.

Thanks for your wonderful Support and Encouragement

Ravi Kumar Ankam

My Name is ARK. Expert in grasping any new technology, Interested in Sharing the knowledge. Learn more & Earn More
