StreamSets Data Collector (SDC) Installation Using Tarball
StreamSets Data Collector is a low-latency ingest infrastructure tool that lets you create continuous data ingest pipelines using a drag-and-drop UI within an integrated development environment (IDE). This post shows how to install StreamSets Data Collector from the tarball.
Thousands of companies use the open source StreamSets Data Collector to efficiently build, test, run and maintain data-flow pipelines connecting a variety of batch and streaming data sources and compute platforms. Data Collector pipelines require minimal schema specification and uniquely detect and handle data drift.
Environment
- OS: Ubuntu
- Package: StreamSets Data Collector 3.3.0
- Disk Space: Minimum 5GB
Component | Minimum Requirement |
---|---|
Operating system | Use one of the following operating systems and versions: |
Cores | 2 |
RAM | 1 GB |
Disk space | 6 GB |
File descriptors | 32768 |
Java | Oracle Java 8 or OpenJDK 8 |
Browser | Use the latest version of one of the following browsers: |
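The minimums in the table can be checked before installing. The following is a minimal sketch, assuming a Linux host where `nproc` and `/proc/meminfo` are available; the function names `check_min` and `preflight` are illustrative and not part of StreamSets.

```shell
# Compare one measured value against its minimum and report the result.
check_min() {
    name=$1
    actual=$2
    required=$3
    if [ "$actual" -ge "$required" ]; then
        echo "OK $name: $actual (minimum $required)"
    else
        echo "FAIL $name: $actual (minimum $required)"
        return 1
    fi
}

# Gather cores and RAM on a Linux host and check them against the table.
# Note: MemTotal excludes memory reserved by the kernel, so a nominal
# 1 GB machine can report slightly less than 1024 MB.
preflight() {
    rc=0
    check_min "cores" "$(nproc)" 2 || rc=1
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    check_min "RAM in MB" "$((mem_kb / 1024))" 1024 || rc=1
    return $rc
}
```

Calling `preflight` prints one line per check and returns non-zero if any value is below its minimum.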
Prerequisites for installing StreamSets Data Collector
- Create the sdc service account
- Add the sdc user account to the sdc group
- Install and configure Java
- Set the nofile (open files) limit to 32768 or higher
# groupadd sdc
# useradd -r -g sdc -d /home/sdc -s /sbin/nologin sdc
# apt-get install openjdk-8-jdk ## On Ubuntu
# yum install java-1.8.0-openjdk ## On Red Hat / CentOS / Fedora
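Since the requirements call for Java 8, it can help to confirm which version ended up on the PATH. This is a rough sketch that parses the first `version "..."` string printed by `java -version`; the helper name `java_major` is made up for illustration, and the parsing assumes the usual `1.8.0_xxx` or `11.0.x` style version strings.

```shell
# Extract the major Java version from `java -version` output.
# "1.8.0_292" -> 8 (legacy numbering), "11.0.2" -> 11 (current numbering).
java_major() {
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
    esac
}

# Usage (java prints its version on stderr, hence 2>&1):
#   java_major "$(java -version 2>&1)"
```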
Edit the file below and add the following entries to raise the open-file limit
# vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
The new limits apply only to sessions started after the change, so log out and back in, or reboot the machine as was done here. Then verify with the command below
# ulimit -n
65536
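The verification can also be scripted. Here is a small sketch that compares a limit value against the 32768 minimum from the requirements table; `nofile_ok` is a hypothetical helper name, and treating `unlimited` as acceptable is an assumption of this sketch.

```shell
# Succeed when an open-file limit satisfies the 32768 minimum.
# "unlimited" is accepted; empty or non-numeric values are rejected.
nofile_ok() {
    case "$1" in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 32768 ]
}

# Usage:
#   nofile_ok "$(ulimit -n)" && echo "nofile limit is sufficient"
```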
StreamSets Data Collector Installation Using tarball
The first step is to create a directory for the StreamSets Data Collector download on a filesystem with enough free space. In this example I created the directory under the root directory ("/")
# mkdir /sdc
# cd /sdc/
# wget https://archives.streamsets.com/datacollector/3.3.0/tarball/streamsets-datacollector-all-3.3.0.tgz
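Because the tarball is large, it may be worth confirming free space in the target directory first. The sketch below uses `df -Pk` for portable output; the helper name `has_free_kb` is illustrative, and the 6 GB figure in the usage comment comes from the requirements table.

```shell
# Succeed when the filesystem holding DIR ($1) has at least NEED_KB ($2)
# kilobytes available. With -P, df prints one data line per filesystem,
# with the available space in column 4.
has_free_kb() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Usage: 6 GB = 6 * 1024 * 1024 KB = 6291456 KB
#   has_free_kb /sdc 6291456 || echo "not enough free space under /sdc"
```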
Extract the downloaded StreamSets tarball
Use the tar command to extract the compressed file. The package is almost 3.7GB in size, so make sure you have enough bandwidth for the download.
# tar -xvf streamsets-datacollector-all-3.3.0.tgz
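A quick sanity check after extraction can catch a truncated download. This sketch only looks for the three subdirectories that the later steps in this post rely on (`etc`, `initd`, `libexec`); the function name `looks_like_sdc_dist` is hypothetical, and the check is not a full integrity test.

```shell
# Succeed when DIR ($1) contains the subdirectories used later in this post.
looks_like_sdc_dist() {
    for sub in etc initd libexec; do
        [ -d "$1/$sub" ] || { echo "missing: $1/$sub" >&2; return 1; }
    done
}

# Usage:
#   looks_like_sdc_dist /sdc/streamsets-datacollector-3.3.0 && echo "extraction looks complete"
```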
After extracting the package, copy the init script prototype shipped with StreamSets Data Collector to /etc/init.d so that the SDC service can be started and stopped
# cd streamsets-datacollector-3.3.0/
# cd initd/
# cp _sdcinitd_prototype /etc/init.d/sdc
Give the sdc user account ownership of the init script, then edit the script and set both SDC_DIST and SDC_HOME to the extracted directory
# chown sdc:sdc /etc/init.d/sdc
# vi /etc/init.d/sdc
# installation directory of the data collector IT MUST BE SET
export SDC_DIST="/sdc/streamsets-datacollector-3.3.0"
export SDC_HOME="/sdc/streamsets-datacollector-3.3.0"
Make the script executable, then create the configuration, log, data and resources directories and hand them over to the sdc account
chmod 755 /etc/init.d/sdc ## Provide executable permissions
mkdir /etc/sdc
cd /sdc/streamsets-datacollector-3.3.0/
cp -R etc/* /etc/sdc/
chown -R sdc:sdc /etc/sdc
chmod go-rwx /etc/sdc/form-realm.properties
# Log directory for application logs
mkdir /var/log/sdc
chown sdc:sdc /var/log/sdc
# Data directory
mkdir /var/lib/sdc
chown sdc:sdc /var/lib/sdc
# Resources directory
mkdir /var/lib/sdc-resources
chown sdc:sdc /var/lib/sdc-resources
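The manual edit of the init script could also be done non-interactively. The following is a sketch under the assumption that the script contains lines beginning with `export SDC_DIST=` and `export SDC_HOME=`, as shown above; `set_sdc_paths` is an illustrative name, not something the package provides.

```shell
# Point SDC_DIST and SDC_HOME in SCRIPT ($1) at directory DIST ($2).
# Assumes DIST contains no "|" or "&" characters, since they are special
# in the sed expressions below. The file is rewritten in place through a
# temporary copy so its ownership and permissions are preserved.
set_sdc_paths() {
    script=$1
    dist=$2
    tmp=$(mktemp) || return 1
    sed -e "s|^export SDC_DIST=.*|export SDC_DIST=\"$dist\"|" \
        -e "s|^export SDC_HOME=.*|export SDC_HOME=\"$dist\"|" \
        "$script" > "$tmp" && cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage:
#   set_sdc_paths /etc/init.d/sdc /sdc/streamsets-datacollector-3.3.0
```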
Start and Stop StreamSets Data Collector Service
# /etc/init.d/sdc start
# update-rc.d sdc defaults 97 03
# /etc/init.d/sdc status
/sdc/streamsets-datacollector-3.3.0/libexec/sdcd-env.sh: line 83: export: `/sdc/streamsets-datacollector-3.3.0/streamsets-libs-extras/': not a valid identifier
running
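As the output above shows, the status command may print a warning line before the actual state, so a script should look at the last line only. This is a minimal sketch assuming the state is always printed last, as in the sample; `status_is_running` is a hypothetical helper.

```shell
# Succeed when the last line of the captured status output is "running".
status_is_running() {
    last=$(printf '%s\n' "$1" | sed -n '$p')
    [ "$last" = "running" ]
}

# Usage:
#   status_is_running "$(/etc/init.d/sdc status 2>&1)" && echo "SDC is up"
```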
Open a browser and go to the following URL, replacing IPADDRESS with the address of the server
http://IPADDRESS:18630
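The web UI can take a little while to come up after the service starts. A generic retry loop like the one below can poll the URL from the command line; `wait_until` is an illustrative helper, and using `curl` as the probe is an assumption (any command that exits 0 on success works).

```shell
# Run a probe command up to TRIES ($1) times, sleeping DELAY ($2) seconds
# between attempts. Succeeds as soon as the probe succeeds.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Usage (replace IPADDRESS with the address of the server):
#   wait_until 30 2 curl -fsS -o /dev/null "http://IPADDRESS:18630"
```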
The StreamSets Data Collector installation is now complete.