We have three kinds of data: structured (schema-based systems such as Oracle or MySQL), unstructured (images, web logs, etc.) and semi-structured (XML, etc.).
Structured data can be stored in a SQL database, in tables with rows and columns.
Semi-structured data is information that doesn’t reside in a relational database but does have some organizational properties that make it easier to analyze. With some processing (for example, parsing XML into rows), you can store it in a relational database.
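As a rough illustration of that "some processing" step, here is a minimal sketch that flattens a small XML document into rows and stores them in a relational table. It uses Python's standard `xml.etree.ElementTree` and `sqlite3` (an in-memory SQLite database standing in for MySQL/Oracle); the `<customer>` document, the `customer` table and both helper functions are hypothetical examples, not part of any real system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: the tags give it structure,
# but records may omit fields (Bob has no <city>).
XML_DOC = """
<customers>
  <customer id="1"><name>Alice</name><city>Pune</city></customer>
  <customer id="2"><name>Bob</name></customer>
</customers>
"""


def xml_to_rows(xml_text):
    """Flatten each <customer> element into an (id, name, city) tuple."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for cust in root.findall("customer"):
        rows.append((
            int(cust.get("id")),
            cust.findtext("name"),
            cust.findtext("city"),  # None when the element is missing
        ))
    return rows


def load_into_table(rows):
    """Store the flattened rows in a fixed-schema table (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = load_into_table(xml_to_rows(XML_DOC))
    for row in conn.execute("SELECT id, name, city FROM customer ORDER BY id"):
        print(row)
```

The missing `<city>` becomes a SQL `NULL`, which is the usual trade-off when forcing semi-structured records into a fixed schema.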
Unstructured data often includes text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents.
Depending on the type of your data, you will choose the tools to import it into HDFS.
Your company may use CRM and ERP tools, but we don't know exactly how that data is organized and structured.
Apart from the basic HDFS shell commands such as put and copyFromLocal, which copy files into HDFS as they are, the main tools for loading data into HDFS are:
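For plain files, the shell commands are usually enough. The sketch below is a small Python wrapper (kept in Python to match the other examples here) that builds the `hdfs dfs -put` or `hdfs dfs -copyFromLocal` command line and optionally runs it. It assumes the `hdfs` CLI is installed and on `PATH`; the function names and the example paths are hypothetical.

```python
import shutil
import subprocess


def hdfs_upload_cmd(local_path, hdfs_dir, use_copy_from_local=False, overwrite=False):
    """Build argv for `hdfs dfs -put` (or `-copyFromLocal`)."""
    verb = "-copyFromLocal" if use_copy_from_local else "-put"
    cmd = ["hdfs", "dfs", verb]
    if overwrite:
        cmd.append("-f")  # overwrite the destination if it already exists
    cmd += [local_path, hdfs_dir]
    return cmd


def upload(local_path, hdfs_dir, **kwargs):
    """Run the upload; raises if the hdfs CLI is missing or the command fails."""
    if shutil.which("hdfs") is None:
        raise RuntimeError("hdfs CLI not found on PATH")
    subprocess.run(hdfs_upload_cmd(local_path, hdfs_dir, **kwargs), check=True)
```

On the command line, `hdfs_upload_cmd("/tmp/weblog.txt", "/user/hadoop/raw")` corresponds to `hdfs dfs -put /tmp/weblog.txt /user/hadoop/raw`. For a local source, `-copyFromLocal` behaves like `-put`.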
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. Data from MySQL, SQL Server & Oracle tables can be loaded into HDFS with this tool.
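A basic Sqoop table import can be sketched the same way. This minimal example only assembles the `sqoop import` arguments (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`); the JDBC URL, user, table and target directory are hypothetical, and it assumes Sqoop plus the matching JDBC driver are installed.

```python
import shutil
import subprocess


def sqoop_import_cmd(jdbc_url, username, table, target_dir, mappers=1):
    """Build argv for importing one relational table into an HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,         # e.g. jdbc:mysql://dbhost/sales (hypothetical)
        "--username", username,
        "-P",                          # prompt for the password instead of passing it on the command line
        "--table", table,
        "--target-dir", target_dir,    # HDFS directory that receives the imported files
        "--num-mappers", str(mappers), # number of parallel map tasks doing the copy
    ]


def run_import(*args, **kwargs):
    """Run the import; raises if sqoop is missing or the job fails."""
    if shutil.which("sqoop") is None:
        raise RuntimeError("sqoop not found on PATH")
    subprocess.run(sqoop_import_cmd(*args, **kwargs), check=True)
```

With more than one mapper, Sqoop splits the table across map tasks (by primary key by default), so each mapper writes its own file under the target directory.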
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms.
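A Flume agent is configured as a flow of source, channel and sink. The sketch below renders a minimal agent properties file that tails a local log file (exec source), buffers events in a memory channel and writes them to HDFS (hdfs sink). The agent name, log path and HDFS path are hypothetical; the property keys follow the Flume user guide's properties format.

```python
def flume_exec_to_hdfs_config(agent, log_file, hdfs_path, capacity=1000):
    """Render a Flume agent config: exec source -> memory channel -> HDFS sink."""
    lines = [
        f"{agent}.sources = r1",
        f"{agent}.channels = c1",
        f"{agent}.sinks = k1",
        # Source: follow a local log file
        f"{agent}.sources.r1.type = exec",
        f"{agent}.sources.r1.command = tail -F {log_file}",
        f"{agent}.sources.r1.channels = c1",
        # Channel: in-memory buffer between source and sink
        f"{agent}.channels.c1.type = memory",
        f"{agent}.channels.c1.capacity = {capacity}",
        # Sink: write events into HDFS as plain text (default is SequenceFile)
        f"{agent}.sinks.k1.type = hdfs",
        f"{agent}.sinks.k1.hdfs.path = {hdfs_path}",
        f"{agent}.sinks.k1.hdfs.fileType = DataStream",
        f"{agent}.sinks.k1.channel = c1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(flume_exec_to_hdfs_config("a1", "/var/log/app.log",
                                    "hdfs://namenode:8020/flume/weblogs"), end="")
```

Saved as, say, `weblog.conf`, it would be started with `flume-ng agent --conf conf --conf-file weblog.conf --name a1`. The memory channel is fast but loses buffered events if the agent dies; switching `c1.type` to `file` is one of the "tunable reliability" choices mentioned above.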
Other tools include Chukwa, Storm and Kafka.
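Another important technology that is becoming very popular is Spark. It is both a friend and a foe of Hadoop: it can run on a Hadoop cluster and read data from HDFS, but it can also replace MapReduce as the processing engine.
Spark is emerging as a good alternative to Hadoop MapReduce for real-time data processing, and it may or may not use HDFS as its data source.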