CCDH 410 Probable Questions



This topic contains 9 replies, has 3 voices, and was last updated by Nishith Gupta 3 years, 9 months ago.

Viewing 10 posts - 1 through 10 (of 10 total)
  #2959

    Nishith Gupta

    Q1. What types of algorithms are difficult to express in MapReduce?

    A- Large-scale graph algorithms.
    B- Algorithms that require global, shared state.
    C- Relational operations on large amounts of structured and semi-structured data.
    D- Text analysis on large web data.
    E- Algorithms that apply the same mathematical function to a large number of individual binary records.

    I think the answer should be option A, but I am not sure. My understanding is that MapReduce is not meant for anything that requires sharing computation results, because it is a parallel, distributed programming framework. Please correct me if I am wrong.

    Q2. You have written a Mapper that makes the following calls to OutputCollector.collect():

    output.collect(new Text("Square"), new Text("Red"));
    output.collect(new Text("Circle"), new Text("Yellow"));
    output.collect(new Text("Square"), new Text("Yellow"));
    output.collect(new Text("Triangle"), new Text("Red"));
    output.collect(new Text("square"), new Text("Green"));

    How many times will the reduce method be called?
    A- 2
    B- 3
    C- 4
    D- 5

    I think the answer is 4, but I am not sure. It would help if someone could confirm this with an explanation.



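    Yes, you are correct for the first question: the answer to Q1 is A, large-scale graph algorithms. You are also correct for Q2. reduce() is called once per distinct key, and Text keys are compared byte by byte, so "square" (lowercase) is a different key from "Square". Before reaching the reducer, the map output is grouped like this:

    (Square, (Red, Yellow)), (Circle, (Yellow)), (Triangle, (Red)), (square, (Green))

    That is 4 distinct keys, so reduce() is called 4 times (option C). Note that this counts reduce() invocations, not reduce tasks; the number of reduce tasks is configured separately.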
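    Because Text keys are compared byte by byte, the case of "square" versus "Square" matters for this count. Below is a minimal plain-Java simulation of the shuffle's group-by-key step. It does not use the Hadoop API: the class and method names are hypothetical, and the unsigned byte comparison is assumed to mirror how Text orders its UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation (NOT the Hadoop API) of the shuffle:
// map output is grouped by key, and reduce() is called once per distinct key.
final class ShuffleGroupingSketch {

    // Unsigned byte-by-byte comparison of the UTF-8 encodings, assumed here to
    // mirror how Hadoop's Text orders keys; "Square" and "square" differ.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) return diff;
        }
        return x.length - y.length;
    }

    // Groups (key, value) pairs; each map entry stands for one
    // reduce(key, values) call. In real Hadoop the order of values
    // within a group is not guaranteed; here it is insertion order.
    static Map<String, List<String>> group(String[][] pairs) {
        Map<String, List<String>> groups =
            new TreeMap<String, List<String>>(ShuffleGroupingSketch::compareUtf8);
        for (String[] kv : pairs) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] mapOutput = {
            {"Square", "Red"}, {"Circle", "Yellow"}, {"Square", "Yellow"},
            {"Triangle", "Red"}, {"square", "Green"}
        };
        Map<String, List<String>> groups = group(mapOutput);
        groups.forEach((k, v) -> System.out.println(k + " -> " + v));
        System.out.println("reduce() calls: " + groups.size());
    }
}
```

    Running main() prints four groups (Circle, Square, Triangle, square, in byte order) and "reduce() calls: 4".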


    Nishith Gupta

    Q3. Let's say I have a file with variable-length records and no carriage return at the end to mark the end of a line. Something like:


    I want to read this file line by line. How can I achieve this in MapReduce?



    By default, MapReduce uses TextInputFormat, which produces one (key, value) pair per line: the key is the byte offset at which the line starts in the file (a LongWritable), not the line number, and the value is the text of the line (a Text). So TextInputFormat already reads your input file line by line.
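    To make the (byte offset, line) pairs concrete, here is a minimal plain-Java sketch of the records a mapper would receive. It is not Hadoop's LineRecordReader; the class names are hypothetical. It follows the default line reading in treating "\n", "\r" and "\r\n" as line terminators, so a lone carriage return inside a record would also split it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch (NOT Hadoop's LineRecordReader) of what
// TextInputFormat hands to a mapper: key = byte offset where the line
// starts, value = the line text without its terminator.
final class LineRecordSketch {

    static final class OffsetLine {
        final long offset;
        final String text;

        OffsetLine(long offset, String text) {
            this.offset = offset;
            this.text = text;
        }

        @Override
        public String toString() {
            return "(" + offset + ", \"" + text + "\")";
        }
    }

    // Treats "\n", "\r" and "\r\n" as line terminators.
    static List<OffsetLine> readLines(byte[] data) {
        List<OffsetLine> out = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                out.add(new OffsetLine(start,
                    new String(data, start, i - start, StandardCharsets.UTF_8)));
                // Consume "\r\n" as one terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') i++;
                i++;
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) { // last line has no terminator
            out.add(new OffsetLine(start,
                new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = "first line\nsecond\r\nthird".getBytes(StandardCharsets.UTF_8);
        for (OffsetLine r : readLines(file)) System.out.println(r);
    }
}
```

    For the sample input, main() prints (0, "first line"), (11, "second") and (19, "third"): the keys are byte positions, not 1, 2, 3.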


    Nishith Gupta

    Would it still be possible to read with TextInputFormat if end-of-line characters such as carriage returns appear in the middle of records instead of at the actual ends of lines?


    Nishith Gupta

    Would it still be possible to read line by line with TextInputFormat if there is no carriage return to mark the end of a line or the line boundaries? And what if unwanted carriage-return characters appear in the middle of records instead of at the boundaries?




    I am trying the automatic CDH installation on AWS EC2 using the Cloudera Manager installer (cloudera-manager-installer.bin). I have created one Ubuntu Precise 12.04 LTS micro instance.

    I followed the on-screen instructions as per this tutorial.

    1) This is my /etc/hosts file:
        localhost master
        # The following lines are desirable for IPv6 capable hosts
        ::1 ip6-localhost ip6-loopback
        fe00::0 ip6-localnet
        ff00::0 ip6-mcastprefix
        ff02::1 ip6-allnodes
        ff02::2 ip6-allrouters
        ff02::3 ip6-allhosts

    2) I downloaded cloudera-manager-installer.bin and changed its permissions.

    3) I ran sudo ./cloudera-manager-installer.bin, and Cloudera Manager was installed.

    4) But I could not access the Cloudera Manager web console, even though I opened port 7180 when creating the instance.

    5) The Cloudera Manager database and the Cloudera Manager server are both running.

    6) Port 7180 is also not listening on my Ubuntu server. I ran "sudo ufw allow 7180", but it made no difference.

    7) I checked $ sudo ufw status, and the result is "inactive".

    8) When I check $ sudo service cloudera-scm-agent status, it reports "unrecognized service".

    I am stuck on this part. Could you please let me know where I went wrong in installing Cloudera in a clustered environment? Any help would be appreciated.

    Thanks in advance,




    @Nishith Gupta,

    If there is no carriage return at the end of each line, how will you separate the lines? Is there some other delimiter instead of '\n'? You definitely need some character as a delimiter between your lines.

    If your delimiter is a character other than '\n', you need to write a custom InputFormat whose createRecordReader() method returns a RecordReader that splits records on your delimiter.
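    The heart of such a RecordReader is a loop that cuts the input wherever the delimiter occurs. Here is a minimal plain-Java sketch of just that loop; the class name is hypothetical and this is not Hadoop code. A real RecordReader must additionally handle records that straddle InputSplit boundaries, which this sketch ignores. Newer Hadoop releases (2.x) also let the stock TextInputFormat use a custom delimiter through the textinputformat.record.delimiter configuration property, so it is worth checking your version before writing a custom InputFormat.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch (NOT the Hadoop API) of the splitting loop
// a custom RecordReader would run: cut the input wherever the delimiter occurs.
final class DelimitedRecordSketch {

    static List<String> split(String data, String delimiter) {
        if (delimiter.isEmpty()) throw new IllegalArgumentException("empty delimiter");
        List<String> records = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = data.indexOf(delimiter, start)) >= 0) {
            records.add(data.substring(start, hit));
            start = hit + delimiter.length();
        }
        // A trailing record with no closing delimiter is still emitted.
        if (start < data.length()) records.add(data.substring(start));
        return records;
    }

    public static void main(String[] args) {
        System.out.println(split("rec1*rec2*rec3", "*"));
    }
}
```

    main() prints [rec1, rec2, rec3]. Note that this naive loop cannot tell a delimiter that ends a record from the same character occurring inside a record.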



    @ Bharath,

    I'll try to look into your issue tomorrow and let you know.


    Nishith Gupta

    Thanks, Siva. I understand your point about using a custom RecordReader. This was actually asked in one of my interviews: the interviewer said that the delimiter character that marks the end of a line or record can also appear inside the record itself, and asked what approach I should follow to read the records. For example, say "*" is the character that marks the end of a record, but the same symbol also appears inside the record:

    NishithGupta*SivaSive Siva*
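    Plain splitting cannot resolve this on its own; the file's writer has to follow some convention that makes record-ending delimiters distinguishable. One common choice, assumed here and not stated in the thread, is an escape character: a literal "*" inside a record is written as "\*" (and a literal backslash as "\\"), so only an unescaped "*" ends a record. The sketch below is plain Java with a hypothetical class name and a made-up sample string.

```java
import java.util.ArrayList;
import java.util.List;

// ASSUMED convention (hypothetical, not from the thread): the writer escapes
// a literal delimiter inside a record with an escape char, e.g. "\*", and a
// literal escape char as "\\". An unescaped delimiter always ends a record.
final class EscapedDelimiterSketch {

    static List<String> split(String data, char delimiter, char escape) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == escape && i + 1 < data.length()) {
                current.append(data.charAt(++i)); // keep the escaped char literally
            } else if (c == delimiter) {
                records.add(current.toString());  // unescaped delimiter ends the record
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) records.add(current.toString());
        return records;
    }

    public static void main(String[] args) {
        // In Java source, "\\" is a single backslash, so the data is: Nishith\*Gupta*Siva*
        String data = "Nishith\\*Gupta*Siva*";
        System.out.println(split(data, '*', '\\'));
    }
}
```

    main() prints [Nishith*Gupta, Siva]. Quoting (as in CSV) or a length prefix in front of each record would work just as well; which approach applies depends entirely on how the file was written.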


The topic ‘CCDH 410 Probable Questions’ is closed to new replies.