CCDH 410 Probable Questions



This topic contains 9 replies, has 3 voices, and was last updated by  Nishith Gupta 3 years, 6 months ago.

    #2959

    Nishith Gupta
    Participant

    Q1: What types of algorithms are difficult to express in MapReduce?

    A- Large-scale graph algorithms.
    B- Algorithms that require global, shared state.
    C- Relational operations on large amounts of structured and semi-structured data.
    D- Text analysis on large web data.
    E- Algorithms that require applying the same mathematical function to a large number of individual binary records.

    I think the answer is option A, but I am not sure. My understanding is that MapReduce, as a parallel distributed programming framework, is not meant for anything that requires sharing computation results between tasks. Please correct me if I am wrong.

    Q2: You have written a Mapper that makes the following calls to OutputCollector.collect():

    output.collect(new Text("Square"), new Text("Red"));
    output.collect(new Text("Circle"), new Text("Yellow"));
    output.collect(new Text("Square"), new Text("Yellow"));
    output.collect(new Text("Triangle"), new Text("Red"));
    output.collect(new Text("square"), new Text("Green"));

    How many times will the reduce method be called?
    A- 2
    B- 3
    C- 4
    D- 5

    I think the answer is 4, but I am not sure. It would help if someone could confirm this with an explanation.

    #2962

    Siva
    Keymaster

    Yes, you are correct on the first question: the answer to Q1 is A, large-scale graph algorithms. For Q2, reduce() is called 3 times, because the values are grouped by key before they reach the reducer, as shown below:

    (Square, (Red, Yellow, Green)), (Circle, (Yellow)), (Triangle, (Red))
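
    One caveat on the count: Text keys are compared byte by byte, so "square" and "Square" are different keys. The grouping above treats the lowercase "square" in the question as a typo for "Square". The sketch below is a plain-Java simulation of the group-by-key step (the class and method names are made up for illustration, and it does not use the Hadoop API), and it counts the groups both ways.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the shuffle's group-by-key step.
// ASSUMPTION: this only mimics Hadoop's behaviour; it is not Hadoop API code.
class ReduceCallCount {

    // Groups values under their key, with keys in sorted order.
    // Key comparison is case-sensitive, like Hadoop's Text.
    static Map<String, List<String>> groupByKey(String[][] pairs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] pair : pairs) {
            groups.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] asPosted = {
            {"Square", "Red"}, {"Circle", "Yellow"}, {"Square", "Yellow"},
            {"Triangle", "Red"}, {"square", "Green"}
        };
        String[][] sameCase = {
            {"Square", "Red"}, {"Circle", "Yellow"}, {"Square", "Yellow"},
            {"Triangle", "Red"}, {"Square", "Green"}
        };
        // reduce() runs once per distinct key, so the map size is the call count.
        System.out.println("as posted: " + groupByKey(asPosted).size());  // as posted: 4
        System.out.println("same case: " + groupByKey(sameCase).size());  // same case: 3
    }
}
```

    So the answer is 3 only if all three "Square" keys are spelled identically. With the keys exactly as posted, the count would be 4.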

    #3070

    Nishith Gupta
    Participant

    Q3: Let's say I have a file with variable-length records and no carriage return at the end of each record to mark the end of the line. Something like:

    ABCD EFGHIJK
    SSSSSSgggghhhhhhhhhhh
    ahsfsgdjkhlfhjskkllllll;;;ahyhhh
    qqweear

    I want to read this file line by line. How can I achieve this in MapReduce?

    #3130

    Siva
    Keymaster

    By default, MapReduce uses TextInputFormat, which reads the input one line at a time: the key is the byte offset of the line within the file and the value is the text of the line, so the (key, value) pairs are (byte offset, line text). You do not need anything extra to read your input file line by line.
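
    To make the key/value shape concrete, here is a minimal plain-Java sketch of the pairs a line-oriented reader would produce. It assumes '\n' line endings and UTF-8 text, the class and method names are hypothetical, and it does not call any Hadoop classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrates (byte offset, line text) records for newline-delimited input.
// ASSUMPTION: '\n' line endings and UTF-8 encoding; not Hadoop API code.
class OffsetKeys {

    // Maps the byte offset at which each line starts to the text of that line.
    static Map<Long, String> toRecords(String contents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        int start = 0;
        while (start < contents.length()) {
            int end = contents.indexOf('\n', start);
            boolean hasNewline = end >= 0;
            if (!hasNewline) {
                end = contents.length(); // last line without a trailing newline
            }
            String line = contents.substring(start, end);
            records.put(offset, line);
            // Advance by the line's byte length plus one byte for the newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + (hasNewline ? 1 : 0);
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Prints {0=ABCD EFGHIJK, 13=qqweear}
        System.out.println(toRecords("ABCD EFGHIJK\nqqweear"));
    }
}
```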

    #3131

    Nishith Gupta
    Participant

    Would it still be possible to read the file using TextInputFormat if end-of-line characters such as carriage returns appear in the middle of records instead of at the actual ends of lines?

    #3132

    Nishith Gupta
    Participant

    Would it still be possible to read line by line using TextInputFormat if there is no carriage return to mark the end of a line? What if unwanted carriage return characters appear in the middle of records instead of at the boundaries?

    #3133

    Bharath

    Hi,

    I am trying the automatic CDH installation on AWS EC2 using the Cloudera Manager installer binary. I created one Ubuntu Precise 12.04 LTS micro instance.

    I followed the on-screen instructions as per this tutorial: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-7-1/Cloudera-Manager-I

    1) This is my /etc/hosts file:
    127.0.0.1 localhost
    172.31.13.46 master
    # The following lines are desirable for IPv6 capable hosts
    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts

    2) I downloaded cloudera-manager-installer.bin and changed its permissions.

    3) I ran sudo ./cloudera-manager-installer.bin, and Cloudera Manager was installed.

    4) However, I could not access the Cloudera Manager web console at http://172.31.13.46:7180. I opened port 7180 when creating the instance, but I still cannot reach the web console.

    5) The Cloudera Manager database and the Cloudera Manager server are both running.

    6) Port 7180 is not listening on my Ubuntu server. I ran "sudo ufs allow 7180", but it did not help.

    7) I checked $ sudo ufw status, and the result is "inactive".

    8) When I check $ sudo service cloudera-scm-agent status on 172.31.13.46, it reports an unrecognized service.

    I am stuck at this point. Could you please let me know where I went wrong in installing Cloudera in a clustered environment?

    Any help would be appreciated.

    Thanks in advance,

    Regards,
    Bharath

    #3141

    Siva
    Keymaster

    @Nishith Gupta,

    If there is no carriage return at the end of each line, how would you separate the lines? Is there some other delimiter instead of '\n'? You definitely need some character as a delimiter between your lines.

    If your delimiter is a character other than '\n', you need to write a custom InputFormat whose createRecordReader() method returns a RecordReader that splits on your delimiter.
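
    The core of such a record reader is just a loop that accumulates characters until it sees the delimiter. The sketch below shows that loop in plain Java. It is not a Hadoop RecordReader, the class and method names are hypothetical, and the '|' delimiter is only an example. Newer Hadoop releases also let TextInputFormat take a custom delimiter through the textinputformat.record.delimiter configuration property, so it is worth checking whether your version supports that before writing a custom class.

```java
import java.util.ArrayList;
import java.util.List;

// Splits input into records on a single-character delimiter.
// ASSUMPTION: illustrative only; a real Hadoop RecordReader would also
// have to handle input splits and byte streams.
class DelimitedRecords {

    // Collects characters up to each delimiter and emits them as one record.
    static List<String> splitRecords(String data, char delimiter) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == delimiter) {
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // final record with no trailing delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        // Prints [rec one, rec two, rec three]
        System.out.println(splitRecords("rec one|rec two|rec three", '|'));
    }
}
```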

    #3142

    Siva
    Keymaster

    @ Bharath,

    I'll try to look into your issue tomorrow and let you know my response.

    #3144

    Nishith Gupta
    Participant

    Thanks, Siva. I got your point about using a custom RecordReader. This was actually asked in one of my interviews. The interviewer said that the character marking the end of a line or record also appears inside the line itself, and asked what approach I would follow to read the lines. For example, say "*" indicates the end of a line, but the same symbol also appears within the line:

    NishithGupta*SivaSive Siva*
    Bharathguptanishithsiva*NDTV*
    addsg*aggsh*aggs*
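
    One possible approach, assuming the real terminator is a "*" that is immediately followed by a line break or the end of the input (which holds for the sample above, but is my assumption and was not stated in the interview), is to treat only that combination as the record end. A plain-Java sketch with hypothetical names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Splits records on '*' only when it ends a line or the input.
// ASSUMPTION: a '*' inside a record is never directly followed by a line break.
class TerminatorSplit {

    // '*' followed by an optional '\r', then '\n', or by the end of the input.
    private static final Pattern RECORD_END = Pattern.compile("\\*(?:\\r?\\n|\\z)");

    static List<String> splitRecords(String data) {
        // Pattern.split drops trailing empty strings, so no empty final record appears.
        return Arrays.asList(RECORD_END.split(data));
    }

    public static void main(String[] args) {
        String data = "NishithGupta*SivaSive Siva*\n"
                + "Bharathguptanishithsiva*NDTV*\n"
                + "addsg*aggsh*aggs*";
        // Each record keeps its embedded '*' characters and loses only the final one.
        for (String record : splitRecords(data)) {
            System.out.println(record);
        }
    }
}
```

    If the data cannot guarantee that, the producer would have to escape the delimiter inside records or pick a delimiter that never occurs in the data.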


The topic ‘CCDH 410 Probable Questions’ is closed to new replies.