Tagged: CCDH 410
This topic contains 9 replies, has 3 voices, and was last updated by Nishith Gupta 3 years, 9 months ago.
January 11, 2015 at 8:41 pm #2959
Q1: What types of algorithms are difficult to express in MapReduce?
A- Large Scale Graph Algorithms.
B- Algorithms that require global, shared state.
C- Relational operations on large amounts of structured and semi-structured data.
D- For text analysis on large web data.
E- Algorithms that require applying the same mathematical function to a large number of individual binary records.
I think the answer should be option A for this, but I am not sure. I think MapReduce is just not meant for anything that requires sharing computation results, as it's a parallel, distributed programming framework. Please correct me if I am wrong in my understanding.
Q2: You have written a Mapper which makes the following calls to OutputCollector.collect():
output.collect(new Text("Square"), new Text("Red"));
output.collect(new Text("Circle"), new Text("Yellow"));
output.collect(new Text("Square"), new Text("Yellow"));
output.collect(new Text("Triangle"), new Text("Red"));
output.collect(new Text("square"), new Text("Green"));
How many times is the reduce method going to be called?
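One way to reason about this is to simulate the grouping that happens between map and reduce. The sketch below is plain Java with no Hadoop dependency; the class name ReduceCallCount and the use of String in place of Text are assumptions made for illustration. It relies on the fact that, with the default grouping, reduce is invoked once per distinct key, and that keys compare case-sensitively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of how mapper output is grouped by key before reduce. */
class ReduceCallCount {

    /** Groups (key, value) pairs by key; each distinct key stands for one reduce() call. */
    static Map<String, List<String>> groupByKey(List<String[]> pairs) {
        // TreeMap keeps keys sorted, loosely mirroring the sort step before reduce.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] pair : pairs) {
            groups.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The five (key, value) pairs emitted by the Mapper in Q2, copied as written.
        List<String[]> pairs = Arrays.asList(
            new String[] {"Square", "Red"},
            new String[] {"Circle", "Yellow"},
            new String[] {"Square", "Yellow"},
            new String[] {"Triangle", "Red"},
            new String[] {"square", "Green"});

        Map<String, List<String>> groups = groupByKey(pairs);
        groups.forEach((key, values) -> System.out.println(key + " -> " + values));
        System.out.println("reduce() calls: " + groups.size());
    }
}
```

As written, the lowercase "square" is a distinct key, so the sketch reports 4 groups; if that key were "Square", the same code would report 3.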
I think the answer is 4, but I am not sure if I am correct. It would be helpful if anyone could confirm this with an explanation.
January 19, 2015 at 3:00 pm #2962
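Yes, you are correct for the first question: the answer to Q1 is A, Large Scale Graph Algorithms. For Q2, the reduce method is called 3 times, because the values are grouped by key during the shuffle before they reach the reducer, as shown below:
(Square, (Red, Yellow, Green)), (Circle, (Yellow)), (Triangle, (Red))
Note that this grouping assumes the key in the last call is "Square". Text keys are case-sensitive, so a lowercase "square", as typed in the question, would form a fourth group.
February 16, 2015 at 4:52 pm #3070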
Q3. Let's say I have a file with variable record length and no carriage return at the end to specify the end of a line. Something like:
I want to read this file line by line. How can I achieve this in MapReduce?
February 26, 2015 at 9:39 pm #3130
By default, MapReduce uses TextInputFormat, which reads each line as a value and uses the byte offset of the start of that line within the file (not the line number) as the key, so (key, value) pairs = (byte offset, text of each line). TextInputFormat always reads the contents of your input file line by line.
February 27, 2015 at 9:04 am #3131
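To make the TextInputFormat key described above concrete, here is a rough plain-Java sketch (no Hadoop classes; OffsetLineDemo, Record, and readLines are hypothetical names) that mimics how a line reader turns a byte stream into (byte offset, line) pairs. It assumes, as Hadoop's default line reader documents, that LF, CR, and CR+LF each terminate a line, which would also mean that a stray carriage return inside a record splits that record in two.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Simulates (byte offset, line) pairs similar to what TextInputFormat hands to a Mapper. */
class OffsetLineDemo {

    /** One input record: the byte offset where the line starts, and the line's text. */
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }

        @Override
        public String toString() {
            return "(" + offset + ", " + line + ")";
        }
    }

    /** Splits the bytes into lines terminated by LF, CR, or CR+LF. */
    static List<Record> readLines(byte[] data) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                records.add(new Record(start,
                    new String(data, start, i - start, StandardCharsets.UTF_8)));
                // Treat CR immediately followed by LF as a single terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                start = i + 1;
            }
            i++;
        }
        // Emit a final line that has no trailing terminator.
        if (start < data.length) {
            records.add(new Record(start,
                new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\r\ngamma".getBytes(StandardCharsets.UTF_8);
        for (Record r : readLines(data)) {
            System.out.println(r); // prints (0, alpha), (6, beta), (12, gamma)
        }
    }
}
```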
Would it still be possible to read using TextInputFormat even with end-of-line characters such as carriage returns present in between the records instead of at the actual ends of lines?
February 27, 2015 at 9:08 am #3132
Would it still be possible to read line by line using TextInputFormat even if there is no carriage return present to indicate the end of a line or the line boundaries? What if unwanted carriage return characters are present in between records instead of at the ends/boundaries?
February 27, 2015 at 12:28 pm #3133
I am trying the CDH automatic installation on AWS EC2 using the Cloudera Manager installer binary. I have created one Ubuntu Precise 12.04 LTS micro instance.
I followed the on-screen instructions as per this tutorial: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-7-1/Cloudera-Manager-I…
1) This is my /etc/hosts file:
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
2) Downloaded cloudera-manager-installer.bin and changed its permissions.
3) Ran sudo ./cloudera-manager-installer.bin; after this command, Cloudera Manager was installed.
4) But I could not access the Cloudera Manager web console at http://172.31.13.46:7180. I opened port 7180 while creating the instance, but I am still not able to reach the web console.
5) My Cloudera Manager DB and Cloudera Manager server are both running.
6) Port 7180 is also not listening on my Ubuntu server. I used the command sudo ufw allow 7180, but it did not help.
7) I checked $ sudo ufw status and the result is inactive.
8) When I check $ sudo service cloudera-scm-agent status on 172.31.13.46, it comes back as an unrecognized service.
I am struggling with this part. Could you please let me know where I went wrong in installing Cloudera in a clustered environment?
If so, it would be very helpful for me.
Thanks in advance,
Bharath
February 27, 2015 at 9:46 pm #3141
@Nishith Gupta,
If there is no carriage return at the end of each line, how will you separate the lines? Is there any other delimiter instead of '\n'? You definitely need some character as a delimiter between your lines.
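As a small illustration of splitting on a delimiter other than '\n', the plain-Java sketch below (not Hadoop code; DelimitedRecords, split, and the '|' delimiter are assumptions chosen for the example) cuts a stream of variable-length records wherever the delimiter occurs. A custom RecordReader would apply the same idea to the bytes of its input split.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits variable-length records on a single-character delimiter. */
class DelimitedRecords {

    /** Returns the records found between occurrences of the delimiter. */
    static List<String> split(String data, char delimiter) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == delimiter) {
                // The delimiter closes the current record; it is not part of the record.
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Keep a trailing record that is not followed by a delimiter.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical input: three records of different lengths, no newlines at all.
        String data = "id=1,name=a|id=22,name=bb|id=333";
        for (String record : split(data, '|')) {
            System.out.println(record);
        }
    }
}
```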
If your delimiter is a character other than '\n', then you need to write a custom InputFormat whose createRecordReader() method returns a RecordReader that splits records on your delimiter.
February 27, 2015 at 9:47 pm #3142
I'll try to look into your issue tomorrow and let you know my response.
February 27, 2015 at 11:09 pm #3144
Thanks Siva. I got your point about using a custom RecordReader. Actually, this was asked in one of my interviews: the interviewer said that the delimiter, i.e. the character that indicates the end of a line or record, is also present within the line itself, and asked what approach I should follow to read the lines. For example, let's say "*" is the character that would indicate the end of a line, but that same symbol is also there in the line itself:
The topic ‘CCDH 410 Probable Questions’ is closed to new replies.