Behrang&Mohit&behrang/15383/lecs/2013.10...2013/10/21  · 10/21/13 1 15383: txt proc!...

10/21/13  

15383: txt proc!

15383:  Intro  to  Text  Processing  

Behrang  Mohit  

Welcome! वेलकम! مرحبا!

•  Welcome  To  15383!  

• Instructor: Behrang Mohit
  – Office: 1004
  – first-[email protected]

The  AI  Dream  

• Creating intelligent systems capable of simulating humans

Language  and  Text  

• Has been present since the early days of human civilization.

21st  Century:  So  Much  Text!  

• Problem: Information overload!

• Exponential growth of text in the surface web and also the deep web.
  – 400 million tweets/day

Generate,  Organize  and  Process  

• Need to generate, organize and process text:
  – Different topics and genres
    • News, science, sport, film subtitles, children's stories, jokes, …
  – Different languages
  – Different platforms and mediums
    • Print, desktop, mobile device, TV, …
    • Internet
      – Official channels (government and corporate webpages)
      – Personal pages, social media

Examples  of  Text  Processing  Tasks  

• Searching and categorizing
• Extracting information from text
  – Who is doing what to whom when
• Summarize text and answer questions
• Translate
• Understand text
• Chat and counsel humans (psychotherapy)

15383:  What  Will  We  Learn  Here?  

• Features of text: Corpus, Encoding
• Basics of statistics
• Text Organization:
  – Document Classification
  – Search: Information Retrieval
• Linguistics in text processing
  – a.k.a. Natural Language Processing
• Python programming (NLTK)
• Course project: Sentiment Analysis on Social Media

Text  As  A  Medium  

• Natural Language Processing
  – Speech
  – Text
• Focus on natural language text

Today  

•  An  overview  of  topics  

Text  And  Its  Encodings  

• Text: Words, sentences, documents, …
• Processing and organizing large volumes of text
  – Corpus (corpora)
    • For building and evaluating text processing systems
    • Might include extra linguistic information
• Encoding the text
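To make "encoding the text" concrete, here is a minimal Python sketch (not from the slides). It shows that the same greeting occupies a different number of bytes under different encodings. The sample strings and the choice of UTF-8 and UTF-16 are assumptions made for illustration.

```python
# Illustrative sketch: a string is a sequence of Unicode code points;
# an encoding maps those code points to bytes.
ARABIC_HELLO = "\u0645\u0631\u062d\u0628\u0627"  # the Arabic greeting from the welcome slide
SAMPLES = ["Welcome", ARABIC_HELLO]

def describe(text):
    """Return (number of characters, UTF-8 byte count, UTF-16 byte count)."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return len(text), len(utf8), len(utf16)

for sample in SAMPLES:
    chars, n8, n16 = describe(sample)
    print(f"{sample!r}: {chars} characters, {n8} bytes in UTF-8, {n16} bytes in UTF-16")

# Decoding bytes with the wrong encoding silently garbles the text.
garbled = ARABIC_HELLO.encode("utf-8").decode("latin-1")
restored = ARABIC_HELLO.encode("utf-8").decode("utf-8")
```

Knowing which encoding a corpus uses is what lets a program turn its bytes back into the intended characters.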

Statistics In Text Processing

• Rule-based systems vs. statistical systems
• Probabilities
• Statistical learning
  – Supervised learning

Text Organization

• Large volumes of text → organized text
• Document classification
  – Sport, politics, science, …
  – Email classification
    • Work, Fun, Spam, …
• Searching documents
  – Ask, Google, Bing, etc.
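A minimal sketch of statistical document classification, assuming a Naive Bayes model with add-one smoothing. The slides do not name a specific method, so this is one common choice and not necessarily the course's. The four training sentences and the two labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny hand-made training set (invented for illustration only).
TRAIN = [
    ("the team won the match", "sport"),
    ("the striker scored a late goal", "sport"),
    ("the minister won the election", "politics"),
    ("parliament passed the new law", "politics"),
]

def train(examples):
    """Count words per label and documents per label (supervised learning)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify("the team scored a goal", model))
print(classify("the new election law", model))
```

The same recipe applies to email classification: only the labels (Work, Fun, Spam) and the training examples change.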

Natural  Language  Processing  is  …  

• NLP or
  – Computational Linguistics
  – Human Language Technologies
• Goal: Making computers capable of using human language as their input or output, performing intelligent tasks.

NLP and Artificial Intelligence

• NLP is the fundamental problem of Artificial Intelligence (AI).
• Turing test for the intelligence of a machine
  – If a human judge cannot distinguish between a machine and a human in a conversation, the machine passes the Turing test.

The  Dream  of  Talking  Machines!  

• 2001: A Space Odyssey
  – Dave (human): Open the door, HAL.
  – HAL (machine): I’m sorry Dave, I can’t do that.
• HAL: An intelligent system capable of:
  – Understanding and generating human language

Linguistic Layers

• Phonetics
• Phonology
• Morphology
• Syntax
• Semantics
• Pragmatics
• Discourse

Linguistic Layers: Morphology

• What are the building blocks of words?
  – goes → go + es
  – prettiest → pretty + est
• Different levels of complexity in morphology
  – English
  – Arabic, Finnish, Turkish
    • wsyaktobun → w + s + yaktob + un
    • and + will + write + they → "and they will write"
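The English examples above can be sketched as a small rule-based segmenter. This is only an illustration under strong assumptions: the stem lexicon and the suffix list are hypothetical and tiny, and morphologically rich languages such as Arabic, Finnish or Turkish need far more elaborate analyzers.

```python
# Hypothetical toy lexicon of known stems (an assumption for illustration).
STEMS = {"go", "pretty", "write", "book"}

# Suffixes to try, longest first.
SUFFIXES = ["est", "es", "s"]

def segment(word):
    """Split a word into (stem, suffix); return (word, "") if no split is found."""
    if word in STEMS:
        return word, ""
    for suffix in SUFFIXES:
        if not word.endswith(suffix):
            continue
        base = word[: -len(suffix)]
        # Try the base as-is, then undo the y -> i spelling change (pretti -> pretty).
        candidates = [base]
        if base.endswith("i"):
            candidates.append(base[:-1] + "y")
        for stem in candidates:
            if stem in STEMS:
                return stem, suffix
    return word, ""

for w in ["goes", "prettiest", "books"]:
    stem, suffix = segment(w)
    print(w, "->", stem, "+", suffix)
```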

Linguistic Layers: Syntax

• How do words come together to form more complex units?
  – Phrases, sentences, relationship between phrases
  – Mostly at the sentence level
  – Example:

    Zeinab    bought   a     book     .
    Noun      Verb     Det   Noun     Punctuation
    Subject   Verb           Object

Linguistic Layers: Semantics

• What is the meaning of the terms in a sentence?
  – Suhail bought a book.
    Commercial transaction:
      Buyer: Suhail
      Action: buying
      Commodity: book

Linguistic Layers: Pragmatics and Discourse

• Going beyond a sentence-level analysis
  – Ahmad arrived in Doha. He was accompanied by his family. They went directly to a wedding from the airport.

Two  Major  NLP  Challenges  

• Challenge 1: Getting the proper linguistic representation of the input
  – From sound waves to text
  – From the sentence to a syntactic tree
    • Mariem bought a book → Mariem: Subj, book: Obj
  – ….

• Challenge 2: Ambiguity in language
• A language understanding example
  – “At last, a computer that understands you like your mother!”
    • Ad from Microsoft (in the early 1980s)
    • Example by Stuart Shieber

A  Computer  That  Understands…  

• At last, a computer that understands you like your mother!
• Computer understands you as well as your mother understands you
• Computer understands that you like your mother
• Computer understands you as well as it understands your mother
• Problem: Ambiguity in human expressions

The  Ambiguity  Problem  

• Humans use common sense, bits of culture, world knowledge in their expressions!
  – Do computers understand all of those?

Levels of Ambiguity: Acoustics

• Understands you like your mother
• Understands you lie cured mother
• It is hard to recognize speech.
• It is hard to wreck a nice beach.

Levels  of  Ambiguity:  Syntax  

• Different sentence structure (syntax):
  – Computer that understands you (like your mother [does])
  – Computer that understands ([that] you like your mother)

Levels of Ambiguity: Semantics

• … knows you like your mother
  – The female parent
    • Most probably
  – A vat (dish) for making vinegar
• We put our money in the bank
  – Money buried under the mud (river bank)!
  – Financial institution
    • Most probably

Levels  of  Ambiguity:  Discourse  

• Leila says they are selling a computer that knows you like your mother. But she doesn’t seem to be happy about it.
  – Who does "she" refer to?
    • mother, computer, Leila?

Let’s  Disambiguate  

•  I  saw  her  duck  with  a  telescope  

• I used a telescope to see her duck
• I saw her duck that was carrying a telescope.
• I used a telescope to see her ducking
• I saw her ducking using a telescope
• I cut her duck with a telescope
• ….

Course  Plan  

1. Text
2. Text organization
3. Processing with three linguistic layers
   – Words (morphology), Syntax and Semantics
4. NLP applications

Application: Spell Checking

• Ali arrived at scool
  – scull
  – school
  – cool
  – spool
• Idea: Look at the previous words to decide between the candidate corrections.
  – Use statistics
    • Pr(arrived at school)
    • Pr(arrived at cool)
    • Pr(…)
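The idea of using the previous word to choose a correction can be sketched with bigram counts. This is a simplified illustration, not the course's implementation: the corpus below is invented and tiny, only one previous word is used as context, and add-one smoothing is an assumed choice.

```python
from collections import Counter

# Tiny invented corpus standing in for a large collection of text.
CORPUS = """
ali arrived at school early . the children arrived at school by bus .
she looked at the cool car . he arrived at work late .
the spool of thread fell . they stayed at school until noon .
""".split()

unigrams = Counter(CORPUS)
# Count how often each word follows each previous word (bigram counts).
bigrams = Counter(zip(CORPUS, CORPUS[1:]))

def prob(word, previous):
    """Estimate Pr(word | previous) with add-one smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(previous, word)] + 1) / (unigrams[previous] + vocab_size)

def best_correction(previous, candidates):
    """Pick the candidate most likely to follow the previous word."""
    return max(candidates, key=lambda w: prob(w, previous))

candidates = ["scull", "school", "cool", "spool"]
print(best_correction("at", candidates))
```

With a realistic corpus the same comparison would be made over longer contexts, such as "arrived at".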

Application: Named Entity Recognition

• Names of Persons, Locations, Organizations, …

•  George  Washington  ruled  America  for  two  terms.  

•  George  Washington  University  announced  …  

•  As  George  was  walking  in  Washington,  he  …  

Application: Text Summarization

• Summarizing large volumes of text
  – Locate the important parts of the text and form sentences with them.
    • Natural language generation
  – Useful for governments, companies, etc.
  – Word processors and browsers offer the service

Application: Machine Translation

• Text translation from one language to another
  – Dealing with differences in two languages
    • English: Subject-Verb-Object
    • Arabic: Verb-Subject-Object
  – Ambiguities in two languages
  – Semantic differences:
    • Concept of cousin

Application: Sentiment Analysis

• Imagine
  – Your company (e.g. Apple) has released a new product (e.g. iPhone) and wants to estimate the initial reaction of customers
  – You’re campaigning for a politician and you want to estimate people’s reaction to his speech last night.

• Distinguish between objective and subjective statements.
  – News vs. Opinion
• Find polarity of statements
  – Product reviews:
    • The new laptop is hot!
    • The new laptop gets very hot!
• Example: Organizing hundreds of film reviews
  – “This is a feel-good blockbuster production with an excellent technical setup.”
  – Bottom line: Does this author like the movie?

Positive, Negative or …?

• You’ll see a tweet and will say if it’s
  – Subjective or Objective
    • Does it carry any opinion/sentiment?
  – If subjective: Positive, Negative or Neutral?

• “Authorities are only too aware that Kashgar is 4,000 kilometres (2,500 miles) from Beijing but only a tenth of the distance from the Pakistani border, and are desperate to ensure instability or militancy does not leak over the frontiers.”

• friday evening plans were great, but saturday's plans didnt go as expected -- i went dancing & it was an ok club, but terribly crowded :-(

• Iran and 5+1 walk in to new rounds of negotiations on the nuke program. #iran #us

•  “obama  should  be  impeached  on  TREASON  charges.  Our  Nuclear  arsenal  was  TOP  Secret.  Till  HE  told  our  enemies  what  we  had.  #Coward  #Traitor.”  

• “My graduation speech: "I'd like to thanks Google, Wikipedia and my computer! :D #iThingteens”

•  WHY  DO  YOU  GUYS  ALL  HAVE  MR.  KENNEDY!    HE’S  A  F…G  DOUCHE!  

Required  knowledge  

• Language:
  – Words: desperate, leak, impeach, TREASON
  – Sarcasm
    • My graduation speech: "I'd like to thanks Google, Wikipedia and my computer!
  – Social media’s informal and erroneous language
    • didnt, :-)  :-(  omg  thanx  gr8  b4
• Medium:
  – Hashtags in tweets
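As a rough sketch of how a program might handle the medium-specific items above, the following Python separates hashtags and emoticons from ordinary words and normalizes a few informal spellings. The slang table and emoticon table are hypothetical, hand-made resources; a real sentiment system would need much larger ones, and this sketch does nothing about sarcasm.

```python
import re

# Hypothetical, hand-made resources (assumptions for illustration only).
SLANG = {"thanx": "thanks", "gr8": "great", "b4": "before", "didnt": "didn't"}
EMOTICONS = {":-)": "positive", ":)": "positive", ":D": "positive",
             ":-(": "negative", ":(": "negative"}

def preprocess(tweet):
    """Return (normalized tokens, hashtags, emoticon polarities) for one tweet."""
    tokens, hashtags, polarities = [], [], []
    for raw in tweet.split():
        if raw in EMOTICONS:
            polarities.append(EMOTICONS[raw])
        elif raw.startswith("#") and len(raw) > 1:
            # Keep the hashtag text without '#' and trailing punctuation.
            hashtags.append(raw[1:].rstrip(".,!?").lower())
        else:
            # Lowercase and strip punctuation, keeping apostrophes.
            word = re.sub(r"[^\w']+", "", raw.lower())
            if word:
                tokens.append(SLANG.get(word, word))
    return tokens, hashtags, polarities

print(preprocess("thanx it was gr8 :-) #Doha"))
print(preprocess("an ok club, but terribly crowded :-("))
```

Emoticons and hashtags are kept as separate signals here because they often carry the opinion more directly than the words do.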

Problem: Analyzing emotions (Contd.)

• Mass analysis of linguistic emotions
  – On Social Networks

Communication

• Behrang Mohit
  – CMUQ 1004
  – ([email protected])
• Course webpage:
  – http://www.qatar.cmu.edu/~behrang/15383/

Administrative

• This is mostly a project-based course
  – Most of your grade is decided by the programming assignments and the final project.

Attendance and Participation    5%
Python Labs                    10%
Quizzes                        10%
Programming assignments        30%
Final Project                  45%

Resources  

• Lecture notes
• Natural Language Processing with Python
• Further Reading: Speech and Language Processing
  – By: Dan Jurafsky and Jim Martin, Prentice Hall 2009.
  – Three copies on reserve at the library.

NLTK  

• Natural Language Toolkit
  – Python library for natural language processing
  – @library
  – Online: http://www.nltk.org/book
