Archive for Summer Research Internship

index match “=(match(), index(, , ))”
pivot table

  • explain the why there are missing ECCs in some years, otherwise, are there other ways to find the ECCs for that company in that year? in the match.xlxs, try to find the ECCs for WB.3, VIAB, SAF, NWA
  • save the newly found YEARLY files to the ‘additional’ folder in the ‘all ECCs (final)’ folder
  • ticker name (add)
  • type yes (found) and no (check in factiva)
  • by tmr
  • 2.5hr, 8.74 or 10, 6.55, 10 copies weekly
  • hung

Remaining tasks:

  • PRs remaining + check (& checking)
  • LS coding (& checking)
  • ECCs (by today)

ECCs check for the pilot test:

  • check 10 different companies i.e. 10 quarters from different companies
  • check if columns D, E – no. of execs, F – if the CEO was present, G – did the CEO give the broad long important performance overview (not the initial intro done by the first speaker of the company), H – no. of times the ceo spoke, I – no. of times the exec, less the EO, spoke
  • these are available in the ‘ECC (final)’

Pilot test:

  • check PRs (for 367 valid rows) —> identify the firm years that are relevant only in the tenure sheet. —> remove the rubbish data/ rows
  • LS and narc photos for valid rows—> fill in the rows that are missing —> change the info in the “LS coding” sheet —> remove the rubbish data/ rows
  • curate the final downloaded soft copy folder as well like the exc plot sheet
  • career and charity missing info (? wait for it)

exc pilot —> code the first years i.e. annual report released in —> a separate sheet to show which are actually code-able —> code 40 CEOs this week and download for all if they are relevant

LS coding check:

  • code first years for 40 CEOs
  • 40 CEOs (20% of the relevant CEOs) i.e. 80 LS

PR for remaining firm years:

  • 1000


how many firm years are there?

total firm years, total valid, total invalid

total LS, “, and what’s the break down for the invalid

total narc, “ <<<<<<<<<<

total PR, “

total ECC, “

Remaining tasks:

  • PRs remaining + check (& checking) + ECCs (hung)
  • OVERALL CHECK (hung)
  • LS coding (& checking)
  • include extra coded firm years + include two exl_pilot excel sheets (one with the additional ls) + paste the final report in the “exl folder” folder (HF)
  • word doc: instructions on how to get the ECCs, PRs and LS etc. i.e. where to find the codebook, where to download, how to do, and instructions on what is in the active copy i.e. what can I find (HF)
  • are the codes for the narc photos (HF)
  • bring over the sheet for code definitions (HF)


  • ignore the columns of 2013-2015
  • find out how many ECCs are missing in the analysis


  • New file —> one sheet that shows only, 409 rows “(exl_pilot_active copy (add 409)”
  • 2001 missing row issue —> copy over the 2002 row and replace the data
  • For all other rows —> do the same i.e. 2003 released, code in the row for year 2004
  • 20-40 LS copies in these two days

Hung: 50% of second column left, few changes, fast progress

Hf: 50% of second column left, some changes, slow progress —> & attach the zeroes

Qw: Second column


LS ->


  • verify the consolidated sheet. Show the prof how you got your results.
  • 409 CEOs valid, 2227 years valid
  • (out of 822) how many valid numbers of CEOs (i.e. available to analyse, coded LS (these values can be used then), how many valid numbers of missing (those with NA)
  • Code narc photos for all CEOs —> LS photo takes precedent
  • LS is priority. PRs are secondary.

Other tasks (verify pilot test)

  • check if CEOs have COOs
  • PRs: check the PRs for pilot test and try to replace the zeroes with values
  • LS: CEO names are wrong/ mismatch in names hence these names have to be rechecked.
  • Narc photos: all CEOs should have photos


Column 1

  • As long as the method is the same across the same industry, it is fine that diff ppl are using the
  • Check the first column of the pr. Do a company allocation swap.
  • Verify rows that have pr values above 70
  • ignore columns with 0 or only a handful of PRs for now

Column 2

  • check the first half the PRs collected for the column 2

Column 4

  • add back 60


  • change the titles of the content in the ECC docs i.e. change them to title “Q1 year company”


  • most PRs from an office instead of the headquarter -> take PRs from the office instead
  • multiple hubs -> take from only the main one e.g. Airlines



  1. Check mismatch of numbers – HUI FANG
  • do a check on the total no. of rows (5214 or 5212?)
  • same for ecc (note: let her know how you got those info i.e. numbers after each filter) —> create a new column for what if >=4 and <=13 —> note who is wrong —> come up with a file with the 2125 rows in your own active copy
  • PR and LS values are 680, not 700 that you found
  • Touch only your part of the excel sheet e.g. column a-g
  • change anything that needs finalisation on prof’s copy.
  • send the final copies to the prof before heading out for lunch tmr
  • manually include the CEOs that qualify but are excluded by manually indicating it in the last column —> use the “final value” formula to find out this. Find values for 3 and above.
  1. CEO verification – HUNG [column A-L and S]
  • include ceo names for missing ceos. If years listed are before the company was officially incorporated and the name of the ceo cannot be found, insert the name of the president
  • check rows that the notes indicated should be verified
  • if names really cannot be found, make sure that the notes are clear i.e. state a reason and list the steps that you took to try to retrieve the data e.g. company was incorporated… hence… AND we have looked at mergent, google and wiki but all do not show the CEO names
  • check and fill in director ID, company ID
  • check director ID by extracting company name and ceo name
  1. Gender verification and left date verification – YYY [column O-Q]
  • fill in the 88888888, 99999999 and blank cells with the genders
  • make sure that all the CEOs that left on “31 dec 12” are really still in the office or have left on that day
  1. LS – QW [column AC-AQ]
  • provide reasons for what the blank cells under all the columns are empty e.g. co-signed, transition year etc.
  • find out why there are blank rows when the LS should have been downloaded. Provide a reason
  • [TENTATIVE. CLARIFY FIRST!] Fill in the blanks by downloading the new LS and code them
  1. PR [will be redone so let’s ignore changes to this section]
  • possible reasons why data cannot be found: ceo changed names, company changed names
  • use 99999999 to indicate that missing data really cannot be found
  • if the remaining two columns are zero but the rest are not zero, indicate 0 and not 99999999
  • redownload PR again —> a lot of eye-balling needed



  • Identify all the mistakes in the excel sheet e.g. wrongly identified CEO, title in company column

Check CEOs correctly identified or not.

  • Do this for 10 diff firm quarters and 10 different companies
  • Check if the participants are correctly identified (5 firms years from the sample of 10, each of different companies)
  • Also, are the no. of times when they speak correct?

Consolidate the errors

Give a percentage of mistakes e.g. 5 companies, no. of executives, no. of times they spoke,

5 companies, 1 cell – person, 1 cell – mention —> 2 cells for each firm year quarter

10 companies, 1 cell – person —>

10 companies

—> search for CEO name and company name, leave out the HQ. From the results, find the actual HQ


  1. CEO info needs to be verified (record this under “abnormals”)
  2. ceo_tenure criteria
  3. true_false criteria

come up with the table (only the first two columns)

summary -> consolidate the problematic rows

out of all the firm years, how many firm years have missing data or problematic data i.e. not sure about this data etc.


RA Summer Internship

Downloading of documents from online:

  1. check for the document in the dropbox document
  2. if unfound, go to to dl the document
  3. the document might be wrongly titled, if so, dl all the documents and search for the content
  4. if the shareholder letter is still missing, search for it on google
  5. if unfound, indicate in the excel sheet under “completed” as 0


  • Usually no cover page unless the CEO’s picture is there (if the picture is not found in there letter, the annual report has to be checked as the picture might be at the front.)
  • Check that the letters are from the CEO (make sure that their signatures are present. Take note of the acceptance threshold of this in the excel sheet under “lc_rv” (?)
  • split the pdf to print only the letter to shareholders (on mac: from the pdf, go to print, click single then open new pdf)
  • export this letter to the “letter from CEOs” (?) folder
  • title it as such “[ticker no.] year” e.g. “TXN 2009”

Checking of codes:

  • max 1 error for each paragraph even if there are e.g. 5 differences


  • the excel sheet contains all the CEOs in operation for that year
  • the excel sheet has data from year 2002-2012
  • there are instances when more than 1 CEO are operating in a year
  • there are instances when a CEO left in 2003 and the 2002 shareholder letter is invalidated rejected. If so, download the 2001 copy to code.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s