Disinformation Research with @lucas_a_meyer: TDI 21

Threads Dev Interviews

I am finding developers on Threads and interviewing them, right on Threads.

Note: The views in these interviews are personal views and do not represent the interviewee’s employer.

“In general, the most common use of the work I do is to remove bad stuff from the Internet or tag it as suspicious.”
— Lucas A. Meyer (@lucas_a_meyer) on Threads

Today we have @lucas_a_meyer. How did your career lead you to become a researcher in the AI and LLM space?

Although I always wanted to be a researcher, a lot of it happened by accident. I was planning to become a Finance professor when a family member got sick and I ended up needing a safer job. So I went back to big tech in Finance.

By random chance, we were looking for technical help in Finance and a research lab was looking for meaningful projects. The first project we did used NLP for finance contracts (this was 2016). In 2022 I actually joined the lab and here we are today.

What type of programming do you do on a daily/weekly basis?

Most of my work is about disinformation and cybersecurity. My data sources are usually news, logs and web documents. It’s petabytes of data, so a lot of my time is spent processing it. I mostly use U-SQL, a mix of C# and SQL that can distribute work across very large clusters.

Once the data is processed I do machine learning: clustering, topic finding, extraction, and classification. I use PyTorch for that.
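To make the clustering step concrete, here is a minimal k-means sketch. It is only an illustration under stated assumptions: it uses plain Python rather than PyTorch, the function name `kmeans` and the toy points are hypothetical, and it is not the pipeline described in the interview.

```python
def kmeans(points, centroids, iterations=10):
    """Toy k-means: points and centroids are tuples of equal length.

    Returns the final centroids and the cluster assignments from the
    last iteration. Hypothetical helper, for illustration only.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster
            else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up 2D points standing in for document embeddings.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

In practice the points would be high-dimensional embeddings of documents, and the initial centroids would be chosen randomly or with a seeding heuristic rather than by hand.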

I do summarization and RAG as intermediate or final steps, and this is where I use LLMs.
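For readers new to RAG (retrieval-augmented generation), the idea is to retrieve relevant documents first and then hand them to the LLM as context. The sketch below shows only the retrieval and prompt-assembly steps, assuming a toy word-count similarity instead of learned embeddings. The corpus, function names, and prompt wording are all hypothetical, and no real LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; a real system would index far more documents.
DOCUMENTS = [
    "Wildfires in Maui were driven by drought and strong winds.",
    "Carbon emissions fell after the plant switched to renewable energy.",
    "A fraud ring used stolen cards to buy gift cards online.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Assemble a grounded prompt that an LLM could then answer."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt("What caused the Maui wildfires?", DOCUMENTS)
```

The resulting `prompt` string is what would be sent to a model. Grounding the answer in retrieved documents is what lets the generated text stay tied to vetted sources.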

Very cool. What types of information are you searching for?

Information and misinformation about climate and the environment 🌎, fraud schemes and rings 🥷, targeted disinformation campaigns 👾 (e.g. people trying to convince others that the Maui fires 🔥 were a deliberate attempt by the government to grab land from Hawaiians), carbon emissions reduction efforts… 🏭

Lots of cool things 😂

Wow, that actually does sound kinda cool. Are your results used to impact search results?

That’s one of the main uses. In general, the most common use of the work I do is to remove bad stuff from the Internet or tag it as suspicious.

I also worked at Amazon in anti-fraud, and closing fraud rings was a big part of my job.

My work also helps improve policies (e.g., changing defaults), and some of it is also creating content with generative AI based on good peer-reviewed grounding documents, ensuring that the information that is generated is high-quality.

So you use a lot of the Azure tools in your job? If so, which ones? Do you use Stream Analytics?

I have used Stream Analytics, but don’t use it a lot. My day usually involves:

Azure Machine Learning Studio
Azure Cosmos DB (for really large NoSQL datasets)
Azure Databricks (for data processing and merging)

What is one topic in technology that is consuming an abnormal amount of your brain power right now?

These days, generative AI with pre-trained transformers. It revolutionized NLP, and it’s influencing a lot of other areas. There’s a lot that we don’t know yet and need to study, including ethical considerations. Generative AI is, of course, a major concern for disinformation.

How good are we at determining whether something was written by AI or not?

While the answer is somewhat more nuanced, one of the key results is that we’re very bad at determining whether a single piece of text was generated by AI. We’re better with images, and with several examples of text from the same source. Most “AI detection” tools don’t work well, and teachers who used them for grading had poor results. Some committed injustices, as one common error of these tools is flagging people who have English as their second language (like me) as AIs. (I’m not an AI.)

Who (people, schools, companies) is leading the charge in generative AI with pre-trained transformers? Got any paper links?

A lot of the action is in big tech, but everybody is trying something. I have seen great things from FinTech, EdTech, tons of interesting startups, and even organizations that usually move slower, like governments and manufacturing. For papers, I suggest following @omarsar0. His NLP Newsletter is amazing. nlp.elvissaravia.com/

What tips do you have for people looking for roles with more interesting tech problems?

I think of Computer Science as a tool. Some people specialize in CS itself, but for some other people it’s worth specializing in an applied area they love.

In my case, I studied Financial Economics after finishing computer science, in particular financial fraud, so it was easy for me to see opportunities to apply CS to something I loved.

I suggest studying a tool like CS or Stats and something you love. You will see the connections. 🤯

What is so special about @threads right now?

The engagement is great. I get to interact with really amazing people, and because the signal-to-noise ratio is so high, they interact back. There’s a vibrant dev/AI community growing here and I hope we can contribute to making it even better.

It reminds me of the R community, where it felt safe, and we could ask questions and grow together. 😍

You are writing a book about Semantic Kernel. What is Semantic Kernel and how does it help devs?

Semantic Kernel is an API designed to help devs use AI services, especially large language models like GPT-3.5 and GPT-4.

It makes it easier to write reusable prompts and call many different services, even if you need to chain the output of one prompt into the input of another.
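To illustrate what chaining the output of one prompt into the next looks like, here is a minimal sketch in plain Python. It deliberately does not use the Semantic Kernel API: `PromptTemplate`, `run_chain`, the `{input}` placeholder, and `fake_model` (a stand-in for a real LLM call) are all hypothetical names invented for this example.

```python
def fake_model(prompt):
    """Hypothetical stand-in for an LLM call; it just echoes the prompt."""
    return f"[model output for: {prompt}]"


class PromptTemplate:
    """A reusable prompt with a single {input} placeholder (assumed syntax)."""

    def __init__(self, template):
        self.template = template

    def render(self, text):
        # Plain string replacement, so braces in the input text are harmless.
        return self.template.replace("{input}", text)


def run_chain(templates, model, text):
    """Feed the output of each prompt into the next prompt in the chain."""
    for template in templates:
        text = model(template.render(text))
    return text


summarize = PromptTemplate("Summarize: {input}")
translate = PromptTemplate("Translate to French: {input}")
result = run_chain([summarize, translate], fake_model, "long article")
```

Swapping `fake_model` for a function that calls a real service would leave the templates and the chain unchanged, which is the kind of reuse the answer above describes.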

While I’m working hard on the book, it’s still a few months out. Meanwhile you can check Microsoft Learn’s tutorials: learn.microsoft.com/en-us…

How can people find you elsewhere online?

I’m “Extremely Online” 😂 (If you haven’t yet, get the awesome Extremely Online book from @taylorlorenz.)

Besides Threads, the easiest place to find me is on LinkedIn. My LinkedIn profile is in my Threads bio, but for convenience, it’s also here: linkedin.com/in/lu…

I am super open to connecting, and I have proudly connected with several people from here on LinkedIn.