Big data and deep learning have been hot for the last ten years, and the achievements have been nothing short of stunning. If I ran a Turing test against ChatGPT, I would fail miserably every time; ChatGPT is more human than I am. Yet even though ChatGPT is eerily good, big data and deep learning will not bring us to the “singularity”. In fact, we have reached a plateau. A few things are worth noticing:

  • News on the self-driving car front has been strangely quiet for the past few years;
  • GPT-3 did not improve by collecting ever more data from the Internet (the model was already trained on data spanning some 12 years). Instead, OpenAI used supervised fine-tuning and reinforcement learning from human feedback to take the model from GPT-3.5 to ChatGPT; both approaches relied on human trainers to improve the model’s performance (see the sketch after this list). Not to mention, much of this human-feedback work was reportedly done by workers in Kenya at quite low cost.
  • Big tech firms also employ Kenyan labellers to label toxic content, edge cases, and ordinary data. Why not just use even bigger data?

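To make the second point concrete, here is a minimal sketch of the reward-model half of that human-feedback approach: a small network is fit on pairs of responses that human trainers have ranked, so that the preferred response scores higher. Everything here is a made-up stand-in (toy random embeddings, a tiny MLP), not OpenAI’s actual pipeline.

```python
import torch
import torch.nn as nn

# Toy reward model: scores a fixed-size "response embedding".
class RewardModel(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic stand-in for human preference data: for each pair,
# a labeller marked `chosen` as better than `rejected`.
chosen = torch.randn(256, 16) + 0.5
rejected = torch.randn(256, 16) - 0.5

for step in range(200):
    # Pairwise (Bradley-Terry) loss: push score(chosen) above score(rejected).
    loss = -nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final pairwise loss: {loss.item():.3f}")
```

The trained scorer can then stand in for a human judge when further tuning the model, which is exactly why the quality and cost of the human labelling matters so much.
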
These firms have realized that big data does not scale well to “very” big data, which does not necessarily improve a model’s performance anyway. Human labeling and scoring might not get them very far either. A new approach is definitely needed.

It is time to go back and try to understand how babies learn, i.e. small data and shallow learning (model-based learning).

Small data and one-shot learning, the way a baby learns, might be the very next step forward; a toy sketch of the idea follows below. However, we can never escape another hurdle: common sense, which is a kind of “model-based learning” (and maybe more). Without common sense, deep learning will never get rid of “edge cases”.
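
For illustration, here is a toy sketch of one-shot classification: given a single labelled example per class, a new input is assigned to whichever example it is most similar to in some embedding space. The random embeddings here are placeholders; in practice they would come from a learned model.

```python
import numpy as np

def one_shot_classify(support, query):
    """Assign `query` to the label of the most similar single example.
    `support` maps label -> one embedding vector (one shot per class)."""
    labels = list(support)
    protos = np.stack([support[l] for l in labels])
    # Cosine similarity between the query and each one-shot example.
    sims = protos @ query / (
        np.linalg.norm(protos, axis=1) * np.linalg.norm(query) + 1e-9
    )
    return labels[int(np.argmax(sims))]

rng = np.random.default_rng(0)
support = {"cat": rng.normal(size=8), "dog": rng.normal(size=8)}
query = support["cat"] + 0.1 * rng.normal(size=8)  # noisy view of the cat example
print(one_shot_classify(support, query))  # -> "cat"
```

Note how little data this needs, and also where it breaks: with no model of the world behind the embeddings, a query that falls between classes (an “edge case”) is classified by raw similarity alone, which is precisely where common sense would have to step in.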